Andy Glew's comp-arch.net wiki, http://semipublic.comp-arch.net

comp-arch.net wiki on hold from October 17, 2011

Single Instruction Multiple Threads (SIMT)


SIMT = Single Instruction Multiple Thread

SIMT is a term, apparently created by Nvidia, to describe their GPU architectures. These GPU architectures are fundamentally SIMD, Single Instruction Multiple Data.

Although the term SIMT was apparently coined by Nvidia, the overall architecture is not specific to Nvidia. At about the same time ATI and Intel came up with similar architectures that might also be described as SIMT - with some differences.

Many have accused Nvidia of inventing SIMT purely as a marketing term, adding no value over and above SIMD.

However, I (Andy Glew) disagree. The term SIMD has had many meanings and interpretations. To many, Cray-1 style vector machines are SIMD - and the GPUs are definitely not examples of that, even though the underlying hardware may have similarities.

On a vector machine such as the Cray-1, or its modern incarnations such as Intel MMX and SSE, there is a single processor, sequencing instructions. Some instructions are scalar, operating on 32- or 64-bit data. Some instructions are vector, operating on, e.g., 512 bits of data interpreted as sixteen 32-bit floating point numbers. Nevertheless, these are single instructions operating on multiple data items.

The key point about the 200x-era GPUs is that each "lane" of vector computation can be interpreted as a separate thread. A single hardware sequencer operates on a group of such threads in parallel. If all threads execute the same instruction, they are said to be coherent, and a single instruction fetch is broadcast to all of the individual processing elements. If the threads branch in different directions, they are said to be diverged. The single instruction sequencer keeps track of which threads have diverged, using a mask with one enable bit per thread. It fetches a single instruction per cycle and distributes it to all of the processing elements enabled by the mask. There is instruction support to recognize control convergence, such as ENDIF and ENDLOOP, so that instruction fetch coherence can be re-established.
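As a rough illustration of the mask mechanism, here is a minimal Python sketch of one sequencer driving several lanes through an if/else. It is a toy model, not any real GPU's instruction set: the function name simt_if_else, the per-lane program (double even values, add 100 to odd ones) and the way fetches are counted are all assumptions made for the example.

```python
def simt_if_else(xs):
    """Toy model: run `y = x*2 if x is even else x+100` on all lanes in lockstep.

    Hypothetical sketch. Returns (results, instruction fetches, diverged flag).
    """
    n = len(xs)
    ys = [None] * n
    active = [True] * n      # mask on entry: all lanes enabled (coherent)
    fetches = 0

    # IF: one fetch, broadcast to all lanes; each lane evaluates its own predicate.
    fetches += 1
    pred = [active[i] and xs[i] % 2 == 0 for i in range(n)]
    diverged = any(pred) and not all(pred)

    # THEN path: one fetch, applied only to lanes enabled by the mask.
    if any(pred):
        fetches += 1
        for i in range(n):
            if pred[i]:
                ys[i] = xs[i] * 2

    # ELSE path: one fetch, applied to the complementary lanes.
    else_mask = [active[i] and not pred[i] for i in range(n)]
    if any(else_mask):
        fetches += 1
        for i in range(n):
            if else_mask[i]:
                ys[i] = xs[i] + 100

    # ENDIF: control reconverges; the entry mask is in force again.
    fetches += 1
    return ys, fetches, diverged
```

In this model a coherent group (all lanes take the same side) skips the path nobody needs, while a diverged group pays for both paths one after the other.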

Because each "lane" can be interpreted as a separate thread (and this is the key point, IMHO), the same program would run if each thread were running independently on a separate core. The SIMT-ness is a microarchitecture technique that realizes efficiencies when threads are aligned and coherent.
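The claim that the same program gives the same answer either way can be checked with a small Python sketch. Everything here is hypothetical scaffolding: collatz_steps stands in for an arbitrary per-thread program with a data-dependent loop, run_independent models one core per thread, and run_lockstep models a single sequencer that masks off lanes once they have left the loop.

```python
def collatz_steps(x):
    """The per-thread program: count Collatz steps until x reaches 1."""
    steps = 0
    while x != 1:
        x = x // 2 if x % 2 == 0 else 3 * x + 1
        steps += 1
    return steps


def run_independent(xs):
    """Each thread runs alone, as if on its own core with its own sequencer."""
    return [collatz_steps(x) for x in xs]


def run_lockstep(xs):
    """One sequencer for all lanes; finished lanes are disabled by the mask."""
    vals = list(xs)
    steps = [0] * len(xs)
    passes = 0                      # loop-body passes issued by the sequencer
    while True:
        mask = [v != 1 for v in vals]
        if not any(mask):
            break                   # ENDLOOP: every lane has left the loop
        passes += 1
        for i, enabled in enumerate(mask):
            if enabled:
                vals[i] = vals[i] // 2 if vals[i] % 2 == 0 else 3 * vals[i] + 1
                steps[i] += 1
    return steps, passes
```

For the inputs [1, 2, 3, 6] both versions return the same per-thread results; the lockstep version issues as many loop passes as the slowest lane needs, rather than the sum over all lanes.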

I therefore think that it is valid to distinguish SIMT from SIMD, perhaps as a subset.

Terminology Discussion

However, the term SIMT is too narrow. The hardware need not be limited to a single instruction per cycle. There could be 2 - DIMT, Dual Instruction Multiple Threads - or N - NIMT, N Instruction Multiple Threads.
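To make the N-instruction idea concrete, here is a deliberately simplified Python sketch. It assumes a group of threads has diverged onto several control paths, each with some number of instructions left before reconvergence, and that the sequencer can issue one instruction to each of up to n_issue distinct paths per cycle. The function name cycles_to_drain and the longest-path-first policy are assumptions for illustration, not a description of any real hardware.

```python
def cycles_to_drain(path_lengths, n_issue):
    """Cycles until every diverged path has finished (toy model).

    path_lengths: instructions remaining on each diverged control path.
    n_issue: distinct instructions the sequencer can issue per cycle
             (1 = SIMT, 2 = DIMT, N = NIMT).
    """
    remaining = [p for p in path_lengths if p > 0]
    cycles = 0
    while remaining:
        # Assumed policy: serve the paths with the most work left first.
        remaining.sort(reverse=True)
        for i in range(min(n_issue, len(remaining))):
            remaining[i] -= 1
        remaining = [r for r in remaining if r > 0]
        cycles += 1
    return cycles
```

With two diverged paths of 3 and 2 instructions, this model needs 5 cycles at one instruction per cycle and 3 cycles at two per cycle.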

To avoid this narrowness I prefer the term coherent multithreading (CMT).

(Perhaps coherent instruction multithreading (CIMT).)
