One example of such a machine is the Connection Machine (CM-1, CM-2 and CM-200), built by Thinking Machines Corporation. The CM-2 can have as many as 64K 1-bit processors (processing cells). The individual processing cells of the Connection Machine are extremely simple. The basic operation of a processing cell is to read two bits from the external memory and one flag, combine them according to a specified logical operation that produces two result bits, and write those bits into the external memory and an internal flag, respectively. Even though floating-point operations could be specified using the processing cells alone, for efficiency a floating-point processor is attached to each set of 32 processing cells. Each processing cell has its own memory.
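The basic cell operation described above can be illustrated with a small simulation. This is only a sketch: it assumes the "specified logical operation" is given as two 8-entry truth tables (one per result bit), and the names `make_table`, `cell_step`, and `bit_serial_add` are hypothetical. Choosing the tables of a full adder and using the flag as the carry shows how a 1-bit cell can add multi-bit integers one bit at a time.

```python
def make_table(fn):
    """Build an 8-entry truth table for a 3-input boolean function (assumed encoding)."""
    return [fn((i >> 2) & 1, (i >> 1) & 1, i & 1) & 1 for i in range(8)]

# Truth tables for a full adder: one table per result bit.
SUM_TABLE = make_table(lambda a, b, f: a ^ b ^ f)
CARRY_TABLE = make_table(lambda a, b, f: (a & b) | (a & f) | (b & f))

def cell_step(a, b, flag, mem_table, flag_table):
    """One basic operation: two memory bits and one flag in, two result bits out.

    The first result goes back to memory, the second into the internal flag.
    """
    index = (a << 2) | (b << 1) | flag
    return mem_table[index], flag_table[index]

def bit_serial_add(x, y, width):
    """Add two width-bit integers by repeating the basic operation once per bit."""
    flag = 0      # the internal flag holds the carry between steps
    result = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        s, flag = cell_step(a, b, flag, SUM_TABLE, CARRY_TABLE)
        result |= s << i
    return result, flag

total, carry = bit_serial_add(13, 7, 4)   # 4-bit addition, so the sum wraps around
```

In this model a `width`-bit addition costs `width` basic operations per cell, which is one way to see why a dedicated floating-point processor per group of cells pays off.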
To handle general communication inside the Connection Machine, special-purpose routers are used. Each router handles messages for 16 processing cells. The communications network of the Connection Machine is wired in the form of a hypercube. Local communications can be handled more efficiently by the nodes using the NEWS grid. For further details on the communication algorithm, the reader is referred to [16].
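To make the hypercube wiring concrete, the following sketch computes router addresses, neighbors, and hop counts. It assumes that router addresses are plain binary numbers and that two routers are wired together exactly when their addresses differ in one bit; with 64K cells and 16 cells per router this gives 4096 routers, i.e. 12 dimensions. The dimension-order walk is only an illustration of a path through the cube, not the communication algorithm of [16], and all function names are hypothetical.

```python
CELLS_PER_ROUTER = 16   # from the text: each router serves 16 processing cells

def router_of(cell):
    """Router address serving a given processing cell (assumed contiguous numbering)."""
    return cell // CELLS_PER_ROUTER

def neighbors(router, dims):
    """Routers directly wired to `router`: flip one address bit per dimension."""
    return [router ^ (1 << d) for d in range(dims)]

def hop_count(src, dst):
    """Minimum number of links between two routers (Hamming distance)."""
    return bin(src ^ dst).count("1")

def dimension_order_route(src, dst, dims):
    """Illustrative path: correct the differing address bits from lowest to highest."""
    path = [src]
    cur = src
    for d in range(dims):
        if (cur ^ dst) & (1 << d):
            cur ^= 1 << d
            path.append(cur)
    return path

DIMS = 12                                  # 64K cells / 16 = 4096 = 2**12 routers
last_router = router_of(64 * 1024 - 1)     # router serving the last cell
path = dimension_order_route(0b0101, 0b0011, DIMS)
```

Under these assumptions no message needs more than 12 hops, since two 12-bit addresses differ in at most 12 bit positions.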
The Connection Machine is hosted by a front-end computer (usually a SUN-4). The host talks to the processing cells through a microcontroller that acts as a bandwidth amplifier between the host and the processing cells. Because the processing cells are able to execute instructions at a higher rate than the host is able to specify them, the host issues higher-level macroinstructions, which the microcontroller interprets to produce the nanoinstructions for the processing cells. Upon receiving an instruction, a processing cell can choose to execute it or not, depending on the current state of its flags.
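The two ideas in this paragraph, macroinstruction expansion and flag-controlled execution, can be sketched together. The instruction name `"invert"`, the one-nanoinstruction-per-bit expansion, and the helper names are hypothetical and chosen only for illustration; the real instruction sets are documented in the Thinking Machines manuals.

```python
def expand_macro(name, width):
    """Hypothetical microcontroller step: one macroinstruction from the host
    becomes one nanoinstruction per bit position."""
    return [(name, bit) for bit in range(width)]

def apply_nano(value, instr):
    """Effect of a single nanoinstruction on one cell's data (illustrative only)."""
    name, bit = instr
    if name == "invert":
        return value ^ (1 << bit)
    raise ValueError(f"unknown nanoinstruction: {name}")

def run_cells(values, flags, nanoinstructions):
    """Broadcast each nanoinstruction to all cells; a cell executes it only
    if its flag is set, otherwise it keeps its current value."""
    for instr in nanoinstructions:
        values = [apply_nano(v, instr) if f else v for v, f in zip(values, flags)]
    return values

nano = expand_macro("invert", 4)            # 1 macroinstruction -> 4 nanoinstructions
cells = run_cells([0b0000, 0b1010, 0b0101], [True, False, True], nano)
```

Here the host issues a single instruction while the cells receive four, and the middle cell, whose flag is clear, ignores all of them.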
Several languages have been ported to the Connection Machine. Among these, LISP and C have been extended to LISP* and C* [41], which allow parallel constructs. C* allows data structures to be spread over a set of processors, so that operations on the data structures can be done concurrently, as in the case of matrix summation. C* has proved so useful for parallel programming, with its high-level constructs, that it is now in the process of being ported to other parallel architectures. One of its most useful features is that it allows the programmer to virtualize a general matrix machine with an arbitrary number of processors.
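The matrix summation case and the idea of virtual processors can be sketched as follows. This is a sequential Python simulation, not C* syntax; it assumes one virtual processor per matrix element and a simple round-robin mapping of virtual processors onto a fixed number of physical ones. The names `vp_ratio` and `parallel_matrix_sum` are hypothetical.

```python
import math

def vp_ratio(n_virtual, n_physical):
    """Number of virtual processors each physical processor must emulate."""
    return math.ceil(n_virtual / n_physical)

def parallel_matrix_sum(a, b, n_physical):
    """Elementwise sum with one virtual processor per element.

    Each outer pass stands for one time slice in which all physical
    processors would operate simultaneously on the real machine.
    """
    rows, cols = len(a), len(a[0])
    n_virtual = rows * cols
    ratio = vp_ratio(n_virtual, n_physical)
    out = [[0] * cols for _ in range(rows)]
    for time_slice in range(ratio):
        for p in range(n_physical):              # conceptually concurrent
            vp = time_slice * n_physical + p     # virtual processor index
            if vp < n_virtual:
                r, c = divmod(vp, cols)
                out[r][c] = a[r][c] + b[r][c]
    return out, ratio

a = [[1, 2, 3], [4, 5, 6]]
b = [[10, 20, 30], [40, 50, 60]]
summed, ratio = parallel_matrix_sum(a, b, n_physical=4)
```

With six elements and four physical processors the sum takes two time slices in this model; with six or more physical processors it would take one, and the program text would not change.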
For complete information on the Connection Machine, see Danny Hillis's doctoral dissertation [16] and the set of manuals from Thinking Machines Corporation. There are other SIMD machines; for instance, the MasPar MP-1 [25] is a recent machine that is commercially available. Its architecture is quite different from that of the Connection Machine, and its processing cells are much more powerful. The reader is referred to the literature for further details on the MP-1.