next up previous contents
Next: Volume Rendering Up: Parallel Computer Architectures Previous: MIMD machines

Performance Considerations

Some models of performance for parallel computers algorithm have been developed and the one described here is based on Stone [22]. Even though one needs to take into account the architectural overhead imposed by the system, by only counting the time of computation and the time of communication and synchronization, one can analyze the performance of algorithms on a given architecture.

In more specific terms, one has to assume the algorithm will have several tasks and that it can be parallelized by running these tasks concurrently. Let R be the number of units of time spent in the computation of the tasks and let C be the number of units of time spent in communication and any other overhead imposed by the chosen algorithm. The ratio can be used to determine the optimal point of performance in a given architecture.

The ratio can be used as a measure of task granularity:

A general rule of thumb is that if the ratio is very small, then one can use more processors in hopes that the communication is to be spread over the processors. The ratio remains the same and one can use a larger number of less powerful processors for the same task. If on the other hand is big, the optimum number of processors will usually be smaller, and more powerful processors can be used.

Another very important issue in parallel algorithm design is to match the communication patterns supported by the machine being used. For instance, in a shared memory machine, one may want to interleave the data in memory to avoid hot spotsgif. Also, if the machine supports local memory, one may want to allocate all the data that is not needed by the other processor tasks in local memory to minimize bus bandwidth.

For distributed memory machines, one may want to interleave or limit global communication phases so that the network traffic does not hurt the performance. All of these issues are highly architecture and algorithm dependent but the general idea of minimizing and organizing communication to avoid contention for data is very useful in practice.

One important measure of performance for a parallel algorithms is its speedup. This is probably the most misused measure in benchmarking. While most authors define speedup to be the ratio of the running time of the best sequential algorithm for the problem to the running time T of the parallel algorithm (), it is often incorrectly measured as the ratio of the time it takes for the parallel algorithm to run in one node configuration to the time to run the algorithm in n nodes of the same machine.

Parallel algorithms for MIMD machines are discussed in [53], for a discussion on Teraflops machines (of the future) see [21], and for general information on high performance systems, see [22,15]. In the rest of this paper, several parallel algorithms will be described together with the specific machine used for the implementation.



next up previous contents
Next: Volume Rendering Up: Parallel Computer Architectures Previous: MIMD machines



Claudio Silva
Thu Apr 20 16:03:37 EDT 1995