The idea behind splatting as defined by Ulrich Neumann [13] is
to slice the dataset one plane at a time. The planes are taken
along the orthogonal direction of the dataset that most closely
approximates the image plane direction. At the start, each GP is
assigned, in a round-robin fashion, a set of planes to compute.
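The slicing and assignment step can be sketched as follows. This is a minimal illustration, not Neumann's code: it assumes that "the direction that most approximates the image plane direction" means the dataset axis with the largest component along the viewing direction, and the names `pick_slicing_axis` and `assign_planes_round_robin` are hypothetical.

```python
def pick_slicing_axis(view_dir):
    """Return the dataset axis (0=x, 1=y, 2=z) most closely aligned with
    the viewing direction. Assumption: view_dir is the image plane normal
    expressed in dataset coordinates."""
    return max(range(3), key=lambda axis: abs(view_dir[axis]))


def assign_planes_round_robin(num_planes, num_gps):
    """Deal plane indices to GPs in turn: plane p goes to GP p % num_gps."""
    assignment = [[] for _ in range(num_gps)]
    for plane in range(num_planes):
        assignment[plane % num_gps].append(plane)
    return assignment


# Hypothetical example: a view mostly along -y, 10 planes, 4 GPs.
axis = pick_slicing_axis((0.2, -0.9, 0.4))
planes = assign_planes_round_robin(num_planes=10, num_gps=4)
```

With round-robin dealing, consecutive planes land on consecutive GPs, so neighboring GPs finish planes that are adjacent in depth.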
Each GP processes a whole plane at a time: it transforms, clips, and breaks each voxel in the plane into a kernel description, which is sent to the renderers, where the QEE is used to evaluate the kernel. For efficiency, the Gaussian kernel is approximated by a quadratic kernel that can be computed in the QEE of the renderers. Compositing is done in the renderers as they receive new planes.
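To make the quadratic approximation concrete, the sketch below expands a radially symmetric kernel, 1 - r^2/R^2 clamped at zero, into the coefficients of a general quadratic in screen coordinates. The exact quadratic used in [13] is not given here, so the kernel shape, the coefficient layout, and the front-to-back compositing rule with premultiplied color are all assumptions made for illustration.

```python
def quadratic_coefficients(cx, cy, radius):
    """Coefficients of 1 - ((x-cx)^2 + (y-cy)^2) / radius^2, expanded into
    x2*x^2 + xy*x*y + y2*y^2 + x*x + y*y + c (hypothetical layout)."""
    inv_r2 = 1.0 / (radius * radius)
    return {
        "x2": -inv_r2, "xy": 0.0, "y2": -inv_r2,
        "x": 2.0 * cx * inv_r2, "y": 2.0 * cy * inv_r2,
        "c": 1.0 - (cx * cx + cy * cy) * inv_r2,
    }


def evaluate_kernel(coef, x, y):
    """Evaluate the quadratic at pixel (x, y), clamped to zero outside
    the kernel footprint."""
    value = (coef["x2"] * x * x + coef["xy"] * x * y + coef["y2"] * y * y
             + coef["x"] * x + coef["y"] * y + coef["c"])
    return max(0.0, value)


def composite_front_to_back(acc, sample):
    """Blend a new (premultiplied color, alpha) sample behind the
    accumulated pixel. Assumes planes arrive front to back."""
    acc_color, acc_alpha = acc
    color, alpha = sample
    remaining = 1.0 - acc_alpha
    return (acc_color + remaining * color, acc_alpha + remaining * alpha)


# Hypothetical kernel centered at (3, 4) with a footprint radius of 2 pixels.
coef = quadratic_coefficients(cx=3.0, cy=4.0, radius=2.0)
center_weight = evaluate_kernel(coef, 3.0, 4.0)   # peak of the kernel
edge_weight = evaluate_kernel(coef, 5.0, 4.0)     # on the footprint boundary

# Composite two half-transparent samples into one pixel.
pixel = (0.0, 0.0)
for sample in [(0.5, 0.5), (0.5, 0.5)]:
    pixel = composite_front_to_back(pixel, sample)
```

Only six coefficients per voxel have to cross from a GP to the renderers in this formulation, which is the kind of compact kernel description the paragraph above refers to.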
The algorithm takes advantage of the fact that the Pixel-Planes 5 can
be configured with a variable set of GPs and renderers. The GPs
prepare data for the renderers, and once the data is sent they have to
wait for the renderers to finish rendering. The algorithm can operate
in two regimes: either the GPs can send more planes than the renderers
can process (Render-Bound performance), or the renderers can process
planes faster than the GPs can send them (GP-Bound performance). The
slowest link determines the speed of the algorithm. In the
Render-Bound case, for instance, a token is created for each renderer,
and at any given point in time only one GP, the one holding the
corresponding token, can send its currently processed plane to that
renderer. GPs pass the tokens around as the renderers finish
processing their planes.
Figure 8: Data synchronization on the Pixel-Planes 5. The left GP holds
the right to renderer R1 while passing the usage of renderer R0 to the
GP on the right.
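The token scheme can be illustrated with a small sequential simulation. It is a sketch under several assumptions that the text does not confirm: every plane is sent to every renderer, planes are owned round-robin (plane p belongs to GP p % num_gps), each token is handed to the next GP once its renderer has taken a plane, and renderers are staggered by one step so that a GP works through R0, R1, ... in turn. The function name `simulate_token_passing` is hypothetical.

```python
def simulate_token_passing(num_planes, num_gps, num_renderers):
    """Return (received, log): the plane order seen by each renderer and
    a list of (step, gp, renderer, plane) send events."""
    token_holder = [0] * num_renderers   # GP allowed to send to each renderer
    next_plane = [0] * num_renderers     # next plane each renderer expects
    received = [[] for _ in range(num_renderers)]
    log = []
    step = 0
    while any(p < num_planes for p in next_plane):
        for r in range(num_renderers):
            # Renderer r starts r steps late; skip it once it has all planes.
            if step < r or next_plane[r] >= num_planes:
                continue
            gp = token_holder[r]
            plane = next_plane[r]
            # Round-robin ownership means the token holder owns this plane.
            assert plane % num_gps == gp
            received[r].append(plane)
            log.append((step, gp, r, plane))
            next_plane[r] += 1
            token_holder[r] = (gp + 1) % num_gps   # pass the token on
        step += 1
    return received, log


# Hypothetical configuration: 4 planes, 2 GPs, 2 renderers.
received, log = simulate_token_passing(num_planes=4, num_gps=2, num_renderers=2)
```

At step 1 of this run, GP 1 sends plane 1 to R0 while GP 0 still holds the token for R1 and sends plane 0 there, which mirrors the situation drawn in Figure 8. Because each token visits the GPs in plane order, every renderer composites the planes in depth order without any global barrier.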
Because the host loads the data into each GP at the start, each GP has to hold three planes for every plane it processes. Although each GP only does computations for one plane, it needs both neighboring planes to calculate the gradient at each voxel, which is used in shading.
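The need for three planes follows directly from how a gradient is usually estimated. The sketch below assumes central differences, a common choice that the text does not name explicitly; `gradient_at` is a hypothetical helper, and planes are plain nested lists indexed as `plane[y][x]`.

```python
def gradient_at(prev_plane, cur_plane, next_plane, x, y):
    """Central-difference gradient at an interior voxel (x, y) of cur_plane.
    The in-plane components use only cur_plane; the component along the
    slicing axis needs the two neighboring planes."""
    gx = (cur_plane[y][x + 1] - cur_plane[y][x - 1]) / 2.0
    gy = (cur_plane[y + 1][x] - cur_plane[y - 1][x]) / 2.0
    gz = (next_plane[y][x] - prev_plane[y][x]) / 2.0
    return (gx, gy, gz)


# Hypothetical 3x3x3 volume with density x + 2y + 3z, stored as planes[z][y][x].
planes = [[[x + 2 * y + 3 * z for x in range(3)] for y in range(3)]
          for z in range(3)]
grad = gradient_at(planes[0], planes[1], planes[2], x=1, y=1)
```

For this linear test volume the estimate recovers the exact slope along each axis, and the third component cannot be computed without the planes on either side.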
This algorithm lends itself to a very efficient implementation on the
Pixel-Planes 5 because the compositing and the kernel calculations are
done in the renderers. The system is able to achieve 4 frames
per second on a
dataset using a GP-Bound configuration (40 GPs
and 16 renderers). Neumann [13] did a detailed analysis
of the number of cycles it takes to render the datasets. He found that
the utilization of the computing power (both GPs and renderers) is
lower than peak performance. One of the main problems is the need for
synchronization at every plane.
Other implementations of splatting on parallel architectures were
done by Westover on the Pixel-Planes 5 [12] and on a
network of Sun SPARCs [11]. Todd Elvins
[40] did an implementation on the nCUBE in which he tried
several combinations of optimizations of the algorithm. His times for a
dataset were around 23 seconds for an 8-processor
configuration.