
The Rendering Engine

 

Our rendering engine is a parallel ray caster that supports both object-space and image-space load balancing. We currently support two object-space decompositions: slabs [MPS92,SK94] and K-d trees [MPHK94]. To support image-space load balancing, the ray caster implements the idea of clusters, first introduced in [MPS92]. Unlike Montani et al. [MPS92], however, we use a static load balancing scheme, and compositing is performed on complete images (or sub-images) rather than on a ray-by-ray basis. Our K-d tree implementation differs from the one described by Ma et al. [MPHK94] in that it uses a PARC ray caster and a non-uniform load balancing scheme (briefly described below).
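The cluster idea can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the P processors are split into equal-sized clusters, that each cluster is statically assigned an equal band of image scanlines (image space), and that the members of a cluster divide the volume among themselves (object space, e.g. one slab each). The names `Assignment` and `assign_clusters` are illustrative only and are not part of PVR.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Assignment:
    processor: int
    cluster: int
    image_rows: Tuple[int, int]  # [start, end) scanlines rendered by this cluster
    slab_index: int              # object-space piece this processor renders within its cluster

def assign_clusters(num_procs: int, num_clusters: int, image_height: int) -> List[Assignment]:
    """Static hybrid assignment (hypothetical): image space across clusters,
    object space among the processors of one cluster."""
    if num_clusters <= 0 or num_procs % num_clusters != 0:
        raise ValueError("num_procs must be a positive multiple of num_clusters")
    per_cluster = num_procs // num_clusters
    assignments = []
    for p in range(num_procs):
        c = p // per_cluster
        # Equal bands of scanlines per cluster; integer division spreads any remainder.
        rows = (c * image_height // num_clusters, (c + 1) * image_height // num_clusters)
        assignments.append(Assignment(p, c, rows, p % per_cluster))
    return assignments
```

For example, `assign_clusters(8, 2, 512)` places processors 0 to 3 in cluster 0, rendering scanlines 0 to 255, and processors 4 to 7 in cluster 1, rendering scanlines 256 to 511. Because the assignment is computed once, it matches the static flavor of the scheme described above.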

To speed up rendering, we use PARC (polygon assisted ray casting) [ASK92]. Like the octree decompositions of Levoy [Lev90], PARC can be characterized as a presence acceleration technique [DH92]: instead of stepping through the whole volume, the renderer visits only the parts that contain relevant data. This saves a large amount of rendering time, not only in volume stepping but also because it greatly reduces the number of compositing and shading calculations that must be performed. To skip over empty space, our implementation of PARC pre-computes axis-aligned sub-cubes that bound the non-empty regions of the volume. For each view, we scan convert these cubes into a Z buffer (implemented in software) to obtain tighter bounds on the intervals along each ray where the ray integral must be evaluated. This method achieves speeds comparable to those of the fastest high-quality volume renderers. The only pre-processing required is a single scan over the volume to determine which sub-cubes contain values within the specified threshold.
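The two PARC steps, the pre-processing scan and the per-view depth bounds, can be sketched in Python with NumPy under simplifying assumptions that are ours, not the paper's: the volume dimensions are multiples of a hypothetical sub-cube size `cube`, and the view is orthographic along the +z axis, so that scan converting the front and back faces of a full sub-cube reduces to filling a square of pixels in a near and a far depth buffer. `full_subcubes` and `ray_intervals` are illustrative names.

```python
import numpy as np

def full_subcubes(volume: np.ndarray, cube: int, lo: float, hi: float) -> np.ndarray:
    """Pre-processing scan: True for every cube^3 sub-cube that contains at least
    one voxel with a value in [lo, hi]. volume is indexed [z, y, x]."""
    if any(s % cube for s in volume.shape):
        raise ValueError("volume dimensions must be multiples of cube")
    nz, ny, nx = (s // cube for s in volume.shape)
    blocks = volume.reshape(nz, cube, ny, cube, nx, cube)
    relevant = (blocks >= lo) & (blocks <= hi)
    return relevant.any(axis=(1, 3, 5))  # shape (nz, ny, nx)

def ray_intervals(full: np.ndarray, cube: int):
    """Near/far Z buffers for an orthographic view down +z (one ray per voxel column).
    Drawing the front face of each full sub-cube keeps the minimum depth, drawing its
    back face keeps the maximum; rays with near == inf hit no relevant data and are skipped."""
    _, ny, nx = full.shape
    near = np.full((ny * cube, nx * cube), np.inf)
    far = np.full((ny * cube, nx * cube), -np.inf)
    for iz, iy, ix in zip(*np.nonzero(full)):
        ys = slice(iy * cube, (iy + 1) * cube)
        xs = slice(ix * cube, (ix + 1) * cube)
        near[ys, xs] = np.minimum(near[ys, xs], iz * cube)        # front face
        far[ys, xs] = np.maximum(far[ys, xs], (iz + 1) * cube)    # back face
    return near, far
```

A ray for pixel (y, x) then only needs to sample depths in `[near[y, x], far[y, x])`; a general viewing direction would require true polygon scan conversion, which this sketch does not attempt.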

  
Figure 3: During slab-based load balancing, each processor gets a range of contiguous data set slabs. The number of full PARC sub-cubes determines the exact partition ratio.

  
Figure 4: During K-d tree based load balancing, each processor gets a sub-cube of the original data set, as indicated in the leaves of the partition tree. Here, too, the number of full PARC sub-cubes determines the partition ratios.

Data distribution takes one of the two forms shown in Figures 3 and 4. To avoid the load imbalance that arises when a naive subdivision is used with a PARC ray caster, we use the technique described in [SK94]. We need to minimize $\max_{1 \le i \le P} c_i$, where $P$ denotes the number of processors and $c_i$ the number of PARC sub-cubes assigned to processor $i$. Intuitively, this amounts to spreading the PARC sub-cubes as evenly as possible over all the processors. When the number of processors is a power of two, the easiest way to implement this technique is to perform top-down cuts of the input, making sure that at every iteration each cut divides the number of PARC sub-cubes approximately in half. For K-d tree decompositions, the only difference is that, instead of partitioning the data in half at each step, we move the plane of the cut until each side holds approximately half of the PARC sub-cubes.
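The balanced top-down cuts can be sketched in Python with NumPy. The sketch assumes that the counts of full PARC sub-cubes are already known, either per slab layer or as a boolean occupancy grid such as the one produced by the PARC pre-processing scan, and that the number of processors is a power of two. For K-d trees we assume, hypothetically, that the cut axis cycles through the three axes at successive levels; that choice and all function names are illustrative rather than taken from PVR.

```python
import numpy as np

def bisect_index(counts, lo: int, hi: int) -> int:
    """Return m with lo < m < hi such that counts[lo:m] holds about half of the
    full PARC sub-cubes in counts[lo:hi] (the cut is moved, not placed at the midpoint)."""
    if hi - lo < 2:
        raise ValueError("need at least two layers to cut")
    prefix = np.cumsum(counts[lo:hi])
    total = prefix[-1]
    # left = number of layers on the left side; keep at least one layer per side
    left = min(range(1, hi - lo), key=lambda k: abs(2 * prefix[k - 1] - total))
    return lo + left

def slab_partition(layer_counts, procs: int):
    """Top-down cuts into procs contiguous slab ranges (procs a power of two)."""
    assert procs > 0 and procs & (procs - 1) == 0, "procs must be a power of two"
    ranges = [(0, len(layer_counts))]
    while len(ranges) < procs:
        split = []
        for lo, hi in ranges:
            m = bisect_index(layer_counts, lo, hi)
            split += [(lo, m), (m, hi)]
        ranges = split
    return ranges

def kd_partition(full: np.ndarray, procs: int):
    """K-d variant on a boolean grid of full sub-cubes: cut axes cycle 0, 1, 2 and
    each cutting plane is moved until both halves hold about half the full sub-cubes."""
    assert procs > 0 and procs & (procs - 1) == 0, "procs must be a power of two"
    boxes = [((0, 0, 0), full.shape)]  # (start, stop) index per axis
    axis = 0
    while len(boxes) < procs:
        split = []
        for start, stop in boxes:
            region = full[start[0]:stop[0], start[1]:stop[1], start[2]:stop[2]]
            others = tuple(a for a in range(3) if a != axis)
            layers = region.sum(axis=others)  # full sub-cubes per layer along axis
            m = start[axis] + bisect_index(layers, 0, len(layers))
            left_stop, right_start = list(stop), list(start)
            left_stop[axis] = right_start[axis] = m
            split += [(start, tuple(left_stop)), (tuple(right_start), stop)]
        boxes = split
        axis = (axis + 1) % 3
    return boxes
```

With layer counts `[0, 0, 4, 4, 4, 4, 0, 0]` and four processors, `slab_partition` returns the slab ranges (0, 3), (3, 4), (4, 5) and (5, 8), each holding 4 full sub-cubes, whereas four equal-width slabs would give per-processor loads of 0, 8, 8 and 0.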

A very important feature of our ray caster is how compositing is performed. To achieve the best possible performance, rendering and compositing are executed concurrently. Each processor generates its sub-image; starting from the first processor in a cluster, partial images are sent from neighbor to neighbor and combined at every step, until the complete image reaches the collector node. The exact details of this operation are described in [SK94]. Compositing performance is discussed later in this paper.
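The data flow of the neighbor-to-neighbor compositing can be sketched as a sequential Python simulation; it does not model the concurrency or the message passing of the actual implementation in [SK94]. It assumes premultiplied RGBA sub-images of equal size and that the processors of a cluster have been ordered front to back with respect to the viewer. `over` and `pipeline_composite` are illustrative names.

```python
import numpy as np

def over(front: np.ndarray, back: np.ndarray) -> np.ndarray:
    """'Over' operator for premultiplied RGBA images of shape (H, W, 4)."""
    return front + (1.0 - front[..., 3:4]) * back

def pipeline_composite(subimages):
    """subimages[i] is the sub-image rendered by processor i of one cluster, with
    processor 0 closest to the viewer. Processor i receives the partial image from
    neighbor i - 1, composites its own sub-image behind it and forwards the result;
    the value returned is what the last processor sends to the collector node."""
    partial = subimages[0]
    for own in subimages[1:]:
        partial = over(partial, own)
    return partial
```

Since the `over` operator is associative, the partial image can be accumulated one neighbor at a time in this fashion, and each step only needs the incoming partial image and the local sub-image.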






Claudio Silva
Thu Apr 20 13:45:22 EDT 1995