This dissertation identifies a class of parallel polygon rendering algorithms suitable for interactive use on multicomputers, and presents a methodology for designing efficient algorithms within that class. The methodology was used to design a new polygon rendering algorithm that uses the frame-to-frame coherence of the screen image to evenly partition the rasterization at reasonable cost. An implementation of the algorithm on the Intel Touchstone Delta at Caltech, the largest multicomputer at the time, renders 3.1 million triangles per second. This rate was measured using an 806,640-triangle model and 512 i860 processors, and includes back-facing triangles. A similar algorithm is used in Pixel-Planes 5, a system with specialized rasterization processors, which, when introduced, achieved a score on the SPEC Graphics Performance Characterization Group "head" benchmark nearly four times that of commercial workstations. The algorithm design methodology also identified significant performance improvements for Pixel-Planes 5.
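As a rough sanity check on the Touchstone Delta figures, the reported rate can be converted into a frame time and a per-processor throughput. This assumes, as an illustration only, that every frame renders the entire model; because 3.1 million is itself a rounded figure, the results are approximate:

```latex
\frac{806{,}640\ \text{triangles/frame}}{3.1\times 10^{6}\ \text{triangles/s}}
  \approx 0.26\ \text{s/frame} \approx 3.8\ \text{frames/s},
\qquad
\frac{3.1\times 10^{6}\ \text{triangles/s}}{512\ \text{processors}}
  \approx 6.05\times 10^{3}\ \text{triangles/s per processor}.
```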
All fully parallel polygon rendering algorithms include a sorting step that redistributes primitives or fragments according to their screen location. The algorithm class mentioned above is one of four classes of parallel rendering algorithms identified; the classes are distinguished by the type of data communicated between processors. This class, called sort-middle, sorts screen-space primitives between transformation and rasterization.
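The sort step of a sort-middle renderer can be illustrated with a small sketch. It is not the dissertation's code. It assumes a fixed grid of equal-sized tiles, whereas the dissertation's regions are adaptively sized. It also assumes a hypothetical `owner(tx, ty)` function that maps each tile to a processor, and `Triangle` and `bin_triangles` are illustrative names. Each transformed triangle is sent once to every processor owning a tile that the triangle's screen-space bounding box overlaps. This bounding-box test is conservative: a triangle may be sent to a tile it does not actually cover.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Triangle:
    # Screen-space vertex coordinates (after transformation), in pixels.
    xs: Tuple[float, float, float]
    ys: Tuple[float, float, float]


def bin_triangles(
    tris: List[Triangle],
    screen_w: int,
    screen_h: int,
    tile_w: int,
    tile_h: int,
    owner: Callable[[int, int], int],  # hypothetical tile -> processor map
) -> Dict[int, List[Triangle]]:
    """Sort-middle redistribution: route each screen-space triangle to every
    processor that owns a tile overlapped by the triangle's bounding box."""
    tiles_x = (screen_w + tile_w - 1) // tile_w
    tiles_y = (screen_h + tile_h - 1) // tile_h
    outboxes: Dict[int, List[Triangle]] = {}
    for t in tris:
        # Tile range covered by the bounding box, clamped to the screen.
        x0 = max(0, math.floor(min(t.xs)) // tile_w)
        x1 = min(tiles_x - 1, math.floor(max(t.xs)) // tile_w)
        y0 = max(0, math.floor(min(t.ys)) // tile_h)
        y1 = min(tiles_y - 1, math.floor(max(t.ys)) // tile_h)
        if x0 > x1 or y0 > y1:
            continue  # entirely off screen: nothing to rasterize
        # A processor owning several overlapped tiles receives the triangle once.
        dests = {owner(tx, ty) for ty in range(y0, y1 + 1) for tx in range(x0, x1 + 1)}
        for p in sorted(dests):
            outboxes.setdefault(p, []).append(t)
    return outboxes
```

On a multicomputer, each outbox would become the payload of one message in the all-to-all exchange between the transformation and rasterization stages.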
The design methodology uses simulations and performance models to guide the design decisions. The resulting algorithm partitions the screen during rasterization into adaptively sized regions, averaging four regions per processor. Region boundaries are changed only when necessary, that is, when a single region becomes the rasterization bottleneck. On smaller systems, the algorithm balances the load by assigning regions to processors once per frame, applying the assignment computed during one frame to the next. However, when 128 or more processors are used at high frame rates, this load balancing may take too long, so static load balancing should be used instead. Additionally, a new all-to-all communication method improves the algorithm's performance on systems with more than 64 processors.
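The per-frame load balancing described above can be sketched as follows. This is a minimal illustration under stated assumptions and should not be read as the dissertation's method. The greedy longest-processing-time (LPT) assignment is a swapped-in heuristic. The bottleneck test, which flags a region whose cost exceeds the ideal per-processor load (total cost divided by processor count), is also an assumption. So are halving a region along its longer axis and splitting its cost estimate evenly between the halves. The names `assign_regions` and `split_bottleneck` are hypothetical. Region costs are assumed to be measurements taken during the previous frame, such as rasterization time or primitive counts.

```python
import heapq
from typing import List, Tuple

# (x0, y0, x1, y1): half-open pixel bounds of a screen region.
Region = Tuple[int, int, int, int]


def assign_regions(costs: List[float], n_procs: int) -> List[int]:
    """Greedy LPT assignment (assumed heuristic): visit regions from most to
    least expensive and give each to the processor with the smallest load."""
    heap = [(0.0, p) for p in range(n_procs)]  # (accumulated load, processor)
    heapq.heapify(heap)
    owner = [0] * len(costs)
    for r in sorted(range(len(costs)), key=lambda i: costs[i], reverse=True):
        load, p = heapq.heappop(heap)
        owner[r] = p
        heapq.heappush(heap, (load + costs[r], p))
    return owner


def split_bottleneck(
    regions: List[Region], costs: List[float], n_procs: int
) -> Tuple[List[Region], List[float]]:
    """Change boundaries only when needed: if the costliest region alone
    exceeds the ideal per-processor load, halve it along its longer axis."""
    ideal = sum(costs) / n_procs
    worst = max(range(len(costs)), key=lambda i: costs[i])
    if costs[worst] <= ideal:
        return regions, costs  # no single region is the bottleneck
    x0, y0, x1, y1 = regions[worst]
    if x1 - x0 >= y1 - y0 and x1 - x0 >= 2:
        xm = (x0 + x1) // 2
        halves = [(x0, y0, xm, y1), (xm, y0, x1, y1)]
    elif y1 - y0 >= 2:
        ym = (y0 + y1) // 2
        halves = [(x0, y0, x1, ym), (x0, ym, x1, y1)]
    else:
        return regions, costs  # a single pixel cannot be split further
    # Assumption: cost is spread evenly, so each half inherits half the estimate.
    half = costs[worst] / 2
    new_regions = regions[:worst] + halves + regions[worst + 1:]
    new_costs = costs[:worst] + [half, half] + costs[worst + 1:]
    return new_regions, new_costs
```

In a frame loop, the costs measured while rendering frame *i* would be passed to `split_bottleneck` and then `assign_regions`. The resulting boundaries and assignment would then be used for frame *i*+1. Starting from several regions per processor gives the greedy step small pieces with which to even out the loads.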