Rendering, the process of translating a representation of a scene into an image, is fundamentally a sorting problem. The rendered image is the result of what is visible or occluded from the vantage point chosen for that rendering. In interactive computer graphics, where an image is rendered at frame rates that allow a user to manipulate objects in the scene or the viewing parameters, a bottleneck develops when either the scene's complexity is too great for the graphics processor or the rendered image is too large to fit in the graphics processor's memory. Bottlenecks like these can be overcome using parallel graphics processing techniques.
A particular class of rendering algorithms is the raster-geometry algorithm, which accepts scenes described as a set of geometric primitives that are sorted according to their visibility and then rasterized into a frame buffer for output. Parallel raster-geometry algorithms can be classified into three groups according to where in the graphics processing pipeline the visibility sort occurs: sort-first, sort-middle, or sort-last. The name identifies where in the parallel system's pipeline the sort occurs as well as what is being sorted.
As with any parallel processing, the key to success is how well the algorithm scales when more computing resources are added to the system. In particular, the granularity of the task decomposition must be neither so fine that the parallel overheads balloon nor so coarse that some elements of the system are overloaded while others sit underused. For a parallel algorithm to be practical, its design must account for the overheads involved in order to provide a load balancing algorithm.
In this dissertation, this classification of parallel algorithms is described and the overheads for each class are shown. Of particular interest is the sort-first class of algorithms, since it is the least researched of the three and is growing. Sort-first algorithms are increasing in popularity in large part due to the relatively low financial cost of commodity personal computer (PC) clusters that have high-performance graphics processing units. Furthermore, PC clusters and sort-first algorithms are well suited for large tiled displays.
The thesis of the dissertation is that the workload of a sort-first system can be balanced by finding non-overlapping image-space regions of equal work, partitioning the display using a cost function that represents the work at a per-pixel level. Furthermore, the parallel processing overheads involved with sort-first rendering are analyzed, and a case is made that primitives overlapping multiple screen regions incur a negligible overhead so long as the average primitive's image-space area is proportional to the overlap region along the regions' borders. A heuristic is presented which gives the bounds for appropriately sized image-space regions given an average-sized image-space primitive.
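The relationship between region size and primitive size can be illustrated with a standard geometric estimate. This is a sketch under stated assumptions, not the heuristic developed in the dissertation: it assumes equal rectangular regions and a primitive whose axis-aligned bounding box is placed uniformly at random on the screen. Under those assumptions the expected number of regions a primitive touches is (1 + w/W)(1 + h/H), where w and h are the bounding-box dimensions and W and H are the region dimensions. The function name and the sample sizes below are hypothetical.

```python
def expected_overlap(w, h, region_w, region_h):
    """Expected number of regions touched by a w-by-h bounding box.

    Assumes equal region_w-by-region_h regions and a box position that is
    uniformly distributed over the screen (an illustrative assumption).
    """
    return (1.0 + w / region_w) * (1.0 + h / region_h)


# Hypothetical example: a 16x16-pixel primitive against square regions
# of increasing size. The factor approaches 1 as regions grow.
for side in (64, 128, 256, 512):
    print(side, expected_overlap(16, 16, side, side))
```

A factor close to 1 means few primitives are processed by more than one renderer, which is the sense in which overlap overhead becomes small when regions are large relative to the average primitive.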
To support the thesis, an analysis of the sort-first architecture's computational and communication overheads is presented. Special attention is paid to the relationship between the partitioning of the display and the primitive sizes. A load balancing algorithm for sort-first rendering is presented that uses a dynamic mesh-style work partitioning scheme to repartition the work according to each processor's view frustum. The repartitioning is done according to how much work is accomplished within each sheared viewing frustum.
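The idea of dividing the display into regions of equal work from a per-pixel cost function can be sketched in a few lines. The sketch below uses recursive bisection along the longer axis, which is a simpler technique swapped in for illustration and is not the mesh-style scheme described above. The cost map, the function names, and the region tuple layout `(x0, y0, x1, y1)` are all hypothetical.

```python
def region_work(cost, region):
    """Total cost of the pixels inside region = (x0, y0, x1, y1), half-open."""
    x0, y0, x1, y1 = region
    return sum(sum(cost[y][x0:x1]) for y in range(y0, y1))


def split_equal_work(cost, x0, y0, x1, y1, parts):
    """Bisect a rectangle of a per-pixel cost map into roughly equal-work regions.

    Illustrative recursive bisection, not the dissertation's mesh scheme.
    May return fewer than `parts` regions if a rectangle cannot be cut further.
    """
    if parts <= 1:
        return [(x0, y0, x1, y1)]
    split_x = (x1 - x0) >= (y1 - y0)  # cut across the longer axis
    if split_x:
        sums = [sum(cost[y][x] for y in range(y0, y1)) for x in range(x0, x1)]
    else:
        sums = [sum(cost[y][x0:x1]) for y in range(y0, y1)]
    if len(sums) < 2:
        return [(x0, y0, x1, y1)]  # a single pixel row/column cannot be cut
    first_parts = parts // 2
    target = sum(sums) * first_parts / parts  # work wanted in the first half
    running, best_k, best_err = 0.0, 1, float("inf")
    for k in range(1, len(sums)):
        running += sums[k - 1]
        err = abs(running - target)
        if err < best_err:
            best_k, best_err = k, err
    if split_x:
        cut = x0 + best_k
        first = split_equal_work(cost, x0, y0, cut, y1, first_parts)
        second = split_equal_work(cost, cut, y0, x1, y1, parts - first_parts)
    else:
        cut = y0 + best_k
        first = split_equal_work(cost, x0, y0, x1, cut, first_parts)
        second = split_equal_work(cost, x0, cut, x1, y1, parts - first_parts)
    return first + second


# Hypothetical 8x4 cost map in which the left half is three times as expensive.
cost_map = [[3, 3, 3, 3, 1, 1, 1, 1] for _ in range(4)]
regions = split_equal_work(cost_map, 0, 0, 8, 4, 4)
for r in regions:
    print(r, region_work(cost_map, r))
```

Because cuts fall on pixel boundaries, the regions balance only approximately; in a dynamic setting the cost map would be re-estimated and the partition recomputed as the view changes.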
The cost model is based on the OpenGL graphics pipeline. The cost model, which is modeled on each stage of the OpenGL state machine, is validated against a number of experiments that benchmark each stage of the graphics pipeline. The experimental results show that direct observation of the graphics pipeline is not possible. To derive a per-pipeline-stage cost model, a simulator of the graphics hardware platform or a programmer's interface to the graphics hardware is needed to gather the performance characteristics of the hardware. Other possible cost models are outlined in the conclusion as future work.
Index Terms
- A general cost model for sort-first parallel graphics processing