Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Low Latency Photon Mapping Using Block Hashing

Abstract For hardware accelerated rendering, photon mapping is especially useful for simulating caustic lighting effects on non-Lambertian surfaces. However, an efficient hardware algorithm for the computation of the k nearest neighbours to a sample point is required.

Graphics Hardware (2002) Thomas Ertl, Wolfgang Heidrich, and Michael Doggett (Editors) Low Latency Photon Mapping Using Block Hashing Vincent C. H. Ma and Michael D. McCool Computer Graphics Lab, School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada Abstract For hardware accelerated rendering, photon mapping is especially useful for simulating caustic lighting effects on non-Lambertian surfaces. However, an efficient hardware algorithm for the computation of the k nearest neighbours to a sample point is required. Existing algorithms are often based on recursive spatial subdivision techniques, such as kd-trees. However, hardware implementation of a tree-based algorithm would have a high latency, or would require a large cache to avoid this latency on average. We present a neighbourhood-preserving hashing algorithm that is low-latency and has sub-linear access time. This algorithm is more amenable to fine-scale parallelism than tree-based recursive spatial subdivision, and maps well onto coherent block-oriented pipelined memory access. These properties make the algorithm suitable for implementation using future programmable fragment shaders with only one stage of dependent texturing. Categories and Subject Descriptors (according to ACM CCS): I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism, Color, Shading, Shadowing, and Texture. terpolation step that joins light paths propagated from light sources with rays traced from the camera during rendering, and it is but one application of the well-studied k nearest neighbours (kNN) problem. 1. Introduction Photon mapping, as described by Jensen25 , is a technique for reconstructing the incoming light field at surfaces everywhere in a scene from sparse samples generated by forward light path tracing. In conjunction with path tracing, photon mapping can be used to accelerate the computation of both diffuse and specular global illumination . It is most effective for specular or glossy reflectance effects, such as caustics24 . Jensen uses the kd-tree5, 6 data structure to find these nearest photons. However, solving the kNN problem via kd-trees requires a search that traverses the tree. Even if the tree is stored as a heap, traversal still requires random-order memory access and memory to store a stack. More importantly, a search-path pruning algorithm, based on the data already examined, is required to avoid accessing all data in the tree. This introduces serial dependencies between one memory lookup and the next. Consequently, a hardware implementation of a kd-tree-based kNN solution would either have high latency, or would require a large cache to avoid such latency. In either case a custom hardware implementation would be required. These properties motivated us to look at alternatives to tree search. The benefits of migrating photo-realistic rendering techniques towards a real-time, hardware-assisted implementation are obvious. Recent work has shown that it is possible to implement complex algorithms, such as ray-tracing, using the programmable features of general-purpose hardware accelerators35 and/or specialised hardware 41 . We are interested in hardware support for photon mapping: specifically, the application of photon maps to the direct visualisation of caustics on non-Lambertian surfaces, since diffuse global illumination effects are probably best handled in a real-time renderer using alternative techniques such as irradiance volumes18 . Since photon mapping is already making an approximation by using kNN interpolation, we conjectured that an approximate kNN (AkNN) solution should suffice so long as visual quality is maintained. In this paper we investigate a hashing-based AkNN solution in the context of high- Central to photon mapping is the search for the set of photons nearest to the point being shaded. This is part of the inc The Eurographics Association 2002. 89 Ma and McCool / Low Latency Photon Mapping Using Block Hashing et al.39 . Many other recursive subdivision-based techniques have also been proposed for the kNN and AkNN problems, including kd-B-trees36 , BBD-trees4 , BAR-trees9 , PrincipalAxis Trees33 , the R-tree family of data structures27 , and ANN-trees30 . Unfortunately, all schemes based on recursive search over a tree share the same memory dependency problem as the kd-tree. performance hardware-based (specifically, programmable shader-based) photon mapping. Our major contribution is an AkNN algorithm that has bounded query time, bounded memory usage, and high potential for fine-scale parallelism. Moreover, our algorithm results in coherent, non-redundant accesses to block-oriented memory. The results of one memory lookup do not affect subsequent memory lookups, so accesses can take place in parallel within a pipelined memory system. Our algorithm is based on array access, and is more compatible with current texture-mapping capabilities than tree-based algorithms. Furthermore, any photon mapping acceleration technique that continues to rely on a form of kNN (such as irradiance caching7 ) can still benefit from our technique. The second category of techniques are based on building and searching graphs that encode sample-adjacency information. The randomised neighbourhood graph approach3 builds and searches an approximate local neighbourhood graph. Eppstein et al.11 investigated the fundamental properties of a nearest neighbour graph. Jaromczyk and Toussaint surveyed data structures and techniques based on Relative Neighbourhood Graphs23 . Graph-based techniques tend to have the same difficulties as tree-based approaches: searching a graph also involves stacks or queues, dependent memory accesses, and pointer-chasing unsuited to high-latency pipelined memory access. In Section 2, we first review previous work on the kNN and the approximate k-nearest neighbour (AkNN) problems. Section 3 describes the context and assumptions of our research and illustrates the basic hashing technique used in our algorithm. Sections 4 and 5 describe the details of our algorithm. Section 6 presents numerical, visual quality, and performance results. Section 7 discusses the mapping of the algorithm onto a shader-based implementation. Finally, we conclude in Section 8. Voronoi diagrams can be used for optimal 1-nearest neighbour searches in 2D and 3D10 . This and other pointlocation based techniques17 for solving nearest neighbour problems do not need to calculate distances between the query point and the candidates, but do need another data structure (like a BSP tree) to test a query point for inclusion in a region. 2. Previous Work Jensen’s book25 covers essentially all relevant previous work leading up to photon mapping. Due to space limitations, we will refer the reader to that book and focus our literature review on previous approaches to the kNN and AkNN problems. Hashing approaches to the kNN and AkNN problems have recently been proposed by Indyk et al.20, 21 and Gionis et al.16 . These techniques have the useful property that multi-level dependent memory lookups are not required. The heart of these algorithms are simple hash functions that preserve spatial locality, such as the one proposed by Linial and Sasson31 , and Gionis et al.16 . We base our technique on the latter. The authors also recognise recent work by Wald et al.45 on real-time global illumination techniques where a hashing-based photon mapping technique was apparently used (but not described in detail). Any non-trivial algorithm that claims to be able to solve the kNN problem faster than brute-force does so by reducing the number of candidates that have to be examined when computing the solution set. Algorithms fall into the following categories: recursive spatial subdivision, point location, neighbourhood graphs, and hashing. Amongst algorithms based on recursive spatial subdivision, the kd-tree 5 method is the approach commonly used to solve the kNN problem14 . An advantage of the kd-tree is that if the tree is balanced it can be stored as a heap in a single array. While it has been shown that kd-trees have optimal expected-time complexity6 , in the worst case finding the k nearest neighbours may require an exhaustive search of the entire data structure via recursive decent. This requires a stack the same size as the depth of the tree. During the recursion, a choice is made of which subtree to search next based on a test at each internal node. This introduces a dependency between one memory access and the next and makes it hard to map this algorithm into high-latency pipelined memory accesses. Numerous surveys and books 1, 2, 15, 42, 43, 17 provide further information on this family of problems and data structures developed to solve them. 3. Context We have developed a novel technique called Block Hashing (BH) to solve the approximate kNN (AkNN) problem in the context of, but not limited to, photon mapping. Our algorithm uses hash functions to categorise photons by their positions. Then, a kNN query proceeds by deciding which hash bucket is matched to the query point and retrieving the photons contained inside the hash bucket for analysis. One attraction of the hashing approach is that evaluation of hash functions takes constant time. In addition, once we have the hash value, accessing data we want in the hash table takes only a single access. These advantages permit us to Much work has been done to find methods to optimise the kd-tree method of solving the kNN and AkNN problems. See Christensen26 , Vanco et al.44 , Havran19 , and Sample c The Eurographics Association 2002. 90 Ma and McCool / Low Latency Photon Mapping Using Block Hashing uniform quantisation of spatial position, and is characterised by P and the sequence T . It is important to note that hT gives each partition of the domain space delineated by T equal representation in hash space. Depending on the location of the thresholds, hT will contract some parts of the domain space and expand other parts. If we rely on only a single hash table to classify a data set, a query point will only hash to a single bucket within this table, and the bucket may represent only a subset of the true neighbourhood we sought. Therefore, multiple hash tables with different thresholds are necessary for the retrieval of a more complete neighbourhood around the query point (See Figure 2.) avoid operations that are serially dependent on one another, such as those required by kd-trees, and are major stepping stones towards a low-latency shader-based implementation. Our technique is designed under two assumptions on the behaviour of memory systems in (future) accelerators. First, we assume that memory is allocated in fixed-sized blocks . Second, we assume that access to memory is via burst transfer of blocks that are then cached. Under this assumption, if any part of a fixed-sized memory block is “touched”, access to the rest of this block will be virtually zero-cost. This is typical even of software implementations on modern machines which rely heavily on caching and burst-mode transfers from SDRAM or RDRAM. In a hardware implementation with a greater disparity between processing power and memory speed, using fast block transfers and caching is even more important. Due to these benefits, in BH all memory used to store photon data is broken into fixed-sized blocks. hT1 t0 t1 t2 t3 t4 hT2 t0 t1 t0 t2 t1 t3 t2 t3 t4 t4 hT3 3.1. Locality-Sensitive Hashing Figure 2: An example of multiple hash functions classifying a dataset. The long vertical line represents the query value. The union of results multiple hash tables with different thresholds represents a more complete neighbourhood. Since our goal is to solve the kNN problem as efficiently as possible in a block-oriented cache-based context, our hashing technique requires hash functions that preserve spatial neighbourhoods. These hash functions take points that are close to each other in the domain space and hash them close to each other in hash space. By using such hash functions, photons within the same hash bucket as a query point can be assumed to be close to the query point in the original domain space. Consequently, these photons are good candidates for the kNN search. More than one such scheme is available; we chose to base our technique on the Locality-Sensitive Hashing (LSH) algorithm proposed by Gionis et al.16 , but have added several refinements (which we describe in Section 4). To deal with n-dimensional points, each hash table will have one hash function per dimension. Each hash function generates one hash value per coordinate of the point (See i Figure 3.) The final hash value is calculated by ∑n−1 i=0 hi P , where hi are the hash values and P is the number of thresholds. If P were a power of two, then this amounts to concatenating the bits. The only reason we use the same number of thresholds for each dimension is simplicity. It is conceivable that a different number of thresholds could be used for each dimension to better adapt to the data. We defer the discussion of threshold generation and query procedures until Sections 4.2 and 4.4, respectively. The hash function in LSH groups one-dimensional real numbers in hash space by their spatial location. It does so by partitioning the domain space and assigning a unique hash value to each partition. Mathematically, let T = {ti | 0 ≤ i ≤ P} be a monotonically increasing sequence of P + 1 thresholds between 0 and 1. Assume t0 = 0 and tP = 1, so there are P − 1 degrees of freedom in this sequence. Define a one-dimensional Locality-Sensitive Hash Function hT : [0, 1] → {0 . . . P − 1} to be hT (t) = i, where ti ≤ t < ti+1 . In other words, the hash value i can take on P different values, one for each “bucket” defined by the threshold pair [ti ,ti+1 ). An example is shown in Figure 1. t0 t1 t2 t3 hy P Figure 3: Using two hash functions to handle a 2D point. Each hash function will be h x used to hash one coordinate. LSH is very similar to grid files34 . However, the grid file was specifically designed to handle dynamic data. Here, we assume that the data is static during the rendering pass. Also, the grid file is more suitable for range searches than it is for solving the kNN problem. t4 Figure 1: An example of hT . The circles and boxes represent values to be hashed, while the vertical lines are the thresholds in T . The boxes lie between t1 and t2 , thus they are given a hash value of 1. 3.2. Block-Oriented Memory Model It has been our philosophy that hardware implementations of algorithms should treat off-chip memory the same way software implementations treat disk: as a relatively slow, “outof-core”, high-latency, block-oriented storage device. This The function hT can be interpreted as a monotonic nonc The Eurographics Association 2002. 91 Ma and McCool / Low Latency Photon Mapping Using Block Hashing analogy implies that algorithms and data structures designed to optimise for disk access are potentially applicable to hardware design. It also drove us to employ fixed-sized blocks to store the data involved in the kNN search algorithm, which are photons in the context of this application. After photons have been traced into the scene, the algorithm organises the photons into fixed-sized memory blocks, creates a set of hash tables, and inserts photon blocks into the hash tables. In the second phase, the hash tables will be queried for a set of candidate photons from which the k nearest photons will be selected for each point in space to be shaded by the renderer. In our prototype software implementation of BH, each photon is stored in a structure similar to Jensen’s “extended” photon representation25 . As shown in Figure 4, each component of the 3D photon location is represented by a 32bit fixed-point number. The unit vectors representing incoming direction d̂ and surface normal n̂ are quantised to 16-bit values using Jensen’s polar representation. Photon power is stored in four channels using sixteen-bit floating point numbers. This medium-precision signed representation permits other AkNN applications beyond that of photon mapping. Four colour channels are also included to better match the four-vectors supported in fragment shaders. For the photon mapping application specifically, our technique is compatible with the replacement of the four colour channels with a Ward RGBE colour representation46 . Likewise, another implementation might use a different representation for the normal and incident direction unit vectors. | 32 x y z n̂ d̂ c1 c2 c3 c4 | 16 | | 4.1. Organizing Photons into Blocks Due to the coherence benefits associated with block-oriented memory access, BH starts by grouping photons and storing them into fixed-sized memory blocks. However, these benefits are maximised when the photons within a group are close together spatially. We chose to use the Hilbert curve13 to help group photons together. The advantage of the Hilbert curve encoding of position is that points mapped near each other on the Hilbert curve are guaranteed to be within a certain distance of each other in the original domain22 . Points nearby in the original domain space have a high probability of being nearby on the curve, although there is a non-zero probability of them being far apart on the curve. If we sort photons by their Hilbert curve order before packing them into blocks, then the photons in each block will have a high probability of being spatially coherent. Each block then corresponds to an interval of the Hilbert curve, which in turn covers some compact region of the domain (see Figure 7a). Each region of domain space represented by the blocks is independent, and regions do not overlap. Figure 4: Representation of a photon record. The 32-bit values (x, y, z) denote the position of a photon and are used as keys. Two quantised 16-bit unit vectors d̂, n̂ and four 16-bit floating point values are carried as data. All photon records are stored in fixed-sized memory blocks. BH uses a 64 32-bit-word block size, chosen to permit efficient burst-mode transfers over a wide bus to transaction-oriented DRAM. Using a 128-bit wide path to DDR SDRAM, for instance, transfer of this block would take eight cycles, not counting the overhead of command cycles to specify the operation and the address. Using nextgeneration QDR SDRAM this transfer would take only four cycles (or eight on a 64-bit bus, etc.) BH sorts photons and inserts them into a B+ -tree8 using the Hilbert curve encoding of the position of each photon as the key. This method of spatially grouping points was first proposed by Faloutsos and Rong12 for a different purpose. Since a B+ -tree stores photon records only at leaves, with a compatible construction the leaf nodes of the B+ -tree can serve as the photon blocks used in the later stages of BH. One advantage of using a B+ -tree for sorting is that insertion cost is bounded: the tree is always balanced, and in the worst case we may have to split h nodes in the tree, when the height of the tree is h. Also, B+ -trees are optimised for block-oriented storage, as we have assumed. Our photon representation occupies six 32-bit words. Since photon records are not permitted to span block boundaries, ten photons will fit into a 64-word block with four words left over. Some of this extra space is used in our implementation to record how many photons are actually stored in each block. For some variants of the data structures we describe, this extra space could also be used for flags or pointers to other blocks. It might be possible or desirable in other implementations to support more or fewer colour channels, or channels with greater or lesser precision, in which case some of our numerical results would change. The B+ -tree used by BH has index and leaf nodes that are between half full to completely full. To minimise the final number of blocks required to store the photons, the leaf nodes can be compacted (see Figure 5.) After the photons are sorted and compacted, the resulting photon blocks are ready to be used by BH, and the B+ -tree index and any leaf nodes that are made empty by compaction are discarded. If the complete set of photons is known a priori, the compact B+ -tree37 for static data can be used instead. This data structure maintains full nodes and avoids the extra compaction step. 4. Block Hashing Block Hashing (BH) contains a preprocessing phase and a query phase. The preprocessing phase consists of three steps. c The Eurographics Association 2002. 92 Ma and McCool / Low Latency Photon Mapping Using Block Hashing (a) weeding out duplicates as each block contains a unique set of photons. This means fewer comparisons have to be made and the individual photons are only accessed once per query, during post-processing of the merged candidate set to find the k nearest photons. Consequently, the transfer of each photon block through the memory system happens at most once per query. All photons accessed in a block are potentially useful contributions to the candidate set, since photons within a single block are spatially coherent. Due to our memory model assumptions, once we have looked at one photon in a block it should be relatively inexpensive to look at the rest. Index node Empty cell in leaf node Occupied cell in leaf node (b) Figure 5: Compaction of photon blocks. (a) B+ -tree after inserting all photons. Many leaf nodes have empty cells. (b) All photon records are compacted in the leaf nodes. Regardless, observe that each photon block contains a spatially clustered set of photons disjoint from those contained in other blocks. This is the main result we are after; any other data structures that can group photons into spatially-coherent groups, such as grid files34 , can be used in place of the B+ -tree and space-filling curve. (a) 4.2. Creating the Hash Tables (b) (c) The hash tables used in BH are based on the LSH scheme described in Section 3.1. BH generates L tables in total, serving as parallel and complementary indices to the photon data. Each table has three hash functions (since photons are classified by their 3D positions), and each hash function has P + 1 thresholds. Figure 7: Block hashing illustrated. (a) Each block corresponds to an interval of the Hilbert curve, which in turn covers some compact region of the domain. Consequently, each bucket (b) represents all photons (highlighted with squares) in each block with at least one photon hashed into it (c). BH employs an adaptive method that generates the thresholds based on the photon positions. For each dimension, a histogram of photon positions is built. Then, the histogram is integrated to obtain a cumulative distribution function (cdf ). Lastly, stratified samples are taken from the inverse of the cdf to obtain threshold locations. The resulting thresholds will be far apart where there are few photons, and close together where photons are numerous. Ultimately this method attempts to have a similar number of photons into each bucket. Each bucket in a hash table corresponds to a rectangular region in a non-uniform grid as shown in Figure 7b. Each block is inserted into the hash tables once for each photon within that block, using the position of these photons to create the keys. Each bucket of the hash table refers to not only the photons that have been hashed into that bucket, but also all the other photons that belong to the same blocks as the hashed photons (see Figure 7c.) Since each photon block is inserted into each hash table multiple times, using different photons as keys, a block may be hashed into multiple buckets in the same hash table. Of course, a block should not be inserted into a bucket more than once. More importantly, our technique ensures that each block is inserted into at least one hash table. Orphaned blocks are very undesirable since the photons within will never be considered in the subsequent AkNN evaluation and will cause a constant error overhead. Hence, our technique does not naïvely drop a block that causes a bucket to overflow. Hash tables are stored as a one-dimensional array structure, shown in Figure 6. The hash key selects a bucket out of the Pn available buckets in the hash table. Each bucket refers up to B blocks of photons, and has space for a validity flag per reference, and storage for a priority value. We defer the discussion on the choice of P, L and B until Section 5. B Priority V V V V Figure 6: Hash table bucket layout However, there may be collisions that cause buckets to overflow, especially when a large bucket load factor is chosen to give a compact hash table size, or there exists a large variation in photon density (which, of course, is typical in this application). Our algorithm uses two techniques to address this problem. The first technique attempts to insert every block into every hash table, but in different orders on different hash tables, such that blocks that appear earlier in the ordering are not favoured for insertion in all tables. BH uses a technique similar to disk-striping38 , illustrated by the 4.3. Inserting Photon Blocks In BH, references to entire photon blocks, rather than individual photons, are inserted into the hash tables. One reason for doing so is to reduce the memory required per bucket. Another, more important, reason is that when merging results from multiple hash tables (Section 3.1), BH needs to compare only block addresses instead of photons when c The Eurographics Association 2002. 93 Ma and McCool / Low Latency Photon Mapping Using Block Hashing (a) pseudo code in Figure 8. An example is given in the diagram in the same figure. for h from 0 to (number_of_hash_tables-1) for b from 0 to (number_of_blocks-1) idx = (h+b) modulo L insert block[b] into hashtable[idx] endfor endfor 0 1 2 0 1 2 3 4 5 0 1 2 0 1 2 3 4 5 0 1 2 0 1 2 3 4 5 (b) (c) Data point Query point Matched point Photon Block Hash Table Bucket 1st iteration 2nd iteration Figure 9: Merging the results from multiple hash tables. (a) the query point retrieves different candidates sets from different hash tables, (b) the union set of candidates after merging, and (c) the two closest neighbours selected. 3rd iteration Figure 8: Striping insertion strategy The second technique involves a strategy to deal with overflow in a bucket. For each photon block, BH keeps the count of buckets that the block has been hashed into so far. When a block causes overflow in a bucket, the block in the bucket that has the maximum count will be bumped if that count is larger than one, and larger than that of the incoming block. This way we ensure that all blocks are inserted into at least one bucket, given adequate hash table sizes, and no block is hashed into an excessive number of buckets. There needs to be a way to select the buckets from which the Ak candidate photons are obtained. Obviously, we want to devise a heuristic to pick the “best” candidates. Suppose every bucket in every hash table has a priority given by α = |bucket_capacity − #entries − #overflows| where “#overflows” is the number of insertion attempts after the bucket became full. The priority can be pre-computed and stored in each bucket of each hash table during the insertion phase. The priority of a bucket is smallest when the bucket is full but never overflowed. Conversely, when the hash bucket is underutilised or overflow has occurred, α will be larger. If a bucket is underutilised, it is probably too small spatially (relative to local sample density). If it has experienced overflow, it is likely too large spatially, and covers too many photon block regions. 4.4. Querying A query into the BH data structure proceeds by delegating the query to each of the L hash tables. These parallel accesses will yield as candidates all photon blocks represented by buckets that matched the query. The final approximate nearest neighbour set comes from scanning the unified candidate set for the nearest neighbours to the query point (see Figure 9.) Note that unlike kNN algorithms based on hierarchical data structures, where candidates for the kNN set trickle in as the traversal progresses, in BH all candidates are available once the parallel queries are completed. Therefore, BH can use algorithms like selection29 (instead of a priority queue) when selecting the k nearest photons. During each query, BH will sort the L buckets returned from the hash tables by their priority values, smallest values of α first. Subsequently, buckets are considered in this order, one by one, until the required Ak photons are found. In this way the more “useful” buckets will be considered first. Each query will retrieve one bucket from each of the L hash tables. If the application can tolerate elevated inaccuracy in return for increased speed of query (for example, to pre-visualise a software rendering), it may be worthwhile to consider using only a subset of the L candidate sets. Block hashing is equipped with a user-specified accuracy setting: Let A ∈ IN be an integer multiplier. The block hashing algorithm will only consider Ak candidate photons in the final scan to determine the k nearest photons to a query. Obviously the smaller A is, the fewer photons will be processed in the final step; as such, query time is significantly reduced, but with an accuracy cost. Conversely, a higher A will lead to a more accurate result, but it will take more time. Experimental results that demonstrate the impact of the choice of A will be explored in Section 6. 5. Choice of Parameter Values Block Hashing is a scheme that requires several parameters: B, the bucket capacity; L, the number of hash tables whose results are merged; and P, the number of thresholds per dimension. We would like to determine reasonable values for these parameters as functions of k, the number of nearest neighbours sought, and N, the number of photons involved. It is important to realize the implications of these parameters. The total number of 32-bit pointers to photon blocks is given by LP3 B. Along with the number of thresholds 3LP, this gives the memory overhead required for BH. The upper bound for this value is 6N, the number of photons multiplied by the six 32-bit words each photon takes up in our implementation. If we allow B to be a fixed constant for now, c The Eurographics Association 2002. 94 Ma and McCool / Low Latency Photon Mapping Using Block Hashing the constraint LP3 + 3LP ≤ N arises from the reasonable assumption that we do not want to have more references to blocks than there are photons, or more memory used in the index than in the data. 6. Results For BH to be a useful AkNN algorithm, it must have satisfactory algorithmic accuracy. Moreover, in the context of photon mapping, BH must also produce good visual accuracy. This section will demonstrate that BH satisfies both requirements, while providing a speed-up in terms of the time it takes to render an image, even in a software implementation. Empirically, L = P = ln N has turned out to be a good choice. The value ln N remains sub-linear as N increases, and this value gives a satisfactory index memory overhead ratio: There are a total of B(ln N)4 block references. Being four bytes each, the references require 4B(ln N)4 bytes. With each hash table, there needs to be 3LP = 3(ln N)2 thresholds. Represented by a 4-byte value each, the thresholds take another 12(ln N)2 bytes. Next, assuming one photon block can hold ten photons, N photons requires N/10 blocks; each block requires 64 words, so the blocks require 25.6N bytes in total. The total memory required for N photons, each occupying 6 words, is 24N bytes. This gives an overhead ratio of (4B(ln N)4 + 12(ln N)2 + 25.6N − 24N)/24N. To measure algorithmic accuracy, our renderer was rigged to use both the kd-tree and BH based photon maps. For each kNN query the result sets were compared for the following metrics: False negatives: # photons incorrectly excluded from the kNN set. Maximum distance dilation: the ratio of bounding radii around the neighbours reported by the two algorithms. (1) Average distance dilation: the ratio of the average distances between the query point and each of the nearest neighbours reported by the two algorithms. The choice of B is also dependent on the value of k specified by the situation or the user. However, since it is usual in photon mapping that k is known ahead of time, B can be set accordingly. B should be set such that the total number of photons retrieved from the L buckets for each query will be larger than k. Mathematically speaking, each photon block in our algorithm has ten photons, hence 10LB ≫ k. In particular, 10LB > Ak should also be satisfied. Since we choose L = ln N, rearranging the equation yields: B > Ak/(10 ln N) For example, assuming A = 16, N = 2000000, k = 50, then ⌈B⌉ = 6. To gauge visual accuracy, we calculate a Caustic RMS Error metric, which compares the screen space radiance difference between the caustic radiance values obtained from kd-tree and BH. A timing-related Time Ratio metric is calculated as a ratio of the time taken for a query into the BH data structure versus that for the kd-tree data structure. Obviously, as A increases, the time required for photon mapping using BH approaches that for a kd-tree based photon mapping. If we substitute B back into Equation 1, we obtain the final overhead equation (4(Ak/10)(ln N)3 + 12(ln N)2 + 1.6N)/24N. Our first test scene, shown in Figure 11, with numerical results in Figure 12, consists of a highly specular ring placed on top of a plane with a Modified Phong28 reflectance model. This scene tests the ability of BH to handle a caustic of varying density, and a caustic that has been cast onto a non-Lambertian surface. (2) Figure 10 plots the number of photons versus the memory overhead. For the usual range of photon count in a photon mapping application, we see that the memory overhead, while relative large for small numbers of photons, becomes reasonable for larger numbers of photons, and has an asymptote of about 6%. Of course, if we assumed different block size (cache line size), these results would vary, but the analysis is the same. Memory Overhead Ratio (%) 30 Figure 13 shows a second scene consisting of the Venus bust, with a highly specular ring placed around the neck of Venus. Figure 14 shows the numerical statistics of this test scene. The ring casts a cardioid caustic onto the (non-planar) chest of the Venus. This scene demonstrates a caustic on a highly tessellated curved surface. Global illumination is also used for both scenes, however the times given are only for the query time of the caustic maps. A=16 A=8 A=4 25 20 The general trend to notice is that for extremely low accuracy (A) settings the visual and algorithmic performance of BH is not very good. The candidate set is simply not large enough in these cases. However, as A increases, these performance indicators drop to acceptable levels very quickly, especially between values of A between 2 and 8. After A = 8 diminishing returns set in, and the increase in accuracy incurs a significant increase in the cost of the query time required. This numerical error comparison is parallelled by the 15 10 5 0.25 0.50 0.75 1.00 1.25 Number of Photons 1.50 1.75 2.00 Figure 10: Plot of photon count vs. memory overhead incurred by BH, assuming k = 50. c The Eurographics Association 2002. 95 Ma and McCool / Low Latency Photon Mapping Using Block Hashing (a) kd-tree (b) BH, A=4 (a) kd-tree (b) BH, A=16 (c) BH, A=8 (d) BH, A=4 Figure 13: “Venus with Ring” images Figure 11: “Ring” image Max Radius Dilation Avg Radius Dilation 1.5 0.4 0.2 1.2 1.1 0 2 4 6 8 10 12 14 16 18 20 Accuracy Setting (A) 1 0 2 4 1.4 0.02 RMS error Time Ratio 0.01 0.005 6 8 10 12 14 16 18 20 Accuracy Setting (A) Timing Ratio 1.2 0.015 0 2 0.05 1.3 4 1.06 1.04 1.02 1 6 8 10 12 14 16 18 20 Accuracy Setting (A) 0 2 4 1.2 RMS error 6 8 10 12 14 16 18 20 Accuracy Setting (A) Timing Ratio 1 0.04 0.03 0.02 0.01 0 1 Max Radius Dilation Avg Radius Dilation 1.08 Time Ratio 0.6 1.1 False− 1.4 Radiance RMS Error 0.8 0 Radiance RMS Error 1.6 False− Dilation Ratio Average #errors 1 4 3.5 3 2.5 2 1.5 1 0.5 0 Dilation Ratio (d) BH, A=8 Average #errors (c) BH, A=16 0.8 0.6 0.4 0.2 0 2 4 6 8 10 12 14 16 18 20 Accuracy Setting (A) 0 0 2 4 6 8 10 12 14 16 18 20 Accuracy Setting (A) 0.8 0.6 Figure 14: “Venus with Ring” numerical statistics 0.4 0.2 0 0 2 4 6 8 10 12 14 16 18 20 Accuracy Setting (A) 0 0 2 4 6 8 10 12 14 16 18 20 Accuracy Setting (A) to infinity, the kNN density estimator does converge on the true density, but always from below. Figure 12: “Ring” numerical statistics 7. Hardware Implementation visual comparison of the images: the image rendered with A = 8 and A = 16 are virtually indistinguishable. These results suggest that intermediate values of A, between 8 to 10, should be used as a good compromise between query speed and solution accuracy. There are two ways to approach a hardware implementation of an algorithm in a hardware accelerator: with a custom hardware implementation, or as an extension or exploitation of current hardware support. While there would be certain advantages in a full custom hardware implementation of BH, this would probably lead to a large chunk of hardware with low utilisation rates. Although there are many potential applications of AkNN search beyond photon mapping (we list several in the conclusions), it seems more reasonable to consider first if BH can be implemented using current hardware and programmable shader features, and if not, what the smallest incremental changes would be. We have concluded that BH, while not quite implementable on today’s graphics hardware, should be implementable in the near future. It is apparent from the query time ratio plots that there exists a close-to-linear relationship between values of A and time required for a query into BH. This is consistent with the design of the A parameter; it corresponds directly to the number of photons accessed and processed for each query. Another important observation that can be made from the visual comparisons is that images with greater approximation value A look darker. This is because the density estimate is based on the inverse square of the radius of the sphere enclosing the k nearest neighbours. The approximate radius is always larger than the true radius. This is an inherent problem with any approximate solution to the kNN problem, and indeed even with the exact kNN density estimator: as k goes We consider only the lookup phase here, since the preprocessing would indeed require some custom hardware support, but support which perhaps could be shared with other useful features. In the lookup phase, (1) we compute hash c The Eurographics Association 2002. 96 Ma and McCool / Low Latency Photon Mapping Using Block Hashing 8. Conclusion and Future Work keys, (2) look up buckets in multiple hash tables, (3) merge and remove duplicates from the list of retrieved blocks and optionally sorting by priority, (4) retrieve the photon records stored in these blocks, and (5) process the photons. Steps (1) and (5) could be performed with current shader capabilities, although the ability to loop would be useful for the last step to avoid replicating the code to process each photon. Computing the hash function amounts to doing a number of comparisons, then adding up the zero-one results. This can be done in linear time with a relatively small number of instructions using the proposed DX9 fragment shader instruction set. If conditional assignment and array lookups into the register file are supported, this could be done in logarithmic time using binary search. We have presented an efficient, scalable, coherent and highly parallelisable AkNN scheme suitable for the highperformance implementation of photon mapping. The coherent memory access patterns of BH lead to improved performance even for a software implementation. However, in the near future we plan to implement the lookup phase of BH on an accelerator. Accelerator capabilities are not quite to the point where they can support this algorithm, but they are very close. What is missing is some form of true run-time conditional execution and looping, as well as greater capacity in terms of numbers of instructions. However, unlike tree-based algorithms, block hashing requires only bounded execution time and memory. Steps (2) and (4) amount to table lookups and can be implemented as nearest-neighbour texture-mapping operations with suitable encoding of the data. For instance, the hash tables might be supported with one texture map giving the priority and number of valid entries in the bucket, while another texture map or set of texture maps might give the block references, represented by texture coordinates pointing into another set of texture maps holding the photon records. An accelerator-based implementation would be most interesting if it is done in a way that permits other applications to make use of the fast AkNN capability it would provide. AkNN has many potential applications in graphics beyond photon maps. For rendering, it could also be used for sparse data interpolation (with many applications: visualisation of sparse volume data, BRDF and irradiance volume representation, and other sampled functions), sparse and multiresolution textures, procedural texture generation (specifically, Worley’s texture47 functions), direct ray-tracing of point-based objects40 , and gap-filling in forward projection point-based rendering48 . AkNN could also potentially be used for non-rendering purposes: collision detection, surface reconstruction, and physical simulation (for interacting particle systems). Unlike the case with tree-based algorithms, we feel that it will be feasible to implement BH as a shader subroutine in the near future, which may make it a key component in many potential applications of advanced programmable graphics accelerators. Step (3) is difficult to do efficiently without true conditionals and conditional looping. Sorting is not the problem, as it could be done with conditional assignment. The problem is that removal of a duplicate block reduces the number of blocks in the candidate set. We would like in this case to avoid making redundant photon lookups and executing the instructions to process them. Without true conditionals, an inefficient work-around is to make these redundant texture accesses and process the redundant photons anyhow, but discard their contribution by multiplying them by zero. We have not yet attempted an implementation of BH on an actual accelerator. Without looping, current accelerators do not permit nearly enough instructions in shaders to process k photons for adequate density estimation. However, we feel that it might be feasible to implement our algorithm on a next-generation shader unit using the “multiply redundant photons by zero” approach, if looping (a constant number of times) were supported at the fragment level. For a more detailed description of the block hashing algorithm, please refer to the author’s technical report32 . 9. Acknowledgements This research was funded by grants from the National Science and Engineering Research Council of Canada (NSERC), the Centre for Information Technology of Ontario (CITO), the Canadian Foundation for Innovation (CFI), the Ontario Innovation Trust (OIT), and the Bell University Labs initiative. We expect that the generation that follows DX9-class accelerators will probably have true conditional execution and looping, in which case implementation of BH will be both straightforward and efficient, and will not require any additional hardware or special shader instructions. It will also only require two stages of conditional texture lookup, and lookups in each stage can be performed in parallel. In comparison, it would be nearly impossible to implement a treebased search algorithm on said hardware due to the lack of a stack and the large number of dependent lookups that would be required. With a sufficiently powerful shading unit, of course, we could implement any algorithm we wanted, but BH makes fewer demands than a tree-based algorithm does. References 1. 2. c The Eurographics Association 2002. 97 P. K. Agarwal. Range Searching. In J. E. Goodman and J. O’Rourke, editors, Handbook of Discrete and Computational Geometry. CRC Press, July 1997. 2 P. K. Agarwal and J. Erickson. Geometric range searching and its relatives. Advances in Discrete and Computational Geometry, 23:1–56, 1999. 2 Ma and McCool / Low Latency Photon Mapping Using Block Hashing 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. S. Arya and D. M. Mount. Approximate Nearest Neighbor Queries in Fixed Dimensions. In Proc. ACM-SIAM SODA, 1993. 2 S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Y. Wu. An Optimal Algorithm for Approximate Nearest Neighbor Searching. In Proc. ACM-SIAM SODA, pages 573–582, 1994. 2 J. L. Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9), September 1975. 1, 2 J. L. Bentley, B. W. Weide, and A. C. Chow. Optimal Expected-Time Algorithms for Closest Point Problems. ACM TOMS, 6(4), December 1980. 1, 2 Per H. Christensen. Faster Photon Map Global Illumination. Journal of Graphics Tools, 4(3):1–10, 1999. 2 D. Comer. The Ubiquitous B-Tree. ACM Computing Surveys, 11(2):121–137, June 1979. 4 C. A. Duncan, M. T. Goodrich, and S. G. Kobourov. Balanced aspect ratio trees: Combining the advantages of k-d trees and octrees. In Proc. ACM-SIAM SODA, volume 10, pages 300–309, 1999. 2 H. Edelsbrunner. Algorithms in Combinatorial Geometry. Springer-Verlag, 1987. 2 D. Eppstein, M. S. Paterson, and F. F. Yao. On nearest neighbor graphs. Discrete & Computational Geometry, 17(3):263–282, April 1997. 2 C. Faloutsos and Y. Rong. DOT: A Spatial Access Method Using Fractals. In Proc. 7th Int. Conf. on Data Engineering,, pages 152–159, Kobe, Japan, 1991. 4 C. Faloutsos and S. Roseman. Fractals for Secondary Key Retrieval. In Proc. 8th ACM PODS, pages 247– 252, Philadelphia, PA, 1989. 4 J. H. Freidman, J. L. Bentley, and R. A. Finkel. An Algorithm for Finding Best Matches in Logarithmic Expected Time. ACM TOMS, 3(3):209–226, 1977. 2 V. Gaede and O. Günther. Multidimensional access methods. ACM Computing Surveys (CSUR), 30(2):170–231, 1998. 2 A. Gionis, P. Indyk, and R. Motwani. Similarity Search in High Dimensions via Hashing. In Proc. VLDB, pages 518–529, 1999. 2, 3 J. E. Goodman and J. O’Rourke, editors. Handbook of Discrete and Computational Geometry. CRC Press, July 1997. ISBN: 0849385245. 2 G. Greger, P. Shirley, P. Hubbard, and D. Greenberg. The irradiance volume. IEEE CG&A, 18(2):32–43, 1998. 1 V. Havran. Analysis of Cache Sensitive Representation for Binary Space Partitioning Trees. Informatica, 23(3):203–210, May 2000. 2 P. Indyk and R. Motwani. Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. In Proc. ACM STOC, pages 604–613, 1998. 2 P. Indyk, R. Motwani, P. Raghavan, and S. Vem- 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35. 36. 37. 38. pala. Locality-Preserving Hashing in Multidimensional Spaces. In Proc. ACM STOC, pages 618–625, 1997. 2 H. V. Jagadish. Linear clustering of objects with multiple attributes. In Proc. Acm-sigmod, pages 332–342, May 1990. 4 J. W. Jaromczyk and G. T. Toussaint. Relative Neighborhood Graphs and Their Relatives. Proc. IEEE, 80(9):1502–1517, September 1992. 2 H. W. Jensen. Rendering Caustics on Non-Lambertian Surfaces. Computer Graphics Forum, 16(1):57–64, 1997. ISSN 0167-7055. 1 H. W. Jensen. Realistic Image Synthesis Using Photon Mapping. A.K. Peters, 2001. 1, 2, 4 H. W. Jensen, F. Suykens, and P. H. Christensen. A Practical Guide to Global Illumination using Photon Mapping. In SIGGRAPH 2001 Course Notes, number 38. ACM, August 2001. 2 J. K. P. Kuan and P. H. Lewis. Fast k Nearest Neighbour Search for R-tree Family. In Proc. of First Int. Conf. on Information, Communication and Signal Processing, pages 924–928, Singapore, 1997. 2 E. P. Lafortune and Y. D. Willems. Using the Modified Phong BRDF for Physically Based Rendering. Technical Report CW197, Department of Computer Science, K.U.Leuven, November 1994. 7 C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. MIT Press, 2001. 6 K.-I. Lin and C. Yang. The ANN-Tree: An Index for Efficient Approximate Nearest-Neighbour Search. In Conf. on Database Systems for Advanced Applications, 2001. 2 N. Linial and O. Sasson. Non-Expansive Hashing. In Proc. acm stoc, pages 509–518, 1996. 2 V. Ma. Low Latency Photon Mapping using Block Hashing. Technical Report CS-2002-15, School of Computer Science, University of Waterloo, 2002. 9 J. McNames. A Fast Nearest-Neighbor Algorithm Based on a Principal Axis Search Tree. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(9):964–976, 2001. 2 J. Nievergelt, H. Hinterberger, and K. C. Sevcik. The Grid File: an adaptable, symmetric multikey file structure. ACM TODS, 9(1):38–71, March 1984. 3, 5 T. J. Purcell, I. Buck, W. R. Mark, and P. Hanrahan. Ray Tracing on Programmable Graphics Hardware. In to appear in Proc. SIGGRAPH, 2002. 1 J. T. Robinson. The K-D-B-tree: A Search Structure for Large Multidimensional Dynamic Indexes. In Proc. acm sigmod, pages 10–18, 1981. 2 Arnold L. Rosenberg and Lawrence Snyder. Time- and space-optimality in b-trees. ACM TODS, 6(1):174–193, 1981. 4 K. Salem and H. Garcia-Molina. Disk striping. In IEEE ICDE, pages 336–342, 1986. 5 c The Eurographics Association 2002. 98 Ma and McCool / Low Latency Photon Mapping Using Block Hashing 39. N. Sample, M. Haines, M. Arnold, and T. Purcell. Optimizing Search Strategies in kd-Trees. May 2001. 2 40. G. Schaufler and H. W. Jensen. Ray Tracing Point Sampled Geometry. Rendering Techniques 2000, pages 319–328, June 2000. 9 41. J. Schmittler, I. Wald, and P. Slusallek. SaarCOR - A Hardware Architecture for Ray Tracing. In to appear at EUROGRAPHICS Graphics Hardware, 2002. 1 42. M. Smid. Closest-Point Problems in Computational Geometry. In J. R. Sack and J. Urrutia, editors, Handbook on Computational Geometry. Elsevier Science, Amsterdam, North Holland, 2000. 2 43. P. Tsaparas. Nearest neighbor search in multidimensional spaces. Qualifying Depth Oral Report 31902, Dept. of Computer Science, University of Toronto, 1999. 2 44. M. Vanco, G. Brunnett, and T. Schreiber. A Hashing Strategy for Efficient k-Nearest Neighbors Computation. In Computer Graphics International, pages 120– 128. IEEE, June 1999. 2 45. I. Wald, T. Kollig, C. Benthin, A. Keller, and P. Slusallek. Interactive global illumination. Technical report, Computer Graphics Group, Saarland University, 2002. to be published at EUROGRAPHICS Workshop on Rendering 2002. 2 46. G. Ward. Real Pixels. In James Arvo, editor, Graphics Gems II, pages 80–83. Academic Press, 1991. 4 47. Steven Worley. A cellular texture basis function. In Proc. SIGGRAPH 1996, pages 291–294. ACM Press, 1996. 9 48. M. Zwicker, H. Pfister, J. van Baar, and M. Gross. Surface Splatting. Proc. SIGGRAPH 2001, pages 371–378, 2001. 9 c The Eurographics Association 2002. 99 Ma and McCool / Low Latency Photon Mapping Using Block Hashing (a) kd-tree (b) BH, A=16 (c) BH, A=8 (d) BH, A=4 (c) BH, A=8 (d) BH, A=4 Figure 13: “Ring” (a) kd-tree (b) BH, A=16 Figure 14: “Venus with Ring” c The Eurographics Association 2002. 158