Automatically enhancing locality for tree traversals with traversal splicing

Published: 19 October 2012

Abstract

Generally applicable techniques for improving temporal locality in irregular programs, which operate over pointer-based data structures such as trees and graphs, are scarce. Focusing on a subset of irregular programs, namely tree traversal algorithms such as Barnes-Hut and nearest neighbor, previous work has proposed point blocking, a technique analogous to loop tiling in regular programs, to improve locality. However, point blocking is highly dependent on point sorting, a technique that reorders points so that consecutive points have similar traversals. Performing this a priori sort requires an understanding of the semantics of the algorithm and hence relies on highly application-specific techniques. In this work, we propose traversal splicing, a new, general, automatic locality optimization for irregular tree traversal codes that is less sensitive to point order and hence can deliver substantially better performance, even in the absence of semantic information. For six benchmark algorithms, we show that traversal splicing can deliver single-thread speedups of up to 9.147 (geometric mean: 3.095) over baseline implementations, and up to 4.752 (geometric mean: 2.079) over point-blocked implementations. Further, we show that in many cases, automatically applying traversal splicing to a baseline implementation yields better performance than carefully hand-optimized implementations.
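
To make the contrast between a baseline traversal and point blocking concrete, the following is a minimal sketch and not the paper's implementation. It assumes a toy 1-D range-count query over a binary tree; the names `Node`, `build`, `can_prune`, `count_baseline`, and `count_blocked` are hypothetical and chosen only for illustration. In the baseline, each point walks the tree on its own. In the blocked version, a block of points descends together, so each node is touched once per block rather than once per point, in the spirit of loop tiling. The sketch does not implement traversal splicing, whose mechanism the abstract does not describe.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    lo: float                       # smallest value stored under this node
    hi: float                       # largest value stored under this node
    value: Optional[float] = None   # set only on leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(values: List[float]) -> Node:
    """Build a balanced binary tree over a non-empty, sorted list (toy example)."""
    if len(values) == 1:
        return Node(values[0], values[0], value=values[0])
    mid = len(values) // 2
    left, right = build(values[:mid]), build(values[mid:])
    return Node(left.lo, right.hi, left=left, right=right)


def can_prune(node: Node, q: float, radius: float) -> bool:
    """True if no value under `node` can lie within `radius` of query `q`."""
    return q + radius < node.lo or q - radius > node.hi


def count_baseline(root: Node, queries: List[float], radius: float) -> List[int]:
    """Baseline: every query point performs its own full recursive traversal."""
    counts = [0] * len(queries)

    def visit(node: Node, i: int) -> None:
        if can_prune(node, queries[i], radius):
            return
        if node.value is not None:
            counts[i] += 1
            return
        visit(node.left, i)
        visit(node.right, i)

    for i in range(len(queries)):
        visit(root, i)
    return counts


def count_blocked(root: Node, queries: List[float], radius: float,
                  block_size: int = 4) -> List[int]:
    """Point blocking (sketch): a block of points traverses the tree together."""
    counts = [0] * len(queries)

    def visit(node: Node, block: List[int]) -> None:
        # Keep only the points whose traversal is not truncated at this node.
        live = [i for i in block if not can_prune(node, queries[i], radius)]
        if not live:
            return
        if node.value is not None:
            for i in live:
                counts[i] += 1
            return
        visit(node.left, live)
        visit(node.right, live)

    for start in range(0, len(queries), block_size):
        visit(root, list(range(start, min(start + block_size, len(queries)))))
    return counts
```

Both functions compute the same counts; only the order in which (point, node) pairs are visited changes. The benefit of blocking shrinks when the points in a block have dissimilar traversals, since few points stay live at each node, which is why the abstract notes its dependence on point sorting.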

    Published In

    ACM SIGPLAN Notices, Volume 47, Issue 10
    OOPSLA '12
    October 2012
    1011 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/2398857

    OOPSLA '12: Proceedings of the ACM international conference on Object oriented programming systems languages and applications
    October 2012
    1052 pages
    ISBN: 9781450315616
    DOI: 10.1145/2384616

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cache
    2. irregular programs
    3. locality transformations
    4. temporal locality
    5. tree traversals

    Qualifiers

    • Research-article

    Cited By

    • (2015) Tree dependence analysis. ACM SIGPLAN Notices 50:6 (314-325). DOI: 10.1145/2813885.2737972. Online publication date: 3-Jun-2015
    • (2015) Tree dependence analysis. Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (314-325). DOI: 10.1145/2737924.2737972. Online publication date: 3-Jun-2015
    • (2022) UniRec: a unimodular-like framework for nested recursions and loops. Proceedings of the ACM on Programming Languages 6:OOPSLA2 (1264-1290). DOI: 10.1145/3563333. Online publication date: 31-Oct-2022
    • (2021) Compiling pattern matching to in-place modifications. Proceedings of the 20th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (123-129). DOI: 10.1145/3486609.3487204. Online publication date: 17-Oct-2021
    • (2021) Reasoning about recursive tree traversals. Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (47-61). DOI: 10.1145/3437801.3441617. Online publication date: 17-Feb-2021
    • (2020) Postcondition-preserving fusion of postorder tree transformations. Proceedings of the 29th International Conference on Compiler Construction (191-200). DOI: 10.1145/3377555.3377884. Online publication date: 22-Feb-2020
    • (2019) Composable, sound transformations of nested recursion and loops. Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (902-917). DOI: 10.1145/3314221.3314592. Online publication date: 8-Jun-2019
    • (2019) A Decidable Logic for Tree Data-Structures with Measurements. Verification, Model Checking, and Abstract Interpretation (318-341). DOI: 10.1007/978-3-030-11245-5_15. Online publication date: 11-Jan-2019
    • (2017) Miniphases: compilation using modular and efficient tree transformations. ACM SIGPLAN Notices 52:6 (201-216). DOI: 10.1145/3140587.3062346. Online publication date: 14-Jun-2017
    • (2017) Locality Transformations for Nested Recursive Iteration Spaces. ACM SIGARCH Computer Architecture News 45:1 (281-295). DOI: 10.1145/3093337.3037720. Online publication date: 4-Apr-2017
