
Array regrouping and structure splitting using whole-program reference affinity

Published: 09 June 2004

Abstract

While the memory of most machines is organized as a hierarchy, program data are laid out in a uniform address space. This paper defines a model of reference affinity, which measures how closely a group of data elements are accessed together in a reference trace. It proves that the model gives a hierarchical partition of program data: at the top is the set of all data, with the weakest affinity; at the bottom is each individual data element, with the strongest affinity. Based on this theoretical model, the paper presents k-distance analysis, a practical test for the hierarchical affinity of source-level data. When used for array regrouping and structure splitting, k-distance analysis consistently outperforms data organizations given by the programmer, compiler analysis, frequency profiling, statistical clustering, and all other methods we have tried.
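The abstract does not spell out how k-distance analysis works. The Python sketch below is a hypothetical illustration of the general idea of grouping source-level arrays by their reuse behavior and then regrouping them; it is not the paper's algorithm. It assumes that each trace entry is an `(array_name, index)` pair, that reuse distance is the LRU stack distance (the number of distinct data accessed between two uses of the same element), that an array's reuse signature is its normalized histogram of reuse distances in power-of-two bins, and that two arrays belong together when their signatures differ by at most a threshold `k` in L1 distance. The function names, the binning, the threshold, and the single-linkage grouping are all illustrative choices.

```python
from collections import defaultdict

# Hypothetical illustration only: names, binning, and the L1 threshold are
# assumptions for this sketch, not the method defined in the paper.


def reuse_distances(trace):
    """Yield (datum, distance) for every access that reuses a datum.

    The distance is the LRU stack distance: the number of distinct other data
    touched since the previous access to the same datum. First accesses are
    skipped. The naive list-based stack is for clarity, not for long traces.
    """
    stack = []  # most recently used datum is at the end
    for datum in trace:
        if datum in stack:
            pos = stack.index(datum)
            dist = len(stack) - 1 - pos  # data used more recently than datum
            stack.pop(pos)
            yield datum, dist
        stack.append(datum)


def reuse_signatures(trace, num_bins=8):
    """Per-array normalized histogram of reuse distances.

    Bin 0 holds distance 0 and bin b holds distances in [2**(b-1), 2**b);
    the last bin absorbs everything larger.
    """
    counts = defaultdict(lambda: [0] * num_bins)
    for (array, _index), dist in reuse_distances(trace):
        counts[array][min(dist.bit_length(), num_bins - 1)] += 1
    sigs = {}
    for array, hist in counts.items():
        total = sum(hist)
        sigs[array] = [c / total for c in hist]
    return sigs


def group_by_signature(sigs, k):
    """Single-linkage grouping: arrays whose signatures are within L1
    distance k of each other (directly or through a chain) share a group."""
    names = sorted(sigs)
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if sum(abs(p - q) for p, q in zip(sigs[a], sigs[b])) <= k:
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for name in names:
        groups[find(name)].append(name)
    return sorted(groups.values())


def regrouped_offsets(groups, n):
    """Element offsets after regrouping, assuming every array has n elements:
    arrays in one group are interleaved element by element (like an array of
    structs), and groups are laid out one after another."""
    offsets, base = {}, 0
    for group in groups:
        for m, array in enumerate(group):
            for i in range(n):
                offsets[(array, i)] = base + i * len(group) + m
        base += n * len(group)
    return offsets


# Toy trace: one loop uses A[i] and B[i] together, another uses C[i] alone.
N = 4
trace = []
for _ in range(2):
    for i in range(N):
        trace += [("A", i), ("B", i)]
for _ in range(2):
    for i in range(N):
        trace.append(("C", i))

sigs = reuse_signatures(trace)
groups = group_by_signature(sigs, k=0.5)
offsets = regrouped_offsets(groups, N)
print(groups)  # [['A', 'B'], ['C']]
```

In this toy trace every reuse of `A` or `B` has distance 7, so their signatures match, while `C` is reused at distance 3 and stays in its own group; regrouping then places `A[i]` and `B[i]` side by side at offsets `2*i` and `2*i + 1`. Matching signatures are only a heuristic signal here: arrays swept by two separate loops of identical shape would also match without ever being used together, so a practical analysis needs a stronger test than this sketch.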



    Published In

    ACM SIGPLAN Notices, Volume 39, Issue 6
    PLDI '04
    May 2004
    299 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/996893

    ACM Conferences
    PLDI '04: Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation
    June 2004
    310 pages
    ISBN: 1581138075
    DOI: 10.1145/996841

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published in SIGPLAN Volume 39, Issue 6

    Author Tags

    1. array regrouping
    2. program locality
    3. program transformation
    4. reference affinity
    5. reuse signature
    6. structure splitting
    7. volume distance
