research-article

HOTL: a higher order theory of locality

Published: 16 March 2013

Abstract

There are many locality metrics: for example, miss ratio to test performance, data footprint to manage cache sharing, and reuse distance to analyze and optimize a program. It is unclear how the different metrics are related, whether one subsumes another, and what combination may represent locality completely.
This paper first derives a set of formulas to convert between five locality metrics and gives the condition for correctness. The transformation is analogous to differentiation and integration used to convert between higher order polynomials. As a result, these metrics can be assigned an order and organized into a hierarchy.
Using the new theory, the paper then develops two techniques: one measures locality in real time without special hardware support, and the other predicts multicore cache interference without parallel testing. The paper evaluates both techniques on sequential programs, parallel programs, and parallel mixes of sequential programs.
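
The abstract describes the conversion between metrics as analogous to differentiation and integration but does not give the formulas. The sketch below is only an illustration under stated assumptions, not the paper's implementation: it assumes that footprint means the average number of distinct data items over all windows of a given length, and that a miss ratio can be approximated by the finite difference of that footprint curve. The function names are hypothetical.

```python
from typing import List, Sequence


def average_footprint(trace: Sequence[str], w: int) -> float:
    """Mean number of distinct items over all windows of length w (assumed definition)."""
    n = len(trace)
    if not 1 <= w <= n:
        raise ValueError("window length must be in 1..len(trace)")
    windows = n - w + 1
    total = sum(len(set(trace[i:i + w])) for i in range(windows))
    return total / windows


def footprint_curve(trace: Sequence[str]) -> List[float]:
    """Return fp[w] for w = 0..len(trace), with fp[0] = 0 by convention."""
    return [0.0] + [average_footprint(trace, w) for w in range(1, len(trace) + 1)]


def miss_ratio_by_difference(fp: List[float]) -> List[float]:
    """Discrete 'derivative' of the footprint curve: fp[w + 1] - fp[w] (assumed conversion)."""
    return [fp[w + 1] - fp[w] for w in range(len(fp) - 1)]


if __name__ == "__main__":
    fp = footprint_curve("abcabc")  # each character stands for one data item
    print(fp)
    print(miss_ratio_by_difference(fp))
```

For the cyclic trace `abcabc`, the footprint curve grows by one item per access until it reaches three items and then stays flat, so the difference falls to zero. That is consistent with the intuition that a cache large enough to hold all three items stops missing after warm-up. The brute-force enumeration of windows is quadratic or worse and is meant only to make the definitions concrete.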

Published In

ACM SIGPLAN Notices, Volume 48, Issue 4
ASPLOS '13
April 2013
540 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2499368

ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
March 2013
574 pages
ISBN: 9781450318709
DOI: 10.1145/2451116

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 March 2013
Published in SIGPLAN Volume 48, Issue 4

Author Tags

  1. locality metrics
  2. locality modeling

Qualifiers

  • Research-article

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 103
  • Downloads (last 6 weeks): 10
Reflects downloads up to 12 Feb 2025

Cited By

  • (2023) FLORIA: A Fast and Featherlight Approach for Predicting Cache Performance. Proceedings of the 37th International Conference on Supercomputing, pages 25-36. DOI: 10.1145/3577193.3593740. Online publication date: 21-Jun-2023
  • (2022) CASHT: Contention Analysis in Shared Hierarchies with Thefts. ACM Transactions on Architecture and Code Optimization, 19:1, pages 1-27. DOI: 10.1145/3494538. Online publication date: 23-Jan-2022
  • (2022) ReuseTracker: Fast Yet Accurate Multicore Reuse Distance Analyzer. ACM Transactions on Architecture and Code Optimization, 19:1, pages 1-25. DOI: 10.1145/3484199. Online publication date: 31-Mar-2022
  • (2022) Data layout optimization based on the spatio-temporal model of field access. 2022 5th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE), pages 238-244. DOI: 10.1109/AEMCSE55572.2022.00055. Online publication date: Apr-2022
  • (2017) Security Analysis of Cache Replacement Policies. Proceedings of the 6th International Conference on Principles of Security and Trust - Volume 10204, pages 189-209. DOI: 10.1007/978-3-662-54455-6_9. Online publication date: 22-Apr-2017
  • (2014) Performance Metrics and Models for Shared Cache. Journal of Computer Science and Technology, 29:4, pages 692-712. DOI: 10.1007/s11390-014-1460-7. Online publication date: 4-Jul-2014
  • (2025) Scalpel: High Performance Contention-Aware Task Co-Scheduling for Shared Cache Hierarchy. IEEE Transactions on Computers, 74:2, pages 678-690. DOI: 10.1109/TC.2024.3500381. Online publication date: Feb-2025
  • (2024) Parallel Loop Locality Analysis for Symbolic Thread Counts. Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, pages 219-232. DOI: 10.1145/3656019.3676948. Online publication date: 14-Oct-2024
  • (2023) MicroProf: Code-level Attribution of Unnecessary Data Transfer in Microservice Applications. ACM Transactions on Architecture and Code Optimization. DOI: 10.1145/3622787. Online publication date: 8-Sep-2023
  • (2023) Hardware Counter-based Performance Analysis of ANUGA Flood Simulator. Proceedings of the 2023 Fifteenth International Conference on Contemporary Computing, pages 412-418. DOI: 10.1145/3607947.3608041. Online publication date: 3-Aug-2023
