Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/301618.301633acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
Article
Free access

Cache-conscious structure layout

Published: 01 May 1999 Publication History

Abstract

Hardware trends have produced an increasing disparity between processor speeds and memory access times. While a variety of techniques for tolerating or reducing memory latency have been proposed, these are rarely successful for pointer-manipulating programs.This paper explores a complementary approach that attacks the source (poor reference locality) of the problem rather than its manifestation (memory latency). It demonstrates that careful data organization and layout provides an essential mechanism to improve the cache locality of pointer-manipulating programs and consequently, their performance. It explores two placement techniques---clustering and coloring---that improve cache performance by increasing a pointer structure's spatial and temporal locality, and by reducing cache-conflicts.To reduce the cost of applying these techniques, this paper discusses two strategies---cache-conscious reorganization and cache-conscious allocation---and describes two semi-automatic tools---ccmorph and ccmalloc---that use these strategies to produce cache-conscious pointer structure layouts. ccmorph is a transparent tree reorganizer that utilizes topology information to cluster and color the structure. ccmalloc is a cache-conscious heap allocator that attempts to co-locate contemporaneously accessed data elements in the same physical cache block. Our evaluations, with microbenchmarks, several small benchmarks, and a couple of large real-world applications, demonstrate that the cache-conscious structure layouts produced by ccmorph and ccmalloc offer large performance benefits---in most cases, significantly outperforming state-of-the-art prefetching.

References

[1]
A. Agarwal, M. Horowitz, and J. Hennessy. "An analytical cache model." A CM Transactions on Computer Systems, 7(2): 184--215, 1989.
[2]
A.V. Aho, P.J. Denning, and J. D. Ullman. "Principles of optimal page replacement." Journal of the ACM, 18(1):80- 93, 1971.
[3]
J. Banerjee, W. Kim, and J. F. Garza. "Clustering a DAG for CAD databases." IEEE Transactions on Software Engineering, 14(11): 1684--1699, 1988.
[4]
R. Bayer and C. McCreight. "Organization and maintenance of large ordered indexes." Acta Informatica, 1(3):173--189, 1972.
[5]
Veronique Benzaken and Claude Delobel. "Enhancing performance in a persistent object store: Clustering strategies in O2.' In Technical Report 50-90, Altair, Aug. 1990.
[6]
R.K. Brayton, G D. Hachtel, A. S. Vincentelli, F. Somenzi, A. Aziz, S. Cheng, S. Edwards, S. Khatri, Y. Kukimoto, A. Pardo, S. Qadeer, R. Ranjan, S. Sarwary, T.R. Shilpe, G Swamy, and T. Villa. "VIS: a system for verification and synthesis." In Proceedings of the Eight International Conference on Computer Aided Verification, July 1996.
[7]
Doug Burger, James R. Goodman, and Alain Kagi. "Memory bandwidth limitations of future microprocessors." In Proceedings of the 23rd Annual International Symposium on Computer Architecture, pages 78--89, May 1996.
[8]
Brad Calder, Chandra Krintz, Simmi John, and Todd Austin. "Cache-conscious data placement." In Proceedings of the Eight International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VIII), pages 139-149, Oct. 1998.
[9]
David Callahan, Ken Kennedy, and Allan Poterfield. "Software prefetching." in Proceedings of the Fourth International Conjkrence on Architectural Support jbr Programming Languages and Operating Systems (ASPLOS IV), pages 40--52, April 1991.
[10]
Steve Carr, Kathryn S. McKinley, and Chau-Wen Tseng. "Compiler optimizations for improving data locality." In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS V1), pages 252-262, Oct. 1994.
[11]
Trishul M. Chilimbi, and James R. Larus. "Using generational garbage collection to implement cache-conscious data placement." In Proceedings of the 1998 International Symposium on Memory Management, Oct. 1998.
[12]
Trishul M. Chilimbi, Bob Davidson, and James R. Lares. "Cache-conscious structure definition." In Proceedings of the A CM SIGPIMN'99 Conference on Programming Language Design and Implementation, May 1999.
[13]
Douglas Comer. "The ubiquitous B-tree." ACM Computing Surveys, 11(2):121-137, 1979.
[14]
R. Courts. "Improving locality of reference in a garbage-collecting memory management system." Communications of the ACM, 31(9):1128--1138, 1988.
[15]
P. Drew and R. King. "The performance and utility of the Cactis implementation algorithms." In Proceedings of the 16th VLDB Conj~rence, pages 135-147, 1990.
[16]
Dennis Gannon, William Jalby, and K. Gallivan. "Strategies for cache and local memory management by global program transformation." Journal of Parallel and Distributed Computing, 5:587--616, 1988.
[17]
N. Gloy, T. Blackwell, M. D. Smith, and B. Calder. "Procedure placement using temporal ordering information." In Proceedings of MICRO-30, Dee. 1997.
[18]
I. J. Haikala. "Cache hit ratios with geometric task switch intervals." In Proceedings of the l lth Annual International Symposium on Computer Architecture, pages 364-371, June 1984.
[19]
Mark D. Hill and Alan Jay Smith. "Evaluafng associativity in CPU caches." IEEE Transactions on Computers, C- 38(12): 1612-1630, December 1989.
[20]
David Kroft. "Lockup-free instruction fetch/prefetch cache organization." In The 8th Annual International Symposium on Computer Architecture, pages 81--87, May 1981.
[21]
M.S. Lam, P.R. Wilson, and T. G Moher. "Object type directed garbage collection to improve locality." In Proceedings of the International Workshop on Memory Management, pages 16-18, Sept. 1992.
[22]
Monica S. Lam, Edward E. Rothberg, and Michael E. Wolf. "The cache performance and optimizations of blocked algorithms.'' In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 63-74, Santa Clara, California, 1991.
[23]
Anthony LaMarca and Richard E. Ladner. "The influence of caches on the performance of heaps." ACMJournal of ExperimentalAlgorithmics, 1, 1996.
[24]
Anthony LaMarca and Richard E. Ladner. "The influence of caches on the performance of sorting." In Eight Annual A CM- SIAM Symposium on Discrete Algorithms, Jan. 1997.
[25]
James Laudon, Anoop Gupta, and Mark Horowitz. "Interleaving: A multithreading technique targeting multiprocessors and workstations." In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 308--318, San Jose, California, 1994.
[26]
Chi-Keung Luk and Todd C. Mowry. "Compiler-based prefetching for recursive data structures." In Proceedings of the Seventh international Conference on Architectural Support for Programming Languages and Operating b3,stems (,tSPLOS VIi), pages 222-233, Oct. 1996.
[27]
Scott McFarling. "Program optimization for instruction caches." In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, pages 183-191, 1989.
[28]
D. A. Moon. "Garbage collection in a large LISP system." In ConJbrence Record of the i984 Symposium on LISP and Functional Programming, pages 235--246, Aug. 1984.
[29]
Todd C. Mowry, Monica S. Lain, and Anoop Gupta. "Design and evaluation of a compiler algorithm for prefetching." In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS IO, pages 62-73, October 1992.
[30]
V. S. Pai, P. Ranganathan, and S. V. Adve. "RSIM reference manual version 1.0." In Technical Report 9705, Dept. of Electrical and Computer Engineering, Rice University, Aug. 1997.
[31]
V. $. Pai, P. Ranganathan, S.V. Adve, and T. Harton. "An evaluation of memory consistency models for shared-memory systems with ILP processors." In Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VII), pages 12-23, Oct. 1996.
[32]
David Patterson, Thnm~~ And~renn Monl /~rdxvoll Rieh~rrl Fromm, Kimberly Keaton, Christoforos Kazyrakis, Randi Thomas, and Katherine Yellick. "A case for intelligent RAM." In IEEE Micro, pages 34--44, Apr 1997.
[33]
Sharon E. Perl and Richard L. Sites. "Studies of Windows NT pcHormancc using execution traces." in Symposium on Operating Systems Design and Implementation, Oct. 1996.
[34]
Karl Pettis and Robert C. Hansen. "Profile guided code positioning.'' SIGPLAN Notices, 25(6):16-27, June 1990. Proceedings of the A CM SIGPLAN'90 Conference on ProgrammingLanguage Design and Implementation.
[35]
G. S. Rao. ;;Performance analysis of cache memories." Journal of the ACM, 25(3):378-395, 1978.
[36]
G.S. Rao 'Performance anayyisgb iugbiu iugbiu ivbiu ing dynamic data structures on distributed memory machines." A CM Transactions on Programming Languages and Systems, 17(2), 1995.
[37]
M. Rosenblm, E. Bugnion, S.A. Herrod, E. Witchel, and A. Gupta. "The impact of architectural trends on operating system performance." In Proceedings of the 15th ACM Symposium on Operating System Pnnciples (SOSP), pages 285- 298, Dec. 1995.
[38]
A. Roth, A. Moshovos, and GS. Sohi. "Dependence based prefetching for linked data structures." In Proceedings of the Eight International ConJkrence on Architectural Support for ~rvglutttttttttg l.~tt~uu~co llrtu LtF~ruttltg VIII), pages 115--126, Oct. 1998.
[39]
J. H. Saltzer. "A simple linear model of demand paging performance." Communications of the ACM, 17(4):181-186, 1974.
[40]
M. L. Seidl, and B. G Zorn "Segregating heap objects by reference behavior and lifetime." In Proceedings of the Eight gramming Languages and Operating Systems (ASPLOS VIII), pages 12-23, Oct. 1998.
[41]
JaswinderPal Singh, Harold S. Stone, and Dominique F. Thiebaut. "A model of workloads and its use in miss-rate prediction for fully associative caches." IEI~I~ Transactions on Computers, 41 (7):811-825, 1992.
[42]
A.J. Smith "A Compartive study pf giug iugiugvuiy}fg mapping algorithms and their use for cache and main memory.'' IEEE Trans. on Software En_~neering, 4(2):121-130, 1978.
[43]
Alan J. Smith. "Cache memories." ACM Computing Surveys, 14(3):473-530, 1982.
[44]
Burton J. Smith. "Architecture and applications of the HEP Multiprocerssor oihoi og9u iugiuguigiyfc yufui}yg iugh cessing IV, pages 241-248, 1981.
[45]
J.R. Spirn, editor. Program Behavior: Models and Measurements. Operating and Programming System Series, Elsevier, New York, 1977.
[46]
J. W. Stamos. "Static grouping of small objects to enhance performance of a paged virtual memory." ACM Transactions on Programming Languages and Systems, 2(2):155-180, 1984.
[47]
Dan N. Truong oilkhoih oihoi iohoi iouhbo oih oihoihiooihi ing cache behavior of dynamically allocated data structures." In International Conference on Parallel Architectures and Compilation Techniques, Oct. 1998.
[48]
M. N. Tsangaris and J. Naughton. "On the performance of nh~r,t oh,etoriBg tot, hni.,._.~,o." ._t- Proceedings A CM SIGMOD Intl. Conf. on Management of Data, pages 144-153, June 1992.
[49]
G J. Ward. "The RADIANCE lighting simulation and rendering system." in Proceedings of SIGGRAPH '94, July 1994.
[50]
M. V. Wilkes. "Siave memories and dynamic storage allocation.'' In IEEE Trans. on Electronic Computers, pages 270- 9'71 April 1965.
[51]
Paul R. Wilson, Michael S. Lain, and Thomas G Moher. "Effective "static-graph" reorganization to improve locality in garbage-collected systems." SiGPLAN Notices, 26(6):177- 191, June 1991. Proceedings of the A CM $IGPIMN'91 Conference on Programming Language Design and implementation.
[52]
Michael E. Wolf and Monics bhouihoihoihihihp oihphj'p poi mizing algorithm." SIGPLAN Notices, 26(6):30--44, June 1991. Proceedings of the ACM SIGPLAN'91 Conference on Programming Language Design and Implementation.

Cited By

View all
  • (2024)Representing Data Collections in an SSA FormProceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO57630.2024.10444817(308-321)Online publication date: 2-Mar-2024
  • (2023)ALT: Breaking the Wall between Data Layout and Loop Optimizations for Deep Learning CompilationProceedings of the Eighteenth European Conference on Computer Systems10.1145/3552326.3587440(199-214)Online publication date: 8-May-2023
  • (2022)Software-defined address mapping: a case on 3D memoryProceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3503222.3507774(70-83)Online publication date: 28-Feb-2022
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
PLDI '99: Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
May 1999
304 pages
ISBN:1581130945
DOI:10.1145/301618
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 May 1999

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. cache-conscious allocation
  2. cache-conscious data placement
  3. cache-conscious reorganization
  4. clustering
  5. coloring

Qualifiers

  • Article

Conference

PLDI99

Acceptance Rates

PLDI '99 Paper Acceptance Rate 26 of 130 submissions, 20%;
Overall Acceptance Rate 406 of 2,067 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)252
  • Downloads (Last 6 weeks)39
Reflects downloads up to 12 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Representing Data Collections in an SSA FormProceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO57630.2024.10444817(308-321)Online publication date: 2-Mar-2024
  • (2023)ALT: Breaking the Wall between Data Layout and Loop Optimizations for Deep Learning CompilationProceedings of the Eighteenth European Conference on Computer Systems10.1145/3552326.3587440(199-214)Online publication date: 8-May-2023
  • (2022)Software-defined address mapping: a case on 3D memoryProceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3503222.3507774(70-83)Online publication date: 28-Feb-2022
  • (2022)Data layout optimization based on the spatio-temporal model of field access2022 5th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE)10.1109/AEMCSE55572.2022.00055(238-244)Online publication date: Apr-2022
  • (2021)Offline and Online Algorithms for SSD ManagementProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/34910455:3(1-28)Online publication date: 15-Dec-2021
  • (2021)Accelerated Seeding for Genome Sequence Alignment with Enumerated Radix Trees2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA52012.2021.00038(388-401)Online publication date: Jun-2021
  • (2020)Effective and efficient community search over large heterogeneous information networksProceedings of the VLDB Endowment10.14778/3380750.338075613:6(854-867)Online publication date: 1-Feb-2020
  • (2018)DIFFProceedings of the VLDB Endowment10.14778/3297753.329776112:4(419-432)Online publication date: 1-Dec-2018
  • (2018)Cimple: instruction and memory level parallelismProceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques10.1145/3243176.3243185(1-16)Online publication date: 1-Nov-2018
  • (2018)Dark packets and the end of network scalingProceedings of the 2018 Symposium on Architectures for Networking and Communications Systems10.1145/3230718.3230727(1-14)Online publication date: 23-Jul-2018
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media