Compiler-directed page coloring for multiprocessors

Published: 01 September 1996

    Abstract

    This paper presents a new technique, compiler-directed page coloring, that eliminates conflict misses in multiprocessor applications. It enables applications to make better use of the increased aggregate cache size available in a multiprocessor. The technique uses the compiler's knowledge of the access patterns of parallelized applications to direct the operating system's virtual memory page mapping strategy. We demonstrate that it can lead to significant performance improvements over two commonly used page mapping strategies on machines with either direct-mapped or two-way set-associative caches. We also show that it is complementary to latency-hiding techniques such as prefetching.

    We implemented compiler-directed page coloring in the SUIF parallelizing compiler and on two commercial operating systems. We applied the technique to the SPEC95fp benchmark suite, a representative set of numeric programs. We used the SimOS machine simulator to analyze the applications and isolate their performance bottlenecks. We also validated these results on a real machine, an eight-processor 350MHz Digital AlphaServer. Compiler-directed page coloring leads to significant performance improvements for several applications. Overall, our technique improves the SPEC95fp rating for eight processors by 8% over Digital UNIX's page mapping policy and by 20% over page coloring, a standard page mapping policy. The SUIF compiler achieves a SPEC95fp ratio of 57.4, the highest ratio to date.
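
To make the notion of a page's color concrete, the sketch below computes colors for a physically indexed cache and shows why two pages of the same color compete for the same cache sets. It illustrates standard page-coloring arithmetic only and is not the algorithm from the paper; the cache size, page size, function names, and the simple "spread" assignment are all assumptions chosen for the example.

```python
def num_colors(cache_bytes: int, page_bytes: int, associativity: int = 1) -> int:
    """Number of distinct page colors in a physically indexed cache.

    Pages with the same color map to the same group of cache sets.
    """
    return cache_bytes // (page_bytes * associativity)


def page_color(physical_page_number: int, colors: int) -> int:
    """Color of a physical page: its page number modulo the color count."""
    return physical_page_number % colors


def spread_colors(num_pages: int, colors: int, start: int = 0) -> list:
    """Hypothetical hint: give pages that are used together consecutive
    colors, so they collide only if there are more pages than colors."""
    return [(start + i) % colors for i in range(num_pages)]


# Assumed configuration (not taken from the paper): 1 MiB cache, 8 KiB pages.
CACHE = 1 << 20
PAGE = 8 << 10

direct_mapped = num_colors(CACHE, PAGE, associativity=1)  # 128 colors
two_way = num_colors(CACHE, PAGE, associativity=2)        # 64 colors

# Physical pages 2 and 130 share a color in the direct-mapped cache,
# so data at the same page offset in each maps to the same cache line.
same = page_color(2, direct_mapped) == page_color(130, direct_mapped)

# Four pages given spread colors land on four different colors,
# wrapping around at the color count: 126, 127, 0, 1.
hints = spread_colors(4, direct_mapped, start=126)
```

With two-way set associativity the same cache has half as many colors, because each set holds two lines. A mapping policy that knows which pages a processor touches together can use this arithmetic to keep them on different colors.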

    Published In

    ACM SIGOPS Operating Systems Review, Volume 30, Issue 5
    Dec. 1996
    273 pages
    ISSN: 0163-5980
    DOI: 10.1145/248208

    ASPLOS VII: Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
    October 1996
    290 pages
    ISBN: 0897917677
    DOI: 10.1145/237090
    Publisher

    Association for Computing Machinery

    New York, NY, United States
