DOI: 10.1145/1178597.1178613
Article

A flexible data to L2 cache mapping approach for future multicore processors

Published: 22 October 2006

    Abstract

    This paper proposes and studies a distributed L2 cache management approach based on page-level mapping of data to cache slices in a future processor chip comprising many cores. L2 cache management is a crucial aspect of multicore processor design: it must overcome non-uniform cache access latency to sustain high program performance, and it must reduce on-chip network traffic and the related power consumption. Unlike previously studied "pure" hardware-based private and shared cache designs, the proposed OS-microarchitecture approach can mimic a wide spectrum of L2 caching policies without complex hardware support. Moreover, processors and cache slices can be isolated from each other without hardware modifications, which improves chip reliability characteristics. We discuss the key design issues and implementation strategies of the proposed approach and present an experimental result that shows its promise.
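The idea of page-level data to cache slice mapping can be illustrated with a small sketch. The abstract does not give the mapping function, so everything below is an assumption made for illustration: the home slice is taken to be the physical page number modulo the slice count, there is one slice per core, and the names `slice_of`, `PageAllocator`, `private_policy`, and `shared_policy` are hypothetical. Under those assumptions, the OS steers data to a slice simply by choosing which physical page backs a virtual page.

```python
from collections import defaultdict

PAGE_SIZE = 4096   # bytes per page (assumed, not from the paper)
NUM_SLICES = 16    # number of L2 slices, one per core (assumed)


def slice_of(phys_addr, num_slices=NUM_SLICES, page_size=PAGE_SIZE):
    """Home L2 slice of an address: physical page number mod slice count (assumed mapping)."""
    return (phys_addr // page_size) % num_slices


class PageAllocator:
    """Toy OS free-page pool, with free pages bucketed by their home slice."""

    def __init__(self, num_pages, num_slices=NUM_SLICES):
        self.num_slices = num_slices
        self.free = defaultdict(list)
        for ppn in range(num_pages):
            self.free[ppn % num_slices].append(ppn)

    def alloc(self, preferred_slice):
        """Return a free physical page homed at preferred_slice, else try the following slices."""
        for offset in range(self.num_slices):
            s = (preferred_slice + offset) % self.num_slices
            if self.free[s]:
                return self.free[s].pop()
        raise MemoryError("no free pages")


def private_policy(core_id, vpn):
    """Mimic a private L2: every page a core touches is homed at that core's own slice."""
    return core_id


def shared_policy(core_id, vpn):
    """Mimic a shared L2: pages are interleaved across all slices by virtual page number."""
    return vpn % NUM_SLICES
```

For example, `PageAllocator(64).alloc(private_policy(3, 10))` returns a physical page whose home slice is 3, while passing `shared_policy(3, 21)` instead selects slice 5. Policies between these two extremes would only need a different policy function, which is the sense in which a software-chosen mapping can mimic a spectrum of caching behaviors.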


    Information

    Published In

    MSPC '06: Proceedings of the 2006 workshop on Memory system performance and correctness
    October 2006
    114 pages
    ISBN:1595935789
    DOI:10.1145/1178597
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. non-uniform cache architecture (NUCA)
    2. page allocation

    Qualifiers

    • Article

    Conference

    MSPC '06

    Acceptance Rates

    Overall Acceptance Rate 6 of 20 submissions, 30%

    Cited By

    • (2019) Adaptive memory-side last-level GPU caching. Proceedings of the 46th International Symposium on Computer Architecture, pp. 411-423. DOI: 10.1145/3307650.3322235. Online publication date: 22-Jun-2019
    • (2015) Optimizing off-chip accesses in multicores. ACM SIGPLAN Notices, 50:6, pp. 131-142. DOI: 10.1145/2813885.2737989. Online publication date: 3-Jun-2015
    • (2015) Optimizing off-chip accesses in multicores. Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 131-142. DOI: 10.1145/2737924.2737989. Online publication date: 3-Jun-2015
    • (2013) Reshaping cache misses to improve row-buffer locality in multicore systems. Proceedings of the 22nd international conference on Parallel architectures and compilation techniques, pp. 235-244. DOI: 10.5555/2523721.2523754. Online publication date: 7-Oct-2013
    • (2013) Traffic steering between a low-latency unswitched TL ring and a high-throughput switched on-chip interconnect. Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, pp. 309-318. DOI: 10.1109/PACT.2013.6618820. Online publication date: Oct-2013
    • (2012) Dynamic reusability-based replication with network address mapping in CMPs. 17th Asia and South Pacific Design Automation Conference, pp. 487-492. DOI: 10.1109/ASPDAC.2012.6165002. Online publication date: Jan-2012
    • (2011) Neighborhood-aware data locality optimization for NoC-based multicores. Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization, pp. 191-200. DOI: 10.5555/2190025.2190066. Online publication date: 2-Apr-2011
    • (2011) Studying inter-core data reuse in multicores. ACM SIGMETRICS Performance Evaluation Review, 39:1, pp. 25-36. DOI: 10.1145/2007116.2007120. Online publication date: 7-Jun-2011
    • (2011) Studying inter-core data reuse in multicores. Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems, pp. 25-36. DOI: 10.1145/1993744.1993748. Online publication date: 7-Jun-2011
    • (2011) Experience with Improving Distributed Shared Cache Performance on Tilera's Tile Processor. IEEE Computer Architecture Letters, 10:2, pp. 45-48. DOI: 10.1109/L-CA.2011.18. Online publication date: 1-Jul-2011
