
Stealth prefetching

Authors: Jason F. Cantin, Mikko H. Lipasti, James E. Smith

ACM SIGPLAN Notices, Volume 41, Issue 11

Pages 274 - 282

https://doi.org/10.1145/1168918.1168892

Published: 20 October 2006

Abstract

Prefetching in shared-memory multiprocessor systems is an increasingly difficult problem. As system designs grow to incorporate larger numbers of faster processors, memory latency and interconnect traffic increase. While aggressive prefetching techniques can mitigate the increasing memory latency, they can harm performance by wasting precious interconnect bandwidth and by prematurely accessing shared data, causing state downgrades at remote nodes that force later upgrades.

This paper investigates Stealth Prefetching, a new technique that uses information from Coarse-Grain Coherence Tracking (CGCT) to prefetch data aggressively, stealthily, and efficiently in a broadcast-based shared-memory multiprocessor system. Stealth Prefetching uses CGCT to identify regions of memory that are not shared by other processors, aggressively fetches the lines in these regions from DRAM in open-page mode, and moves them close to the processor in anticipation of future references. Our analysis with commercial, scientific, and multiprogrammed workloads shows that Stealth Prefetching provides an average speedup of 20% over an aggressive baseline system with conventional prefetching.
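The mechanism summarized above can be illustrated with a toy, software-only model. This is a sketch under stated assumptions, not the authors' hardware design: the 64-byte line size, the 1 KB region size, the two-miss trigger threshold, the `remote_regions` set standing in for broadcast snoop responses, the prefetch buffer, and all class and method names are hypothetical. It shows three ideas from the abstract: a coarse-grain region entry records whether other processors may cache lines in the region; requests to a region known to be non-shared skip the broadcast; and once a non-shared region looks active, its remaining lines are fetched in one batch (standing in for an open-page DRAM burst) and held near the processor.

```python
from dataclasses import dataclass, field

LINE_SIZE = 64          # bytes per cache line (assumed)
REGION_SIZE = 1024      # bytes per coarse-grain region (assumed)
LINES_PER_REGION = REGION_SIZE // LINE_SIZE
TRIGGER_MISSES = 2      # hypothetical: demand misses to a region before prefetching it


@dataclass
class RegionEntry:
    shared: bool                                # another processor may cache lines here
    fetched: set = field(default_factory=set)   # line indices already brought to this node
    misses: int = 0
    prefetched: bool = False


class StealthPrefetcher:
    """Toy model of region prefetching driven by coarse-grain coherence state."""

    def __init__(self, remote_regions):
        # Stand-in for broadcast snoop responses: regions other processors cache.
        self.remote_regions = set(remote_regions)
        self.regions = {}             # region base address -> RegionEntry
        self.prefetch_buffer = set()  # line addresses held close to the processor
        self.broadcasts = 0
        self.dram_bursts = []         # each burst: lines read from one open DRAM page

    @staticmethod
    def split(addr):
        """Return (region base address, line index within the region)."""
        region = addr - addr % REGION_SIZE
        return region, (addr % REGION_SIZE) // LINE_SIZE

    def demand_miss(self, addr):
        region, idx = self.split(addr)
        line = region + idx * LINE_SIZE
        if line in self.prefetch_buffer:
            self.prefetch_buffer.discard(line)
            return "prefetch-hit"
        entry = self.regions.get(region)
        # Only unknown or shared regions need a broadcast; a region already
        # known to be non-shared is requested straight from memory.
        if entry is None:
            self.broadcasts += 1
            entry = RegionEntry(shared=region in self.remote_regions)
            self.regions[region] = entry
        elif entry.shared:
            self.broadcasts += 1
        entry.fetched.add(idx)
        entry.misses += 1
        if not entry.shared and not entry.prefetched and entry.misses >= TRIGGER_MISSES:
            self._stealth_prefetch(region, entry)
        return "miss"

    def _stealth_prefetch(self, region, entry):
        # Fetch every remaining line of the non-shared region in one batch,
        # modelling a burst of reads from an already-open DRAM page.
        lines = [region + i * LINE_SIZE
                 for i in range(LINES_PER_REGION) if i not in entry.fetched]
        entry.fetched.update(range(LINES_PER_REGION))
        entry.prefetched = True
        self.dram_bursts.append(lines)
        self.prefetch_buffer.update(lines)

    def external_request(self, addr):
        # Another processor touches the region: mark it shared and drop the
        # stealth-prefetched lines it may now modify (assumed policy).
        region, _ = self.split(addr)
        self.remote_regions.add(region)
        if region in self.regions:
            self.regions[region].shared = True
        self.prefetch_buffer = {line for line in self.prefetch_buffer
                                if self.split(line)[0] != region}


if __name__ == "__main__":
    p = StealthPrefetcher(remote_regions=[0x1000])
    for a in (0x0, 0x40, 0x80, 0x1000, 0x1040):
        print(hex(a), p.demand_miss(a))
    print("broadcasts:", p.broadcasts, "DRAM bursts:", len(p.dram_bursts))
```

In this model, the first miss to region `0x0` pays one broadcast, the second goes straight to memory and triggers a 14-line batch, and the third is served from the prefetch buffer; region `0x1000` is shared, so every miss there broadcasts and nothing is prefetched. Dropping buffered lines when another processor touches a region is one simple way to keep the toy model coherent; how the paper handles that case is not described in this abstract.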

Index Terms

Stealth prefetching
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multiple instruction, multiple data

Information

Published In

ACM SIGPLAN Notices Volume 41, Issue 11

Proceedings of the 2006 ASPLOS Conference

November 2006

425 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/1168918

ASPLOS XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
October 2006
440 pages
ISBN:1595934510
DOI:10.1145/1168857
General Chair: John Paul Shen, Intel Corp.
Program Chair: Margaret R. Martonosi, Princeton University

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 October 2006

Published in SIGPLAN Volume 41, Issue 11

Qualifiers

Article

Cited By

Eris F, Louis M, Eris K, Abellán J, Joshi A (2022). Puppeteer: A Random Forest Based Manager for Hardware Prefetchers Across the Memory Hierarchy. ACM Transactions on Architecture and Code Optimization 20:1, pp. 1-25. DOI: 10.1145/3570304. Online publication date: 31-Oct-2022.
https://dl.acm.org/doi/10.1145/3570304

Grannaes M, Jahre M, Natvig L (2011). Exploring the prefetcher/memory controller design space. Proceedings of the 24th international conference on Architecture of computing systems, pp. 135-146. DOI: 10.5555/1966221.1966237. Online publication date: 24-Feb-2011.
https://dl.acm.org/doi/10.5555/1966221.1966237

Grannaes M, Jahre M, Natvig L (2011). Exploring the Prefetcher/Memory Controller Design Space: An Opportunistic Prefetch Scheduling Strategy. Architecture of Computing Systems - ARCS 2011, pp. 135-146. DOI: 10.1007/978-3-642-19137-4_12. Online publication date: 2011.
https://doi.org/10.1007/978-3-642-19137-4_12

Grannaes M, Jahre M, Natvig L (2008). Low-cost open-page prefetch scheduling in chip multiprocessors. 2008 IEEE International Conference on Computer Design, pp. 390-396. DOI: 10.1109/ICCD.2008.4751890. Online publication date: Oct-2008.
https://doi.org/10.1109/ICCD.2008.4751890

Jiang S, Ci Y, Yang Q, Li M (2021). Matryoshka: A Coalesced Delta Sequence Prefetcher. Proceedings of the 50th International Conference on Parallel Processing, pp. 1-11. DOI: 10.1145/3472456.3473510. Online publication date: 9-Aug-2021.
https://dl.acm.org/doi/10.1145/3472456.3473510

Cebrian J, Kaxiras S, Ros A (2020). Boosting Store Buffer Efficiency with Store-Prefetch Bursts. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 568-580. DOI: 10.1109/MICRO50266.2020.00054. Online publication date: Oct-2020.
https://doi.org/10.1109/MICRO50266.2020.00054

Bakhshalipour M, Tabaeiaghdaei S, Lotfi-Kamran P, Sarbazi-Azad H (2019). Evaluation of Hardware Data Prefetchers on Server Processors. ACM Computing Surveys 52:3, pp. 1-29. DOI: 10.1145/3312740. Online publication date: 18-Jun-2019.
https://dl.acm.org/doi/10.1145/3312740

Bakhshalipour M, Shakerinava M, Lotfi-Kamran P, Sarbazi-Azad H (2019). Bingo Spatial Data Prefetcher. 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 399-411. DOI: 10.1109/HPCA.2019.00053. Online publication date: Feb-2019.
https://doi.org/10.1109/HPCA.2019.00053

Kondguli S, Huang M (2018). Division of labor. Proceedings of the 45th Annual International Symposium on Computer Architecture, pp. 83-95. DOI: 10.1109/ISCA.2018.00018. Online publication date: 2-Jun-2018.
https://dl.acm.org/doi/10.1109/ISCA.2018.00018

Bakhshalipour M, Lotfi-Kamran P, Sarbazi-Azad H (2018). Domino Temporal Data Prefetcher. 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 131-142. DOI: 10.1109/HPCA.2018.00021. Online publication date: Feb-2018.
https://doi.org/10.1109/HPCA.2018.00021