DOI: 10.1145/2989081.2989092
research-article

Software Assisted Hardware Cache Coherence for Heterogeneous Processors

Published: 03 October 2016

Abstract

Current trends suggest that future computing platforms will be increasingly heterogeneous. While these heterogeneous processors physically integrate disparate computing elements such as CPUs and GPUs on a single chip, their programmability depends critically on efficient support for cache coherence and shared virtual memory across tightly integrated CPUs and GPUs. However, throughput-oriented GPUs easily overwhelm the existing hardware coherence mechanisms that have long kept the cache hierarchies of multi-core CPUs coherent.
This paper proposes a novel solution called Software Assisted Hardware Coherence (SAHC) to scale cache coherence to future heterogeneous processors. We observe that the system software (operating system and runtime) often has semantic knowledge about how data is shared across the CPU and the GPU. This high-level knowledge can be used to provide cache coherence efficiently across throughput-oriented GPUs and latency-sensitive CPUs in a heterogeneous processor. SAHC is thus a hybrid software-hardware mechanism that judiciously uses hardware coherence only when needed, while relying on software's knowledge to filter out most of the unnecessary coherence traffic. Our evaluation suggests that SAHC can often eliminate up to 98-100% of the hardware coherence lookups, resulting in up to a 49% reduction in runtime.
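
The filtering idea in the abstract can be illustrated with a toy model. The sketch below is not the paper's design: the page-granularity hints, the `Sharing` categories, the `CoherenceFilter` class, and the conservative fallback for unknown or contradicted hints are all assumptions made for illustration. It only shows how software-supplied sharing knowledge could let private accesses skip a hardware coherence lookup.

```python
from enum import Enum

PAGE_SIZE = 4096  # assumed hint granularity; the abstract does not specify one


class Sharing(Enum):
    """Hypothetical sharing hints supplied by the OS or runtime."""
    CPU_ONLY = "cpu_only"
    GPU_ONLY = "gpu_only"
    SHARED = "shared"


class CoherenceFilter:
    """Toy model: count accesses that still need a hardware coherence lookup."""

    def __init__(self):
        self.page_hint = {}   # page number -> Sharing
        self.hw_lookups = 0   # accesses forwarded to hardware coherence
        self.filtered = 0     # accesses that skipped the lookup

    def set_hint(self, addr, length, hint):
        """Record a software hint for every page overlapping [addr, addr+length)."""
        first = addr // PAGE_SIZE
        last = (addr + length - 1) // PAGE_SIZE
        for page in range(first, last + 1):
            self.page_hint[page] = hint

    def access(self, device, addr):
        """Return True if this access needs a hardware coherence lookup."""
        # Pages without a hint are conservatively treated as shared.
        hint = self.page_hint.get(addr // PAGE_SIZE, Sharing.SHARED)
        private = (
            (hint is Sharing.CPU_ONLY and device == "cpu")
            or (hint is Sharing.GPU_ONLY and device == "gpu")
        )
        if private:
            self.filtered += 1
            return False
        # Shared data, or an access that contradicts the hint: use hardware.
        self.hw_lookups += 1
        return True

    def fraction_filtered(self):
        total = self.filtered + self.hw_lookups
        return self.filtered / total if total else 0.0


def demo():
    """Stream a GPU over two GPU-private pages, then touch shared/unknown data."""
    f = CoherenceFilter()
    f.set_hint(0, 2 * PAGE_SIZE, Sharing.GPU_ONLY)
    f.set_hint(2 * PAGE_SIZE, PAGE_SIZE, Sharing.SHARED)
    for addr in range(0, 2 * PAGE_SIZE, 64):   # one access per 64-byte line
        f.access("gpu", addr)
    f.access("gpu", 2 * PAGE_SIZE)             # shared page
    f.access("cpu", 0x100000)                  # no hint recorded
    return f
```

In this model a wrong or missing hint never skips a lookup it should have made; it only costs an extra one, which is one plausible way to keep such a scheme safe.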

Cited By

  • (2023) Turn-based Spatiotemporal Coherence for GPUs. ACM Transactions on Architecture and Code Optimization 20:3 (1-27). DOI: 10.1145/3593054. Online publication date: 19-Jul-2023
  • (2023) FinePack: Transparently Improving the Efficiency of Fine-Grained Transfers in Multi-GPU Systems. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (516-529). DOI: 10.1109/HPCA56546.2023.10070949. Online publication date: Feb-2023
  • (2021) Efficient multi-GPU shared memory via automatic optimization of fine-grained transfers. Proceedings of the 48th Annual International Symposium on Computer Architecture (139-152). DOI: 10.1109/ISCA52012.2021.00020. Online publication date: 14-Jun-2021

Published In

MEMSYS '16: Proceedings of the Second International Symposium on Memory Systems
October 2016
463 pages
ISBN: 9781450343053
DOI: 10.1145/2989081
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Cache coherence
  2. GPGPU
  3. Heterogeneous processor
  4. Operating system
  5. Virtual memory

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

MEMSYS '16
