Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3195970.3195984acmconferencesArticle/Chapter ViewAbstractPublication PagesdacConference Proceedingsconference-collections
research-article

Active forwarding: eliminate IOMMU address translation for accelerator-rich architectures

Published: 24 June 2018 Publication History

Abstract

Accelerator-rich architectures employ IOMMUs to support unified virtual address, but researches show that they fail to meet the performance and energy requirements of accelerators. Instead of optimizing the speed/energy of IOMMU address translation, this work tackles the issue from a new perspective, eliminating the need for translation with an active forwarding (AF) mechanism that forwards input data of accelerators directly from the CPU cache to the scratchpad memory of the accelerator. Results show that on average, AF can provide 8% performance improvement compared to the state-of-the-art mechanism, hostPageWalk, and reduce 22.1% accelerator power.

References

[1]
Darren Abramson, Jeff Jackson, Sridhar Muthrasanallur, Gil Neiger, Greg Regnier, Rajesh Sankaran, Ioannis Schoinas, Rich Uhlig, Balaji Vembu, and John Wiegert. 2006. Intel Virtualization Technology for Directed I/O. Intel technology journal 10, 3 (2006).
[2]
AMD. 2007. AMD I/O Virtualization Technology (IOMMU) Specification. (2007).
[3]
Thomas W. Barr, Alan L. Cox, and Scott Rixner. 2010. Translation Caching: Skip, Don'T Walk (the Page Table). In Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA '10). 48--59.
[4]
Arkaprava Basu, Mark D. Hill, and Michael M. Swift. 2012. Reducing Memory Reference Energy with Opportunistic Virtual Caching. In Proceedings of the 39th Annual International Symposium on Computer Architecture (ISCA '12). 297--308.
[5]
Ravi Bhargava, Benjamin Serebrin, Francesco Spadini, and Srilatha Manne. 2008. Accelerating Two-dimensional Page Walks for Virtualized Systems. In Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XIII). 26--35.
[6]
Himanshu Bhatnagar. 2007. Advanced ASIC Chip Synthesis: Using Synopsys® Design CompilerTM Physical CompilerTM and PrimeTime®. Springer Science & Business Media.
[7]
S. Chatterjee and S. Sen. 2000. Cache-efficient matrix transposition. In Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6. 195--205.
[8]
Tao Chen, Alexander Rucker, and G. Edward Suh. 2015. Execution Time Prediction for Energy-efficient Hardware Accelerators. In Proceedings of the 48th International Symposium on Microarchitecture (MICRO-48). 457--469.
[9]
Young-kyu Choi, Jason Cong, Zhenman Fang, Yuchen Hao, Glenn Reinman, and Peng Wei. 2016. A Quantitative Analysis on Microarchitectures of Modern CPU-FPGA Platforms. In Proceedings of the 53rd Annual Design Automation Conference (DAC '16). Article 109, 6 pages.
[10]
Jason Cong, Zhenman Fang, Michael Gill, and Glenn Reinman. 2015. PARADE: A Cycle-Accurate Full-System Simulation Platform for Accelerator-Rich Architectural Design and Exploration. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD '15). 380--387.
[11]
Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, and Doug Burger. 2011. Dark Silicon and the End of Multicore Scaling. In Proceedings of the 38th Annual International Symposium on Computer Architecture (ISCA '11). 365--376.
[12]
HSA Foundation. 2015. HSA Platform System Architecture. (2015).
[13]
Y. Hao, Z. Fang, G. Reinman, and J. Cong. 2017. Supporting Address Translation for Accelerator-Centric Architectures. In 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA). 37--48.
[14]
B. Reagen, R. Adolf, Y. S. Shao, G. Y. Wei, and D. Brooks. 2014. MachSuite: Benchmarks for accelerator design and customized architectures. In 2014 IEEE International Symposium on Workload Characterization (IISWC). 110--119.
[15]
Yakun Sophia Shao and David Brooks. 2015. Research Infrastructures for Hardware Accelerators. Morgan & Claypool. 99-- pages.
[16]
Yakun Sophia Shao, Brandon Reagen, Gu-Yeon Wei, and David Brooks. 2014. Aladdin: A Pre-RTL, Power-performance Accelerator Simulator Enabling Large Design Space Exploration of Customized Architectures. In Proceeding of the 41st Annual International Symposium on Computer Architecture (ISCA '14). 97--108.
[17]
Yakun Sophia Shao, Sam (Likun) Xi, Vijayalakshmi Srinivasan, Gu-Yeon Wei, and David Brooks. 2016. Co-designing Accelerators and SoC Interfaces Using Gem5-aladdin. In The 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-49). Article 48, 12 pages.
[18]
Avinash Sodani. 2011. Race to exascale: Opportunities and challenges. In Keynote at the 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[19]
Shyamkumar Thoziyoor, Naveen Muralimanohar, Jung Ho Ahn, and Norman P Jouppi. 2008. CACTI 5.1. Technical Report. HPL-2008-20, HP Labs.

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
DAC '18: Proceedings of the 55th Annual Design Automation Conference
June 2018
1089 pages
ISBN:9781450357005
DOI:10.1145/3195970
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 June 2018

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. accelerator-rich architecture
  2. heterogeneous computing
  3. virtual memory system

Qualifiers

  • Research-article

Conference

DAC '18
Sponsor:
DAC '18: The 55th Annual Design Automation Conference 2018
June 24 - 29, 2018
California, San Francisco

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

Upcoming Conference

DAC '25
62nd ACM/IEEE Design Automation Conference
June 22 - 26, 2025
San Francisco , CA , USA

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 282
    Total Downloads
  • Downloads (Last 12 months)17
  • Downloads (Last 6 weeks)6
Reflects downloads up to 23 Dec 2024

Other Metrics

Citations

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media