DOI: 10.1145/3579371.3589087
research-article

Imprecise Store Exceptions

Published: 17 June 2023
Abstract

    Precise exceptions are a cornerstone of modern computing: they give programmers the abstraction of sequential instruction execution while still accommodating microarchitectural optimizations. However, the growing compute capabilities placed in deep memory hierarchies (e.g., software event handlers, programmable accelerators) introduce long exception-detection latencies, so stores that have already retired but are still awaiting completion can no longer be given precise exception semantics. Unfortunately, the well-known post-retirement speculation mechanisms that tolerate these latencies require excessively large microarchitectural structures per core. This paper rethinks the role of the architecture and the OS in supporting precise exceptions. We show that, instead of forcing the architecture to support precise exceptions transparently in all cases, it is preferable to use hardware-software co-design to handle imprecise store exceptions efficiently. We develop a formalism to prove that this approach complies with the underlying memory consistency models, and we design a RISC-V prototype that passes all litmus tests, demonstrating its efficacy.
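    To make the problem concrete, the toy model below may help. It is a minimal sketch under stated assumptions, not the authors' design: a single core with a FIFO store buffer, a memory that faults on unmapped pages, and a hypothetical OS handler (`os_handler`) that maps the missing page and replays logged stores. All class and function names are illustrative. Because the faulting store has already retired, the core cannot roll back to it; instead, the faulting store and every younger buffered store are moved to a software-visible log, and software completes them in program order.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Store:
    addr: int
    value: int


class PageFault(Exception):
    """Raised by the toy memory when a store targets an unmapped page."""


class Memory:
    PAGE = 4096  # assumed page size for the toy model

    def __init__(self, mapped_pages):
        self.mapped = set(mapped_pages)  # page numbers currently mapped
        self.data = {}                   # addr -> value

    def write(self, s: Store):
        if s.addr // self.PAGE not in self.mapped:
            raise PageFault(s.addr)
        self.data[s.addr] = s.value


class Core:
    def __init__(self, mem: Memory):
        self.mem = mem
        self.store_buffer = deque()  # retired stores awaiting completion
        self.fault_log = []          # hypothetical software-visible log

    def retire_store(self, addr, value):
        # The store retires immediately; it completes later, in drain().
        self.store_buffer.append(Store(addr, value))

    def drain(self):
        """Complete buffered stores in FIFO order. On a fault, move the
        faulting store and all younger buffered stores to the log and
        return True (an imprecise store exception was raised)."""
        while self.store_buffer:
            s = self.store_buffer[0]
            try:
                self.mem.write(s)
            except PageFault:
                self.fault_log.extend(self.store_buffer)
                self.store_buffer.clear()
                return True
            self.store_buffer.popleft()
        return False


def os_handler(core: Core):
    """Hypothetical OS handler: resolve each fault by mapping the page,
    then replay the logged stores in their original program order."""
    for s in core.fault_log:
        core.mem.mapped.add(s.addr // Memory.PAGE)
    for s in core.fault_log:
        core.mem.write(s)
    core.fault_log.clear()


# Usage: the second store faults only after it has retired.
mem = Memory(mapped_pages={0})
core = Core(mem)
core.retire_store(0x10, 1)    # page 0: mapped
core.retire_store(0x2000, 2)  # page 2: unmapped, faults during drain
core.retire_store(0x20, 3)    # younger store to mapped page 0
if core.drain():
    os_handler(core)
print(mem.data)  # → {16: 1, 8192: 2, 32: 3}
```

    Logging the younger store to 0x20 as well, rather than letting it drain past the faulting store, keeps stores visible in program order, as a TSO-style FIFO store buffer requires. A weaker memory model may permit more reordering; this sketch does not explore that.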



      Published In

      ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture
      June 2023
      1225 pages
      ISBN: 9798400700958
      DOI: 10.1145/3579371
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. memory hierarchies
      2. exception handling
      3. memory consistency

      Funding Sources

      • Swiss National Science Foundation
      • Qualcomm Innovation Fellowship
      • Intel research donation
      • National Research Foundation of Korea (NRF)

      Conference

      ISCA '23

      Acceptance Rates

      Overall Acceptance Rate 543 of 3,203 submissions, 17%

      Article Metrics (reflects downloads up to 09 Aug 2024)

      • Total Citations: 0
      • Total Downloads: 553
      • Downloads (Last 12 months): 373
      • Downloads (Last 6 weeks): 10
