DOI: 10.1145/3297858.3304033
research-article
Public Access

Replica: A Wireless Manycore for Communication-Intensive and Approximate Data

Published: 04 April 2019

    Abstract

    Data access patterns that involve fine-grained sharing, multicasts, or reductions have proved hard to scale on shared-memory platforms. Recently, wireless on-chip communication has been proposed as a solution to this problem, but a previous architecture has used it only to speed up synchronization. An intriguing question is whether wireless communication can be widely effective for ordinary shared data. This paper presents Replica, a manycore that uses wireless communication for communication-intensive ordinary data. To deliver high performance, Replica supports an adaptive wireless protocol and selective message dropping. We describe the computational patterns that leverage wireless communication, programming techniques to restructure applications, and tools that help with automation. Our results show that wireless communication is effective for ordinary data. For 64 cores, Replica obtains a mean speed-up of 1.76x over a conventional machine. The mean speed-up reaches 1.89x if approximate-computing transformations are enabled. Average energy consumption falls by 34% (or 38% with approximate transformations), and the area increases only modestly.
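
To make the headline figures easier to read, the following is a minimal Python sketch that converts the reported speed-ups and energy reductions into fractions of the baseline. Only the four numbers (1.76x, 1.89x, 34%, 38%) come from the abstract; the function names, the dictionary layout, and the labels "exact" and "approximate" are illustrative assumptions. The abstract reports means across applications, so applying the conversion to a mean is only a rough reading, not a per-application result.

```python
# Illustrative arithmetic only: the four input numbers are taken from the
# abstract; every name below is a hypothetical helper, not part of Replica.

def relative_runtime(speedup: float) -> float:
    """Runtime as a fraction of the baseline for a given speed-up factor."""
    if speedup <= 0:
        raise ValueError("speed-up must be positive")
    return 1.0 / speedup


def relative_energy(reduction_pct: float) -> float:
    """Energy as a fraction of the baseline for a given percentage reduction."""
    if not 0 <= reduction_pct <= 100:
        raise ValueError("reduction must be between 0 and 100 percent")
    return 1.0 - reduction_pct / 100.0


# Mean figures reported for 64 cores (assumed labels for the two configurations).
REPORTED = {
    "exact": {"speedup": 1.76, "energy_reduction_pct": 34.0},
    "approximate": {"speedup": 1.89, "energy_reduction_pct": 38.0},
}


def summarize(reported=REPORTED):
    """Return runtime and energy relative to the baseline, rounded to 3 places."""
    summary = {}
    for name, figures in reported.items():
        summary[name] = {
            "relative_runtime": round(relative_runtime(figures["speedup"]), 3),
            "relative_energy": round(relative_energy(figures["energy_reduction_pct"]), 3),
        }
    return summary


if __name__ == "__main__":
    for name, row in summarize().items():
        print(name, row)
```

Read this way, a 1.76x mean speed-up corresponds to roughly 57% of the baseline runtime, and a 34% energy reduction to 66% of the baseline energy.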

    Published In

    ASPLOS '19: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems
    April 2019
    1126 pages
    ISBN: 9781450362405
    DOI: 10.1145/3297858
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. approximate
    2. multicore
    3. parallelism
    4. wireless

    Qualifiers

    • Research-article

    Conference

    ASPLOS '19

    Acceptance Rates

    ASPLOS '19 Paper Acceptance Rate 74 of 351 submissions, 21%;
    Overall Acceptance Rate 535 of 2,713 submissions, 20%

    Article Metrics

    • Downloads (Last 12 months): 102
    • Downloads (Last 6 weeks): 13

    Cited By

    • (2024) Approximate Communication in Network-on-Chips for Training and Inference of Image Classification Models. Design and Applications of Emerging Computer Systems (709-740). DOI: 10.1007/978-3-031-42478-6_27. Online publication date: 14-Jan-2024
    • (2023) System-Level Exploration of In-Package Wireless Communication for Multi-Chiplet Platforms. Proceedings of the 28th Asia and South Pacific Design Automation Conference (561-566). DOI: 10.1145/3566097.3567952. Online publication date: 16-Jan-2023
    • (2023) REMOTE: Re-thinking Task Mapping on Wireless 2.5D Systems-on-Package for Hotspot Removal. 2023 IFIP/IEEE 31st International Conference on Very Large Scale Integration (VLSI-SoC) (1-6). DOI: 10.1109/VLSI-SoC57769.2023.10321912. Online publication date: 16-Oct-2023
    • (2023) Slack-Aware Packet Approximation for Energy-Efficient Network-on-Chips. IEEE Transactions on Sustainable Computing 8:1 (120-132). DOI: 10.1109/TSUSC.2022.3213469. Online publication date: 1-Jan-2023
    • (2023) A Technique for Approximate Communication in Network-on-Chips for Image Classification. IEEE Transactions on Emerging Topics in Computing 11:1 (30-42). DOI: 10.1109/TETC.2022.3162165. Online publication date: 1-Jan-2023
    • (2022) Approximate Network-on-Chips with Application to Image Classification. 2022 IEEE International Conference on Networking, Architecture and Storage (NAS) (1-8). DOI: 10.1109/NAS55553.2022.9925540. Online publication date: Oct-2022
    • (2022) Full System Exploration of On-Chip Wireless Communication on Many-Core Architectures. 2022 IEEE 13th Latin America Symposium on Circuits and System (LASCAS) (1-4). DOI: 10.1109/LASCAS53948.2022.9893905. Online publication date: 1-Mar-2022
    • (2022) A systematic analysis of power saving techniques for wireless network-on-chip architectures. Journal of Systems Architecture 126 (102485). DOI: 10.1016/j.sysarc.2022.102485. Online publication date: May-2022
    • (2022) Accuracy-Aware Compilers. Approximate Computing Techniques (177-214). DOI: 10.1007/978-3-030-94705-7_7. Online publication date: 3-Jan-2022
    • (2021) 6G Enabled Smart Infrastructure for Sustainable Society: Opportunities, Challenges, and Research Roadmap. Sensors 21:5 (1709). DOI: 10.3390/s21051709. Online publication date: 2-Mar-2021
