Exploring Energy Scalability in Coprocessor-Dominated Architectures for Dark Silicon

Authors:

Nathan Goulding-Hotta,

Scott Ricketts,

Steven Swanson,

Michael Bedford Taylor,

Jack Sampson

ACM Transactions on Embedded Computing Systems (TECS), Volume 13, Issue 4s

Article No.: 130, Pages 1 - 24

https://doi.org/10.1145/2584657

Published: 01 April 2014

Abstract

As chip designers face the prospect of increasingly dark silicon, there is increased interest in incorporating energy-efficient specialized coprocessors into general-purpose designs. For specialization to be a viable means of leveraging dark silicon, it must provide energy savings over the majority of execution for large, diverse workloads, and this will require deploying coprocessors in large numbers. Recent work has shown that automatically generated application-specific coprocessors can greatly improve energy efficiency, but it is not clear that current techniques will scale to Coprocessor-Dominated Architectures (CoDAs) with hundreds or thousands of coprocessors.

We show that scaling CoDAs to include very large numbers of coprocessors is challenging because of the energy cost of interconnects, the memory system, and leakage. These overheads grow with the number of coprocessors and, left unchecked, will squander the energy gains that coprocessors can provide. The article presents a detailed study of energy costs across a wide range of tiled CoDA designs and shows that careful choice of cache configuration, tile size, coarse-grain power management, and transistor implementation can limit the growth of these overheads. For multithreaded workloads, designers must also take care to avoid excessive contention for coprocessors, which can significantly increase energy consumption. The results suggest that, for CoDAs that target larger workloads, amortizing shared overheads via multithreading can provide up to 3.8× reductions in energy per instruction, retaining much of the 5.3× potential of smaller designs.
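
As a rough illustration of the closing figures, the short Python sketch below computes how much of the small-design gain survives at scale. It assumes that both reduction factors are measured against the same baseline energy per instruction, which the abstract does not state explicitly; the constant and function names are illustrative and do not come from the article.

```python
# Reduction factors quoted in the abstract (both are "up to" values).
SMALL_DESIGN_REDUCTION = 5.3   # potential EPI reduction of smaller designs
LARGE_CODA_REDUCTION = 3.8     # EPI reduction with multithreaded amortization


def relative_epi(reduction_factor: float) -> float:
    """Energy per instruction relative to the baseline (baseline = 1.0)."""
    return 1.0 / reduction_factor


# Assumption: both factors share one baseline, so their ratio is meaningful.
retained_fraction = LARGE_CODA_REDUCTION / SMALL_DESIGN_REDUCTION

print(f"relative EPI, small design: {relative_epi(SMALL_DESIGN_REDUCTION):.3f}")
print(f"relative EPI, large CoDA:   {relative_epi(LARGE_CODA_REDUCTION):.3f}")
print(f"fraction of gain retained:  {retained_fraction:.2f}")
```

Under that shared-baseline assumption, the larger designs keep roughly 72% of the reduction factor available to the smaller ones.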


Cited By

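Modarressi M, Sarbazi-Azad H (2018). Topology Specialization for Networks-on-Chip in the Dark Silicon Era. Dark Silicon and Future On-chip Systems, 217-258. Online publication date: 2018
https://doi.org/10.1016/bs.adcom.2018.03.009
Martins A, Sant'Ana A, Moraes F (2016). Runtime energy management for many-core systems. 2016 IEEE International Conference on Electronics, Circuits and Systems (ICECS), 380-383. Online publication date: Dec-2016
https://doi.org/10.1109/ICECS.2016.7841212
Kumar S, Shriraman A, Vedula N (2015). Fusion. ACM SIGARCH Computer Architecture News 43:3S, 733-745. Online publication date: 13-Jun-2015
https://dl.acm.org/doi/10.1145/2872887.2750421
Kumar S, Shriraman A, Vedula N, Marr D, Albonesi D (2015). Fusion. Proceedings of the 42nd Annual International Symposium on Computer Architecture, 733-745. Online publication date: 13-Jun-2015
https://dl.acm.org/doi/10.1145/2749469.2750421
Taneja K, Taneja H, Kumar R (2015). SPF: Segmented processor framework for energy efficient proactive routing based applications in MANET. 2015 2nd International Conference on Recent Advances in Engineering & Computational Sciences (RAECS), 1-5. Online publication date: Dec-2015
https://doi.org/10.1109/RAECS.2015.7453411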

Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures


Published In

ACM Transactions on Embedded Computing Systems Volume 13, Issue 4s

Special Issue on Real-Time and Embedded Technology and Applications, Domain-Specific Multicore Computing, Cross-Layer Dependable Embedded Systems, and Application of Concurrency to System Design (ACSD'13)

July 2014

571 pages

ISSN:1539-9087

EISSN:1558-3465

DOI:10.1145/2601432

Editors:
Sandeep K. Shukla (Virginia Tech, USA),
Josep Carmona (Universitat Politècnica de Catalunya, Spain),
Mihai Teodor Lazarescu (Politecnico di Torino, Italy),
Marta Pietkiewicz-Koutny (Newcastle University, UK)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 01 April 2014

Accepted: 01 September 2013

Revised: 01 June 2013

Received: 01 January 2013

Published in TECS Volume 13, Issue 4s

Qualifiers

Research-article
Research
Refereed


