Exploring Energy Scalability in Coprocessor-Dominated Architectures for Dark Silicon

Published: 01 April 2014

Abstract

As chip designers face the prospect of increasingly dark silicon, interest is growing in incorporating energy-efficient specialized coprocessors into general-purpose designs. For specialization to be a viable means of leveraging dark silicon, it must provide energy savings over the majority of execution for large, diverse workloads, and this will require deploying coprocessors in large numbers. Recent work has shown that automatically generated application-specific coprocessors can greatly improve energy efficiency, but it is not clear that current techniques will scale to Coprocessor-Dominated Architectures (CoDAs) with hundreds or thousands of coprocessors.
We show that scaling CoDAs to include very large numbers of coprocessors is challenging because of the energy cost of interconnects, the memory system, and leakage. These overheads grow with the number of coprocessors and, left unchecked, will squander the energy gains that coprocessors can provide. The article presents a detailed study of energy costs across a wide range of tiled CoDA designs and shows that careful choice of cache configuration, tile size, coarse-grain power management, and transistor implementation can limit the growth of these overheads. For multithreaded workloads, designers must also take care to avoid excessive contention for coprocessors, which can significantly increase energy consumption. The results suggest that, for CoDAs that target larger workloads, amortizing shared overheads via multithreading can provide up to 3.8× reductions in energy per instruction, retaining much of the 5.3× potential of smaller designs.
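
The amortization argument can be illustrated with a toy calculation. The sketch below is not the article's methodology: the model (a fixed per-instruction compute energy plus shared overhead power divided by aggregate throughput), the function names, and every constant are assumptions chosen only to show the shape of the effect. The only figures taken from the abstract are the 3.8× and 5.3× reductions.

```python
def energy_per_instruction(compute_epi, shared_power, per_thread_rate, threads):
    """Toy model (an assumption, not the article's methodology).

    compute_epi:     energy spent per instruction in the coprocessor itself
    shared_power:    power of shared overheads (interconnect, memory system, leakage)
    per_thread_rate: instructions per unit time delivered by one thread
    threads:         number of threads sharing the overheads
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # Shared overhead energy is spread over the aggregate instruction throughput.
    return compute_epi + shared_power / (per_thread_rate * threads)


def fraction_of_potential(achieved_gain, potential_gain):
    """Share of the potential energy reduction that a design retains."""
    return achieved_gain / potential_gain


if __name__ == "__main__":
    # Illustrative, made-up constants in arbitrary units.
    for n in (1, 2, 4, 8):
        print(n, energy_per_instruction(1.0, 4.0, 1.0, n))
    # Figures quoted in the abstract: 3.8x achieved versus 5.3x potential.
    print(round(fraction_of_potential(3.8, 5.3), 2))
```

With the abstract's figures, the larger designs retain roughly 72% of the potential reduction (3.8 / 5.3). The toy model deliberately ignores contention for coprocessors, which the abstract notes can increase energy consumption for multithreaded workloads.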

    Published In

    ACM Transactions on Embedded Computing Systems  Volume 13, Issue 4s
    Special Issue on Real-Time and Embedded Technology and Applications, Domain-Specific Multicore Computing, Cross-Layer Dependable Embedded Systems, and Application of Concurrency to System Design (ACSD'13)
    July 2014
    571 pages
    ISSN:1539-9087
    EISSN:1558-3465
    DOI:10.1145/2601432
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 April 2014
    Accepted: 01 September 2013
    Revised: 01 June 2013
    Received: 01 January 2013
    Published in TECS Volume 13, Issue 4s

    Author Tags

    1. CoDA
    2. conservation core
    3. coprocessor
    4. dark silicon
    5. energy efficiency
    6. scalable specialization

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Cited By

    • (2018) Topology Specialization for Networks-on-Chip in the Dark Silicon Era. Dark Silicon and Future On-chip Systems, 217-258. DOI: 10.1016/bs.adcom.2018.03.009. Online publication date: 2018
    • (2016) Runtime energy management for many-core systems. 2016 IEEE International Conference on Electronics, Circuits and Systems (ICECS), 380-383. DOI: 10.1109/ICECS.2016.7841212. Online publication date: Dec-2016
    • (2015) Fusion. ACM SIGARCH Computer Architecture News 43:3S, 733-745. DOI: 10.1145/2872887.2750421. Online publication date: 13-Jun-2015
    • (2015) Fusion. Proceedings of the 42nd Annual International Symposium on Computer Architecture, 733-745. DOI: 10.1145/2749469.2750421. Online publication date: 13-Jun-2015
    • (2015) SPF: Segmented processor framework for energy efficient proactive routing based applications in MANET. 2015 2nd International Conference on Recent Advances in Engineering & Computational Sciences (RAECS), 1-5. DOI: 10.1109/RAECS.2015.7453411. Online publication date: Dec-2015
