PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor

Published: 20 October 2006

Abstract

In this paper, we show how 3D stacking technology can be used to implement a simple, low-power, high-performance chip multiprocessor suitable for throughput processing. Our proposed architecture, PicoServer, employs 3D technology to bond one die containing several simple, slow processing cores to multiple DRAM dies sufficient for a primary memory. The 3D technology also enables wide, low-latency buses between processors and memory. These buses remove the need for an L2 cache, allowing its area to be reallocated to additional simple cores. The additional cores allow the clock frequency to be lowered without impairing throughput. Lower clock frequency in turn reduces power and means that thermal constraints, a concern with 3D stacking, are easily satisfied.

The PicoServer architecture specifically targets Tier 1 server applications, which exhibit a high degree of thread-level parallelism. An architecture targeted to efficient throughput is ideal for this application domain. We find that, for a similar logic die area, a 12-CPU system with 3D stacking and no L2 cache outperforms an 8-CPU system with a large on-chip L2 cache by about 14% while consuming 55% less power. In addition, we show that a PicoServer performs comparably to a Pentium 4-like machine while consuming only about 1/10 of the power, even when conservative assumptions are made about the power consumption of the PicoServer.
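The headline figures in the abstract can be combined into a rough performance-per-watt comparison. The sketch below is only back-of-the-envelope arithmetic on the quoted numbers, not a calculation taken from the paper: the function name is hypothetical, each baseline is normalized to 1.0, and "performs comparably" is assumed to mean equal performance.

```python
def perf_per_watt_gain(relative_performance: float, relative_power: float) -> float:
    """Performance-per-watt relative to a baseline normalized to 1.0 on both axes."""
    if relative_power <= 0:
        raise ValueError("relative_power must be positive")
    return relative_performance / relative_power


# 12-CPU 3D-stacked system vs. 8-CPU system with a large on-chip L2 cache:
# about 14% more performance at 55% less power (figures quoted in the abstract).
gain_vs_l2_cmp = perf_per_watt_gain(1.14, 1.0 - 0.55)

# PicoServer vs. a Pentium 4-like machine: about 1/10 of the power.
# ASSUMPTION: "comparable" performance is treated as exactly 1.0x.
gain_vs_p4 = perf_per_watt_gain(1.0, 0.10)

print(f"vs. 8-CPU L2 system: {gain_vs_l2_cmp:.2f}x performance per watt")
print(f"vs. Pentium 4-like:  {gain_vs_p4:.1f}x performance per watt")
```

Under these assumptions the 12-CPU configuration delivers roughly 2.5 times the performance per watt of the 8-CPU baseline, and the PicoServer roughly 10 times that of the Pentium 4-like machine. Both ratios inherit the "about" in the abstract's numbers and should be read as approximate.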

Published In

ACM SIGPLAN Notices, Volume 41, Issue 11
Proceedings of the 2006 ASPLOS Conference
November 2006
425 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1168918
  • ACM Conferences
    ASPLOS XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
    October 2006
    440 pages
    ISBN:1595934510
    DOI:10.1145/1168857

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. 3D stacking technology
  2. chip multiprocessor
  3. full-system simulation
  4. low power
  5. tier 1 server
  6. web/file/streaming server

Qualifiers

  • Article
