research-article

Open access

Designing on-chip networks for throughput accelerators

Authors:

Tor M. Aamodt

ACM Transactions on Architecture and Code Optimization (TACO), Volume 10, Issue 3

Article No.: 21, Pages 1 - 35

https://doi.org/10.1145/2512429

Published: 16 September 2013

Abstract

As the number of cores and threads in throughput accelerators such as Graphics Processing Units (GPUs) increases, so does the importance of on-chip interconnection network design. This article explores throughput-effective Networks-on-Chip (NoCs) for future compute accelerators that employ Bulk-Synchronous Parallel (BSP) programming models such as CUDA and OpenCL. A hardware optimization is “throughput effective” if it improves parallel application-level performance per unit chip area. We evaluate the performance of future-looking workloads using detailed closed-loop simulations that model compute nodes, the NoC, and the DRAM memory system. We start from a mesh design with bisection bandwidth balanced to off-chip demand. Accelerator workloads tend to demand high off-chip memory bandwidth, which results in a many-to-few traffic pattern when coupled with the expected technology constraint of slow growth in pins per chip. Leveraging these observations, we reduce NoC area by proposing a “checkerboard” NoC that alternates between conventional full routers and half routers with limited connectivity. Next, we show that increasing network terminal bandwidth at the nodes connected to DRAM controllers alleviates a significant fraction of the remaining imbalance resulting from the many-to-few traffic pattern. Furthermore, we propose a “double checkerboard inverted” NoC organization that takes advantage of channel slicing to reduce area while maintaining the performance improvements of the aforementioned techniques. This organization also has a simpler routing mechanism and improves average application throughput per unit area by 24.3%.
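To make two ideas from the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It lays out a mesh in which full and half routers alternate like the squares of a checkerboard, and it computes the "throughput per unit area" figure of merit. The function names, the choice that odd-parity positions hold the half routers, and the numbers in the usage note are all assumptions made for illustration; the abstract does not specify them.

```python
def checkerboard(rows, cols):
    """Label each router in a rows x cols mesh as 'F' (full) or 'H' (half).

    Assumption: positions where row + col is odd hold half routers.
    The abstract only says the two router types alternate.
    """
    return [["H" if (r + c) % 2 else "F" for c in range(cols)]
            for r in range(rows)]


def throughput_per_area(throughput, area):
    """Figure of merit: application-level performance per unit chip area."""
    if area <= 0:
        raise ValueError("area must be positive")
    return throughput / area


def relative_gain(base_tp, base_area, new_tp, new_area):
    """Fractional improvement in throughput per area over a baseline."""
    base = throughput_per_area(base_tp, base_area)
    new = throughput_per_area(new_tp, new_area)
    return new / base - 1.0


if __name__ == "__main__":
    for row in checkerboard(4, 4):
        print(" ".join(row))
    # Hypothetical numbers: same throughput, 20% less area.
    print(relative_gain(1.0, 1.0, 1.0, 0.8))
```

With the hypothetical inputs above, a design that holds throughput constant while using 20% less area improves throughput per unit area by about 25%. This shows how an area reduction alone can count as "throughput effective" under the abstract's definition.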

Published In

ACM Transactions on Architecture and Code Optimization Volume 10, Issue 3

September 2013

310 pages

ISSN: 1544-3566

EISSN: 1544-3973

DOI: 10.1145/2509420

Copyright © 2013 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 September 2013

Accepted: 01 June 2013

Revised: 01 April 2013

Received: 01 May 2011

Qualifiers

Research-article
Research
Refereed
