A Survey of Techniques for Architecting and Managing Asymmetric Multicore Processors

Author:

Sparsh Mittal

ACM Computing Surveys (CSUR), Volume 48, Issue 3

Article No.: 45, Pages 1 - 38

https://doi.org/10.1145/2856125

Published: 08 February 2016

Abstract

To meet the needs of a diverse range of workloads, asymmetric multicore processors (AMPs) have been proposed, which feature cores with different microarchitectures or ISAs. However, given the diversity inherent in their design and application scenarios, several challenges need to be addressed to effectively architect AMPs and leverage their potential in optimizing both sequential and parallel performance. Several recent techniques address these challenges. In this article, we present a survey of architectural and system-level techniques proposed for designing and managing AMPs. By classifying the techniques on several key characteristics, we underscore their similarities and differences. We clarify the terminology used in this research field and identify challenges that are worthy of future investigation. We hope that, beyond synthesizing the existing work on AMPs, this survey will spark novel ideas for architecting future AMPs that can make a definite impact on the landscape of next-generation computing systems.

References

[1]

Arunachalam Annamalai, Rance Rodrigues, Israel Koren, and Sandip Kundu. 2013. An opportunistic prediction-based thread scheduling to maximize throughput/watt in AMPs. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT’13). 63--72.

Digital Library

[2]

Murali Annavaram, Ed Grochowski, and John Shen. 2005. Mitigating Amdahl’s law through EPI throttling. In Proceedings of the International Symposium on Computer Architecture (ISCA’05). 298--309.

Digital Library

[3]

Amin Ansari, Shuguang Feng, Shantanu Gupta, Josep Torrellas, and Scott Mahlke. 2013. Illusionist: Transforming lightweight cores into aggressive cores on demand. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA’13). 436--447.

Digital Library

[4]

ARM. 2015a. big.LITTLE Technology. Retrieved December 29, 2015, from http://www.arm.com/products/processors/technologies/biglittleprocessing.php.

[5]

ARM. 2015b. Cortex-A Series Processors. Retrieved December 29, 2015, from http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.set.cortexa/index.html.

[6]

Saisanthosh Balakrishnan, Ravi Rajwar, Mike Upton, and Konrad Lai. 2005. The impact of performance asymmetry in emerging multicore architectures. In Proceedings of the International Symposium on Computer Architecture (ISCA’05). 506--517.

Digital Library

[7]

Antonio Barbalace, Marina Sadini, Saif Ansary, Christopher Jelesnianski, Akshay Ravichandran, Cagil Kendir, Alastair Murray, and Binoy Ravindran. 2015. Popcorn: Bridging the programmability gap in heterogeneous-ISA platforms. In Proceedings of the European Conference on Computer Systems (EuroSys’15). 29:1--29:16.

Digital Library

[8]

Michela Becchi and Patrick Crowley. 2006. Dynamic thread assignment on heterogeneous multiprocessor architectures. In Proceedings of the Computing Frontiers Conference (CF’06). 29--40.

Digital Library

[9]

Jeffery Brown, Leo Porter, and Dean M. Tullsen. 2011. Fast thread migration via cache working set prediction. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA’11). 193--204.

Digital Library

[10]

Ting Cao, Stephen M. Blackburn, Tiejun Gao, and Kathryn S. McKinley. 2012. The yin and yang of power and performance for asymmetric hardware and managed software. In Proceedings of the International Symposium on Computer Architecture (ISCA’12). 225--236.

Digital Library

[11]

Jian Chen and Lizy Kurian John. 2008. Energy-aware application scheduling on a heterogeneous multi-core system. In Proceedings of the International Symposium on Workload Characterization (IISWC’08). 5--13.

[12]

Jian Chen and Lizy Kurian John. 2009. Efficient program scheduling for heterogeneous multi-core processors. In Proceedings of the Design Automation Conference (DAC’09). 927--930.

Digital Library

[13]

Quan Chen and Minyi Guo. 2014. Adaptive workload-aware task scheduling for single-ISA asymmetric multicore architectures. ACM Transactions on Architecture and Code Optimization 11, 1, 8:1--8:25.

Digital Library

[14]

Nagabhushan Chitlur, Ganapati Srinivasa, Scott Hahn, Pragya K. Gupta, Dheeraj Reddy, David Koufaty, Paul Brett, Abirami Prabhakaran, Li Zhao, Nelson Ijih, Suchit Subhaschandra, Sabina Grover, Xiaowei Jiang, and Ravi Iyer. 2012. QuickIA: Exploring heterogeneous architectures on real prototypes. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA’12). 1--8.

Digital Library

[15]

Jih-Ching Chiu, Yu-Liang Chou, and Po-Kai Chen. 2010. Hyperscalar: A novel dynamically reconfigurable multi-core architecture. In Proceedings of the International Conference on Parallel Processing (ICPP’10). 277--286.

Digital Library

[16]

CNXSoft. 2014. ARM Cortex A15/A17 SoCs Comparison—Nvidia Tegra K1 vs Samsung Exynos 5422 vs Rockchip RK3288 vs AllWinner A80. Retrieved December 29, 2015, from http://www.cnx-software.com/2014/05/21/comparison-nvidia-tegra-k1-samsung-exynos-5422-rockchip-rk3288-allwinner-a80/.

[17]

Jason Cong and Bo Yuan. 2012. Energy-efficient scheduling on heterogeneous multi-core architectures. In Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED’12). 345--350.

Digital Library

[18]

Matthew DeVuyst, Ashish Venkat, and Dean M. Tullsen. 2012. Execution migration in a heterogeneous-ISA chip multiprocessor. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’12). 261--272.

Digital Library

[19]

Stijn Eyerman and Lieven Eeckhout. 2010. Modeling critical sections in Amdahl’s law and its implications for multicore design. In Proceedings of the International Symposium on Computer Architecture (ISCA’10). 362--370.

Digital Library

[20]

Stijn Eyerman and Lieven Eeckhout. 2014. The benefit of SMT in the multi-core era: Flexibility towards degrees of thread-level parallelism. ACM SIGARCH Computer Architecture News 42, 1, 591--606.

Digital Library

[21]

Chris Fallin, Chris Wilkerson, and Onur Mutlu. 2014. The heterogeneous block architecture. In Proceedings of the International Conference on Computer Design (ICCD’14). 386--393.

[22]

Andrei Frumusanu and Ryan Smith. 2015. ARM A53/A57/T760 Investigated—Samsung Galaxy Note 4 Exynos Review. Retrieved December 29, 2015, from http://www.anandtech.com/show/8718/the-samsung-galaxy-note-4-exynos-rev iew/6.

[23]

Giorgis Georgakoudis, Dimitrios S. Nikolopoulos, and Spyros Lalis. 2013. Fast dynamic binary rewriting to support thread migration in shared-ISA asymmetric multicores. In Proceedings of the International Workshop on Code Optimisation for Multi and Many Cores (COSMIC’13). 4:1--4:10.

Digital Library

[24]

Dan Gibson and David A. Wood. 2010. Forwardflow: A scalable core for power-constrained CMPs. ACM SIGARCH Computer Architecture News 38, 14--25.

Digital Library

[25]

Lori Gil. 2015. NVIDIAs Tegra X1 Crushes the Competition. Retrieved December 29, 2015, from http://liliputing.com/2015/02/nvidias-tegra-x1-crushes-the-competition.html.

[26]

Ryan E. Grant and Ahmad Afsahi. 2006. Power-performance efficiency of asymmetric multiprocessors for multi-threaded scientific applications. In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS’06).

Digital Library

[27]

Ed Grochowski, Ronny Ronen, John Shen, and Hong Wang. 2004. Best of both latency and throughput. In Proceedings of the IEEE International Conference on Computer Design (ICCD’04). 236--243.

Digital Library

[28]

Michael Gschwind, H. Peter Hofstee, Brian Flachs, Martin Hopkins, Yukio Watanabe, and Takeshi Yamazaki. 2006. Synergistic processing in Cell’s multicore architecture. IEEE Micro 26, 2, 10--24.

Digital Library

[29]

Divya P. Gulati, Changkyu Kim, Simha Sethumadhavan, Stephen W. Keckler, and Doug Burger. 2008. Multitasking workload scheduling on flexible-core chip multiprocessors. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT’08). 187--196.

Digital Library

[30]

Shantanu Gupta, Shuguang Feng, Amin Ansari, and Scott Mahlke. 2010. Erasing core boundaries for robust and configurable performance. In Proceedings of the International Symposium on Microarchitecture (MICRO’10). 325--336.

Digital Library

[31]

Vishal Gupta and Ripal Nathuji. 2010. Analyzing performance asymmetric multicore processors for latency sensitive datacenter applications. In Proceedings of the Workshop on Power Aware Computing and Systems (HotPower’10). 1--8.

Digital Library

[32]

Anthony Gutierrez, Ronald G. Dreslinski, and Trevor Mudge. 2014. Evaluating private vs. shared last-level caches for energy efficiency in asymmetric multi-cores. In Proceedings of the International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS’14). 191--198.

[33]

Mark D. Hill and Michael R. Marty. 2008. Amdahl’s law in the multicore era. IEEE Computer 7, 33--38.

Digital Library

[34]

Houman Homayoun, Vasileios Kontorinis, Amirali Shayan, Ta-Wei Lin, and Dean M. Tullsen. 2012. Dynamically heterogeneous cores through 3D resource pooling. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA’12). 1--12.

Digital Library

[35]

Tomas Hruby, Herbert Bos, and Andrew S. Tanenbaum. 2013. When slower is faster: On heterogeneous multicores for reliable systems. In Proceedings of the USENIX Annual Technical Conference (ATC’13). 255--266.

Digital Library

[36]

Ineda. 2015. Ineda Dhanush Wearable Processing Unit.

[37]

Engin Ipek, Meyrem Kirman, Nevin Kirman, and Jose F. Martinez. 2007. Core fusion: Accommodating software diversity in chip multiprocessors. In Proceedings of the International Symposium on Computer Architecture (ISCA’07). 186--197.

Digital Library

[38]

Brian Jeff. 2012. Big.LITTLE system architecture from ARM: Saving power through heterogeneous multiprocessing and task context migration. In Proceedings of the ACM Design Automation Conference (DAC’12).

[39]

José A. Joao, M. Aater Suleman, Onur Mutlu, and Yale N. Patt. 2012. Bottleneck identification and scheduling in multithreaded applications. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’12). 223--234.

Digital Library

[40]

José A. Joao, M. Aater Suleman, Onur Mutlu, and Yale N. Patt. 2013. Utility-based acceleration of multithreaded applications on asymmetric CMPs. In Proceedings of the International Symposium on Computer Architecture (ISCA’13). 154--165.

Digital Library

[41]

B. H. H. Juurlink and C. H. Meenderinck. 2012. Amdahl’s law for predicting the future of multicores considered harmful. ACM SIGARCH Computer Architecture News 40, 2, 1--9.

Digital Library

[42]

Vahid Kazempour, Ali Kamali, and Alexandra Fedorova. 2010. AASH: An asymmetry-aware scheduler for hypervisors. ACM SIGPLAN Notices 45, 7, 85--96.

Digital Library

[43]

Omer Khan and Sandip Kundu. 2010. A self-adaptive scheduler for asymmetric multi-cores. In Proceedings of the ACM Great Lakes Symposium on VLSI (GLSVLSI’10). 397--400.

Digital Library

[44]

Khubaib Khubaib, M. Aater Suleman, Milad Hashemi, Chris Wilkerson, and Yale N. Patt. 2012. MorphCore: An energy-efficient microarchitecture for high performance ILP and high throughput TLP. In Proceedings of the International Symposium on Microarchitecture (MICRO’12). 305--316.

Digital Library

[45]

Changkyu Kim, Simha Sethumadhavan, Madhu S. Govindan, Nitya Ranganathan, Divya Gulati, Doug Burger, and Stephen W. Keckler. 2007. Composable lightweight processors. In Proceedings of the International Symposium on Microarchitecture (MICRO’07). 381--394.

Digital Library

[46]

Jun Kim, Joonwon Lee, and Jinkyu Jeong. 2015. Exploiting asymmetric CPU performance for fast startup of subsystem in mobile smart devices. IEEE Transactions on Consumer Electronics 61, 1, 103--111.

Digital Library

[47]

Myungsun Kim, Kibeom Kim, James R. Geraci, and Seongsoo Hong. 2014. Utilization-aware load balancing for the energy efficient operation of the big.LITTLE processor. In Proceedings of the Conference on Design, Automation, and Test in Europe (DATE’14). 223:1--223:4.

Digital Library

[48]

Byeong-Moon Ko, Joonwon Lee, and Heeseung Jo. 2012. AMP aware core allocation scheme for mobile devices. In Proceedings of the IEEE Spring Congress on Engineering and Technology (S-CET’12). 1--4.

[49]

David Koufaty, Dheeraj Reddy, and Scott Hahn. 2010. Bias scheduling in heterogeneous multi-core architectures. In Proceedings of the European Conference on Computer Systems (EuroSys’10). 125--138.

Digital Library

[50]

Rakesh Kumar, Keith I. Farkas, Norman P. Jouppi, Parthasarathy Ranganathan, and Dean M. Tullsen. 2003. Single-ISA heterogeneous multi-core architectures: The potential for processor power reduction. In Proceedings of the International Symposium on Microarchitecture (MICRO’03). 81--92.

Digital Library

[51]

Rakesh Kumar, Norman P. Jouppi, and Dean M. Tullsen. 2004a. Conjoined-core chip multiprocessing. In Proceedings of the International Symposium on Microarchitecture (MICRO’04). 195--206.

Digital Library

[52]

Rakesh Kumar, Dean M. Tullsen, and Norman P. Jouppi. 2006. Core architecture optimization for heterogeneous chip multiprocessors. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT’06). 23--32.

Digital Library

[53]

Rakesh Kumar, Dean M. Tullsen, Parthasarathy Ranganathan, Norman P. Jouppi, and Keith I. Farkas. 2004b. Single-ISA heterogeneous multi-core architectures for multithreaded workload performance. ACM SIGARCH Computer Architecture News 32, 64.

Digital Library

[54]

Youngjin Kwon, Changdae Kim, Seungryoul Maeng, and Jaehyuk Huh. 2011. Virtualizing performance asymmetric multi-core systems. In Proceedings of the International Symposium on Computer Architecture (ISCA’11). 45--56.

Digital Library

[55]

Nagesh B. Lakshminarayana and Hyesoon Kim. 2008. Understanding performance, power and energy behavior in asymmetric multiprocessors. In Proceedings of the International Conference on Computer Design (ICCD’08). 471--477.

[56]

Nagesh B. Lakshminarayana, Jaekyu Lee, and Hyesoon Kim. 2009. Age based scheduling for asymmetric multiprocessors. In Proceedings of the Conference on High Performance Computing Networking, Storage, and Analysis (SC’09). 25:1--25:12.

Digital Library

[57]

Tong Li, Dan Baumberger, David A. Koufaty, and Scott Hahn. 2007. Efficient operating system scheduling for performance-asymmetric multi-core architectures. In Proceedings of the ACM/IEEE Conference on Supercomputing (SC’07). 53:1--53:11.

Digital Library

[58]

Tong Li, Paul Brett, Rob Knauerhase, David Koufaty, Dheeraj Reddy, and Scott Hahn. 2010. Operating system support for overlapping-ISA heterogeneous multi-core architectures. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA’10). 1--12.

[59]

Felix Xiaozhu Lin, Zhen Wang, Robert LiKamWa, and Lin Zhong. 2012. Reflex: Using low-power processors in smartphones without knowing them. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’12). 13--24.

Digital Library

[60]

Felix Xiaozhu Lin, Zhen Wang, and Lin Zhong. 2014. K2: A mobile operating system for heterogeneous coherence domains. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’14). 285--300.

Digital Library

[61]

Guangshuo Liu, Jinpyo Park, and Diana Marculescu. 2013. Dynamic thread mapping for high-performance, power-efficient heterogeneous many-core systems. In Proceedings of the International Conference on Computer Design (ICCD’13). 54--61.

[62]

Andrew Lukefahr, Shruti Padmanabha, Reetuparna Das, Ronald Dreslinski Jr., Thomas F. Wenisch, and Scott Mahlke. 2014. Heterogeneous microarchitectures trump voltage scaling for low-power cores. In Proceedings of the International Conference on Parallel Architectures and Compilation (PACT’14). 237--250.

Digital Library

[63]

Andrew Lukefahr, Shruti Padmanabha, Reetuparna Das, Faissal M. Sleiman, Ronald Dreslinski, Thomas F. Wenisch, and Scott Mahlke. 2012. Composite cores: Pushing heterogeneity into a core. In Proceedings of the International Symposium on Microarchitecture (MICRO’12). 317--328.

Digital Library

[64]

Yangchun Luo, Venkatesan Packirisamy, Wei-Chung Hsu, and Antonia Zhai. 2010. Energy efficient speculative threads: Dynamic thread allocation in same-ISA heterogeneous multicore systems. In Proceedings of the International Conference on Parallel Architectures and Compilation (PACT’10). 453--464.

Digital Library

[65]

Daniel Lustig, Caroline Trippel, Michael Pellauer, and Margaret Martonosi. 2015. ArMOR: Defending against memory consistency model mismatches in heterogeneous architectures. In Proceedings of the International Symposium on Computer Architecture (ISCA’15). 388--400.

Digital Library

[66]

Felipe Lopes Madruga, Henrique C. Freitas, and Philippe Olivier Alexandre Navaux. 2010. Parallel shared-memory workloads performance on asymmetric multi-core architectures. In Proceedings of the Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP’10). 163--169.

Digital Library

[67]

N. Markovic, D. Nemirovsky, O. Unsal, M. Valero, and A. Cristal. 2014. Thread lock section-aware scheduling on asymmetric single-ISA multi-core. IEEE Computer Architecture Letters 14, 2, 160--163.

Digital Library

[68]

Sparsh Mittal. 2014a. A survey of techniques for improving energy efficiency in embedded computing systems. International Journal of Computer Aided Engineering and Technology 6, 4, 440--459.

[69]

Sparsh Mittal. 2014b. Power Management Techniques for Data Centers: A Survey. Technical Report ORNL/TM-2014/381. Oak Ridge National Laboratory, Oak Ridge, TN.

[70]

Sparsh Mittal, Matthew Poremba, Jeffrey Vetter, and Yuan Xie. 2014. Exploring Design Space of 3D NVM and eDRAM Caches Using DESTINY Tool. Technical Report ORNL/TM-2014/636. Oak Ridge National Laboratory, Oak Ridge, TN.

[71]

Sparsh Mittal and Jeffrey Vetter. 2015. A survey of CPU-GPU heterogeneous computing techniques. ACM Computing Surveys 47, 4, 69:1--69:35.

Digital Library

[72]

Jeffrey C. Mogul, Jayaram Mudigonda, Nathan Binkert, Parthasarathy Ranganathan, and Vanish Talwar. 2008. Using asymmetric single-ISA CMPs to save energy on operating systems. IEEE Micro 28, 3, 26--41.

Digital Library

[73]

Tomer Y. Morad, Avinoam Kolodny, and Uri C. Weiser. 2010. Scheduling multiple multithreaded applications on asymmetric and symmetric chip multiprocessors. In Proceedings of the International Symposium on Parallel Architectures, Algorithms, and Programming (PAAP’10). 65--72.

Digital Library

[74]

Tomer Y. Morad, Uri C. Weiser, Avinoam Kolodny, Mateo Valero, and Eduard Ayguade. 2006. Performance, power efficiency and scalability of asymmetric cluster chip multiprocessors. Computer Architecture Letters 5, 1, 14--17.

Digital Library

[75]

Tobias Mühlbauer, Wolf Rödiger, Robert Seilbeck, Alfons Kemper, and Thomas Neumann. 2014. Heterogeneity-conscious parallel query execution: Getting a better mileage while driving faster&excl; In Proceedings of the International Workshop on Data Management on New Hardware (DaMoN’14). 2:1--2:10.

Digital Library

[76]

Janani Mukundan, Saugata Ghose, Robert Karmazin, Engin Ipek, and José F. Martínez. 2012. Overcoming single-thread performance hurdles in the core fusion reconfigurable multicore architecture. In Proceedings of the International Conference on Supercomputing (ICS’12). 101--110.

Digital Library

[77]

Thannirmalai Somu Muthukaruppan, Anuj Pathania, and Tulika Mitra. 2014. Price theory based power management for heterogeneous multi-cores. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’14). 161--176.

Digital Library

[78]

Thannirmalai Somu Muthukaruppan, Mihai Pricopi, Vanchinathan Venkataramani, Tulika Mitra, and Sanjay Vishin. 2013. Hierarchical power management for asymmetric multi-core in dark silicon era. In Proceedings of the Design Automation Conference (DAC’13). 174.

Digital Library

[79]

Hashem Hashemi Najaf-Abadi, Niket Kumar Choudhary, and Eric Rotenberg. 2009. Core-selectability in chip multiprocessors. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT’09). 113--122.

Digital Library

[80]

Hashem H. Najaf-Abadi and Eric Rotenberg. 2009. Architectural contesting. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA’09). 189--200.

[81]

Sandeep Navada, Niket K. Choudhary, Salil V. Wadhavkar, and Eric Rotenberg. 2013. A unified view of non-monotonic core selection and application steering in heterogeneous chip multiprocessors. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques. 133--144.

Digital Library

[82]

Rajiv Nishtala, Daniel Mossé, and Vinicius Petrucci. 2013. Energy-aware thread co-location in heterogeneous multicore processors. In Proceedings of the International Conference on Embedded Software (EMSOFT’13). 1--9.

Digital Library

[83]

NVIDIA. 2011. Variable SMP—A Multi-Core CPU Architecture for Low Power and High Performance. Retrieved December 29, 2015, from http://www.nvidia.com/content/PDF/tegra_white_papers/tegra-whitepaper-0 911b.pdf.

[84]

Shruti Padmanabha, Andrew Lukefahr, Reetuparna Das, and Scott Mahlke. 2013. Trace based phase prediction for tightly-coupled heterogeneous cores. In Proceedings of the International Symposium on Microarchitecture. 445--456.

Digital Library

[85]

Sankaralingam Panneerselvam and Michael M. Swift. 2012. Chameleon: Operating system support for dynamic processors. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’12). 99--110.

Digital Library

[86]

George Patsilaras, Niket K. Choudhary, and James Tuck. 2012. Efficiently exploiting memory level parallelism on asymmetric coupled cores in the dark silicon era. ACM Transactions on Architecture and Code Optimization 8, 4, 28:1--28:21.

Digital Library

[87]

Miquel Pericas, Adrian Cristal, Francisco J. Cazorla, Ruben Gonzalez, Daniel A. Jimenez, and Mateo Valero. 2007. A flexible heterogeneous multi-core architecture. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques (PACT’07). 13--24.

Digital Library

[88]

Vinicius Petrucci, Orlando Loques, and Daniel Mossé. 2012. Lucky scheduling for energy-efficient heterogeneous multi-core systems. In Proceedings of the USENIX Conference on Power-Aware Computing and Systems (HotPower’12).

Digital Library

[89]

Dmitry Ponomarev, Gurhan Kucuk, and Kanad Ghose. 2001. Reducing power requirements of instruction scheduling through dynamic allocation of multiple datapath resources. In Proceedings of the International Symposium on Microarchitecture. 90--101.

Digital Library

[90]

Mihai Pricopi and Tulika Mitra. 2012. Bahurupi: A polymorphic heterogeneous multi-core architecture. ACM Transactions on Architecture and Code Optimization 8, 4, 22:1--22:21.

Digital Library

[91]

Mihai Pricopi and Tulika Mitra. 2014. Task scheduling on adaptive multi-core. IEEE Transactions on Computers 63, 10, 2590--2603.

Digital Library

[92]

Mihai Pricopi, Thannirmalai Somu Muthukaruppan, Vanchinathan Venkataramani, Tulika Mitra, and Sanjay Vishin. 2013. Power-performance modeling on asymmetric multi-cores. In Proceedings of the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES’13). 1--10.

Digital Library

[93]

Moo-Ryong Ra, Bodhi Priyantha, Aman Kansal, and Jie Liu. 2012. Improving energy efficiency of personal sensing applications with heterogeneous multi-processors. In Proceedings of the ACM Conference on Ubiquitous Computing (Ubicomp’12). 1--10.

Digital Library

[94]

M. Mustafa Rafique, Benjamin Rose, Ali R. Butt, and Dimitrios S. Nikolopoulos. 2009. Supporting MapReduce on large-scale asymmetric multi-core clusters. ACM SIGOPS Operating Systems Review 43, 2, 25--34.

Digital Library

[95]

Behnam Robatmili, Dong Li, Hadi Esmaeilzadeh, Sibi Govindan, Aaron Smith, Andrew Putnam, Doug Burger, and Stephen W. Keckler. 2013. How to implement effective prediction and forwarding for fusable dynamic multicore architectures. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA’13). 460--471.

Digital Library

[96]

Rance Rodrigues, Arunachalam Annamalai, Israel Koren, Sandip Kundu, and Omer Khan. 2011. Performance per watt benefits of dynamic core morphing in asymmetric multicores. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT’11). 121--130.

Digital Library

[97]

Rance Rodrigues, Israel Koren, and Sandip Kundu. 2014. Performance and power benefits of sharing execution units between a high performance core and a low power core. In Proceedings of the International Conference on VLSI Design (VLSID’14). 204--209.

Digital Library

[98]

Juan Carlos Saez, Alexandra Fedorova, David Koufaty, and Manuel Prieto. 2012. Leveraging core specialization via OS scheduling to improve performance on asymmetric multicore systems. ACM Transactions on Computer Systems 30, 2, 6:1--6:38.

Digital Library

[99]

Juan Carlos Saez, Alexandra Fedorova, Manuel Prieto, and Hugo Vegas. 2010. Operating system support for mitigating software scalability bottlenecks on asymmetric multicore processors. In Proceedings of the Computing Frontiers Conference (CF’10). 31--40.

Digital Library

[100]

Juan Carlos Saez, Adrian Pousa, Fernando Castro, Daniel Chaver, and Manuel Prieto-Matias. 2015. ACFS: A completely fair scheduler for asymmetric single-ISA multicore systems. In Proceedings of the ACM Symposium on Applied Computing (SAC’15).

Digital Library

[101]

Pierre Salverda and Craig Zilles. 2008. Fundamental performance constraints in horizontal fusion of in-order cores. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA’08). 252--263.

[102]

Samsung. 2013. SAMSUNG Highlights Innovations in Mobile Experiences Driven by Components, in CES Keynote. Retrieved December 29, 2015, from http://www.samsung.com/us/news/20353.

[103]

Karthikeyan Sankaralingam, Ramadass Nagarajan, Haiming Liu, Changkyu Kim, Jaehyuk Huh, Doug Burger, Stephen W. Keckler, and Charles R. Moore. 2003. Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture. In Proceedings of the International Symposium on Computer Architecture (ISCA’03). 422--433.

Digital Library

[104]

Lina Sawalha and Ronald D. Barnes. 2012. Energy-efficient phase-aware scheduling for heterogeneous multicore processors. In Proceedings of the IEEE Green Technologies Conference. 1--6.

[105]

Daniel Shelepov, Juan Carlos Saez Alcaide, Stacey Jeffery, Alexandra Fedorova, Nestor Perez, Zhi Feng Huang, Sergey Blagodurov, and Viren Kumar. 2009. HASS: A scheduler for heterogeneous multicore systems. ACM SIGOPS Operating Systems Review 43, 2, 66--75.

Digital Library

[106]

Tyler Sondag and Hridesh Rajan. 2009. Phase-guided thread-to-core assignment for improved utilization of performance-asymmetric multi-core processors. In Proceedings of the ICSE Workshop on Multicore Software Engineering. 73--80.

Digital Library

[107]

Sudarshan Srinivasan, Nithesh Kurella, Israel Koren, and Sandip Kundu. 2015. Exploring heterogeneity within a core for improved power efficiency. IEEE Transactions on Parallel and Distributed Systems PP, 99, 1.

[108]

Sudarshan Srinivasan, Rance Rodrigues, Arunachalam Annamalai, Israel Koren, and Sandip Kundu. 2013. A study on polymorphing superscalar processor dynamically to improve power efficiency. In Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI’13). 46--51.

[109]

Sadagopan Srinivasan, Li Zhao, Ramesh Illikkal, and Ravishankar Iyer. 2011. Efficient interaction between OS and architecture in heterogeneous platforms. ACM SIGOPS Operating Systems Review 45, 1, 62--72.

Digital Library

[110]

Richard Strong, Jayaram Mudigonda, Jeffrey C. Mogul, Nathan Binkert, and Dean Tullsen. 2009. Fast switching of threads between cores. ACM SIGOPS Operating Systems Review 43, 2, 35--45.

Digital Library

[111]

M. Aater Suleman, Onur Mutlu, José A. Joao, Khubaib, and Yale Patt. 2010. Data marshaling for multi-core architectures. In Proceedings of the International Symposium on Computer Architecture (ISCA’10). 441--450.

Digital Library

[112]

M. Aater Suleman, Onur Mutlu, Moinuddin K. Qureshi, and Yale N. Patt. 2009. Accelerating critical section execution with asymmetric multi-core architectures. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’09). 253--264.

Digital Library

[113]

M. Aater Suleman, Yale N. Patt, Eric Sprangle, Anwar Rohillah, Anwar Ghuloum, and Doug Carmean. 2007. Asymmetric Chip Multiprocessors: Balancing Hardware Efficiency and Programmer Efficiency. TR-HPS-2007-001. University of Texas, Austin, TX.

[114]

Hsin-Ching Sun, Bor-Yeh Shen, Wuu Yang, and Jenq-Kuen Lee. 2011. Migrating Java threads with fuzzy control on asymmetric multicore systems for better energy delay product. In Proceedings of the International Conference on Computing and Security.

[115]

Tao Sun, Hong An, Tao Wang, Haibo Zhang, and Xiufeng Sui. 2012. CRQ-based fair scheduling on composable multicore architectures. In Proceedings of the International Conference on Supercomputing (ICS’12). 173--184.

Digital Library

[116]

Ibrahim Takouna, Wesam Dawoud, and Christoph Meinel. 2011. Efficient virtual machine scheduling-policy for virtualized heterogeneous multicore systems. In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’11).

[117]

David Tarjan, Michael Boyer, and Kevin Skadron. 2008. Federation: Repurposing scalar cores for out-of-order instruction issue. In Proceedings of the Design Automation Conference (DAC’08). 772--775.

Digital Library

[118]

Kenzo Van Craeynest, Shoaib Akram, Wim Heirman, Aamer Jaleel, and Lieven Eeckhout. 2013. Fairness-aware scheduling on single-ISA heterogeneous multi-cores. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT’13). 177--187.

Digital Library

[119]

Kenzo Van Craeynest and Lieven Eeckhout. 2013. Understanding fundamental design choices in single-ISA heterogeneous multicore architectures. ACM Transactions on Architecture and Code Optimization 9, 4, 32.

Digital Library

[120]

Kenzo Van Craeynest, Aamer Jaleel, Lieven Eeckhout, Paolo Narvaez, and Joel Emer. 2012. Scheduling heterogeneous multi-cores through performance impact estimation (PIE). In Proceedings of the International Symposium on Computer Architecture (ISCA’12). 213--224.

Digital Library

[121]

Ashish Venkat and Dean M. Tullsen. 2014. Harnessing ISA diversity: Design of a heterogeneous-ISA chip multiprocessor. In Proceedings of the International Symposium on Computer Architecture (ISCA’14). 121--132.

Digital Library

[122]

Jeffrey Vetter and Sparsh Mittal. 2015. Opportunities for nonvolatile memory systems in extreme-scale high performance computing. Computing in Science and Engineering 17, 2, 73--82.

[123]

Carl A. Waldspurger and William E. Weihl. 1994. Lottery scheduling: Flexible proportional-share resource management. In Proceedings of the USENIX Conference on Operating Systems Design and Implementation (OSDI’94).

[124]

Yasuko Watanabe, John D. Davis, and David A. Wood. 2010. WiDGET: Wisconsin decoupled grid execution tiles. In Proceedings of the International Symposium on Computer Architecture (ISCA’10), Vol. 38. 2--13.

[125]

Ryan Whitwam. 2014. Qualcomm Unveils 64-Bit Snapdragon 808 and 810 SoCs: The Apple A7 Stop-Gap Measures Continue. Retrieved December 29, 2015, from http://goo.gl/v4ywMW.

[126]

Youfeng Wu, Shiliang Hu, Edson Borin, and Cheng Wang. 2011. A HW/SW co-designed heterogeneous multi-core virtual machine for energy-efficient general purpose computing. In Proceedings of the International Symposium on Code Generation and Optimization (CGO’11). 236--245.

[127]

Ying Zhang, Lide Duan, Bin Li, Lu Peng, and Srinivasan Sadagopan. 2014a. Energy efficient job scheduling in single-ISA heterogeneous chip-multiprocessors. In Proceedings of the International Symposium on Quality Electronic Design (ISQED’14). 660--666.

[128]

Ying Zhang, Li Zhao, Ramesh Illikkal, Ravi Iyer, Andrew Herdrich, and Lu Peng. 2014b. QoS management on heterogeneous architecture for parallel applications. In Proceedings of the IEEE International Conference on Computer Design (ICCD’14). 332--339.

[129]

Hongtao Zhong, Steven A. Lieberman, and Scott A. Mahlke. 2007. Extending multicore architectures to exploit hybrid parallelism in single-thread applications. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA’07). 25--36.

[130]

Yuhao Zhu and Vijay Janapa Reddi. 2013. High-performance and energy-efficient mobile web browsing on big/little systems. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA’13). 13--24.

Cited By

Barjami R, Miele A, Mottola L, Shu Y, Liu J, Tan R, He Y, Chen J. (2024). Intermittent Inference: Trading a 1% Accuracy Loss for a 1.9x Throughput Speedup. Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems, 647-660. DOI: 10.1145/3666025.3699364. Online publication date: 4-Nov-2024.
https://dl.acm.org/doi/10.1145/3666025.3699364
Zeng X, Zhang S. (2024). CStream: Parallel Data Stream Compression on Multicore Edge Devices. IEEE Transactions on Knowledge and Data Engineering 36:11, 5889-5904. DOI: 10.1109/TKDE.2024.3386862. Online publication date: Nov-2024.
https://doi.org/10.1109/TKDE.2024.3386862
ÖZ D, Altuntas T. (2023). Scatter search with stochastic beam search on the coalition formation problem. Journal of Industrial and Management Optimization. DOI: 10.3934/jimo.2023119. Online publication date: 2023.
https://doi.org/10.3934/jimo.2023119

Index Terms

A Survey of Techniques for Architecting and Managing Asymmetric Multicore Processors
1. Computer systems organization
2. General and reference
  1. Document types
    1. Reference works

Published In

ACM Computing Surveys Volume 48, Issue 3

February 2016

619 pages

ISSN:0360-0300

EISSN:1557-7341

DOI:10.1145/2856149

Editor:
Sartaj Sahni
Department of Computer and Information Science and Engineering/University of Florida/Gainesville

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 February 2016

Accepted: 01 November 2015

Revised: 01 August 2015

Received: 01 April 2015

Published in CSUR Volume 48, Issue 3

Qualifiers

Survey
Research
Refereed

Funding Sources

Office of Science
Advanced Scientific Computing Research
U.S. Department of Energy
