Performance Portability Across Heterogeneous SoCs Using a Generalized Library-Based Approach

Authors:

Lieven Eeckhout,

Chengyong Wu

ACM Transactions on Architecture and Code Optimization (TACO), Volume 11, Issue 2

Article No.: 21, Pages 1 - 25

https://doi.org/10.1145/2608253

Published: 01 June 2014

Abstract

Because of tight power and energy constraints, industry is progressively shifting toward heterogeneous system-on-chip (SoC) architectures composed of a mix of general-purpose cores along with a number of accelerators. However, such SoC architectures can be very challenging to efficiently program for the vast majority of programmers, due to numerous programming approaches and languages. Libraries, on the other hand, provide a simple way to let programmers take advantage of complex architectures, which does not require programmers to acquire new accelerator-specific or domain-specific languages. Increasingly, library-based, also called algorithm-centric, programming approaches propose to generalize the usage of libraries and to compose programs around these libraries, instead of using libraries as mere complements.

In this article, we present a software framework for achieving performance portability by leveraging a generalized library-based approach. Inspired by the notion of a component, as employed in software engineering and HW/SW codesign, we advocate that nonexpert programmers write simple wrapper code around existing libraries to provide simple but necessary semantic information to the runtime. To achieve performance portability, the runtime employs machine learning (simulated annealing) to select the most appropriate accelerator and its parameters for a given algorithm. This selection factors in the possibly complex composition of algorithms used in the application, the communication among the various accelerators, and the tradeoff between different objectives (i.e., accuracy, performance, and energy).
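To make the selection step concrete, the following is a minimal sketch of simulated annealing over accelerator choices for a small pipeline of algorithms. It is not the framework's implementation: the algorithm names, the candidate table, all time, energy, and error figures, the transfer costs, the additive error model, and the equally weighted cost function are hypothetical values chosen only for illustration.

```python
import math
import random

# Hypothetical candidates per algorithm:
# (accelerator, tuning parameter, time_ms, energy_mj, error).
CANDIDATES = {
    "fft": [("cpu", None, 8.0, 40.0, 0.00),
            ("gpu", None, 3.0, 30.0, 0.00),
            ("inexact", 0.9, 2.0, 12.0, 0.03)],
    "filter": [("cpu", None, 6.0, 30.0, 0.00),
               ("gpu", None, 2.5, 25.0, 0.00),
               ("inexact", 0.8, 1.5, 9.0, 0.04)],
    "kmeans": [("cpu", None, 10.0, 50.0, 0.00),
               ("gpu", None, 4.0, 35.0, 0.00),
               ("inexact", 0.7, 3.0, 15.0, 0.02)],
}
PIPELINE = ["fft", "filter", "kmeans"]  # assumed linear composition
TRANSFER_MS = 1.5    # assumed cost of moving data between accelerators
TRANSFER_MJ = 2.0
ERROR_TARGET = 0.05  # assumed application-level error budget


def cost(assign):
    """Time plus energy, with a penalty when the error budget is exceeded."""
    time = energy = error = 0.0
    prev = None
    for alg in PIPELINE:
        acc, _param, t, e, err = CANDIDATES[alg][assign[alg]]
        time, energy, error = time + t, energy + e, error + err
        if prev is not None and prev != acc:
            # Consecutive stages on different accelerators pay a transfer cost.
            time += TRANSFER_MS
            energy += TRANSFER_MJ
        prev = acc
    penalty = 1000.0 * max(0.0, error - ERROR_TARGET)
    return time + energy + penalty


def anneal(steps=2000, t0=10.0, cooling=0.995, seed=0):
    """Return the best assignment found and its cost."""
    rng = random.Random(seed)
    current = {alg: 0 for alg in PIPELINE}  # start with everything on the CPU
    current_cost = cost(current)
    best, best_cost = dict(current), current_cost
    temp = t0
    for _ in range(steps):
        # Neighbor move: re-pick the candidate for one randomly chosen algorithm.
        alg = rng.choice(PIPELINE)
        neighbor = dict(current)
        neighbor[alg] = rng.randrange(len(CANDIDATES[alg]))
        neighbor_cost = cost(neighbor)
        delta = neighbor_cost - current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = neighbor, neighbor_cost
            if current_cost < best_cost:
                best, best_cost = dict(current), current_cost
        temp *= cooling
    return best, best_cost


if __name__ == "__main__":
    assignment, total = anneal()
    for alg in PIPELINE:
        print(alg, "->", CANDIDATES[alg][assignment[alg]][:2])
    print("cost:", total)
```

Folding the three objectives into a single scalar with a penalty term is only one way to express the tradeoff; a real runtime would presumably obtain the per-candidate figures from measurement or from the semantic information supplied by the wrapper code rather than from a static table.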

Using a set of benchmarks run on a real heterogeneous SoC composed of a multicore processor and a GPU, we show that the runtime overhead is fairly small at 5.1% for the GPU and 6.4% for the multicore processor. We then apply our accelerator selection approach to a simulated SoC platform containing multiple inexact accelerators. We show that accelerator selection together with hardware parameter tuning achieves an average 46.2% energy reduction and a speedup of 2.1× while meeting the desired application error target.

Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Information

Published In

ACM Transactions on Architecture and Code Optimization Volume 11, Issue 2

June 2014

210 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/2639036

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 2014

Accepted: 01 January 2014

Revised: 01 December 2013

Received: 01 June 2013

Published in TACO Volume 11, Issue 2

Qualifiers

Research-article
Research
Refereed
