research-article

Open access

Integrating profile-driven parallelism detection and machine-learning-based mapping

Authors:

Georgios Tournavitis

Michael F. P. O'Boyle

ACM Transactions on Architecture and Code Optimization (TACO), Volume 11, Issue 1

Article No.: 2, Pages 1 - 26

https://doi.org/10.1145/2579561

Published: 01 February 2014

Abstract

Compiler-based auto-parallelization is a much-studied area but has yet to find widespread application. This is largely due to the poor identification and exploitation of application parallelism, resulting in disappointing performance far below that which a skilled expert programmer could achieve. We have identified two weaknesses in traditional parallelizing compilers and propose a novel, integrated approach resulting in significant performance improvements in the generated parallel code. Using profile-driven parallelism detection, we overcome the limitations of static analysis, enabling the identification of more application parallelism, and rely on the user only for final approval. We then replace the traditional target-specific and inflexible mapping heuristics with a machine-learning-based prediction mechanism, resulting in better mapping decisions while automating adaptation to different target architectures. We have evaluated our parallelization strategy on the NAS and SPEC CPU2000 benchmarks and two different multicore platforms (dual quad-core Intel Xeon SMP and dual-socket QS20 Cell blade). We demonstrate that our approach not only yields significant improvements when compared with state-of-the-art parallelizing compilers but also comes close to and sometimes exceeds the performance of manually parallelized codes. On average, our methodology achieves 96% of the performance of the hand-tuned OpenMP NAS and SPEC parallel benchmarks on the Intel Xeon platform and gains a significant speedup for the IBM Cell platform, demonstrating the potential of profile-guided and machine-learning-based parallelization for complex multicore platforms.
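The abstract combines two steps: profiling a loop to find cross-iteration dependences that static analysis cannot rule out, then using a learned model to decide how a candidate loop should be mapped. The Python sketch below illustrates that pipeline under loudly stated assumptions. The trace format, the function names (`cross_iteration_deps`, `predict_mapping`, `map_loop`), the three loop features and the toy training points are all hypothetical. The classifier is a plain 1-nearest-neighbour model chosen for brevity, not the authors' predictor, and privatization and reduction recognition are not modelled.

```python
import math
from collections import defaultdict


def cross_iteration_deps(trace):
    """Return the cross-iteration dependences observed in one profiled run.

    trace: (iteration, op, address) tuples in execution order, op "R" or "W".
    Only accesses made on this input are seen, so an empty result is
    evidence of independence for this run, not a proof.
    """
    last_write = {}           # address -> iteration of the most recent write
    reads = defaultdict(set)  # address -> iterations that read it since then
    deps = set()
    for it, op, addr in trace:
        writer = last_write.get(addr)
        if op == "R":
            if writer is not None and writer != it:
                deps.add(("RAW", writer, it))      # flow dependence
            reads[addr].add(it)
        else:
            if writer is not None and writer != it:
                deps.add(("WAW", writer, it))      # output dependence
            for reader in reads[addr]:
                if reader != it:
                    deps.add(("WAR", reader, it))  # anti dependence
            last_write[addr] = it
            reads[addr] = set()
    return deps


def predict_mapping(features, training):
    """Hypothetical 1-nearest-neighbour model: copy the closest loop's label."""
    _, label = min(training, key=lambda ex: math.dist(ex[0], features))
    return label


def map_loop(trace, features, training):
    # Step 1: any observed cross-iteration dependence keeps the loop sequential.
    if cross_iteration_deps(trace):
        return "sequential"
    # Step 2: otherwise the learned model decides whether parallelizing pays off.
    return predict_mapping(features, training)


# Hypothetical features scaled to [0, 1]:
# (iteration count, work per iteration, fraction of memory operations).
# Toy labels invented for illustration only, not measured results.
TRAINING = [
    ((0.9, 0.8, 0.2), "parallel"),
    ((0.1, 0.1, 0.7), "sequential"),
    ((0.8, 0.2, 0.9), "sequential"),
]

if __name__ == "__main__":
    # a[i] = b[i] * 2: every iteration touches its own addresses.
    independent = [(i, op, addr) for i in range(4)
                   for op, addr in (("R", 100 + i), ("W", 200 + i))]
    # a[i] = a[i-1] + 1: iteration i reads what iteration i-1 wrote.
    recurrence = [(i, op, addr) for i in range(1, 4)
                  for op, addr in (("R", 199 + i), ("W", 200 + i))]
    print(sorted(cross_iteration_deps(recurrence)))  # [('RAW', 1, 2), ('RAW', 2, 3)]
    print(map_loop(independent, (0.85, 0.7, 0.3), TRAINING))  # parallel
    print(map_loop(recurrence, (0.85, 0.7, 0.3), TRAINING))   # sequential
```

Because a profile covers only the inputs it was run on, an empty dependence set is evidence rather than proof that a loop is parallel, which is consistent with the abstract's reliance on the user for final approval.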

Index Terms

1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. General programming languages
      1. Language types
        1. Parallel programming languages

Published In

ACM Transactions on Architecture and Code Optimization Volume 11, Issue 1

February 2014

373 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/2591460

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 February 2014

Accepted: 01 September 2013

Revised: 01 July 2013

Received: 01 June 2012

Published in TACO Volume 11, Issue 1

Qualifiers

Research-article
Research
Refereed

Article Metrics

Total citations: 52

Total downloads: 1,023

Downloads (last 12 months): 107

Downloads (last 6 weeks): 9

Reflects downloads up to 26 Jul 2024
