Modern computer architectures built around high-performance microprocessors offer tremendous potential performance gains over previous designs. Yet their very complexity makes it increasingly difficult to produce efficient code and to realize that potential. This landmark text from two leaders in the field focuses on the pivotal role that compilers can play in addressing this critical issue. The basis for all the methods presented in this book is data dependence, a fundamental compiler analysis tool for optimizing programs on high-performance microprocessors and parallel architectures. It enables compiler designers to write compilers that automatically transform simple, sequential programs into forms that can exploit special features of these modern architectures. The text provides a broad introduction to data dependence, to the many transformation strategies it supports, and to its applications to important optimization problems such as parallelization, compiler management of the memory hierarchy, and instruction scheduling. The authors demonstrate the importance and wide applicability of dependence-based compiler optimizations and give the compiler writer the basics needed to understand and implement them. For computational scientists and engineers who are driven to obtain the best possible performance from their complex applications, they also offer cookbook explanations for transforming applications by hand.
Cited By
- Hofmann M KODA: Knit-program Optimization by Dependency Analysis Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, (1-15)
- Pitchanathan A, Cohen A, Zinenko O and Grosser T Strided Difference Bound Matrices Computer Aided Verification, (279-302)
- Čugurović M, Vujošević Janičić M, Jovanović V and Würthinger T (2024). GraalSP, Journal of Systems and Software, 213:C, Online publication date: 1-Jul-2024.
- Zhao W, Yuan L, Yan B, Ma P, Zhang Y, Wang L and Wang Z Stencil Computation with Vector Outer Product Proceedings of the 38th ACM International Conference on Supercomputing, (247-258)
- Tayeb H, Paillat L and Bramas B (2023). Autovesk: Automatic Vectorized Code Generation from Unstructured Static Kernels Using Graph Transformations, ACM Transactions on Architecture and Code Optimization, 21:1, (1-25), Online publication date: 31-Mar-2024.
- Xu J, Song G, Zhou B, Li F, Hao J and Zhao J A Holistic Approach to Automatic Mixed-Precision Code Generation and Tuning for Affine Programs Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, (55-67)
- Marron M Toward Programming Languages for Reasoning: Humans, Symbolic Systems, and AI Agents Proceedings of the 2023 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software, (136-152)
- Reber B, Gould M, Kneipp A, Liu F, Prechtl I, Ding C, Chen L and Patru D (2023). Cache Programming for Scientific Loops Using Leases, ACM Transactions on Architecture and Code Optimization, 20:3, (1-25), Online publication date: 30-Sep-2023.
- Chen T, Jia H, Zhang Y, Li K, Li Z, Zhao X, Yao J and Li C OpenFFT: An Adaptive Tuning Framework for 3D FFT on ARM Multicore CPUs Proceedings of the 37th International Conference on Supercomputing, (398-409)
- Su Z, Wang D, Yu Z, Yang Y, Jiang Y, Wang R, Chang W, Li W, Cui A and Sun J (2023). PHCG: Optimizing Simulink Code Generation for Embedded System With SIMD Instructions, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 42:4, (1072-1084), Online publication date: 1-Apr-2023.
- Bai A Million.js: A Fast Compiler-Augmented Virtual DOM for the Web Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, (1813-1820)
- Sundararajah K, Saumya C and Kulkarni M (2022). UniRec: a unimodular-like framework for nested recursions and loops, Proceedings of the ACM on Programming Languages, 6:OOPSLA2, (1264-1290), Online publication date: 31-Oct-2022.
- Borum H and Clausen M Transforming domain models to efficient C# for the Danish pension industry Proceedings of the 25th International Conference on Model Driven Engineering Languages and Systems: Companion Proceedings, (766-773)
- Tong G, Yan R, Yang L, Lan M, Zhang J, Cheng Y, Ma W, Lü Y, Ma S and Huang L Optimizing Winograd Convolution on GPUs via Partial Kernel Fusion Network and Parallel Computing, (17-29)
- Praharenka W, Pankratz D, De Carvalho J, Amiri E and Amaral J (2022). Vectorizing divergent control flow with active-lane consolidation on long-vector architectures, The Journal of Supercomputing, 78:10, (12553-12588), Online publication date: 1-Jul-2022.
- Susungi A and Tadonki C (2021). Intermediate Representations for Explicitly Parallel Programs, ACM Computing Surveys, 54:5, (1-24), Online publication date: 30-Jun-2022.
- Khan S, Chatterjee B and Pande S VICO Proceedings of the 36th ACM International Conference on Supercomputing, (1-14)
- Ziraksima M, Lotfi S and Razmara J (2022). Deep reinforcement learning in loop fusion problem, Neurocomputing, 481:C, (102-120), Online publication date: 7-Apr-2022.
- Rocha R, Petoumenos P, Franke B, Bhatotia P and O'Boyle M Loop rolling for code size reduction Proceedings of the 20th IEEE/ACM International Symposium on Code Generation and Optimization, (217-229)
- Abdollahi-Kalkhoran A, Lotfi S and Izadkhah H (2022). TEA-SEA, Expert Systems with Applications: An International Journal, 191:C, Online publication date: 1-Apr-2022.
- Ding C, Chen D, Liu F, Reber B and Smith W (2022). CARL: Compiler Assigned Reference Leasing, ACM Transactions on Architecture and Code Optimization, 19:1, (1-28), Online publication date: 31-Mar-2022.
- Chatarasi P, Kwon H, Parashar A, Pellauer M, Krishna T and Sarkar V (2021). Marvel: A Data-Centric Approach for Mapping Deep Learning Operators on Spatial Accelerators, ACM Transactions on Architecture and Code Optimization, 19:1, (1-26), Online publication date: 31-Mar-2022.
- Liu L, Isaacman S and Kremer U (2021). An Adaptive Application Framework with Customizable Quality Metrics, ACM Transactions on Design Automation of Electronic Systems, 27:2, (1-33), Online publication date: 31-Mar-2022.
- de Souza Neto J, Martins Moreira A, Vargas-Solar G and Musicante M (2022). A two-level formal model for Big Data processing programs, Science of Computer Programming, 215:C, Online publication date: 1-Mar-2022.
- Feng J, He Y, Tao Q, Ma H and Hashmi M (2022). An SLP Vectorization Method Based on Equivalent Extended Transformation, Wireless Communications & Mobile Computing, 2022, Online publication date: 1-Jan-2022.
- Álvarez Casado C and Bordallo López M (2021). Real-time face alignment: evaluation methods, training strategies and implementation optimization, Journal of Real-Time Image Processing, 18:6, (2239-2267), Online publication date: 1-Dec-2021.
- Tao X, Pang J, Xu J and Zhu Y (2021). Compiler-directed scratchpad memory data transfer optimization for multithreaded applications on a heterogeneous many-core architecture, The Journal of Supercomputing, 77:12, (14502-14524), Online publication date: 1-Dec-2021.
- Bednárek D, Kruliš M and Yaghob J (2021). Letting future programmers experience performance-related tasks, Journal of Parallel and Distributed Computing, 155:C, (74-86), Online publication date: 1-Sep-2021.
- Di Luna G, Italiano D, Massarelli L, Österlund S, Giuffrida C and Querzoni L Who’s debugging the debuggers? exposing debug information bugs in optimized binaries Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, (1034-1045)
- Vasiladiotis C, Lozano R, Cole M and Franke B Loop parallelization using dynamic commutativity analysis Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization, (150-161)
- Poesia G and Pereira F (2020). Dynamic dispatch of context-sensitive optimizations, Proceedings of the ACM on Programming Languages, 4:OOPSLA, (1-28), Online publication date: 13-Nov-2020.
- Brinich P and Johnson J Verification of Vectorization of Signal Transforms Languages and Compilers for Parallel Computing, (215-231)
- Lezos C, Dimitroulakos G, Latifis I and Masselos K (2020). A Locality Optimizer for Loop-dominated Applications Based on Reuse Distance Analysis, ACM Transactions on Design Automation of Electronic Systems, 25:6, (1-26), Online publication date: 12-Oct-2020.
- Gharat P, Khedker U and Mycroft A (2020). Generalized Points-to Graphs, ACM Transactions on Programming Languages and Systems, 42:2, (1-78), Online publication date: 30-Jun-2020.
- Prabhu I and Nandivada V Chunking loops with non-uniform workloads Proceedings of the 34th ACM International Conference on Supercomputing, (1-12)
- Gupta S, Purandare S and Ramachandra K Aggify: Lifting the Curse of Cursor Loops using Custom Aggregates Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, (559-573)
- Vasilache N, Zinenko O, Theodoridis T, Goyal P, Devito Z, Moses W, Verdoolaege S, Adams A and Cohen A (2019). The Next 700 Accelerated Layers, ACM Transactions on Architecture and Code Optimization, 16:4, (1-26), Online publication date: 31-Dec-2020.
- Kunft A, Katsifodimos A, Schelter S, Breß S, Rabl T and Markl V (2019). An intermediate representation for optimizing machine learning pipelines, Proceedings of the VLDB Endowment, 12:11, (1553-1567), Online publication date: 1-Jul-2019.
- Jacob D and Singer J ALPyNA: acceleration of loops in Python for novel architectures Proceedings of the 6th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming, (25-34)
- Sundararajah K and Kulkarni M Composable, sound transformations of nested recursion and loops Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, (902-917)
- Zou Y and Lin M Graph-Morphing Proceedings of the 56th Annual Design Automation Conference 2019, (1-6)
- Angerer F, Grimmer A, Prähofer H and Grünbacher P (2019). Change impact analysis for maintenance and evolution of variable software systems, Automated Software Engineering, 26:2, (417-461), Online publication date: 1-Jun-2019.
- Wei J, Gibson G, Gibbons P and Xing E Automating Dependence-Aware Parallelization of Machine Learning Training on Distributed Shared Memory Proceedings of the Fourteenth EuroSys Conference 2019, (1-17)
- Teixeira T, Ancourt C, Padua D and Gropp W Locus: a system and a language for program optimization Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization, (217-228)
- Wang Q, Su P, Chabbi M and Liu X Lightweight hardware transactional memory profiling Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming, (186-200)
- Crago N, Stephenson M and Keckler S (2018). Exposing Memory Access Patterns to Improve Instruction and Memory Efficiency in GPUs, ACM Transactions on Architecture and Code Optimization, 15:4, (1-23), Online publication date: 8-Jan-2019.
- Sato Y, Yuki T and Endo T (2019). An Autotuning Framework for Scalable Execution of Tiled Code via Iterative Polyhedral Compilation, ACM Transactions on Architecture and Code Optimization, 15:4, (1-23), Online publication date: 31-Dec-2019.
- Zhao H, Zheng F, Wu J, Nan B, Li B and Mei K Automatic Parallelization for Binary on Multi-core Platforms Proceedings of the 2nd International Conference on Computer Science and Application Engineering, (1-6)
- Boehm M, Reinwald B, Hutchison D, Sen P, Evfimievski A and Pansare N (2018). On optimizing operator fusion plans for large-scale machine learning in systemML, Proceedings of the VLDB Endowment, 11:12, (1755-1768), Online publication date: 1-Aug-2018.
- Jinyang Y, Rongcai Z, Qi W and Xiaohan T Loop-nest Auto-vectorization Method Based on Benefit Analysis Proceedings of the 2nd International Conference on Advances in Image Processing, (240-244)
- Vahabzadeh A, Stocco A and Mesbah A Fine-grained test minimization Proceedings of the 40th International Conference on Software Engineering, (210-221)
- Stpiczyński P (2018). Language-based vectorization and parallelization using intrinsics, OpenMP, TBB and Cilk Plus, The Journal of Supercomputing, 74:4, (1461-1472), Online publication date: 1-Apr-2018.
- Zhao J and Zhao R (2018). K-DT, The Journal of Supercomputing, 74:4, (1655-1675), Online publication date: 1-Apr-2018.
- Zinenko O, Huot S and Bastoul C (2018). Visual Program Manipulation in the Polyhedral Model, ACM Transactions on Architecture and Code Optimization, 15:1, (1-25), Online publication date: 31-Mar-2018.
- Kotsifakou M, Srivastava P, Sinclair M, Komuravelli R, Adve V and Adve S (2018). HPVM, ACM SIGPLAN Notices, 53:1, (68-80), Online publication date: 23-Mar-2018.
- Shen D, Chabbi M and Liu X An Evaluation of Vectorization and Cache Reuse Tradeoffs on Modern CPUs Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores, (21-30)
- Rodrigues C, Phaosawasdi A and Wu P SIMDization of Small Tensor Multiplication Kernels for Wide SIMD Vector Processors Proceedings of the 2018 4th Workshop on Programming Models for SIMD/Vector Processing, (1-8)
- Lemaitre F, Couturier B and Lacassagne L Small SIMD Matrices for CERN High Throughput Computing Proceedings of the 2018 4th Workshop on Programming Models for SIMD/Vector Processing, (1-8)
- Zinenko O, Verdoolaege S, Reddy C, Shirako J, Grosser T, Sarkar V and Cohen A Modeling the conflicting demands of parallelism and Temporal/Spatial locality in affine scheduling Proceedings of the 27th International Conference on Compiler Construction, (3-13)
- Kotsifakou M, Srivastava P, Sinclair M, Komuravelli R, Adve V and Adve S HPVM Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, (68-80)
- Harris B, Moghaddam M, Kang D, Bae I, Kim E, Min H, Cho H, Kim S, Egger B, Ha S and Choi K Architectures and algorithms for user customization of CNNs Proceedings of the 23rd Asia and South Pacific Design Automation Conference, (540-547)
- Harris B, Moghaddam M, Kang D, Bae I, Kim E, Min H, Cho H, Kim S, Egger B, Ha S and Choi K Architectures and algorithms for user customization of CNNs 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC), (540-547)
- Shrivastava R and Nandivada V (2017). Energy-Efficient Compilation of Irregular Task-Parallel Loops, ACM Transactions on Architecture and Code Optimization, 14:4, (1-29), Online publication date: 20-Dec-2017.
- Ramachandra K, Park K, Emani K, Halverson A, Galindo-Legaria C and Cunningham C (2017). Froid, Proceedings of the VLDB Endowment, 11:4, (432-444), Online publication date: 1-Dec-2017.
- Ramachandra K, Park K, Emani K, Halverson A, Galindo-Legaria C and Cunningham C (2018). Froid, Proceedings of the VLDB Endowment, 11:4, (432-444), Online publication date: 1-Dec-2017.
- Li Z, Liu L, Deng Y, Yin S, Wang Y and Wei S (2017). Aggressive Pipelining of Irregular Applications on Reconfigurable Hardware, ACM SIGARCH Computer Architecture News, 45:2, (575-586), Online publication date: 14-Sep-2017.
- Henriksen T, Serup N, Elsman M, Henglein F and Oancea C (2017). Futhark: purely functional GPU-programming with nested parallelism and in-place array updates, ACM SIGPLAN Notices, 52:6, (556-571), Online publication date: 14-Sep-2017.
- Jensen N and Karlsson S (2017). Improving Loop Dependence Analysis, ACM Transactions on Architecture and Code Optimization, 14:3, (1-24), Online publication date: 6-Sep-2017.
- Li Z, Liu L, Deng Y, Yin S, Wang Y and Wei S Aggressive Pipelining of Irregular Applications on Reconfigurable Hardware Proceedings of the 44th Annual International Symposium on Computer Architecture, (575-586)
- Gupta S, Shrivastava R and Nandivada V Optimizing recursive task parallel programs Proceedings of the International Conference on Supercomputing, (1-11)
- Henriksen T, Serup N, Elsman M, Henglein F and Oancea C Futhark: purely functional GPU-programming with nested parallelism and in-place array updates Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, (556-571)
- Bilardi G, Ekanadham K and Pattnaik P Optimal On-Line Computation of Stack Distances for MIN and OPT Proceedings of the Computing Frontiers Conference, (237-246)
- Sundararajah K, Sakka L and Kulkarni M (2017). Locality Transformations for Nested Recursive Iteration Spaces, ACM SIGPLAN Notices, 52:4, (281-295), Online publication date: 12-May-2017.
- Sundararajah K, Sakka L and Kulkarni M (2017). Locality Transformations for Nested Recursive Iteration Spaces, ACM SIGARCH Computer Architecture News, 45:1, (281-295), Online publication date: 11-May-2017.
- Sundararajah K, Sakka L and Kulkarni M Locality Transformations for Nested Recursive Iteration Spaces Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, (281-295)
- Shi X, Cui B, Dobbie G and Ooi B (2016). UniAD, ACM Transactions on Database Systems, 42:1, (1-42), Online publication date: 2-Mar-2017.
- Shirako J, Hayashi A and Sarkar V Optimized two-level parallelization for GPU accelerators using the polyhedral model Proceedings of the 26th International Conference on Compiler Construction, (22-33)
- Kusano M and Wang C Flow-sensitive composition of thread-modular abstract interpretation Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, (799-809)
- Huang J, Prabhu P, Jablin T, Ghosh S, Apostolakis S, Lee J and August D Speculatively Exploiting Cross-Invocation Parallelism Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, (207-221)
- Kristensen M, Lund S, Blum T and Avery J Fusion of Parallel Array Operations Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, (71-85)
- Agullo E, Buttari A, Guermouche A and Lopez F (2016). Implementing Multifrontal Sparse Solvers for Multicore Architectures with Sequential Task Flow Runtime Systems, ACM Transactions on Mathematical Software, 43:2, (1-22), Online publication date: 2-Sep-2016.
- Truong L, Barik R, Totoni E, Liu H, Markley C, Fox A and Shpeisman T (2016). Latte: a language, compiler, and runtime for elegant and efficient deep neural networks, ACM SIGPLAN Notices, 51:6, (209-223), Online publication date: 1-Aug-2016.
- Sultana N, Calvert A, Overbey J and Arnold G From OpenACC to OpenMP 4 Proceedings of the XSEDE16 Conference on Diversity, Big Data, and Science at Scale, (1-8)
- Agullo E, Bramas B, Coulaud O, Darve E, Messner M and Takahashi T (2016). Task-based FMM for heterogeneous architectures, Concurrency and Computation: Practice & Experience, 28:9, (2608-2629), Online publication date: 25-Jun-2016.
- Truong L, Barik R, Totoni E, Liu H, Markley C, Fox A and Shpeisman T Latte: a language, compiler, and runtime for elegant and efficient deep neural networks Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, (209-223)
- Šinkarovs A and Scholz S (2016). Type-driven data layouts for improved vectorisation, Concurrency and Computation: Practice & Experience, 28:7, (2092-2119), Online publication date: 1-May-2016.
- Lin Y and Lee J (2016). Vector data flow analysis for SIMD optimizations on OpenCL programs, Concurrency and Computation: Practice & Experience, 28:5, (1629-1654), Online publication date: 10-Apr-2016.
- Elkhouly R, El-Mahdy A and Elmasry A Optimality analysis of if-conversion transformation Proceedings of the 24th High Performance Computing Symposium, (1-8)
- Na Y, Kim S and Han Y (2016). JavaScript Parallelizing Compiler for Exploiting Parallelism from Data-Parallel HTML5 Applications, ACM Transactions on Architecture and Code Optimization, 12:4, (1-25), Online publication date: 7-Jan-2016.
- Yiapanis P, Brown G and Luján M (2015). Compiler-Driven Software Speculation for Thread-Level Parallelism, ACM Transactions on Programming Languages and Systems, 38:2, (1-45), Online publication date: 4-Jan-2016.
- Tan M, Liu G, Zhao R, Dai S and Zhang Z ElasticFlow Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, (78-85)
- Ding C, Lu H and Ye C MMC Proceedings of the 2015 International Symposium on Memory Systems, (47-50)
- Stevens J, Tschirhart P and Jacob B The Semantic Gap Between Software and the Memory System Proceedings of the 2015 International Symposium on Memory Systems, (43-46)
- Guo S, Kusano M, Wang C, Yang Z and Gupta A Assertion guided symbolic execution of multithreaded programs Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, (854-865)
- Venkat A, Hall M and Strout M (2015). Loop and data transformations for sparse matrix code, ACM SIGPLAN Notices, 50:6, (521-532), Online publication date: 7-Aug-2015.
- Weijiang Y, Balakrishna S, Liu J and Kulkarni M (2015). Tree dependence analysis, ACM SIGPLAN Notices, 50:6, (314-325), Online publication date: 7-Aug-2015.
- Kotha A, Anand K, Creech T, ElWazeer K, Smithson M, Yellareddy G and Barua R (2015). Affine Parallelization Using Dependence and Cache Analysis in a Binary Rewriter, IEEE Transactions on Parallel and Distributed Systems, 26:8, (2154-2163), Online publication date: 1-Aug-2015.
- Chatty S, Magnaudet M and Prun D Verification of properties of interactive components from their executable code Proceedings of the 7th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, (276-285)
- Aloor R and Nandivada V Unique Worker model for OpenMP Proceedings of the 29th ACM on International Conference on Supercomputing, (47-56)
- Caballero D, Royuela S, Ferrer R, Duran A and Martorell X Optimizing Overlapped Memory Accesses in User-directed Vectorization Proceedings of the 29th ACM on International Conference on Supercomputing, (393-404)
- Venkat A, Hall M and Strout M Loop and data transformations for sparse matrix code Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, (521-532)
- Weijiang Y, Balakrishna S, Liu J and Kulkarni M Tree dependence analysis Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, (314-325)
- Hassaan M, Nguyen D and Pingali K (2015). Kinetic Dependence Graphs, ACM SIGARCH Computer Architecture News, 43:1, (457-471), Online publication date: 29-May-2015.
- Wang D, Janjusic T, Iversen C, Thornton P, Karssovski M, Wu W and Xu Y A scientific function test framework for modular environmental model development Proceedings of the 2015 International Workshop on Software Engineering for High Performance Computing in Science, (16-23)
- Hassaan M, Nguyen D and Pingali K (2015). Kinetic Dependence Graphs, ACM SIGPLAN Notices, 50:4, (457-471), Online publication date: 12-May-2015.
- Streit K, Doerfert J, Hammacher C, Zeller A and Hack S (2015). Generalized Task Parallelism, ACM Transactions on Architecture and Code Optimization, 12:1, (1-25), Online publication date: 16-Apr-2015.
- Hassaan M, Nguyen D and Pingali K Kinetic Dependence Graphs Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, (457-471)
- Kim H, El Hajj I, Stratton J, Lumetta S and Hwu W Locality-centric thread scheduling for bulk-synchronous programming models on CPU architectures Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, (257-268)
- Lazarescu M and Lavagno L (2015). Interactive Trace-Based Analysis Toolset for Manual Parallelization of C Programs, ACM Transactions on Embedded Computing Systems, 14:1, (1-20), Online publication date: 21-Jan-2015.
- Huda Z, Jannesari A and Wolf F (2015). Using Template Matching to Infer Parallel Design Patterns, ACM Transactions on Architecture and Code Optimization, 11:4, (1-21), Online publication date: 9-Jan-2015.
- Kong M, Pop A, Pouchet L, Govindarajan R, Cohen A and Sadayappan P (2015). Compiler/Runtime Framework for Dynamic Dataflow Parallelization of Tiled Programs, ACM Transactions on Architecture and Code Optimization, 11:4, (1-30), Online publication date: 9-Jan-2015.
- Cilardo A and Gallo L (2015). Improving Multibank Memory Access Parallelism with Lattice-Based Partitioning, ACM Transactions on Architecture and Code Optimization, 11:4, (1-25), Online publication date: 9-Jan-2015.
- Yi Q, Wang Q and Cui H Specializing Compiler Optimizations through Programmable Composition for Dense Matrix Computations Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, (596-608)
- Shirako J, Pouchet L and Sarkar V Oil and water can mix Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, (287-298)
- Overbey J, Behrang F and Hafiz M A foundation for refactoring C with macros Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, (75-85)
- Sommer R, Vallentin M, De Carli L and Paxson V HILTI Proceedings of the 2014 Conference on Internet Measurement Conference, (461-474)
- Liu C, Zhang J, Zhou H, McDirmid S, Guo Z and Moscibroda T Automating Distributed Partial Aggregation Proceedings of the ACM Symposium on Cloud Computing, (1-12)
- Campanoni S, Brownell K, Kanev S, Jones T, Wei G and Brooks D (2014). HELIX-RC, ACM SIGARCH Computer Architecture News, 42:3, (217-228), Online publication date: 16-Oct-2014.
- Albert C, Murray A and Ravindran B Applying source level auto-vectorization to Aparapi Java Proceedings of the 2014 International Conference on Principles and Practices of Programming on the Java platform: Virtual machines, Languages, and Tools, (122-132)
- Kusano M and Wang C Assertion guided abstraction Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, (175-186)
- Shi X, Cui B, Dobbie G and Ooi B Towards unified ad-hoc data processing Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, (1263-1274)
- Campanoni S, Brownell K, Kanev S, Jones T, Wei G and Brooks D HELIX-RC Proceedings of the 41st annual international symposium on Computer architecture, (217-228)
- Waterland A, Angelino E, Adams R, Appavoo J and Seltzer M (2014). ASC, ACM SIGARCH Computer Architecture News, 42:1, (575-590), Online publication date: 5-Apr-2014.
- Waterland A, Angelino E, Adams R, Appavoo J and Seltzer M (2014). ASC, ACM SIGPLAN Notices, 49:4, (575-590), Online publication date: 5-Apr-2014.
- Kim T and Hoskote Y Automatic generation of custom SIMD instructions for superword level parallelism Proceedings of the conference on Design, Automation & Test in Europe, (1-6)
- Boehm M, Tatikonda S, Reinwald B, Sen P, Tian Y, Burdick D and Vaithyanathan S (2014). Hybrid parallelization strategies for large-scale machine learning in SystemML, Proceedings of the VLDB Endowment, 7:7, (553-564), Online publication date: 1-Mar-2014.
- Waterland A, Angelino E, Adams R, Appavoo J and Seltzer M ASC Proceedings of the 19th international conference on Architectural support for programming languages and operating systems, (575-590)
- Lacassagne L, Etiemble D, Hassan Zahraee A, Dominguez A and Vezolle P High level transforms for SIMD and low-level computer vision algorithms Proceedings of the 2014 Workshop on Programming models for SIMD/Vector processing, (49-56)
- Venkat A, Shantharam M, Hall M and Strout M Non-affine Extensions to Polyhedral Code Generation Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, (185-194)
- Ketterlin A and Clauss P (2014). Recovering memory access patterns of executable programs, Science of Computer Programming, 80:PB, (440-456), Online publication date: 1-Feb-2014.
- Wang Z, Tournavitis G, Franke B and O'Boyle M (2014). Integrating profile-driven parallelism detection and machine-learning-based mapping, ACM Transactions on Architecture and Code Optimization, 11:1, (1-26), Online publication date: 1-Feb-2014.
- Brock J, Gu X, Bao B and Ding C (2013). Pacman, ACM SIGPLAN Notices, 48:11, (39-50), Online publication date: 4-Dec-2013.
- Fauzia N, Elango V, Ravishankar M, Ramanujam J, Rastello F, Rountev A, Pouchet L and Sadayappan P (2013). Beyond reuse distance analysis, ACM Transactions on Architecture and Code Optimization, 10:4, (1-29), Online publication date: 1-Dec-2013.
- Ravi N, Yang Y, Bao T and Chakradhar S Semi-automatic restructuring of offloadable tasks for many-core accelerators Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, (1-12)
- Seo S, Lee J, Jo G and Lee J Automatic OpenCL work-group size selection for multicore CPUs Proceedings of the 22nd international conference on Parallel architectures and compilation techniques, (387-398)
- Govindaraju V, Nowatzki T and Sankaralingam K Breaking SIMD shackles with an exposed flexible microarchitecture and the access execute PDG Proceedings of the 22nd international conference on Parallel architectures and compilation techniques, (341-352)
- Compiler-directed memory hierarchy design for low-energy embedded systems Proceedings of the Eleventh ACM/IEEE International Conference on Formal Methods and Models for Codesign, (147-156)
- Henriksen T and Oancea C A T2 graph-reduction approach to fusion Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing, (47-58)
- Liu P, Huang C, Guo J, Geng Y, Wang W and Yang M Scalable-Grain Pipeline Parallelization Method for Multi-core Systems Proceedings of the 10th IFIP International Conference on Network and Parallel Computing - Volume 8147, (269-283)
- Ding C and Liu L Access Annotation for Safe Program Parallelization Proceedings of the 10th IFIP International Conference on Network and Parallel Computing - Volume 8147, (13-26)
- Papakonstantinou A, Gururaj K, Stratton J, Chen D, Cong J and Hwu W (2013). Efficient compilation of CUDA kernels for high-performance computing on FPGAs, ACM Transactions on Embedded Computing Systems, 13:2, (1-26), Online publication date: 1-Sep-2013.
- Agullo E, Buttari A, Guermouche A and Lopez F Multifrontal QR factorization for multicore architectures over runtime systems Proceedings of the 19th international conference on Parallel Processing, (521-532)
- Barthe G, Crespo J, Gulwani S, Kunz C and Marron M (2013). From relational verification to SIMD loop synthesis, ACM SIGPLAN Notices, 48:8, (123-134), Online publication date: 23-Aug-2013.
- Benoit A, Çatalyürek Ü, Robert Y and Saule E (2013). A survey of pipelined workflow scheduling, ACM Computing Surveys, 45:4, (1-36), Online publication date: 1-Aug-2013.
- Waterland A, Angelino E, Cubuk E, Kaxiras E, Adams R, Appavoo J and Seltzer M Computational caches Proceedings of the 6th International Systems and Storage Conference, (1-7)
- Sheffield D, Anderson M and Keutzer K Three fingered jack Proceedings of the 5th USENIX Conference on Hot Topics in Parallelism, (2-2)
- Johnson N, Oh T, Zaks A and August D (2013). Fast condensation of the program dependence graph, ACM SIGPLAN Notices, 48:6, (39-50), Online publication date: 23-Jun-2013.
- Kong M, Veras R, Stock K, Franchetti F, Pouchet L and Sadayappan P (2013). When polyhedral transformations meet SIMD code generation, ACM SIGPLAN Notices, 48:6, (127-138), Online publication date: 23-Jun-2013.
- Brock J, Gu X, Bao B and Ding C Pacman Proceedings of the 2013 international symposium on memory management, (39-50)
- Johnson N, Oh T, Zaks A and August D Fast condensation of the program dependence graph Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, (39-50)
- Kong M, Veras R, Stock K, Franchetti F, Pouchet L and Sadayappan P When polyhedral transformations meet SIMD code generation Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, (127-138)
- Alle M, Morvan A and Derrien S Runtime dependency analysis for loop pipelining in high-level synthesis Proceedings of the 50th Annual Design Automation Conference, (1-10)
- Papakonstantinou A, Chen D, Hwu W, Cong J and Liang Y Throughput-oriented kernel porting onto FPGAs Proceedings of the 50th Annual Design Automation Conference, (1-10)
- Leung A, Lhoták O and Lashari G (2013). Parallel execution of Java loops on Graphics Processing Units, Science of Computer Programming, 78:5, (458-480), Online publication date: 1-May-2013.
- Oh T, Kim H, Johnson N, Lee J and August D (2013). Practical automatic loop specialization, ACM SIGPLAN Notices, 48:4, (419-430), Online publication date: 23-Apr-2013.
- Xiang X, Ding C, Luo H and Bao B (2013). HOTL, ACM SIGPLAN Notices, 48:4, (343-356), Online publication date: 23-Apr-2013.
- Nandivada V, Shirako J, Zhao J and Sarkar V (2013). A Transformation Framework for Optimizing Task-Parallel Programs, ACM Transactions on Programming Languages and Systems, 35:1, (1-48), Online publication date: 1-Apr-2013.
- Oh T, Kim H, Johnson N, Lee J and August D (2013). Practical automatic loop specialization, ACM SIGARCH Computer Architecture News, 41:1, (419-430), Online publication date: 29-Mar-2013.
- Xiang X, Ding C, Luo H and Bao B (2013). HOTL, ACM SIGARCH Computer Architecture News, 41:1, (343-356), Online publication date: 29-Mar-2013.
- Vasilache N, Baskaran M, Meister B and Lethin R Memory reuse optimizations in the R-Stream compiler Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, (42-53)
- Oh T, Kim H, Johnson N, Lee J and August D Practical automatic loop specialization Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems, (419-430)
- Xiang X, Ding C, Luo H and Bao B HOTL Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems, (343-356)
- Barthe G, Crespo J, Gulwani S, Kunz C and Marron M From relational verification to SIMD loop synthesis Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming, (123-134)
- August D, Huang J, Beard S, Johnson N and Jablin T Automatically exploiting cross-invocation parallelism using runtime information Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), (1-11)
- O'Boyle M, Wang Z and Grewe D Portable mapping of data parallel programs to OpenCL for heterogeneous systems Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), (1-10)
- Pouchet L, Zhang P, Sadayappan P and Cong J Polyhedral-based data reuse optimization for configurable computing Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays, (29-38)
- Bocchino R Alias control for deterministic parallelism Aliasing in Object-Oriented Programming, (156-195)
- Verdoolaege S, Carlos Juega J, Cohen A, Ignacio Gómez J, Tenllado C and Catthoor F (2013). Polyhedral parallel code generation for CUDA, ACM Transactions on Architecture and Code Optimization, 9:4, (1-23), Online publication date: 1-Jan-2013.
- Baghdadi R, Cohen A, Verdoolaege S and Trifunović K (2013). Improved loop tiling based on the removal of spurious false dependences, ACM Transactions on Architecture and Code Optimization, 9:4, (1-26), Online publication date: 1-Jan-2013.
- Cui H, Yi Q, Xue J and Feng X (2013). Layout-oblivious compiler optimization for matrix computations, ACM Transactions on Architecture and Code Optimization, 9:4, (1-20), Online publication date: 1-Jan-2013.
- Xydis S, Pekmestzi K, Soudris D and Economakos G (2013). Compiler-in-the-loop exploration during datapath synthesis for higher quality delay-area trade-offs, ACM Transactions on Design Automation of Electronic Systems, 18:1, (1-35), Online publication date: 1-Jan-2013.
- Ketterlin A and Clauss P Profiling Data-Dependence to Assist Parallelization Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, (437-448)
- Jo Y and Kulkarni M (2012). Automatically enhancing locality for tree traversals with traversal splicing, ACM SIGPLAN Notices, 47:10, (355-374), Online publication date: 15-Nov-2012.
- Li P, Wang Y, Zhang P, Luo G, Wang T and Cong J Memory partitioning and scheduling co-optimization in behavioral synthesis Proceedings of the International Conference on Computer-Aided Design, (488-495)
- Jo Y and Kulkarni M Automatically enhancing locality for tree traversals with traversal splicing Proceedings of the ACM international conference on Object oriented programming systems languages and applications, (355-374)
- Guo Z, Fan X, Chen R, Zhang J, Zhou H, McDirmid S, Liu C, Lin W, Zhou J and Zhou L Spotting code optimizations in data-parallel pipelines through PeriSCOPE Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation, (121-133)
- Raman A, Lee J and August D From sequential programming to flexible parallel execution Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems, (37-40)
- Pellegrini S, Hoefler T and Fahringer T Exact dependence analysis for increased communication overlap Proceedings of the 19th European conference on Recent Advances in the Message Passing Interface, (89-99)
- Oancea C, Andreetta C, Berthold J, Frisch A and Henglein F Financial software on GPUs Proceedings of the 1st ACM SIGPLAN workshop on Functional high-performance computing, (61-72)
- Kim S and Han H (2012). Efficient SIMD code generation for irregular kernels, ACM SIGPLAN Notices, 47:8, (55-64), Online publication date: 11-Sep-2012.
- Bikker J (2012). Improving Data Locality for Efficient In-Core Path Tracing, Computer Graphics Forum, 31:6, (1936-1947), Online publication date: 1-Sep-2012.
- Oancea C and Rauchwerger L (2012). Logical inference techniques for loop parallelization, ACM SIGPLAN Notices, 47:6, (509-520), Online publication date: 6-Aug-2012.
- Holewinski J, Ramamurthi R, Ravishankar M, Fauzia N, Pouchet L, Rountev A and Sadayappan P (2012). Dynamic trace-based analysis of vectorization potential of applications, ACM SIGPLAN Notices, 47:6, (371-382), Online publication date: 6-Aug-2012.
- Raman A, Zaks A, Lee J and August D (2012). Parcae, ACM SIGPLAN Notices, 47:6, (133-144), Online publication date: 6-Aug-2012.
- Yu H and Li Z Fast loop-level data dependence profiling Proceedings of the 26th ACM international conference on Supercomputing, (37-46)
- Ramachandra K, Guravannavar R and Sudarshan S Program analysis and transformation for holistic optimization of database applications Proceedings of the ACM SIGPLAN International Workshop on State of the Art in Java Program analysis, (39-44)
- Oancea C and Rauchwerger L Logical inference techniques for loop parallelization Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, (509-520)
- Holewinski J, Ramamurthi R, Ravishankar M, Fauzia N, Pouchet L, Rountev A and Sadayappan P Dynamic trace-based analysis of vectorization potential of applications Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, (371-382)
- Raman A, Zaks A, Lee J and August D Parcae Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, (133-144)
- Xu G, Yan D and Rountev A Static detection of loop-invariant data structures Proceedings of the 26th European conference on Object-Oriented Programming, (738-763)
- Cong J, Zhang P and Zou Y Optimizing memory hierarchy allocation with loop transformations for high-level synthesis Proceedings of the 49th Annual Design Automation Conference, (1233-1238)
- Campanoni S, Jones T, Holloway G, Wei G and Brooks D The HELIX project Proceedings of the 49th Annual Design Automation Conference, (277-282)
- Park Y, Seo S, Park H, Cho H and Mahlke S (2012). SIMD defragmenter, ACM SIGPLAN Notices, 47:4, (363-374), Online publication date: 1-Jun-2012.
- Bao B, Ding C, Gao Y and Archambault R Delta Send-Recv for Dynamic Pipelining in MPI Programs Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012), (384-392)
- Zhang J, Zhou H, Chen R, Fan X, Guo Z, Lin H, Li J, Lin W, Zhou J and Zhou L Optimizing data shuffling in data-parallel computation by understanding user-defined functions Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation, (22-22)
- Park Y, Seo S, Park H, Cho H and Mahlke S (2012). SIMD defragmenter, ACM SIGARCH Computer Architecture News, 40:1, (363-374), Online publication date: 18-Apr-2012.
- Zhou X, Giacalone J, Garzarán M, Kuhn R, Ni Y and Padua D Hierarchical overlapped tiling Proceedings of the Tenth International Symposium on Code Generation and Optimization, (207-218)
- Campanoni S, Jones T, Holloway G, Reddi V, Wei G and Brooks D HELIX Proceedings of the Tenth International Symposium on Code Generation and Optimization, (84-93)
- Unkule S, Shaltz C and Qasem A Automatic restructuring of GPU kernels for exploiting inter-thread data locality Proceedings of the 21st international conference on Compiler Construction, (21-40)
- Park Y, Seo S, Park H, Cho H and Mahlke S SIMD defragmenter Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems, (363-374)
- Qasem A Efficient execution of time-step computations with pipelined parallelism and inter-thread data locality optimizations Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores, (27-35)
- Kim S and Han H Efficient SIMD code generation for irregular kernels Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming, (55-64)
- Burrows E and Haveraaen M Programmable data dependencies and placements Proceedings of the 7th workshop on Declarative aspects and applications of multicore programming, (31-40)
- Stock K, Pouchet L and Sadayappan P (2012). Using machine learning to improve automatic vectorization, ACM Transactions on Architecture and Code Optimization, 8:4, (1-23), Online publication date: 1-Jan-2012.
- Feng M, Lin C and Gupta R (2012). PLDS, ACM Transactions on Architecture and Code Optimization, 8:4, (1-21), Online publication date: 1-Jan-2012.
- Owaida M, Bellas N, Antonopoulos C, Daloukas K and Antoniadis C Massively parallel programming models used as hardware description languages Proceedings of the International Conference on Computer-Aided Design, (326-333)
- Cong J, Zhang P and Zou Y Combined loop transformation and hierarchy allocation for data reuse optimization Proceedings of the International Conference on Computer-Aided Design, (185-192)
- Overbey J and Johnson R Differential precondition checking Proceedings of the 26th IEEE/ACM International Conference on Automated Software Engineering, (303-312)
- Jo Y and Kulkarni M Enhancing locality for recursive traversals of recursive structures Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications, (463-482)
- Ke C, Liu L, Zhang C, Bai T, Jacobs B and Ding C Safe parallel programming using dynamic dependence hints Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications, (243-258)
- Jo Y and Kulkarni M (2011). Enhancing locality for recursive traversals of recursive structures, ACM SIGPLAN Notices, 46:10, (463-482), Online publication date: 18-Oct-2011.
- Ke C, Liu L, Zhang C, Bai T, Jacobs B and Ding C (2011). Safe parallel programming using dynamic dependence hints, ACM SIGPLAN Notices, 46:10, (243-258), Online publication date: 18-Oct-2011.
- Smith A and Kulkarni P Localizing globals and statics to make C programs thread-safe Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems, (205-214)
- Misailovic S, Roy D and Rinard M Probabilistically accurate program transformations Proceedings of the 18th international conference on Static analysis, (316-333)
- Burak D and Chudzik M Parallelization of the discrete chaotic block encryption algorithm Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part II, (323-332)
- Kalinnik N, Korch M and Rauber T (2011). An efficient time-step-based self-adaptive algorithm for predictor-corrector methods of Runge-Kutta type, Journal of Computational and Applied Mathematics, 236:3, (394-410), Online publication date: 1-Sep-2011.
- Krzikalla O, Feldhoff K, Müller-Pfefferkorn R and Nagel W Scout Proceedings of the 2011 international conference on Parallel Processing - Volume 2, (137-145)
- Donaldson A, Kaiser A, Kroening D and Wahl T Symmetry-aware predicate abstraction for shared-variable concurrent programs Proceedings of the 23rd international conference on Computer aided verification, (356-371)
- Cong J, Huang H, Liu C and Zou Y A reuse-aware prefetching scheme for scratchpad memory Proceedings of the 48th Design Automation Conference, (960-965)
- Udupa A, Rajan K and Thies W ALTER Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, (480-491)
- Sato S and Iwasaki H Automatic parallelization via matrix multiplication Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, (470-479)
- Raman A, Kim H, Oh T, Lee J and August D Parallelism orchestration using DoPE Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, (26-37)
- Pingali K, Nguyen D, Kulkarni M, Burtscher M, Hassaan M, Kaleem R, Lee T, Lenharth A, Manevich R, Méndez-Lojo M, Prountzos D and Sui X The tao of parallelism in algorithms Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, (12-25)
- Prabhu P, Ghosh S, Zhang Y, Johnson N and August D Commutative set Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, (1-11)
- Udupa A, Rajan K and Thies W (2011). ALTER, ACM SIGPLAN Notices, 46:6, (480-491), Online publication date: 4-Jun-2011.
- Sato S and Iwasaki H (2011). Automatic parallelization via matrix multiplication, ACM SIGPLAN Notices, 46:6, (470-479), Online publication date: 4-Jun-2011.
- Raman A, Kim H, Oh T, Lee J and August D (2011). Parallelism orchestration using DoPE, ACM SIGPLAN Notices, 46:6, (26-37), Online publication date: 4-Jun-2011.
- Pingali K, Nguyen D, Kulkarni M, Burtscher M, Hassaan M, Kaleem R, Lee T, Lenharth A, Manevich R, Méndez-Lojo M, Prountzos D and Sui X (2011). The tao of parallelism in algorithms, ACM SIGPLAN Notices, 46:6, (12-25), Online publication date: 4-Jun-2011.
- Prabhu P, Ghosh S, Zhang Y, Johnson N and August D (2011). Commutative set, ACM SIGPLAN Notices, 46:6, (1-11), Online publication date: 4-Jun-2011.
- McFarlin D, Arbatov V, Franchetti F and Püschel M Automatic SIMD vectorization of fast Fourier transforms for the Larrabee and AVX instruction sets Proceedings of the international conference on Supercomputing, (265-274)
- Rahman S, Yi Q and Qasem A Understanding stencil code performance on multicore architectures Proceedings of the 8th ACM International Conference on Computing Frontiers, (1-10)
- Bilardi G, Ekanadham K and Pattnaik P Efficient stack distance computation for priority replacement policies Proceedings of the 8th ACM International Conference on Computing Frontiers, (1-10)
- Newburn C, So B, Liu Z, McCool M, Ghuloum A, Toit S, Wang Z, Du Z, Chen Y, Wu G, Guo P, Liu Z and Zhang D Intel's Array Building Blocks Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization, (224-235)
- Kandemir M, Zhang Y, Liu J and Yemliha T Neighborhood-aware data locality optimization for NoC-based multicores Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization, (191-200)
- Nuzman D, Dyshel S, Rohou E, Rosen I, Williams K, Yuste D, Cohen A and Zaks A Vapor SIMD Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization, (151-160)
- Henretty T, Stock K, Pouchet L, Franchetti F, Ramanujam J and Sadayappan P Data layout transformation for stencil computations on short-vector SIMD architectures Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software, (225-245)
- Kalinnik N, Korch M and Rauber T Dynamic selection of implementation variants of sequential iterated Runge-Kutta methods with tile size sampling Proceedings of the 2nd ACM/SPEC International Conference on Performance engineering, (189-200)
- Cong J, Jiang W, Liu B and Zou Y (2011). Automatic memory partitioning and scheduling for throughput and power optimization, ACM Transactions on Design Automation of Electronic Systems, 16:2, (1-25), Online publication date: 1-Mar-2011.
- Liu M, Sha E, Zhuge Q, He Y and Qiu M (2011). Loop Distribution and Fusion with Timing and Code Size Optimization, Journal of Signal Processing Systems, 62:3, (325-340), Online publication date: 1-Mar-2011.
- Daloukas K, Antonopoulos C and Bellas N GLOpenCL Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers, (15-24)
- Qiu M, Niu J, Yang L, Qin X, Zhang S and Wang B Energy-Aware Loop Parallelism Maximization for Multi-core DSP Architectures Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing, (205-212)
- Barik R, Zhao J and Sarkar V Efficient Selection of Vector Instructions Using Dynamic Programming Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, (201-212)
- Kotha A, Anand K, Smithson M, Yellareddy G and Barua R Automatic Parallelization in a Binary Rewriter Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, (547-557)
- Kim H, Raman A, Liu F, Lee J and August D Scalable Speculative Parallelization on Commodity Clusters Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, (3-14)
- Pouchet L, Bondhugula U, Bastoul C, Cohen A, Ramanujam J and Sadayappan P Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, (1-11)
- Li G and Gopalakrishnan G Scalable SMT-based verification of GPU kernel functions Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering, (187-196)
- Yu C and Petrov P (2010). Energy- and Performance-Efficient Communication Framework for Embedded MPSoCs through Application-Driven Release Consistency, ACM Transactions on Design Automation of Electronic Systems, 16:1, (1-39), Online publication date: 1-Nov-2010.
- Palem K Compilers, architectures and synthesis for embedded computing Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems, (167-176)
- Méndez-Lojo M, Mathew A and Pingali K (2010). Parallel inclusion-based points-to analysis, ACM SIGPLAN Notices, 45:10, (428-443), Online publication date: 17-Oct-2010.
- Herzeel C and Costanza P (2010). Dynamic parallelization of recursive code, ACM SIGPLAN Notices, 45:10, (377-396), Online publication date: 17-Oct-2010.
- Tian K, Jiang Y, Zhang E and Shen X (2010). An input-centric paradigm for program dynamic optimizations, ACM SIGPLAN Notices, 45:10, (125-139), Online publication date: 17-Oct-2010.
- Chakraborty S and Nandivada V Inferring arbitrary distributions for data and computation Proceedings of the ACM international conference companion on Object oriented programming systems languages and applications companion, (51-60)
- Méndez-Lojo M, Mathew A and Pingali K Parallel inclusion-based points-to analysis Proceedings of the ACM international conference on Object oriented programming systems languages and applications, (428-443)
- Herzeel C and Costanza P Dynamic parallelization of recursive code Proceedings of the ACM international conference on Object oriented programming systems languages and applications, (377-396)
- Tian K, Jiang Y, Zhang E and Shen X An input-centric paradigm for program dynamic optimizations Proceedings of the ACM international conference on Object oriented programming systems languages and applications, (125-139)
- Afek Y, Korland G and Zilberstein A Lowering STM overhead with static analysis Proceedings of the 23rd international conference on Languages and compilers for parallel computing, (31-45)
- Philippidis C and Shang W (2010). On minimizing register usage of linearly scheduled algorithms with uniform dependencies, Computer Languages, Systems and Structures, 36:3, (250-267), Online publication date: 1-Oct-2010.
- Qasem A, Guo J, Rahman F and Yi Q Exposing tunable parameters in multi-threaded numerical code Proceedings of the 2010 IFIP international conference on Network and parallel computing, (46-60)
- Nie J, Cheng B, Li S, Wang L and Li X Vectorization for Java Proceedings of the 2010 IFIP international conference on Network and parallel computing, (3-17)
- Vandierendonck H, Rul S and De Bosschere K The Paralax infrastructure Proceedings of the 19th international conference on Parallel architectures and compilation techniques, (389-400)
- Tournavitis G and Franke B Semi-automatic extraction and exploitation of hierarchical pipeline parallelism using profiling information Proceedings of the 19th international conference on Parallel architectures and compilation techniques, (377-388)
- Lee J, Kim J, Seo S, Kim S, Park J, Kim H, Dao T, Cho Y, Seo S, Lee S, Cho S, Song H, Suh S and Choi J An OpenCL framework for heterogeneous multicores with local memory Proceedings of the 19th international conference on Parallel architectures and compilation techniques, (193-204)
- Zhao J, Shirako J, Nandivada V and Sarkar V Reducing task creation and termination overhead in explicitly parallel programs Proceedings of the 19th international conference on Parallel architectures and compilation techniques, (169-180)
- Purnaprajna M, Porrmann M, Rueckert U, Hussmann M, Thies M and Kastens U (2010). Runtime Reconfiguration of Multiprocessors Based on Compile-Time Analysis, ACM Transactions on Reconfigurable Technology and Systems, 3:3, (1-25), Online publication date: 1-Sep-2010.
- Lionetti F, McCulloch A and Baden S Source-to-source optimization of CUDA C for GPU accelerated cardiac cell modeling Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I, (38-49)
- Mak J, Faxén K, Janson S and Mycroft A Estimating and exploiting potential parallelism by source-level dependence profiling Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I, (26-37)
- Tian C, Feng M and Gupta R (2010). Speculative parallelization using state separation and multiple value prediction, ACM SIGPLAN Notices, 45:8, (63-72), Online publication date: 1-Aug-2010.
- Agullo E, Bouwmeester H, Dongarra J, Kurzak J, Langou J and Rosenberg L Towards an efficient tile matrix inversion of symmetric positive definite matrices on multicore architectures Proceedings of the 9th international conference on High performance computing for computational science, (129-138)
- Kandemir M, Muralidhara S, Karakoy M and Son S Computation mapping for multi-level storage cache hierarchies Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, (179-190)
- Tian C, Feng M and Gupta R Speculative parallelization using state separation and multiple value prediction Proceedings of the 2010 international symposium on Memory management, (63-72)
- Zhang E, Jiang Y and Shen X (2010). Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?, ACM SIGPLAN Notices, 45:5, (203-212), Online publication date: 1-May-2010.
- Harper K, Zheng J and Mahate S Experiences in initiating concurrency software research efforts Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 2, (139-148)
- Steinberg R (2010). Mapping loop nests to multipipelined architecture, Programming and Computing Software, 36:3, (177-185), Online publication date: 1-May-2010.
- Jiang Y, Zhang E, Tian K, Mao F, Gethers M, Shen X and Gao Y Exploiting statistical correlations for proactive prediction of program behaviors Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization, (248-256)
- Huang J, Raman A, Jablin T, Zhang Y, Hung T and August D Decoupled software pipelining creates parallelization opportunities Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization, (121-130)
- Chen N and Johnson R Patterns for cache optimizations on multi-processor machines Proceedings of the 2010 Workshop on Parallel Programming Patterns, (1-10)
- Hormati A, Choi Y, Woh M, Kudlur M, Rabbah R, Mudge T and Mahlke S MacroSS Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systems, (285-296)
- Raman A, Kim H, Mason T, Jablin T and August D Speculative parallelization using software multi-threaded transactions Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systems, (65-76)
- Hormati A, Choi Y, Woh M, Kudlur M, Rabbah R, Mudge T and Mahlke S (2010). MacroSS, ACM SIGPLAN Notices, 45:3, (285-296), Online publication date: 5-Mar-2010.
- Raman A, Kim H, Mason T, Jablin T and August D (2010). Speculative parallelization using software multi-threaded transactions, ACM SIGPLAN Notices, 45:3, (65-76), Online publication date: 5-Mar-2010.
- Hormati A, Choi Y, Woh M, Kudlur M, Rabbah R, Mudge T and Mahlke S (2010). MacroSS, ACM SIGARCH Computer Architecture News, 38:1, (285-296), Online publication date: 5-Mar-2010.
- Raman A, Kim H, Mason T, Jablin T and August D (2010). Speculative parallelization using software multi-threaded transactions, ACM SIGARCH Computer Architecture News, 38:1, (65-76), Online publication date: 5-Mar-2010.
- Askitis N and Zobel J (2011). Redesigning the string hash table, burst trie, and BST to exploit cache, ACM Journal of Experimental Algorithmics, 15, (1.1-1.61), Online publication date: 1-Mar-2010.
- Zhang E, Jiang Y and Shen X Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs? Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, (203-212)
- Renganarayana L, Bondhugula U, Derisavi S, Eichenberger A and O'Brien K Compact multi-dimensional kernel extraction for register tiling Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, (1-12)
- Cong J, Jiang W, Liu B and Zou Y Automatic memory partitioning and scheduling for throughput and power optimization Proceedings of the 2009 International Conference on Computer-Aided Design, (697-704)
- Barik R, Budimlic Z, Cavè V, Chatterjee S, Guo Y, Peixotto D, Raman R, Shirako J, Taşırlar S, Yan Y, Zhao Y and Sarkar V The habanero multicore software research project Proceedings of the 24th ACM SIGPLAN conference companion on Object oriented programming systems languages and applications, (735-736)
- Liu D, Shao Z, Wang M, Guo M and Xue J Optimal loop parallelization for maximizing iteration-level parallelism Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems, (67-76)
- Kwiatkowski J and Iwaszyn R Automatic program parallelization for multicore processors Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I, (236-245)
- Bielecki W and Palkowski M Extracting both affine and non-linear synchronization-free slices in program loops Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I, (196-205)
- Kulkarni M, Pingali K, Walter B, Ramanarayanan G, Bala K and Chew L (2009). Optimistic parallelism requires abstractions, Communications of the ACM, 52:9, (89-97), Online publication date: 1-Sep-2009.
- Leung A, Lhoták O and Lashari G Automatic parallelization for graphics processing units Proceedings of the 7th International Conference on Principles and Practice of Programming in Java, (91-100)
- Zhong Y, Shen X and Ding C (2009). Program locality analysis using reuse distance, ACM Transactions on Programming Languages and Systems, 31:6, (1-39), Online publication date: 1-Aug-2009.
- Bilardi G, Ekanadham K and Pattnaik P (2009). On approximating the ideal random access machine by physical machines, Journal of the ACM, 56:5, (1-57), Online publication date: 1-Aug-2009.
- Mak J and Mycroft A Limits of parallelism using dynamic dependency graphs Proceedings of the Seventh International Workshop on Dynamic Analysis, (42-48)
- Long S and Fursin G (2009). Systematic search within an optimisation space based on Unified Transformation Framework, International Journal of Computational Science and Engineering, 4:2, (102-111), Online publication date: 1-Jul-2009.
- Tournavitis G, Wang Z, Franke B and O'Boyle M Towards a holistic approach to auto-parallelization Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation, (177-187)
- Mehrara M, Hao J, Hsu P and Mahlke S Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation, (166-176)
- Shirako J, Zhao J, Nandivada V and Sarkar V Chunking parallel loops in the presence of synchronization Proceedings of the 23rd international conference on Supercomputing, (181-192)
- Tournavitis G, Wang Z, Franke B and O'Boyle M (2009). Towards a holistic approach to auto-parallelization, ACM SIGPLAN Notices, 44:6, (177-187), Online publication date: 28-May-2009.
- Mehrara M, Hao J, Hsu P and Mahlke S (2009). Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory, ACM SIGPLAN Notices, 44:6, (166-176), Online publication date: 28-May-2009.
- Liao C, Quinlan D, Willcock J and Panas T Extending Automatic Parallelization to Optimize High-Level Abstractions for Multicore Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism, (28-41)
- Dimitroulakos G, Kostaras N, Galanis M and Goutis C (2009). Compiler assisted architectural exploration framework for coarse grained reconfigurable arrays, The Journal of Supercomputing, 48:2, (115-151), Online publication date: 1-May-2009.
- Kelsey K, Bai T, Ding C and Zhang C Fast Track Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization, (157-168)
- Magee J and Qasem A A case for compiler-driven superpage allocation Proceedings of the 47th annual ACM Southeast Conference, (1-4)
- Jang B, Do S, Pien H and Kaeli D Architecture-aware optimization targeting multithreaded stream computing Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, (62-70)
- Kulkarni M, Burtscher M, Inkulu R, Pingali K and Casçaval C (2009). How much parallelism is there in irregular applications?, ACM SIGPLAN Notices, 44:4, (3-14), Online publication date: 14-Feb-2009.
- Kulkarni M, Burtscher M, Inkulu R, Pingali K and Casçaval C How much parallelism is there in irregular applications? Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, (3-14)
- Cooper K, Eckhardt J and Kennedy K Redundancy elimination revisited Proceedings of the 17th international conference on Parallel architectures and compilation techniques, (12-21)
- Nuzman D and Zaks A Outer-loop vectorization Proceedings of the 17th international conference on Parallel architectures and compilation techniques, (2-11)
- Ghodrat M, Givargis T and Nicolau A Control flow optimization in loops using interval analysis Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems, (157-166)
- Leha A, Chalabine M and Kessler C Parallelizing scientific code with invasive interactive parallelization Proceedings of the 2008 compFrame/HPC-GECO workshop on Component based high performance, (1-10)
- Arenaz M, Touriño J and Doallo R (2008). XARK, ACM Transactions on Programming Languages and Systems, 30:6, (1-56), Online publication date: 1-Oct-2008.
- Youseff L, Seymour K, You H, Dongarra J and Wolski R The impact of paravirtualized memory hierarchy on linear algebra computational kernels and software Proceedings of the 17th international symposium on High performance distributed computing, (141-152)
- Pouchet L, Bastoul C, Cohen A and Cavazos J Iterative optimization in the polyhedral model Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation, (90-100)
- Shou Y and van Engelen R Automatic SIMD vectorization of chains of recurrences Proceedings of the 22nd annual international conference on Supercomputing, (245-255)
- Pouchet L, Bastoul C, Cohen A and Cavazos J (2008). Iterative optimization in the polyhedral model, ACM SIGPLAN Notices, 43:6, (90-100), Online publication date: 30-May-2008.
- Rodrigues C, Hardy D, Stone J, Schulten K and Hwu W GPU acceleration of cutoff pair potentials for molecular modeling applications Proceedings of the 5th conference on Computing frontiers, (273-282)
- Nuzman D, Namolaru M, Zaks A and Derby J Compiling for an indirect vector register architecture Proceedings of the 5th conference on Computing frontiers, (199-208)
- Kotzmann T, Wimmer C, Mössenböck H, Rodriguez T, Russell K and Cox D (2008). Design of the Java HotSpot™ client compiler for Java 6, ACM Transactions on Architecture and Code Optimization, 5:1, (1-32), Online publication date: 1-May-2008.
- Hampton M and Asanovic K Compiling for vector-thread architectures Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization, (205-215)
- Ryoo S, Rodrigues C, Stone S, Baghsorkhi S, Ueng S, Stratton J and Hwu W Program optimization space pruning for a multithreaded gpu Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization, (195-204)
- Raman E, Vachharajani N, Rangan R and August D Spice Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization, (175-184)
- Raman E, Ottoni G, Raman A, Bridges M and August D Parallel-stage decoupled software pipelining Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization, (114-123)
- Suleman M, Qureshi M and Patt Y (2008). Feedback-driven threading, ACM SIGPLAN Notices, 43:3, (277-286), Online publication date: 25-Mar-2008.
- Kulkarni M, Pingali K, Ramanarayanan G, Walter B, Bala K and Chew L (2008). Optimistic parallelism benefits from data partitioning, ACM SIGPLAN Notices, 43:3, (233-243), Online publication date: 25-Mar-2008.
- Suleman M, Qureshi M and Patt Y (2008). Feedback-driven threading, ACM SIGOPS Operating Systems Review, 42:2, (277-286), Online publication date: 25-Mar-2008.
- Kulkarni M, Pingali K, Ramanarayanan G, Walter B, Bala K and Chew L (2008). Optimistic parallelism benefits from data partitioning, ACM SIGOPS Operating Systems Review, 42:2, (233-243), Online publication date: 25-Mar-2008.
- Suleman M, Qureshi M and Patt Y (2008). Feedback-driven threading, ACM SIGARCH Computer Architecture News, 36:1, (277-286), Online publication date: 25-Mar-2008.
- Kulkarni M, Pingali K, Ramanarayanan G, Walter B, Bala K and Chew L (2008). Optimistic parallelism benefits from data partitioning, ACM SIGARCH Computer Architecture News, 36:1, (233-243), Online publication date: 25-Mar-2008.
- Suleman M, Qureshi M and Patt Y Feedback-driven threading Proceedings of the 13th international conference on Architectural support for programming languages and operating systems, (277-286)
- Kulkarni M, Pingali K, Ramanarayanan G, Walter B, Bala K and Chew L Optimistic parallelism benefits from data partitioning Proceedings of the 13th international conference on Architectural support for programming languages and operating systems, (233-243)
- Ryoo S, Rodrigues C, Baghsorkhi S, Stone S, Kirk D and Hwu W Optimization principles and application performance evaluation of a multithreaded GPU using CUDA Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, (73-82)
- Reid W, Kelly W and Craik A Reasoning about inherent parallelism in modern object-oriented languages Proceedings of the thirty-first Australasian conference on Computer science - Volume 74, (27-36)
- Berzal F, Cubero J and Jiménez A Hierarchical program representation for program element matching Proceedings of the 8th international conference on Intelligent data engineering and automated learning, (467-476)
- Berzal F, Cubero J and Jiménez A Hierarchical Program Representation for Program Element Matching Intelligent Data Engineering and Automated Learning - IDEAL 2007, (467-476)
- Beletska A, Bielecki W and Pietro P Extracting synchronization-free slices of operations in perfectly-nested loops Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems, (244-249)
- Lokhmotov A, Gaster B, Mycroft A, Hickey N and Stuttard D Revisiting SIMD Programming Languages and Compilers for Parallel Computing, (32-46)
- Fritz N, Lucas P and Wilhelm R Exploiting SIMD Parallelism with the CGiS Compiler Framework Languages and Compilers for Parallel Computing, (246-260)
- Absar J, Li M, Raghavan P, Lambrechts A, Jayapala M, Vandecappelle A and Catthoor F Locality optimization in wireless applications Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis, (125-130)
- Fellahi M, Cohen A and Touati S Code-size conscious pipelining of imperfectly nested loops Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture, (49-55)
- Xiao S and Lai E (2007). VLIW instruction scheduling for minimal power variation, ACM Transactions on Architecture and Code Optimization, 4:3, (18-es), Online publication date: 1-Sep-2007.
- Korch M and Rauber T Locality optimized shared-memory implementations of iterated runge-kutta methods Proceedings of the 13th international Euro-Par conference on Parallel Processing, (737-747)
- Lokhmotov A, Mycroft A and Richards A Delayed side-effects ease multi-core programming Proceedings of the 13th international Euro-Par conference on Parallel Processing, (641-650)
- Donaldson A, Riley C, Lokhmotov A and Cook A Auto-parallelisation of sieve C++ programs Proceedings of the 2007 conference on Parallel processing, (18-27)
- Ryoo S, Ueng S, Rodrigues C, Kidd R, Frank M and Hwu W Automatic Discovery of Coarse-Grained Parallelism in Media Applications Transactions on High-Performance Embedded Architectures and Compilers I, (194-213)
- Zelenov S and Zelenova S Model-based testing of optimizing compilers Proceedings of the 19th IFIP TC6/WG6.1 international conference, and 7th international conference on Testing of Software and Communicating Systems, (365-377)
- Ding C, Shen X, Kelsey K, Tice C, Huang R and Zhang C Software behavior oriented parallelization Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation, (223-234)
- Kulkarni M, Pingali K, Walter B, Ramanarayanan G, Bala K and Chew L Optimistic parallelism requires abstractions Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation, (211-222)
- Ding C, Shen X, Kelsey K, Tice C, Huang R and Zhang C (2007). Software behavior oriented parallelization, ACM SIGPLAN Notices, 42:6, (223-234), Online publication date: 10-Jun-2007.
- Kulkarni M, Pingali K, Walter B, Ramanarayanan G, Bala K and Chew L (2007). Optimistic parallelism requires abstractions, ACM SIGPLAN Notices, 42:6, (211-222), Online publication date: 10-Jun-2007.
- Yotov K, Roeder T, Pingali K, Gunnels J and Gustavson F An experimental comparison of cache-oblivious and cache-conscious programs Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures, (93-104)
- Kennedy K, Koelbel C and Zima H The rise and fall of High Performance Fortran Proceedings of the third ACM SIGPLAN conference on History of programming languages, (7-1-7-22)
- Dimitroulakos G, Galanis M, Kostaras N and Goutis C A unified evaluation framework for coarse grained reconfigurable array architectures Proceedings of the 4th international conference on Computing frontiers, (161-172)
- Fireman L, Petrank E and Zaks A New algorithms for SIMD alignment Proceedings of the 16th international conference on Compiler construction, (1-15)
- Gontmakher A, Mendelson A and Schuster A Using fine grain multithreading for energy efficient computing Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming, (259-269)
- Pouchet L, Bastoul C, Cohen A and Vasilache N Iterative Optimization in the Polyhedral Model Proceedings of the International Symposium on Code Generation and Optimization, (144-156)
- Birkbeck N, Levesque J and Amaral J A Dimension Abstraction Approach to Vectorization in Matlab Proceedings of the International Symposium on Code Generation and Optimization, (115-130)
- Gill G, Hansen J and Singh M Loop pipelining for high-throughput stream computation using self-timed rings Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design, (289-296)
- Wang S, Zhai A and Yew P Exploiting speculative thread-level parallelism in data compression applications Proceedings of the 19th international conference on Languages and compilers for parallel computing, (126-140)
- Zhao Y and Kennedy K Dependence-based code generation for a CELL processor Proceedings of the 19th international conference on Languages and compilers for parallel computing, (64-79)
- Audsley N and Ward M Syntax-driven implementation of software programming language control constructs and expressions on FPGAs Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems, (253-260)
- Birch J, van Engelen R, Gallivan K and Shou Y An empirical evaluation of chains of recurrences for array dependence testing Proceedings of the 15th international conference on Parallel architectures and compilation techniques, (295-304)
- Cohen A, Donadio S, Garzaran M, Herrmann C, Kiselyov O and Padua D (2006). In search of a program generator to implement generic transformations for high-performance computing, Science of Computer Programming, 62:1, (25-46), Online publication date: 1-Sep-2006.
- Parsa S and Lotfi S (2006). A New Genetic Algorithm for Loop Tiling, The Journal of Supercomputing, 37:3, (249-269), Online publication date: 1-Sep-2006.
- Hu Z, del Cuvillo J, Zhu W and Gao G Optimization of dense matrix multiplication on IBM cyclops-64 Proceedings of the 12th international conference on Parallel Processing, (134-144)
- Vasilache N, Bastoul C, Cohen A and Girbal S Violated dependence analysis Proceedings of the 20th annual international conference on Supercomputing, (335-344)
- Parsa S and Lotfi S Loop parallelization in multi-dimensional cartesian space Proceedings of the 6th international Andrei Ershov memorial conference on Perspectives of systems informatics, (335-348)
- Zumbusch G Data dependence analysis for the parallelization of numerical tree codes Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing, (890-899)
- Bocchino R and Adve V Vector LLVA Proceedings of the 2nd international conference on Virtual execution environments, (46-56)
- Nuzman D, Rosen I and Zaks A Auto-vectorization of interleaved data for SIMD Proceedings of the 27th ACM SIGPLAN Conference on Programming Language Design and Implementation, (132-143)
- Nuzman D, Rosen I and Zaks A (2006). Auto-vectorization of interleaved data for SIMD, ACM SIGPLAN Notices, 41:6, (132-143), Online publication date: 11-Jun-2006.
- Dimitroulakos G, Galanis M and Goutis C Exploring the design space of an optimized compiler approach for mesh-like coarse-grained reconfigurable architectures Proceedings of the 20th international conference on Parallel and distributed processing, (113-113)
- Galanis M, Dimitroulakos G and Goutis C Design flow for optimizing performance in processor systems with on-chip coarse-grain reconfigurable logic Proceedings of the 20th international conference on Parallel and distributed processing, (112-112)
- Zhang Z and Seidel S A performance model for fine-grain accesses in UPC Proceedings of the 20th international conference on Parallel and distributed processing, (65-65)
- Absar J and Catthoor F (2006). Reuse analysis of indirectly indexed arrays, ACM Transactions on Design Automation of Electronic Systems, 11:2, (282-305), Online publication date: 1-Apr-2006.
- Zhang T, Zhuang X and Pande S Compiler Optimizations to Reduce Security Overhead Proceedings of the International Symposium on Code Generation and Optimization, (346-357)
- Son S, Chen G and Kandemir M A Compiler-Guided Approach for Reducing Disk Power Consumption by Exploiting Disk Access Locality Proceedings of the International Symposium on Code Generation and Optimization, (256-268)
- Tang P Complete inlining of recursive calls Proceedings of the 44th annual ACM Southeast Conference, (579-584)
- Dongarra J, Bosilca G, Chen Z, Eijkhout V, Fagg G, Fuentes E, Langou J, Luszczek P, Pjesivac-Grbovic J, Seymour K, You H and Vadhiyar S (2006). Self-adapting numerical software (SANS) effort, IBM Journal of Research and Development, 50:2/3, (223-238), Online publication date: 1-Mar-2006.
- Liu M, Zhuge Q, Shao Z, Xue C, Qiu M and Sha E Loop distribution and fusion with timing and code size optimization for embedded DSPs Proceedings of the 2005 international conference on Embedded and Ubiquitous Computing, (121-130)
- Yang H, Govindarajan R, Gao G and Hu Z (2005). Improving power efficiency with compiler-assisted cache replacement, Journal of Embedded Computing, 1:4, (487-499), Online publication date: 1-Dec-2005.
- Pop S, Cohen A and Silber G Induction variable analysis with delayed abstractions Proceedings of the First international conference on High Performance Embedded Architectures and Compilers, (218-232)
- Weinberg J, McCracken M, Strohmaier E and Snavely A Quantifying Locality In The Memory Access Patterns of HPC Applications Proceedings of the 2005 ACM/IEEE conference on Supercomputing
- Larsen S, Rabbah R and Amarasinghe S Exploiting Vector Parallelism in Software Pipelined Loops Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, (119-129)
- Ottoni G, Rangan R, Stoler A and August D Automatic Thread Extraction with Decoupled Software Pipelining Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, (105-118)
- Zuck L, Pnueli A, Goldberg B, Barrett C, Fang Y and Hu Y (2005). Translation and Run-Time Validation of Loop Transformations, Formal Methods in System Design, 27:3, (335-360), Online publication date: 1-Nov-2005.
- Chalabine M and Kessler C Parallelisation of sequential programs by invasive composition and aspect weaving Proceedings of the 6th international conference on Advanced Parallel Processing Technologies, (131-140)
- Shen X and Ding C Parallelization of utility programs based on behavior phase analysis Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing, (425-432)
- Epshteyn A, Garzaran M, DeJong G, Padua D, Ren G, Li X, Yotov K and Pingali K Analytic models and empirical search Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing, (259-273)
- Renganarayana L, Ramakrishna U and Rajopadhye S Combined ILP and register tiling Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing, (244-258)
- Yotov K, Jackson S, Steele T, Pingali K and Stodghill P Automatic measurement of instruction cache capacity Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing, (230-243)
- Matosevic I, Abdelrahman T, Karim F and Mellan A Power optimizations for the MLCA using dynamic voltage scaling Proceedings of the 2005 workshop on Software and compilers for embedded systems, (109-123)
- Narasamdya I and Voronkov A Finding basic block and variable correspondence Proceedings of the 12th international conference on Static Analysis, (251-267)
- Johnson J, Krandick W and Ruslanov A Architecture-aware classical Taylor shift by 1 Proceedings of the 2005 international symposium on Symbolic and algebraic computation, (200-207)
- Barrett C, Fang Y, Goldberg B, Hu Y, Pnueli A and Zuck L TVOC Proceedings of the 17th international conference on Computer Aided Verification, (291-295)
- Yotov K, Pingali K and Stodghill P Think globally, search locally Proceedings of the 19th annual international conference on Supercomputing, (141-150)
- Shen X, Gao Y, Ding C and Archambault R Lightweight reference affinity analysis Proceedings of the 19th annual international conference on Supercomputing, (131-140)
- Ni Y, Kremer U, Stere A and Iftode L Programming ad-hoc networks of mobile and resource-constrained devices Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation, (249-260)
- Ni Y, Kremer U, Stere A and Iftode L (2005). Programming ad-hoc networks of mobile and resource-constrained devices, ACM SIGPLAN Notices, 40:6, (249-260), Online publication date: 12-Jun-2005.
- Yotov K, Pingali K and Stodghill P (2005). Automatic measurement of memory hierarchy parameters, ACM SIGMETRICS Performance Evaluation Review, 33:1, (181-192), Online publication date: 6-Jun-2005.
- Yotov K, Pingali K and Stodghill P Automatic measurement of memory hierarchy parameters Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, (181-192)
- Alam S and Vetter J Performance and scalability analysis of cray x1 vectorization and multistreaming optimization Proceedings of the 5th international conference on Computational Science - Volume Part I, (304-312)
- Chen G, Chen G, Ozturk O and Kandemir M Exploiting Inter-Processor Data Sharing for Improving Behavior of Multi-Processor SoCs Proceedings of the IEEE Computer Society Annual Symposium on VLSI: New Frontiers in VLSI Design, (90-95)
- Agerwala T and Chatterjee S (2005). Computer Architecture, IEEE Micro, 25:3, (58-69), Online publication date: 1-May-2005.
- Grelck C (2005). Shared memory multiprocessor support for functional array processing in SAC, Journal of Functional Programming, 15:3, (353-401), Online publication date: 1-May-2005.
- Guo Z, Wang X and Zhou A WSQuery Proceedings of the 10th international conference on Database Systems for Advanced Applications, (372-384)
- Barton C, Tal A, Blainey B and Amaral J Generalized index-set splitting Proceedings of the 14th international conference on Compiler Construction, (106-120)
- Shashidhar K, Bruynooghe M, Catthoor F and Janssens G Verification of source code transformations by program equivalence checking Proceedings of the 14th international conference on Compiler Construction, (221-236)
- Shin J, Hall M and Chame J Superword-Level Parallelism in the Presence of Control Flow Proceedings of the international symposium on Code generation and optimization, (165-175)
- Edwards S The Challenges of Hardware Synthesis from C-Like Languages Proceedings of the conference on Design, Automation and Test in Europe - Volume 1, (66-67)
- Shashidhar K, Bruynooghe M, Catthoor F and Janssens G Functional Equivalence Checking for Verification of Algebraic Transformations on Array-Intensive Source Code Proceedings of the conference on Design, Automation and Test in Europe - Volume 2, (1310-1315)
- Beletskyy V and Burak D Parallelization of the data encryption standard (DES) algorithm Enhanced methods in computer security, biometric and artificial intelligence systems, (23-33)
- Zhao Y and Kennedy K (2005). Scalarization using loop alignment and loop skewing, The Journal of Supercomputing, 31:1, (5-46), Online publication date: 1-Jan-2005.
- Ding C and Orlovich M The Potential of Computation Regrouping for Improving Locality Proceedings of the 2004 ACM/IEEE conference on Supercomputing
- Brifault K and Charles H Efficient data driven run-time code generation Proceedings of the 7th workshop on Workshop on languages, compilers, and run-time support for scalable systems, (1-7)
- Zhu Y, Magklis G, Scott M, Ding C and Albonesi D The Energy Impact of Aggressive Loop Fusion Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, (153-164)
- Liu M, Zhuge Q, Shao Z and Sha E General loop fusion technique for nested loops considering timing and code size Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems, (190-201)
- Shen X, Zhong Y and Ding C Phase-Based miss rate prediction across program inputs Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing, (42-55)
- Baradaran N, Diniz P and Park J Extending the applicability of scalar replacement to multiple induction variables Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing, (455-469)
- Zhang G, Unnikrishnan P and Ren J Experiments with auto-parallelizing SPEC2000FP benchmarks Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing, (348-362)
- Yi Q and Quinlan D Applying loop optimizations to object-oriented abstractions through general classification of array semantics Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing, (253-267)
- Rauber T and Rünger G (2004). Improving locality for ODE solvers by program transformations, Scientific Programming, 12:3, (133-154), Online publication date: 1-Aug-2004.
- Carribault P and Cohen A Applications of storage mapping optimization to register promotion Proceedings of the 18th annual international conference on Supercomputing, (247-256)
- Drakenberg N A matrix-type for performance–portability Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing, (237-246)
- Kowarschik M, Christadler I and Rüde U Towards cache-optimized multigrid using patch-adaptive relaxation Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing, (901-910)
- Zhong Y, Orlovich M, Shen X and Ding C (2004). Array regrouping and structure splitting using whole-program reference affinity, ACM SIGPLAN Notices, 39:6, (255-266), Online publication date: 9-Jun-2004.
- Zhong Y, Orlovich M, Shen X and Ding C Array regrouping and structure splitting using whole-program reference affinity Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation, (255-266)
- Yi Q, Kennedy K, You H, Seymour K and Dongarra J Automatic blocking of QR and LU factorizations for locality Proceedings of the 2004 workshop on Memory system performance, (12-22)
- Yi Q and Kennedy K (2004). Improving Memory Hierarchy Performance through Combined Loop Interchange and Multi-Level Fusion, International Journal of High Performance Computing Applications, 18:2, (237-253), Online publication date: 1-May-2004.
- Allen R and Kennedy K (2004). Automatic loop interchange, ACM SIGPLAN Notices, 39:4, (75-90), Online publication date: 1-Apr-2004.
- Li X, Garzarán M and Padua D A Dynamically Tuned Sorting Library Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
- Rong H, Tang Z, Govindarajan R, Douillet A and Gao G Single-Dimension Software Pipelining for Multi-Dimensional Loops Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
- Yi Q, Kennedy K and Adve V (2004). Transforming Complex Loop Nests for Locality, The Journal of Supercomputing, 27:3, (219-264), Online publication date: 1-Mar-2004.
- Song L and Kavi K (2004). What can we gain by unfolding loops?, ACM SIGPLAN Notices, 39:2, (26-33), Online publication date: 1-Feb-2004.
- Ding C and Kennedy K (2004). Improving effective bandwidth through compiler enhancement of global cache reuse, Journal of Parallel and Distributed Computing, 64:1, (108-134), Online publication date: 1-Jan-2004.
- Scholz S (2003). Single Assignment C: efficient support for high-level array operations in a functional setting, Journal of Functional Programming, 13:6, (1005-1059), Online publication date: 1-Nov-2003.
- Chen M and Olukotun K The Jrpm system for dynamically parallelizing Java programs Proceedings of the 30th annual international symposium on Computer architecture, (434-446)
- Ding C and Zhong Y Predicting whole-program locality through reuse distance analysis Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation, (245-257)
- Yotov K, Li X, Ren G, Cibulskis M, DeJong G, Garzaran M, Padua D, Pingali K, Stodghill P and Wu P A comparison of empirical and model-driven optimization Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation, (63-76)
- Ding C and Zhong Y (2003). Predicting whole-program locality through reuse distance analysis, ACM SIGPLAN Notices, 38:5, (245-257), Online publication date: 9-May-2003.
- Yotov K, Li X, Ren G, Cibulskis M, DeJong G, Garzaran M, Padua D, Pingali K, Stodghill P and Wu P (2003). A comparison of empirical and model-driven optimization, ACM SIGPLAN Notices, 38:5, (63-76), Online publication date: 9-May-2003.
- Chen M and Olukotun K (2003). The Jrpm system for dynamically parallelizing Java programs, ACM SIGARCH Computer Architecture News, 31:2, (434-446), Online publication date: 1-May-2003.
- Ghosh S, Kanhere A, Krishnaiyer R, Kulkarni D, Li W, Lim C and Ng J Integrating high-level optimizations in a production compiler Proceedings of the 12th international conference on Compiler construction, (303-319)
- Chen M and Olukotun K TEST Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, (301-312)
- Carter L, Ferrante J and Thomborson C (2003). Folklore confirmed, ACM SIGPLAN Notices, 38:1, (106-114), Online publication date: 15-Jan-2003.
- Carter L, Ferrante J and Thomborson C Folklore confirmed Proceedings of the 30th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, (106-114)
- Dongarra J, Foster I, Fox G, Gropp W, Kennedy K, Torczon L and White A References Sourcebook of parallel computing, (729-789)
- Meyer U, Sanders P and Sibeyn J (2003). Algorithms for memory hierarchies, 10.5555/1744652, Online publication date: 1-Jan-2003.
- Grelck C and Scholz S Axis control in SAC Proceedings of the 14th international conference on Implementation of functional languages, (182-198)
- Bik A, Girkar M, Grey P and Tian X Automatic detection of saturation and clipping idioms Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing, (61-74)
- Tan M, Liu G, Zhao R, Dai S and Zhang Z ElasticFlow: A complexity-effective approach for pipelining irregular loop nests 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), (78-85)
Index Terms
- Optimizing compilers for modern architectures: a dependence-based approach
Recommendations
Using Graph Models in Retargetable Optimizing Compilers for Microprocessors with VLIW Architectures
A mathematical model of programs, which is based on the concept of a hierarchical graph, is described. The model is used in the retargetable optimizing compiler NVRK-2 for microprocessor architectures with irregular very long instruction words (VLIWs). ...
Compilers and parallel architectures (abstract only): sequential to parallel mapping strategies
CSC '87: Proceedings of the 15th annual conference on Computer Science
The parallel optimizing compiler is offered as the only viable means of fully exploiting the power of parallel architectures and applying it to mainstream computing problems. In this context, “mainstream” includes -but should not be limited to- ...