research-article

Evolving to Find Optimizations Humans Miss: Using Evolutionary Computation to Improve GPU Code for Bioinformatics Applications

Authors: Steven Hofmeyr, Carole-Jean Wu, Stephanie Forrest

ACM Transactions on Evolutionary Learning and Optimization, Volume 4, Issue 4

Article No.: 21, Pages 1 - 29

https://doi.org/10.1145/3703920

Published: 29 November 2024

Abstract

GPUs are used in many settings to accelerate large-scale scientific computation, including simulation, computational biology, and molecular dynamics. However, optimizing code to run efficiently on GPUs requires developers to have both a detailed understanding of the application logic and significant knowledge of parallel programming and GPU architectures. This paper shows that GEVO, an automated GPU program optimization tool, can leverage evolutionary computation to find code edits that reduce the runtime of three important applications (multiple sequence alignment, agent-based simulation, and molecular dynamics) by 28.9%, 29%, and 17.8%, respectively. The paper presents an in-depth analysis of the discovered optimizations, revealing that (1) several of the most important optimizations involve significant epistasis, (2) the primary sources of improvement are application-specific, and (3) many of the optimizations generalize across GPU architectures. In general, the discovered optimizations are not straightforward even for a human GPU expert, showcasing the potential of automated program optimization tools both to reduce the optimization burden for human domain experts and to provide new insights for GPU experts.
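
The abstract's first finding concerns epistasis: the effect of one code edit depends on which other edits are present. As a rough illustration only, the sketch below scores a pair of edits by how far their joint runtime reduction departs from the sum of their individual reductions. The additive baseline, the function names, and the runtimes are all assumptions made for this example; they are not GEVO's analysis method or measurements from the paper.

```python
def runtime_reduction(baseline: float, variant: float) -> float:
    """Fractional runtime reduction of a variant relative to the baseline."""
    return (baseline - variant) / baseline


def epistasis(baseline: float, with_a: float, with_b: float, with_both: float) -> float:
    """Joint reduction minus the sum of the individual reductions.

    Assumes an additive null model (an illustrative choice, not the paper's).
    Zero means the edits act independently; a positive value means they
    help more together than the sum of what each achieves alone.
    """
    gain_a = runtime_reduction(baseline, with_a)
    gain_b = runtime_reduction(baseline, with_b)
    gain_ab = runtime_reduction(baseline, with_both)
    return gain_ab - (gain_a + gain_b)


# Hypothetical runtimes in milliseconds, invented for illustration.
baseline, with_a, with_b, with_both = 100.0, 100.0, 98.0, 80.0
score = epistasis(baseline, with_a, with_b, with_both)
print(f"epistasis score: {score:.2f}")
```

In this made-up case, edit A alone changes nothing and edit B alone saves 2%, yet together they save 20%. A search that evaluates whole sets of edits can find such a combination, whereas testing edits one at a time would likely discard A as useless.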

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Search methodologies
      1. Heuristic function construction
2. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published In

ACM Transactions on Evolutionary Learning and Optimization Volume 4, Issue 4

December 2024

231 pages

EISSN:2688-3007

DOI:10.1145/3613700

Editors: Juergen Branke (Warwick Business School, UK), Manuel López-Ibáñez (University of Manchester, UK)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 November 2024

Online AM: 15 November 2024

Accepted: 30 October 2024

Revised: 20 September 2024

Received: 18 June 2023

Published in TELO Volume 4, Issue 4

Funding Sources

ONR
