DOI: 10.1145/3589335.3651470
short-paper
Open access

Are we Making Much Progress? Revisiting Chemical Reaction Yield Prediction from an Imbalanced Regression Perspective

Published: 13 May 2024

Abstract

The yield of a chemical reaction quantifies the percentage of the target product formed in relation to the reactants consumed during the chemical reaction. Accurate yield prediction can guide chemists toward selecting high-yield reactions during synthesis planning, offering valuable insights before dedicating time and resources to wet lab experiments. While recent advancements in yield prediction have led to overall performance improvement across the entire yield range, an open challenge remains in enhancing predictions for high-yield reactions, which are of greater concern to chemists. In this paper, we argue that the performance gap in high-yield predictions results from the imbalanced distribution of real-world data skewed towards low-yield reactions, often due to unreacted starting materials and inherent ambiguities in the reaction processes. Despite this data imbalance, existing yield prediction methods continue to treat different yield ranges equally, assuming a balanced training distribution. Through extensive experiments on three real-world yield prediction datasets, we emphasize the urgent need to reframe reaction yield prediction as an imbalanced regression problem. Finally, we demonstrate that incorporating simple cost-sensitive re-weighting methods can significantly enhance the performance of yield prediction models on underrepresented high-yield regions.
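
As a rough illustration of what "cost-sensitive re-weighting" can look like for a regression target, the sketch below weights each training reaction by the inverse frequency of its yield bin and uses those weights in a mean squared error. It is not the authors' implementation: the function names, the bin count, and the 0-100 yield range are assumptions chosen for the example, and the abstract does not say which re-weighting schemes the paper evaluates.

```python
import numpy as np


def inverse_frequency_weights(yields, n_bins=10, y_min=0.0, y_max=100.0):
    """Hypothetical helper: weight each sample by 1 / (size of its yield bin).

    Assumes yields are percentages in [y_min, y_max]; n_bins is an
    illustrative choice, not a value taken from the paper.
    """
    yields = np.asarray(yields, dtype=float)
    edges = np.linspace(y_min, y_max, n_bins + 1)
    # Map each yield to a bin index in [0, n_bins - 1]; clipping keeps
    # boundary values such as y_max inside the last bin.
    bins = np.clip(np.digitize(yields, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(bins, minlength=n_bins)
    weights = 1.0 / counts[bins]
    # Rescale so the weights average to 1 and the loss scale stays comparable.
    return weights * len(yields) / weights.sum()


def weighted_mse(y_true, y_pred, weights):
    """Mean squared error in which each sample's error is scaled by its weight."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * (y_true - y_pred) ** 2) / np.sum(weights))


# Toy data skewed toward low yields, with a single high-yield reaction.
y_true = np.array([5.0, 8.0, 12.0, 15.0, 3.0, 95.0])
y_pred = np.array([5.0, 8.0, 12.0, 15.0, 3.0, 85.0])  # only the high-yield case is wrong

w = inverse_frequency_weights(y_true, n_bins=2)
plain = weighted_mse(y_true, y_pred, np.ones_like(y_true))
reweighted = weighted_mse(y_true, y_pred, w)
print(w, plain, reweighted)
```

In this toy case the lone high-yield reaction receives a larger weight than the five low-yield ones, so an error on it contributes more to the re-weighted loss than to the unweighted one.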

      Published In

      WWW '24: Companion Proceedings of the ACM Web Conference 2024
      May 2024
      1928 pages
      ISBN:9798400701726
      DOI:10.1145/3589335
      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. data imbalance
      2. reaction yield prediction
      3. regression tasks

      Qualifiers

      • Short-paper

      Funding Sources

      • NSF Center for Computer-Assisted Synthesis

      Conference

      WWW '24
      WWW '24: The ACM Web Conference 2024
      May 13 - 17, 2024
      Singapore, Singapore

      Acceptance Rates

      Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
