-
Feature-based prediction of properties of cross-linked epoxy polymers by molecular dynamics and machine learning techniques
Authors:
Sindu B. S.,
Jan Hamaekers
Abstract:
Epoxy polymers are used in a wide range of applications. The properties and performance of epoxy polymers depend upon various factors like the type of constituents and their proportions used and other process parameters. The conventional way of developing epoxy polymers is usually labor-intensive and may not be fully efficient, which has resulted in epoxy polymers having a limited performance range due to the use of predetermined blend combinations, compositions and development parameters. Hence, in order to experiment with more design parameters, robust and easy computational techniques need to be established. To this end, we developed and analyzed in this study a new machine learning (ML) based approach to predict the mechanical properties of epoxy polymers based on their basic structural features. The results from molecular dynamics (MD) simulations have been used to derive the ML model. The salient feature of our work is that for the development of epoxy polymers based on EPON-862, several new hardeners were explored in addition to the conventionally used ones. The influence of additional parameters like the proportion of curing agent used and the extent of curing on the mechanical properties of epoxy polymers was also investigated. This method can be further extended by providing the epoxy polymer with the desired properties through knowledge of the structural characteristics of its constituents. The findings of our study can thus lead toward development of efficient design methodologies for epoxy polymeric systems.
Submitted 12 December, 2023;
originally announced December 2023.
-
LLamol: A Dynamic Multi-Conditional Generative Transformer for De Novo Molecular Design
Authors:
Niklas Dobberstein,
Astrid Maass,
Jan Hamaekers
Abstract:
Generative models have demonstrated substantial promise in Natural Language Processing (NLP) and have found application in designing molecules, as seen in Generative Pre-trained Transformer (GPT) models. In our efforts to develop such a tool for exploring the organic chemical space in search of potentially electro-active compounds, we present "LLamol", a single novel generative transformer model based on the LLama 2 architecture, which was trained on a 13M superset of organic compounds drawn from diverse public sources. To allow for maximum flexibility in usage and robustness in view of potentially incomplete data, we introduce "Stochastic Context Learning" as a new training procedure. We demonstrate that the resulting model adeptly handles single- and multi-conditional organic molecule generation with up to four conditions, yet more are possible. The model generates valid molecular structures in SMILES notation while flexibly incorporating three numerical and/or one token sequence into the generative process, just as requested. The generated compounds are very satisfactory in all scenarios tested. In detail, we showcase the model's capability to utilize token sequences for conditioning, either individually or in combination with numerical properties, making LLamol a potent tool for de novo molecule design, easily expandable with new properties.
Submitted 24 November, 2023;
originally announced November 2023.
-
Predicting Properties of Oxide Glasses Using Informed Neural Networks
Authors:
Gregor Maier,
Jan Hamaekers,
Dominik-Sergio Martilotti,
Benedikt Ziebarth
Abstract:
Many modern-day applications require the development of new materials with specific properties. In particular, the design of new glass compositions is of great industrial interest. Current machine learning methods for learning the composition-property relationship of glasses promise to save on expensive trial-and-error approaches. Even though quite large datasets on the composition of glasses and their properties already exist (i.e., with more than 350,000 samples), they cover only a very small fraction of the space of all possible glass compositions. This limits the applicability of purely data-driven models for property prediction purposes and necessitates the development of models with high extrapolation power. In this paper, we propose a neural network model which incorporates prior scientific and expert knowledge in its learning pipeline. This informed learning approach leads to an improved extrapolation power compared to blind (uninformed) neural network models. To demonstrate this, we train our models to predict three different material properties, that is, the glass transition temperature, the Young's modulus (at room temperature), and the shear modulus of binary oxide glasses which do not contain sodium. As representatives for conventional blind neural network approaches we use five different feed-forward neural networks of varying widths and depths. For each property, we set up model ensembles of multiple trained models and show that, on average, our proposed informed model performs better in extrapolating the three properties of previously unseen sodium borate glass samples than all five conventional blind models.
Submitted 6 February, 2024; v1 submitted 18 August, 2023;
originally announced August 2023.
-
On the Interplay of Subset Selection and Informed Graph Neural Networks
Authors:
Niklas Breustedt,
Paolo Climaco,
Jochen Garcke,
Jan Hamaekers,
Gitta Kutyniok,
Dirk A. Lorenz,
Rick Oerder,
Chirag Varun Shukla
Abstract:
Machine learning techniques paired with the availability of massive datasets dramatically enhance our ability to explore the chemical compound space by providing fast and accurate predictions of molecular properties. However, learning on large datasets is strongly limited by the availability of computational resources and can be infeasible in some scenarios. Moreover, the instances in the datasets may not yet be labelled and generating the labels can be costly, as in the case of quantum chemistry computations. Thus, there is a need to select small training subsets from large pools of unlabelled data points and to develop reliable ML methods that can effectively learn from small training sets. This work focuses on predicting the atomization energy of the molecules in the QM9 dataset. We investigate the advantages of employing domain knowledge-based data sampling methods for an efficient training set selection combined with informed ML techniques. In particular, we show how maximizing molecular diversity in the training set selection process increases the robustness of linear and nonlinear regression techniques such as kernel methods and graph neural networks. We also check the reliability of the predictions made by the graph neural network with a model-agnostic explainer based on the rate distortion explanation framework.
Submitted 15 June, 2023;
originally announced June 2023.
-
Parameterized Neural Networks for Finance
Authors:
Daniel Oeltz,
Jan Hamaekers,
Kay F. Pilz
Abstract:
We discuss and analyze a neural network architecture that enables learning a model class for a set of different data samples rather than just learning a single model for a specific data sample. In this sense, it may help to reduce the overfitting problem, since, after learning the model class over a larger data sample consisting of such different data sets, just a few parameters need to be adjusted for modeling a new, specific problem. After analyzing the method theoretically and by regression examples for different one-dimensional problems, we finally apply the approach to one of the standard problems asset managers and banks are facing: the calibration of spread curves. The presented results clearly show the potential that lies within this method. Furthermore, this application is of particular interest to financial practitioners, since nearly all asset managers and banks that have solutions in place may need to adapt or even change their current methodologies when ESG ratings additionally affect the bond spreads.
Submitted 18 April, 2023;
originally announced April 2023.
-
Interatomic-Potential-Free, Data-Driven Molecular Dynamics
Authors:
J. Bulin,
J. Hamaekers,
M. P. Ariza,
M. Ortiz
Abstract:
We present a Data-Driven (DD) paradigm that enables molecular dynamics calculations to be performed directly from sampled force-field data such as obtained, e.g., from ab initio calculations, thereby eschewing the conventional step of modeling the data by empirical interatomic potentials entirely. The data required by the DD solvers consists of local atomic configurations and corresponding atomic forces and is, therefore, fundamental, i.e., it is not beholden to any particular model. The resulting DD solvers, including a fully explicit DD-Verlet algorithm, are provably convergent and exhibit robust convergence with respect to the data in selected test cases. We present an example of application to C60 buckminsterfullerenes that showcases the feasibility, range and scope of the DD molecular dynamics paradigm.
Submitted 5 August, 2022;
originally announced August 2022.
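For orientation, the classical velocity Verlet integrator that the name "DD-Verlet" alludes to can be sketched as follows. This is a minimal illustration only and not the data-driven solver of the paper: the analytic `harmonic_force` is a hypothetical stand-in for forces that the DD paradigm would obtain from sampled configurations, and all function and parameter names are assumptions made for this sketch.

```python
def velocity_verlet(position, velocity, force, mass, dt, steps):
    """Integrate one 1D particle with the classical velocity Verlet scheme.

    `force` is any callable mapping a position to a force. Here it is
    analytic; in a data-driven setting it would be replaced by a lookup
    into sampled force data (not shown, and not the paper's algorithm).
    """
    acceleration = force(position) / mass
    for _ in range(steps):
        # Position update uses the current velocity and acceleration.
        position += velocity * dt + 0.5 * acceleration * dt * dt
        # Velocity update averages the old and new accelerations.
        new_acceleration = force(position) / mass
        velocity += 0.5 * (acceleration + new_acceleration) * dt
        acceleration = new_acceleration
    return position, velocity


def harmonic_force(x):
    """Hypothetical test force: unit-stiffness harmonic spring."""
    return -x


x, v = velocity_verlet(1.0, 0.0, harmonic_force, mass=1.0, dt=0.01, steps=1000)
energy = 0.5 * v * v + 0.5 * x * x  # total energy, initially 0.5
```

The scheme is explicit and needs only one new force evaluation per step, which is why a force lookup can in principle be dropped in where `force` is called.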
-
Similarity of particle systems using an invariant root mean square deviation measure
Authors:
Johannes Bulin,
Jan Hamaekers
Abstract:
Determining whether two particle systems are similar is a common problem in particle simulations. When the comparison should be invariant under permutations, orthogonal transformations, and translations of the systems, special techniques are needed. We present an algorithm that can test particle systems of finite size for similarity and, if they are similar, can find the optimal alignment between them. Our approach is based on an invariant version of the root mean square deviation (RMSD) measure and is capable of finding the globally optimal solution in $O(n^3)$ operations where $n$ is the number of three-dimensional particles.
Submitted 17 June, 2021;
originally announced June 2021.
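One natural way to formalize an RMSD that is invariant under permutations, orthogonal transformations, and translations is sketched below. The notation is an assumption made for illustration and need not match the paper's definition: $X = (x_1, \dots, x_n)$ and $Y = (y_1, \dots, y_n)$ are the two particle systems, $S_n$ is the set of permutations of $n$ indices, and $O(3)$ is the group of orthogonal $3 \times 3$ matrices.

$$
\mathrm{RMSD}_{\mathrm{inv}}(X, Y) = \min_{\pi \in S_n,\; Q \in O(3),\; t \in \mathbb{R}^3} \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left\| x_i - \left( Q\, y_{\pi(i)} + t \right) \right\|^2}
$$

Under this reading, two systems are similar exactly when the minimum is zero (or below a tolerance), and the minimizing triple $(\pi, Q, t)$ is the optimal alignment.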
-
The octet rule in chemical space: Generating virtual molecules
Authors:
Rafel Israels,
Astrid Maaß,
Jan Hamaekers
Abstract:
We present a generator of virtual molecules that selects valid chemistry on the basis of the octet rule. Also, we introduce a mesomer group key that allows a fast detection of duplicates in the generated structures.
Compared to existing approaches, our model is simpler and faster, generates new chemistry and avoids invalid chemistry. Its versatility is illustrated by the correct generation of molecules containing third-row elements and a surprisingly adept handling of complex boron chemistry.
Without any empirical parameters, our model is designed to be valid also in unexplored regions of chemical space. One first unexpected finding is the high prevalence of dipolar structures among generated molecules.
Submitted 20 September, 2017;
originally announced September 2017.
-
ATK-ForceField: A New Generation Molecular Dynamics Software Package
Authors:
Julian Schneider,
Jan Hamaekers,
Samuel T. Chill,
Søren Smidstrup,
Johannes Bulin,
Ralph Thesen,
Anders Blom,
Kurt Stokbro
Abstract:
ATK-ForceField is a software package for atomistic simulations using classical interatomic potentials. It is implemented as a part of the Atomistix ToolKit (ATK), which is a Python programming environment that makes it easy to create and analyze both standard and highly customized simulations. This paper will focus on the atomic interaction potentials, molecular dynamics, and geometry optimization features of the software; however, many more advanced modeling features are available. The implementation details of these algorithms and their computational performance will be shown. We present three illustrative examples of the types of calculations that are possible with ATK-ForceField: modeling thermal transport properties in a silicon germanium crystal, vapor deposition of selenium molecules on a selenium surface, and a simulation of creep in a copper polycrystal.
Submitted 6 July, 2017; v1 submitted 10 January, 2017;
originally announced January 2017.
-
An Adaptive Multiscale Approach for Electronic Structure Methods
Authors:
Sambasiva Rao Chinnamsetty,
Michael Griebel,
Jan Hamaekers
Abstract:
In this paper, we introduce a new scheme for the efficient numerical treatment of the electronic Schrödinger equation for molecules. It is based on the combination of a many-body expansion, which corresponds to the so-called bond order dissection Anova approach, with a hierarchy of basis sets of increasing order. Here, the energy is represented as a finite sum of contributions associated with subsets of nuclei and basis sets in a telescoping-sum-like fashion. Under the assumption of data locality of the electronic density (nearsightedness of electronic matter), the terms of this expansion decay rapidly and higher terms may be neglected. We further extend the approach in a dimension-adaptive fashion to generate quasi-optimal approximations, i.e. a specific truncation of the hierarchical series such that the total benefit is maximized for a fixed amount of costs. This way, we are able to achieve substantial speedup factors compared to conventional first principles methods depending on the molecular system under consideration. In particular, the method can deal efficiently with molecular systems which include only a small active part that needs to be described by accurate but expensive models.
Submitted 19 July, 2017; v1 submitted 15 November, 2016;
originally announced November 2016.
-
Localized Coulomb Descriptors for the Gaussian Approximation Potential
Authors:
James Barker,
Johannes Bulin,
Jan Hamaekers,
Sonja Mathias
Abstract:
We introduce a novel class of localized atomic environment representations, based upon the Coulomb matrix. By combining these functions with the Gaussian approximation potential approach, we present LC-GAP, a new system for generating atomic potentials through machine learning (ML). Tests on the QM7, QM7b and GDB9 biomolecular datasets demonstrate that potentials created with LC-GAP can successfully predict, to chemical accuracy, atomization energies for molecules larger than those used for training, and can (in the case of QM7b) also be used to predict a range of other atomic properties with accuracy in line with the recent literature. As the best-performing representation has only linear dimensionality in the number of atoms in a local atomic environment, this represents an improvement both in prediction accuracy and computational cost when considered against similar Coulomb matrix-based methods.
Submitted 6 December, 2016; v1 submitted 15 November, 2016;
originally announced November 2016.
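As background for the entry above, the standard global Coulomb matrix on which the localized descriptors are said to be based can be sketched as follows. This is a minimal illustration of the well-known global form only (diagonal $0.5\,Z_i^{2.4}$, off-diagonal $Z_i Z_j / \lVert R_i - R_j \rVert$, atomic units); the localized variants introduced in the paper are not reproduced, and the function name and the H2 geometry are assumptions chosen for the example.

```python
import math


def coulomb_matrix(charges, positions):
    """Return the global Coulomb matrix as a nested list.

    charges:   nuclear charges Z_i
    positions: 3D coordinates R_i (atomic units assumed)
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal term: 0.5 * Z_i ** 2.4
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal term: Coulomb repulsion between nuclei i and j
                distance = math.dist(positions[i], positions[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


# Hypothetical input: an H2 molecule with a bond length of 1.4 bohr.
h2 = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)])
```

The global matrix has quadratic size in the number of atoms, which is the cost that the abstract's "linear dimensionality" remark is contrasted against.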