Search | arXiv e-print repository

arXiv:2312.07149 [pdf, other]

Feature-based prediction of properties of cross-linked epoxy polymers by molecular dynamics and machine learning techniques

Authors: Sindu B. S., Jan Hamaekers

Abstract: Epoxy polymers are used in a wide range of applications. The properties and performance of epoxy polymers depend upon various factors like the type of constituents and their proportions used and other process parameters. The conventional way of developing epoxy polymers is usually labor-intensive and may not be fully efficient, which has resulted in epoxy polymers having a limited performance range due to the use of predetermined blend combinations, compositions and development parameters. Hence, in order to experiment with more design parameters, robust and easy computational techniques need to be established. To this end, we developed and analyzed in this study a new machine learning (ML) based approach to predict the mechanical properties of epoxy polymers based on their basic structural features. The results from molecular dynamics (MD) simulations have been used to derive the ML model. The salient feature of our work is that for the development of epoxy polymers based on EPON-862, several new hardeners were explored in addition to the conventionally used ones. The influence of additional parameters like the proportion of curing agent used and the extent of curing on the mechanical properties of epoxy polymers was also investigated. This method can be further extended by providing the epoxy polymer with the desired properties through knowledge of the structural characteristics of its constituents. The findings of our study can thus lead toward development of efficient design methodologies for epoxy polymeric systems.

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2311.14407 [pdf, other]

LLamol: A Dynamic Multi-Conditional Generative Transformer for De Novo Molecular Design

Authors: Niklas Dobberstein, Astrid Maass, Jan Hamaekers

Abstract: Generative models have demonstrated substantial promise in Natural Language Processing (NLP) and have found application in designing molecules, as seen in Generative Pretrained Transformer (GPT) models. In our efforts to develop such a tool for exploring the organic chemical space in search of potentially electro-active compounds, we present "LLamol", a single novel generative transformer model based on the LLama 2 architecture, which was trained on a 13M superset of organic compounds drawn from diverse public sources. To allow for maximum flexibility in usage and robustness in view of potentially incomplete data, we introduce "Stochastic Context Learning" as a new training procedure. We demonstrate that the resulting model adeptly handles single- and multi-conditional organic molecule generation with up to four conditions, yet more are possible. The model generates valid molecular structures in SMILES notation while flexibly incorporating three numerical and/or one token sequence into the generative process, just as requested. The generated compounds are very satisfactory in all scenarios tested. In detail, we showcase the model's capability to utilize token sequences for conditioning, either individually or in combination with numerical properties, making LLamol a potent tool for de novo molecule design, easily expandable with new properties.

Submitted 24 November, 2023; originally announced November 2023.

arXiv:2308.09492 [pdf, other]

Predicting Properties of Oxide Glasses Using Informed Neural Networks

Authors: Gregor Maier, Jan Hamaekers, Dominik-Sergio Martilotti, Benedikt Ziebarth

Abstract: Many modern-day applications require the development of new materials with specific properties. In particular, the design of new glass compositions is of great industrial interest. Current machine learning methods for learning the composition-property relationship of glasses promise to save on expensive trial-and-error approaches. Even though quite large datasets on the composition of glasses and their properties already exist (i.e., with more than 350,000 samples), they cover only a very small fraction of the space of all possible glass compositions. This limits the applicability of purely data-driven models for property prediction purposes and necessitates the development of models with high extrapolation power. In this paper, we propose a neural network model which incorporates prior scientific and expert knowledge in its learning pipeline. This informed learning approach leads to an improved extrapolation power compared to blind (uninformed) neural network models. To demonstrate this, we train our models to predict three different material properties, that is, the glass transition temperature, the Young's modulus (at room temperature), and the shear modulus of binary oxide glasses which do not contain sodium. As representatives for conventional blind neural network approaches we use five different feed-forward neural networks of varying widths and depths.
For each property, we set up model ensembles of multiple trained models and show that, on average, our proposed informed model performs better in extrapolating the three properties of previously unseen sodium borate glass samples than all five conventional blind models.

Submitted 6 February, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

Comments: 25 pages

arXiv:2306.10066 [pdf, other]

On the Interplay of Subset Selection and Informed Graph Neural Networks

Authors: Niklas Breustedt, Paolo Climaco, Jochen Garcke, Jan Hamaekers, Gitta Kutyniok, Dirk A. Lorenz, Rick Oerder, Chirag Varun Shukla

Abstract: Machine learning techniques paired with the availability of massive datasets dramatically enhance our ability to explore the chemical compound space by providing fast and accurate predictions of molecular properties. However, learning on large datasets is strongly limited by the availability of computational resources and can be infeasible in some scenarios. Moreover, the instances in the datasets may not yet be labelled and generating the labels can be costly, as in the case of quantum chemistry computations. Thus, there is a need to select small training subsets from large pools of unlabelled data points and to develop reliable ML methods that can effectively learn from small training sets. This work focuses on predicting the atomization energy of the molecules in the QM9 dataset. We investigate the advantages of employing domain knowledge-based data sampling methods for an efficient training set selection combined with informed ML techniques. In particular, we show how maximizing molecular diversity in the training set selection process increases the robustness of linear and nonlinear regression techniques such as kernel methods and graph neural networks. We also check the reliability of the predictions made by the graph neural network with a model-agnostic explainer based on the rate distortion explanation framework.

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2304.08883 [pdf, other]

Parameterized Neural Networks for Finance

Authors: Daniel Oeltz, Jan Hamaekers, Kay F. Pilz

Abstract: We discuss and analyze a neural network architecture that enables learning a model class for a set of different data samples rather than just learning a single model for a specific data sample. In this sense, it may help to reduce the overfitting problem, since, after learning the model class over a larger data sample consisting of such different data sets, just a few parameters need to be adjusted for modeling a new, specific problem. After analyzing the method theoretically and by regression examples for different one-dimensional problems, we finally apply the approach to one of the standard problems asset managers and banks are facing: the calibration of spread curves. The presented results clearly show the potential that lies within this method. Furthermore, this application is of particular interest to financial practitioners, since nearly all asset managers and banks that have solutions in place may need to adapt or even change their current methodologies when ESG ratings additionally affect the bond spreads.

Submitted 18 April, 2023; originally announced April 2023.

Comments: 24 pages, 17 figures

arXiv:2208.04937 [pdf, other]

Interatomic-Potential-Free, Data-Driven Molecular Dynamics

Authors: J. Bulin, J. Hamaekers, M. P. Ariza, M. Ortiz

Abstract: We present a Data-Driven (DD) paradigm that enables molecular dynamics calculations to be performed directly from sampled force-field data such as obtained, e.g., from ab initio calculations, thereby eschewing the conventional step of modeling the data by empirical interatomic potentials entirely. The data required by the DD solvers consists of local atomic configurations and corresponding atomic forces and is, therefore, fundamental, i.e., it is not beholden to any particular model. The resulting DD solvers, including a fully explicit DD-Verlet algorithm, are provably convergent and exhibit robust convergence with respect to the data in selected test cases. We present an example of application to C60 buckminsterfullerenes that showcases the feasibility, range and scope of the DD molecular dynamics paradigm.

Submitted 5 August, 2022; originally announced August 2022.

Comments: 22 pages, 10 figures

arXiv:2106.09363 [pdf, ps, other]

Similarity of particle systems using an invariant root mean square deviation measure

Authors: Johannes Bulin, Jan Hamaekers

Abstract: Determining whether two particle systems are similar is a common problem in particle simulations. When the comparison should be invariant under permutations, orthogonal transformations, and translations of the systems, special techniques are needed. We present an algorithm that can test particle systems of finite size for similarity and, if they are similar, can find the optimal alignment between them. Our approach is based on an invariant version of the root mean square deviation (RMSD) measure and is capable of finding the globally optimal solution in $O(n^3)$ operations where $n$ is the number of three-dimensional particles.

Submitted 17 June, 2021; originally announced June 2021.

arXiv:1709.06746 [pdf, other]

doi 10.1007/s11030-017-9775-2

The octet rule in chemical space: Generating virtual molecules

Authors: Rafel Israels, Astrid Maaß, Jan Hamaekers

Abstract: We present a generator of virtual molecules that selects valid chemistry on the basis of the octet rule. Also, we introduce a mesomer group key that allows a fast detection of duplicates in the generated structures. Compared to existing approaches, our model is simpler and faster, generates new chemistry and avoids invalid chemistry. Its versatility is illustrated by the correct generation of molecules containing third-row elements and a surprisingly adept handling of complex boron chemistry. Without any empirical parameters, our model is designed to be valid also in unexplored regions of chemical space. One first unexpected finding is the high prevalence of dipolar structures among generated molecules.

Submitted 20 September, 2017; originally announced September 2017.

Comments: 24 pages, 10 figures

arXiv:1701.02495 [pdf, other]

doi 10.1088/1361-651X/aa8ff0

ATK-ForceField: A New Generation Molecular Dynamics Software Package

Authors: Julian Schneider, Jan Hamaekers, Samuel T. Chill, Søren Smidstrup, Johannes Bulin, Ralph Thesen, Anders Blom, Kurt Stokbro

Abstract: ATK-ForceField is a software package for atomistic simulations using classical interatomic potentials. It is implemented as a part of the Atomistix ToolKit (ATK), which is a Python programming environment that makes it easy to create and analyze both standard and highly customized simulations. This paper will focus on the atomic interaction potentials, molecular dynamics, and geometry optimization features of the software; however, many more advanced modeling features are available. The implementation details of these algorithms and their computational performance will be shown. We present three illustrative examples of the types of calculations that are possible with ATK-ForceField: modeling thermal transport properties in a silicon germanium crystal, vapor deposition of selenium molecules on a selenium surface, and a simulation of creep in a copper polycrystal.

Submitted 6 July, 2017; v1 submitted 10 January, 2017; originally announced January 2017.

Comments: 28 pages, 9 figures

arXiv:1611.05129 [pdf, other]

An Adaptive Multiscale Approach for Electronic Structure Methods

Authors: Sambasiva Rao Chinnamsetty, Michael Griebel, Jan Hamaekers

Abstract: In this paper, we introduce a new scheme for the efficient numerical treatment of the electronic Schrödinger equation for molecules. It is based on the combination of a many-body expansion, which corresponds to the so-called bond order dissection Anova approach, with a hierarchy of basis sets of increasing order. Here, the energy is represented as a finite sum of contributions associated with subsets of nuclei and basis sets in a telescoping-sum-like fashion. Under the assumption of data locality of the electronic density (nearsightedness of electronic matter), the terms of this expansion decay rapidly and higher terms may be neglected. We further extend the approach in a dimension-adaptive fashion to generate quasi-optimal approximations, i.e., a specific truncation of the hierarchical series such that the total benefit is maximized for a fixed amount of costs. This way, we are able to achieve substantial speed up factors compared to conventional first principles methods depending on the molecular system under consideration. In particular, the method can deal efficiently with molecular systems which include only a small active part that needs to be described by accurate but expensive models.

Submitted 19 July, 2017; v1 submitted 15 November, 2016; originally announced November 2016.

MSC Class: 65D15

arXiv:1611.05126 [pdf, other]

Localized Coulomb Descriptors for the Gaussian Approximation Potential

Authors: James Barker, Johannes Bulin, Jan Hamaekers, Sonja Mathias

Abstract: We introduce a novel class of localized atomic environment representations, based upon the Coulomb matrix. By combining these functions with the Gaussian approximation potential approach, we present LC-GAP, a new system for generating atomic potentials through machine learning (ML). Tests on the QM7, QM7b and GDB9 biomolecular datasets demonstrate that potentials created with LC-GAP can successfully predict atomization energies for molecules larger than those used for training to chemical accuracy, and can (in the case of QM7b) also be used to predict a range of other atomic properties with accuracy in line with the recent literature. As the best-performing representation has only linear dimensionality in the number of atoms in a local atomic environment, this represents an improvement both in prediction accuracy and computational cost when considered against similar Coulomb matrix-based methods.

Submitted 6 December, 2016; v1 submitted 15 November, 2016; originally announced November 2016.

MSC Class: 92E10

Showing 1–11 of 11 results for author: Hamaekers, J