DOBBELAERE-Machine Learning in Chemical Engineering Strengths Weaknesses Opportunities and Threats
DOBBELAERE-Machine Learning in Chemical Engineering Strengths Weaknesses Opportunities and Threats
DOBBELAERE-Machine Learning in Chemical Engineering Strengths Weaknesses Opportunities and Threats
Engineering
journal homepage: www.elsevier.com/locate/eng
Research
AI Energizes Process Manufacturing—Perspective
a r t i c l e i n f o a b s t r a c t
Article history: Chemical engineers rely on models for design, research, and daily decision-making, often with potentially
Received 16 October 2020 large financial and safety implications. Previous efforts a few decades ago to combine artificial intelli-
Revised 16 January 2021 gence and chemical engineering for modeling were unable to fulfill the expectations. In the last five years,
Accepted 22 March 2021
the increasing availability of data and computational resources has led to a resurgence in machine
Available online 29 July 2021
learning-based research. Many recent efforts have facilitated the roll-out of machine learning techniques
in the research field by developing large databases, benchmarks, and representations for chemical appli-
Keywords:
cations and new machine learning frameworks. Machine learning has significant advantages over tradi-
Artificial intelligence
Machine learning
tional modeling techniques, including flexibility, accuracy, and execution speed. These strengths also
Reaction engineering come with weaknesses, such as the lack of interpretability of these black-box models. The greatest oppor-
Process engineering tunities involve using machine learning in time-limited applications such as real-time optimization and
planning that require high accuracy and that can build on models with a self-learning ability to recognize
patterns, learn from data, and become more intelligent over time. The greatest threat in artificial intelli-
gence research today is inappropriate use because most chemical engineers have had limited training in
computer science and data analysis. Nevertheless, machine learning will definitely become a trustworthy
element in the modeling toolbox of chemical engineers.
Ó 2021 THE AUTHORS. Published by Elsevier LTD on behalf of Chinese Academy of Engineering and
Higher Education Press Limited Company. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
1. Introduction made using theoretical models, which have been constructed for
centuries. The Navier–Stokes equations [2,3], which describe vis-
In 130 years of chemical engineering, mathematical modeling cous fluid behavior, are one example of such a theoretical model.
has been invaluable to engineers for understanding and designing However, many of these models cannot be solved analytically for
chemical processes. Octave Levenspiel even stated that modeling realistic systems and require a considerable amount of computa-
stands out as the primary development in chemical engineering tional power to solve numerically. This drawback has ensured that
[1]. Today, in a fast-moving world, there are more challenges than most engineers first use simple models to describe reality. An
ever. The ability to predict the outcomes of certain events is neces- important historical—yet still relevant—example is Prandtl’s
sary, regardless of whether such events are related to the discovery boundary layer model [4]. In computational chemistry, scientists
and synthesis of active pharmaceutical ingredients for new dis- and engineers are willing to give up some accuracy in favor of time.
eases or to improvements in process efficiencies to meet stricter This willingness explains the popularity of density functional the-
environmental legislation. These events range from the reaction ory, in comparison with higher-level-of-theory models. However,
rate of a surface reaction or the selectivity of a reaction in a reactor, in many situations, higher accuracy is desired.
to the control of the heat supply to that reactor. Predictions can be Decades of modeling, simulations, and experiments have pro-
vided the chemical engineering community with a massive
⇑ Corresponding author. amount of data, which adds the option of making predictions from
E-mail address: Kevin.VanGeem@UGent.be (K.M. Van Geem). experience as an extra modeling toolkit. Machine learning models
https://doi.org/10.1016/j.eng.2021.03.019
2095-8099/Ó 2021 THE AUTHORS. Published by Elsevier LTD on behalf of Chinese Academy of Engineering and Higher Education Press Limited Company.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
M.R. Dobbelaere, P.P. Plehiers, R. Van de Vijver et al. Engineering 7 (2021) 1201–1211
are statistical and mathematical models that can ‘‘learn” from the growing potential of machine learning in chemical engineering
experience and discover patterns in data without the need for will be critically discussed; we will examine the pros and cons and
explicit, rule-based programming. As a field of study, machine list possible reasons for why machine learning in chemical engi-
learning is a subset of artificial intelligence (AI). AI is the ability neering with remain ‘‘hot” or end up as a ‘‘not.”
of machines to perform tasks that are generally linked to the
behavior of intelligent beings, such as humans. As shown in 2. Machine learning ABCs
Fig. 1, this field is not particularly new. The term ‘‘artificial intelli-
gence” was coined at Dartmouth College, USA in 1956, at a summer 2.1. The ‘‘A” in machine learning ABCs: Data
workshop for mathematicians who aimed at developing more cog-
nizant machines. From that point on, it took more than a decade A machine learning approach consists of three important links, as
before the first attempts were made to apply AI in chemical engi- illustrated in Fig. 2: data, representations, and models. The first link
neering [5]. In the 1980s, greater efforts were made in the field in a machine learning approach is the data that is used to train the
with the use of rule-based expert systems, which are considered model. As will be discussed later, the data used also proves to be
to be the simplest forms of AI. By that time, the field of machine the weakest link in the machine learning process. Virtually any data-
learning had started to grow, but in the chemical engineering set containing results from experiments, first-principles calcula-
community, with some exceptions, a lag of about 10 years was tions, or complex simulation models can be used to train a model.
experienced in the growth of machine learning. A sudden rise in However, because it is expensive to gather large amounts of accurate
publications on AI applications in chemical engineering in the data, it is customary to make use of ‘‘big data” approaches—using
1990s can be observed, with the adoption of clustering algorithms, large databases from various existing sources. Due to the cost of real
genetic algorithms, and—most successfully—artificial neural networks experiments, these large quantities of data are usually obtained via
(ANNs). Nevertheless, the trend did not persist. Venkatasubramanian fast simulations or text mining from patents and published work.
[6] names the lack of powerful computing and the difficult task The increased digitalization of research provides the scientific com-
of creating the algorithms as possible causes for this loss in munity with a plethora of open-source and commercial databases.
interest. Examples of commonly used sources of chemical information are
The past decade marked a breakthrough in deep learning, a sub- Reaxys [7], SciFinder [8], and ChemSpace [9] for reaction chemistry
set of machine learning that constructs ANNs to mimic the human and properties; GDB-17 [10] for small drug-like molecules; and
brain. As mentioned above, ANNs gained popularity among chem- National Institute of Standards and Technology (NIST) [11] and
ical engineers in the 1990s; however, the difference of the deep International Union of Pure and Applied Chemistry (IUPAC) [12]
learning era is that deep learning provides the computational for molecular properties such as solubility. In addition, several
means to train neural networks with multiple layers—the so- benchmarking datasets have been created to enable comparison
called deep neural networks. These new developments triggered between different machine learning models. Examples of these
chemical engineers, as reflected by an exponential rise in publica- benchmarks are QM9 and Alchemy, for quantum chemical proper-
tions on the topic. In the past, AI techniques could never become a ties [13]; and ESOL [14] and FreeSolv [15], for solubilities. Before
standard tool in chemical engineering; thus, it can be asked using any dataset for machine learning-based modeling, several
whether this is finally the moment. In this perspective article, we steps should be undertaken to ensure that the used data is of high
will first give an overview of the three major links in machine enough quality. The general aspect of ensuring data quality—from
learning today, applied to chemical engineering. In what follows, its generation to its storage—is known as data curation. More details
Fig. 1. Timeline of artificial intelligence, machine learning, and deep learning. The evolution of publications about AI in chemical engineering shows that a rise in publications
is followed by a phase of disinterest. Currently, AI in chemical engineering is once again in a ‘‘hot” phase, and it is unclear whether or not the curve will soon flatten out.
1202
M.R. Dobbelaere, P.P. Plehiers, R. Van de Vijver et al. Engineering 7 (2021) 1201–1211
Fig. 2. The three major links in machine learning for chemical engineering; every part has an impact on the eventual prediction performance and should be handled carefully.
about the necessity and consequences of data curation are discussed identifiers (InChIs) [29], or as three-dimensional (3D) coordinates.
further on. Recently, self-referencing embedded strings (SELFIES) [30] have
Several differences concerning data usage exist between been developed as a molecular string representation designed for
machine learning—and, more specifically, deep learning meth- machine learning applications. The molecular information is trans-
ods—and traditional modeling. First, ANNs learn from data and lated into a feature vector or tensor that is used as input for a deep
train themselves, although doing so requires large amounts of data. neural network or another machine learning model. The first way
Therefore, training datasets generally contain tens to hundreds of to represent a molecule is by using a (set of) well-chosen molecular
thousands of data points. Second, the dataset is split into three descriptor(s), such as the molecular weight, dipole moment, or
instead of two sets: a training, validation, and test set. Both the dielectric constant [31–33]. Another way to generate a molecular
training and validation sets are used in the training phase, while feature vector is by starting from the 3D geometry. Coulomb matri-
only the data in the training set is used for fitting. The validation ces [34], bags of bonds [35], and histograms of distances, angles,
set is an independent dataset that provides an unbiased evaluation and dihedrals [36] are a few examples of geometry-based repre-
of the model fit during the training phase. The test set evaluates sentations. However, 3D coordinates or calculated properties are
the final model fit with unseen data and is generally the main indi- generally unavailable in many applications. In such cases, the rep-
cator of the model quality. resentation can be created starting from a molecular graph, result-
ing in so-called topology-based representations.
2.2. The ‘‘B” in machine learning ABCs: Representation In topology-based representations, only a line-based identifier
is available. Encoders exist that directly translate the line-based
A second important link in a machine learning method is how identifier into a representation with techniques from natural lan-
the data is represented in the model. Even when the data is already guage processing [37–41], but usually the line-based identifier is
in numerical format, the selection of the variables or features that transformed into a feature vector in a similar fashion to
will make up the model input can have a significant impact on the geometry-based representations [42–60]. This is done by adding
model performance. This process is known as feature selection and simple atom and bond features to the molecular graph and then
has been the topic of several studies [16–19]. Limiting the number transmitting the information iteratively between atoms and bonds.
of selected features may reduce the computational cost of both Circular fingerprints [42–46] based on the Morgan algorithm [61],
training and executing the model, while improving the overall such as the extended-connectivity fingerprint [62], were among
accuracy. This feature-selection process is of lesser importance in the first molecular representations for machine learning applica-
so-called deep learning methods, which are assumed to internally tions. These fingerprints are so-called fixed molecular representa-
select those features that are considered to be important [20]. tions because they do not change during the training of the
Then, an input layer that consists of basic process parameters machine learning model. They remain popular in drug design for
(e.g., pressure, temperature, residence time), feed characterizations rapidly predicting the physical, chemical, and biological properties
(e.g., distillation curves, feed compositions), or catalyst properties of candidate drugs [63]. Because a fixed representation vector rep-
(e.g., surface area, calcination time) is often sufficient [21–27]. resents a molecule by the same vector in every prediction task, this
However, the task of representing the data becomes far more chal- type of input layer seems to conflict with the definition of a deep
lenging in the case of non-numerical data, such as molecules and neural network, which is assumed to learn the important features
reactions. [64]. There is a growing tendency to focus on learning how to rep-
Chemical engineering tasks often involve molecules and/or resent a molecule [47,52] instead of on human-engineering the
chemical reactions. Creating suitable numerical representations feature vector, as it is assumed that better capturing of the features
of these data types is a developing field in itself. In computer appli- will lead to higher accuracy, with less data and at a lower compu-
cations, the molecular constitution is typically represented by a tational cost [53,58].
line-based identifier, such as the simplified molecular-input line- Learned molecular representations are created as part of the
entry system (SMILES) [28] or the (IUPAC) international chemical prediction model. Starting from several initial molecular
1203
M.R. Dobbelaere, P.P. Plehiers, R. Van de Vijver et al. Engineering 7 (2021) 1201–1211
features—such as the heavy atoms, bond types, and ring features—a book [101] for a further introduction to machine learning
molecular representation is created that is updated during train- algorithms.
ing. This choice also indicates that a molecule has different repre- When the dataset is labeled—that is, when the correct classifica-
sentations depending on the prediction task. An extensive variety tion of each data point is known—supervised classification meth-
of learned topology-based representations [47–58] can be ods such as decision trees (and, by extension, random forests)
described using the message-passing neural network framework can be used [102,103]. Support vector machines are another possi-
reviewed by Gilmer et al. [59]. The weighted transfer of atom ble supervised classification method [104]. Although support vec-
and bond information throughout the molecular graph is charac- tor machines are commonly used for classification purposes,
teristic of message-passing neural networks. Many different repre- extensions have been made to allow regression via support vector
sentations exist, ranging in complexity, but it is important to note machines as well. Regression problems require supervised or
that a single representation that works for all kind of molecular active learning methods, although, in principle, any supervised
properties has not (yet) been developed [65]. For a more detailed learning method can be incorporated into an active learning
overview of the state of the art in representing molecules, readers approach. ANNs and all their possible variations [105–113], are
are referred to the review by David et al. [60]. the method that is most commonly associated with machine learn-
Chemical reactions are more complex data types than mole- ing. Depending on the application, one might choose feed-forward
cules. Similar to line-based molecular identifiers, reactions can be ANNs (for feature-based classification or regression), convolutional
identified by reaction SMILES [66] and reaction InChI (RInChI) neural networks (for image processing), or recurrent neural net-
[67], whereas SMIRKS [66] identify reaction mechanisms. As for works (for anomaly detection). A chemical engineer might encoun-
molecules, chemical reactions should also be vectorized in order ter convolutional neural networks used for representing molecules
to be useful in machine learning models. The most straightforward (see Section 2.2) [42–60] and ANNs [32,33,47,91,114–117], support
method is to start from the molecular descriptors (e.g., finger- vector machines [32], or kernel ridge regression [36,118] for pre-
prints) of the reagents and sum [68], subtract [50,69], or concate- dicting the properties of the representations. ANNs have been
nate [70–72] them. Another approach is to learn a reaction applied as a black-box modeling tool for numerous applications
representation based on the atoms and bonds that take actively in catalysis [23], chemical process control [119], and chemical pro-
part in the reaction [73]. Reactions can also be kept as text (typi- cess optimization [120]. A popular algorithm for classifying data
cally InChI) and, with a neural machine translation, the organic points when the labels are known is k-nearest neighbors, which
reaction product is then considered to be a translation of the reac- has been used, for example, for chemical process monitoring
tion products [58,74–78]. [121,122] and clustering of catalysts [86,123,124].
1204
M.R. Dobbelaere, P.P. Plehiers, R. Van de Vijver et al. Engineering 7 (2021) 1201–1211
Fig. 4. Strengths, weaknesses, opportunities, and threats in using machine learning as a modeling tool in chemical engineering.
function must be programmed. For regression problems, this statistical performance of the model on a test dataset, certain
implies that no detailed model equations must be derived or para- statements can be made about the accuracy and reliability of the
metrized [80]. These advantages allow efficient upscaling to large generated output. Detailed analysis of the model hyperparameters
systems and datasets without the need for extensive computa- (e.g., the number of nodes in an ANN) can be tedious, but can pro-
tional resources. An example is the current boom in predicting vide some insight into the correlations that have been learned by
quantum chemical properties using machine learning [32,33,35– the model. However, extracting physically meaningful explana-
37,39,40,47,49,50,52,55,65,68,71,73,115]. The usual ab initio tions for certain behaviors is infeasible. Hence, regardless of their
methods often require hours or days to calculate the properties speed and accuracy, machine learning models are a poor modeling
of a single molecule. Well-trained machine learning models can choice for explanatory studies.
make accurate predictions in a fraction of a second. Of course, This lack of interpretability contributes to the difficulty of
other fast techniques that can predict accurately have already been designing a proper machine learning model. As in any model, a
developed, but they are limited in application range compared machine learning model can overfit or underfit the data, with the
with machine learning models [125]. The inability to extrapolate proper model being situated somewhere in between. The risk of
is the major weakness of machine learning, but the application overfitting is typically much greater than the risk of underfitting
range can be extended quite easily by simply adding new data for machine learning models, and depends on the quality and
points. Active learning [126,127] makes it possible to expand the quantity of the training data, and on the complexity of the model.
range with a minimal amount of new data, which is ideal for cases Overfitting is an intrinsic property of the model structure and does
in which labeling is expensive (i.e., finding the true values of data not depend on the actual values of the hyperparameters—it can be
points), such as quantum chemical calculations [116] or chemical compared to fitting a (noisy) linear dataset with a polynomial of
experiments [72,128,129]. Furthermore, existing machine learning very high order. In deep learning, overfitting usually manifests
models, such as ChemProp [47] and SchNet [130,131], are ready to itself in the form of overtraining, which arises when the model is
use and do not require experience. Machine learning in general has shown the same data too many times. This results in the model
become very accessible with packages such as scikit-learn [132] memorizing noise instead of capturing general patterns. Overtrain-
and TensorFlow [133], and frameworks like Keras [134] (now part ing can be identified by comparing the model performance on the
of TensorFlow [133]) or PyTorch [135], which restrict the training training data with its performance on the validation and test data-
of a deep learning model to just a few lines of code. Such packages sets. If the training performance is much better than the validation
and frameworks give scientists the opportunity to shift their focus performance, the model may be overtrained. Finding the number
to the physical meaning of their research instead of spending of training epochs is often a difficult exercise. In order to avoid
precious time on developing high-level computer models. overfitting, a machine learning model requires a stopping criterion,
such as in other optimization problems. In traditional modeling,
4. Weaknesses where models typically involve at least some form of simplification
with respect to reality, this stopping criterion is typically based on
One of the main weaknesses of machine learning approaches is the change in performance on the training dataset, as achieving a
their black-box nature. Given a certain input, the approaches pro- high accuracy of the training data is the main challenge due to
vide an output. This situation is illustrated by Fig. 5. Based on the the simplifications. Achieving accuracy on the training dataset is
1205
M.R. Dobbelaere, P.P. Plehiers, R. Van de Vijver et al. Engineering 7 (2021) 1201–1211
Fig. 5. Unraveling the results from black-box models. A poor result is typically related to the training set used. When testing outside of the application range, a warning signal
should be raised. Good results require validation to understand what the model learns.
typically not the issue for machine learning models; rather, the ries or companies, and is hence not readily available. Even in a case
challenge mainly lies in achieving high accuracy on data the model where data is accessible, such as from an in-house database, the
was not directly trained on. Therefore, the stopping criterion available data might not be completely useful for machine learn-
should be based on the performance of the model on ‘‘unseen” ing. The same applies to data extracted from research papers or
data—the so-called validation dataset. For rigorously testing the patents using text-mining techniques [147]. The reason such data
optimized dataset, a completely independent dataset—the test might not be useful is because, in general, only successful experi-
dataset—is required, as is also common practice in traditional mod- ments are reported, while failed experiments remain unpublished
eling approaches. [148]. Furthermore, experiments or operation conditions that seem
A final—but often critical—weakness in machine learning to be nonsense to a human chemical engineer are not performed,
approaches is the data itself that was used. If there are too many because the engineer has insight and scientific knowledge.
systematic errors in the dataset, the network will make systematic Machine learning algorithms, however, do not know these bound-
errors itself, in what is known as the ‘‘garbage in–garbage out” aries and not including such ‘‘trivial” data might lead to bad
(GIGO) principle [136]. Some forms or sources of error can be iden- predictions.
tified relatively easily, while others—once made—are much harder
to find. As in every statistical method, outliers may be present. A
model trained on a small dataset is more affected by some outliers 5. Opportunities
than a large dataset. This is why not only quality, but also quantity
matters in machine learning. One possible solution to systematic The many strengths of machine learning methods present vari-
errors is to manually remove these points from the dataset; it is ous application opportunities, and recent developments have pro-
also possible to use algorithms for anomaly detection, such as vided ways to mitigate some of the most important criticisms. The
PCA [69,92], t-SNE [137,138], DBSCAN [139,140], or recurrent exceptionally high execution speeds of almost any trained machine
neural networks (LSTM networks) [111,141,142]. Recently, self- learning method makes such methods well-suited for applications
learning unsupervised neural network-based methods for anomaly in which accuracy and speed within predefined system boundaries
detection [143] have been developed [144–146]. Next to simple are important. Examples of such applications include feed-forward
outliers, there is always the possibility that the data points are process control and high-frequency real-time optimization [149–
actually wrong. Such data points might be one sample from an 151]. While empirical models often lack the accuracy for these
experiment in which a measurement error was made, or from a applications, detailed fundamental models are rarely fast enough
whole set of experiments that were conducted incorrectly. An to avoid computational delays. Machine learning models, trained
example could be the results from a chemical analysis in which on a fundamental model, can provide similar accuracy, yet at the
the apparatus was not calibrated. Training on a set of systemati- computational cost of an empirical model. In this case, a model is
cally false data is especially dangerous since the model will per- trained on high-level data and tries to predict the difference
ceive the false trend as truth. Identifying such cases is possible between the empirical outcome and the true value [152,153].
through diligent scrutiny of the published data. This example illus- Unsupervised algorithms can be used in process control applica-
trates the importance of data curation, which ensures that the data tions for discovering outliers in real-time data [93]. The combina-
used is accurate, reliable, and reproducible. tion of more accurate, rapid prediction and reliable industrial data
Obviously, data can only be curated when it is available. offers opportunities for the creation of digital twins and better con-
Although decades of modeling, simulating, and experimenting trol, leading to more efficient chemical processes.
have provided the chemical engineering community with a mas- A similar observation can be made in multiscale modeling
sive amount of data, this data is often stored in research laborato- approaches, where phenomena at a variety of different scales are
1206
M.R. Dobbelaere, P.P. Plehiers, R. Van de Vijver et al. Engineering 7 (2021) 1201–1211
modeled, resulting in a complex and strongly coupled set of equa- as the social sciences do [177], skepticism might grow in the com-
tions. The potential of machine learning in such applications munity due to the increasing irreproducible use of machine learn-
strongly depends on the aim of the multiscale approach. If the ing in the field. In Gartner’s hype cycle [178], machine learning and
aim is to gain fundamental insights into the lower scale phenom- deep learning are beyond the peak of inflated expectations [179],
ena, then machine learning is not advisable, due to its black-box and there is a risk of entering a period of disillusionment where
nature. However, if the smaller scales are incorporated into the interest is nearly gone. Next to irresponsible use of algorithms—
approach in order to obtain a more accurate model for larger scale and possibly more dangerous—is misinterpretation of the results.
phenomena, then machine learning could be used to replace the The black-box nature of the algorithms makes it difficult, and often
slow fundamental models for the smaller scales, without impact- nearly impossible, to understand why a certain result is obtained.
ing the interpretability of the larger scale phenomena. In addition, a model might give the correct outcome for the wrong
A final opportunity lies in providing an answer to one of the reasons [159]. Therefore, researchers should bear in mind an
main flaws of machine learning: its non-interpretability. The issue important rule from statistics when using machine learning: It is
of interpretable machine learning systems is not unique to about the correlations, not the causations.
chemical engineering problems—it is encountered in nearly any Another kind of unreasonable use of machine learning occurs
decision-making system [154–157]. An attempt has been made when the model leaves the application range it is created for.
in the field of catalysis to rationalize what exactly machine models The application range is determined by the training dataset and
learn [158]. This attempt, however, still does not provide any level is finite. When testing unknown data points, the researcher should
of direct interpretation of the model outcomes. Fig. 5 shows a check whether or not these points are within the application range.
workflow for explaining why a certain result is obtained. When When the points are outside of the range, it should be seen a warn-
the model outputs a good result, such as a chemical reaction pre- ing signal for the user that the model will perform poorly [92]. The
dictor giving the correct product, the model should only be trusted lower part of Fig. 5 depicts how the reason for obtaining a poor
after examining what the prediction is based on. A first step toward result is generally found by looking at the training set. Open-
interpretation of the model results is to quantify the individual source applications using clustering algorithms are available for
prediction uncertainties [159,160], as this gives an idea of the evaluating the data accuracy and its application range [180].
confidence the model has in its own decisions [115,161–164]. A last threat to applying machine learning in chemical engi-
One relatively straightforward way of doing so is via ensemble neering research is the growing educational gap when it comes
modeling. This methodology has been used for decades in weather to machine learning techniques. When applying computer and
forecasting and can be used in combination with nearly any model data science to chemistry and chemical engineering, it is important
type [165–167]. Several algorithms have also been created to to understand not only the tool that is used, but also the process it
determine how much certain input features influence the output is applied to. Therefore, simple training on how to use machine
[168], or to see which training points the model uses for a certain learning algorithms might become insufficient in the near future.
output [169,170]. When the results seem chemically or physically Instead, a good education on AI and statistical methods will
unreasonable, the model should be falsified instead of validated, by become vital in chemical engineering undergraduate programs.
finding adversarial examples [159]. Furthermore, the reason is usu- On the other hand, there is a need for more collaboration between
ally found in the dataset, with erroneous data or bias being present computer scientists and experts on the studied topic. Whereas
in the dataset [171,172]. undertrained researchers risk a wrong use of the computational
Another way of making machine learning models more inter- tools, computer and data scientists might obtain suboptimal
pretable is to include chemically relevant and well-founded infor- results when they are not fully familiar with the topic being stud-
mation in the models themselves. Interpretation will still require a ied. More interdisciplinary research and a symbiosis between
considerable amount of postprocessing, but—if human-readable machine learning experts and chemical experts might be a way
inputs are used and model architectures are not too complex—it to avoid a phase of disillusionment.
remains a feasible task. Very complex recurrent neural networks
using molecular fingerprints as input are nearly impossible to
interpret, as the model input is already difficult for a human to 7. Conclusions and perspectives
decipher. In risk management, the ‘‘as low as reasonably practica-
ble” (ALARP) principle is often applied [173]. Analogously, one In the past decade, machine learning has become a new tool in
could suggest an ‘‘as simple as reasonably possible” principle in the chemical engineer’s toolkit. Indeed, driven by its execution
order for machine learning models to be as interpretive as possible. speed, flexibility, and user-friendly applications, there is a strong,
growing interest in machine learning among chemical engineers.
On the flip side of this popularity is the risk of misusing machine
6. Threats learning or misinterpreting black-box results, which can poten-
tially lead to a distrust of machine learning within the chemical
The accessibility of machine learning models is both a major engineering community. The following three recommendations
strength and a major threat in research. While machine learning can help to improve the credibility of machine learning models
can be used by anyone with basic programming skills, it can also and turn them into an even more valuable and reliable modeling
be misused due to a lack of algorithmic knowledge. Today, a method.
plethora of machine learning algorithms are available, and a First, it is important to maintain easy and open access to data
tremendous number of combinations of parameters and hyper- and models within the community. High-quality data and open-
parameters is possible. Even for experienced users, machine learn- source models encourage researchers to use machine learning as
ing remains a reasoned trial-and-error method. Since researchers a tool and grant them the ability to focus on their topic rather than
are often unable to explain why one algorithm works while on programming and gathering data. Second, but related to the
another does not, some see machine learning as a type of modern first point, is the creation of interpretable models. Since machine
alchemy [174]. Moreover, the majority of published articles do not learning is already established in other research areas, new models
provide source code, or only a pseudocode, which makes it impos- for chemical applications are often inspired by existing algorithms.
sible to reproduce the work [175,176]. Although chemistry and Therefore, the field will benefit most from studying why a
chemical engineering do not face a reproducibility crisis as much certain output is generated from a given input, rather than from
1207
M.R. Dobbelaere, P.P. Plehiers, R. Van de Vijver et al. Engineering 7 (2021) 1201–1211
maintaining black boxes. The last recommendation is to invest in a USA. New York: Neural Information Processing Systems Foundation, Inc.;
2013.
profound algorithmic education. Although chemical engineers
[21] Bassam A, Conde-Gutierrez RA, Castillo J, Laredo G, Hernandez JA. Direct
typically have very strong mathematical and modeling skills, neural network modeling for separation of linear and branched paraffins by
understanding the computer science behind the graphical interface adsorption process for gasoline octane number improvement. Fuel
is a prerequisite for any modeler. This should also make it possible 2014;124:158–67.
[22] De Oliveira FM, de Carvalho LS, Teixeira LSG, Fontes CH, Lima KMG, Câmara
to define the application range of the model, which is crucial for an ABF, et al. Predicting cetane index, flash point, and content sulfur of diesel–
understanding of when the model is interpolating and when it is biodiesel blend using an artificial neural network model. Energy Fuels
extrapolating. This last point is definitely the most crucial: 2017;31(4):3913–20.
[23] Li H, Zhang Z, Liu Z. Application of artificial neural networks for catalysis: a
Machine learning models should be credible models, which can review. Catalysts 2017;7(10):306.
only be achieved by being vigilant for times when the model is [24] Abdul Jameel AG, Van Oudenhoven V, Emwas AH, Sarathy SM. Predicting
being used outside of its training set. octane number using nuclear magnetic resonance spectroscopy and artificial
neural networks. Energy Fuels 2018;32(5):6309–29.
[25] Plehiers PP, Symoens SH, Amghizar I, Marin GB, Stevens CV, Van Geem KM.
Acknowledgements Artificial intelligence in steam cracking modeling: a deep learning algorithm
for detailed effluent prediction. Engineering 2019;5(6):1027–40.
[26] Cavalcanti FM, Schmal M, Giudici R, Brito Alves RM. A catalyst selection
The authors acknowledge funding from the European Research method for hydrogen production through water–gas shift reaction using
Council (ERC) under the European Union’s Horizon 2020 research artificial neural networks. J Environ Manage 2019;237:585–94.
and innovation (818607). Pieter P. Plehiers and Ruben Van de [27] Hwangbo S, Al R, Sin G. An integrated framework for plant data-driven
process modeling using deep-learning with Monte-Carlo simulations.
Vijver acknowledge financial support, respectively, from a doctoral Comput Chem Eng 2020;143:107071.
(1150817N) and a postdoctoral (3E013419) fellowship from the [28] Weininger D. SMILES, a chemical language and information system. 1.
Research Foundation—Flanders (FWO). Introduction to methodology and encoding rules. J Chem Inf Comput Sci
1988;28(1):31–6.
[29] Heller S, McNaught A, Stein S, Tchekhovskoi D, Pletnev I. InChI–the
worldwide chemical structure identifier standard. J Cheminform 2013;5(1):7.
Compliance with ethics guidelines
[30] Krenn M, Häse F, Nigam A, Friederich P, Aspuru-Guzik A. Self-referencing
embedded strings (SELFIES): a 100% robust molecular string representation.
Maarten R. Dobbelaere, Pieter P. Plehiers, Ruben Van de Vijver, Mach Learn Sci Technol 2020;1(4):045024.
Christian V. Stevens, and Kevin M. Van Geem declare that they [31] Amar Y, Schweidtmann AM, Deutsch P, Cao L, Lapkin A. Machine learning and
molecular descriptors enable rational solvent selection in asymmetric
have no conflict of interest or financial conflicts to disclose. catalysis. Chem Sci 2019;10(27):6697–706.
[32] Yalamanchi KK, van Oudenhoven VCO, Tutino F, Monge-Palacios M, Alshehri
A, Gao X, et al. Machine learning to predict standard enthalpy of formation of
References hydrocarbons. J Phys Chem A 2019;123(38):8305–13.
[33] Yalamanchi KK, Monge-Palacios M, van Oudenhoven VCO, Gao X, Sarathy SM.
[1] Levenspiel O. Modeling in chemical engineering. Chem Eng Sci 2002;57(22– Data science approach to estimate enthalpy of formation of cyclic
23):4691–6. hydrocarbons. J Phys Chem A 2020;124(31):6270–6.
[2] Stokes GG. On the steady motion of incompressible fluids. In: Mathematical [34] Rupp M, Tkatchenko A, Müller KR, von Lilienfeld OA. Fast and accurate
and physical papers. Cambridge: Cambridge University Press; 2009. p. 1–16. modeling of molecular atomization energies with machine learning. Phys Rev
French. Lett 2012;108(5):058301.
[3] Navier CL. Memoire sur les lois du mouvement des fluides. Mem Acad Sci Inst [35] Hansen K, Biegler F, Ramakrishnan R, Pronobis W, von Lilienfeld OA, Müller
Fr 1827;6:389–440. French. KR, et al. Machine learning predictions of molecular properties: accurate
[4] Prandtl L. Über flussigkeitsbewegung bei sehr kleiner reibung. In: Riegels FW, many-body potentials and nonlocality in chemical space. J Phys Chem Lett
editor. Ludwig prandtl gesammelte abhandlungen. Berlin: Springer; 1904. p. 2015;6(12):2326–31.
484–91. German. [36] Faber FA, Hutchison L, Huang B, Gilmer J, Schoenholz SS, Dahl GE, et al.
[5] Siirola JJ, Powers GJ, Rudd DF. Synthesis of system designs: III. toward a Prediction errors of molecular machine learning models lower than hybrid
process concept generator. AIChE J 1971;17(3):677–82. DFT error. J Chem Theory Comput 2017;13(11):5255–64.
[6] Venkatasubramanian V. The promise of artificial intelligence in chemical [37] Gómez-Bombarelli R, Wei JN, Duvenaud D, Hernández-Lobato JM, Sánchez-
engineering: is it here, finally? AIChE J 2019;65(2):466–78. Lengeling B, Sheberla D, et al. Automatic chemical design using a data-driven
[7] Reaxys [Internet]. Amsterdam: Elsevier; c2021 [cited 2021 Jan 4]. Available continuous representation of molecules. ACS Cent Sci 2018;4(2):268–76.
from: https://www.elsevier.com/solutions/reaxys. [38] Liu S, Demirel MF, Liang Y. N-gram graph: simple unsupervised
[8] CAS SciFinder [Internet]. Columbus: American Chemical Society; c2021 [cited representation for graphs, with applications to molecules. In: Advances in
2021 Jan 4]. Available from: https://www.cas.org/products/scifinder. neural information processing systems 32. 2019 Dec 8–14; Vancouver, BC,
[9] ChemSpace [Internet]. Monmouth Junction: Chemspace US Inc.; c2021 [cited Canada. New York: Neural Information Processing Systems Foundation, Inc.;
2021 Jan 4]. Available from: https://chem-space.com/about. 2019.
[10] Ruddigkeit L, van Deursen R, Blum LC, Reymond JL. Enumeration of 166 [39] Wang S, Guo Y, Wang Y, Sun H, Huang J. SMILES-BERT: large scale
billion organic small molecules in the chemical universe database GDB-17. J unsupervised pre-training for molecular property prediction. In:
Chem Inf Model 2012;52(11):2864–75. Proceedings of the 10th ACM International Conference on Bioinformatics,
[11] NIST Chemistry WebBook. Washington, DC: National Institute of Standards Computational Biology and Health Informatics; 2019 Sep 7–10; Niagara Falls,
and Technology, US Department of Commerce; c2018 [cited 2021 Jan 4]. NY, USA. New York: IEEE; 2019. p. 429–36.
Available from: https://webbook.nist.gov/chemistry/. [40] Chithrananda S, Grand G, Ramsundar B. ChemBERTa: large-scale self-
[12] Pettit LD. The IUPAC stability constants database. Chem Int 2006;28(5):14–5. supervised pretraining for molecular property prediction. 2020.
[13] Chen G, Chen P, Hsieh CY, Lee CK, Liao B, Liao R, et al. Alchemy: a quantum arXiv:2010.09885.
chemistry dataset for benchmarking AI models. 2019. arXiv:1906.09427. [41] Fabian B, Edlich T, Gaspar H, Ahmed M. Molecular representation learning
[14] Delaney JS. ESOL: estimating aqueous solubility directly from molecular with language models and domain-relevant auxiliary tasks. 2020.
structure. J Chem Inf Comput Sci 2004;44(3):1000–5. arXiv:2011.13230.
[15] Mobley DL, Guthrie JP. FreeSolv: a database of experimental and calculated [42] Glem RC, Bender A, Arnby CH, Carlsson L, Boyer S, Smith J. Circular
hydration free energies, with input files. J Comput Aided Mol Des 2014;28 fingerprints: flexible molecular descriptors with applications from physical
(7):711–20. chemistry to ADME. IDrugs 2006;9(3):199–204.
[16] Hall MA. Correlation-based feature selection for machine learning [43] Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, et al.
[dissertation]. Hamilton: The University of Waikato; 1999. QSAR modeling: where have you been? where are you going to? J Med Chem
[17] Khalid S, Khalil T, Nasreen SA. A survey of feature selection and feature 2014;57(12):4977–5010.
extraction techniques in machine learning. In: Proceedings of 2014 Science [44] Sumudu PL, Steffen L. Computational methods in drug discovery. Beilstein J
and Information Conference; 2014 Aug 27–29; London, UK. New York: IEEE; Org Chem 2016;12:2694–718.
2014. [45] Unterthiner T, Mayr A, Klambauer G, Steijaert M, Wegner J, Ceulemans H,
[18] Xue B, Zhang M, Browne WN, Yao X. A survey on evolutionary computation et al. Deep learning as an opportunity in virtual screening. In: Proceedings of
approaches to feature selection. IEEE Trans Evol Comput 2016;20(4):606–26. Workshop on Machine Learning for Clinical Data Analysis, Healthcare and
[19] Cai J, Luo J, Wang S, Yang S. Feature selection in machine learning: a new Genomics (NIPS2014); 2014 Dec 8–13; Montreal, QC, Canada. Linz: Johannes
perspective. Neurocomputing 2018;300:70–9. Kepler University Linz; 2015.
[20] Szegedy C, Toshev A, Erhan D. Deep neural networks for object detection. In: [46] Mayr A, Klambauer G, Unterthiner T, Steijaert M, Wegner JK, Ceulemans H,
Proceedings of NIPS 2013-Twenty-Seventh Annual Conference on Neural et al. Large-scale comparison of machine learning methods for drug target
Information Processing Systems Conference. 2013 Dec 9–12; Nevada, CA, prediction on ChEMBL. Chem Sci 2018;9(24):5441–51.
1208
M.R. Dobbelaere, P.P. Plehiers, R. Van de Vijver et al. Engineering 7 (2021) 1201–1211
[47] Yang K, Swanson K, Jin W, Coley C, Eiden P, Gao H, et al. Analyzing learned [76] Duan H, Wang L, Zhang C, Guo L, Li J. Retrosynthesis with attention-based
molecular representations for property prediction. J Chem Inf Model 2019;59 NMT model and chemical analysis of ‘‘wrong” predictions. RSC Adv 2020;10
(8):3370–88. (3):1371–8.
[48] Duvenaud DK, Maclaurin D, Iparraguirre J, Bombarelli R, Aspure A, Adams RP, [77] Lee AA, Yang Q, Sresht V, Bolgar P, Hou X, Klug-McLeod JL, et al. Molecular
et al. Convolutional networks on graphs for learning molecular fingerprints. transformer unifies reaction prediction and retrosynthesis across pharma
In: Proceedings of the 28th International Conference on Neural Information chemical space. Chem Commun 2019;55(81):12152–5.
Processing Systems; 2015 Dec 8–12; Bali, Indonesia. Cambridge: MIR Press; [78] Schwaller P, Laino T, Gaudin T, Bolgar P, Hunter CA, Bekas C, et al. Molecular
2015. p. 2224–32. transformer: a model for uncertainty-calibrated chemical reaction prediction.
[49] Coley CW, Barzilay R, Green WH, Jaakkola TS, Jensen KF. Convolutional ACS Cent Sci 2019;5(9):1572–83.
embedding of attributed molecular graphs for physical property prediction. J [79] Michalski RS, Carbonell JG, Mitchell TM. A comparative review of selected
Chem Inf Model 2017;57(8):1757–72. methods for learning from examples. Mach Learn 2013;1:41–82.
[50] Coley CW, Jin W, Rogers L, Jamison TF, Jaakkola TS, Green WH, et al. A graph- [80] Dey A. Machine learning algorithms: a review. Int J Comput Sci Inf Technol
convolutional neural network model for the prediction of chemical reactivity. 2016;7(3):1174–9.
Chem Sci 2018;10(2):370–7. [81] Pearson K. Contributions to the mathematical theory of evolution. Philos
[51] Defferrard M, Bresson X, Vandergheynst P. Convolutional neural networks on Trans R Soc Lond A 1894;185:71–110.
graphs with fast localized spectral filtering. In: Proceedings of the 30th [82] Hotelling H. Analysis of a complex of statistical variables into principal
International Conference on Neural Information Processing Systems; 2016 components. J Educ Psychol 1933;24(6):417–41.
Dec 5–10; Barcelona, Spain. New York: Curran Associates, Inc; 2016. p. [83] Pearson K. On lines and planes of closest fit to systems of points in space.
3844–52. Lond Edinb Dublin Philos Mag J Sci 1901;2(11):559–72.
[52] Wu Z, Ramsundar B, Feinberg EN, Gomes J, Geniesse C, Pappu AS, et al. [84] Van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res
MoleculeNet: a benchmark for molecular machine learning. Chem Sci 2017;9 2008;9:2579–605.
(2):513–30. [85] Ester M, Kriegel HP, Sander J, Xu XW. A density-based algorithm for
[53] Kearnes S, McCloskey K, Berndl M, Pande V, Riley P. Molecular graph discovering clusters in large spatial databases with noise. In: Proceedings of
convolutions: moving beyond fingerprints. J Comput Aided Mol Des 2016;30 the Second International Conference on Knowledge Discovery and Data
(8):595–608. Mining. 1996 Aug 2–4; Portland, OR, USA. New York: AAAI Press; 1996.
[54] Battaglia P, Pascanu R, Lai M, Rezende DJ, Koray K. Interaction networks for p. 226–31.
learning about objects, relations and physics. In: Proceedings of the 30th [86] Palkovits R, Palkovits S. Using artificial intelligence to forecast water
International Conference on Neural Information Processing Systems; 2016 oxidation catalysts. ACS Catal 2019;9(9):8383–7.
Dec 5–10; Barcelona, Spain. New York: Curran Associates, Inc.; 2016. p. [87] Likas A, Vlassis N, Verbeek J. The global k-means clustering algorithm. Pattern
4502–10. Recognit 2003;36(2):451–61.
[55] Schütt KT, Arbabzadah F, Chmiela S, Müller KR, Tkatchenko A. Quantum- [88] Tang J, Yan X. Neural network modeling relationship between inputs and
chemical insights from deep tensor neural networks. Nat Commun 2017;8 state mapping plane obtained by FDA–t-SNE for visual industrial process
(1):13890. monitoring. Appl Soft Comput 2017;60:577–90.
[56] Jørgensen PB, Jacobsen KW, Schmidt MN. Neural message passing with edge [89] Zheng S, Zhao J. A new unsupervised data mining method based on the
updates for predicting properties of molecules and materials. In: Proceedings stacked autoencoder for chemical process fault diagnosis. Comput Chem Eng
of 32nd Conference on Neural Information Processing Systems; 2018 Dec 3– 2020;135:106755.
8; Montreal, QC, Canada. New York: Neural Information Processing Systems [90] Gao H, Struble TJ, Coley CW, Wang Y, Green WH, Jensen KF. Using machine
Foundation, Inc.; 2018. learning to predict suitable conditions for organic reactions. ACS Cent Sci
[57] Li Y, Tarlow D, Brockschmidt M, Zemel R. Gated graph sequence neural 2018;4(11):1465–76.
network. 2017. arXiv1511.05493. [91] Vermeire FH, Green WH. Transfer learning for solvation free energies: from
[58] Winter R, Montanari F, Noé F, Clevert DA. Learning continuous and data- quantum chemistry to experiments. Chem Eng J 2020;418:129307.
driven molecular descriptors by translating equivalent chemical [92] Pyl SP, Van Geem KM, Reyniers MF, Marin GB. Molecular reconstruction of
representations. Chem Sci 2018;10(6):1692–701. complex hydrocarbon mixtures: an application of principal component
[59] Gilmer J, Schoenholz SS, Riley PF, Vinyals O, Dahl GE. Neural message passing analysis. AIChE J 2010;56(12):3174–88.
for quantum chemistry. In: Proceedings of the 34th International Conference [93] Thombre M, Mdoe Z, Jäschke J. Data-driven robust optimal operation of
on Machine Learning; 2017 Aug 6–11; Sydney, NSW, Australia. New thermal energy storage in industrial clusters. Processes 2020;8(2):194.
York: JMLR.org; 2017. p. 1263–72. [94] Lee JM, Yoo C, Choi SW, Vanrolleghem PA, Lee IB. Nonlinear process
[60] David L, Thakkar A, Mercado R, Engkvist O. Molecular representations in AI- monitoring using kernel principal component analysis. Chem Eng Sci
driven drug discovery: a review and practical guide. J Cheminform 2020;12 2004;59(1):223–34.
(1):56. [95] Choi SW, Park JH, Lee IB. Process monitoring using a Gaussian mixture model
[61] Morgan HL. The generation of a unique machine description for chemical via principal component analysis and discriminant analysis. Comput Chem
structures—a technique developed at chemical abstracts service. J Chem Doc Eng 2004;28(8):1377–87.
1965;5(2):107–13. [96] Ning C, You F. Data-driven decision making under uncertainty integrating
[62] Rogers D, Hahn M. Extended-connectivity fingerprints. J Chem Inf Model robust optimization with principal component analysis and kernel smoothing
2010;50(5):742–54. methods. Comput Chem Eng 2018;112:190–210.
[63] Pattanaik L, Coley CW. Molecular representation: going long on fingerprints. [97] Kano M, Hasebe S, Hashimoto I, Ohno H. A new multivariate statistical
Chem 2020;6(6):1204–7. process monitoring method using principal component analysis. Comput
[64] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–44. Chem Eng 2001;25(7–8):1103–13.
[65] Von Lilienfeld OA. First principles view on chemical compound space: gaining [98] Chiang LH, Pell RJ, Seasholtz MB. Exploring process data with the use of
rigorous atomistic control of molecular properties. Int J Quantum Chem robust outlier detection algorithms. J Process Contr 2003;13(5):437–49.
2013;113(12):1676–89. [99] Zhang X, Zou Y, Li S, Xu S. A weighted auto regressive LSTM based approach
[66] James CA. Daylight theory manual [internet]. Laguna Niguel: Daylight for chemical processes modeling. Neurocomputing 2019;367:64–74.
Chemical Information Systems, Inc.; c1997–2019 [cited 2021 Jan 4]. [100] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput
Available from: http://www.daylight.com/dayhtml/doc/theory/. 1997;9(8):1735–80.
[67] Grethe G, Blanke G, Kraut H, Goodman JM. International chemical identifier [101] Géron A. Hands-on machine learning with Scikit-Learn, Keras, and
for reactions (RInChI). J Cheminform 2018;10(1):22. TensorFlow: concepts, tools, and techniques to build intelligent
[68] Segler MHS, Waller MP. Neural-symbolic machine learning for retrosynthesis systems. Sebastopol: O’Reilly Media; 2019.
and reaction prediction. Chemistry 2017;23(25):5966–71. [102] Safavian SR, Landgrebe D. A survey of decision tree classifier methodology.
[69] Plehiers PP, Coley CW, Gao H, Vermeire FH, Dobbelaere MR, Stevens CV, et al. IEEE Trans Syst Man Cybern 1991;21(3):660–74.
Artificial intelligence for computer-aided synthesis in flow: analysis and [103] Ho TK. Random decision forests. In: Proceedings of 3rd International
selection of reaction components. Front Chem Eng 2020;2:5. Conference on Document Analysis and Recognition; 1995 Aug 14–16;
[70] Sanchez-Lengeling B, Aspuru-Guzik A. Inverse molecular design using Montreal, QC, Canada. New York: IEEE; 1995. p. 278–82.
machine learning: generative models for matter engineering. Science [104] Vapnik V. The support vector method of function estimation. In: Suykens
2018;361(6400):360–5. JAK, Vandewalle J, editors. Nonlinear modeling. Boston: Springer; 1998.
[71] Wei JN, Duvenaud D, Aspuru-Guzik A. Neural networks for the prediction of p. 55–85.
organic chemistry reactions. ACS Cent Sci 2016;2(10):725–32. [105] Matsugu M, Mori K, Mitari Y, Kaneda Y. Subject independent facial expression
[72] Eyke NS, Green WH, Jensen KF. Iterative experimental design based on active recognition with robust face detection using a convolutional neural network.
machine learning reduces the experimental burden associated with reaction Neural Netw 2003;16(5–6):555–9.
screening. React Chem Eng 2020;5(10):1963–72. [106] Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep
[73] Coley CW, Barzilay R, Jaakkola TS, Green WH, Jensen KF. Prediction of organic convolutional neural networks. Commun ACM 2017;60(6):84–90.
reaction outcomes using machine learning. ACS Cent Sci 2017;3(5):434–43. [107] Shiffman D. Neural networks. In: Fry S, editor. The nature of code. Boston:
[74] Nam J, Kim J. Linking the neural machine translation and the prediction of Free Software Foundation; 2012. p. 444–80.
organic chemistry reactions. 2016. arXiv:1612.09529. [108] Hopfield JJ. Artificial neural networks. IEEE Circuits Devices Mag 1988;4
[75] Schwaller P, Gaudin T, Lányi D, Bekas C, Laino T. ‘‘Found in translation”: (5):3–10.
predicting outcomes of complex organic chemistry reactions using neural [109] Bontemps L, Cao VL, McDermott J, Le-Khac N. Collective anomaly detection
sequence-to-sequence models. Chem Sci 2018;9(28):6091–8. based on long short-term memory recurrent neural networks. In:
1209
M.R. Dobbelaere, P.P. Plehiers, R. Van de Vijver et al. Engineering 7 (2021) 1201–1211
Proceedings of International Conference on Future Data and Security Conference on Similarity Search and Applications; 2017 Oct 4–6; Munich,
Engineering; 2016 Nov 23–25; Can Tho City, Vietnam. Cham: Springer Germany. Berlin: Springer; 2017. p. 188–203.
International Publishing; 2016. [138] Perez H, Tah JHM. Improving the accuracy of convolutional neural networks
[110] Brotherton T, Johnson T. Anomaly detection for advanced military aircraft by identifying and removing outlier images in datasets using t-SNE.
using neural networks. In: Proceedings of 2001 IEEE Aerospace Conference; Mathematics 2020;8(5):662.
2001 Mar 10–17; Big Sky, MT, USA. New York: IEEE; 2001. [139] Çelik M, Dadasßer-Çelik F, Dokuz AS ß . Anomaly detection in temperature data
[111] Malhotra P, Vig L, Shroff G, Agarwal P. Long short term memory networks for using DBSCAN algorithm. In: Proceedings of 2011 International Symposium
anomaly detection in time series. In: Proceedings of 23rd European on Innovations in Intelligent Systems and Applications; 2011 Jun 15–18;
Symposium on Artificial Neural Networks, Computational Intelligence and Istanbul, Turkey. New York: IEEE; 2016. p. 91–5.
Machine Learning, ESANN 2015; 2015 Apr 22–24; Bruges, [140] Cassisi C, Ferro A, Giugno R, Pigola G, Pulvirenti A. Enhancing density-based
Belgium. Wallonie: i6doc; 2015. clustering: parameter reduction and outlier detection. Inf Syst 2013;38
[112] Chalapathy R, Menon AK, Chawla S. Anomaly detection using one-class (3):317–30.
neural networks. 2018. arXiv:1802.06360. [141] Fernando T, Denman S, Sridharan S, Fookes C. Soft + hardwired attention: an
[113] Zhou S, Shen W, Zeng D, Fang M, Wei Y, Zhang Z. Spatial–temporal LSTM framework for human trajectory prediction and abnormal event
convolutional neural networks for anomaly detection and localization in detection. Neural Netw 2018;108:466–78.
crowded scenes. Signal Process Image Commun 2016;47:358–68. [142] Filonov P, Lavrentyev A, Vorontsov A. Multivariate industrial time series with
[114] Grambow CA, Pattanaik L, Green WH. Deep learning of activation energies. J cyber-attack simulation: fault detection using an LSTM-based predictive data
Phys Chem Lett 2020;11(8):2992–7. model. 2016. arXiv:1612.06676.
[115] Scalia G, Grambow CA, Pernici B, Li YP, Green WH. Evaluating scalable [143] Chandola V, Banerjee A, Kumar V. Anomaly detection: a survey. ACM Comput
uncertainty estimation methods for deep learning-based molecular property Surv 2009;41(3):1–58.
prediction. J Chem Inf Model 2020;60(6):2697–717. [144] Ahmad S, Lavin A, Purdy S, Agha Z. Unsupervised real-time anomaly
[116] Li YP, Han K, Grambow CA, Green WH. Self-evolving machine: a continuously detection for streaming data. Neurocomputing 2017;262:134–47.
improving model for molecular thermochemistry. J Phys Chem A 2019;123 [145] Amini M, Jalili R, Shahriari HR. RT-UNNID: a practical solution to real-time
(10):2142–52. network-based intrusion detection using unsupervised neural networks.
[117] Grambow CA, Li YP, Green WH. Accurate thermochemistry with small data Comput Secur 2006;25(6):459–68.
sets: a bond additivity correction and transfer learning approach. J Phys [146] Schlegl T, Seeböck P, Waldstein SM, Schmidt-Erfurth U, Langs G.
Chem A 2019;123(27):5826–35. Unsupervised anomaly detection with generative adversarial networks
[118] Christensen AS, Bratholm LA, Faber FA, Anatole von Lilienfeld O. FCHL to guide marker discovery. In: Proceedings of International
revisited: faster and more accurate quantum machine learning. J Chem Phys Conference on Information Processing in Medical Imaging 2017; 2017
2020;152(4):044107. Jun 25–30; Boone, KY, USA. Cham: Springer International Publishing;
[119] Azlan Hussain M. Review of the applications of neural networks in chemical 2017. p. 146–57.
process control—simulation and online implementation. Artif Intell Eng [147] Schneider N, Lowe DM, Sayle RA, Tarselli MA, Landrum GA. Big data from
1999;13(1):55–68. pharmaceutical patents: a computational analysis of medicinal chemists’
[120] Schweidtmann AM, Mitsos A. Deterministic global optimization with bread and butter. J Med Chem 2016;59(9):4385–402.
artificial neural networks embedded. J Optim Theory Appl 2019;180 [148] Raccuglia P, Elbert KC, Adler PDF, Falk C, Wenny MB, Mollo A, et al. Machine-
(3):925–48. learning-assisted materials discovery using failed experiments. Nature
[121] Zhu W, Sun W, Romagnoli J. Adaptive k-nearest-neighbor method for process 2016;533(7601):73–6.
monitoring. Ind Eng Chem Res 2018;57(7):2574–86. [149] Wu Z, Rincon D, Christofides PD. Real-time adaptive machine-learning-based
[122] Yan S, Yan X. Using labeled autoencoder to supervise neural network predictive control of nonlinear processes. Ind Eng Chem Res 2020;59
combined with k-nearest neighbor for visual industrial process monitoring. (6):2275–90.
Ind Eng Chem Res 2019;58(23):9952–8. [150] Zhang Z, Wu Z, Rincon D, Christofides P. Real-time optimization and control
[123] Walker E, Kammeraad J, Goetz J, Robo MT, Tewari A, Zimmerman PM. of nonlinear processes using machine learning. Mathematics 2019;7(10):890.
Learning to predict reaction conditions: relationships between solvent, [151] Powell BKM, Machalek D, Quah T. Real-time optimization using
molecular structure, and catalyst. J Chem Inf Model 2019;59 reinforcement learning. Comput Chem Eng 2020;143:107077.
(9):3645–54. [152] Ramakrishnan R, Dral PO, Rupp M, von Lilienfeld OA. Big data meets quantum
[124] Zahrt AF, Henle JJ, Rose BT, Wang Y, Darrow WT, Denmark SE. Prediction of chemistry approximations: the D-machine learning approach. J Chem Theory
higher-selectivity catalysts by computer-driven workflow and machine Comput 2015;11(5):2087–96.
learning. Science 2019;363(6424):eaau5631. [153] Bikmukhametov T, Jäschke J. Combining machine learning and process
[125] Han K, Jamal A, Grambow CA, Buras ZJ, Green WH. An extended group engineering physics towards enhanced accuracy and explainability of data-
additivity method for polycyclic thermochemistry estimation. Int J Chem driven models. Comput Chem Eng 2020;138:106834.
Kinet 2018;50(4):294–303. [154] Gunning D, Aha DW. DARPA’s explainable artificial intelligence program. AI
[126] Settles B. From theories to queries: active learning in practice. JMLR Mag 2019;40(2):44–58.
2011;16:1–18. [155] Abdul A, Vermeulen J, Wang D, Lim BY, Kankanhali M. Trends and trajectories
[127] Settles B. Active learning literature survey. Computer Sciences Technical for explainable, accountable and intelligible systems: an HCI research
Report 1648. Madison: University of Wisconsin–Madison; 2009. agenda. In: Proceedings of the 2018 CHI Conference on Human Factors in
[128] Clayton AD, Schweidtmann AM, Clemens G, Manson JA, Taylor CJ, Niño CG, Computing Systems; 2018 Apr 21; Montreal, QC, Canada. New
et al. Automated self-optimisation of multi-step reaction and separation York: Association for Computing Machinery; 2018.
processes using machine learning. Chem Eng J 2020;384:123340. [156] Lepri B, Oliver N, Letouzé E, Pentland A, Vinck P. Fair, transparent, and
[129] Zhang C, Amar Y, Cao L, Lapkin AA. Solvent selection for mitsunobu reaction accountable algorithmic decision-making processes. Philos Technol 2018;31
driven by an active learning surrogate model. Org Process Res Dev 2020;24 (4):611–27.
(12):2864–73. [157] Wachter S, Mittelstadt B, Floridi L. Transparent, explainable, and accountable
[130] Schütt KT, Sauceda HE, Kindermans PJ, Tkatchenko A, Müller KR. SchNet-deep AI for robotics. Sci Robot 2017;2(6):eaan6080.
learning architecture for molecules and materials. J Chem Phys 2018;148 [158] Kammeraad JA, Goetz J, Walker EA, Tewari A, Zimmerman PM. What does the
(24):241722. machine learn? Knowledge representations of chemical reactivity. J Chem Inf
[131] Schütt KT, Kessel P, Gastegger M, Nicoli KA, Tkatchenko A, Müller KR. Model 2020;60(3):1290–301.
SchNetPack: a deep learning toolbox for atomistic systems. J Chem Theory [159] Kovács DP, McCorkindale W, Lee AA. Quantitative interpretation explains
Comput 2019;15(1):448–55. machine learning models for chemical reaction prediction and uncovers bias.
[132] Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Nat Comm 2021;12:1695.
Scikit-learn: machine learning in Python. J Mach Learn Res 2011;12:2825–30. [160] Preuer K, Klambauer G, Rippmann F, Hochreiter S, Unterthiner T.
[133] Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, et al. TensorFlow: a Interpretable deep learning in drug discovery. In: Samek W, Montavon G,
system for large-scale machine learning. In: Proceedings of the 12th USENIX Vedaldi A, Hansen LK, Müller KR, editors. Explainable AI: interpreting,
Symposium on Operating Systems Design and Implementation (OSDI’16); explaining and visualizing deep learning. Cham: Springer International
2016 Nov 2–4; Savannah, GA, USA. Northbrook: USENIX; 2016. Publishing; 2019. p. 331–45.
[134] Chollet F. Keras [internet]. San Francisco: GitHub, Inc.; 2021 Jun 18 [cited [161] Begoli E, Bhattacharya T, Kusnezov D. The need for uncertainty quantification
2021 Jan 4]. Available from: https://github.com/keras-team/keras. in machine-assisted medical decision making. Nat Mach Intell 2019;1
[135] Paszke A, Gross S, Massa F, Lerer A, Chintala S. Pytorch: an imperative style, (1):20–3.
high-performance deep learning library. In: Proceedings of 33rd Conference [162] Mohamed L, Christie MA, Demyanov V. Comparison of stochastic sampling
on Neural Information Processing Systems; 2019 Dec 8–14; Vancouver, BC, algorithms for uncertainty quantification. SPE J 2010;15(01):31–8.
Canada. New York: Neural Information Processing Systems Foundation, Inc.; [163] Gal Y, Ghahramani Z. Dropout as a Bayesian approximation: representing
2019. model uncertainty in deep learning. 2016. arXiv:1506.02142v6.
[136] Bininda-Emonds ORP, Jones KE, Price SA, Cardilloe M, Grenyer R, Purvis A. [164] Fridlyand A, Johnson MS, Goldsborough SS, West RH, McNenly MJ, Mehl M,
Garbage in, garbage out. In: Bininda-Emonds ORP, editor. Phylogenetic et al. The role of correlations in uncertainty quantification of transportation
supertrees. Berlin: Springer; 2004. p. 267–80. relevant fuel models. Combust Flame 2017;180:239–49.
[137] Schubert E, Gertz M. Intrinsic t-stochastic neighbor embedding for [165] Parker WS. Ensemble modeling, uncertainty and robust predictions. Wiley
visualization and outlier detection. In: Proceedings of International Interdiscip Rev Clim Change 2013;4(3):213–23.
1210
M.R. Dobbelaere, P.P. Plehiers, R. Van de Vijver et al. Engineering 7 (2021) 1201–1211
[166] Gneiting T, Raftery AE. Weather forecasting with ensemble methods. Science [173] Melchers RE. On the ALARP approach to risk management. Reliab Eng Syst Saf
2005;310(5746):248–9. 2001;71(2):201–8.
[167] Derome J. On the average errors of an ensemble of forecasts. Atmos Ocean [174] Hutson M. Has artificial intelligence become alchemy? Science 2018;360
1981;19(2):103–27. (6388):478.
[168] Sundararajan M, Taly A, Yan Q. Axiomatic attribution for deep networks. [175] Hutson M. Artificial intelligence faces reproducibility crisis. Science 2018;359
2017. arXiv:1703.01365. (6377):725–6.
[169] Datta A, Sen S, Zick Y. Algorithmic transparency via quantitative input [176] Gundersen OE, Kjensmo S. State of the art: reproducibility in artificial
influence: theory and experiments with learning systems. In: Proceedings of intelligence. In: Proceedings of The Thirty-Second AAAI Conference on
2016 IEEE Symposium on Security and Privacy (SP); 2016 May 22–26; San Artificial Intelligence (AAAI-18); 2018 Feb 2–7; New Orleans, LA, USA. Palo
Jose, CA, USA. New York: IEEE; 2016. p. 598–617. Alto: AAAI Press; 2018. p. 1644–51.
[170] Guidotti R, Monreale A, Ruggieri S, Turini F, Giannotti F, Pedreschi D. A survey [177] Baker M. 1,500 scientists lift the lid on reproducibility. Nature
of methods for explaining black box models. ACM Comput Surv 2018;51 2016;533:452–4.
(5):93. [178] Fenn J, Linden A. Understanding Gartner’s hype cycles. Report. Stamford:
[171] Lin AI, Madzhidov TI, Klimchuk O, Nugmanov RI, Antipin IS, Varnek Gartner, Inc.; 2003 May. Report No: R-20-1971.
A. Automatized assessment of protective group reactivity: a step [179] Sicular S. Vashisth S. Hype cycle for artificial intelligence, 2020 [Internet].
toward big reaction data analysis. J Chem Inf Model 2016;56(11): Reading: CloudFactory; 2020 Jul 27 [cited 2021 Jan 4]. Available from: https://
2140–8. www.cloudfactory.com/reports/gartner-hype-cycle-for-artificial-intelligence.
[172] Pesciullesi G, Schwaller P, Laino T, Reymond JL. Transfer learning enables the [180] Symoens SH, Aravindakshan SU, Vermeire FH, De Ras K, Djokic MR, Marin GB,
molecular transformer to predict regio- and stereoselective reactions on et al. QUANTIS: data quality assessment tool by clustering analysis. Int J
carbohydrates. Nat Commun 2020;11(1):4874. Chem Kinet 2019;51(11):872–85.
1211