Machine learning enables polymer cloud-point engineering via inverse
design
Jatin N. Kumar,1†* Qianxiao Li,2† Karen Y.T. Tang,1 Tonio Buonassisi,3 Anibal L. Gonzalez-Oyarce,2 Jun Ye,2
1 Institute of Materials Research & Engineering, 2 Fusionopolis Way, #08-03, Singapore 138634
2 Institute of High Performance Computing, 1 Fusionopolis Way, #16-16, Singapore 138632
3 Massachusetts Institute of Technology, Cambridge, MA 02139, USA
*Correspondence to: kumarjn@imre.a-star.edu.sg
†These authors contributed equally to this work
Abstract: Inverse design is an outstanding challenge in disordered systems with multiple
length scales such as polymers, particularly when designing polymers with desired phase
behavior. We demonstrate high-accuracy tuning of poly(2-oxazoline) cloud point via machine
learning. With a design space of four repeating units and a range of molecular masses, we
achieve an accuracy of 4°C root mean squared error (RMSE) in a temperature range of 24–
90°C, employing gradient boosting with decision trees. The RMSE is >3x better than linear
and polynomial regression. We perform inverse design via particle-swarm optimization,
predicting and synthesizing 17 polymers with constrained design at 4 target cloud points from
37 to 80°C. Our approach challenges the status quo in polymer design with a machine learning
algorithm that is capable of fast and systematic discovery of new polymers.
Introduction
Polymers are ubiquitous in both structural and functional systems, owing to their
highly tunable physical, chemical, and electrical properties.1, 2, 3, 4 The development of
polymers has historically been based on an Edisonian approach. Herein, we develop a
machine learning framework to predict polymer structure (topology, composition,
functionality, and size), on the basis of target phase properties, specifically the cloud point.
This framework accommodates the complex disorder across multiple length scales that
distinguishes polymers from small molecules, 5, 6, 7 inorganic crystals,8 and systems-structure
optimization.9, 10, 11
Phase properties, which describe the order of a polymer across multiple length scales,
are determined by interactions of polymers with other polymers, the solution, and
themselves. One such phase property is the cloud point, the temperature at which polymers
are no longer miscible in solution. Numerous studies tabulate simple relationships between
cloud point and one or two experimental variables (e.g., structure12 and temperature13, 14), or
offer polynomial fits to the data.15 Ramprasad and colleagues applied machine learning to
density-functional theory (DFT) calculations to predict opto-electronic16, 17 and physical18 bulk
polymer properties.4, 18 However, this approach is computationally expensive,7, 19 particularly
for polymer systems,20 and does not enable scalable inverse design over a wide range of
conditions with high accuracy.21,22
In this study, we combine machine learning, domain expertise, and experiment to
solve the inverse design problem for polymers. Our framework (Fig. 1) has three parts: (1)
data curation (defining material descriptors) that relates poly(2-oxazolines) cloud point, size,
and relative ratios of 4 different monomer units; (2) machine learning algorithm selection and
hyperparameter tuning to enable fast forward prediction of cloud point based on structure
with the evaluation of algorithmic robustness over systematic error and differing data
quality23; and (3) use of said algorithm for inverse design using particle swarm optimization
(PSO) with design selection using an ensemble of neural networks. We demonstrate the
accuracy of our inverse-design paradigm by predicting the compositions of, and synthesizing, 17 polymers not previously reported in the literature, with cloud points between 37 and 80°C, using a modular combination of 4 repeating monomer units. We achieve ~4°C error, nearly within experimental error (1–3°C).
Fig. 1: Study framework. First, we train a machine learning model to predict cloud point on
the basis of poly(2-oxazoline) structure, with varying ratios of four monomer units (building
blocks) and molecular weights. Second, we demonstrate inverse design using the trained
algorithm and particle swarm optimization, predicting 17 polymer structures from user-defined cloud points. The model accommodates the inherent complexity of polymers over
multiple length scales.
Discussion
We combine and curate literature and experimental data to create the input into our
machine learning framework. Historical cloud-point data for poly(2-oxazoline)s15, 24, 25, 26, 27, 28, 29 was curated into a set of input variables ((1) molecular weight of the polymers; (2)
polydispersity index; (3) polymer type (homo, statistical, or block); (4) total number of each
monomer unit in the final polymer (A: EtOx, B: nPropOx, C: cPropOx, D: iPropOx, E: esterOx))
and output variables (cloud point in ˚C) (Table S1). We synthesized 87 poly(2-oxazoline)s by
similar methods to augment this data (Table S2). Cloud point was evaluated by dynamic light scattering (DLS) in accordance with best practices,30 particularly since DLS affords greater
weightage to the modal mass as a correction for the asymmetric molecular weight
distributions (MWD) of our synthesized polymers (details in supplementary materials). Due
to data scarcity, esterOx was not synthesized nor considered in inverse design. The
relationships of individual input variables to the output cloud point are plotted in Fig. 2.
Fig. 2: The dependence of cloud point on the mole fraction of (a) EtOx; (b) nPropOx; (c) cPropOx; (d) iPropOx; and (e) molecular mass (M), where all zero values were filtered from the graphs, and (f) the number distribution of cloud points, where zero represents polymers without a CP.
We test whether machine learning methods have superior predictive accuracies to
simple regression methods in this multi-variable parameter space 31, 32, 33. We compare the
root-mean-squared errors (RMSE) of simple linear and quadratic regressions against machine
learning methods including support vector regressions (SVR), neural networks (NN) and
gradient boosting regression with decision trees (GBR) (Fig. 3, S3). The accuracies of the
various models are determined by splitting the input dataset into training, validation, and test sets, with training and validation performed on historical data, while testing is performed on experimental data. The RMSE and inference times are reported in Table S3.
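The model comparison described above can be sketched with scikit-learn. This is an illustrative sketch only: the synthetic descriptors and cloud points below are stand-ins for the real polymer dataset, and the model settings are assumptions rather than the authors' tuned configurations.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# Stand-in for the polymer dataset: 5 descriptors -> cloud point (deg C)
X = rng.uniform(0.0, 1.0, size=(110, 5))
y = 24 + 66 * X[:, 0] - 20 * X[:, 4] + rng.normal(0, 2, 110)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# The five model classes compared in the text (default settings)
models = {
    "linear": LinearRegression(),
    "poly2": make_pipeline(PolynomialFeatures(2), LinearRegression()),
    "svr": SVR(),
    "nn": MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0),
    "gbr": GradientBoostingRegressor(random_state=0),
}
rmse = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # RMSE on the held-out test set, the discriminating metric in the text
    rmse[name] = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
```

On real data the ranking depends on the dataset; the paper reports GBR as the best "out of the box" compromise between accuracy and inference speed.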
Fig. 3: (top row) Comparison of three regression methods (a-c: linear, polynomial (order-2), and gradient boosting
(with decision trees) regressions.) The literature data is split into 68 training data points and 7 validation data
points. Test datapoints are 42 experimental data points produced in the lab. The results were compared using the
root-mean-squared error. We observe that GBR achieves the best generalization. (bottom row) Final GBR model
performance on 3 different random train-test splits of the combined dataset.
Linear and polynomial regressions, while significantly faster than the others, performed poorly when compared to SVR, NN, and GBR. Of the latter three, GBR was the most accurate "out of the box." Moreover, it possesses fast inference speed, which is essential for efficient exploration of the parameter space in inverse design. We increased the predictive accuracy by tuning via a cross-validation grid search on hyper-parameters. We used both historical and experimental data, with a test set of 10%, validating our choice of hyper-parameters against the test error on 3 randomly split training and test sets (Fig. 3). We observe improved performance with the increased dataset and thorough tuning.
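A cross-validation grid search of the kind described can be sketched as follows. The grid values and synthetic data here are illustrative assumptions; the actual grid used by the authors is in their repository.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(1)
# Placeholder data standing in for the combined literature + lab dataset
X = rng.uniform(0.0, 1.0, size=(110, 5))
y = 24 + 66 * X[:, 0] - 20 * X[:, 4] + rng.normal(0, 2, 110)

# Hold out 10% as a test set, as described in the text
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)

# Hypothetical hyper-parameter grid (not the paper's actual grid)
grid = {
    "n_estimators": [100, 300],
    "max_depth": [2, 3, 4],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    grid,
    scoring="neg_root_mean_squared_error",
    cv=5,
)
search.fit(X_tr, y_tr)
# GridSearchCV refits on the full training set with the best parameters;
# score() returns the negative RMSE, so negate it for reporting
test_rmse = -search.score(X_te, y_te)
```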
This algorithm generalizes well across polymer data of varying polydispersity. The historical datasets had narrow polydispersity indices with the assumption of symmetrical MWDs, while the synthesized polymers had broad and unsymmetrical MWDs. The robustness of this algorithm in handling "noisy" data renders it far more powerful than a simple algorithm that only works for the highest-quality data.
With a sufficiently accurate model, we finally retrain (using the tuned hyper-parameters) on
the entire dataset to produce a finalized forward model that we use for subsequent inverse
design.
The feature importance ranking (Fig. S4) indicates that "units of A" and "molecular mass" are the two most important features defining cloud point. We note that these insights are not trivially derived from Figure 2, which indicates similarly strong dependences of variables a–c on cloud point. Also, the molecular mass correlating most strongly with cloud point is the mode, not the median or mean (Fig. S2), which we speculate could indicate a critical threshold, e.g., a critical concentration of polymers above a certain molecular mass necessary to induce globule formation.
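Feature-importance rankings of the kind shown in Fig. S4 are read directly off a fitted gradient-boosting model. The sketch below uses synthetic data in which two of five descriptors (stand-ins for "units of A" and "molecular mass") genuinely dominate; it is illustrative, not the paper's analysis.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(110, 5))
# Construct the target so feature 0 and feature 4 carry the signal;
# features 1-3 are pure noise with respect to y
y = 24 + 66 * X[:, 0] - 20 * X[:, 4] + rng.normal(0, 2, 110)

gbr = GradientBoostingRegressor(random_state=0).fit(X, y)
# Gini/impurity-based importances, normalized to sum to 1
importances = gbr.feature_importances_
ranking = np.argsort(importances)[::-1]  # most important first
```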
While forward predictive models in machine learning approaches for materials science are fairly common, inverse design is far more challenging. This is because the descriptors, which are usually high-dimensional, are difficult to predict from outputs, which are low-dimensional. In the case of our polymer dataset, the output cloud point is a single number, attributed to the 5 numbers representing the molecular mass and composition of the polymer.
Inverse design provides the ability to design polymers based on a desired final property and to accelerate the synthesis of target polymers under design constraints to meet desired cloud points. To further realize new-material discovery, we extrapolate from our training dataset by designing terpolymers, which are absent from the training set, and by limiting the EtOx composition, which is common in it.
Typically, inverse optimization on piecewise-constant functions yields a large number of different predicted designs. These may achieve our optimization and constraint targets according to the fitted GBR model. However, the quality of these designs varies, particularly in the case of extrapolation. Validating all of them experimentally would be inefficient, so a filtering method using an ensemble of $M$ three-layer fully connected neural networks (NN) was employed to select the most promising design candidates for experimental validation. Each NN's trainable parameters are initialized with distinct random values, resulting in different fitted predictors $\{\hat{f}_1, \dots, \hat{f}_M\}$, due to the non-convex nature of the objective function and random initialization. For each design $x$, we then compared the ensemble of NN-predicted cloud points $\{\hat{f}_1(x), \dots, \hat{f}_M(x)\}$ with the GBR prediction $\hat{f}(x)$ and only experimentally validated designs where $\hat{f}(x) \approx \frac{1}{M}\sum_{i=1}^{M} \hat{f}_i(x)$ (NN predictions agree with GBR) and $\mathrm{Var}\{\hat{f}_1(x), \dots, \hat{f}_M(x)\}$ was small. This ensures that $x$ is predicted with high confidence and is not an ad-hoc extrapolation. Fig. 4 illustrates the principle of this approach.
Although the NNs are also good approximators for the cloud point, they were not used as the
forward model for producing inverse design candidates because the feed-forward step of the
NN ensemble is still too slow compared with GBR, which consists of simple summing of
piecewise constant functions.
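The agreement-and-variance filter can be expressed compactly. The sketch below is schematic: the tolerances (`agree_tol`, `var_tol`) and the toy predictors are hypothetical stand-ins for the trained GBR model and NN ensemble.

```python
import numpy as np

def filter_designs(designs, gbr_predict, nn_ensemble,
                   agree_tol=2.0, var_tol=4.0):
    """Keep designs where the NN-ensemble mean agrees with the GBR
    prediction AND the ensemble variance is small (assumed tolerances)."""
    keep = []
    for x in designs:
        preds = np.array([nn(x) for nn in nn_ensemble])
        if (abs(preds.mean() - gbr_predict(x)) < agree_tol
                and preds.var() < var_tol):
            keep.append(x)
    return keep

# Toy demonstration with hand-made predictors (not trained models):
# a GBR that always predicts a 50 C cloud point, one ensemble that
# agrees tightly, and one that scatters widely (a risky extrapolation)
gbr = lambda x: 50.0
ensemble_agree = [lambda x, b=b: 50.0 + b for b in (-0.5, 0.0, 0.5)]
ensemble_scatter = [lambda x, b=b: 50.0 + b for b in (-10.0, 0.0, 10.0)]

kept = filter_designs([0.0], gbr, ensemble_agree)      # passes both tests
rejected = filter_designs([0.0], gbr, ensemble_scatter)  # high variance
```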
Fig. 4: (a) Framework of the selection criteria: the dataset is used to train the GBR and NN ensemble, PSO predicts a polymer design (x*) for a desired CP (y*), and the design is verified for accuracy by the NN ensemble, where CP agreement is a down-selection criterion. (b) Illustration of the validity of the filtering procedure. We observe that, given limited training data, not all extrapolated points are valid. However, when an ensemble of neural networks trained with distinct initializations agrees on a certain input, we have much greater confidence in the validity of their predictions. (c) Final PSO-based inverse design performance, showing an RMSE of 3.9 ˚C; (d) Forward-model (NN ensemble) performance on the polymers synthesized from the designs.
Conclusion
Using this technique, we down-selected 17 polymers over our 4 desired cloud points (37, 45, 60, 80 ˚C), imposing design criteria weighted toward minimizing EtOx and designing polymers with more than two components – unseen in the training data. These polymers were synthesized, although an average of 3 iterations was required to achieve the target mass and composition of the designs, owing to the difficulties of terpolymer synthesis, where the Mayo-Lewis equation does not apply in calculating the monomer feed ratio required for a desired final copolymer composition. The masses and compositions of the synthesized polymers are reported in Table S4, showing minimal deviation from the algorithmic designs, along with their cloud points (an average of 3 measurements). The RMSE of the obtained cloud points was 3.9 ˚C; however, when the structures of the new polymers are fed back into the NN ensemble, a larger RMSE is observed (6.1 ˚C) (Fig. 4). Deviation from the target cloud points was within the test RMSE between 37 and 60 ˚C but above it at 80 ˚C, which can be attributed to the sparseness of the dataset at higher temperatures (Fig. 2f) – an in-depth analysis is provided in the supplementary materials. These results show that our combination of slow and fast algorithms is able to design polymers with unique compositions, with control over the desired physical property and structural design.
Overall, a significant conceptual advance in polymer inverse design has been achieved
via judicious application of machine learning methods. This was done in three steps. First, we
curated and categorized historical and new data. Second, we selected and fine-tuned a
machine learning model based on gradient boosting regression with decision trees, resulting
in a cloud point predictive accuracy of 3.9 ˚C (RMSE). The model was able to generalize well
with both well-defined historic datasets as well as newly synthesized polymers of
unsymmetrical MWDs. Third, inverse design by particle swarm optimization predicted new polymers based on desired cloud points (37, 45, 60, 80 ˚C). Extrapolation beyond the training set was achieved via an ensemble of neural networks as a cross-validation technique to down-select 17 polymers with the lowest variance across predictions. The RMSE of the predicted polymers was similar to that of the forward model. This
methodology offers unprecedented control of polymer design, which may significantly
accelerate the development of polymers with other physical properties.
References
1.
Garcia SJ. Effect of polymer architecture on the intrinsic self-healing character of
polymers. Eur Polym J 2014, 53: 118-125.
2.
Rinkenauer AC, Schubert S, Traeger A, Schubert US. The influence of polymer
architecture on in vitro pDNA transfection. J Mater Chem B 2015, 3(38): 7477-7493.
3.
Paramelle D, Gorelik S, Liu Y, Kumar J. Photothermally responsive gold nanoparticle
conjugated polymer-grafted porous hollow silica nanocapsules. Chem Commun 2016,
52(64): 9897-9900.
4.
Mannodi-Kanakkithodi A, Pilania G, Huan TD, Lookman T, Ramprasad R. Machine
Learning Strategy for Accelerated Design of Polymer Dielectrics. Sci Rep 2016, 6:
20952.
5.
Wei JN, Duvenaud D, Aspuru-Guzik A. Neural Networks for the Prediction of
Organic Chemistry Reactions. ACS Cent Sci 2016, 2(10): 725-732.
6.
Gómez-Bombarelli R, Wei JN, Duvenaud D, Hernández-Lobato JM, Sánchez-Lengeling B, Sheberla D, et al. Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. ACS Cent Sci 2018, 4(2): 268-276.
7.
Sanchez-Lengeling B, Roch LM, Perea JD, Langner S, Brabec CJ, Aspuru-Guzik A.
A Bayesian Approach to Predict Solubility Parameters. Adv Theory Simul 2018: doi:10.1002/adts.201800069.
8.
Ye W, Chen C, Wang Z, Chu I-H, Ong SP. Deep neural networks for accurate
predictions of crystal stability. Nat Commun 2018, 9(1): 3800.
9.
Gómez-Bombarelli R, Aguilera-Iparraguirre J, Hirzel TD, Duvenaud D, Maclaurin D,
Blood-Forsythe MA, et al. Design of efficient molecular organic light-emitting diodes
by a high-throughput virtual screening and experimental approach. Nat Mater 2016,
15: 1120.
10.
Brandt RE, Kurchin RC, Steinmann V, Kitchaev D, Roat C, Levcenco S, et al. Rapid
Photovoltaic Device Characterization through Bayesian Parameter Estimation. Joule
2017, 1(4): 843-856.
11.
Raccuglia P, Elbert KC, Adler PDF, Falk C, Wenny MB, Mollo A, et al. Machine-learning-assisted materials discovery using failed experiments. Nature 2016, 533: 73.
12.
Jiang R, Jin Q, Li B, Ding D, Shi A-C. Phase Diagram of Poly(ethylene oxide) and
Poly(propylene oxide) Triblock Copolymers in Aqueous Solutions. Macromolecules
2006, 39(17): 5891-5896.
13.
Ashbaugh HS, Paulaitis ME. Monomer Hydrophobicity as a Mechanism for the LCST
Behavior of Poly(ethylene oxide) in Water. Ind Eng Chem Res 2006, 45(16): 5531-5537.
14.
Aseyev V, Tenhu H, Winnik FM. Non-ionic Thermoresponsive Polymers in Water.
In: Müller AHE, Borisov O (eds). Self Organized Nanostructures of Amphiphilic
Block Copolymers II. Springer Berlin Heidelberg: Berlin, Heidelberg, 2011, pp 29-89.
15.
Hoogenboom R, Thijs HML, Jochems MJHC, van Lankvelt BM, Fijten MWM,
Schubert US. Tuning the LCST of poly(2-oxazoline)s by varying composition and
molecular weight: alternatives to poly(N-isopropylacrylamide)? Chem Commun
2008(44): 5758-5760.
16.
Huan TD, Mannodi-Kanakkithodi A, Kim C, Sharma V, Pilania G, Ramprasad R. A
polymer dataset for accelerated property prediction and design. Sci Data 2016, 3:
160012.
17.
Mannodi-Kanakkithodi A, Chandrasekaran A, Kim C, Huan TD, Pilania G, Botu V,
et al. Scoping the polymer genome: A roadmap for rational polymer dielectrics design
and beyond. Mater Today 2018, 21(7): 785-796.
18.
Kim C, Chandrasekaran A, Huan TD, Das D, Ramprasad R. Polymer Genome: A
Data-Powered Polymer Informatics Platform for Property Predictions. J Phys Chem C
2018, 122(31): 17575-17585.
19.
Kutzner C, Páll S, Fechner M, Esztermann A, de Groot BL, Grubmüller H. Best bang
for your buck: GPU nodes for GROMACS biomolecular simulations. J Comput Chem 2015, 36(26): 1990-2008.
20.
Dünweg B, Kremer K. Molecular dynamics simulation of a polymer chain in solution.
J Chem Phys 1993, 99(9): 6983-6997.
21.
Stuart MAC, Huck WTS, Genzer J, Muller M, Ober C, Stamm M, et al. Emerging
applications of stimuli-responsive polymer materials. Nat Mater 2010, 9(2): 101-113.
22.
Halperin A, Kröger M, Winnik FM. Poly(N-isopropylacrylamide) Phase Diagrams:
Fifty Years of Research. Angew Chem Int Ed 2015, 54(51): 15342-15367.
23.
Friedman JH. Greedy Function Approximation: A Gradient Boosting Machine. Ann
Stat 2001, 29(5): 1189-1232.
24.
Contreras MM, Mattea C, Rueda JC, Stapf S, Bajd F. Synthesis and characterization
of block copolymers from 2-oxazolines. Des Monomers Polym 2015, 18(2): 170-179.
25.
Glassner M, Lava K, de la Rosa VR, Hoogenboom R. Tuning the LCST of poly(2-cyclopropyl-2-oxazoline) via gradient copolymerization with 2-ethyl-2-oxazoline. J
Polym Sci A 2014, 52(21): 3118-3122.
26.
Diab C, Akiyama Y, Kataoka K, Winnik FM. Microcalorimetric Study of the
Temperature-Induced Phase Separation in Aqueous Solutions of Poly(2-isopropyl-2-oxazolines). Macromolecules 2004, 37(7): 2556-2562.
27.
Park J-S, Akiyama Y, Winnik FM, Kataoka K. Versatile Synthesis of End-Functionalized Thermosensitive Poly(2-isopropyl-2-oxazolines). Macromolecules
2004, 37(18): 6786-6792.
28.
Park J-S, Kataoka K. Precise Control of Lower Critical Solution Temperature of
Thermosensitive Poly(2-isopropyl-2-oxazoline) via Gradient Copolymerization with
2-Ethyl-2-oxazoline as a Hydrophilic Comonomer. Macromolecules 2006, 39(19):
6622-6630.
29.
Park J-S, Kataoka K. Comprehensive and Accurate Control of Thermosensitivity of
Poly(2-alkyl-2-oxazoline)s via Well-Defined Gradient or Random Copolymerization.
Macromolecules 2007, 40(10): 3599-3609.
30.
Zhang Q, Weber C, Schubert US, Hoogenboom R. Thermoresponsive polymers with
lower critical solution temperature: from fundamental aspects and measuring
techniques to recommended turbidimetry conditions. Mater Horizons 2017, 4(2): 109-116.
31.
Cortes C, Vapnik V. Support-Vector Networks. Mach Learn 1995, 20(3): 273-297.
32.
Rokach L, Maimon O. Data Mining With Decision Trees: Theory and Applications.
World Scientific Publishing Co., Inc., 2014.
33.
LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015, 521: 436.
Details of our code implementation and dataset can be found in our repository.
(https://github.com/LiQianxiao/CloudPoint-MachineLearning)
All other information is in the ESI.
Acknowledgements: J.N.K., Q.L., and T.B. are supported by the AME Programmatic Fund of the Agency for Science, Technology and Research under Grant No. A1898b0043.
Supplementary Materials:
Materials and Methods
Fig. S1-S5
Tables S1-S4
References (34-45)
Supplementary Materials for
Machine learning enables polymer cloud-point engineering via inverse design
Jatin N. Kumar, Qianxiao Li, Karen Y.T. Tang, Tonio Buonassisi, Anibal L. Gonzalez-Oyarce, Jun
Ye,
Correspondence to: kumarjn@imre.a-star.edu.sg
This file includes:
Materials and Methods
Supplementary Text
Figs. S1 to S5
Tables S1 to S4
Materials and Methods
Materials
2-n-propyl-2-oxazoline (nPropOx),(34) 2-cyclopropyl-2-oxazoline (cPropOx),(35) and 2-isopropyl-2-oxazoline (iPropOx)(36) were synthesized as described in the literature, and distilled over
calcium hydride and stored with molecular sieves (size 5 Å) in a glovebox. 2-ethyl-2-oxazoline
(EtOx, Sigma-Aldrich) was distilled over calcium hydride and stored with molecular sieves (size
5 Å) in glovebox. All other reagents were used as supplied unless otherwise stated.
Analytical Methods
Nuclear magnetic resonance (NMR). The compositions of the polymers were determined using 1H NMR spectroscopy. 1H NMR spectra were recorded on a JEOL 500 MHz NMR system (JMN-ECA500IIFT) in CDCl3. The residual protonated solvent signals were used as reference.
Size exclusion chromatography (SEC). Gel permeation chromatography (GPC) measurements were performed in THF (flow rate: 1 mL/min) on a Viscotek GPC Max module equipped with Phenogel columns (10³ and 10⁵ Å pore size; 300 x 7.80 mm) in series, heated to 40 ºC. The
average molecular weights and polydispersities were determined with a Viscotek TDA 305
detector calibrated with poly(methyl methacrylate) standards.
Dynamic Light Scattering (DLS). Measurements at various temperatures were conducted using a Malvern Instruments Zetasizer Nano ZS equipped with a 4 mW He–Ne laser operating at λ = 633 nm, an avalanche photodiode detector with high quantum efficiency, and an ALV/LSE-5003 multiple tau digital correlator electronics system.
Solutions of polymers (5 mg/mL) were prepared by dissolving polymer in deionized water at
room temperature. The solutions were then heated to 100 °C and cooled down to remove
thermal memory, before measurements were taken.
Experimental Methods
For all polymerizations, the polymerization mixture was prepared in vials that were dried in
100 °C oven overnight before use, and crimped air-tight in a glove box. The mixture contained
the monomers (EtOx, nPropOx, cPropOx, iPropOx) of desired ratios, with a total monomer
concentration of 4 M, anhydrous acetonitrile (ACN) and methyl tosylate (MeOTs) as initiator.
The amount of methyl tosylate added was determined by the various [M]/[I] ratios.
Temperature-controlled polymerizations were performed in sealed vials in a microwave reactor equipped with an IR temperature sensor at 140 °C for different lengths of time. The mixture was then cooled to ambient temperature and quenched by addition of tetramethylammonium hydroxide (2.5 wt% in methanol, 2 equivalents relative to initiator).
The solutions were concentrated by removing some of the solvent under reduced pressure,
then precipitated in cold diethyl ether. The product was collected and dried under reduced
pressure overnight. All polymers were redissolved in THF for SEC, CDCl3 for 1H NMR and
deionized water for DLS. 1H NMR of P((EtOx)w(nPropOx)x(cPropOx)y(iPropOx)z) (500 MHz,
CDCl3, δ, ppm): 0.8 (d, 66.5 Hz, 4y H, CHCH2 CH2), 0.96 (s, 3x H, CH2CH2CH3), 1.11 (s, 6z H,
CHCH3CH3), 1.12 (s, 3w H, CH2CH3), 1.64 (s, 2x H, CH2CH2CH3) 2.30 (d, 56.5 Hz, 2x H,
NCOCH2CH2CH3), 2.38 (s, 2w H, NCOCH2CH3), 2.70 (d, 61.0 Hz, y H, CHCH2CH2), 2.80 (d, 123.5
Hz, z H, CHCH3CH3), 3.49 (s, 2(w+x+y+z) H, CH2 backbone), where w, x, y, and z are the mole ratios of EtOx, nPropOx, cPropOx, and iPropOx, respectively.
Supplementary Text
Curation and synthesis of polymer library
To augment the historical dataset reported in Table S1,(15,24-29) a series of poly(2-oxazolines) were synthesized by cationic ring-opening polymerization in a microwave reactor
at 140 °C and terminated with tetramethyl-ammonium hydroxide at the end of the reaction.
All copolymers were synthesized with EtOx and one of the propyl oxazolines and variations in
feed ratio were performed. SEC results are reported for all synthesized polymers in Table S2.
DLS measurements were performed in triplicate by preparing solutions of polymers at
a concentration of 5 mg/mL in deionized water. The solutions were then heated to 100 °C and
cooled down before measurements were taken to negate the effect of thermal history. DLS
measurements of the polymer solutions were performed over a temperature sweep between
20 to 90 ˚C. The cloud point temperature for the synthesized polymers (Table S2) was
determined as the temperature at which the dissolved polymer chains of small hydrodynamic
diameter agglomerate to form large particles or mesoglobules, as demonstrated in Fig. S1 for
poly(nPropOx-co-EtOx) copolymers with a compositional variation at 20% increments.
The PDIs obtained experimentally are much higher than the PDIs from the historical
data. It can be assumed that the molecular weight distributions (MWD) for the historical data,
where the PDI is lower than 1.4, are typically symmetrical. Conversely, the MWD of the
polymers made experimentally had a long low-molecular weight tail (Figure S2). In the case
of cationic ring opening polymerization, this long tail can be attributed to impurities such as
water which terminate actively propagating chains. Due to the unsymmetrical MWD, the
number average molecular weight (Mn), is no longer a proper representation of the MWD,
particularly when comparing the dataset to historical data with polymers of narrow
polydispersities.
Zhang et al.(30) propose that DLS is one of the better methods to characterize cloud
points. They note that the intensity of scattered light due to a sharp change in refractive index
is influenced by the chains that are dehydrating and thereby changing morphology from coil
to globules. In contrast, only a minor difference in refractive index is observed from the
hydrated chains. For broad or unsymmetric MWDs such as with our polymers, we postulate
that the cloud point by DLS of the modal polymer molecular weight would represent the
polymer as a whole.
To validate this theory, a polymer was selected at random, and dialyzed against water
to remove some of the low molecular weight tail. Comparisons of the MWD before and after
dialysis (Fig. S2) show the removal of the low molecular weight tail, and the narrowing of the
MWD. However, DLS results (Fig. S2-inset) show no change to the cloud point of the polymer.
Thus, to better represent the polymer dataset, the modal molecular weight, or peak
molecular weight (Mp) was used to represent the molecular weight of the polymers from
Table S2.
Machine-learning methodology
Establishing a Machine Learning Baseline
It is often useful to establish a baseline for statistical methods on the currently
available data before further data collection and algorithm exploration. In this section, we outline the development of our basic data-driven approach. Such approaches are broadly classified into statistical models (e.g., multivariate analysis(37) and Bayesian inference(38)) and machine learning models (e.g., support vector machines,(31) decision tree learning,(32) and deep neural networks(33)). The former perform well on relatively small datasets, but require non-trivial domain information such as statistical priors and a forward mathematical model, which may not always be available and can thus limit their applicability. On the other hand, machine
learning models lend their applicability to datasets where the underlying physical
mechanisms are unclear, or when the dataset has noise corruption.(39) While machine
learning typically requires large datasets and cannot infer underlying physical relationships,
its accuracy and fast inference speed makes it suitable for inverse design via global
optimization.
In this work, we recall that we wish to predict the cloud point ($y \in \mathbb{R}$) based on the polymer composition and other properties ($x \in \mathbb{R}^d$). We assume that there is some relationship $y = f(x)$ for some unknown function $f$. Hence, our goal is to parameterize and fit an approximator $\hat{f}$ of $f$. The literature dataset is split into 68 training samples and 7 test samples, and we evaluate a total of five methods for fitting: 1) linear regression; 2) polynomial regression of degree up to 2; 3) support vector regression; 4) neural network regressor (2 hidden layers); 5) gradient boosting regression with decision trees (GBR).(39) Below, we sketch the basic idea of the GBR method, which is the final choice of our forward model for inverse design, and refer the reader to the text by Hastie, Tibshirani, and Friedman(40) for more details.
A Sketch of Gradient Boosting Regression
GBR makes use of the idea of "boosting", a class of sequential ensembling methods in which weak regressors (regression models with low capacity or approximation power) are iteratively combined to form a strong regressor. The basic idea is as follows: fix a space of weak approximators $H$ (e.g., decision trees) and start with a constant function $f_0$. For each $k \ge 0$, we set

$$f_{k+1} = f_k + \operatorname{argmin}_{h \in H} L(h, f - f_k) \qquad (1)$$

where the loss function $L$ measures the "distance" between its arguments. In other words, at each step we fit some function to approximate the current residual error $f - f_k$, and this successively improves the approximation. Of course, in practice the minimization step in (1) may be hard to evaluate; hence one can use "gradient boosting", where $h$ is not chosen as a true minimizer, but as a function in the "steepest descent direction" of the loss function with respect to $h$. A detailed exposition of gradient boosting can be found in the previously mentioned text.(40)
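The boosting iteration (1), specialized to squared loss, can be sketched with depth-1 regression trees (stumps) as the weak approximators. This is a toy implementation for illustration; the shrinkage factor, round count, and synthetic piecewise-constant target are assumed values.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, n_rounds=50, lr=0.1):
    """Toy gradient boosting for squared loss: each round fits a weak
    tree to the current residuals y - f_k(X) and adds a shrunken copy
    of it to the model, i.e. f_{k+1} = f_k + lr * h."""
    f = np.full(len(y), y.mean())  # constant initial model f_0
    trees = []
    for _ in range(n_rounds):
        h = DecisionTreeRegressor(max_depth=1).fit(X, y - f)
        f += lr * h.predict(X)
        trees.append(h)
    return y.mean(), trees

def predict(model, X, lr=0.1):
    base, trees = model
    return base + lr * sum(t.predict(X) for t in trees)

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.where(X[:, 0] > 0, 60.0, 30.0)  # piecewise-constant target
model = boost(X, y)
resid = np.abs(predict(model, X) - y).mean()  # shrinks geometrically
```

Note that the fitted ensemble is itself a sum of piecewise-constant functions, which is exactly the property exploited (and worked around) in the inverse-design step.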
The results of the comparisons are shown in Figures 3 and S3, where we measure the root-mean-squared error on the training, validation, and test sets, the last of which is the quantity used to discriminate model performance. The RMSE and inference time are reported in Table S3. Note that while the training and validation sets are random splits of the literature data, the test set consists of sample points obtained in our experiments. Thus, a model that performs well on the test set indicates an ability to fuse both literature data and our experimental data into a more robust model. From our results, we observe that
linear regression and polynomial regression, while having fast inference speeds, perform
poorly in terms of test error. Moreover, polynomial regression suffers from the “curse of
dimensionality” when higher order polynomials are included, since the number of terms
increases exponentially with increasing maximum degree.
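For reference, the number of monomials of total degree at most p in d variables is the binomial coefficient C(d+p, p), which can be tabulated directly:

```python
from math import comb

def n_poly_terms(d, p):
    """Number of monomials of total degree <= p in d variables,
    i.e. the feature count of a degree-p polynomial regression."""
    return comb(d + p, p)

# For 5 polymer descriptors, the term count grows quickly with degree
counts = [n_poly_terms(5, p) for p in (1, 2, 4, 8)]
# degree 1 -> 6, degree 2 -> 21, degree 4 -> 126, degree 8 -> 1287
```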
While all of the more sophisticated machine learning methods perform significantly
better, the GBR method performs best when weighing both RMSE and inference time, even with
minimal tuning. Inference time is important because the forward model is called repeatedly
in our inverse-design process, and faster inference greatly enhances our exploration of the
design space. Moreover, GBR (with decision trees as base regressors) gives us a measure of
feature importance via the Gini impurity (40). In the present application, this provides an
estimate of the sensitivity of our cloud-point model to the polymer properties, shown in
Fig. S4.
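Impurity-based importances of this kind can be read directly off a fitted GBR. The sketch below uses synthetic data rather than the polymer dataset, purely to illustrate the mechanism; only columns 0 and 2 drive the target, and the importances reflect that:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in for the descriptor table: 5 features, of which
# only column 0 and column 2 actually influence the response.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(300, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.05 * rng.normal(size=300)

gbr = GradientBoostingRegressor(random_state=0).fit(X, y)
# Impurity-based importances (normalized to sum to 1 across features).
importances = gbr.feature_importances_
```

In the paper's setting the same attribute ranks the polymer properties by their influence on the predicted cloud point (Fig. S4).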
To optimize the GBR for inverse design, hyperparameter tuning was further conducted
to bring the RMSE down to 3.9 ˚C; the details are presented in our data repository. With the
tuned model, we turn to inverse design in order to predict polymer structure from desired
cloud points.
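Hyperparameter tuning of a GBR can be sketched with a standard cross-validated grid search; the search space and data below are illustrative placeholders, not the ones used in this work (those are documented in the data repository):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training data.
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(150, 4))
y = np.sin(4 * X[:, 0]) + X[:, 1]

# Illustrative search space over the usual GBR knobs.
grid = {"n_estimators": [100, 300],
        "max_depth": [2, 3],
        "learning_rate": [0.05, 0.1]}
search = GridSearchCV(GradientBoostingRegressor(random_state=0), grid,
                      cv=3, scoring="neg_root_mean_squared_error").fit(X, y)
best = search.best_params_   # hyperparameters with the best CV RMSE
```

The cross-validated RMSE (the negated `best_score_`) is the analogue of the 3.9 ˚C figure quoted above.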
Inverse Design via Particle Swarm Optimization
Our data-driven approximation f̂ of the forward relationship between the polymer
properties and the cloud point was demonstrated above to be close to the true function 𝑓.
In this section, we consider the problem of inverse design, where we want to find a polymer
configuration 𝑥 that achieves certain targets (e.g. cloud point, desired proportions) while
respecting certain constraints (e.g. molecular weight). Mathematically, this can be posed as
the constrained optimization problem

min_x F(x, f̂(x))   subject to   G(x, f̂(x)) ≥ 0,   (2)

where F: R^6 × R → R is the objective function and G: R^6 × R → R^m is the vector-valued
constraint function.
Problem (2) is posed as a global optimization problem. In general, there are many
heuristic methods for solving it, including simulated annealing (41), genetic algorithms (42),
and differential evolution (43). In this paper, we employ the particle swarm optimization (PSO)
algorithm (44). It is especially suited to our use-case since f̂ is a boosted regression tree,
a piecewise-constant function whose derivative vanishes almost everywhere, rendering
gradient-based algorithms ineffective.
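A minimal PSO, run here on a toy piecewise-constant "staircase" objective, illustrates why a gradient-free swarm still makes progress where gradient descent stalls. All parameter values are illustrative, and this is a sketch rather than the optimizer used in the study:

```python
import numpy as np

def pso(f, lb, ub, n_particles=40, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer. Gradient-free, so it works on
    piecewise-constant objectives such as a boosted-tree surrogate."""
    rng = np.random.default_rng(seed)
    dim = len(lb)
    x = rng.uniform(lb, ub, size=(n_particles, dim))
    v = np.zeros_like(x)
    pbest, pbest_val = x.copy(), np.array([f(p) for p in x])
    g = pbest[pbest_val.argmin()].copy()      # global best position
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Inertia + pull toward personal best + pull toward global best.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lb, ub)
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = pbest[pbest_val.argmin()].copy()
    return g, pbest_val.min()

# Piecewise-constant test objective with zero gradient almost everywhere.
step = lambda p: np.sum(np.floor(np.abs(p) * 10))
best, best_val = pso(step, lb=np.array([-1.0, -1.0]), ub=np.array([1.0, 1.0]))
```

Because the update uses only objective values, the flat plateaus of `step` (or of a regression-tree surrogate) do not trap the swarm the way they would a gradient method.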
For the current application, we consider the following instance of objective and constraints.

Objective: We use a mean-squared loss that penalizes deviation from a target cloud point y_t,
plus a regularization term that promotes certain desired design patterns:

F(x, f̂(x)) = (1/2) (f̂(x) − y_t)² + R(x).

For the present application, we set R so as to promote ternary and simpler designs (at most
three non-zero components), as well as to minimize the units of A (EtOx). Writing
x = (x_A, x_B, x_C, x_D, x_E, x_M), we take

R(x) = λ₁ Σ (x_i x_j x_k)^(1/3) + λ₂ x_A,

where the sum runs over all distinct triples x_i, x_j, x_k ∈ {x_A, …, x_E}, and λ₁, λ₂ > 0 are
regularization parameters. Note that there exist well-defined minima since we also require all
components of x to be non-negative.
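The objective and regularizer above can be written out directly; in this sketch the λ values are illustrative placeholders, and `f_hat` stands in for the fitted forward model:

```python
import numpy as np
from itertools import combinations

def R(x, lam1=1.0, lam2=0.01):
    """Regularizer promoting ternary-or-simpler designs and low EtOx.
    x = (x_A, x_B, x_C, x_D, x_E, x_M); the triple-product term vanishes
    whenever at most two of the five monomer counts are non-zero."""
    monomers = x[:5]
    triple = sum((monomers[i] * monomers[j] * monomers[k]) ** (1.0 / 3.0)
                 for i, j, k in combinations(range(5), 3))
    return lam1 * triple + lam2 * x[0]     # x[0] is the EtOx (A) count

def F(x, f_hat, y_target, **kw):
    """Objective of Eq. (2): squared cloud-point error plus R(x)."""
    return 0.5 * (f_hat(x) - y_target) ** 2 + R(x, **kw)

# A binary design (only A and B non-zero) incurs no triple-product penalty,
# while a quaternary one does.
binary = np.array([10.0, 50.0, 0.0, 0.0, 0.0, 6649.0])
quaternary = np.array([10.0, 50.0, 20.0, 30.0, 0.0, 10000.0])
```

The geometric mean (x_i x_j x_k)^(1/3) is zero as soon as any factor is zero, which is exactly what lets the penalty distinguish ternary-or-simpler designs from richer ones.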
Constraints: First, we employ the element-wise bounds

(0, 0, 0, 0, 0, 0) ≤ x ≤ (203, 187, 43, 96, 0, 23196).

These bounds were selected based on the limits of the training data and ease of synthesis.
Next, since M is the product of the degree of polymerization of the polymer and the molecular
weight of its monomer units, we make sure that the designed M values are consistent (to within
20%) with the designed compositions, i.e.

0.8 M(x) ≤ x_M ≤ 1.2 M(x),   where   M(x) = 99.13 x_A + 113.16 x_B + 111.14 x_C + 113.16 x_D.

Finally, to simplify the experimental process we require the maximal number of monomer units
to be at most 10 times the minimal non-zero monomer unit, i.e.

max_{i ∈ {A,…,E}} x_i ≤ 10 min_{i ∈ {A,…,E}, x_i > 0} x_i.
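The three constraint families can be collected into a single feasibility check. This sketch uses the bounds and monomer molecular weights stated above; it is an illustration, not the constraint code used in the study:

```python
import numpy as np

LOWER = np.zeros(6)
UPPER = np.array([203, 187, 43, 96, 0, 23196], dtype=float)
MONOMER_MW = np.array([99.13, 113.16, 111.14, 113.16])  # EtOx, nPropOx, cPropOx, iPropOx

def feasible(x):
    """Check the three constraint families on x = (x_A..x_E, x_M):
    element-wise bounds, mass consistency, and the 10x spread rule."""
    x = np.asarray(x, dtype=float)
    if np.any(x < LOWER) or np.any(x > UPPER):
        return False
    m = MONOMER_MW @ x[:4]                  # M(x) from the designed composition
    if not (0.8 * m <= x[5] <= 1.2 * m):
        return False
    monomers = x[:5]
    nonzero = monomers[monomers > 0]
    if nonzero.size == 0:
        return False
    return nonzero.max() <= 10 * nonzero.min()

# A Table S4-style ternary design passes all three checks.
ok = feasible([28, 79, 0, 13, 0, 13195])
```

Candidates emitted by the swarm that fail any of these tests are discarded before any further selection.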
Our polymer design predictions were also given constraints based on our own
requirements. For the purpose of this study, we chose to minimize the amount of EtOx in the
polymer designs, especially since our training data was heavily populated with polymers
containing EtOx. Thus, we ran four sets of predictions, in decreasing order of preference,
where: (1) the algorithm limited EtOx to zero aggressively; (2) the algorithm limited EtOx to
zero less aggressively; (3) the algorithm limited EtOx to under 100 units; (4) the algorithm
did not limit EtOx. Another design parameter we considered was to have more than two
components in the polymer design, a feature that was not present in our training set, nor is
commonplace when designing polymers for a desired physical property, owing to the expansion
into a multivariable parameter space.
Selection Criteria
Besides obvious selection criteria such as picking designs with predicted cloud points
close to the target cloud point, we developed a more sophisticated selection procedure. As is
typical in inverse optimization over piecewise-constant functions, depending on the random
initialization and the randomness of the PSO algorithm, we may arrive at a large number of
different predicted designs that achieve, according to the fitted GBR model, our optimization
and constraint targets. However, the quality of these designs varies (especially when
extrapolating from our training data), and testing all of them would be inefficient. Thus, we
employ a filtering method to select the most promising design candidates for experimental
validation. Concretely, we train an ensemble of 𝑀 three-layer, fully connected neural
networks (NNs) (45) with sigmoid activations and mean-square loss on our full training set to
predict cloud points from polymer properties. Each NN's trainable parameters are initialized
with distinct random values. Due to the non-convex nature of the objective function and the
random initialization, with high probability each neural network gives rise to a different
fitted predictor in {f̂_1, …, f̂_M}. For each design 𝑥, we then compare the ensemble of
NN-predicted cloud points {f̂_1(x), …, f̂_M(x)} with the GBR prediction f̂(x). We only choose
to experimentally validate designs where f̂(x) ≈ (1/M) Σ_{i=1}^{M} f̂_i(x) (the NN predictions
agree with the GBR) and Var{f̂_1(x), …, f̂_M(x)} is small. This ensures that 𝑥 is predicted
with high confidence and is not an ad-hoc extrapolation. Figures 4a and 4b summarize and
illustrate the principle of this approach. Note that although the NNs are also good
approximators of the cloud point, we do not use them as the forward model for producing
inverse-design candidates, because the feed-forward step of the NN ensemble is still slow
compared with the GBR, which consists of simply summing piecewise-constant functions.
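The ensemble-agreement filter can be sketched as follows; the tolerance thresholds and the toy models here are illustrative, not the ones used in the study:

```python
import numpy as np

def select_designs(designs, gbr_predict, nn_ensemble, tol_mean=2.0, tol_var=4.0):
    """Ensemble-agreement filter: keep a candidate design only when the
    mean NN prediction matches the GBR prediction and the NN ensemble
    variance is small, i.e. the design is not an ad-hoc extrapolation.
    tol_mean (degC) and tol_var (degC^2) are illustrative thresholds."""
    kept = []
    for x in designs:
        preds = np.array([nn(x) for nn in nn_ensemble])
        if abs(preds.mean() - gbr_predict(x)) < tol_mean and preds.var() < tol_var:
            kept.append(x)
    return kept

# Toy stand-ins: an "ensemble" that agrees near x = 0 but fans out far away,
# mimicking low-confidence extrapolation.
gbr = lambda x: 40.0
ensemble = [lambda x, b=b: 40.0 + b * x for b in (-1.0, 0.0, 1.0)]
kept = select_designs([0.5, 100.0], gbr, ensemble)
```

The far-away candidate is rejected on the variance criterion alone, even though the ensemble mean happens to match the GBR, which is exactly the extrapolation guard described above.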
Machine-Learning Validation
The inverse design generated a list of candidate polymer masses and target compositions,
following the four constraint parameters above; these are reported in their entirety in our
code repository. The NN ensemble trained on the full dataset was used to predict a cloud
point for each particle-swarm prediction of size and composition, and the predictions with
the smallest difference between the NN and GBR predictions and with low variance among the
NN predictions were down-selected. From these, a further down-selection chose 4 polymer
designs per temperature, with higher preference given to more aggressive minimization of
EtOx. The final choice of polymers is summarized in Table S4. Notably, almost all of the
polymers were designed to have 3 components, with the exception of the 80 ˚C cloud-point
polymers.
Table S4 also lists the cloud point, composition, and size of the polymers synthesized
experimentally. The RMSE of the experimental results against their NN predictions was found
to be 3.9 ˚C, the same as the RMSE of the optimized GBR (Figure 4C). There is some deviation
from the exact designs due to experimental error, and when the obtained compositional and
mass data were fed back into the NN for a forward predictive verification of the cloud
points, a higher RMSE of 6.1 ˚C was observed (Figure 4D).
The results nevertheless show that the inverse-design algorithm is able to design
polymers with unique compositions to a high degree of accuracy from desired cloud points,
especially when the cloud-point range is well represented in the training data. The
algorithm was robust enough to handle the large variation in polymer quality discussed
earlier. Moreover, the algorithmic methodology allows us to vary the configuration of the
inverse design, which would provide access to a vast array of polymer designs with potential
for experimental automation. Lastly, the general nature of this algorithm should allow it to
be applied to other, similar polymer datasets, thereby accelerating the development of
polymers in the future.
Fig. S1.
Temperature-dependent DLS measurements for poly(nPropOx-co-EtOx) at various compositional
ratios, demonstrating the cloud-point dependence on polymer composition.
[Plot data omitted: Mp = 17,400 Da; mass fraction vs. log M (GPC traces, before and after
dialysis); inset: number mean diameter (d.nm) vs. T (°C) over 48-50 °C.]
Fig. S2.
Gel permeation chromatogram and temperature-dependent DLS data of poly(nPropOx-co-EtOx)
(sample numbers 38 & 39, Table S2) before and after dialysis, showing a narrowing of the
molecular weight distribution with no change in cloud point.
Fig. S3.
Comparison of two regression methods (support vector regression (SVR) and neural network
regression (NN)). This serves as a basis of comparison to the other regressions shown in Figure
3a-c. The literature data is split into 68 training data points and 7 validation data points. Test
datapoints are 42 experimental data points produced in the lab. The results were compared
using the root-mean-squared error.
Fig. S4.
Feature importance via Gini impurity. Average values with standard deviation as error bars
are plotted for each feature over 100 training-validation (90%-10%) splits.
Fig. S5.
The fit of the NN ensemble model on experimentally obtained designs. The 80 ˚C designs are
plotted separately from the other designs to show the main source of the deviation.
Table S1.
A list of historical data showing the degree of polymerization of EtOx (A), nPropOx (B),
cPropOx (C), iPropOx (D), esterOx (E), polymer type (1 for homopolymer, 2 for statistical
copolymer, 3 for gradient copolymer, 4 for block copolymer), molecular weight (M) in Da,
polydispersity index (PDI), and cloud point in ˚C.
No
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
Units
of A
10
20
30
50
100
150
200
300
500
0
0
0
0
0
0
0
0
50
45
40
35
30
25
20
15
10
5
0
100
90
80
70
60
50
40
30
20
10
Units
of B
0
0
0
0
0
0
0
0
0
15
20
25
50
100
150
200
300
0
5
10
15
20
25
30
35
40
45
50
0
10
20
30
40
50
60
70
80
90
Units
of C
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Units
of D
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Units
of E
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Type
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
2
2
2
2
2
2
2
2
1
1
2
2
2
2
2
2
2
2
2
M
1300
2000
2600
3800
6700
9000
13300
21000
37300
3100
3700
4300
6200
8140
12300
15500
18000
3300
3500
3500
3700
3700
4000
5400
3900
3800
4000
4200
15300
15200
13600
12600
13000
10700
10200
9700
9600
7800
PDI
1.09
1.08
1.09
1.09
1.15
1.15
1.25
1.33
1.6
1.1
1.11
1.14
1.28
1.4
1.3
1.43
1.46
1.14
1.15
1.36
1.36
1.34
1.36
1.35
1.35
1.34
1.34
1.32
1.21
1.22
1.21
1.26
1.25
1.28
1.37
1.36
1.37
1.48
Cloud
Point
90.6
85.3
78.3
73.5
69.3
42.9
39
37.5
30.3
29.6
25.5
24.1
22.5
82
72.2
59.8
51.3
45.8
40
34.2
29.6
94.1
81.6
75.5
64.8
55.9
51.1
44.2
40
34.8
31.2
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
0
150
135
120
105
90
75
60
45
30
15
0
0
11
20
31
50
69
78
100
100
22
48
73
0
0
0
0
0
0
0
0
0
0
34
59
96
94
81
0
0
0
0
0
0
0
100
0
15
30
45
60
75
90
105
120
135
150
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
106
94
59
36
0
106
84
58
40
8
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
100
89
80
69
50
31
22
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
78
52
27
100
40
40
40
0
24
50
73
86
0
0
0
0
0
0
17
21
41
50
38
69
86
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
10
20
40
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1
2
2
2
2
2
2
2
2
2
1
3
3
3
3
3
3
3
1
1
3
3
3
1
4
4
4
1
3
3
3
1
1
3
3
3
3
1
1
1
1
1
1
1
1
8140
17700
17000
17700
17200
17900
17600
18600
17900
18500
18600
17200
19700
20500
18000
14600
13400
15400
14100
14000
8000
9300
9300
9300
9700
6098
7670
10813
12000
13300
12300
12300
9700
12000
12900
12400
14000
10200
8000
1900
2400
4600
5650
4300
7800
9700
1.4
1.47
1.44
1.4
1.49
1.38
1.37
1.35
1.33
1.34
1.35
1.45
1.18
1.16
1.16
1.11
1.14
1.11
1.1
1.19
1.02
1.02
1.02
1.02
1.02
1.04
1.03
1.02
1.02
1.02
1.04
1.02
1.02
1.05
1.04
1.02
1.05
1.04
1.03
1.03
1.03
1.02
1.02
28
83.9
71.5
63.1
53.7
49.2
42.6
37.3
34.1
29.1
24.5
24.1
28
33
39
46
57
72
79
91
67.3
55.2
46
38.7
44.7
47.7
47.4
23.8
26.3
30.1
33.8
38.7
23.8
36.3
41.8
50.6
75.1
72.5
62.8
51.3
48.1
43.7
38.7
37.3
Table S2.
A list of data for synthesized polymers showing the degree of polymerization of EtOx (A),
nPropOx (B), cPropOx (C), iPropOx (D), esterOx (E), polymer type (1 for homopolymer, 2 for
statistical copolymer, 3 for gradient copolymer, 4 for block copolymer), molecular weight (M)
in Da, polydispersity index (PDI), and cloud point in ˚C.
No
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
Units
of A
47
73
26
136
13
69
208
15
23
181
91
129
171
10
75
435
1166
388
954
208
222
130
129
72
55
57
116
141
90
49
76
84
100
58
85
50
43
63
Units
of B
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
15
20
7
15
16
29
35
40
22
57
56
69
59
82
79
67
99
Units
of C
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Units
of D
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Units
of E
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Type
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
M (Da)
4626
7274
2624
13458
1294
6853
20642
1448
2315
17897
9020
12787
16917
976
7407
43082
115564
38506
94602
20654
21991
14534
15033
7953
7213
7513
14801
17920
13442
7430
14044
14681
17784
12421
17695
13956
11895
17431
PDI
1.766
2.072
1.747
2.94
1.194
2.546
1.932
1.228
1.763
2.341
2.318
2.294
2.542
1.193
2.673
2.656
2.54
2.426
3.017
2.283
3.492
2.011
2.294
2.074
2.368
1.813
2.612
1.719
2.089
2.191
2.017
1.933
2.167
2.197
2.283
2.19
1.732
2.232
CP
81.8
87.5
88.5
88.5
86.3
60.8
60.8
61.5
61.5
65.5
65.5
74.5
71.3
75.0
66
66.3
64.8
63
56.8
57.5
49.3
49
49.3
44.5
43.5
39.5
39
38.5
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
63
38
50
12
13
34
37
12
14
0
0
0
0
198
68
119
49
63
24
19
4
4
0
178
70
151
49
130
40
78
14
29
0
0
108
95
29
170
109
181
142
213
291
176
224
269
99
93
128
52
55
137
146
114
140
71
76
163
138
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
199
190
0
113
130
88
0
0
10
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
19
14
47
24
79
37
42
17
38
37
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
66
26
0
53
31
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
20
11
50
21
87
44
128
35
175
83
219
0
0
188
0
0
0
0
0
0
0
0
5
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
2
2
2
2
2
2
2
2
2
1
1
1
1
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
1
1
2
2
2
2
2
2
2
2
2
2
2
2
17431
14232
19451
7020
7430
18942
20170
14074
17228
7981
8574
18419
15647
21729
8337
17039
7497
14997
6479
6510
2211
4553
4059
19822
8223
20623
7244
22756
8994
22154
5388
22614
9447
24748
33175
30964
24111
29644
25526
27894
21400
23961
29958
23380
25736
27290
1.555
2.496
2.539
2.388
1.867
2.324
1.617
1.825
2.094
2.193
2.132
2.506
1.997
2.044
2.289
2.027
2.299
2.118
2.188
1.994
1.743
1.818
1.649
1.921
2.226
1.926
2.435
1.978
2.109
1.93
2.151
1.863
2.252
1.823
1.968
1.685
1.727
1.924
1.823
1.871
2.241
2.111
2.046
2.238
2.118
1.971
37.5
34.8
34
31
31.5
30
29.5
26.5
26
23.8
23.8
23.3
23.0
73.5
76.5
64.2
63.5
49.3
53
42.8
49
37
31.8
75.5
78
63.8
67.8
55.3
56.3
48.3
51.7
43.5
40.3
38.5
33.3
33.3
41.8
44.8
40.3
48.7
55.3
68.5
71.8
64
68
75.3
85
86
87
1
202
251
0
0
0
38
0
0
0
80
23
0
0
0
2
2
2
4270
29092
27434
1.682 34.5
2.104 58.3
2.004 67.5
Table S3.
A summary of the RMSE and inference times obtained by the 5 different regressions (linear,
polynomial (degree 2), support vector, gradient boosting, and 3-layer neural network).

                      Linear     Polynomial (deg. 2)   Support Vector   Gradient Boosting   Neural Network
RMSE (°C)             11.6±1.1   25.8±6.6              9.31±3.37        7.24±0.46           8.09±0.80
Inference time (µs)   26.0±1.4   29.5±3.2              156±13           161±9               235±14
Table S4.
A summary of the 17 polymers made, including their target cloud point and design along
with the obtained cloud point and design (A: EtOx, B: nPropOx, C: cPropOx, D: iPropOx)
Target    Obtained   ∆      Target    Obtained   M %     Target comp.   Obtained comp.
CP (˚C)   CP (˚C)    (˚C)   M (Da)    M (Da)     error   A/B/C/D        A/B/C/D
37        34.5       -2.5   13195     12629      -4.3    28/79/0/13     28/78/0/9
37        34         -3     12191     12546       2.9    20/67/0/33     11/63/0/41
37        36         -1     10838      9629     -11.2    21/59/0/40     30/52/0/33
37        34         -3     13875     14523       4.7    26/73/0/21     33/60/0/21
45        45.8        0.8    7712      6978      -9.5    11/0/14/94     10/0/14/91
45        43.5       -1.5   10554     12151      15.1    26/0/22/72     21/0/15/79
45        40.8       -4.2   14496     14694       1.4    35/39/0/46     38/37/0/40
45        45.5        0.5    7745      8040       3.8    26/0/28/65     24/0/18/72
60        57.8       -2.2   20035     19901      -0.7    84/0/16/20     80/0/12/23
60        50.5       -9.5   17111     17541       2.5    63/0/0/57      59/0/0/56
60        53.5       -6.5   11574     12447       7.5    73/38/9/0      70/39/6/0
60        56.3       -3.7    9257      8773      -5.2    67/32/21/0     68/34/13/0
80        70.8       -9.2   11725     10332     -11.9    95/0/26/0      98/0/17/0
80        74.5       -5.5   18612     17021      -8.5    104/0/17/0     103/0/12/0
80        76.3       -3.7   13330     13170      -1.2    98/0/22/0      100/0/15/0
80        74         -6     18975     17629      -7.1    108/0/0/12     104/0/0/11
80        77.8       -2.2    9079      9536       5.0    107/0/0/13     106/0/0/9
References
34. E. Baeten, B. Verbraeken, R. Hoogenboom, T. Junkers, Continuous poly(2-oxazoline)
triblock copolymer synthesis in a microfluidic reactor cascade. Chem. Commun. 51,
11701-11704 (2015).
35. M. M. Bloksma et al., Poly(2-cyclopropyl-2-oxazoline): From Rate Acceleration by
Cyclopropyl to Thermoresponsive Properties. Macromolecules 44, 4057-4064 (2011).
36. S. Funtan, Z. Evgrafova, J. Adler, D. Huster, W. Binder, Amyloid Beta Aggregation in the
Presence of Temperature-Sensitive Polymers. Polymers 8, 178 (2016).
37. T. W. Anderson, An Introduction to Multivariate Statistical Analysis (Wiley, New York,
1958), vol. 2.
38. G. E. P. Box, G. C. Tiao, Bayesian Inference in Statistical Analysis (John Wiley & Sons,
2011), vol. 40.
39. J. H. Friedman, Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat.
29, 1189-1232 (2001).
40. T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, Springer
Series in Statistics (Springer, New York, NY, USA, 2001), vol. 1.
41. S. Kirkpatrick, C. D. Gelatt, M. P. Vecchi, Optimization by Simulated Annealing. Science
220, 671 (1983).
42. M. Mitchell, An Introduction to Genetic Algorithms (MIT Press, 1998).
43. R. Storn, K. Price, Differential Evolution – A Simple and Efficient Heuristic for Global
Optimization over Continuous Spaces. J. Global Optim. 11, 341-359 (1997).
44. J. Kennedy, R. Eberhart, Particle swarm optimization, in Proceedings of ICNN'95 -
International Conference on Neural Networks (1995), vol. 4, pp. 1942-1948.
45. J. Schmidhuber, Deep learning in neural networks: An overview. Neural Networks 61,
85-117 (2015).