Machine learning enables polymer cloud-point engineering via inverse design

Jatin N. Kumar,1†* Qianxiao Li,2† Karen Y.T. Tang,1 Tonio Buonassisi,3 Anibal L. Gonzalez-Oyarce,2 Jun Ye2

1 Institute of Materials Research & Engineering, 2 Fusionopolis Way, #08-03, Singapore 138634
2 Institute of High Performance Computing, 1 Fusionopolis Way, #16-16, Singapore 138632
3 Massachusetts Institute of Technology, Cambridge, MA 02139, USA
*Correspondence to: kumarjn@imre.a-star.edu.sg
†These authors contributed equally to this work

Abstract: Inverse design is an outstanding challenge in disordered systems with multiple length scales, such as polymers, particularly when designing polymers with desired phase behavior. We demonstrate high-accuracy tuning of poly(2-oxazoline) cloud point via machine learning. With a design space of four repeating units and a range of molecular masses, we achieve an accuracy of 4°C root-mean-squared error (RMSE) over a temperature range of 24–90°C, employing gradient boosting with decision trees. This RMSE is >3x better than linear and polynomial regression. We perform inverse design via particle-swarm optimization, predicting and synthesizing 17 polymers with constrained designs at 4 target cloud points from 37 to 80°C. Our approach challenges the status quo in polymer design with a machine learning algorithm that is capable of fast and systematic discovery of new polymers.

Introduction

Polymers are ubiquitous in both structural and functional systems, owing to their highly tunable physical, chemical, and electrical properties.1, 2, 3, 4 The development of polymers has historically been based on an Edisonian approach. Herein, we develop a machine learning framework to predict polymer structure (topology, composition, functionality, and size) on the basis of target phase properties, specifically the cloud point. This framework accommodates the complex disorder across multiple length scales that distinguishes polymers from small molecules,5, 6, 7 inorganic crystals,8 and systems-structure optimization.9, 10, 11

Phase properties, which describe the order of a polymer across multiple length scales, are determined by interactions of polymers with other polymers, the solution, and themselves. One such phase property is the cloud point, the temperature at which polymers are no longer miscible in solution. Numerous studies tabulate simple relationships between cloud point and one or two experimental variables (e.g., structure12 and temperature13, 14), or offer polynomial fits to the data.15 Ramprasad and colleagues applied machine learning to density-functional theory (DFT) calculations to predict opto-electronic16, 17 and physical18 bulk polymer properties.4, 18 However, this approach is computationally expensive,7, 19 particularly for polymer systems,20 and does not enable scalable inverse design over a wide range of conditions with high accuracy.21, 22

In this study, we combine machine learning, domain expertise, and experiment to solve the inverse design problem for polymers.
Our framework (Fig. 1) has three parts: (1) data curation (defining material descriptors) that relates poly(2-oxazoline) cloud point, size, and relative ratios of 4 different monomer units; (2) machine learning algorithm selection and hyperparameter tuning to enable fast forward prediction of cloud point based on structure, with evaluation of algorithmic robustness against systematic error and differing data quality23; and (3) use of said algorithm for inverse design using particle swarm optimization (PSO), with design selection using an ensemble of neural networks. We demonstrate the accuracy of our inverse-design paradigm by predicting the compositions of, and synthesizing, 17 polymers not previously reported in the literature, with cloud points between 37 and 80°C, using a modular combination of 4 repeating monomer units. We achieve ~4°C error, nearly within experimental error (1–3°C).

Fig. 1: Study framework. First, we train a machine learning model to predict cloud point on the basis of poly(2-oxazoline) structure, with varying ratios of four monomer units (building blocks) and molecular weights. Second, we demonstrate inverse design using the trained algorithm and particle swarm optimization, predicting 17 polymer structures from user-defined cloud points. The model accommodates the inherent complexity of polymers over multiple length scales.

Discussion

We combine and curate literature and experimental data to create the input into our machine learning framework. Historical cloud-point data for poly(2-oxazoline)s15, 24, 25, 26, 27, 28, 29 was curated into a set of input variables ((1) molecular weight of the polymer; (2) polydispersity index; (3) polymer type (homo, statistical, or block); (4) total number of each monomer unit in the final polymer (A: EtOx, B: nPropOx, C: cPropOx, D: iPropOx, E: esterOx)) and an output variable (cloud point in ˚C) (Table S1). We synthesized 87 poly(2-oxazoline)s by similar methods to augment this data (Table S2). Cloud point was evaluated by dynamic light scattering (DLS) in accordance with best practices,30 particularly since DLS affords greater weightage to the modal mass as a correction for the asymmetric molecular weight distributions (MWD) of our synthesized polymers (details in supplementary materials). Due to data scarcity, esterOx was neither synthesized nor considered in inverse design. The relationships of individual input variables to the output cloud point are plotted in Fig. 2.

Fig. 2: The dependence of cloud point on the mole fraction of (a) EtOx; (b) nPropOx; (c) cPropOx; (d) iPropOx; and (e) molecular mass (M), where all zero values were filtered from the graphs, and (f) the number distribution of cloud points, where zero represents polymers without a cloud point.

We test whether machine learning methods have superior predictive accuracy to simple regression methods in this multi-variable parameter space.31, 32, 33 We compare the root-mean-squared errors (RMSE) of simple linear and quadratic regressions against machine learning methods including support vector regression (SVR), neural networks (NN), and gradient boosting regression with decision trees (GBR) (Fig. 3, S3). The accuracies of the various models are determined by splitting the input dataset into training, validation, and test sets, with training and validation performed on historical data, while testing is performed with experimental data. The RMSE and inference times are reported in Table S3.

Fig. 3: (top row) Comparison of three regression methods (a–c): linear, polynomial (order-2), and gradient boosting (with decision trees) regressions. The literature data is split into 68 training data points and 7 validation data points. Test datapoints are 42 experimental data points produced in the lab. The results were compared using the root-mean-squared error. We observe that GBR achieves the best generalization. (bottom row) Final GBR model performance on 3 different random train-test splits of the combined dataset.
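This comparison can be reproduced in outline with standard tools; the following is a minimal Python/scikit-learn sketch, in which the file names and column names are illustrative placeholders rather than our exact pipeline (which is available in our code repository).

# Minimal sketch of the forward-model comparison (cf. Table S3); file and
# column names are illustrative placeholders, not the exact pipeline.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

FEATURES = ["units_A", "units_B", "units_C", "units_D", "units_E",
            "type", "M", "PDI"]                # descriptors as in Tables S1/S2
TARGET = "cloud_point"

lit = pd.read_csv("literature_data.csv")       # curated literature data
exp = pd.read_csv("experimental_data.csv")     # synthesized polymers (test set)
train = lit.sample(n=68, random_state=0)       # 68 training points
valid = lit.drop(train.index)                  # 7 validation points

models = {
    "linear": LinearRegression(),
    "poly2": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "svr": SVR(),
    "nn": MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000),
    "gbr": GradientBoostingRegressor(),
}
for name, model in models.items():
    model.fit(train[FEATURES], train[TARGET])
    rmse = {split: mean_squared_error(df[TARGET], model.predict(df[FEATURES])) ** 0.5
            for split, df in (("train", train), ("valid", valid), ("test", exp))}
    print(name, rmse)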
Linear and polynomial regressions, while significantly faster than the others, performed poorly compared with SVR, NN, and GBR. Of the latter three, GBR was the most accurate "out of the box." Moreover, it possesses fast inference speed, which is essential for efficient exploration of the parameter space in inverse design. We increased the predictive accuracy by tuning the hyper-parameters via a cross-validation grid search. We used both historical and experimental data, with a test set of 10%, to validate our choice of hyperparameters against the test error on 3 randomly split training and test sets (Fig. 3). We now observe improved performance with the increased dataset and thorough tuning. The algorithm generalizes well across a polymer dataset of varying polydispersity: the historical datasets had narrow polydispersity indices with the assumption of symmetrical MWDs, while the synthesized polymers had broad and unsymmetrical MWDs. The robustness of this algorithm in handling "noisy" data renders it far more powerful than a simple algorithm that only works with the highest-quality data. With a sufficiently accurate model, we finally retrain (using the tuned hyper-parameters) on the entire dataset to produce a finalized forward model that we use for subsequent inverse design.

The feature importance ranking (Fig. S4) indicates that "units of A" and "molecular mass" are the two most important features defining cloud point. We note that these insights are not trivially derived from Figure 2, which indicates similarly strong dependences of variables a–c on cloud point. Also, the molecular mass correlating most strongly with cloud point is the mode, not the median or mean (Fig. S2), which we speculate could indicate a critical threshold, e.g., a concentration of polymers above a certain molecular mass necessary to induce globule formation.

While forward predictive models are fairly common in machine learning approaches for materials science, inverse design is far more challenging. This is because the descriptors, which are usually high dimensional, are difficult to predict from outputs, which are low dimensional. In the case of our polymer dataset, the output cloud point is a single number, attributed to the 5 numbers representing the molecular mass and composition of the polymer. Inverse design would provide the ability to design polymers based on a desired final property and accelerate the synthesis of target polymers under design constraints to meet desired cloud points. To further realize new material discovery, we propose to extrapolate from our training dataset by designing terpolymers, which are non-existent in our training set, and by limiting the EtOx composition, which is common in the training data.
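For concreteness, the cross-validation grid search and feature-importance ranking described above can be sketched as follows; the grid values, file names, and column names are illustrative assumptions, not the settings behind our final model (those are in our code repository).

# Illustrative GBR tuning and feature ranking; grid values are assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

FEATURES = ["units_A", "units_B", "units_C", "units_D", "units_E", "type", "M", "PDI"]
data = pd.concat([pd.read_csv("literature_data.csv"),
                  pd.read_csv("experimental_data.csv")], ignore_index=True)
X, y = data[FEATURES], data["cloud_point"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)

grid = {"n_estimators": [100, 300, 1000],
        "max_depth": [2, 3, 4],
        "learning_rate": [0.01, 0.05, 0.1],
        "subsample": [0.8, 1.0]}
search = GridSearchCV(GradientBoostingRegressor(random_state=0), grid,
                      scoring="neg_root_mean_squared_error", cv=5)
search.fit(X_train, y_train)
print("held-out RMSE (degC):", -search.score(X_test, y_test))

# Impurity-based feature importance of the tuned model (cf. Fig. S4).
for name, imp in sorted(zip(FEATURES, search.best_estimator_.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")

The best estimator from such a search is then retrained on the entire dataset, as described above, before being used as the forward model for inverse design.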
Typically, inverse optimization on piece-wise constant functions provides a large number of different predicted designs, all of which may achieve our optimization and constraint targets according to the fitted GBR model. However, the quality of these designs varies, particularly in the case of extrapolation. Validating all of them experimentally would be inefficient, so a filtering method based on an ensemble of $M$ three-layer, fully connected neural networks (NN) was employed to select the most promising design candidates for experimental validation. Each NN's trainable parameters are initialized with distinct, random values, resulting in different fitted predictors $\{\hat{f}_1, \dots, \hat{f}_M\}$ due to the non-convex nature of the objective function and random initialization. For each design $x$, we then compared the ensemble of NN-predicted cloud points $\{\hat{f}_1(x), \dots, \hat{f}_M(x)\}$ with the GBR prediction $\hat{f}(x)$, and only experimentally validated designs where $\hat{f}(x) \approx \frac{1}{M}\sum_{i=1}^{M}\hat{f}_i(x)$ (NN predictions agree with GBR) and $\mathrm{Var}\{\hat{f}_1(x), \dots, \hat{f}_M(x)\}$ was small. This ensures that $x$ is predicted with high confidence and is not an ad-hoc extrapolation. Fig. 4 illustrates the principle of this approach. Although the NNs are also good approximators for the cloud point, they were not used as the forward model for producing inverse-design candidates because the feed-forward step of the NN ensemble is still too slow compared with GBR, which consists of simple summing of piecewise constant functions.

Fig. 4: (a) Framework of the selection criteria, where the dataset is used to train the GBR and NN ensemble, PSO predicts a polymer design (x*) for a desired CP (y*), and the design is verified for accuracy by the NN ensemble, with CP agreement as a down-selection criterion. (b) Illustration of the validity of the filtering procedure. We observe that given limited training data, not all extrapolated points are valid. However, when an ensemble of neural networks trained with distinct initializations agree on a certain input, we have much greater confidence in the validity of their predictions. (c) Final PSO-based inverse design performance, showing an RMSE of 3.9 ˚C. (d) Forward-model (NN ensemble) performance on the polymers synthesized from the designs.

Conclusion

Using this technique, we down-selected 17 polymers over our 4 desired cloud points (37, 45, 60, 80 ˚C), and imposed design criteria weighted towards minimizing EtOx and designing polymers with more than two components – unseen in the training data. These polymers were synthesized, although an average of 3 iterations was required to achieve the target mass and composition of the designs, owing to the difficulties of terpolymer synthesis, where the Mayo-Lewis equation does not apply in calculating the monomer feed ratio required for a desired final copolymer composition. The mass and composition of the synthesized polymers are reported in Table S4, showing minimal deviation from the algorithmic design, along with their cloud points (an average of 3 measurements). The RMSE of the obtained cloud points was 3.9 ˚C; however, when the structures of the new polymers are fed back into the NN ensemble, a larger RMSE is observed (6.1 ˚C) (Fig. 4). Deviation from the target cloud points was within the test RMSE between 37–60 ˚C but above it at 80 ˚C, which can be attributed to sparseness of the dataset at higher temperatures (Fig. 2f) – an in-depth analysis is provided in the supplementary materials. These results show that our combination of slow and fast algorithms is able to design polymers with unique compositions, with control over the desired physical property and structural design.
Overall, a significant conceptual advance in polymer inverse design has been achieved via judicious application of machine learning methods. This was done in three steps. First, we curated and categorized historical and new data. Second, we selected and fine-tuned a machine learning model based on gradient boosting regression with decision trees, resulting in a cloud-point predictive accuracy of 3.9 ˚C (RMSE). The model generalized well to both well-defined historical datasets and newly synthesized polymers with unsymmetrical MWDs. Third, we performed inverse design by particle swarm optimization, which predicted the designs of new polymers from desired cloud points (37, 45, 60, 80 ˚C). Extrapolation beyond the training set was achieved via an ensemble of neural networks as a cross-validation technique to down-select 17 polymers with the lowest variance across predictions. The RMSE of the predicted polymers was similar to that of the forward model. This methodology offers unprecedented control of polymer design, which may significantly accelerate the development of polymers with other physical properties.

References

1. Garcia SJ. Effect of polymer architecture on the intrinsic self-healing character of polymers. Eur Polym J 2014, 53: 118-125.
2. Rinkenauer AC, Schubert S, Traeger A, Schubert US. The influence of polymer architecture on in vitro pDNA transfection. J Mater Chem B 2015, 3(38): 7477-7493.
3. Paramelle D, Gorelik S, Liu Y, Kumar J. Photothermally responsive gold nanoparticle conjugated polymer-grafted porous hollow silica nanocapsules. Chem Commun 2016, 52(64): 9897-9900.
4. Mannodi-Kanakkithodi A, Pilania G, Huan TD, Lookman T, Ramprasad R. Machine Learning Strategy for Accelerated Design of Polymer Dielectrics. Sci Rep 2016, 6: 20952.
5. Wei JN, Duvenaud D, Aspuru-Guzik A. Neural Networks for the Prediction of Organic Chemistry Reactions. ACS Cent Sci 2016, 2(10): 725-732.
6. Gómez-Bombarelli R, Wei JN, Duvenaud D, Hernández-Lobato JM, Sánchez-Lengeling B, Sheberla D, et al. Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. ACS Cent Sci 2018, 4(2): 268-276.
7. Sanchez-Lengeling B, Roch LM, Perea JD, Langner S, Brabec CJ, Aspuru-Guzik A. A Bayesian Approach to Predict Solubility Parameters. Adv Theory Simul 2018: doi:10.1002/adts.201800069.
8. Ye W, Chen C, Wang Z, Chu I-H, Ong SP. Deep neural networks for accurate predictions of crystal stability. Nat Commun 2018, 9(1): 3800.
9. Gómez-Bombarelli R, Aguilera-Iparraguirre J, Hirzel TD, Duvenaud D, Maclaurin D, Blood-Forsythe MA, et al. Design of efficient molecular organic light-emitting diodes by a high-throughput virtual screening and experimental approach. Nat Mater 2016, 15: 1120.
10. Brandt RE, Kurchin RC, Steinmann V, Kitchaev D, Roat C, Levcenco S, et al. Rapid Photovoltaic Device Characterization through Bayesian Parameter Estimation. Joule 2017, 1(4): 843-856.
11. Raccuglia P, Elbert KC, Adler PDF, Falk C, Wenny MB, Mollo A, et al. Machine-learning-assisted materials discovery using failed experiments. Nature 2016, 533: 73.
12. Jiang R, Jin Q, Li B, Ding D, Shi A-C. Phase Diagram of Poly(ethylene oxide) and Poly(propylene oxide) Triblock Copolymers in Aqueous Solutions. Macromolecules 2006, 39(17): 5891-5896.
13. Ashbaugh HS, Paulaitis ME. Monomer Hydrophobicity as a Mechanism for the LCST Behavior of Poly(ethylene oxide) in Water. Ind Eng Chem Res 2006, 45(16): 5531-5537.
14. Aseyev V, Tenhu H, Winnik FM. Non-ionic Thermoresponsive Polymers in Water. In: Müller AHE, Borisov O (eds). Self Organized Nanostructures of Amphiphilic Block Copolymers II. Springer Berlin Heidelberg: Berlin, Heidelberg, 2011, pp 29-89.
15. Hoogenboom R, Thijs HML, Jochems MJHC, van Lankvelt BM, Fijten MWM, Schubert US. Tuning the LCST of poly(2-oxazoline)s by varying composition and molecular weight: alternatives to poly(N-isopropylacrylamide)? Chem Commun 2008(44): 5758-5760.
16. Huan TD, Mannodi-Kanakkithodi A, Kim C, Sharma V, Pilania G, Ramprasad R. A polymer dataset for accelerated property prediction and design. Sci Data 2016, 3: 160012.
17. Mannodi-Kanakkithodi A, Chandrasekaran A, Kim C, Huan TD, Pilania G, Botu V, et al. Scoping the polymer genome: A roadmap for rational polymer dielectrics design and beyond. Mater Today 2018, 21(7): 785-796.
18. Kim C, Chandrasekaran A, Huan TD, Das D, Ramprasad R. Polymer Genome: A Data-Powered Polymer Informatics Platform for Property Predictions. J Phys Chem C 2018, 122(31): 17575-17585.
19. Kutzner C, Páll S, Fechner M, Esztermann A, de Groot BL, Grubmüller H. Best bang for your buck: GPU nodes for GROMACS biomolecular simulations. J Comput Chem 2015, 36(26): 1990-2008.
20. Dünweg B, Kremer K. Molecular dynamics simulation of a polymer chain in solution. J Chem Phys 1993, 99(9): 6983-6997.
21. Stuart MAC, Huck WTS, Genzer J, Muller M, Ober C, Stamm M, et al. Emerging applications of stimuli-responsive polymer materials. Nat Mater 2010, 9(2): 101-113.
22. Halperin A, Kröger M, Winnik FM. Poly(N-isopropylacrylamide) Phase Diagrams: Fifty Years of Research. Angew Chem Int Ed 2015, 54(51): 15342-15367.
23. Friedman JH. Greedy Function Approximation: A Gradient Boosting Machine. Ann Stat 2001, 29(5): 1189-1232.
24. Contreras MM, Mattea C, Rueda JC, Stapf S, Bajd F. Synthesis and characterization of block copolymers from 2-oxazolines. Des Monomers Polym 2015, 18(2): 170-179.
25. Glassner M, Lava K, de la Rosa VR, Hoogenboom R. Tuning the LCST of poly(2-cyclopropyl-2-oxazoline) via gradient copolymerization with 2-ethyl-2-oxazoline. J Polym Sci A 2014, 52(21): 3118-3122.
26. Diab C, Akiyama Y, Kataoka K, Winnik FM. Microcalorimetric Study of the Temperature-Induced Phase Separation in Aqueous Solutions of Poly(2-isopropyl-2-oxazolines). Macromolecules 2004, 37(7): 2556-2562.
27. Park J-S, Akiyama Y, Winnik FM, Kataoka K. Versatile Synthesis of End-Functionalized Thermosensitive Poly(2-isopropyl-2-oxazolines). Macromolecules 2004, 37(18): 6786-6792.
28. Park J-S, Kataoka K. Precise Control of Lower Critical Solution Temperature of Thermosensitive Poly(2-isopropyl-2-oxazoline) via Gradient Copolymerization with 2-Ethyl-2-oxazoline as a Hydrophilic Comonomer. Macromolecules 2006, 39(19): 6622-6630.
29. Park J-S, Kataoka K. Comprehensive and Accurate Control of Thermosensitivity of Poly(2-alkyl-2-oxazoline)s via Well-Defined Gradient or Random Copolymerization. Macromolecules 2007, 40(10): 3599-3609.
30. Zhang Q, Weber C, Schubert US, Hoogenboom R. Thermoresponsive polymers with lower critical solution temperature: from fundamental aspects and measuring techniques to recommended turbidimetry conditions. Mater Horizons 2017, 4(2): 109-116.
31. Cortes C, Vapnik V. Support-Vector Networks. Mach Learn 1995, 20(3): 273-297.
32. Rokach L, Maimon O. Data Mining With Decision Trees: Theory and Applications. World Scientific Publishing Co., Inc., 2014.
33. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015, 521: 436.

Details of our code implementation and dataset can be found in our repository (https://github.com/LiQianxiao/CloudPoint-MachineLearning). All other information is in the ESI.
Acknowledgements: J.N.K., Q.L., and T.B. are supported by the AME Programmatic Fund of the Agency for Science, Technology and Research under Grant No. A1898b0043.

Supplementary Materials: Materials and Methods; Figs. S1-S5; Tables S1-S4; References (34-45)

Supplementary Materials for
Machine learning enables polymer cloud-point engineering via inverse design
Jatin N. Kumar, Qianxiao Li, Karen Y.T. Tang, Tonio Buonassisi, Anibal L. Gonzalez-Oyarce, Jun Ye
Correspondence to: kumarjn@imre.a-star.edu.sg

This file includes: Materials and Methods; Supplementary Text; Figs. S1 to S5; Tables S1 to S4

Materials and Methods

Materials
2-n-propyl-2-oxazoline (nPropOx),(34) 2-cyclopropyl-2-oxazoline (cPropOx),(35) and 2-isopropyl-2-oxazoline (iPropOx)(36) were synthesized as described in the literature, distilled over calcium hydride, and stored with molecular sieves (size 5 Å) in a glovebox. 2-ethyl-2-oxazoline (EtOx, Sigma-Aldrich) was distilled over calcium hydride and stored with molecular sieves (size 5 Å) in a glovebox. All other reagents were used as supplied unless otherwise stated.

Analytical Methods
Nuclear magnetic resonance (NMR). The compositions of the polymers were determined using 1H NMR spectroscopy. 1H NMR spectra were recorded on a JEOL 500 MHz NMR system (JMNECA500IIFT) in CDCl3. The residual protonated solvent signals were used as reference.
Size exclusion chromatography (SEC). Gel permeation chromatography (GPC) measurements were performed in THF (flow rate: 1 mL/min) on a Viscotek GPC Max module equipped with Phenogel columns (10³ and 10⁵ Å; size: 300 x 7.80 mm) in series, heated to 40 ºC. The average molecular weights and polydispersities were determined with a Viscotek TDA 305 detector calibrated with poly(methyl methacrylate) standards.
Dynamic light scattering (DLS). Measurements at various temperatures were conducted on a Malvern Instruments Zetasizer Nano ZS instrument equipped with a 4 mW He–Ne laser operating at λ = 633 nm, an avalanche photodiode detector with high quantum efficiency, and an ALV/LSE-5003 multiple tau digital correlator electronics system. Solutions of polymers (5 mg/mL) were prepared by dissolving polymer in deionized water at room temperature. The solutions were then heated to 100 °C and cooled down to remove thermal memory before measurements were taken.

Experimental Methods
For all polymerizations, the polymerization mixture was prepared in vials that were dried in a 100 °C oven overnight before use and crimped air-tight in a glovebox. The mixture contained the monomers (EtOx, nPropOx, cPropOx, iPropOx) in the desired ratios, at a total monomer concentration of 4 M, with anhydrous acetonitrile (ACN) as solvent and methyl tosylate (MeOTs) as initiator. The amount of methyl tosylate added was determined by the various [M]/[I] ratios. Temperature-controlled polymerizations were performed in sealed vials in a microwave reactor equipped with an IR temperature sensor at 140 °C for different lengths of time. The mixture was then cooled to ambient temperature and quenched by addition of tetramethylammonium hydroxide (2.5 wt% in methanol, 2 equivalents relative to initiator). The solutions were concentrated by removing some of the solvent under reduced pressure, then precipitated in cold diethyl ether. The product was collected and dried under reduced pressure overnight. All polymers were redissolved in THF for SEC, CDCl3 for 1H NMR, and deionized water for DLS.
1H NMR of P((EtOx)w(nPropOx)x(cPropOx)y(iPropOx)z) (500 MHz, CDCl3, δ, ppm): 0.8 (d, 66.5 Hz, 4y H, CHCH2CH2), 0.96 (s, 3x H, CH2CH2CH3), 1.11 (s, 6z H, CHCH3CH3), 1.12 (s, 3w H, CH2CH3), 1.64 (s, 2x H, CH2CH2CH3), 2.30 (d, 56.5 Hz, 2x H, NCOCH2CH2CH3), 2.38 (s, 2w H, NCOCH2CH3), 2.70 (d, 61.0 Hz, y H, CHCH2CH2), 2.80 (d, 123.5 Hz, z H, CHCH3CH3), 3.49 (s, 2(w+x+y+z) H, CH2 backbone), where w, x, y, and z are the mole ratios of EtOx, nPropOx, cPropOx, and iPropOx, respectively.

Supplementary Text

Curation and synthesis of polymer library
To augment the historical dataset reported in Table S1,(15,24-29) a series of poly(2-oxazolines) were synthesized by cationic ring-opening polymerization in a microwave reactor at 140 °C and terminated with tetramethylammonium hydroxide at the end of the reaction. All copolymers were synthesized with EtOx and one of the propyl oxazolines, and variations in feed ratio were performed. SEC results are reported for all synthesized polymers in Table S2. DLS measurements were performed in triplicate by preparing solutions of polymers at a concentration of 5 mg/mL in deionized water. The solutions were then heated to 100 °C and cooled down before measurements were taken, to negate the effect of thermal history. DLS measurements of the polymer solutions were performed over a temperature sweep from 20 to 90 ˚C. The cloud point temperature for the synthesized polymers (Table S2) was determined as the temperature at which the dissolved polymer chains of small hydrodynamic diameter agglomerate to form large particles or mesoglobules, as demonstrated in Fig. S1 for poly(nPropOx-co-EtOx) copolymers with a compositional variation at 20% increments.

The PDIs obtained experimentally are much higher than the PDIs from the historical data. It can be assumed that the molecular weight distributions (MWD) for the historical data, where the PDI is lower than 1.4, are typically symmetrical. Conversely, the MWD of the polymers made experimentally had a long low-molecular-weight tail (Fig. S2). In the case of cationic ring-opening polymerization, this long tail can be attributed to impurities such as water, which terminate actively propagating chains. Due to the unsymmetrical MWD, the number-average molecular weight (Mn) is no longer a proper representation of the MWD, particularly when comparing the dataset to historical data with polymers of narrow polydispersities. Zhang et al.(30) propose that DLS is one of the better methods to characterize cloud points. They note that the intensity of scattered light, due to a sharp change in refractive index, is influenced by the chains that are dehydrating and thereby changing morphology from coil to globule; in contrast, only a minor difference in refractive index is observed from the hydrated chains. For broad or unsymmetric MWDs such as those of our polymers, we postulate that the cloud point by DLS of the modal polymer molecular weight would represent the polymer as a whole. To validate this theory, a polymer was selected at random and dialyzed against water to remove some of the low-molecular-weight tail. Comparison of the MWD before and after dialysis (Fig. S2) shows the removal of the low-molecular-weight tail and the narrowing of the MWD. However, DLS results (Fig. S2, inset) show no change in the cloud point of the polymer. Thus, to better represent the polymer dataset, the modal molecular weight, or peak molecular weight (Mp), was used to represent the molecular weight of the polymers in Table S2.
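As an illustration of how a cloud point is read off such a temperature sweep, the sketch below simply locates the first temperature at which the measured hydrodynamic diameter exceeds a size threshold; the threshold value and the data layout are assumptions for the example, not the exact criterion used to compile Table S2.

# Sketch: cloud point as the onset of agglomeration in a DLS temperature sweep.
# The 100 nm threshold is an assumed example value, not the reported criterion.
import numpy as np

def cloud_point_from_sweep(temps_C, diameters_nm, threshold_nm=100.0):
    """Return the first temperature at which the mean hydrodynamic diameter
    exceeds threshold_nm (coil-to-mesoglobule transition); None if no transition."""
    temps = np.asarray(temps_C, dtype=float)
    sizes = np.asarray(diameters_nm, dtype=float)
    order = np.argsort(temps)
    temps, sizes = temps[order], sizes[order]
    above = np.nonzero(sizes > threshold_nm)[0]
    return float(temps[above[0]]) if above.size else None

# Toy example: an abrupt size increase near 49 degC.
sweep_T = np.arange(20, 91)                       # 20-90 degC sweep
sweep_d = np.where(sweep_T < 49, 8.0, 1500.0)     # nm
print(cloud_point_from_sweep(sweep_T, sweep_d))   # -> 49.0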
Machine-learning methodology

Establishing a Machine Learning Baseline
It is often useful to establish a baseline for statistical methods on the currently available data before further data collection and algorithm exploration. In this section, we outline the development of our basic data-driven approach. Such approaches are broadly classified as statistical models (e.g., multivariate analysis(37) and Bayesian inference(38)) and machine learning models (e.g., support vector machines,(31) decision tree learning,(32) and deep neural networks(33)). The former perform well on relatively small datasets, but require nontrivial domain information such as statistical priors and a forward mathematical model, which may not always be available and can thus limit their applicability. On the other hand, machine learning models lend their applicability to datasets where the underlying physical mechanisms are unclear, or where the dataset has noise corruption.(39) While machine learning typically requires large datasets and cannot infer underlying physical relationships, its accuracy and fast inference speed make it suitable for inverse design via global optimization.

In this work, we recall that we wish to predict the cloud point ($y \in \mathbb{R}$) based on the polymer composition and other properties ($x \in \mathbb{R}^d$). We assume that there is some relationship $y = f(x)$ for some unknown function $f$. Hence, our goal is to parameterize and fit an approximator $\hat{f}$ of $f$. The literature dataset is split into 68 training samples and 7 validation samples, and we evaluate a total of five methods for fitting: 1) linear regression; 2) polynomial regression of degree up to 2; 3) support vector regression; 4) neural network regressor (2 hidden layers); and 5) gradient boosting regression with decision trees (GBR).(39) Below, we sketch the basic idea of the GBR method, which is the final choice of our forward model for inverse design, and refer the reader to the text by Hastie, Tibshirani, and Friedman(40) for more details.

A Sketch of Gradient Boosting Regression
GBR makes use of the idea of "boosting", a class of sequential ensembling methods in which weak regressors (regression models with low capacity or approximation power) are iteratively combined to form a strong regressor. The basic idea is as follows: fix a space of weak approximators $H$ (e.g., decision trees) and start with a constant function $f_0$. For each $k \geq 0$, we set

$$ f_{k+1} = f_k + \operatorname*{argmin}_{h \in H} L(h, f - f_k), \qquad (1) $$

where the loss function $L$ measures the "distance" between its arguments. In other words, at each step we fit some function to approximate the current residual error $f - f_k$, and this successively improves the approximation. Of course, in practice the minimization step in (1) may be hard to evaluate, hence one can use "gradient boosting", where $h$ is not chosen as a true minimizer, but as a function in the "steepest descent direction" of the loss function with respect to $h$. A detailed exposition on gradient boosting can be found in the previously mentioned text.(40)
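For a squared-error loss, the steepest-descent direction in (1) is simply the current residual, so the scheme can be written in a few lines. The sketch below, using shallow scikit-learn decision trees as the weak regressors and an assumed fixed shrinkage factor, is meant only to illustrate update (1); it is not the tuned forward model.

# Minimal gradient boosting with decision-tree weak learners (squared loss):
# each tree is fitted to the current residual, implementing update (1).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbr_fit(X, y, n_rounds=200, depth=3, shrinkage=0.1):
    f0 = float(np.mean(y))                         # constant initial model f_0
    residual = np.asarray(y, dtype=float) - f0
    trees = []
    for _ in range(n_rounds):
        tree = DecisionTreeRegressor(max_depth=depth).fit(X, residual)
        trees.append(tree)
        residual = residual - shrinkage * tree.predict(X)   # f_{k+1} = f_k + shrinkage * h_k
    return f0, trees, shrinkage

def gbr_predict(model, X):
    f0, trees, shrinkage = model
    # A sum of piecewise-constant trees: fast to evaluate, but with vanishing gradients.
    return f0 + shrinkage * sum(tree.predict(X) for tree in trees)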
The results of the comparisons are shown in Figures 3 and S3, where we measure the root-mean-squared error on training, validation, and test sets, the last of which is the quantity used to discriminate model performance. The RMSE and the inference time are reported in Table S3. Note that while the training and validation sets are random splits of the literature data, the test set consists of sample points obtained in our experiments. Thus, a model that performs well on the test set indicates that it has the ability to fuse both literature data and our experimental data to form a more robust model. From our results, we observe that linear regression and polynomial regression, while having fast inference speeds, perform poorly in terms of test error. Moreover, polynomial regression suffers from the "curse of dimensionality" when higher-order polynomials are included, since the number of terms increases exponentially with increasing maximum degree. While all of the more sophisticated machine learning methods perform significantly better, the most outstanding is GBR, which performs best when weighing both RMSE and inference time, even with minimal tuning. The inference time is important since we will need to repeatedly call this forward model in our inverse design process, and a faster inference time greatly enhances our exploration of the design space. Moreover, GBR (with decision trees as base regressors) gives us a measure of feature importance using the Gini impurity,(40) which in the present application provides an estimate of the sensitivity of our cloud-point model to the polymer properties, as seen in Fig. S4. To optimize the GBR for inverse design, hyperparameter tuning was further conducted to bring the RMSE down to 3.9 ˚C; details are presented in our data repository. With a tuned model, we look towards inverse design in order to predict polymer structure from desired cloud points.

Inverse Design via Particle Swarm Optimization
Our data-driven approximation $\hat{f}$ of the forward relationship between the polymer properties and the cloud point was demonstrated previously to be close to the true function $f$. In this section, we consider the problem of inverse design, where we want to find a polymer configuration $x$ that achieves certain targets (e.g. cloud point, desired proportions), while respecting certain constraints (e.g. molecular weight). Mathematically, this can be posed as a constrained optimization problem

$$ \min_x \; F(x, \hat{f}(x)) \quad \text{subject to} \quad G(x, \hat{f}(x)) \geq 0, \qquad (2) $$

where $F: \mathbb{R}^d \times \mathbb{R} \to \mathbb{R}$ is the objective function and $G: \mathbb{R}^d \times \mathbb{R} \to \mathbb{R}^m$ is the vector-valued constraint function. The problem (2) is posed as a global optimization problem. In general, there are many heuristic methods for solving it, including simulated annealing,(41) genetic algorithms,(42) differential evolution,(43) etc. In this paper, we employ the particle swarm optimization (PSO) algorithm.(44) It is especially suited for our use-case since $\hat{f}$ is a boosted regression tree, which is a piece-wise constant function with almost everywhere vanishing derivatives, rendering gradient-based algorithms ineffective. For the current application, we consider the following instance of objective and constraints:

Objective: Consider a mean-squared loss function that penalizes deviation from a target cloud point $y^*$, plus a regularization term that promotes certain desired design patterns,

$$ F(x, \hat{f}(x)) = \tfrac{1}{2}\big(\hat{f}(x) - y^*\big)^2 + R(x). $$

For the present application, we set $R$ so as to promote ternary and simpler designs (at most three non-zero components), as well as minimizing the units of A (EtOx). By writing $x = (x_A, x_B, x_C, x_D, x_E, x_M)$, we have

$$ R(x) = \lambda_1 \sum_{\substack{x_i, x_j, x_k \in \{x_A, \dots, x_E\} \\ x_i \neq x_j \neq x_k}} (x_i x_j x_k)^{1/3} + \lambda_2\, x_A, $$

where $\lambda_1, \lambda_2 > 0$ are regularization parameters. Note that there exist well-defined minima since we also require all components of $x$ to be non-negative.
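As a concrete illustration, this objective can be coded directly against any forward predictor $\hat{f}$; in the Python sketch below the regularization weights lam1 and lam2 are illustrative placeholders rather than the values used in this work, and the resulting function can be handed to any off-the-shelf PSO implementation together with the bounds and constraints of the next subsection.

# Sketch of the inverse-design objective: squared deviation from the target
# cloud point plus a regularizer favouring at most three non-zero monomer
# components and few EtOx units. lam1/lam2 are illustrative placeholders.
from itertools import combinations
import numpy as np

def objective(x, f_hat, y_target, lam1=1.0, lam2=0.1):
    """x = (x_A, x_B, x_C, x_D, x_E, x_M); f_hat maps a design to a predicted cloud point."""
    misfit = 0.5 * (f_hat(x) - y_target) ** 2
    counts = np.asarray(x[:5], dtype=float)            # monomer units only
    penalty = lam1 * sum((xi * xj * xk) ** (1.0 / 3.0)  # non-zero only when >= 3 components
                         for xi, xj, xk in combinations(counts, 3))
    penalty += lam2 * counts[0]                         # discourage EtOx (units of A)
    return misfit + penalty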
Constraints: First, we employ the element-wise bounds

$$ (0, 0, 0, 0, 0, 0) \leq x \leq (203, 187, 43, 96, 0, 23196). $$

These bounds were selected based on the limits of the training data and ease of synthesis. Next, since M is the product of the degree of polymerization of the polymer and the molecular weight of its monomer units, we make sure that the designed M values are consistent (within 20%) with the designed compositions, i.e.

$$ 0.8\,M(x) \leq x_M \leq 1.2\,M(x), \quad \text{where} \quad M(x) = 99.13\,x_A + 113.16\,x_B + 111.14\,x_C + 113.16\,x_D. $$

Finally, to simplify the experimental process, we require the maximal number of monomer units to be at most 10 times the minimal non-zero monomer unit, i.e.

$$ \max_{i \in \{A, \dots, E\}} x_i \leq 10 \min_{i \in \{A, \dots, E\},\; x_i > 0} x_i. $$

Our polymer design predictions were also given constraints based on our own requirements. For the purpose of this study, we chose to minimize the amount of EtOx in the polymer designs, especially since our training data was heavily populated with polymers containing EtOx. Thus, we ran four sets of predictions, in decreasing order of preference, where: (1) the algorithm limited EtOx to zero aggressively; (2) the algorithm limited EtOx to zero less aggressively; (3) the algorithm limited EtOx to under 100 units; (4) the algorithm did not limit EtOx. One of the other design parameters that we considered was to have more than 2 components in the polymer design – a feature that was not present in our training set, nor is it commonplace when designing polymers for a desired physical property, due to the expansion into a multivariable parameter space.

Selection Criteria
Besides obvious selection criteria, such as picking designs with predicted cloud points close to the target cloud point, we developed more sophisticated selection procedures. As is typical in inverse optimization on piece-wise constant functions, depending on the random initialization and the randomness of the PSO algorithm, we may arrive at a large number of different predicted designs that achieve, according to the fitted GBR model, our optimization and constraint targets. However, the quality of these designs varies (especially when extrapolating from our training data), and testing all of them would be inefficient. Thus, we employ a filtering method to select the most promising design candidates for experimental validation. Concretely, we train an ensemble of $M$ three-layer, fully connected neural networks (NN)(45) with sigmoid activations and mean-square loss on our full training set to predict cloud points based on polymer properties. Each NN's trainable parameters are initialized with distinct, random values. Due to the non-convex nature of the objective function and random initialization, with high probability each neural network will give rise to a different fitted predictor, yielding an ensemble $\{\hat{f}_1, \dots, \hat{f}_M\}$. For each design $x$, we then compare the ensemble of NN-predicted cloud points $\{\hat{f}_1(x), \dots, \hat{f}_M(x)\}$ with the GBR prediction $\hat{f}(x)$. We only choose to experimentally validate designs where $\hat{f}(x) \approx \frac{1}{M}\sum_{i=1}^{M}\hat{f}_i(x)$ (NN predictions agree with GBR) and $\mathrm{Var}\{\hat{f}_1(x), \dots, \hat{f}_M(x)\}$ is small. This ensures that $x$ is predicted with high confidence and is not an ad-hoc extrapolation. Figures 4a and 4b summarize and illustrate the principle of this approach. Note that although the NNs are also good approximators for the cloud point, we do not use NNs as the forward model for producing inverse design candidates because the feed-forward step of the NN ensemble is still too slow compared with GBR, which consists of simple summing of piecewise constant functions.
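A minimal sketch of this agreement filter is given below; the tolerance values are illustrative assumptions (the selection thresholds actually used are part of our code repository), and gbr_predict and nn_ensemble stand for the fitted forward models described above.

# Sketch of the ensemble-agreement filter for down-selecting PSO designs.
# agree_tol / var_tol are illustrative, not the thresholds used in this work.
import numpy as np

def select_designs(designs, gbr_predict, nn_ensemble, agree_tol=2.0, var_tol=4.0):
    """Keep designs whose mean NN prediction agrees with the GBR prediction to
    within agree_tol (degC) and whose NN-ensemble variance is below var_tol."""
    kept = []
    for x in designs:
        nn_preds = np.array([nn(x) for nn in nn_ensemble])
        gbr_cp = gbr_predict(x)
        if abs(nn_preds.mean() - gbr_cp) <= agree_tol and nn_preds.var() <= var_tol:
            kept.append((x, gbr_cp, nn_preds.mean(), nn_preds.var()))
    return kept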
Machine-Learning Validation
The inverse design generated a list of possible polymer masses and target compositions, following the 4 constraint parameters above; these are reported in their entirety in our code repository. The neural network ensemble trained on the dataset was used to predict a cloud point for each particle-swarm prediction of size and composition, and all predictions with the smallest difference between the NN and GBR predictions and with low variance in the NN predictions were down-selected. From this, further down-selection was performed to choose 4 polymer designs per temperature, with higher preference given to a more aggressive minimization of EtOx. The final choice of polymers is summarized in Table S4. It can be noted that almost all of the polymers were designed to have 3 components, with the exception of the 80 ˚C cloud-point polymers. Also given in Table S4 are the cloud point, composition, and size of the polymers synthesized experimentally. The RMSE of the experimental results against their NN prediction was found to be 3.9 ˚C, which is the same as the RMSE of the optimized GBR (Fig. 4c). There is some deviation from the exact design due to experimental error, and when the obtained compositional and mass data was fed back into the NN for a forward predictive verification of the cloud points, a higher RMSE of 6.1 ˚C was seen (Fig. 4d). However, the results conclusively show that the inverse design algorithm is able to design polymers with unique compositions with a great deal of accuracy based on desired cloud points, especially when the cloud-point range is well covered by training data. The algorithm was robust enough to handle large variation in polymer quality, as discussed earlier. Moreover, the algorithmic methodology allows us to vary our configuration for the inverse design, which would provide access to a vast array of polymer designs with potential for experiment automation. Lastly, the general nature of this algorithm could allow us to work with other similar polymer datasets, thereby accelerating the development of polymers in the future.

Fig. S1. Temperature-dependent DLS measurements for poly(nPropOx-co-EtOx) at various compositional ratios, demonstrating the cloud-point dependence on polymer composition.

[Fig. S2 panels: mass fraction vs. log M before and after dialysis, Mp = 17,400 Da; inset: number mean diameter (d.nm) vs. T (°C).]
Fig. S2. Gel permeation chromatogram and temperature-dependent DLS data of poly(nPropOx-co-EtOx) (sample numbers 38 & 39, Table S2) before and after dialysis, showing a narrowing of the molecular weight distribution with no change in cloud point.

Fig. S3. Comparison of two regression methods (support vector regression (SVR) and neural network regression (NN)). This serves as a basis of comparison to the other regressions shown in Figure 3a-c. The literature data is split into 68 training data points and 7 validation data points. Test datapoints are 42 experimental data points produced in the lab. The results were compared using the root-mean-squared error.

Fig. S4. Feature importance via Gini impurity. Average values with standard deviation as error bars are plotted for each feature over 100 training-validation (90%-10%) splits.

Fig. S5. The fit of the NN ensemble model on experimentally obtained designs. The 80 ˚C designs are plotted separately from the other designs to show the main source of the deviation.

Table S1.
A list of historical data showing the degree of polymerization of EtOx (A), nPropOx (B), cPropOx (C), iPropOx (D), esterOx (E), polymer type (1 for homopolymer, 2 for statistical copolymer, 3 for gradient copolymer, 4 for block copolymer), molecular weight (M) in Da, polydispersity index (PDI), and cloud point in ˚C. No 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 Units of A 10 20 30 50 100 150 200 300 500 0 0 0 0 0 0 0 0 50 45 40 35 30 25 20 15 10 5 0 100 90 80 70 60 50 40 30 20 10 Units of B 0 0 0 0 0 0 0 0 0 15 20 25 50 100 150 200 300 0 5 10 15 20 25 30 35 40 45 50 0 10 20 30 40 50 60 70 80 90 Units of C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Units of D 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Units of E 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Type 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 1 1 2 2 2 2 2 2 2 2 2 M 1300 2000 2600 3800 6700 9000 13300 21000 37300 3100 3700 4300 6200 8140 12300 15500 18000 3300 3500 3500 3700 3700 4000 5400 3900 3800 4000 4200 15300 15200 13600 12600 13000 10700 10200 9700 9600 7800 PDI 1.09 1.08 1.09 1.09 1.15 1.15 1.25 1.33 1.6 1.1 1.11 1.14 1.28 1.4 1.3 1.43 1.46 1.14 1.15 1.36 1.36 1.34 1.36 1.35 1.35 1.34 1.34 1.32 1.21 1.22 1.21 1.26 1.25 1.28 1.37 1.36 1.37 1.48 Cloud Point 90.6 85.3 78.3 73.5 69.3 42.9 39 37.5 30.3 29.6 25.5 24.1 22.5 82 72.2 59.8 51.3 45.8 40 34.2 29.6 94.1 81.6 75.5 64.8 55.9 51.1 44.2 40 34.8 31.2 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 0 150 135 120 105 90 75 60 45 30 15 0 0 11 20 31 50 69 78 100 100 22 48 73 0 0 0 0 0 0 0 0 0 0 34 59 96 94 81 0 0 0 0 0 0 0 100 0 15 30 45 60 75 90 105 120 135 150 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 106 94 59 36 0 106 84 58 40 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 89 80 69 50 31 22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 78 52 27 100 40 40 40 0 24 50 73 86 0 0 0 0 0 0 17 21 41 50 38 69 86 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 20 40 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 2 2 2 2 2 2 2 2 1 3 3 3 3 3 3 3 1 1 3 3 3 1 4 4 4 1 3 3 3 1 1 3 3 3 3 1 1 1 1 1 1 1 1 8140 17700 17000 17700 17200 17900 17600 18600 17900 18500 18600 17200 19700 20500 18000 14600 13400 15400 14100 14000 8000 9300 9300 9300 9700 6098 7670 10813 12000 13300 12300 12300 9700 12000 12900 12400 14000 10200 8000 1900 2400 4600 5650 4300 7800 9700 1.4 1.47 1.44 1.4 1.49 1.38 1.37 1.35 1.33 1.34 1.35 1.45 1.18 1.16 1.16 1.11 1.14 1.11 1.1 1.19 1.02 1.02 1.02 1.02 1.02 1.04 1.03 1.02 1.02 1.02 1.04 1.02 1.02 1.05 1.04 1.02 1.05 1.04 1.03 1.03 1.03 1.02 1.02 28 83.9 71.5 63.1 53.7 49.2 42.6 37.3 34.1 29.1 24.5 24.1 28 33 39 46 57 72 79 91 67.3 55.2 46 38.7 44.7 47.7 47.4 23.8 26.3 30.1 33.8 38.7 23.8 36.3 41.8 50.6 75.1 72.5 62.8 51.3 48.1 43.7 38.7 37.3 Table S2. A list of data for synthesized polymers showing the degree of polymerization of EtOx (A), nPropOx (B), cPropOx (C), iPropOx (D), esterOx (E), polymer type (1 for homopolymer, 2 for statistical copolymer, 3 for gradient copolymer, 4 for block copolymer), molecular weight (M) in Da, polydispersity index (PDI), and cloud point in ˚C. 
No 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 Units of A 47 73 26 136 13 69 208 15 23 181 91 129 171 10 75 435 1166 388 954 208 222 130 129 72 55 57 116 141 90 49 76 84 100 58 85 50 43 63 Units of B 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15 20 7 15 16 29 35 40 22 57 56 69 59 82 79 67 99 Units of C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Units of D 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Units of E 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Type 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 M (Da) 4626 7274 2624 13458 1294 6853 20642 1448 2315 17897 9020 12787 16917 976 7407 43082 115564 38506 94602 20654 21991 14534 15033 7953 7213 7513 14801 17920 13442 7430 14044 14681 17784 12421 17695 13956 11895 17431 PDI 1.766 2.072 1.747 2.94 1.194 2.546 1.932 1.228 1.763 2.341 2.318 2.294 2.542 1.193 2.673 2.656 2.54 2.426 3.017 2.283 3.492 2.011 2.294 2.074 2.368 1.813 2.612 1.719 2.089 2.191 2.017 1.933 2.167 2.197 2.283 2.19 1.732 2.232 CP 81.8 87.5 88.5 88.5 86.3 60.8 60.8 61.5 61.5 65.5 65.5 74.5 71.3 75.0 66 66.3 64.8 63 56.8 57.5 49.3 49 49.3 44.5 43.5 39.5 39 38.5 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 63 38 50 12 13 34 37 12 14 0 0 0 0 198 68 119 49 63 24 19 4 4 0 178 70 151 49 130 40 78 14 29 0 0 108 95 29 170 109 181 142 213 291 176 224 269 99 93 128 52 55 137 146 114 140 71 76 163 138 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 199 190 0 113 130 88 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 19 14 47 24 79 37 42 17 38 37 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 66 26 0 53 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20 11 50 21 87 44 128 35 175 83 219 0 0 188 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 2 2 2 2 2 2 2 1 1 1 1 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 17431 14232 19451 7020 7430 18942 20170 14074 17228 7981 8574 18419 15647 21729 8337 17039 7497 14997 6479 6510 2211 4553 4059 19822 8223 20623 7244 22756 8994 22154 5388 22614 9447 24748 33175 30964 24111 29644 25526 27894 21400 23961 29958 23380 25736 27290 1.555 2.496 2.539 2.388 1.867 2.324 1.617 1.825 2.094 2.193 2.132 2.506 1.997 2.044 2.289 2.027 2.299 2.118 2.188 1.994 1.743 1.818 1.649 1.921 2.226 1.926 2.435 1.978 2.109 1.93 2.151 1.863 2.252 1.823 1.968 1.685 1.727 1.924 1.823 1.871 2.241 2.111 2.046 2.238 2.118 1.971 37.5 34.8 34 31 31.5 30 29.5 26.5 26 23.8 23.8 23.3 23.0 73.5 76.5 64.2 63.5 49.3 53 42.8 49 37 31.8 75.5 78 63.8 67.8 55.3 56.3 48.3 51.7 43.5 40.3 38.5 33.3 33.3 41.8 44.8 40.3 48.7 55.3 68.5 71.8 64 68 75.3 85 86 87 1 202 251 0 0 0 38 0 0 0 80 23 0 0 0 2 2 2 4270 29092 27434 1.682 34.5 2.104 58.3 2.004 67.5 Table S3. A summary of the RMSE and inference times obtained by the 5 different regressions (linear, polynomial (degree 2), support vector, gradient boosting and 3 layer neural network) RMSE (°C) Inference Time (µs) Linear Polynomial Support (˚ 2) Vector Gradient Boosting Neural Network 11.6±1.1 25.8±6.6 9.31±3.37 7.24±0.46 8.09±0.80 26.0±1.4 29.5±3.2 156 ± 13 161±9 235±14 Table S4. 
A summary of the 17 polymers made, including their target cloud point and design along with the obtained cloud point and design (A: EtOx, B: nPropOx, C: cPropOx, D: iPropOx) Target CP (˚C) 37 45 60 80 Cloud point (˚C) Mass Obtained Composition Target Composition Obtained CP ∆ Target Obtained % Error A B C D A B C 34.5 -2.5 13195 12629 -4.3 28 79 0 13 28 78 0 9 34 -3 12191 12546 2.9 20 67 0 33 11 63 0 41 36 -1 10838 9629 -11.2 21 59 0 40 30 52 0 33 34 -3 13875 14523 4.7 26 73 0 21 33 60 0 21 45.8 0.8 7712 6978 -9.5 11 0 14 94 10 0 14 91 43.5 -1.5 10554 12151 15.1 26 0 22 72 21 0 15 79 40.8 -4.2 14496 14694 1.4 35 39 0 46 38 37 0 40 45.5 0.5 7745 8040 3.8 26 0 28 65 24 0 18 72 57.8 -2.2 20035 19901 -0.7 84 0 16 20 80 0 12 23 50.5 -9.5 17111 17541 2.5 63 0 0 57 59 0 0 56 53.5 -6.5 11574 12447 7.5 73 38 9 0 70 39 6 0 56.3 -3.7 9257 8773 -5.2 67 32 21 0 68 34 13 0 70.8 -9.2 11725 10332 -11.9 95 0 26 0 98 0 17 0 74.5 -5.5 18612 17021 -8.5 104 0 17 0 103 0 12 0 76.3 -3.7 13330 13170 -1.2 98 0 22 0 100 0 15 0 74 -6 18975 17629 -7.1 108 0 0 12 104 0 0 11 77.8 -2.2 9079 9536 5.0 107 0 0 13 106 0 0 9 D References 34. E. Baeten, B. Verbraeken, R. Hoogenboom, T. Junkers, Continuous poly(2-oxazoline) triblock copolymer synthesis in a microfluidic reactor cascade. Chem. Commun. 51, 11701-11704 (2015). 35. M. M. Bloksma et al., Poly(2-cyclopropyl-2-oxazoline): From Rate Acceleration by Cyclopropyl to Thermoresponsive Properties. Macromolecules 44, 4057-4064 (2011). 36. S. Funtan, Z. Evgrafova, J. Adler, D. Huster, W. Binder, Amyloid Beta Aggregation in the Presence of Temperature-Sensitive Polymers. Polymers 8, 178 (2016). 37. T. W. Anderson, An Introduction To Multivariate Statistical Analysis. (Wiley, New York, 1958), vol. 2. 38. G. E. P. Box, G. C. Tiao, Bayesian Inference in Statistical Analysis. (John Wiley & Sons, 2011), vol. 40. 39. J. H. Friedman, Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 29, 1189-1232 (2001). 40. T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning. Springer Series in Statistics (Springer, New York, NY, USA, 2001), vol. 1. 41. S. Kirkpatrick, C. D. Gelatt, M. P. Vecchi, Optimization by Simulated Annealing. Science 220, 671 (1983). 42. M. Mitchell, An Introduction to Genetic Algorithms. (MIT Press, 1998). 43. R. Storn, K. Price, Differential Evolution – A Simple and Efficient Heuristic for global Optimization over Continuous Spaces. J. Global Optim. 11, 341-359 (1997). 44. J. Kennedy, R. Eberhart, in Proceedings of ICNN'95 - International Conference on Neural Networks. (1995), vol. 4, pp. 1942-1948 vol.1944. 45. J. Schmidhuber, Deep learning in neural networks: An overview. Neural Networks 61, 85-117 (2015).