Machine-learning-assisted materials discovery using failed experiments


Inorganic–organic hybrid materials1,2,3 such as organically templated metal oxides1, metal–organic frameworks (MOFs)2 and organohalide perovskites4 have been studied for decades, and hydrothermal and (non-aqueous) solvothermal syntheses have produced thousands of new materials that collectively contain nearly all the metals in the periodic table5,6,7,8,9. Nevertheless, the formation of these compounds is not fully understood, and development of new compounds relies primarily on exploratory syntheses. Simulation- and data-driven approaches (promoted by efforts such as the Materials Genome Initiative10) provide an alternative to experimental trial-and-error. Three major strategies are: simulation-based predictions of physical properties (for example, charge mobility11, photovoltaic properties12, gas adsorption capacity13 or lithium-ion intercalation14) to identify promising target candidates for synthetic efforts11,15; determination of the structure–property relationship from large bodies of experimental data16,17, enabled by integration with high-throughput synthesis and measurement tools18; and clustering on the basis of similar crystallographic structure (for example, zeolite structure classification19,20 or gas adsorption properties21). Here we demonstrate an alternative approach that uses machine-learning algorithms trained on reaction data to predict reaction outcomes for the crystallization of templated vanadium selenites. We used information on ‘dark’ reactions—failed or unsuccessful hydrothermal syntheses—collected from archived laboratory notebooks from our laboratory, and added physicochemical property descriptions to the raw notebook information using cheminformatics techniques. We used the resulting data to train a machine-learning model to predict reaction success. 
When carrying out hydrothermal synthesis experiments using previously untested, commercially available organic building blocks, our machine-learning model outperformed traditional human strategies, and successfully predicted conditions for new organically templated inorganic product formation with a success rate of 89 per cent. Inverting the machine-learning model reveals new hypotheses regarding the conditions for successful product formation.
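The workflow described above (raw notebook records, augmented with physicochemical descriptors, used to train a model that predicts reaction success) can be illustrated with a minimal sketch. The model in the manuscript is an SVM; the sketch below deliberately swaps in a much simpler nearest-centroid classifier so that it runs with only the Python standard library. All amine names, descriptor values, reaction records and function names are hypothetical placeholders, not data or code from the study.

```python
from statistics import mean

# Hypothetical descriptor table: amine -> (molecular weight, number of amine groups).
# Placeholder values for illustration only; not taken from the study.
AMINE_DESCRIPTORS = {
    "amine_A": (60.0, 2),
    "amine_B": (116.0, 2),
    "amine_C": (146.0, 4),
}

def featurize(record):
    """Join a raw notebook record with its reactant descriptors."""
    mol_weight, n_amines = AMINE_DESCRIPTORS[record["amine"]]
    return [record["temp_c"], record["ph"], mol_weight, float(n_amines)]

def fit_centroids(records):
    """Compute one mean feature vector per outcome label."""
    by_label = {}
    for rec in records:
        by_label.setdefault(rec["success"], []).append(featurize(rec))
    return {label: [mean(col) for col in zip(*rows)]
            for label, rows in by_label.items()}

def predict(centroids, record):
    """Return the label whose centroid is closest (squared Euclidean distance).

    Features are left unscaled to keep the sketch short; a real model would
    normalize them first.
    """
    x = featurize(record)
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

# Synthetic "historical" records that include failed ("dark") reactions.
HISTORICAL = [
    {"amine": "amine_A", "temp_c": 90, "ph": 3.0, "success": True},
    {"amine": "amine_B", "temp_c": 110, "ph": 3.5, "success": True},
    {"amine": "amine_A", "temp_c": 150, "ph": 7.0, "success": False},
    {"amine": "amine_C", "temp_c": 160, "ph": 8.0, "success": False},
]

centroids = fit_centroids(HISTORICAL)
candidate = {"amine": "amine_B", "temp_c": 100, "ph": 3.0}
print(predict(centroids, candidate))  # prints True
```

The point of the sketch is the role of the failed reactions: without the two unsuccessful records there would be no second class to learn from, and the classifier could not separate promising conditions from unpromising ones.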

Figure 1: Schematic representation of the feedback mechanism in the dark reactions project.
Figure 2: Comparison of experimental outcomes relating to the formation of templated vanadium-selenite crystals, as a function of amine similarity.
Figure 3: SVM-derived decision tree.
Figure 4: Graphical representation of the three hypotheses generated from the model, and representative structures for each hypothesis.

We thank Y. Huang, G. Martin-Noble and D. Reilley for data entry and J. H. Koffer for synthetic efforts. M.Z. acknowledges support for the purchase of a diffractometer from the National Science Foundation (DMR 1337296), Ohio Board of Regents grant CAP-491 and Youngstown State University. This work was supported by the National Science Foundation (DMR-1307801). A.J.N. and J.S. each acknowledge the Henry Dreyfus Teacher-Scholar Award program.

Author information

S.A.F., J.S. and A.J.N. conceived the project and wrote the paper. A.J.N. supervised the data capture. C.F. developed the web-accessible database. A.J.N. and P.D.F.A. tested the data reliability. J.S. and P.R. developed the reactant descriptors. P.R., C.F. and S.A.F. developed the machine-learning models. J.S. performed diamine selection. P.D.F.A. performed the Cambridge Structural Database search. K.C.E., M.B.W. and A.M. performed the hydrothermal experimental reactions, supervised by A.J.N. M.Z. performed X-ray crystallography on the resulting products. P.D.F.A. performed the statistical analyses. P.D.F.A., A.J.N., J.S. and S.A.F. performed the decision-tree calculation and analysis. All authors discussed the results and commented on the manuscript.

Corresponding authors

Correspondence to Sorelle A. Friedler, Joshua Schrier or Alexander J. Norquist.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Information

This file contains Supplementary Text and Data, Supplementary Tables 1-10 and Supplementary Figures 1-5. Included are tables of descriptor definitions, model evaluation results, a learning curve, synthetic and crystallographic details, packing figures, amine structures and a full decision tree. (PDF 1975 kb)

Supplementary Data

This file contains information on the historical reactions, gathered from archived laboratory notebooks. These data were used to construct the SVM model described in the manuscript. (CSV 5260 kb)

Supplementary Data

This file contains information on the new experiments that were performed during the course of this study to test whether the model improved upon human strategies ("chemical intuition"). These reactions were not used to train the model. (CSV 369 kb)

Supplementary Data

This shell script file contains the specific model names and parameters used in the model construction described in Table S5 of the Supplementary Information. (TXT 1 kb)

Supplementary Data

This file contains the full crystallographic details for [C6H22N4][VO(C2O4)(SeO3)]2·2H2O. (CIF 19 kb)
