Review

Machine Learning in Computational Design and Optimization of Disordered Nanoporous Materials

by
Aleksey Vishnyakov 1,2
1 Aramco Innovations LLC, 119234 Moscow, Russia
2 Department of Physics, Moscow State University, 119134 Moscow, Russia
Materials 2025, 18(3), 534; https://doi.org/10.3390/ma18030534
Submission received: 4 November 2024 / Revised: 30 December 2024 / Accepted: 31 December 2024 / Published: 24 January 2025

Abstract

This review analyzes the current practices in the data-driven characterization, design and optimization of disordered nanoporous materials with pore sizes ranging from angstroms (active carbon and polymer membranes for gas separation) to tens of nm (aerogels). While the machine learning (ML)-based prediction and screening of crystalline, ordered porous materials are conducted frequently, materials with disordered porosity receive much less attention, although ML is expected to excel in the field, which is rich with ill-posed problems, non-linear correlations and a large volume of experimental results. For micro- and mesoporous solids (active carbons, mesoporous silica, aerogels, etc.), the obstacles are mostly related to the navigation of the available data with transferrable and easily interpreted features. The majority of published efforts are based on the experimental data obtained in the same work, and the datasets are often very small. Even with limited data, machine learning helps discover non-evident correlations and serves in material design and production optimization. The development of comprehensive databases for micro- and mesoporous materials with low-level structural and sorption characteristics, as well as automated synthesis/characterization protocols, is seen as the direction of efforts for the immediate future. This paper is written in a language readable by a chemist unfamiliar with the data science specifics.

1. Introduction

Nanoporous materials find diverse applications in separations, catalysis, gas and liquid storage, insulation, the semiconductor industry, energy storage, and so on. Their pore sizes range from angstroms to 100 nm, and their porosity may reach 99% or more [1]. The body of published results on the synthesis and characterization of porous materials is vast, and the methods are reasonably standardized and well developed, which prompts the use of this knowledge for the data-driven design of materials tailored for specific purposes. The stochastic optimization of potential zeolite structures by Earl and Deem [2] is a pioneering work in the data-driven design of porous adsorbents: the authors aimed at predicting new (“potential”) zeolite structures based on the structures of known ones using Monte Carlo (MC) optimization. The term machine learning (ML) was not mentioned, but the optimization did not involve any calculations of specific chemical interactions. The only target was the structural similarity of the elements of the suggested structures to those of existing zeolites. That is, the new zeolite frameworks were constructed using the structural features learned from the knowledge database. The optimization net caught fish of different kinds: alongside a multitude of chemically unstable structures, it recovered existing frameworks that were not in the training database, and some of the suggested structures were synthesized later.
The approach relied on the crystalline nature of the targeted adsorbent, from which the structural features could be readily constructed by expert chemists. The discovery of metal–organic frameworks (MOFs [3]), covalent organic frameworks (COFs [4]), and regular porous polymer networks (PPNs [5]) created a (relatively) vast pool of structures that could be readily classified into categories, interpreted in terms of tangible descriptors, unambiguously characterized, and compiled in databases that list their structural characteristics and physico-chemical properties, including adsorption. The lion’s share of all applications of data-driven methods to porous materials targets these groups of crystalline solids. Even “overcoming data scarcity” is addressed mostly in areas where data are comparatively abundant or straightforward to generate [6]. The crystal structures of synthesized compounds have been collected in several databases, general [7,8,9,10] or material-specific [11,12,13]. These studies have been described in numerous recent reviews [14,15,16,17,18,19] and are not the target of this paper.
At the same time, substantial effort has already been invested into the ML-assisted design, characterization and optimization of porous materials that are disordered in at least one aspect important for their function. For example, a disordered mesoporous silica with walls built from a zeolite [20] has ordered microporosity but disordered mesoporosity (here, we apply “micro-, meso- and macroscale” as commonly accepted in adsorption science, with micropores ranging from angstroms to 2 nm, mesoscale porosity below 100 nm, and macroscale porosity being everything larger than that [21]). Disordered porous solids form numerous dissimilar groups, and despite an enormous body of experimental and modeling results on their structure and properties, the results are not easily interpreted with clear descriptors. The chemical space of microporous crystals and templated ordered mesoporous materials [22] is discrete. The space of disordered porous materials is continuous, and so is the space of their chemical and physical properties. All descriptions of disordered porous materials that the author is aware of are ill posed: many different structures can be proposed that have the same (within experimental/modeling errors) porosity, density, specific surface area (SSA), X-ray diffractogram (XRD), acoustic properties, etc. At present, these navigation problems make disordered materials rather unattractive to data scientists despite their practical importance. For example, aerogels have far more industrially important applications than all MOFs, COFs, and PPNs combined.
Goals for the data-driven design and optimization of irregular porous materials: The focus of this review is the computational design and optimization of the synthesis and characterization of disordered porous materials. The objects of design are the pore structure and the properties of the solid phase that forms the structure. The goal is to find an attainable structure that provides the desired properties. Optimization, in turn, applies to the initial reactant formulation or to the synthesis conditions needed to achieve the desired structure, and therefore the desired properties, as closely as possible. Accordingly, most of the efforts reviewed have aimed either at obtaining a particular material or, sometimes, at optimizing the manner of production, although direct industrial applications of ML to heterogeneous porous solids are still to be found.
The main directions of the efforts outlined in the literature: There are several general reviews on the data-driven discovery and optimization of porous materials that also mention disordered solids. Beyond general expectations (e.g., [23,24]) and interesting methodological developments [25,26], a recent collection [27] gives a broad overview of the main pathways to apply data-based techniques (stand-alone or in conjunction with physics-driven modeling) to heterogeneous porous materials. In particular, the following directions of effort are suggested as potential breakthrough areas for the next ten years (out of these ten, three years have already passed, and for data science, three years is a long time):
(A) Discovering the governing equations with ML [28] (Figure 1). Active ML can “interrogate” complex processes, which is “particularly useful for the analysis of highly heterogeneous, anisotropic materials, where idealized descriptions often fail”. Multi-dimensional, ill-posed problems are where machine learning often excels. Dealing with the uncertainty of the material’s structure remains a major challenge that seems insurmountable at present: one must deal with many different realizations of the same process using different possible material structures that fall within the same description, and somehow ensure that the structure chosen for modeling is representative (computational costs are disregarded for the moment). The alternative is a physics-based homogenized model, which devalues the entire data-driven approach. The positive examples mentioned by D’Elia [28] do not involve heterogeneous porous materials. Unfortunately, in preparing the current review, we have not found any papers that seem promising in terms of discovering the governing equations.
(B) Data-driven acceleration of simulations in complex, multi-scale porous media [34] is certainly a promising and already fruitful direction. The main goal of ML is to optimize material structure/chemistry, etc. Because data navigation challenges make simulation results difficult to generalize, we need fast and computationally efficient tools to simulate the processes of interest in various structural realizations. Surrogate data-based models outperform traditional ODE/PDE solvers in many cases, from fluid flow [35] to chemical kinetics [36]. Multi-physics phenomena involving drastically different spatial and temporal scales are especially problematic. At smaller scales, ML can reasonably replace physics-based modeling. Furthermore, the rapid development of physics-informed ML (PIML) [37] allows for the integration of physics and data-driven approaches. PIML is expanding to new types of systems and media, and its scope is already wide enough to allow for the design and optimization of disordered porous materials.
(C) Machine learning structure–property relationships [38]. This field is already well established. Numerous studies have focused on predicting the transport and mechanical properties of disordered porous materials using microscopic images. However, current practices show that these predictions are usually specific to a particular material, and the results obtained for one group of materials cannot be generalized to other groups, making it challenging to develop a more general approach. Physics-based modeling, such as lattice Boltzmann (LB) simulations for permeability [39] and finite element method (FEM) analyses for elastic moduli, often serves as ground truth for these predictions. Nevertheless, obtaining representative samples for learning databases remains challenging. For example, rock physics data models are often based on a small number of samples that are cut and shifted using a sliding window technique to increase the training set size.
There are other challenges unnoticed in the previous reviews:
(D) ML of property–property relationships. That is, the properties that can be easily measured are related to those that are difficult to determine experimentally or costly to model. For instance, the reconstruction of physical fields from real-time observations at a few locations has a long history (a brief review can be found in ref. [40]). Additionally, PIML enables the relationship between different physical fields to be established. This review contains a few useful examples.
(E) Linking the initial formulations and synthesis conditions to the structural parameters of the resulting materials. The review below demonstrates that this is currently the most active area of application of ML techniques to disordered nanoporous materials (as expected, given that such relationships are complex, nonlinear, and almost impossible to model based on chemistry/physics alone).
Directions (A) and (B) explicitly rely on a thorough understanding of the structural properties of the material and the physical processes involved. Direction (C) may use physical models or be based entirely on data. Property–property relationships (Challenge D) are more difficult to establish than structure–property relationships, but the properties can be understood through physical models of the corresponding processes, which helps with dimensionality reduction. Materials synthesis (Challenge E) is commonly too complex for physical modeling, and data-based models summarize the available empirical knowledge.
This article does not aim to give a “bold broad view” of the prospects of ML application in the computational design of heterogeneous porous materials, but rather analyzes the current practices and compares them to the previous expectations. It focuses more on materials synthesis than on porous media in general (recent review [41]) or specific applications (e.g., refs. [42,43,44,45,46]). It aims to review:
(i) general approaches: what exactly is targeted currently and why (research driven by practical needs vs. research driven by methodological interest)
(ii) material types
(iii) data sources
(iv) ML methods and optimization techniques applied in the studies of interest
(v) achievements and problems
To the best of our knowledge, the efforts tackling challenges (D,E) in disordered porous materials have never been reviewed. Special attention is paid to the relations between the data-based models and physics-based models as described in Figure 1.
The paper is structured as follows: Section 2 gives an overview of ML application in nanoporous materials characterization; Section 3 is devoted to active carbons; Section 4 describes microporous polymers used in gas separation; Section 5 considers mesoporous oxides, aerogels, etc.; Section 6 summarizes the review; Appendix A is a table that summarizes the current literature; Appendix B contains a glossary of ML methods applied in materials characterization, design and optimization, tailored for a chemist unfamiliar with data science.

2. ML in Characterization of Disordered Porous Materials

Facile and inexpensive analysis of the resulting porosity is essential for optimizing the synthesis of porous materials. ML-based techniques are mainly used to reconstruct a 3D digital map of the porous matrix, primarily through binary and multiclass segmentation. Computed tomography (CT) as an inverse problem was reviewed quite recently [47] and is generally not specific to the nanoscale; the same applies to the techniques for evaluating macroscopic material characteristics from images, which will be considered elsewhere. The most common method for 3D reconstruction of nanoporous materials combines a focused ion beam (FIB) with scanning electron microscopy (SEM). The FIB destroys the material layer by layer, and SEM provides a series of consecutive cross-sectional images of the 3D sample. However, the segmentation of these cross-sectional images is complicated by the “shine-through” effect, where deeper layers affect the image. Traditional thresholding methods [48] often lead to unsatisfactory results.
In supervised machine learning, the algorithm is trained “by example”. Fager et al. [49], in their FIB-SEM characterization of ethyl cellulose/hydroxypropyl cellulose films with submicron pores, generated 3000 × 2000 pixel images. The authors manually segmented 200 fragments of 256 × 256 pixels taken from consecutive layers. Since the segmentation depends on the top layer and several neighboring layers, the authors used five neighboring images as input for the segmentation of the current top layer. To take into account the surroundings of each point, each image was convolved with Gaussian filters with standard deviations varying from 1 to 128. Together with the raw top image, this resulted in 89 features. A random forest (RF) decision tree ensemble (DTE) technique was employed to learn from the manually segmented images and segment the other layers, thereby enabling pore reconstruction. The ML approach addresses the issues of subsurface signal and grayscale intensity overlap in the image segmentation for 3D reconstruction. Čalkovský et al. [50] explored the significance of image contrast in ML-based FIB-SEM image segmentation, also applied to nanoporous polymer structures. The study demonstrated the superior performance of ML-driven techniques compared to traditional segmentation algorithms.
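The feature construction described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the edge handling, and the choice of filter widths are assumptions made for illustration, and the resulting feature count (the raw top layer plus one blurred value per slice and per width) does not attempt to reproduce the 89 features reported in ref. [49]. The per-pixel vectors produced this way are what a random forest classifier would be trained on.

```python
import math

def gaussian_kernel(sigma, truncate=3.0):
    """Normalized 1D Gaussian kernel cut off at `truncate` standard deviations."""
    radius = max(1, int(truncate * sigma + 0.5))
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_1d(values, kernel):
    """Convolve a 1D sequence with the kernel; indices are clamped at the edges."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * values[j]
        out.append(acc)
    return out

def blur_2d(image, sigma):
    """Separable Gaussian blur: filter the rows first, then the columns."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in image]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def pixel_features(slices, sigmas):
    """Per-pixel feature vectors for the top slice (slices[0]).

    Each vector holds the raw top-layer intensity followed by the blurred
    intensity of every slice at every filter width (hypothetical layout).
    """
    blurred = [blur_2d(s, sigma) for s in slices for sigma in sigmas]
    height, width = len(slices[0]), len(slices[0][0])
    return [[[slices[0][y][x]] + [b[y][x] for b in blurred]
             for x in range(width)] for y in range(height)]
```

With five neighboring slices and, say, widths of 1, 2, 4, ..., 128 pixels (an assumed spacing), each pixel would receive 1 + 5 × 8 = 41 features; the manually segmented fragments then supply the labels for training.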
A very challenging material, hierarchical nanoporous gold (HNPG), was studied by Sardhara et al. [51,52]. HNPG forms a disordered porous structure at the submicron scale, but the gold filaments themselves are porous on the 10+ nm scale. Manual segmentation of this material is time-consuming, as a large number of images must be segmented in order to collect a sufficient dataset for such a complex structure. The authors (1) obtained SEM images at different accelerating voltages (Figure 2), (2) created in silico porous structures similar to HNPG, and performed MC simulations of SEM on the synthetic structures. For the simulated structures, the ground truth is known, and an ML model was trained on the synthetic dataset. The simulation was based on solid principles: it was shown theoretically [53] that the microstructure can be described by a superposition of several wave vectors with the same wavelength but different random orientations. The principles of SEM are well-established and simulators are widely available. Noise and blur were added to enhance the resemblance between the simulation results and actual SEM images. Convolutional neural networks (CNNs) were trained using sliding-window techniques applied in two or three dimensions. Patches of 64 × 64 pixels were extracted and used as the final training set for the CNN. The number of neighboring slices taken into account ranged from three to nine. The CNN outperformed the traditional Otsu segmentation algorithm [54]. The authors concluded that the ML model successfully addressed the shine-through issue. Later, the same group optimized the number of layers required [55] for accurate reconstruction and analyzed the role of thickness quantification in the reconstruction of hierarchical nanoporous materials using FIB-SEM [56].
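The idea of building synthetic structures from a superposition of waves with one wavelength and random orientations can be sketched as follows. This is a two-dimensional toy version under stated assumptions: the function names, the grid size, the number of waves, and the zero threshold level are all hypothetical choices, and the SEM simulation, noise, and blur steps of refs. [51,52] are not reproduced. The point is only that the binary ground truth of such a structure is known by construction.

```python
import math
import random

def leveled_wave_field(size, wavelength, n_waves, seed=0):
    """Sum of cosine waves with equal wavelength and random directions/phases."""
    rng = random.Random(seed)
    q = 2.0 * math.pi / wavelength  # magnitude shared by all wave vectors
    waves = []
    for _ in range(n_waves):
        theta = rng.uniform(0.0, 2.0 * math.pi)  # random orientation
        phase = rng.uniform(0.0, 2.0 * math.pi)  # random phase shift
        waves.append((q * math.cos(theta), q * math.sin(theta), phase))
    norm = math.sqrt(2.0 / n_waves)  # keeps the field variance close to one
    return [[norm * sum(math.cos(qx * x + qy * y + ph) for qx, qy, ph in waves)
             for x in range(size)] for y in range(size)]

def binarize(field, level=0.0):
    """Label a pixel as solid (1) where the field exceeds the level, else pore (0)."""
    return [[1 if value > level else 0 for value in row] for row in field]

# A 48 x 48 synthetic cross-section whose ground-truth segmentation is known.
field = leveled_wave_field(size=48, wavelength=8.0, n_waves=30, seed=1)
mask = binarize(field)
solid_fraction = sum(map(sum, mask)) / (48 * 48)
```

Raising or lowering `level` shifts the solid fraction, which is one simple way to tune such a synthetic structure towards a target porosity.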
The applications of ML in characterization are not limited to images: ML can be applied to various ill-posed inverse problems, such as pore characteristic reconstruction [57]. For example, Pietrow et al. [58] suggested a method based on artificial neural networks (ANNs) for calculating the pore size distribution (PSD) from positron lifetime distributions, which are commonly used for PSD determination. Good agreement between ML-based and traditional methods was obtained.
Overall, ML-based methods have the potential to solve a wide range of inverse problems in the interpretation of various types of spectroscopy data. However, their application to irregular nanoporous materials is currently limited mostly to image processing. A reliable understanding of how the pore structure forms makes it possible to overcome the data scarcity problem by using synthetic datasets.

3. ML in Design, Optimization, and Screening of Active Carbons

Activated carbon is a microporous material that has numerous applications in gas separation, storage, medicine and environmental protection. In the petroleum industry, carbons are important for water treatment and gasoline vapor recovery [59]. Active carbons are usually derived from organic waste products such as coconut husks and wood chips. Through pyrolysis, organic material is converted into charcoal, which is then activated to create a highly porous adsorbent. It is assumed that the active carbon structure resembles crumpled paper sheets or semi-random stacks of thin graphite plates, as shown in high-resolution images [60] and simulations [61]. The carbons typically have a high available surface area and are characterized by their PSD (extracted from adsorption isotherms [62]), crystallinity, and impurities in the form of metals and other elements. Mesoporous carbons are also available. They are produced by using mesoporous silica as a template: the pores are filled with an organic liquid, which is then polymerized and subjected to pyrolysis, after which the siliceous matrix is dissolved [63]. A vast amount of experimental data has been generated on carbons, including information about their original organic material composition, final product composition, X-ray diffractograms (XRD), adsorption isotherms, breakthrough curves, etc. Although attempts to utilize this wealth of knowledge are still in their early stages, progress is being made. A specific overview of ML applications for carbons in water treatment can be found in ref. [41]. Removal of organic pollutants using carbons is discussed in ref. [45], while ML screening procedures for water purification are covered in ref. [46]. A very recent study on this topic was reported in ref. [64]. Here, we focus on other applications, but we also pay attention to some important methodological papers mentioned in previous reviews.
Because the pathways of active carbon synthesis are fairly standard, but involve many parameters, the relationship between synthesis and properties (challenge (E)) is a natural target for ML models. Here, ML helps in systematically choosing the raw materials and synthesis pathways for a particular application, involving maximum abstraction through entirely data-driven approaches (right column, Figure 1). Wang et al. [65] related the chemical composition of the source biomass, production operating parameters (organic source/activation agent mass ratio, carbonization time, carbonization temperature, activation time, and activation temperature), to the resulting characteristics of the carbons: SSA (estimated with the Brunauer-Emmett-Teller (BET) model [66] for multilayer adsorption on a nonporous surface), and yield. They created a database of more than 1500 different carbons using both chemical activation (baking with H3PO4, KOH, and ZnCl2 at 500–700 degrees Celsius) and gas activation (CO2 + steam). DTEs were employed to establish useful correlations. The hydrogen content in the original biomass was found to make a significant impact on the yield of gas-activated carbon, and the mass ratio has a strong influence on the SSA for chemically activated carbons. Unfortunately, the use of BET to estimate SSA is not a good choice for microporous materials [67].
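For readers unfamiliar with how a BET surface area is obtained, the sketch below fits the linearized BET equation, x/[n(1 − x)] = 1/(n_m C) + (C − 1)x/(n_m C) with x = p/p0, and converts the monolayer capacity n_m into an SSA. It only illustrates the mechanics on synthetic data generated from the BET isotherm itself; the function names, the relative pressure window of 0.05–0.30, and the N2 cross-sectional area of 0.162 nm2 are conventional assumptions rather than details taken from ref. [65], and the caveat about microporous materials stated above applies in full.

```python
AVOGADRO = 6.02214076e23      # 1/mol
N2_CROSS_SECTION = 0.162e-18  # m^2 per molecule (conventional value, assumed here)

def bet_uptake(x, n_m, c):
    """BET isotherm: uptake at relative pressure x for monolayer capacity n_m."""
    return n_m * c * x / ((1.0 - x) * (1.0 - x + c * x))

def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def bet_fit(rel_pressures, uptakes):
    """Fit the linearized BET equation; returns (n_m, C)."""
    ys = [x / (n * (1.0 - x)) for x, n in zip(rel_pressures, uptakes)]
    slope, intercept = fit_line(rel_pressures, ys)
    n_m = 1.0 / (slope + intercept)  # slope + intercept = 1 / n_m
    c = 1.0 + slope / intercept      # slope / intercept = C - 1
    return n_m, c

def bet_surface_area(n_m_mol_per_g):
    """Specific surface area in m^2/g from the monolayer capacity in mol/g."""
    return n_m_mol_per_g * AVOGADRO * N2_CROSS_SECTION

# Synthetic isotherm with known parameters, then recovery of those parameters.
pressures = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
uptakes = [bet_uptake(x, n_m=0.01, c=100.0) for x in pressures]
n_m_fit, c_fit = bet_fit(pressures, uptakes)
ssa = bet_surface_area(n_m_fit)
```

On real isotherms of microporous carbons, the fitted n_m depends strongly on the pressure window chosen, which is one reason why BET areas from different sources are hard to compare.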
Very recently, Chang and Lee [68] collected a set of 108 chemically activated carbons from their own studies and linked the synthesis procedure to the iodine adsorption number (IAN), one of the commonly accepted characteristics of granular carbons [69]. The activating agent, agent-to-biochar mass ratio, activating temperature and time were all treated as categorical features. Different ML regressors (artificial neural networks, ANN; support vector machines, SVM; DTEs) were applied in building the correlations, with ANN showing the best performance. Treating all features as categorical improved the precision score but obviously hindered the interpretation: none of the qualitative recommendations on synthesis procedures to improve IAN were actually ML-based. In a somewhat similar effort, Lai et al. [70] collected a set of 46 carbons from the literature and related the yield and the SSA to the biomass characteristics and the synthesis procedure. While linking the biomass characteristics and the synthesis procedure to the results is important, a model that uses seven features (which were not the same for the yield and SSA predictions) to describe 46 samples can hardly be considered predictive. Another similar effort with straw-derived carbons was described in ref. [71].
Li et al. [72] studied the relationship between the biomass characteristics and the synthesis conditions on one side and the characteristics (SSA, pore volume, micropore volume, nitrogen content) of nitrogen-doped carbons on the other side. Interestingly, the biomass source and composition turned out to be much less important than the activation chemicals and conditions. A DTE-based ML model performed very well. Wang et al. [73] reported ML-assisted screening of oxygen-rich porous carbons for aqueous supercapacitors. The carbons were derived from synthetic polymers rather than from biomass. Another attempt to relate the synthesis parameters to capacitance was reported by Yang et al. [74].
ML-assisted predictions of structure–property and property–property relationships for carbons have also attracted recent attention [75,76]. For example, Kusdhany and Lyth [77] used ANN to explore hydrogen uptake in microporous carbons. They compiled a set of 1700+ data points for 68 samples at 77 K and different pressures. A problem with the approach is that the feature set mixes explicitly controlled parameters (e.g., composition) with dependent ones (such as the BET SSA) and that the hydrogen pressure is included in the feature space (Figure 3).
Davoodi et al. [78] collected 2072 data points for H2 sorption by 68 active carbons. The authors paid specific attention to the elemental composition and the presence of ultramicropores. Unfortunately, the methodology shares some of the problems of ref. [77], and the authors concluded that pressure was the most important feature influencing hydrogen adsorption. Similar surrogate models were reported for other sorbates [43,79,80,81].
Kowalczyk et al. [82] predicted the adsorption of paracetamol by active carbons from the pore size distributions using density functional theory (DFT), MC simulations and ML. The authors modelled paracetamol adsorption in carbon pores of different sizes with MC simulations, which allowed them to calculate the paracetamol sorption isotherms in any carbon with a known PSD. The authors then analyzed the PSDs of actual carbons and selected the total SSA and the SSA formed by wide micropores and mesopores as the features that allowed a reasonable prediction of paracetamol sorption. Nanoporous carbon beads with a high surface area of supermicropores (997 m2/g) and mesopores (628 m2/g) had the highest adsorption capacity for paracetamol. While the ML itself in that paper hardly reveals anything that could not have been foretold, the work is methodologically important because it is physics-based and interpretable. The authors analyze complete adsorption isotherms, take into account the multimodality of PSDs and directly relate the performance to the adsorption mechanisms explored by MC simulations. An interpretable approach can be easily extended and is much more likely to guide synthetic effort than a black-box ML model fitted to a database of inconsistent characteristics.
The CO2 sorption capacity of carbons derived from rice husk was analyzed by Palle et al. [83]. Although CO2 at ambient conditions primarily adsorbs in micropores [84], the authors needed micropore volume, mesopore volume and SSA to fit the experimental data on CO2 adsorption.
Mashhadimoslem et al. [85] constructed a data-based model of the sorption of N2, O2, and N2O on activated carbons and carbon molecular sieves. Instead of the adsorption data itself, the authors chose the parameters of the Langmuir adsorption model with two types of adsorption sites as the learning target. The main advantage of this approach is that the adsorption properties of a sample are predicted from the properties of other samples: points for the same sample are not a part of the training database. It also reduces the dimensionality of the problem, which is important with smaller datasets. There is a price to pay for that: the predictive ability is restricted to the Langmuir adsorption regime. The dataset in ref. [85] was not sufficient to produce significant outcomes.
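To make the choice of learning target concrete, the sketch below evaluates a Langmuir isotherm with two types of adsorption sites, q(p) = q1 b1 p/(1 + b1 p) + q2 b2 p/(1 + b2 p). The parameter names, units, and numerical values are hypothetical and are not taken from ref. [85]; the point is that a model predicting the four numbers (q1, b1, q2, b2) for a new sample yields its whole isotherm, rather than individual uptake points.

```python
def dual_site_langmuir(p, q1, b1, q2, b2):
    """Uptake at pressure p for two independent site types.

    q1, q2 are the saturation capacities of the two site types and
    b1, b2 their affinity constants (in units of inverse pressure).
    """
    return q1 * b1 * p / (1.0 + b1 * p) + q2 * b2 * p / (1.0 + b2 * p)

def isotherm_from_parameters(params, pressures):
    """Rebuild a full isotherm from one predicted parameter set."""
    return [dual_site_langmuir(p, **params) for p in pressures]

# Hypothetical parameter set standing in for the output of a trained model.
predicted = {"q1": 2.0, "b1": 1.5, "q2": 1.0, "b2": 0.1}
pressures = [0.0, 0.5, 1.0, 2.0, 5.0, 10.0]
uptakes = isotherm_from_parameters(predicted, pressures)
```

The uptake saturates at q1 + q2 at high pressure, so the functional form itself cannot describe multilayer adsorption or pore filling beyond the Langmuir regime, which is the limitation noted above.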
Personal protection from various poisonous compounds and pollutants was historically one of the first applications of activated carbons. In a well-designed study, Koyama et al. [86,87] collected from the literature around 400 breakthrough curves for various pollutants on eighteen different granular organic carbon materials. Using DTEs, the authors successfully predicted early breakthrough times and related them to easily measurable properties of granulated carbons, such as the air-hexadecane partition coefficient, hydrogen bond acidity, and the dissolved organic carbon concentration in the influent water. In total, seventeen different properties were analyzed. Although the authors referred to the properties that correlated most with the breakthrough time as “top drivers”, the work is actually focused on the prediction of property–property relationships (Figure 4).
Zhou et al. [89] examined the correlation between carbon pore structure and electrical double-layer specific capacitance. They collected 70 active carbon samples from the literature. Micropore and mesopore SSA were used as features, and capacitance was the output. The scanning rate varied from 1 mV/s to 100 mV/s, with the capacitance predicted separately for each rate. At a rate of 1 mV/s, total SSA was the main factor determining capacitance, while at a rate of 100 mV/s, only mesopore area was significant.
Overall, although the results of ML applications to the structure and properties of carbon materials are still limited, there are promising trends: databases are being developed and a significant number of ML applications have clear objectives. Although ML optimization of active carbon synthesis has not led to any major breakthroughs, the statistical analysis of the role of original raw materials and synthesis processes provides a solid basis for material selection and screening, particularly for non-experts in carbon synthesis and analysis.
The published studies emphasize the importance of understanding the physical principles behind the problem and low significance of the choice of the ML techniques and the precision scores achieved. A number of papers have collected adsorption measurements for specific system types and built surrogate models to predict sorption capacity. By using points from the same isotherm for both training and testing, the precision scores are increased, creating the illusion of a predictive ability. However, any ML model within this framework is actually inferior to a spline interpolation of the isotherm, and has little to do with the actual challenges of predicting active carbon properties.
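The difference between splitting by isotherm points and splitting by samples can be shown with a short sketch. The record layout (dictionaries with a "sample" key) and the function name are assumptions made for illustration; the only requirement is that every point of a given carbon ends up on the same side of the split, so that the test score reflects prediction for unseen materials rather than interpolation along a known isotherm.

```python
import random

def split_by_sample(records, test_fraction=0.25, seed=0):
    """Split data points so that no sample contributes to both subsets.

    records: list of dicts, each with a "sample" identifier plus features.
    Returns (train, test) lists of records.
    """
    samples = sorted({record["sample"] for record in records})
    rng = random.Random(seed)
    rng.shuffle(samples)
    n_test = max(1, round(test_fraction * len(samples)))
    test_ids = set(samples[:n_test])
    train = [r for r in records if r["sample"] not in test_ids]
    test = [r for r in records if r["sample"] in test_ids]
    return train, test

# Toy dataset: four carbons with five pressure points each (made-up numbers).
records = [{"sample": f"carbon_{i}", "pressure": p, "uptake": 0.1 * i * p}
           for i in range(4) for p in (1, 2, 5, 10, 20)]
train, test = split_by_sample(records)
```

A random split over the 20 individual points would almost certainly place points of every carbon in both subsets, which is exactly the situation that inflates the precision scores.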
Several papers demonstrate that reducing the “level of abstraction” (Figure 1) and employing semiempirical physics-based models as intermediates is an efficient way of dealing with data scarcity. As the problem dimensionality decreases, the generalization ability increases. Potential problems are (1) making sure that the model does describe the underlying physical mechanism and (2) making sure that the model does not merely hide the low precision of the measurements. For example, in ref. [86], a number of theoretical breakthrough curves were fitted to extremely scattered data. Accounting for the uncertainty of such data in the models is a separate challenge.
Finally, the inclusion of substantial volumes of raw data in databases is essential for further progress. XRD data and N2 adsorption isotherms at 77.4 K are measured for practically every new carbon synthesized. Scientifically sound and consistent characterization of the materials is a key to fruitful ML of structure–property relationships and will hopefully facilitate the discovery of new relationships. PSDs published in the literature are obtained with different methods and are often incomparable. Obtaining consistent and interpretable pore sizes is not straightforward even with MC and DFT kernels [90,91]. This review shows that, despite IUPAC recommendations [92], the characterization seriously limits the applicability of data-based approaches to active carbons.

4. Molecular Design of Microporous Glassy Polymers for Gas Separation Membranes Using Generative Neural Networks

Microporous polymers present another type of irregular porous solid. Strictly speaking, microporosity is generally found in glassy polymers [93], but the volume fraction of micropores is low and the micropores are isolated. Polymers with long and rigid segments have, due to packing difficulties, a higher and often continuous microporosity that makes them permeable to gases. Their main application is gas separation: owing to the specifics of their structure, microporous polymers have different permeabilities to different gases [94,95]. The selectivity towards gas A vs. gas B is the permeability ratio, which is not the same as the sorption selectivity but, very roughly, the product of the sorption selectivity and the diffusion coefficient ratio. The trade-off between membrane selectivity and permeability is described by the Robeson upper bound [96] for a given gas pair. The ideal membrane polymer is one with as many pathways as possible that are passable by molecule A but impassable to molecule B, which can be interpreted in terms of PSD and pore throat distributions.
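The relation between permeability, sorption and diffusion can be made concrete with a short sketch. It assumes the standard solution-diffusion approximation (permeability = diffusivity × solubility) and an upper bound of the empirical power-law form P_A = k·α^n with n < 0. The function names, the input numbers, and the values of `k` and `n` below are hypothetical placeholders chosen for illustration; they are not the published parameters for any gas pair.

```python
def permeability(diffusivity, solubility):
    """Solution-diffusion approximation: P = D * S."""
    return diffusivity * solubility

def ideal_selectivity(d_a, s_a, d_b, s_b):
    """Permeability ratio P_A / P_B = (D_A / D_B) * (S_A / S_B)."""
    return permeability(d_a, s_a) / permeability(d_b, s_b)

def upper_bound_selectivity(p_a, k, n):
    """Selectivity on the bound P_A = k * alpha**n at permeability p_a (n < 0)."""
    return (p_a / k) ** (1.0 / n)

def exceeds_upper_bound(p_a, alpha, k, n):
    """True if the point (p_a, alpha) lies above the trade-off line."""
    return alpha > upper_bound_selectivity(p_a, k, n)

# Hypothetical inputs in arbitrary consistent units.
d_a, s_a = 2.0, 3.0   # diffusivity and solubility of gas A
d_b, s_b = 0.5, 1.0   # diffusivity and solubility of gas B
alpha = ideal_selectivity(d_a, s_a, d_b, s_b)
print("diffusivity selectivity:", d_a / d_b)
print("sorption selectivity:   ", s_a / s_b)
print("permeability ratio:     ", alpha)

# Hypothetical bound parameters (placeholders, not literature values).
k_demo, n_demo = 1.0e6, -2.5
print("above bound:", exceeds_upper_bound(1000.0, 20.0, k_demo, n_demo))
```

In a screening workflow, a check of this kind would typically be applied to predicted permeabilities of candidate polymers, with `k` and `n` taken from the published bound for the gas pair of interest.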
Because of their practical importance, in particular to the gas industry, which is gasping for efficient CO2/CH4+C2H6/H2S separators, microporous polymer membranes are well studied, and a substantial experimental learning base is available. Unlike carbons, the MPs are determined by their chemical formula. This radically changes the approaches to in silico design, which can be carried out as a search for the best monomer in the chemical space, somewhat similarly to the modern methods of computational drug design [97]. Still, due to the very slow relaxation [98] of the polymer chains, the properties of the material depend on the manufacturing process as well as on the backbone chemistry.
Barnett et al. [99] compiled a database of solubilities and permeabilities of 700 different polymers for several common gases (CH4, CO2, N2, O2, H2, and He). The polymers were composed of a series of standard building blocks (the selection of suitable monomers is not actually very wide). GPR was trained and tested using this database, which was randomly split into two groups in a standard manner. The model was then used to predict the permeability of 11,000 other polymers, which were randomly generated from the same building blocks (no systematic search was conducted). One of these candidate structures was synthesized, and it showed a high selectivity for CO2 over CH4, with a reasonable permeability above the Robeson bound (see Figure 5). Later, Yuan et al. [100] published a more extensive study using DTEs.
Yang et al. [101] attempted a search in the chemical space by modifying the SMILES string of the monomer and using a generative DNN. This approach proposed a wider pool of candidate structures. The authors used two methods to interpret the monomer’s structure: Morgan fingerprint with frequency (MFF) [102], which captures the frequency of chemical substructures present in molecules (114 features), and standard chemical descriptors for the polymer repeating unit, such as gyration radius, eccentricity, and asphericity (146 features). These “fingerprints” were used as features, and the permeabilities were the targets for training. The dataset contained 778 polymers with at least one measured gas permeability (335 unique monomers). The authors also compiled several datasets for which predictions were made. One of these was similar to the training set. Another consisted of 8+ million polyimides. The third was a set of 1,000+ ladder polymers [103]. RF and DNN models were trained on the training set and applied to the candidate sets. The authors identified a number of candidate structures that were expected to exceed the Robeson tradeoff upper bound. They explored selected candidates using MD simulations, which for the most part agreed extremely well with the ML predictions. The workflow and selected results are shown in Figure 6.
Basgodan et al. [105] related the permeability-to-selectivity ratio to the glass transition temperature Tg using a DTE-based approach. Very recent efforts by Jia et al. [106], Xu et al. [107], Cheung et al. [108], and Chen et al. [109] feature generative DNNs for the search in the chemical space of monomer structures and input from molecular modeling in the optimization algorithm. For example, in ref. [108], the molecular structures are obtained by energy minimization and then characterized using topological descriptors and rigidity. Molecular modeling links the molecular structure of the polymer to the pore structure of the polymeric membrane material. It also makes the outcome more interpretable, which is important in molecular design.
Very recently, Glass et al. [110] considered the effect of positively charged amine coatings on the performance of polymer membranes for water purification. This work is rather similar to the applications of ML to carbon optimization reviewed in Section 3. The chemical agents applied in membrane functionalization, their concentrations, and their pKa values were chosen as the independent features, while pure water permeation and zeta potential were the output characteristics. All experiments were performed in the same work.
In summary, data-driven computational design of microporous polymeric membranes has already contributed to shifting the Robeson bound upwards. It is also important to note that polymers predicted to have superior selectivity and permeability have actually been synthesized, and the validity of these predictions has been proven (often, but not always [106]). Compared to carbons and other irregular nanoporous materials, the in silico data-driven design of microporous polymers enjoys clear pathways for navigation. Although the experimental pool of available results is not large, the clearly established, strong connection between the chemistry of the backbone and the performance metrics radically facilitates the application of data-driven methods. Molecular simulations aid in the characterization of monomers and the interpretation of results. An interesting future development could be a full-scale “molecular” design, where the shape of monomers is considered in a coarse-grained fashion using MD and related to pore structure. Databases of simulation results from coarse-grained models can easily be created, linking the structural elements of the polymer and the resulting porosity. Polymer chemistry can first be optimized in silico using crude models to obtain a reasonable PSD and throat distribution. Then, an atomistic representation can be employed to obtain shapes based on the results from the first stage. The complexity of the chemical space and the non-trivial relationship with performance make data-driven approaches essential for optimizing microporous polymer membranes.

5. ML in Synthesis and Optimization of Disordered Mesoporous Materials

Mesoporous oxides have been well studied experimentally, and attempts to apply this knowledge to their design, screening, and optimization have been published in recent years. For instance, Sun et al. [111] prepared 77 samples of mesoporous alumina—silica ceramics using active carbon as a pore former, which was burned out after synthesis. The properties (mainly density) were then correlated with the initial mixture composition using several ML methods (k-nearest neighbors, KNN; support vector machine, SVM). Ten-fold cross-validation and random sampling techniques were employed to divide the dataset. While ML was introduced as a tool for porous ceramics design, it remains unclear how it actually helped. Wang et al. [112] synthesized 36 nanoporous silica xerogel samples using a segmented continuous flow reactor. The reactant mixture composition and synthesis process were adjusted to optimize yield and the concentration of silanol groups on the surface. The results were analyzed using SVM, which helped identify the reactant composition and synthesis conditions that maximize and minimize the diversity of the silanol group. The silanol group diversity influences the catalytic properties (see Figure 7).
Aerogels are highly porous materials synthesized by extracting the liquid component of a gel by means of supercritical drying or freeze-drying. This results in a continuous solid matrix (usually formed by a network of roughly spherical bodies) that occupies only a small fraction (0.1 to 10%) of the total material volume, with the rest being the pore space. The matrix materials vary widely, from inorganic oxides (SiO2, Al2O3, TiO2) to carbon. Due to the broad application range of aerogels [113], their ML-assisted design and synthesis have attracted substantial attention [114,115,116,117,118].
Younes et al. [115] applied principal component analysis (PCA) to experimental results on ion removal from water with aerogels of different skeleton origin (cellulose, chitosan, graphene) and with different additives (such as MOFs). The database was limited to just over a dozen samples explored experimentally [119], to which some other properties were added from the literature. The authors concluded that the different samples were likely to perform approximately equally in ion removal, with no feature (such as specific area or porosity) playing a critical role. PCA did not reveal much that could not be seen by the unaided eye, most likely because of the small dataset size.
Tafreshi et al. [116] collected the details of the precursor (polymer, linker) chemistry and synthesis procedures for 60 different polyimide-based aerogel samples and attempted to relate the precursor chemistry and polymer concentration in the reactant mixture to the properties of the resulting material (such as density and porosity). They obtained very reasonable predictions using artificial neural networks (ANN). Considering a dataset of sixty samples and six easily controlled parameters, these good predictions are not surprising. However, the chemistry-structure relationship is important for future materials screening and optimization. Goodarzi et al. [118] collected chemical and structural data (aerogel density, solid matrix density, thermal conductivity, and pore scale) for 296 aerogels made from phenol formaldehyde, silica, polyvinyl chloride, resorcinol formaldehyde, polyurethane, and polyimides. They then used supervised machine learning methods to relate these parameters to the thermal conductivity of the aerogels. The results showed that the properties of the matrix material dominated the overall thermal conductivity. Gaussian process regression and nearest neighbor algorithms performed better than KNN. Han et al. [117] focused on the mechanical properties of concrete with aerogel additives. Due to the high uncertainty in the chemical composition of these materials, the authors considered both the precursor composition (e.g., water/binder ratio) and manufacturing procedures (aerogel and silica fume replacement rates) as features. A set of 660 data points was used for cross-validation and hyperparameter optimization through grid search. The water/binder ratio (as expected) was the most significant factor in determining the mechanical strength, while the aerogel replacement rate had the largest impact on thermal conductance. Ensemble learning, which is easier to interpret, performed better than traditional data analysis methods.
Rege et al. [120,121] applied a purely computational approach to explore the structure–property relationships in aerogels. The aerogel formation was simulated as diffusion-limited cluster aggregation (DLCA) of spheres, with the sphere radius, concentration, seed step and walker step as the input parameters. DLCA leads to fractal structures, which were characterized by their fractal dimension, and the mechanical properties were evaluated using finite-element modeling. The inverse problem of determining the parameters that lead to the desired properties was solved via active learning. The fractal dimensions were substantially higher than those typical of DLCA [122].
Finally, ML-based optimization of aerogel applications also deserves mention. Zhou et al. designed an aerogel glazing system for thermal insulation in buildings [123]. Optimization was performed with a surrogate model based on supervised learning, which replaced the solution of thermal conduction PDEs and greatly accelerated the multivariate optimization. Uncertainties were estimated with Markov chain MC simulations. Later, the same authors introduced an ANN-based heuristic teaching-learning-based algorithm to optimize aerogel glazing systems using experimental observations from several different locations [124].
Overall, the effort invested in mesoporous materials is relatively weak compared to that invested in microporous ones. ML applications to mesoporous materials are still few and far between, but worst of all, they often lead to unclear outcomes (which we also found for carbons, Section 3). Practically everywhere, the training dataset is obtained in the same work and is of very limited size. A breakthrough may be envisioned in the area of database building (including structural characteristics such as SAXS/SANS data and adsorption isotherms at standard conditions) and in the development of fast ML-assisted simulation tools, which is presumably easier for, say, aerogels than for irregular microporous solids. The other route to improving data-driven design is robotized fabrication, very recently demonstrated by Shrestha et al. [125]. The authors reported an automated optimization of MXene [44,81,126]–aerogel composites powered by active learning (Figure 8). An automated pipetting robot was operated to prepare 264 mixtures of Ti3C2Tx flat sheet MXene, cellulose, gelatin, and glutaraldehyde at different loadings. After freeze-drying, the structural integrity of the aerogels was evaluated to train a support vector machine classifier. Using these samples as the initial learning base, 162 more samples were synthesized in eight iterations, after each of which the new samples were added to the dataset. An ANN-based model for predicting the resulting properties from the initial composition and synthesis conditions was constructed. In general, aerogel-based composites are attractive targets for ML-based optimization [114,127] due to their complexity (compared to traditional aerogels) and multitargeted applications (electric conductance in conjunction with thermal insulation). Active learning-based robotized optimization is where data-driven approaches should excel, because a deterministic algorithm is very difficult to build and tune beforehand. Active research efforts should be expected in this area in the near future.

6. Conclusions

State of the art. The paper reviews the current practices in ML application to irregular porous materials. It discusses the system types, goals, datasets, methods, achievements, and problems. The situation differs across different material classes. One important aspect is the lack of large established datasets that encompass the chemical composition, structural parameters, and properties of disordered porous materials or even separate classes of such materials. Some recent studies have started to develop seeds for future large datasets. With a few exceptions, mostly dealing with microporous materials, ML methods are either (1) applied to small datasets of experimental data originating from the same work or (2) applied to datasets obtained with simulations, with structures built with generative algorithms. Both groups contain a few good examples of data-driven design and optimization of porous materials. However, it is clear that at the moment, the data-based methods do not work to their full potential.
Path forward. There are three obvious directions of improvement: (i) database construction, (ii) development of computational methods, and (iii) automated production and testing. Database construction is the primary direction for micro- and mesoporous materials like active carbons and mesoporous inorganic oxides. Widely available are X-ray/neutron scattering data, standard sorption isotherms, surface characteristics related to hydrophilicity (surface acidity for carbons [128]), and sometimes adsorption selectivity for standard mixtures like CH4 + CO2, breakthrough curves and so on. Standard adsorption isotherms (N2 at 77 K, Ar at 87 K, CO2 at 273 K) allow PSD calculations. It is very important to have the original isotherms, from which PSD and SSA can be calculated in a reasonable and consistent manner. PSDs and SSAs obtained in different papers are often inconsistent and sometimes obtained with methods not applicable to the particular material. The characterization data can be linked to the synthesis conditions and the performance (storage, separation, breakthrough, sound absorption, etc.).
Tools for cost estimation would significantly strengthen the industrial impact of ML in porous materials discovery and optimization. As always across the field, solutions should be different for different material types. The performance of carbons, microporous polymers for gas separation, and aerogels improves in small increments, and ML is hardly expected to change that. Mass production promises substantial economic gains even from small performance gains, and the cost of the new material plays an important role. Approximate cost evaluation should be embedded in the screening. The first steps towards cost estimation have been made already (e.g., ref. [101], which considers availability of synthesis, see Supplementary Info). The cost estimation is quite different from that for computational drug design [129] purposes. Finally, automated production and testing of 3D structures is expected to excel, especially in conjunction with active learning algorithms, and that has been demonstrated already [125]. Robotized synthesis is important both for database development and as an area of ML application to materials, since data-driven methods have become indispensable in the optimization of complex processes with unclear interrelationships between the parameters.
ML methods. Only two ML tasks, (i) the generation of new monomers during the search in the chemical space of polymer backbones and (ii) the generation of model porous structures, are computationally challenging and require the heavy machinery of DNNs for disordered porous materials design and optimization. In most cases, the current datasets are too small for any advanced learning method to make a real difference. With some exceptions, as of today there is not much for data science professionals to do. In a number of cases considered here, a multivariate regression of explicitly defined parametric form would probably work just as well. Even the best scenario of database development in the near future is unlikely to create truly challenging problems. Data-driven acceleration of computations with surrogate models, PIML or machine-learned potentials in particle-based simulations [130] is another area, which is beyond the scope of this review. No matter how promising this direction is, most of the current efforts are still limited to demonstrative purposes and warrant some level of caution. At least for now, the development of ML methodologies does not seem to be a direction of potential breakthrough in data-driven design and optimization of disordered porous materials.

Funding

This research received no external funding.

Institutional Review Board Statement

Approved for distribution by Saudi Arabian Oil Company.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The paper was written solely by the author, without any guidance from the employer, Aramco Innovations LLC.

Appendix A. Short Summary of Select Published Papers on Data-Driven Design and Optimization of Disordered Porous Materials

Material | Porosity Type | Goal | Dataset (Source & Size) | ML Methods | Ref.
Active carbons
Granular active carbons | Toxic chemicals sorption in disordered microporosity, breakthrough | Relating early breakthrough time to easily measurable carbon characteristics air-hexadecane partition coefficient, hydrogen bond acidity | 400 breakthrough curves for 18 carbons from literature, various pollutants | DTE | [86]
Granular active carbons | Disordered microporosity | Build correlation between source biomass, activation chemical, synthesis procedure and IAN | 108 carbons, all synthesized in the authors' lab | ANN, DTE | [86]
Activated carbons | Disordered microporosity | Build correlation between source biomass, synthesis procedure and yield + BET SSA | 1500+ carbons from literature, all literature data | DTE | [65]
Activated carbons | H2 adsorption in disordered microporosity | Choosing best carbons for H2 adsorption, optimization of carbons | 1700+ data points on H2 sorption on 68 carbons at 77 K and different p, features include composition, BET surface area, literature data | ANN | [77]
Activated carbons | H2 adsorption in disordered microporosity | Choosing best carbons for H2 adsorption, optimization of carbons | 2072 data points on H2 sorption on 68 carbons at 77 K and different p, features include composition, BET surface area, literature data | DTE | [78]
Activated carbons | Disordered micro- and mesoporosity | Examining the correlation between the electric double layer capacitance and the pore structure of active carbon | 70 carbon samples from the literature; micro and meso pore SSA as features, capacitance as output at scan rates between 1 and 100 mV/s | DTE, ANN | [77,89]
Activated carbon and carbon molecular sieve | Disordered micro- and mesoporosity | Building a surrogate model to predict sorption isotherms | 1200+ measurements of N2, O2 and N2O adsorption | ANN | [85]
Carbide-derived active carbons | Disordered micro- and mesoporosity | Correlating ciprofloxacin sorption capacity to structural parameters (SSA, average pore size, total pore volume, micropore volume, called “texture”, their origin unfortunately unclear) | 87 different carbon samples, ciprofloxacin adsorption capacity | Linear model | [131]
Mesoporous materials
Porous Al2O3/SiC ceramic cakes | Irregular nanoporosity, honeycomb-like structure | Linking synthesis conditions to porosity and Al2O3 content | 50 samples, all experimentally obtained in the same work | ANN | [132]
Nanostructured hydrogel | Interstitial (semi-regular, formed via amphiphilic segregation) | Optimization of printing parameters | Experiment, from the same work, 12 points | SVM | [133]
Polyimide-based aerogels | Disordered mesoporosity, highly porous medium | Optimization of reactant formulation and synthesis conditions to obtain best performing material | 60 different hydrogels, all measured in the same paper | ANN | [116]
SiO2-Al2O3 porous ceramics | Irregular mesoporosity | Optimization of reactant formulation and synthesis conditions to obtain best performing material | 77 samples, all from the same study | Different methods | [111]
Simulated aerogels |  | Far goal: optimize pore structure; near goal: predict Df from simulation parameters | 3125 simulated aggregates of spherical particles, 3 features | ANN | [120]
Polyurethane aerogel, silica–resorcinol formaldehyde aerogel | Irregular mesoporosity | Link gel density, solid phase density & conductivity to overall thermal conductivity | Several dozen, all synthesized in the same study | KNN, GPR | [118]
Silica xerogels | Disordered mesoporosity | Optimize reactant compositions; synthesis procedure was adjusted to optimize the yield and silanol group surface concentration | 36 synthesized samples, all from the same work | SVM | [112]
Cellulose, chitosan and graphene based aerogels with microporous additives | Disordered mesoporosity, in some samples ordered microporosity | Evaluation of different factors influencing aerogel efficiency towards ion removal | 17 samples, from literature | PCA | [115]
Aerogel incorporated concrete | Disordered mesoporosity, disordered macroporosity | Elucidating factors determining thermal conductivity, mechanical properties | 660 samples, experimentally studied | DTE | [117]

Appendix B. Lay Person Overview of the Main Machine Learning Methods Applied in Data-Driven Design, Characterization and Optimization of Disordered Porous Materials

Appendix B.1. Supervised Learning Methods

ML in the reviewed papers is solely used to build regression models, linking structural parameters to properties, and parameters characterizing the synthesis/production process to structure and processes. The following methods are most popular:
Support vector machine (SVM) [134]: uses a kernel transformation of the data into a multi-dimensional feature space where an effective linear approximation is possible. It is a very common classical method for regression problems and is resilient to noisy data.
k nearest neighbors (KNN): the output value at a given location is the weighted or unweighted average of the values of the k nearest neighbors of that location. k is a regularization parameter chosen to minimize the prediction error for the test set.
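As an illustration of the KNN description, here is a minimal sketch in plain Python. The function names and the toy data are hypothetical. The selection of k uses a leave-one-out error on the available points, which is one possible stand-in (an assumption of this sketch) for the held-out error mentioned in the text.

```python
import math

def knn_predict(train_x, train_y, query, k=3, weighted=False):
    """Average the targets of the k training points closest to the query."""
    nearest = sorted((math.dist(x, query), y)
                     for x, y in zip(train_x, train_y))[:k]
    if not weighted:
        return sum(y for _, y in nearest) / len(nearest)
    # Inverse-distance weights; the small constant avoids division by zero.
    weights = [1.0 / (d + 1e-12) for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

def choose_k(train_x, train_y, candidates):
    """Pick k with the lowest leave-one-out mean squared error."""
    def loo_error(k):
        total = 0.0
        for i in range(len(train_x)):
            rest_x = train_x[:i] + train_x[i + 1:]
            rest_y = train_y[:i] + train_y[i + 1:]
            total += (knn_predict(rest_x, rest_y, train_x[i], k) - train_y[i]) ** 2
        return total / len(train_x)
    return min(candidates, key=loo_error)

# Toy one-feature data (hypothetical): the target equals the feature.
xs = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
ys = [0.0, 1.0, 2.0, 3.0, 4.0]
print(knn_predict(xs, ys, (2.0,), k=3))
print(choose_k(xs, ys, [1, 2, 3]))
```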
Decision tree ensembles (DTE): a DT regressor creates a tree-like structure of decisions based on input features, where each internal node represents an attribute and each leaf node contains a prediction. The algorithm recursively partitions the data into smaller subsets until a stopping criterion is reached. Tree depth is the main regularization parameter, chosen to maximize the prediction accuracy. The most common DTE methods are random forest (RF) [135], which combines the outputs of several random decision trees into a single result, and gradient boosting [136], where each next generation of decision trees minimizes the error left by the previous generation.
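The boosting idea can be sketched in a few lines. The example below is a deliberately reduced version, assuming a single input feature, squared-error loss, and trees with only one split (stumps); the function names and the toy data are hypothetical. Practical DTE implementations use deeper trees, many features, and additional regularization.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimizes the squared error."""
    best = None
    order = sorted(set(xs))
    for lo, hi in zip(order, order[1:]):
        thr = 0.5 * (lo + hi)
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, lmean, rmean)
    return best[1:]  # (threshold, left leaf value, right leaf value)

def stump_predict(stump, x):
    thr, lmean, rmean = stump
    return lmean if x <= thr else rmean

def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Each new stump is fitted to the residuals left by the previous ones."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate

def boost_predict(model, x):
    base, stumps, lr = model
    return base + lr * sum(stump_predict(s, x) for s in stumps)

def mse(model, xs, ys):
    return sum((boost_predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data (hypothetical): a smooth nonlinear target on eight points.
xs = [float(i) for i in range(8)]
ys = [x * x for x in xs]
for rounds in (0, 1, 20):
    print(rounds, mse(gradient_boost(xs, ys, n_rounds=rounds), xs, ys))
```

The printed training error decreases as rounds are added. On small datasets this is exactly where overfitting arises, which is why tree depth, the number of rounds, and the learning rate are treated as regularization parameters.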
Gaussian process regression (GPR, aka kriging): a method of interpolation based on Gaussian process governed by prior covariances. Under suitable assumptions of the prior, kriging gives the best linear unbiased prediction (BLUP) at unsampled locations.
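A compact sketch of GPR in one dimension may help to see what "prior covariances" mean in practice. It assumes a zero-mean prior and a squared-exponential covariance with a fixed length scale; the function names, the kernel choice, and the toy data are assumptions of this sketch, and the linear systems are solved with a plain Gaussian elimination instead of a numerical library.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential prior covariance between two 1D locations."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def gpr_predict(train_x, train_y, query, length=1.0, noise=1e-8):
    """Posterior mean and variance at the query under a zero-mean GP prior."""
    K = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(train_x)] for i, a in enumerate(train_x)]
    k_star = [rbf(a, query, length) for a in train_x]
    alpha = solve(K, train_y)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K, k_star)
    var = rbf(query, query, length) - sum(k * w for k, w in zip(k_star, v))
    return mean, max(var, 0.0)

# Toy data (hypothetical measurements at four locations).
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [0.0, 0.8, 0.9, 0.1]
for q in (1.0, 1.5, 50.0):
    print(q, gpr_predict(train_x, train_y, q))
```

The predicted variance is close to zero at a sampled location and returns to the prior variance far from the data. This built-in uncertainty estimate is what makes GPR a common surrogate in Bayesian optimization.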
Artificial neural network (ANN): consists of layers of artificial neurons connected by edges. Each artificial neuron receives signals from the connected neurons, processes them, and sends a response signal (in the form of a real number multiplied by an optimizable weight). The signals from the output layer are the outcome, and the weights are optimized to minimize the error. The ANN topology (the number and size of the layers and their connectivity) and the activation function (which defines the signal processing by a particular neuron) are chosen for the particular problem. ANNs hold an advantage over other methods on large datasets with many features, but are the most expensive to train. In a generative adversarial network (GAN) [137], two neural networks contest with each other in the form of a zero-sum game: the generator network generates structures close to the original distribution, and the discriminator network evaluates them. The discriminator is trained on samples from the target distribution and evaluates the likelihood that a particular sample belongs to that distribution, while the generator is trained to produce samples that the discriminator cannot distinguish (that is, samples to which it assigns high likelihood estimates).
Active learning (AL) [138]: the algorithm queries the subject of learning to obtain the output for one or several inputs; the responses are added to the dataset, the model is adjusted, and a new query is submitted until the convergence criteria are satisfied. AL is most commonly based on Bayesian optimization, which treats a black box system as a random function described with a probability distribution that captures beliefs on the system behavior. After receiving responses to the queries, the distribution is updated and used to construct an acquisition function to formulate the next query.

Appendix B.2. Unsupervised Learning Methods

Principal component analysis (PCA) [139,140] is a linear dimensionality reduction technique, where the data is linearly transformed onto a new coordinate system in such a way that the directions (principal components) capturing the largest variation in the data can be easily identified.
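A minimal sketch of PCA in plain Python is given below for illustration. It extracts only the first principal component, using power iteration on the covariance matrix; the function name and the toy data are hypothetical, and the fixed starting vector is a simplification (power iteration fails if the start is exactly orthogonal to the leading direction or if the data have no variance).

```python
import math

def pca_first_component(rows, n_iter=200):
    """Return the leading direction, its explained variance ratio, and scores."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    vec = [1.0] * d  # simplistic start; see the caveat in the text above
    for _ in range(n_iter):
        new = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in new))
        vec = [v / norm for v in new]
    variance = sum(vec[i] * cov[i][j] * vec[j]
                   for i in range(d) for j in range(d))
    total = sum(cov[i][i] for i in range(d))
    # Coordinates of each sample along the first principal component.
    scores = [sum(c[j] * vec[j] for j in range(d)) for c in centered]
    return vec, variance / total, scores

# Toy data (hypothetical): two perfectly correlated features, y = 2x.
rows = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
direction, ratio, scores = pca_first_component(rows)
print(direction, ratio)
```

For perfectly correlated features, a single component captures all of the variance. In real datasets the explained variance ratio indicates how much information is lost when the data are projected onto the first few components.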

References

  1. Van Der Voort, P.; Leus, K.; De Canck, E. Introduction to Porous Materials; John Wiley & Sons: Hoboken, NJ, USA, 2019. [Google Scholar]
  2. Earl, D.; Deem, M.; Peierls, R. Toward a Database of Hypothetical Zeolite Structures. Ind. Eng. Chem. Res. 2006, 45, 5449–5454. [Google Scholar] [CrossRef]
  3. Kitagawa, S. Metal–organic frameworks (MOFs). Chem. Soc. Rev. 2014, 43, 5415–5418. [Google Scholar]
  4. Côté, A.P.; Benin, A.I.; Ockwig, N.W.; O’Keeffe, M.; Matzger, A.J.; Yaghi, O.M. Porous, Crystalline, Covalent Organic Frameworks. Science 2005, 310, 1166–1170. [Google Scholar] [CrossRef] [PubMed]
  5. Lu, W.; Yuan, D.; Zhao, D.; Schilling, C.; Plietzsch, O.; Muller, T.; Bräse, S.; Guenther, J.; Blümel, J.; Krishna, R.; et al. Porous Polymer Networks: Synthesis, Porosity, and Applications in Gas Storage/Separation. Chem. Mater. 2010, 22, 5964–5972. [Google Scholar] [CrossRef]
  6. Nandy, A.; Duan, C.; Kulik, H.J. Audacity of huge: Overcoming challenges of data scarcity and data quality for machine learning in computational materials discovery. Curr. Opin. Chem. Eng. 2022, 36, 100778. [Google Scholar] [CrossRef]
  7. Allmann, R.; Hinek, R. The introduction of structure types into the Inorganic Crystal Structure Database ICSD. Acta Crystallogr. Sect. A Found. Crystallogr. 2007, 63, 412–417. [Google Scholar] [CrossRef] [PubMed]
  8. Vaitkus, A.; Merkys, A.; Sander, T.; Quirós, M.; Thiessen, P.A.; Bolton, E.E.; Gražulis, S. A workflow for deriving chemical entities from crystallographic data and its application to the Crystallography Open Database. J. Cheminform. 2023, 15, 123. [Google Scholar] [CrossRef] [PubMed]
  9. Kabekkodu, S.N.; Dosen, A.; Blanton, T.N. 5+: A comprehensive powder diffraction file™ for materials characterization. In Powder Diffraction; Cambridge University Press: Cambridge, UK, 2024; pp. 1–13. [Google Scholar]
  10. Groom, C.R.; Bruno, I.J.; Lightfoot, M.P.; Ward, S.C. The Cambridge structural database. Struct. Sci. 2016, 72, 171–179. [Google Scholar] [CrossRef]
  11. Chung, Y.; Haldoupis, E.; Bucior, B.; Haranczyk, M.; Lee, S.; Vogiatzis, K.; Ling, S.; Milisavljevic, M.; Zhang, H.; Camp, J. Computation-Ready Experimental Metal-Organic Framework (CoRE MOF) 2019 Dataset. Available online: https://zenodo.org/records/3677685 (accessed on 29 December 2024).
  12. Bucior, B.J.; Rosen, A.S.; Haranczyk, M.; Yao, Z.; Ziebel, M.E.; Farha, O.K.; Hupp, J.T.; Siepmann, J.I.; Aspuru-Guzik, A.; Snurr, R.Q. Identification schemes for metal–organic frameworks to enable rapid search and cheminformatics analysis. Cryst. Growth Des. 2019, 19, 6682–6697. [Google Scholar] [CrossRef]
  13. Martin, R.L.; Simon, C.M.; Medasani, B.; Britt, D.K.; Smit, B.; Haranczyk, M. In silico design of three-dimensional porous covalent organic frameworks via known synthesis routes and commercially available species. J. Phys. Chem. C 2014, 118, 23790–23802. [Google Scholar] [CrossRef]
  14. Demir, H.; Daglar, H.; Gulbalkan, H.C.; Aksu, G.O.; Keskin, S. Recent advances in computational modeling of MOFs: From molecular simulations to machine learning. Coord. Chem. Rev. 2023, 484, 215112. [Google Scholar] [CrossRef]
  15. Jablonka, K.M.; Ongari, D.; Moosavi, S.M.; Smit, B. Big-data science in porous materials: Materials genomics and machine learning. Chem. Rev. 2020, 120, 8066–8129. [Google Scholar] [CrossRef] [PubMed]
  16. Jensen, Z.; Olivetti, E. Machine Learning in Porous Materials. In Computer Simulation of Porous Materials: Current Approaches and Future Opportunities; The Royal Society of Chemistry: London, UK, 2021; p. 13. [Google Scholar]
  17. Sastre, G.; Daeyaert, F. AI-guided Design and Property Prediction for Zeolites and Nanoporous Materials; John Wiley & Sons: Hoboken, NJ, USA, 2023. [Google Scholar]
  18. Schwalbe-Koda, D.; Gómez-Bombarelli, R. Generating, Managing, and Mining Big Data in Zeolite Simulations. In AI-Guided Design and Property Prediction for Zeolites and Nanoporous Materials; Massachusetts Institute of Technology: Cambridge, MA, USA, 2023; pp. 81–111. [Google Scholar]
  19. Wang, J.; Tian, K.; Li, D.; Chen, M.; Feng, X.; Zhang, Y.; Wang, Y.; Van der Bruggen, B. Machine learning in gas separation membrane developing: Ready for prime time. Sep. Purif. Technol. 2023, 313, 123493. [Google Scholar] [CrossRef]
  20. Miao, C.; Wang, L.; Zhou, S.; Yu, D.; Zhang, C.; Gao, S.; Yu, X.; Zhao, Z. Preparation of Mesoporous Zeolites and Their Applications in Catalytic Elimination of Atmospheric Pollutants. Catalysts 2024, 14, 75. [Google Scholar] [CrossRef]
  21. Gregg, S.J.; Sing, K.S.W.; Salzberg, H. Adsorption surface area and porosity. J. Electrochem. Soc. 1967, 114, 279Ca. [Google Scholar] [CrossRef]
  22. Ciesla, U.; Schüth, F. Ordered mesoporous materials. Microporous Mesoporous Mater. 1999, 27, 131–149. [Google Scholar] [CrossRef]
  23. Ongari, D.; Talirz, L.; Smit, B. Too many materials and too many applications: An experimental problem waiting for a computational solution. ACS Cent. Sci. 2020, 6, 1890–1900. [Google Scholar] [CrossRef]
  24. Chu, T.; Park, S.; Fu, K. 3D printing-enabled advanced electrode architecture design. Carbon Energy 2021, 3, 424–439. [Google Scholar] [CrossRef]
  25. Pilania, G.; Wang, C.; Jiang, X.; Rajasekaran, S.; Ramprasad, R. Accelerating materials property predictions using machine learning. Sci. Rep. 2013, 3, 2810. [Google Scholar] [CrossRef]
  26. Deshwal, A.; Simon, C.M.; Doppa, J.R. Bayesian optimization of nanoporous materials. Mol. Syst. Des. Eng. 2021, 6, 1066–1086. [Google Scholar] [CrossRef]
  27. D’Elia, M.; Deng, H.; Fraces, C.; Garikipati, K.; Graham-Brady, L.; Howard, A.; Karniadakis, G.; Keshavarzzadeh, V.; Kirby, R.M.; Kutz, N. Machine learning in heterogeneous porous materials. arXiv 2022, arXiv:2202.04137. [Google Scholar]
  28. D’Elia, M.; Howard, A.; Kirby, R.M.; Kutz, N.; Tartakovsky, A.; Viswanathan, H. Discovering new governing equations using ML. In Machine Learning in Heterogeneous Porous Materials; The University of Utah: Salt Lake City, UT, USA, 2022. [Google Scholar]
  29. Lee, M.-T.; Vishnyakov, A.; Neimark, A.V. Modeling Proton Dissociation and Transfer Using Dissipative Particle Dynamics Simulation. J. Chem. Theory Comput. 2015, 11, 4395–4403. [Google Scholar] [CrossRef]
  30. Lee, M.-T.; Vishnyakov, A.; Neimark, A.V. Coarse-grained model of water diffusion and proton conductivity in hydrated polyelectrolyte membrane. J. Chem. Phys. 2016, 144, 014902. [Google Scholar] [CrossRef] [PubMed]
  31. Vishnyakov, A.; Neimark, A.V. Self-Assembly in Nafion Membranes upon Hydration: Water Mobility and Adsorption Isotherms. J. Phys. Chem. B 2014, 118, 11353–11364. [Google Scholar] [CrossRef]
  32. Iriarte, D.; Andrada, H.; Maldonado Ochoa, S.A.; Silva, O.F.; Vaca Chávez, F.; Carreras, A. Effect of acid treatment on the physico-chemical properties of Nafion 117 membrane. Int. J. Hydrogen Energy 2022, 47, 21253–21260. [Google Scholar] [CrossRef]
  33. Salam, M.; Habib, M.; Arefin, P.; Ahmed, K.; Uddin, S.; Hossain, T.; Papri, N. Effect of Temperature on the Performance Factors and Durability of Proton Exchange Membrane of Hydrogen Fuel Cell: A Narrative Review. Mater. Sci. Res. India 2020, 17, 179–191. [Google Scholar] [CrossRef]
  34. Lu, H.; Fraces, C.; Tchelepi, H.; Tartakovsky, D.M. Multi-scale modeling in heterogeneous porous materials via ML. In Machine Learning in Heterogeneous Porous Materials; The University of Utah: Salt Lake City, UT, USA, 2022; pp. 23–40. [Google Scholar]
  35. Kochkov, D.; Smith, J.A.; Alieva, A.; Wang, Q.; Brenner, M.P.; Hoyer, S. Machine learning–accelerated computational fluid dynamics. Proc. Natl. Acad. Sci. USA 2021, 118, e2101784118. [Google Scholar] [CrossRef] [PubMed]
  36. Akeweje, E.; Vanovskiy, V.; Vishnyakov, A. Surrogate Models of Hydrogen Oxidation Kinetics based on Deep Neural Networks. Theor. Found. Chem. Eng. 2023, 57, 196–204. [Google Scholar] [CrossRef]
  37. Karniadakis, G.E.; Kevrekidis, I.G.; Lu, L.; Perdikaris, P.; Wang, S.; Yang, L. Physics-informed machine learning. Nat. Rev. Phys. 2021, 3, 422–440. [Google Scholar] [CrossRef]
  38. Garikipati, K.; Graham-Brady, L.; Keshavarzzadeh, V.; Li, C.; Liu, X.; Zarzycki, P. ML in predicting material properties. In Machine Learning in Heterogeneous Porous Materials; The University of Utah: Salt Lake City, UT, USA, 2022. [Google Scholar]
  39. Olhin, A.; Vishnyakov, A. Pore Structure and Permeability of Tight-Pore Sandstones: Quantitative Test of the Lattice–Boltzmann Method. Appl. Sci. 2023, 13, 9112. [Google Scholar] [CrossRef]
  40. Akeweje, E.; Olhin, A.; Avilkin, V.; Vishnyakov, A.; Panov, M. Real-Time Reconstruction of Complex Flow in Nanoporous Media: Linear vs Non-linear Decoding. In Computational Science—ICCS 2023; Springer Nature: Cham, Switzerland, 2023; pp. 580–594. [Google Scholar]
  41. Delpisheh, M.; Ebrahimpour, B.; Fattahi, A.; Siavashi, M.; Mir, H.; Mashhadimoslem, H.; Abdol, M.A.; Ghorbani, M.; Shokri, J.; Niblett, D. Leveraging machine learning in porous media. J. Mater. Chem. A 2024, 12, 20717–20782. [Google Scholar] [CrossRef]
  42. Osman, A.I.; Nasr, M.; Farghali, M.; Bakr, S.S.; Eltaweil, A.S.; Rashwan, A.K.; Abd El-Monaem, E.M. Machine learning for membrane design in energy production, gas separation, and water treatment: A review. Environ. Chem. Lett. 2024, 22, 505–560. [Google Scholar] [CrossRef]
  43. Namdeo, S.; Srivastava, V.C.; Mohanty, P. Machine learning implemented exploration of the adsorption mechanism of carbon dioxide onto porous carbons. J. Colloid Interface Sci. 2023, 647, 174–187. [Google Scholar] [CrossRef] [PubMed]
  44. Qian, C.; Sun, K.; Bao, W. Recent advance on machine learning of MXenes for energy storage and conversion. Int. J. Energy Res. 2022, 46, 21511–21522. [Google Scholar] [CrossRef]
  45. Wang, Z.; Wang, Q.; Yang, F.; Wang, C.; Yang, M.; Yu, J. How Machine learning boosts the understanding of organic pollutant adsorption on carbonaceous Materials: A comprehensive review with statistical insights. Sep. Purif. Technol. 2024, 350, 127790. [Google Scholar] [CrossRef]
  46. Wang, B.-Y.; Li, B.; Xu, H.-Y. Machine learning screening of biomass precursors to prepare biomass carbon for organic wastewater purification: A review. Chemosphere 2024, 362, 142597. [Google Scholar] [CrossRef]
  47. Tassiopoulou, S.; Koukiou, G.; Anastassopoulos, V. Algorithms in Tomography and Related Inverse Problems—A Review. Algorithms 2024, 17, 71. [Google Scholar] [CrossRef]
  48. Mehmet, S.; Bülent, S. Survey over image thresholding techniques and quantitative performance evaluation. J. Electron. Imaging 2004, 13, 146–165. [Google Scholar] [CrossRef]
  49. Fager, C.; Röding, M.; Olsson, A.; Lorén, N.; von Corswant, C.; Särkkä, A.; Olsson, E. Optimization of FIB–SEM Tomography and Reconstruction for Soft, Porous, and Poorly Conducting Materials. Microsc. Microanal. 2020, 26, 837–845. [Google Scholar] [CrossRef]
  50. Čalkovský, M.; Müller, E.; Meffert, M.; Firman, N.; Mayer, F.; Wegener, M.; Gerthsen, D. Comparison of segmentation algorithms for FIB-SEM tomography of porous polymers: Importance of image contrast for machine learning segmentation. Mater. Charact. 2021, 171, 110806. [Google Scholar] [CrossRef]
  51. Sardhara, T.; Shkurmanov, A.; Li, Y.; Riedel, L.; Shi, S.; Cyron, C.J.; Aydin, R.C.; Ritter, M. Enhancing 3D Reconstruction Accuracy of FIB Tomography Data Using Multi-voltage Images and Multimodal Machine Learning. Nanomanuf. Metrol. 2024, 7, 4. [Google Scholar] [CrossRef]
  52. Sardhara, T.; Aydin, R.C.; Li, Y.; Piché, N.; Gauvin, R.; Cyron, C.J.; Ritter, M. Training Deep Neural Networks to Reconstruct Nanoporous Structures From FIB Tomography Images Using Synthetic Training Data. Front. Mater. 2022, 9, 837006. [Google Scholar] [CrossRef]
  53. Li, Y.; Dinh Ngô, B.-N.; Markmann, J.; Weissmüller, J. Datasets for the microstructure of nanoscale metal network structures and for its evolution during coarsening. Data Brief 2020, 29, 105030. [Google Scholar] [CrossRef] [PubMed]
  54. Otsu, N. A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66. [Google Scholar] [CrossRef]
  55. Shkurmanov, A.; Krekeler, T.; Ritter, M. Slice Thickness Optimization for the Focused Ion Beam-Scanning Electron Microscopy 3D Tomography of Hierarchical Nanoporous Gold. Nanomanuf. Metrol. 2022, 5, 112–118. [Google Scholar] [CrossRef]
  56. Sardhara, T.; Shkurmanov, A.; Li, Y.; Shi, S.; Cyron, C.; Aydin, R.; Ritter, M. Role of slice thickness quantification in the 3D reconstruction of FIB tomography data of nanoporous materials. Ultramicroscopy 2023, 256, 113878. [Google Scholar] [CrossRef]
  57. Gidley, D.; Peng, H.-G.; Vallery, R. Positron annihilation as a method to characterize porous materials. Annu. Rev. Mater. Res. 2006, 36, 49–79. [Google Scholar] [CrossRef]
  58. Pietrow, M.; Miaskowski, A. Artificial neural network as an effective tool to calculate parameters of positron annihilation lifetime spectra. J. Appl. Phys. 2023, 134, 114902. [Google Scholar] [CrossRef]
  59. Marsh, H.; Reinoso, F.R. Activated Carbon; Elsevier: Amsterdam, The Netherlands, 2006. [Google Scholar]
  60. Harris, P.J.; Liu, Z.; Suenaga, K. Imaging the atomic structure of activated carbon. J. Phys. Condens. Matter 2008, 20, 362201. [Google Scholar] [CrossRef]
  61. Kowalczyk, P.; Terzyk, A.P.; Gauden, P.A.; Furmaniak, S.; Wiśniewski, M.; Burian, A.; Hawelek, L.; Kaneko, K.; Neimark, A.V. Carbon molecular sieves: Reconstruction of atomistic structural models with experimental constraints. J. Phys. Chem. C 2014, 118, 12996–13007. [Google Scholar] [CrossRef]
  62. Ravikovitch, P.I.; Vishnyakov, A.; Russo, R.; Neimark, A.V. Unified Approach to Pore Size Characterization of Microporous Carbonaceous Materials from N2, Ar, and CO2 Adsorption Isotherms. Langmuir 2000, 16, 2311–2320. [Google Scholar] [CrossRef]
  63. Mehdipour-Ataei, S.; Aram, E. Mesoporous Carbon-Based Materials: A Review of Synthesis, Modification, and Applications. Catalysts 2023, 13, 2. [Google Scholar] [CrossRef]
  64. Massaoudi, A.; Echouchene, F.; Ben Ayed, M.; Berguiga, A.; Harchay, A.; Al-Ghamdi, S.; Belmabrouk, H. Machine learning models for modeling the biosorption of Fe (III) ions by activated carbon from olive stone. Neural Comput. Appl. 2024, 36, 13357–13372. [Google Scholar] [CrossRef]
  65. Wang, C.; Jiang, W.; Jiang, G.; Zhang, T.; He, K.; Mu, L.; Zhu, J.; Huang, D.; Qian, H.; Lu, X. Machine Learning Prediction of the Yield and BET Area of Activated Carbon Quantitatively Relating to Biomass Compositions and Operating Conditions. Ind. Eng. Chem. Res. 2023, 62, 11016–11031. [Google Scholar] [CrossRef]
  66. Brunauer, S.; Emmett, P.H.; Teller, E. Adsorption of Gases in Multimolecular Layers. J. Am. Chem. Soc. 1938, 60, 309–319. [Google Scholar] [CrossRef]
  67. Kaneko, K.; Ishii, C.; Ruike, M.; Kuwabara, H. Origin of superhigh surface area and microcrystalline graphitic structures of activated carbons. Carbon 1992, 30, 1075–1088. [Google Scholar] [CrossRef]
  68. Chang, J.; Lee, J.-Y. Machine Learning-Based Prediction of the Adsorption Characteristics of Biochar from Waste Wood by Chemical Activation. Materials 2024, 17, 5359. [Google Scholar] [CrossRef]
  69. Juhola, A. Iodine adsorption and structure of activated carbons. Carbon 1975, 13, 437–442. [Google Scholar] [CrossRef]
  70. Lai, X.; Zhou, P.; Kong, Y.; Wu, B.; Zhang, Q.; Cui, X. A machine learning and experimental-based model for prediction of soil sorption capacity toward phenanthrene. Environ. Res. 2024, 244, 117898. [Google Scholar] [CrossRef]
  71. Jiang, W.; Xing, X.; Li, S.; Zhang, X.; Wang, W. Synthesis, characterization and machine learning based performance prediction of straw activated carbon. J. Clean. Prod. 2018, 212, 1210–1223. [Google Scholar] [CrossRef]
  72. Li, X.; Huang, Z.; Shao, S.; Cai, Y. Machine learning prediction of physical properties and nitrogen content of porous carbon from agricultural wastes: Effects of activation and doping process. Fuel 2024, 356, 129623. [Google Scholar] [CrossRef]
  73. Wang, T.; Pan, R.; Martins, M.L.; Cui, J.; Huang, Z.; Thapaliya, B.P.; Do-Thanh, C.-L.; Zhou, M.; Fan, J.; Yang, Z. Machine-learning-assisted material discovery of oxygen-rich highly porous carbon active materials for aqueous supercapacitors. Nat. Commun. 2023, 14, 4607. [Google Scholar] [CrossRef]
  74. Yang, X.; Yuan, C.; He, S.; Jiang, D.; Cao, B.; Wang, S. Machine learning prediction of specific capacitance in biomass derived carbon materials: Effects of activation and biochar characteristics. Fuel 2023, 331, 125718. [Google Scholar] [CrossRef]
  75. Bouchelkia, N.; Tahraoui, H.; Amrane, A.; Belkacemi, H.; Bollinger, J.-C.; Bouzaza, A.; Zoukel, A.; Zhang, J.; Mouni, L. Jujube stones based highly efficient activated carbon for methylene blue adsorption: Kinetics and isotherms modeling, thermodynamics and mechanism study, optimization via response surface methodology and machine learning approaches. Process Saf. Environ. Prot. 2023, 170, 513–535. [Google Scholar] [CrossRef]
  76. Zhao, H.; Lyu, Y.; Hu, J.; Li, M.; Chen, H.; Jiang, Y.; Tang, M.; Wu, Y.; Sun, W. Reveal the major factors controlling quinolone adsorption on mesoporous carbon: Batch experiment, DFT calculation, MD simulation, and machine learning modeling. Chem. Eng. J. 2023, 463, 142486. [Google Scholar] [CrossRef]
  77. Kusdhany, M.I.M.; Lyth, S.M. New insights into hydrogen uptake on porous carbon materials via explainable machine learning. Carbon 2021, 179, 190–201. [Google Scholar] [CrossRef]
  78. Davoodi, S.; Vo Thanh, H.; Wood, D.A.; Mehrad, M.; Al-Shargabi, M.; Rukavishnikov, V.S. Machine-learning models to predict hydrogen uptake of porous carbon materials from influential variables. Sep. Purif. Technol. 2023, 316, 123807. [Google Scholar] [CrossRef]
  79. Kevin, D.A.; Aimikhe, V.J.; Ikeokwu, C.C. A Machine Learning Approach to Determining the CO2 Adsorption Capacity of Coconut Shell-Derived Activated Carbon. In Proceedings of the SPE Nigeria Annual International Conference and Exhibition, Lagos, Nigeria, 5–7 August 2024; p. D022S025R008. [Google Scholar]
  80. Zhang, J.; Zhang, X.; Li, X.; Song, Z.; Shao, J.; Zhang, S.; Yang, H.; Chen, H. Prediction of CO2 adsorption of biochar under KOH activation via machine learning. Carbon Capture Sci. Technol. 2024, 13, 100309. [Google Scholar] [CrossRef]
  81. Yuan, X.; Suvarna, M.; Low, S.; Dissanayake, P.D.; Lee, K.B.; Li, J.; Wang, X.; Ok, Y.S. Applied machine learning for prediction of CO2 adsorption on biomass waste-derived porous carbons. Environ. Sci. Technol. 2021, 55, 11925–11936. [Google Scholar] [CrossRef]
  82. Kowalczyk, P.; Terzyk, A.P.; Erwardt, P.; Hough, M.; Deditius, A.P.; Gauden, P.A.; Neimark, A.V.; Kaneko, K. Machine learning-assisted design of porous carbons for removing paracetamol from aqueous solutions. Carbon 2022, 198, 371–381. [Google Scholar] [CrossRef]
  83. Palle, K.; Vunguturi, S.; Gayatri, S.N.; Rao, K.S.; Babu, P.R.; Vijay, R. The prediction of CO2 adsorption on rice husk activated carbons via deep learning neural network. MRS Commun. 2022, 12, 434–440. [Google Scholar] [CrossRef]
  84. Vishnyakov, A.; Ravikovitch, P.I.; Neimark, A.V. Molecular Level Models for CO2 Sorption in Nanopores. Langmuir 1999, 15, 8736–8742. [Google Scholar] [CrossRef]
  85. Mashhadimoslem, H.; Ghaemi, A. Machine learning analysis and prediction of N2, N2O, and O2 adsorption on activated carbon and carbon molecular sieve. Environ. Sci. Pollut. Res. 2023, 30, 4166–4186. [Google Scholar] [CrossRef]
  86. Koyama, Y. Machine Learning Models to Predict Early Breakthrough of Recalcitrant Organic Micropollutants in Granular Activated Carbon Treatment of Water; North Carolina State University: Raleigh, NC, USA, 2021. [Google Scholar]
  87. Koyama, Y.; Fasaee, M.A.K.; Berglund, E.Z.; Knappe, D.R.U. Machine Learning Models to Predict Early Breakthrough of Recalcitrant Organic Micropollutants in Granular Activated Carbon Adsorbers. Environ. Sci. Technol. 2024, 58, 17114–17124. [Google Scholar] [CrossRef] [PubMed]
  88. Sontheimer, H.; Crittenden, J.C.; Summers, R.S. Activated Carbon for Water Treatment; DVGW-Forschungsstelle, Engler-Bunte-Institut, Universität Karlsruhe (TH): Karlsruhe, Germany, 1988. [Google Scholar]
  89. Zhou, M.; Gallegos, A.; Liu, K.; Dai, S.; Wu, J. Insights from machine learning of carbon electrodes for electric double layer capacitors. Carbon 2020, 157, 147–152. [Google Scholar] [CrossRef]
  90. Neimark, A.V.; Ravikovitch, P.I.; Vishnyakov, A. Bridging scales from molecular simulations to classical thermodynamics: Density functional theory of capillary condensation in nanopores. J. Phys. Condens. Matter 2003, 15, 347. [Google Scholar] [CrossRef]
  91. Ravikovitch, P.I.; Vishnyakov, A.; Neimark, A.V.; Ribeiro Carrott, M.M.L.; Russo, P.A.; Carrott, P.J. Characterization of Micro-Mesoporous Materials from Nitrogen and Toluene Adsorption: Experiment and Modeling. Langmuir 2006, 22, 513–516. [Google Scholar] [CrossRef] [PubMed]
  92. Thommes, M.; Kaneko, K.; Neimark, A.V.; Olivier, J.P.; Rodriguez-Reinoso, F.; Rouquerol, J.; Sing, K.S.W. Physisorption of gases, with special reference to the evaluation of surface area and pore size distribution (IUPAC Technical Report). Pure Appl. Chem. 2015, 87, 1051–1069. [Google Scholar] [CrossRef]
  93. Arya, R.K.; Thapliyal, D.; Sharma, J.; Verros, G.D. Glassy Polymers—Diffusion, Sorption, Ageing and Applications. Coatings 2021, 11, 1049. [Google Scholar] [CrossRef]
  94. Buonomenna, M.G. Microporous Polymers for Gas Separation Membranes: Overview and Advances. In Handbook of Polymer and Ceramic Nanotechnology; Hussain, C.M., Thomas, S., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 1–29. [Google Scholar] [CrossRef]
  95. Luo, S.; Han, T.; Wang, C.; Sun, Y.; Zhang, H.; Guo, R.; Zhang, S. Hierarchically microporous membranes for highly energy-efficient gas separations. Ind. Chem. Mater. 2023, 1, 376–387. [Google Scholar] [CrossRef]
  96. Robeson, L.M. The upper bound revisited. J. Membr. Sci. 2008, 320, 390–400. [Google Scholar] [CrossRef]
  97. Sabe, V.T.; Ntombela, T.; Jhamba, L.A.; Maguire, G.E.M.; Govender, T.; Naicker, T.; Kruger, H.G. Current trends in computer aided drug design and a highlight of drugs discovered via computational techniques: A review. Eur. J. Med. Chem. 2021, 224, 113705. [Google Scholar] [CrossRef]
  98. Grosberg, A.Y.; Khokhlov, A.R. Giant Molecules: Here, There, and Everywhere; World Scientific: Singapore, 2010. [Google Scholar]
  99. Barnett, J.W.; Bilchak, C.R.; Wang, Y.; Benicewicz, B.C.; Murdock, L.A.; Bereau, T.; Kumar, S.K. Designing exceptional gas-separation polymer membranes using machine learning. Sci. Adv. 2020, 6, eaaz4301. [Google Scholar] [CrossRef] [PubMed]
  100. Yuan, Q.; Longo, M.; Thornton, A.W.; McKeown, N.B.; Comesana-Gandara, B.; Jansen, J.C.; Jelfs, K.E. Imputation of missing gas permeability data for polymer membranes using machine learning. J. Membr. Sci. 2021, 627, 119207. [Google Scholar] [CrossRef]
  101. Yang, J.; Tao, L.; He, J.; McCutcheon, J.R.; Li, Y. Machine learning enables interpretable discovery of innovative polymers for gas separation membranes. Sci. Adv. 2022, 8, eabn9545. [Google Scholar] [CrossRef] [PubMed]
  102. Rogers, D.; Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 2010, 50, 742–754. [Google Scholar] [CrossRef]
  103. Zimmerman, C.M.; Koros, W.J. Comparison of gas transport and sorption in the ladder polymer BBL and some semi-ladder polymers. Polymer 1999, 40, 5655–5664. [Google Scholar] [CrossRef]
  104. Todeschini, R.; Consonni, V. Molecular Descriptors for Chemoinformatics. Available online: https://onlinelibrary.wiley.com/doi/book/10.1002/9783527628766 (accessed on 2 November 2024).
  105. Basdogan, Y.; Pollard, D.R.; Shastry, T.; Carbone, M.R.; Kumar, S.K.; Wang, Z.-G. Machine learning-guided discovery of polymer membranes for CO2 separation with genetic algorithm. J. Membr. Sci. 2024, 712, 123169. [Google Scholar] [CrossRef]
  106. Jia, Y.; Lu, Y.; Yang, H.; Chen, Y.; Hillman, F.; Wang, K.; Liang, C.Z.; Zhang, S. Control of Microporous Structure in Conjugated Microporous Polymer Membranes for Post-Combustion Carbon Capture. Adv. Funct. Mater. 2024, 34, 2407499. [Google Scholar] [CrossRef]
  107. Xu, J.; Suleiman, A.; Liu, G.; Perez, M.; Zhang, R.; Jiang, M.; Guo, R.; Luo, T. Superior Polymeric Gas Separation Membrane Designed by Explainable Graph Machine Learning. arXiv 2024, arXiv:2404.10903. [Google Scholar] [CrossRef]
  108. Cheun, J.-Y.; Liew, J.-Y.-L.; Tan, Q.-Y.; Chong, J.-W.; Ooi, J.; Chemmangattuvalappil, N.G. Design of polymeric membranes for air separation by combining machine learning tools with computer aided molecular design. Processes 2023, 11, 2004. [Google Scholar] [CrossRef]
  109. Chen, L.; Liu, G.; Zhang, Z.; Wang, Y.; Yang, Y.; Li, J. Machine learning and molecular design algorithm assisted discovery of gas separation membranes exceeding the CO2/CH4 and CO2/N2 upper bounds. Chem. Eng. Sci. 2024, 291, 119952. [Google Scholar] [CrossRef]
  110. Glass, S.; Schmidt, M.; Merten, P.; Abdul Latif, A.; Fischer, K.; Schulze, A.; Friederich, P.; Filiz, V. Design of Modified Polymer Membranes Using Machine Learning. ACS Appl. Mater. Interfaces 2024, 16, 20990–21000. [Google Scholar] [CrossRef] [PubMed]
  111. Sun, Z.; Hu, N.; Ke, L.; Lv, Y.; Liu, Y.; Bai, Y.; Ou, Z.; Li, J. Machine learning-assisted design of Al2O3–SiO2 porous ceramics based on few-shot datasets. Ceram. Int. 2023, 49, 29400–29408. [Google Scholar] [CrossRef]
  112. Wang, C.; Yang, Q.; Wang, J.; Zhao, J.; Wan, X.; Guo, Z.; Yang, Y. Application of support vector machine on controlling the silanol groups of silica xerogel with the aid of segmented continuous flow reactor. Chem. Eng. Sci. 2019, 199, 486–495. [Google Scholar] [CrossRef]
  113. Hrubesh, L.W. Aerogel applications. J. Non-Cryst. Solids 1998, 225, 335–342. [Google Scholar] [CrossRef]
  114. Rong, C.; Zhou, L.; Zhang, B.; Xuan, F.-Z. Machine learning for mechanics prediction of 2D MXene-based aerogels. Compos. Commun. 2023, 38, 101474. [Google Scholar] [CrossRef]
  115. Younes, K.; Kharboutly, Y.; Antar, M.; Chaouk, H.; Obeid, E.; Mouhtady, O.; Abu-Samha, M.; Halwani, J.; Murshid, N. Application of Unsupervised Machine Learning for the Evaluation of Aerogels’ Efficiency towards Ion Removal—A Principal Component Analysis (PCA) Approach. Gels 2023, 9, 304. [Google Scholar] [CrossRef] [PubMed]
  116. Tafreshi, O.A.; Saadatnia, Z.; Ghaffari-Mosanenzadeh, S.; Okhovatian, S.; Park, C.B.; Naguib, H.E. Machine learning-based model for predicting the material properties of nanostructured aerogels. SPE Polym. 2023, 4, 24–37. [Google Scholar] [CrossRef]
  117. Han, F.; Lv, Y.; Liu, Y.; Zhang, X.; Yu, W.; Cheng, C.; Yang, W. Exploring interpretable ensemble learning to predict mechanical strength and thermal conductivity of aerogel-incorporated concrete. Constr. Build. Mater. 2023, 392, 131781. [Google Scholar] [CrossRef]
  118. Valipour Goodarzi, B.; Bahramian, A. Applying machine learning for predicting thermal conductivity coefficient of polymeric aerogels. J. Therm. Anal. Calorim. 2021, 147, 6227–6238. [Google Scholar] [CrossRef]
  119. Paul, J.; Ahankari, S.S. Nanocellulose-based aerogels for water purification: A review. Carbohydr. Polym. 2023, 309, 120677. [Google Scholar] [CrossRef]
  120. Abdusalamov, R.; Pandit, P.; Milow, B.; Itskov, M.; Rege, A. Machine learning-based structure–property predictions in silica aerogels. Soft Matter 2021, 17, 7350–7358. [Google Scholar] [CrossRef]
  121. Pandit, P.; Abdusalamov, R.; Itskov, M.; Milow, B.; Rege, A. Data-driven inverse design and optimisation of silica aerogel model networks. PAMM 2023, 23, e202200329. [Google Scholar] [CrossRef]
  122. Neimark, A. A new approach to the determination of the surface fractal dimension of porous solids. Phys. A Stat. Mech. Its Appl. 1992, 191, 258–262. [Google Scholar] [CrossRef]
  123. Zhou, Y.; Zheng, S. Stochastic uncertainty-based optimisation on an aerogel glazing building in China using supervised learning surrogate model and a heuristic optimisation algorithm. Renew. Energy 2020, 155, 810–826. [Google Scholar] [CrossRef]
  124. Zhou, Y.; Zheng, S. Climate adaptive optimal design of an aerogel glazing system with the integration of a heuristic teaching-learning-based algorithm in machine learning-based optimization. Renew. Energy 2020, 153, 375–391. [Google Scholar] [CrossRef]
  125. Shrestha, S.; Barvenik, K.J.; Chen, T.; Yang, H.; Li, Y.; Kesavan, M.M.; Little, J.M.; Whitley, H.C.; Teng, Z.; Luo, Y. Machine intelligence accelerated design of conductive MXene aerogels with programmable properties. Nat. Commun. 2024, 15, 4685. [Google Scholar] [CrossRef] [PubMed]
  126. Zhang, Y.-Z.; El-Demellawi, J.K.; Jiang, Q.; Ge, G.; Liang, H.; Lee, K.; Dong, X.; Alshareef, H.N. MXene hydrogels: Fundamentals and applications. Chem. Soc. Rev. 2020, 49, 7229–7251. [Google Scholar] [CrossRef]
  127. Chen, Z.; Fu, X.; Liu, R.; Song, Y.; Yin, X. Fabrication, Performance, and Potential Applications of MXene Composite Aerogels. Nanomaterials 2023, 13, 2048. [Google Scholar] [CrossRef]
  128. Corapcioglu, M.O.; Huang, C.P. The surface acidity and characterization of some commercial activated carbons. Carbon 1987, 25, 569–578. [Google Scholar] [CrossRef]
  129. Korolev, V.; Mitrofanov, A.; Korotcov, A.; Tkachenko, V. Graph Convolutional Neural Networks as “General-Purpose” Property Predictors: The Universality and Limits of Applicability. J. Chem. Inf. Model. 2020, 60, 22–28. [Google Scholar] [CrossRef] [PubMed]
  130. Shapeev, A.V. Moment Tensor Potentials: A Class of Systematically Improvable Interatomic Potentials. Multiscale Model. Simul. 2016, 14, 1153–1173. [Google Scholar] [CrossRef]
  131. Käärik, M.; Krjukova, N.; Maran, U.; Oja, M.; Piir, G.; Leis, J. Nanomaterial Texture-Based Machine Learning of Ciprofloxacin Adsorption on Nanoporous Carbon. Int. J. Mol. Sci. 2024, 25, 11696. [Google Scholar] [CrossRef] [PubMed]
  132. Altinkok, N.; Koker, R. Mixture and pore volume fraction estimation in Al2O3/SiC ceramic cake using artificial neural networks. Mater. Des. 2005, 26, 305–311. [Google Scholar] [CrossRef]
  133. Fu, Z.; Angeline, V.; Sun, W. Evaluation of printing parameters on 3D extrusion printing of pluronic hydrogels and machine learning guided parameter recommendation. Int. J. Bioprinting 2021, 7, 434. [Google Scholar] [CrossRef] [PubMed]
  134. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  135. Ho, T.K. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995; pp. 278–282. [Google Scholar]
  136. Friedman, J.H. Stochastic gradient boosting. Comput. Stat. Data Anal. 2002, 38, 367–378. [Google Scholar] [CrossRef]
  137. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the International Conference on Neural Information Processing Systems (NIPS 2014), Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680. [Google Scholar]
  138. Cohn, D.; Atlas, L.; Ladner, R. Improving generalization with active learning. Mach. Learn. 1994, 15, 201–221. [Google Scholar] [CrossRef]
  139. Hotelling, H. Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 1933, 24, 417. [Google Scholar] [CrossRef]
  140. Groth, D.; Hartmann, S.; Klie, S.; Selbig, J. Principal components analysis. Comput. Toxicol. Vol. II 2013, 930, 527–547. [Google Scholar]
Figure 1. Examples of different approaches to ML of fluid transport in heterogeneous porous media, depending on the knowledge of the medium structure and the understanding of the physics of the process, shown here for the “dynamically nanoporous” [29,30] Nafion membrane, whose nanoporous structure forms via nanoscale segregation of the hydrated polymer. Left: the most detailed approach, with the nanopore structure available from structural simulations. Middle: volumes of the mobile and immobile subphases and some connectivity characteristics are available. Right: an entirely homogenized model characterized by macroscopic dynamic properties. As the spatial and temporal scales increase and the detailed structure of the porous matrix becomes unavailable or too complex to account for, non-interpretable, entirely data-based models linking the input pressures and concentrations to the fluxes are applied. Drawn according to ref. [28]. Images from refs. [31] (ACS permission); [32] (CCL); [33] (CCL).
Figure 2. Illustration of segmentation of FIB-SEM images of HNPG using a synthetic dataset. (A) An example of an SEM image of an HNPG slice at an accelerating voltage of 1 kV; the bar length is 300 nm and the green square is a sliding window; (B) the same at 4 kV (a more detailed representation with a more pronounced shine-through); (C) simulated 3D multiscale porous structure; (D) simulated SEM image at an accelerating voltage of 1 kV; (E) the same at 4 kV; (F–H) different multimodal ML architectures: early, intermediate and late fusion, respectively. Compiled from refs. [51,52] (CCL).
Figure 3. Prediction of hydrogen uptake by active carbons [77]. Left: different approaches to dividing the data into training and test datasets (random selection of points vs. random selection of isotherms, described as different approaches to stratification). The bottom approach, referred to as “group K-fold CV”, in fact implies using points of the same isotherm to predict the points from the control set. While this technique appears more precise, it is hardly relevant to the practical goal of predicting H2 adsorption on a new carbon sample. Right: boxplot of feature importance values from 500 different RF models trained on different bootstrapped samples. The dashed lines show the average importance across all models, the full lines show the medians, and the circles show the outliers. Note that gas pressure is a feature in the model, which should lead to noise in the predicted isotherms far exceeding the experimental error. A similar problem exists in treating the time in DTE models for chemical kinetics (see the supporting information to ref. [36]). Alternatively, one may approximate the H2 isotherms with adsorption models and use the model parameters as features. The outcome (importance of the surface area) is, of course, trivial from the physical point of view. Unfortunately, the predicted isotherms are not provided in the paper or its supplementary materials. Compiled from ref. [77] (CCL).
Figure 4. ML of pollutant permeation through a granulated activated carbon bed. Left: an example of a breakthrough curve taken from the literature: black points are the raw data scanned from an original paper, the red curve is the approximation with the Pore-Surface Diffusion Model (PSDM [88]), and the light blue envelope represents the range of anticipated uncertainty of the PSDM output. Note that PSDM parameters, rather than just the raw data, were used in the DTE fit; bed volume is the volume of liquid solution passed through the bed. Right: test set prediction accuracy with the Random Forest approach for different carbon types and pollutant classes; BV10 is the bed volume of water that can be treated until MP breakthrough reaches 10% of the influent MP concentration (compiled from the figures of ref. [86] and graphically edited for clarity; fair use of thesis material [86] and permission from ACS [87]).
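As an illustration of the BV10 definition given in the caption, the following minimal sketch reads BV10 off a tabulated breakthrough curve by linear interpolation. This is an assumption made for clarity only: ref. [86] works with PSDM fits rather than raw points, and the function name and the numbers below are hypothetical.

```python
def bed_volumes_to_breakthrough(bed_volumes, c_over_c0, threshold=0.10):
    """Bed volume at which C/C0 first reaches `threshold`.

    Uses linear interpolation between tabulated points; returns None
    if the curve never reaches the threshold.
    """
    points = list(zip(bed_volumes, c_over_c0))
    if points and points[0][1] >= threshold:
        return points[0][0]
    for (bv_lo, c_lo), (bv_hi, c_hi) in zip(points, points[1:]):
        if c_lo < threshold <= c_hi:
            return bv_lo + (threshold - c_lo) * (bv_hi - bv_lo) / (c_hi - c_lo)
    return None

# Made-up breakthrough curve: treated bed volumes vs. effluent/influent ratio.
bv = [0, 5000, 10000, 15000, 20000]
ratio = [0.0, 0.02, 0.08, 0.20, 0.45]

bv10 = bed_volumes_to_breakthrough(bv, ratio)
print(f"BV10 = {bv10:.0f} bed volumes")
```

Interpolating on raw points is sensitive to scatter in the scanned data, which is one reason a fitted model curve may be preferred as the source of BV10.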
Figure 5. ML-assisted design of high-performance microporous polymer membranes. The large synthetic toolbox available for creating new polymers is simulated by translating the polymer into a binary “fingerprint,” which is input to the ML algorithm. The model is trained with a random subgroup of polymers from our literature database and then tested against the remaining polymers. The model is then applied to a large set of literature data to discover high-performance polymers, thus facilitating machine-assisted design. Reproduced without alteration from ref. [99] (CCL).
Figure 6. Workflow and results of the search in chemical space for gas separation polymeric membranes. The top part shows the workflow: SMILES are characterized by MFFs or RDKit descriptors [104] that serve as features. ML models are trained on the dataset of existing measurements and applied to the pools of available candidate polymers. The predictions are verified by MD simulations for selected candidates. Bottom left (A): candidate monomers selected with the ML algorithms. Bottom right (B): predicted selectivities for the training dataset and different pools of candidate structures (polyimides in blue, ladder polymers in green) with respect to the different versions of the Robeson upper bound. Compiled from the figures of ref. [101] (CCL).
Figure 7. Illustration of ML application in the design of xerogel synthesis with maximized porosity and silanol group diversity, characterized by the full width at half maximum (FWHM) of the infrared band. (a) Silanol group types; (b) SVM correlation between the characteristics of the original reactant formulation/synthesis procedure and the resulting FWHM; (c,d) SEM images of xerogels with maximized (Silica-37) and minimized (Silica-38) FWHM; (e) catalytic performance of catalysts based on the samples shown in panels (c,d), illustrating the importance of the silanol diversity. Despite the small dataset size, SVM correlations are instrumental in choosing the proper synthetic procedure, while multiple correlations based on two features are inconclusive. Compiled using images taken from ref. [112] with permission from Elsevier.
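For readers unfamiliar with the FWHM descriptor used as the target above, the sketch below estimates it for a single, baseline-free band by locating the two half-maximum crossings with linear interpolation. It is not the processing used in ref. [112]; real infrared bands generally need baseline correction and may contain overlapping peaks. The function name and the synthetic Gaussian band are assumptions.

```python
import math

def fwhm(x, y):
    """Full width at half maximum of a single-peaked band.

    Assumes x is increasing and the baseline is zero; the half-maximum
    crossings on both sides of the peak are found by linear interpolation.
    """
    peak = max(range(len(y)), key=y.__getitem__)
    half = y[peak] / 2.0

    i = peak
    while i > 0 and y[i - 1] > half:
        i -= 1
    j = peak
    while j < len(y) - 1 and y[j + 1] > half:
        j += 1
    if i == 0 or j == len(y) - 1:
        raise ValueError("band does not fall below half maximum on both sides")

    left = x[i - 1] + (half - y[i - 1]) * (x[i] - x[i - 1]) / (y[i] - y[i - 1])
    right = x[j] + (y[j] - half) * (x[j + 1] - x[j]) / (y[j] - y[j + 1])
    return right - left

# Synthetic Gaussian band (hypothetical): centre 3500 cm^-1, sigma 50 cm^-1.
sigma = 50.0
wavenumbers = [3000.0 + k for k in range(1001)]
absorbance = [math.exp(-((w - 3500.0) ** 2) / (2 * sigma**2)) for w in wavenumbers]

width = fwhm(wavenumbers, absorbance)
print(f"FWHM = {width:.2f} cm^-1")
```

For a Gaussian band the result can be checked against the analytical value 2·sqrt(2·ln 2)·sigma, which gives a quick sanity test of the interpolation.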
Figure 8. (a) Schematic illustration of a multi-stage ML framework for constructing a prediction model via active learning loops, data augmentation, and robot-human teaming. (b) An autonomous testing platform integrated with a robotic arm and a compression tester. 2D Voronoi tessellation diagrams (c) without and (d) with the glutaraldehyde incorporation after 8 active learning loops. (e) the mean absolute error (MAE) and the mean relative error (MRE) values of various prediction models based on linear regression, decision tree, XGBoost, RF, and ANN algorithms. (f) MAE (top) and MRE (bottom) values of various ANN models based on different virtual-to-real data ratios. Reproduced from ref. [125] (CCL).
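The two error measures named in panels (e) and (f) can be written out explicitly. The sketch below uses the common definitions, the mean of |predicted − measured| for MAE and the mean of |predicted − measured| / |measured| for MRE; whether ref. [125] uses exactly these forms is an assumption, and the numbers are made up.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation, in the units of the measured quantity."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_relative_error(y_true, y_pred):
    """Average absolute deviation relative to the measured value (dimensionless).

    Undefined when a measured value is zero.
    """
    return sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical measured and predicted values of a mechanical property.
measured = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 44.0]

mae = mean_absolute_error(measured, predicted)
mre = mean_relative_error(measured, predicted)
print(f"MAE = {mae:.3f}, MRE = {mre:.1%}")
```

MAE is dominated by samples with large absolute values, while MRE weights all samples by their relative deviation, so reporting both, as in the figure, gives a more balanced picture of model quality.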
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Vishnyakov, A. Machine Learning in Computational Design and Optimization of Disordered Nanoporous Materials. Materials 2025, 18, 534. https://doi.org/10.3390/ma18030534

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
