niazi2012
Received: 29 November 2011, Revised: 25 January 2012, Accepted: 01 February 2012, Published online in Wiley Online Library
This review covers the application of Genetic Algorithms (GAs) in Chemometrics. The first applications of GAs in
chemistry date back to the 1970s, and in the last decades, they have been more and more frequently used to solve
different kinds of problems, for example, when the objective functions do not possess properties such as continuity,
differentiability, and so on. These algorithms maintain and manipulate a family, or population, of solutions and
implement a “survival of the fittest” strategy in their search for better solutions. GAs are very useful in the optimization
and variable selection in modeling and calibration because of the strong effect of the relationship between presence/
absence of variables in a calibration model and the prediction ability of the model itself. This review is not a complete
summary of the applications of GAs to chemometric problems; its goal is rather to show the researchers the main
fields of application of GAs, together with providing a list of references on the subject. Copyright © 2012 John Wiley
& Sons, Ltd.
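The population-based "survival of the fittest" search summarized above can be illustrated with a minimal Python sketch. It is not the implementation of any paper cited in this review: the function names, the parameter defaults, the one-bit-per-gene coding, the toy response `count_ones`, and the bookkeeping of the best chromosome seen so far are all assumptions made for illustration. Selection is a simple roulette wheel, cross-over assigns each gene of a pair at random to one of the two offspring, and mutation flips individual bits.

```python
import random


def select_copy(population, responses, rng):
    """Draw N chromosomes, each with probability proportional to its response."""
    # A tiny offset keeps the weights valid even if every response is zero.
    weights = [r + 1e-9 for r in responses]
    picked = rng.choices(population, weights=weights, k=len(population))
    return [list(chrom) for chrom in picked]


def cross_over(population, rng):
    """Pair chromosomes at random; each gene of a pair goes to one offspring or the other."""
    shuffled = population[:]
    rng.shuffle(shuffled)
    offspring = []
    for parent_a, parent_b in zip(shuffled[0::2], shuffled[1::2]):
        child_1, child_2 = [], []
        for gene_a, gene_b in zip(parent_a, parent_b):
            if rng.random() < 0.5:
                gene_a, gene_b = gene_b, gene_a
            child_1.append(gene_a)
            child_2.append(gene_b)
        offspring += [child_1, child_2]
    if len(shuffled) % 2:  # an unpaired chromosome is carried over unchanged
        offspring.append(shuffled[-1])
    return offspring


def mutate(population, p_mut, rng):
    """Flip each bit independently with probability p_mut."""
    return [[1 - bit if rng.random() < p_mut else bit for bit in chrom]
            for chrom in population]


def run_ga(response, n_bits=20, n_chrom=30, p_mut=0.01, n_gen=50, seed=0):
    """Run a fixed number of generations and return the best chromosome seen."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(n_chrom)]
    best = max(population, key=response)
    for _ in range(n_gen):
        responses = [response(chrom) for chrom in population]
        selected = select_copy(population, responses, rng)
        population = mutate(cross_over(selected, rng), p_mut, rng)
        # The new generation replaces the old one; the best is only remembered.
        best = max(population + [best], key=response)
    return best, response(best)


def count_ones(chrom):
    """Hypothetical toy response: the number of bits set to 1."""
    return sum(chrom)


best, value = run_ga(count_ones)
print(value, best)
```

In a variable selection setting, each bit would instead code the presence or absence of one variable, and the response would be a measure of the predictive ability of the resulting model.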
operator randomly selects a chromosome of the population N times. The probability of a particular chromosome being selected is a function of its associated response, so that the best ones have a greater probability of being picked than the worst ones. Following this step, a new population is obtained in which the best chromosomes are copied more often; this leads to a better average response. In the cross-over step, the N chromosomes forming the new population are randomly paired to form N/2 pairs. From each pair of “parents”, two new chromosomes (the “offspring”) are created by randomly assigning to each of them the genes of one of the two parents. As a result, the cross-over allows the exploration of new experimental conditions by mixing values of variables already tested, although in different combinations.
Mutation: Although the cross-over operator is active at the gene level (whole genes are involved), mutation takes place at the bit level. For each bit of each chromosome, a random number is drawn to decide whether it is to be affected by a mutation. If so, the bit is flipped (it becomes 0 if it was 1 and vice versa). This operator allows the “jump” to new regions of the experimental domain and avoids the risk of being stuck in some specific conditions (if a gene is the same in all the chromosomes of the population, without mutations the value of the corresponding variable would stay the same forever).
After the reproductions and the mutations, the new generation replaces the previous one and the algorithm continues from the evaluation of the response. Figure 1 shows a flowchart of a GA.
In this paper, the authors will review the applications of GAs in three different areas (optimization; quantitative structure-activity relationship (QSAR) and molecular modeling; multivariate calibration); a list of miscellaneous examples in chemometrics will also be given.
Figure 1. Flow chart of a genetic algorithm (GA). Boxes in the chart: Start; Define the architecture of the GA (coding of the variables, number of chromosomes, probability of mutation, response, termination criteria, …); Generate initial population; Select-copy; Cross-over; Mutation; Decode the chromosomes; Evaluate the response of each chromosome; Termination criteria satisfied? (No/Yes); End.
2. APPLICATION OF GENETIC ALGORITHMS IN OPTIMIZATION
Stochastic optimization techniques such as GAs are gaining increasing popularity in various fields of chemistry, and the number of papers describing successful applications continues to grow at a quick rate [26–29]. These methods are especially beneficial when the search space is complex with many local minima (or maxima) so that conventional techniques fail to find the global minimum (or maximum) and a full search is not feasible. Although it is generally accepted that stochastic methods are the best choice in complex search spaces, there is no guarantee that they will find the global optimum [29].
Hibbert [30] used GAs to optimize the rate coefficients for the hydrolysis of adenosine 5′-triphosphate by fitting a kinetic model to concentration versus time data. The fastest convergence to a good optimum is achieved by a hybrid GA in which a steepest descent, pseudo-Newton procedure is iterated with an incest-preventing GA, each providing a starting point for the other. In a study by Hartke [31], a GA is used to find the global minimum energy structure for Si4 on an empirical potential energy surface. Given a suitable encoding of the cluster geometry, and an exponential scaling of the potential energy values to obtain a fitness function, the GA can successfully optimize all degrees of freedom. With the number of potential energy function evaluations as a measure, the GA is more economical than either a set of traditional local minimizations or a molecular dynamics-simulated annealing approach.
Other applications of GAs to optimization are reported in the papers by Weber et al. [32], Jiang et al. [33], Van Kampen et al. [34], Niesse and Mayne [35], Shaffer and Small [13], Lavine et al. [36], Hanger and Huttner [37], Smith and Gemperline [38], Kabrede and Hentschke [39], and Chen et al. [40].
In 2005, Babic et al. [41] reported a method for optimization of a thin layer chromatography separation based on the use of a GA, and in 2006, Yu et al. [42] reported an application of a GA to optimize the buffer system of micellar electrokinetic capillary chromatography for separating the active components contained in a Chinese medicine. Chedly et al. in 2009 [43] used a GA for multiobjective optimization of molded foam characteristics. The effects of injection process parameters on the properties of molded foams are investigated. The input optimization parameters considered are injection temperature, mold temperature, injection speed, plasticization back pressure, and screw rotation speed during the plasticization phase. The output optimization parameters considered are density, shock absorption, and acoustic absorption. Finally, models are used to carry out multiobjective optimization of injected foam characteristics in the presence of a few constraints on decision variables. This optimization is carried out using a very robust technique, Nondominated Sorting Genetic Algorithm II. Several two-objective functions involving sometimes the maximization and other times the minimization of foam characteristics have been studied to illustrate the procedures and explain and interpret the results obtained.
Recently, several papers described applications of GAs in optimization such as Madaeni et al. [44], Cano-Odena et al. [45], Shi and Xue [46], and Vadood et al. [47]. Bhatti et al. in 2011 [48] described a response surface methodology and artificial neural network (ANN) approach for electrocoagulation of copper from simulated wastewater. Multiobjective optimization for maximizing the copper removal efficiency and minimizing the energy
wileyonlinelibrary.com/journal/cem Copyright © 2012 John Wiley & Sons, Ltd. J. Chemometrics (2012)
Genetic algorithms in chemometrics
consumption was carried out using GAs over the ANN model. The optimization procedure resulted in the creation of nondominated optimal points that gave an insight regarding the optimal operating conditions of the process.
Milani and Milani [49] presented a simple closed form equation for the prediction of cross-linking of ethylene propylene diene monomer rubber during accelerated sulfur vulcanization. To estimate numerically the degree of cross-linking, kinetic model constants are evaluated through a simple data fitting, performed on experimental rheometer curves. The fitting procedure is a new one and is achieved using an ad-hoc GA, provided that a few points, strictly required to estimate model unknown constants with sufficient accuracy, are selected from the whole experimental curve. To assess the results obtained with the model proposed, a number of different compounds are analyzed, for which experimental or numerical data are available from the literature. The important cases of moderate and strong reversions are also considered, with convincing convergence of the analytical model proposed.
3. APPLICATION OF GENETIC ALGORITHMS IN QUANTITATIVE STRUCTURE-ACTIVITY RELATIONSHIP/MOLECULAR MODELING
Quantitative structure-activity relationship and quantitative structure–property relationship (QSPR) studies are mainly applied in chemometrics, pharmacodynamics, pharmacokinetics, toxicity, and so on. A major step in constructing QSAR/QSPR models is finding one or more molecular descriptors. A wide variety of descriptors have been reported to be used in QSAR analysis. Whether by traditional methods or multivariate-based techniques, the success of a modeling study depends also on the selection of variables (molecular descriptors) and on the representation of information. Variables should represent the maximum information in activity variations, and collinearity among them must be kept to a minimum. Among different variable selection strategies, GAs are an interesting, flexible, and widely used alternative [50,51].
In 1998, Hou et al. applied a GA to a QSAR study of the inhibitory activities of pyrrolobenzothiazepinones and pyrrolobenzoxazepinones, non-nucleoside HIV-1 reverse transcriptase inhibitors [52].
In 1999, Meusinger and Moros [53] determined the influence of the molecular structure of organic compounds on their knocking behavior by using a nonbinary GA. The molecular structures of 240 potential gasoline components were described by 16 different structural groups. Partial octane numbers were calculated for the structural groups related to the substance classes paraffins, naphthenes, olefins, aromatics, and oxygenates. The sum of the calculated partial octane numbers supplies the octane number of the compound. Multiple linear regression (MLR), a neural network, and a GA were used for the computations of the connections between the structural groups and the knock ratings. Results obtained by GA were significantly better than those obtained by MLR.
In 1999, Hou et al. applied GAs to the structure-activity correlation study of a group of non-nucleoside HIV-1 inhibitors and some cinnamamides [54,55]. In these studies, it has been demonstrated that GAs are very useful in data analysis and that they can be applied as a very powerful technique in QSAR. The authors developed a QSAR program combining a GA with MLR and cross-validation.
Some studies of GAs applied to QSAR/QSPR are reported in the papers by Hoffman et al. [56], Ros et al. [57], Hemmateenejad et al. [58–60], Niculescu [61], Fatemi et al. [62], Kompani-Zareh [63], Guo et al. [64], Niazi et al. [65], and Wang et al. [66].
Ghasemi and Ahmadi [67] applied GAs for variable selection in a QSAR study of a series of pure nonionic surfactants containing linear alkyl, cyclic alkyl, and alkylphenyl ethoxylates. Modeling of the cloud point of these compounds as a function of the theoretically derived descriptors was established by MLR and partial least squares (PLS) regression. The results indicate that GA is a very effective variable selection approach for QSPR analysis. The comparison of the two regression methods used showed that PLS has better prediction ability than MLR.
Jalali-Heravi and Kyani [68] applied GA-KPLS (kernel PLS) as a novel nonlinear feature selection method in a QSAR study. This technique combines GA as a powerful optimization method with KPLS as a robust nonlinear statistical method for variable selection. This feature selection method is combined with ANN to develop a nonlinear QSAR model for predicting activities of a series of substituted aromatic sulfonamides as carbonic anhydrase II inhibitors. Superiority of this method (GA-KPLS-ANN) over MLR and GA-PLS-ANN (in which a linear feature selection method has been used) indicates that the GA-KPLS approach is a powerful method for variable selection in nonlinear systems.
Gharagheizi [69] reported using GA-based MLR for solubility parameter studies. Recently, several papers have been published by Riahi et al. [70], Ghavami et al. [71], Goodarzi et al. [72], Afiuni-Zadeh and Azimi [73], and Hao et al. [74].
4. APPLICATIONS OF GENETIC ALGORITHMS IN MULTIVARIATE CALIBRATION
Multivariate calibration is used to develop a quantitative relationship between the predictor variables in X and the response variable(s) in Y. Recently, multivariate calibration underwent several enhancements and extensions [75,76] that have found widespread use in analytical science. Nowadays, spectral data are perhaps the most common type of data to which chemometric techniques are applied. Owing to the development of new instrumentation, data sets in which each object is described by several hundreds of variables can be easily obtained. Calibration methods, being based on latent variables, allow taking into account the whole spectrum without having to perform a previous feature selection. In the last decades, it has nevertheless been recognized that an efficient feature selection can be highly beneficial both to improve the predictive ability of the model and to greatly reduce its complexity.
One of the greatest problems in multivariate analysis is to select the combination of variables that produces the best result. This goal is attained through the elimination of those variables that produce noise or that, although giving good information, are strictly correlated with other already selected variables. Feature selection is very important both in studies of correlation and in studies of classification and modeling.
Genetic algorithms have found widespread application in several fields involving multivariate calibration because one of the most important steps in a calibration is the selection of the relevant variables. Leardi et al. [77] published one of the very first papers about the application of GAs to variable selection. Lucasius
A. Niazi and R. Leardi
and Kateman [78] showed that a GA generally performs better than simulated annealing and stepwise regression; on the other hand, Horchner and Kalivas [79] demonstrated that simulated annealing can give the same results. Wise et al. [80] also developed a GA for feature selection.
Broudiscou et al. [81] described a new technique based on GAs for constructing experimental designs; also, in 1996, Jouan-Rimbaud et al. [82] studied the random correlation in variable selection using GA in multivariate calibration. Several papers about the application of GAs in multivariate calibration were published before 2000 [83–90].
In 2001, Liu and Wang [91,92] used a GA for the quantitative analysis of overlapped spectra in Fourier transform infrared spectroscopy (FTIR) data, and Yoshida et al. [93] used a GA for feature selection in mass data. Leardi et al. [94] used a GA for variable selection for multivariate calibration for predicting concentrations in polymer films in FTIR data, and several researchers [95–115] published papers in which they used GAs for variable selection in different fields such as spectroscopy, electrochemistry, and chromatography.
Goicoechea and Olivieri [95] presented a new method for wavelength interval selection with a GA to improve the predictive ability of PLS calibration. It involves separately labeling each of the selected sensor ranges with an appropriate inclusion ranking. The new approach intends to alleviate overfitting without the need of preparing an independent monitoring sample set. A theoretical example is worked out to compare the performance of the new approach with previous implementations of GAs. Two experimental data sets are also studied: target parameters are the concentration of glucuronic acid in complex mixtures studied by Fourier transform mid-infrared spectroscopy and the octane number in gasolines monitored by near-infrared spectroscopy. Ghasemi et al. [98] proposed GAs for selecting wavelengths for PLS calibration using a spectrophotometric method. The method is based on the development of the reaction between the analytes and Zincon reagent. A series of synthetic solutions containing different concentrations of copper and zinc were used to check the prediction ability of the GA-PLS models.
Majidi et al. [104] used GAs for potential selection in a differential pulse voltammetry method for the simultaneous determination of cysteine, tyrosine, and tryptophan on the unmodified glassy carbon electrode. The main difficulty in the analysis of these analytes in the same samples is the high degree of overlapping of the voltammograms. The relationships between the currents and the concentrations are complex and highly nonlinear. The predictive abilities of principal component regression (PCR), PLS, GA-PLS, and principal component-artificial neural network (PC-ANN) were examined for simultaneous determination of the three amino acids. For a regression model, everything that does not help in constructing the model may be considered as noise. PC-ANN and GA-PLS use significant data and show superiority over the other applied multivariate methods.
5. MISCELLANEOUS APPLICATIONS
Genetic algorithms were employed in curve fitting [116]. In 1995, Benedetti and Morosetti [117] reported the application of a GA to search for optimal and suboptimal RNA secondary structures. In 1996, Dods et al. [118] used a GA approach for fitting polyatomic spectra. Kariuki et al. [119] described the development of GAs for solving crystal structures directly from powder diffraction data. Hervas et al. [120] coupled GAs and pruning computational neural networks for the selection of the number of inputs required to correct temperature variations in kinetic-based determinations. Giro et al. [121] developed a new methodology to design conducting polymers on the basis of the use of GAs coupled to negative factor counting techniques. The authors showed the results for a case study of polyanilines, one of the most important families of conducting polymers. The methodology proved to be capable of generating automatic solutions for the problem of determining the optimum relative concentration for binary and ternary disordered polyaniline alloys exhibiting metallic properties.
Maeder et al. [122] reported the application of GAs to the task of determining initial parameter estimates that lie near the global optimum. In iterative nonlinear least squares fitting, the reliable estimation of initial parameters that lead to convergence to the global optimum can be difficult. Irrespective of the algorithm used, poor parameter estimates can lead to abortive divergence or in rare cases convergence to a local optimum. For the determination of the parameters of complex reaction mechanisms, where often little is known about what value these parameters should take, the task of determining good initial estimates can be time consuming and unreliable. In this contribution, the methodology of applying a GA to the task of determining initial parameter estimates that lie near the global optimum is explained. A generalized GA was implemented according to the methodology, and the results of its application are also given. The parameter estimates obtained were then used as the starting parameters for a gradient search method, which quickly converged to the global optimum. The GA was successfully applied to both simulated kinetic measurements where the reaction mechanism contained one equilibrium constant and two rate constants to be fitted and to kinetic measurements of the complexation.
Fatemi et al. [123] used GAs in kinetic modeling and reaction mechanism studies. This study is focused on the development of a systematic computational approach that implements GA to find the optimal rigorous kinetic models. This model consists of eight continuous parameters (e.g., Arrhenius and Van’t Hoff parameters) and six discrete parameters representing the order of the reaction with respect to each concentration. The optimal values of these parameters have been obtained on the basis of GA. Furthermore, the best type of genetic operators and their corresponding parameters for this type of problem have been obtained on the basis of a comprehensive study of the effect of these parameters on the efficiency of the GA.
Gianoli et al. [124] reported the application of GAs in kinetic modeling, and also, Sadi and Dabir [125] applied GAs for the determination of kinetic parameters of free radical polymerization of vinyl acetate by a multiobjective optimization technique. Harris [126] studied applications of GAs for obtaining structure solution from powder X-ray diffraction data, and Guruprasad and Behera [127] applied GAs to textiles.
Acknowledgement
Financial support from the Italian Ministry of University and Research (PRIN 2008, CUP:D31J0000020001) is gratefully acknowledged.
REFERENCES
1. Holland JH. Adaptation in Natural and Artificial Systems. The University of Michigan Press, Michigan, 1975.
2. Mitchell M. An introduction to genetic algorithms. The MIT Press, Massachusetts, 1999.
3. Otto M. Chemometrics. Wiley-VCH Verlag GmbH and Co.: Weinheim, 2007.
4. Massart DL, Vandeginste BGM, Buydens LMC, De Jong S, Lewi PJ, Smeyers-Verbeke J. Handbook of Chemometrics and Qualimetrics, Part A. Elsevier Science: Amsterdam, 1997.
5. Vandeginste BGM, Massart DL, Buydens LMC, De Jong S, Lewi PJ, Smeyers-Verbeke J. Handbook of Chemometrics and Qualimetrics, Part B. Elsevier Science: Amsterdam, 1998.
6. Forrest S. Genetic algorithms: principles of natural selection applied to computation. Science 1993; 261: 872–878.
7. Maddox J. Genetics helping molecular dynamics. Nature 1995; 376: 209.
8. Lucasius CB, Kateman G. Understanding and using genetic algorithms. Part 1: concepts, properties and context. Chemometr. Intell. Lab 1993; 19: 1–33.
9. Hibbert DB. Genetic algorithms in chemistry. Chemometr. Intell. Lab 1993; 19: 277–293.
10. Lucasius CB, Beckers MLM, Kateman G. Genetic algorithms in wavelength selection: a comparative study. Anal. Chim. Acta 1994; 286: 135–153.
11. Kemsley EK. A genetic algorithm (GA) approach to the calculation of canonical variates. Trends Anal. Chem. 1998; 17: 24–34.
12. Tominaga Y. Representative subset selection using genetic algorithms. Chemometr. Intell. Lab 1998; 43: 157–163.
13. Shaffer RE, Small GW. Learning optimization from nature: simulated annealing and genetic algorithms. Anal. Chem. 1997; 69: 236A–242A.
14. Wehrens R, Buydens LMC. Evolutionary optimization: a tutorial. Trends Anal. Chem. 1997; 17: 193–203.
15. Luke BT. An overview of genetic methods. In Genetic Algorithms in Molecular Modeling, Devillers J (ed.). Academic Press: New York, 1996; 35–66.
16. Luke BT. Genetic algorithms and beyond. Data Handl. Sci. Techn. 2003; 23: 3–54.
17. Luke BT. Applying genetic algorithms and neural networks to chemometric problems. Data Handl. Sci. Techn. 2003; 23: 343–375.
18. Meusinger R, Himmelreich U. Neural networks and genetic algorithms applications in nuclear magnetic resonance spectroscopy. Data Handl. Sci. Techn. 2003; 23: 281–321.
19. Hibbert DB. Hybrid genetic algorithms. Data Handl. Sci. Techn. 2003; 23: 55–68.
20. Maiocchi A. Genetic algorithms in molecular modeling: a review. Data Handl. Sci. Techn. 2003; 23: 109–139.
21. Leardi R. Genetic algorithms in chemometrics and chemistry: a review. J. Chemometr. 2001; 15: 559–569.
22. Leardi R. Genetic algorithm-PLS as a tool for wavelength selection in spectral data sets. Data Handl. Sci. Techn. 2003; 23: 169–196.
23. Leardi R. Genetic algorithms in chemistry. J. Chromatogr. A 2007; 1158: 226–233.
24. Hou T, Xu X. Applications of genetic algorithms to computer-aided drug design. Prog. Chem. 2004; 16: 35–41.
25. Jouan-Rimbaud D, Massart DL, Leardi R, De Noord OE. Genetic algorithms as a tool for wavelength selection in multivariate calibration. Anal. Chem. 1995; 67: 4295–4301.
26. Clark DE, Westhead DR. Evolutionary algorithms in computer-aided molecular design. J. Comput. Aid. Mol. Des. 1996; 10: 337–358.
27. Devillers J. Genetic Algorithms in Molecular Modeling. Principles of QSAR and Drug Design. Academic Press: New York, 1996.
28. Judson RS. Genetic algorithms and their use in chemistry. In Review in Computational Chemistry, Lipkowitz KB, Boyd DB (eds). VCH Publishers: New York, 1997.
29. Wehrens R, Pretsch E, Buydens LMC. The quality of optimization by genetic algorithms. Anal. Chim. Acta 1999; 388: 265–271.
30. Hibbert DB. A hybrid genetic algorithm for the estimation of kinetic parameters. Chemometr. Intell. Lab 1993; 19: 319–329.
31. Hartke B. Global geometry optimization of clusters using genetic algorithms. J. Phys. Chem. 1993; 97: 9973–9976.
32. Weber L, Wallbaum S, Broger C, Gubernator K. Optimization of the biological activity of combinatorial compound libraries by a genetic algorithm. Angew. Chem. 1995; 34: 2280–2282.
33. Jiang JH, Wang JH, Song XH, Yu RQ. Network training and architecture optimization by a recursive approach and modified genetic algorithm. J. Chemometr. 1996; 10: 253–267.
34. Van Kampen AHC, Buydens LMC, Lucasius CB, Blommers MJJ. Optimization of metric matrix embedding by genetic algorithms. J. Biomol. 1996; 7: 214–224.
35. Niesse JA, Mayne HR. Global optimization of atomic and molecular clusters using the space-fixed modified genetic algorithm method. J. Comput. Chem. 1997; 18: 1233–1244.
36. Lavine BK, Moores A, Helfend LK. Genetic algorithm for pattern recognition analysis of pyrolysis gas chromatographic data. J. Anal. Appl. Pyrol. 1999; 50: 47–62.
37. Hanger J, Huttner G. Optimization and analysis of force field parameters by combination genetic algorithms and neural networks. J. Comput. Chem. 1999; 20: 455–471.
38. Smith BM, Gemperline PJ. Wavelength selection and optimization of pattern recognition methods using the genetic algorithm. Anal. Chim. Acta 2000; 423: 167–177.
39. Kabrede H, Hentschke R. An improved genetic algorithm for global optimization and its application to sodium chloride clusters. J. Phys. Chem. B 2002; 106: 10089–10095.
40. Chen XG, Li X, Kong L, Ni JY, Zhao RH, Zou HF. Application of uniform design and genetic algorithm in optimization of reversed-phase chromatographic separation. Chemometr. Intell. Lab 2003; 67: 157–166.
41. Babic S, Horvat AJM, Kastelan-Macan M. Use of a genetic algorithm to optimize TLC separation. J. Planar Chromat. 2005; 18: 112–117.
42. Yu K, Lin Z, Cheng Y. Optimization of the buffer system of micellar electrokinetic capillary chromatography for the separation of the active components in Chinese medicine ‘SHUANGDAN’ granule by genetic algorithm. Anal. Chim. Acta 2006; 562: 66–72.
43. Chedly S, Chettah A, Ichchou MN. Multiobjective optimization of molded LDPE foams characteristics using genetic algorithm. J. Appl. Polym. Sci. 2009; 114: 358–368.
44. Madaeni SS, Hasankiadeh NT, Kurdian AR, Rahipour A. Modeling and optimization of membrane fabrication using artificial neural network and genetic algorithm. Sep. Purif. Technol. 2010; 76: 33–43.
45. Cano-Odena A, Spilliers M, Dedroog T, De Grave K, Raman J, Vankelecom IFJ. Optimization of cellulose acetate nanofiltration membrane for micropollutant removal via genetic algorithms and high throughput experimentation. J. Membrane Sci. 2011; 366: 25–32.
46. Shi J, Xue X. Optimization design of electrodes for anode-supported solid oxide fuel cells via genetic algorithm. J. Electrochem. Soc. 2011; 158: B143–B151.
47. Vadood M, Semnani D, Morshed M. Optimization of acrylic dry spinning production line by using artificial neural network and genetic algorithm. J. Appl. Polym. Sci. 2011; 120: 735–744.
48. Bhatti MS, Kapoor D, Kalia RK, Reddy AS, Thukral AK. RSM and ANN modeling for electrocoagulation of copper from simulated wastewater; multi objective optimization using genetic algorithm approach. Desalination 2011; 274: 74–80.
49. Milani G, Milani F. EPDM accelerated sulfur vulcanization: A kinetic model based on a genetic algorithm. J. Math. Chem. 2011; 49: 1357–1383.
50. Zupan J, Novic M. General type of a uniform and reversible representation of chemical structures. Anal. Chim. Acta 1997; 348: 409–418.
51. Kompani-Zareh M, Mirzaei M. Genetic algorithm-based method for selection conditions in multivariate determination of povidone-iodine using hand scanner. Anal. Chim. Acta 2004; 521: 231–236.
52. Hou TJ, Wang JM, Li YY, Xu XY. Application of genetic algorithm to the QSAR research of pyrrolobenzothiazepinones and pyrrolobenzoxazepinone-novel and specific non-nucleoside HIV-1 reverse transcription inhibitors. Chin. Chem. Lett. 1998; 9: 651–654.
53. Meusinger R, Moros R. Determination of quantitative structure-octane rating relationships of hydrocarbons by genetic algorithms. Chemometr. Intell. Lab 1999; 46: 67–78.
54. Hou TJ, Wang JM, Xu XJ. Applications of genetic algorithms on the structure-activity correlation study of a group of non-nucleoside HIV-1 inhibitors. Chemometr. Intell. Lab 1999; 45: 303–310.
55. Hou TJ, Wang JM, Liao N, Xu XJ. Applications of genetic algorithms on the structure-activity relationship analysis of some cinnamamides. J. Chem. Inf. Comp. Sci. 1999; 39: 775–781.
56. Hoffman BT, Kopajtic T, Katz JL, Newman AH. 2D QSAR modeling and preliminary database searching for dopamine transporter
inhibitors using genetic algorithm variable selection of Molconn Z descriptors. J. Med. Chem. 2000; 43: 4151–4159.
57. Ros F, Pintore M, Chretien JR. Molecular descriptor selection combining genetic algorithms and fuzzy logic: application to database mining procedure. Chemometr. Intell. Lab 2002; 63: 15–26.
58. Hemmateenejad B, Miri R, Akhond M, Shamsipur M. QSAR study of the calcium channel antagonist activity of some recently synthesized dihydropyridine derivatives: an application of genetic algorithm for variable selection in MLR and PLS methods. Chemometr. Intell. Lab 2002; 64: 91–99.
59. Hemmateenejad B, Akhond M, Miri R, Shamsipur M. Genetic algorithm applied to the selection of factors in principal component-artificial neural networks: application to QSAR study of calcium channel antagonist activity of 1,4-dihydropyridines. J. Chem. Inf. Comp. Sci. 2003; 43: 1328–1334.
60. Hemmateenejad B. Optimal QSAR analysis of the carcinogenic activity of drugs by correlation ranking and genetic algorithm-based PCR. J. Chemometr. 2004; 18: 475–485.
61. Niculescu SP. Artificial neural networks and genetic algorithms in QSAR. J. Mol. Struct. (THEOCHEM) 2003; 622: 71–83.
62. Fatemi MH, Jalali-Heravi M, Konuze E. Prediction of bioconcentration factor using genetic algorithm and artificial neural network. Anal. Chim. Acta 2003; 486: 101–108.
63. Kompani-Zareh M. A QSPR study of boiling point of saturated alcohols using genetic algorithm. Acta Chim. Slov. 2003; 50: 259–273.
64. Guo W, Cai W, Shao X, Pan Z. Application of genetic stochastic resonance algorithm to quantitative structure-activity relationship study. Chemometr. Intell. Lab 2005; 75: 181–188.
65. Niazi A, Jameh-Bozorghi S, Nori-Shargh D. Prediction of acidity constants of thiazolidine-4-carboxylic acid derivatives using Ab initio and genetic algorithm-partial least squares. Turk. J. Chem. 2006; 30: 619–628.
66. Wang J, Krudy G, Xie XQ, Wu C, Holland G. Genetic algorithm-optimized QSPR model for bioavailability, protein binding, and urinary excretion. J. Chem. Inf. Model. 2006; 46: 2674–2683.
67. Ghasemi J, Ahmadi S. Combination of genetic algorithm and partial least squares for cloud point prediction of nonionic surfactants
80. Wise BM, Gallagher NB, Eschbach PA, Sharpe SW, Griffin JW. Optimization of prediction error using genetic algorithms and continuum regression: determination of the reactivity of automobile emissions from FTIR spectra. Fourth Scand. Symp. on Chemometrics (SSC4), Lund, June 1995.
81. Broudiscou A, Leardi R, Phan-Tan-Luu R. Genetic algorithm as a tool for selection of D-optimal design. Chemometr. Intell. Lab 1996; 35: 105–116.
82. Jouan-Rimbaud D, Massart DL, De Noord OE. Random correlation in variable selection for multivariate calibration with a genetic algorithm. Chemometr. Intell. Lab 1996; 35: 213–220.
83. Broadhurst D, Goodacre R, Jones A, Rowland JJ, Kell DB. Genetic algorithms as a method for variable selection in multiple linear regression and partial least squares regression. Anal. Chim. Acta 1997; 348: 71–86.
84. Arcos MJ, Alonso C, Ortiz MC. Genetic-algorithm-based potential selection in multivariate voltammetric determination of indomethacin and acemethacin by partial least squares. Electrochim. Acta 1998; 43: 479–485.
85. Leardi R, Lupianez Gonzalez A. Genetic algorithm applied to feature selection in PLS regression: How and when to use them. Chemometr. Intell. Lab 1998; 41: 195–207.
86. Ding Q, Small GW, Arnold MA. Genetic algorithm-based wavelength selection for the near-infrared determination of glucose in biological matrixes: initialization strategies and effects of spectral resolution. Anal. Chem. 1998; 70: 4472–4479.
87. Frost VJ, Molt K. Use of genetic algorithm for factor selection in principal component regression. J. Near Infrared Spec. 1998; 6: A185–A190.
88. Wang J, Xian R, Yang B, Wang D, Wang Y, Chen S. Application of genetic algorithm-spectrophotometric method for the multicomponent simultaneous determination of rare earth elements in geological samples. Fenxi Huaxue 1999; 27: 955–956.
89. Roger JM, Bellon-Maurel V. Using genetic algorithms to select wavelengths in near-infrared spectra: application to sugar content prediction in cherries. Appl. Spectrosc. 2000; 59: 1313–1320.
90. Leardi R. Application of genetic algorithm-PLS for feature selection
from molecular structures. Ann. Chim. 2007; 97: 69–83. in spectral data sets. J. Chemometr. 2000; 14: 643–655.
68. Jalali-Heravi M, Kyani A. Application of genetic algorithm-kernel 91. Liu F, Wang JD. Using genetic algorithm for quantitative analysis of
partial least squares as a novel nonlinear feature selection method: overlapped spectra in FTIR spectra. Spectroscopy Spectral Anal.
activity of carbonic anhydrase II inhibitors. Eur. J. Med. Chem. 2007; 2001; 21: 609–610.
45: 649–659. 92. Liu F, Wang JD. Application of a genetic algorithm to quantitative
69. Gharagheizi F. QSPR studies for solubility parameter by means of analysis of overlapped FTIR spectra. Spectrosc. Lett. 2001; 34: 13–24.
genetic algorithm-based multivariate linear regression and generalized 93. Yoshida H, Leardi R, Funatsu K, Varmuza K. Feature selection by
regression neural network. QSAR Comb. Sci. 2008; 27: 165–170. genetic algorithms for mass spectral classifiers. Anal. Chim. Acta
70. Riahi S, Ganjali MR, Pourbasheer E, Norouzi P. QSPR study of GC retention 2001; 446: 485–494.
indices of essential oil compounds by multiple linear regression with a 94. Leardi R, Seasholtz MB, Pell RJ. Variable selection for multivariate
genetic algorithm. Chromatographia 2008; 67: 917–922. calibration using a genetic algorithm: prediction of additive
71. Ghavami R, Najafi A, Sajadi M, Djannaty F. Genetic algorithm as concentrations in polymer films from Fourier transform-infrared
variable selection procedure for the simulation of 13 C nuclear spectral data. Anal. Chim. Acta 2002; 461: 189–200.
magnetic resonance spectra of flavonoid derivatives using multiple 95. Goicoechea HC, Olivieri AC. Wavelength selection for multivariate
linear regression. J. Mol. Graph. Model. 2008; 27: 105–115. calibration using a genetic algorithm: a novel initialization strategy.
72. Goodarzi M, Freitas MP, Wu CH, Duchowicz PR. pKa modeling and J. Chem. Inf. Comp. Sci. 2002; 45: 1146–1153.
prediction of series of pH indicators through genetic algorithm- 96. Dieterle F, Kieser B, Gauglitz G. Genetic algorithms and neural
least square support vector regression. Chemometr. Intell. Lab networks for quantitative analysis of ternary mixtures using surface
2010; 101: 102–109. plasmon resonance. Chemometr. Intell. Lab 2003; 65: 67–81.
73. Afiuni-Zadeh S, Azimi G. A QSAR for modeling of 8-azaadenine 97. Chen K, Li T, Lu P. Application of genetic algorithms in resolution of
analogues proposed as Al adenosine receptor antagonists using chromatogram. Fenxi Huaxue 2003; 31: 158–162.
genetic algorithm coupling adaptive neuro-fuzzy inference system. 98. Ghasemi J, Niazi A, Leardi R. Genetic-algorithm-based wavelength
Anal. Sci. 2010; 26: 897–902. selection in multicomponent spectrophotometric determination
74. Hao, M, Li Y, Wang Y, Zhang S. Prediction of P2Y12 antagonists by PLS: application on copper and zinc mixture. Talanta 2003;
using a novel genetic algorithm-support vector machine coupled 59: 311–317.
approach. Anal. Chim. Acta 2011; 690: 56–63. 99. Goicoechea HC, Olivieri AC. A new family of genetic algorithms for
75. Gabrielsson J, Trygg J. Recent developments in multivariate calibration. wavelength interval selection in multivariate analytical spectroscopy.
Crit. Rev. Anal. Chem. 36; 2006: 243–255. J. Chemometr. 2003; 17: 338–345.
76. Wold S, Trygg J, Berglund A, Antii H. Some recent developments in 100. Lestander TA, Leardi R, Geladi P. Selection of near infrared
PLS mg. Chemometr. Intell. Lab 2001; 58: 131–151. wavelengths using genetic algorithms for the determination of
77. Leardi R, Boggia R, Terrile M. Genetic algorithms as a strategy for seed moisture content. J. Near Infrared Spec. 2003; 11: 433–446.
feature selection. J. Chemometr. 1992; 6: 267–281. 101. Abdollahi H, Bagheri L. Simultaneous spectrophotometric
78. Lucasius CB, Kateman G. Genetic algorithms for large-scale optimi- determination of vitamin K3 and 1,4-naphthoquinone after cloud
zation in chemometrics: an application. Trends Anal. Chem. 1991; point extraction by using genetic algorithm based wavelength
10: 254–261. selection-partial least squares regression. Anal. Chim. Acta 2004;
79. Horchner U, Kalivas JH. Further investigation on a comparative 514: 211–218.
study on simulated annealing and genetic algorithm for wave- 102. Abdollahi H, Bagheri L. Simultaneous spectrophotometric of p-
lengths selection. Anal. Chim. Acta 1995; 311: 1–13. benzoquinone and chloranil after microcrystalline naphthalene
extraction using genetic algorithm-based wavelength selection-partial least squares regression. Anal. Sci. 2004; 20: 1701–1706.
103. Kompani-Zareh M, Farrokhi-Kurd S. Genetic algorithm applied to the selection of conditions for the simultaneous quantification of three-food colorants using a hand scanner. Microchim. Acta 2005; 150: 77–85.
104. Majidi MR, Jouyban A, Asadpour-Zeynali K. Genetic algorithm based potential selection in simultaneous voltammetric determination of isoniazid and hydrazine by using partial least squares and artificial neural networks. Electroanalysis 2005; 17: 915–918.
105. Zinn P. Adaptive multicomponent analysis by genetic algorithms. J. Chem. Inf. Model. 2005; 45: 880–887.
106. Reynes C, De Souza S, Sabatier R, Figueres G, Vidal B. Selection of discriminant wavelength intervals in NIR spectrometry with genetic algorithms. J. Chemometr. 2006; 20: 136–145.
107. Niazi A, Soufi A, Mobarakabadi M. Genetic algorithm applied to selection of wavelength in partial least squares for simultaneous spectrophotometric determination of nitrophenol isomers. Anal. Lett. 2006; 39: 2359–2372.
108. Ghasemi J, Ebrahimi DM, Hejazi L, Leardi R, Niazi A. Simultaneous kinetic-spectrophotometric determination of sulfide and sulfite by partial least squares and genetic algorithms variable selection. J. Anal. Chem. 2007; 62: 348–354.
109. Carneiro RL, Braga JWB, Bottoli CBG, Poppi RJ. Application of genetic algorithm for selection of variables for the BLLS method applied to determination of pesticides and metabolites in wine. Anal. Chim. Acta 2007; 595: 51–58.
110. Tewari JC, Dixit V, Cho BK, Malik KA. Determination of origin and sugars of citrus fruits using genetic algorithm, correspondence analysis and partial least square combined with fiber optic NIR spectroscopy. Spectrochim. Acta A 2008; 71: 1119–1127.
111. Fei Q, Li M, Wang B, Huan Y, Feng G, Ren Y. Analysis of cefalexin with NIR spectrometry coupled to artificial neural networks with modified genetic algorithm for wavelength selection. Chemometr. Intell. Lab 2009; 97: 127–131.
112. Zou X, Zhao J, Mao H, Shi J, Yin X, Li Y. Genetic algorithm interval partial least squares regression combined successive projection algorithm for variable selection in near-infrared quantitative analysis of pigment in cucumber leaves. Appl. Spectrosc. 2010; 64: 786–794.
113. Csefalvayova L, Pelikan M, Kralj Cigic I, Kolar J, Strli M. Use of genetic algorithms with multivariate regression for determination of gelatine in historic papers based on FT-IR and NIR spectral data. Talanta 2010; 82: 1784–1790.
114. Arakawa M, Yamashita Y, Funatsu K. Genetic algorithm-based wavelength selection method for spectral calibration. J. Chemometr. 2011; 25: 10–19.
115. De Weijer AP, Lucasius CB, Buydens LMC, Kateman G, Heuvel HM, Mannee H. Curve fitting using natural computation. Anal. Chem. 1994; 66: 23–31.
116. Dane AD, Veldusi A, de Beer DKG, Leenaers AJG, Buydens LMC. Application of genetic algorithms for characterization of thin layer materials by glancing incidence X-ray refractometry. Physica B 1998; 253: 254–268.
117. Benedetti G, Morosetti S. A genetic algorithm to search for optimal and suboptimal RNA secondary structures. Biophys. Chem. 1995; 55: 253–259.
118. Dods J, Gruner D, Brumer P. A genetic algorithm approach to fitting polyatomic spectra via geometry shifts. Chem. Phys. Lett. 1996; 261: 612–619.
119. Kariuki BM, Johnston RL, Harris KDM, Psallidas K, Ahn S, Serrano-Gonzalez H. Application of a genetic algorithm in structure determination from powder diffraction data. Match 1998; 38: 123–135.
120. Hervas C, Algar JA, Silva M. Correction of temperature variations in kinetic-based determinations by use of pruning computational neural networks in conjunction with genetic algorithms. J. Chem. Inf. Comp. Sci. 2000; 40: 724–731.
121. Giro R, Cyrillo M, Galvao DS. Designing conducting polymers using genetic algorithms. Chem. Phys. Lett. 2002; 366: 170–175.
122. Maeder M, Neuhold YM, Puxty G. Applications of a genetic algorithm: near optimal estimation of the rate and equilibrium constants of complex reaction mechanism. Chemometr. Intell. Lab 2004; 70: 193–203.
123. Fatemi S, Masoori M, Bozorgmehry Boozarjomehry R. Application of genetic algorithm in kinetic modeling and reaction mechanism studies. Iran. J. Chem. Chem. Eng. 2005; 24: 37–46.
124. Gianoli SI, Puxty G, Fisher U, Maeder M, Hungerbuchler K. Empirical kinetic modeling of on line simultaneous infrared and calorimetric measurement using a Pareto optimal approach and multi-objective genetic algorithm. Chemometr. Intell. Lab 2007; 85: 47–62.
125. Sadi M, Dabir B. Application of genetic algorithm to determine kinetic parameters of free radical polymerization of vinyl acetate by multi-objective optimization technique. Iran. J. Chem. Chem. Eng. 2007; 26: 29–37.
126. Harris KDM. Fundamentals and applications of genetic algorithms for structure solution from powder X-ray diffraction data. Comp. Mat. Sci. 2009; 45: 16–20.
127. Guruprasad R, Behera BK. Genetic algorithms and its application to textile. Textile Asia 2009; 40: 35–38.