Abstract
Groundwater is considered one of the most valuable freshwater resources. The main objective of this study was to produce groundwater spring potential maps for the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence the locations of springs were considered: slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Of the 864 springs identified, 605 (≈70 %) were used to build the spring potential models, and the remaining 259 (≈30 %) were used for model validation. Groundwater spring potential was then modeled and mapped using the CART, RF, and BRT algorithms, and the predictions of the three models were validated using the receiver operating characteristic (ROC) curve. The area under the curve (AUC) was 0.8103 for the BRT model, 0.7870 for CART, and 0.7119 for RF. It was therefore concluded that the BRT model predicted the locations of springs best, followed by the CART and RF models. Geospatially integrated BRT, CART, and RF methods proved useful for generating the spring potential map (SPM) with reasonable accuracy.
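As an illustration of the validation scheme summarized in the abstract, the following is a minimal sketch, not the authors' implementation. It shows a reproducible 70/30 split of the 864 spring locations and an AUC computed as the probability that a randomly chosen positive (spring) location receives a higher score than a randomly chosen negative (non-spring) location. The function names, the random seed, the use of non-spring locations as negatives, and the toy scores are all assumptions made for this example.

```python
import random


def split_train_validation(items, train_fraction=0.7, seed=0):
    """Shuffle items reproducibly and split them into training and validation lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5).

    labels: 1 for a spring location, 0 for a non-spring location (assumed coding).
    scores: predicted spring potential for each location.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical spring identifiers 0..863 stand in for the 864 mapped springs.
train, validation = split_train_validation(range(864))
print(len(train), len(validation))  # 605 259

# Toy scores for four locations (two springs, two non-springs).
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

In this sketch an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.8103, 0.7870, and 0.7119 are compared.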
Acknowledgments
The authors would like to thank the anonymous reviewers for their helpful comments on the previous version of the manuscript.
Cite this article
Naghibi, S.A., Pourghasemi, H.R. & Dixon, B. GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran. Environ Monit Assess 188, 44 (2016). https://doi.org/10.1007/s10661-015-5049-6
DOI: https://doi.org/10.1007/s10661-015-5049-6