Abstract
There has recently been renewed interest in the skew-normal distribution (SND) because it is a natural generalization of the usual normal distribution that accommodates skewed data. In this study we use SND errors in a regression set-up, discuss a step-by-step approach to estimating all the model parameters, and show how the resulting SND-based regression model can lead to a better fit to a given dataset. This generalization improves the precision of predicting a future value of the response variable when the values of the independent (or input) variables are available. We validate the applicability of the proposed SND-based regression model using a recently acquired dataset from the Mekong Delta Region (MDR) of Vietnam, which motivated this study from a public health perspective. Using existing survey data, the proposed model allows stakeholders to better predict the groundwater arsenic level at a site from its geographic characteristics, in lieu of costly chemical analyses, which can be very beneficial to developing countries given their resource constraints.
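As a rough illustration of the regression set-up with SND errors described in the abstract, the following Python sketch evaluates the log-likelihood of a linear model whose errors follow a skew-normal distribution. It assumes Azzalini's parametrization with zero location, scale `sigma` and shape `lam`; the parametrization used in the paper may differ. All function names and the toy data are hypothetical and are not taken from the authors' R code.

```python
import math


def norm_logpdf(z):
    """Log density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal CDF, written with erfc for better accuracy in the left tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def sn_logpdf(e, sigma, lam):
    """Log density of a skew-normal error (assumed: location 0, scale sigma, shape lam).

    Density: (2 / sigma) * phi(e / sigma) * Phi(lam * e / sigma).
    Setting lam = 0 recovers the N(0, sigma^2) density.
    """
    z = e / sigma
    return math.log(2.0) - math.log(sigma) + norm_logpdf(z) + math.log(norm_cdf(lam * z))


def sn_error_mean(sigma, lam):
    """Mean of the skew-normal error; it is nonzero whenever lam != 0."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    return sigma * delta * math.sqrt(2.0 / math.pi)


def sn_regression_loglik(beta, sigma, lam, X, y):
    """Log-likelihood of y_i = x_i . beta + e_i with independent skew-normal errors e_i."""
    total = 0.0
    for row, yi in zip(X, y):
        fitted = sum(b * x for b, x in zip(beta, row))
        total += sn_logpdf(yi - fitted, sigma, lam)
    return total


# Hypothetical toy data: an intercept column plus one input variable.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.1, 2.9, 5.2, 7.4]
beta = [1.0, 2.0]

ll_normal = sn_regression_loglik(beta, sigma=0.5, lam=0.0, X=X, y=y)  # normal-error special case
ll_skew = sn_regression_loglik(beta, sigma=0.5, lam=2.0, X=X, y=y)    # right-skewed errors
print(ll_normal, ll_skew)
```

In practice this log-likelihood would be maximized over `beta`, `sigma` and `lam` with a numerical optimizer. Because the error mean is nonzero when `lam` differs from zero, a point prediction of the response under this parametrization would add `sn_error_mean(sigma, lam)` to the fitted linear part.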
Acknowledgements
We would like to thank the two anonymous referees, who reviewed the first draft of this paper meticulously and made critical as well as constructive comments that helped us tremendously in improving the presentation of this work. We would also like to thank Assoc. Prof. Phu Le Vo, Faculty of Environment and Natural Resources, Ho Chi Minh City University of Technology VNU-HCM, Vietnam, and Prof. Rizlan Bernier-Latmani, Environmental Microbiology Laboratory (EML) at EPFL, Switzerland, for allowing us to use the arsenic dataset. The first author also expresses her gratitude to the Department of Mathematics, Faculty of Science, Mahidol University, for supporting her doctoral research, as this paper constitutes part of her doctoral dissertation. This work was done while the second author was visiting Ton Duc Thang University (TDTU), Vietnam, on sabbatical leave from the University of Louisiana at Lafayette to supervise the first author’s research. He would like to express his sincere thanks to the TDTU administration for their generous hospitality.
Additional information
Handling Editor: Pierre Dutilleul.
Appendix
R code for computing the estimated parameters under SND errors (see Sect. 2.1)
Cite this article
Huynh, U., Pal, N. & Nguyen, M. Regression model under skew-normal error with applications in predicting groundwater arsenic level in the Mekong Delta Region. Environ Ecol Stat 28, 323–353 (2021). https://doi.org/10.1007/s10651-021-00488-2
DOI: https://doi.org/10.1007/s10651-021-00488-2
Keywords
- Bootstrap method
- Least squares method
- Normal distribution
- Prediction mean absolute error
- Prediction mean squared error
- Skew-normal distribution