Using a legacy soil sample to develop a mid-IR spectral library

R. A. Viscarra Rossel A,B,C, Y. S. Jeon B, I. O. A. Odeh B and A. B. McBratney A,B
A Australian Centre for Precision Agriculture, Faculty of Agriculture, Food & Natural Resources, The University of Sydney, NSW 2006, Australia.

B Faculty of Agriculture, Food & Natural Resources, The University of Sydney, NSW 2006, Australia.

C Corresponding author. Email: r.viscarra-rossel@usyd.edu.au

Australian Journal of Soil Research 46(1) 1-16 https://doi.org/10.1071/SR07099
Submitted: 13 July 2007  Accepted: 21 November 2007   Published: 8 February 2008


This paper describes the development of a diffuse reflectance spectral library from a legacy soil sample. When developing a soil spectral library, it is important to consider the number of samples needed to adequately describe the soil variability in the region where the library is to be used; the manner in which the soil is sampled, handled, prepared, stored, and scanned; and the reference analytical procedures used. As with any type of modelling, the dictum is ‘garbage in = garbage out’ and, one hopes, the converse ‘quality in = quality out’. The aims of this paper are to: (i) develop a soil mid-infrared (mid-IR) diffuse reflectance spectral library for cotton-growing regions of eastern Australia from a legacy soil sample, (ii) derive soil spectral calibrations for the prediction of soil properties with uncertainty, and (iii) assess the accuracy of the predictions and populate the legacy soil database with good quality information. A scheme for the construction and use of this spectral library is presented. A total of 1878 soil samples from different layers were scanned. They originated from the Upper Namoi, Namoi, and Gwydir Valley catchments of north-western New South Wales (NSW) and the McIntyre region of southern Queensland (Qld). A conditioned Latin hypercube sampling (cLHS) scheme was used to sample the spectral data space and select 213 representative samples for laboratory soil analyses. Partial least-squares regression (PLSR) was applied to these data to construct the calibration models, which were validated internally using cross-validation and externally using an independent test dataset. Models for organic C (OC), cation exchange capacity (CEC), clay content, exchangeable Ca, total N (TN), total C (TC), gravimetric moisture content (θg), total sand, and exchangeable Mg were robust and produced accurate results (adjusted R² > 0.75 for both cross-validation and test-set validation). The root mean squared error (RMSE) of the mid-IR-PLSR predictions was compared with that of (blind) duplicate laboratory measurements. Mid-IR-PLSR produced lower RMSE values for soil OC, clay content, and θg. Finally, bootstrap aggregation-PLSR (bagging-PLSR) was used to predict soil properties with uncertainty for the entire library, thus repopulating the legacy soil database with good quality soil information.
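
The bagging-PLSR idea summarised above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single-response PLS model (PLS1) fitted with the NIPALS algorithm in plain Python, and the function names (`pls1_fit`, `pls1_predict`, `bagging_pls1`), the number of components, the number of bootstrap replicates, and the use of the standard deviation across replicates as the uncertainty measure are all assumptions chosen for illustration.

```python
import random
import statistics


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (PLS1) with the NIPALS algorithm.

    X is a list of rows (spectra), y a list of reference values.
    Returns the column means, the response mean, and the per-component
    weights, loadings, and regression coefficients.
    """
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                # centred y
    comps = []
    for _ in range(n_components):
        # Weight vector: direction in X most covariant with the residual y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:      # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        if tt < 1e-12:
            break
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    """Predict the response for one new spectrum x by sequential deflation."""
    x_mean, y_mean, comps = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in comps:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


def bagging_pls1(X, y, X_new, n_components=2, n_boot=50, seed=0):
    """Bootstrap-aggregated PLS1.

    Each replicate resamples the calibration set with replacement and
    refits the model. Returns, for each new spectrum, the mean prediction
    and the standard deviation across replicates (an assumed uncertainty
    measure; n_boot must be at least 2).
    """
    rng = random.Random(seed)
    n = len(X)
    preds = [[] for _ in X_new]
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        model = pls1_fit([X[i] for i in idx], [y[i] for i in idx], n_components)
        for k, x in enumerate(X_new):
            preds[k].append(pls1_predict(model, x))
    return [(statistics.mean(p), statistics.stdev(p)) for p in preds]
```

In a workflow like the one described, `X` and `y` would hold the spectra and laboratory values of the calibration samples, and `X_new` the spectra of the remaining library samples; each returned pair is then a prediction with an associated spread. In practice the number of components would be chosen by cross-validation rather than fixed in advance.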

Additional keywords: mid-IR diffuse reflectance spectroscopy, spectral library, partial least squares regression, bagging-PLSR, legacy soil data.


We wish to acknowledge the Cotton Catchment and Communities CRC (CCC CRC) and the Grains Research and Development Corporation (GRDC) for their financial support.


