Machine Learning Approach to Model Soil Resistivity Using Field Instrumentation Data

Alam, Md Jobair Bin; Gunda, Ashish; Ahmed, Asif

doi:10.3390/geotechnics5010005


by Md Jobair Bin Alam ¹,*, Ashish Gunda ² and Asif Ahmed ²

¹ Department of Civil and Environmental Engineering, Roy G. Perry College of Engineering, Prairie View A&M University, Prairie View, TX 77446, USA

² College of Engineering, SUNY Polytechnic Institute, Utica, NY 13502, USA

* Author to whom correspondence should be addressed.

Geotechnics 2025, 5(1), 5; https://doi.org/10.3390/geotechnics5010005

Submission received: 9 December 2024 / Revised: 2 January 2025 / Accepted: 6 January 2025 / Published: 11 January 2025

(This article belongs to the Special Issue Recent Advances in Geotechnical Engineering (2nd Edition))


Abstract

Sub-surface soil hydrological characterization is a challenging task for engineers and soil scientists, particularly because the underlying hydrological processes couple key variables such as soil moisture, matric suction, and soil temperature. The ability to infer these variables through a single measurable soil property, soil resistivity, can potentially improve sub-surface characterization. This research leverages various machine learning algorithms to develop predictive models trained on a comprehensive dataset of sensor-based soil moisture, matric suction, and soil temperature obtained from prototype ET covers, with known resistivity values. Different types of sensors were installed at multiple depths in the ET covers, and resistivity tests were conducted periodically at the same location. Cross-validation and feature selection methods were used to optimize model performance and identify the key variables that most significantly impact soil resistivity. A strong inverse correlation between soil moisture and resistivity (r = −0.88) and weak positive correlations of resistivity with temperature (r = 0.41) and suction (r = 0.34) were observed. Among the machine learning models evaluated, artificial neural networks and support vector machines demonstrated superior predictive performance, achieving a coefficient of determination (R²) above 0.77 and lower root mean square error (RMSE) values (less than 0.14). Linear regression and decision tree models exhibited suboptimal performance because of their limitations in capturing non-linear relationships and overfitting, respectively. Random forest demonstrated superior generalization capabilities compared to decision trees; however, it encountered challenges with mid-range data variability. The findings demonstrate the effectiveness of artificial neural networks in predicting field-scale soil resistivity by utilizing hydrological variables.

Keywords: electrical resistivity; ET cover; machine learning; decision tree; random forest; support vector machine; artificial neural network

1. Introduction

Soil resistivity is a critical parameter influencing many geotechnical, hydrological, and agricultural processes. It serves as an indirect measure of various soil properties, including moisture content, matric suction, and temperature [1,2], among others. The electrical resistivity of soil adheres to Ohm’s law, with potential differences being measured through the transfer of artificially generated currents into the soil [3]. The electrical resistivity of soil reflects its ability to conduct electrical current, which is influenced by factors such as sub-surface soil moisture content and temperature [4], the degree of saturation [5], organic content [6], pore water composition [7], geologic formation [8], ion concentration in pore water, soil texture, and structure [9,10]. Therefore, understanding the data-driven interplay between soil resistivity and other soil hydrologic properties is crucial for applications like soil water management, landfill cover hydrology, groundwater monitoring, slope stability analysis, and precision agriculture [11,12].

Among the factors influencing soil resistivity, soil moisture and matric suction are particularly significant due to their direct impact on the soil’s electrical conductivity and pore water distribution. Soil moisture significantly affects soil resistivity, as water in the soil enhances ionic mobility, reducing resistivity [13,14]. Studies have demonstrated that resistivity decreases exponentially with increasing moisture content, particularly in fine-grained soils where pore water dominates the conduction process [15,16]. Soil resistivity measurements provide a non-invasive means to estimate soil moisture, making them invaluable in hydrological modeling for geotechnical infrastructures and irrigation scheduling [17,18]. Suction determines the distribution of water in soil pores, affecting the connectivity of water films that facilitate electrical conductivity [2,3]. High-suction conditions, indicative of dry soils, correspond to higher resistivity values due to the limited presence of conductive pore water [19,20]. Thus, integrating resistivity data with suction measurements offers deeper insights into soil–water dynamics and unsaturated soil behavior [21,22]. Soil temperature introduces additional variability in soil resistivity by altering the ionic mobility in pore water. Increased temperatures generally decrease resistivity as higher thermal energy enhances ion diffusion [23].

Earth infrastructures are exposed to the environment and, accordingly, the hydraulic properties of the soil of the infrastructures change substantially due to various natural processes such as wetting–drying cycling, the freeze-thaw effect, etc. [24]. Recent studies have demonstrated the potential of sensor-based soil moisture and suction measurements of various infrastructures for the real-time monitoring of soil hydrologic conditions [4,8]. For instance, time domain reflectometry (TDR) [25], thermal dissipation sensors [26], dielectric sensors, and tensiometers have enabled the continuous monitoring of these parameters in the field, enhancing the resolution and accuracy of resistivity estimations [12,13]. The integration of such field data with advanced machine learning (ML) models further presents a transformative approach to soil resistivity characterization [27,28]. ML models have emerged as powerful tools for analyzing these datasets, offering a means to capture non-linear and multi-variable relationships [29,30]. By integrating resistivity data with soil moisture, suction, and temperature measurements, ML models can provide reliable predictions of soil behavior under varying environmental conditions [31,32]. ML models, including artificial neural networks (ANNs), support vector machines (SVMs), random forests (RFs), and gradient boosting, have shown promise in capturing complex relationships between soil properties and resistivity [16,31]. The use of ML models is particularly evident in heterogeneous field conditions, where conventional methods often struggle to generalize [27].

Understanding the relationships between soil resistivity, moisture content, and matric suction has broad implications for fundamental soil science and applied engineering. Furthermore, integrating ML with field sensor data enables a scalable and adaptive framework for resistivity monitoring, providing insights at unprecedented temporal and spatial resolutions [33,34]; eventually, a large-scale site characterization can be facilitated by resistivity. While significant progress has been made in applying ML models for geotechnical engineering purposes, challenges remain in optimizing these models for field-scale applications. One critical issue is the lack of adequate data, variability in data quality, sensor installation methods, and environmental factors such as temperature fluctuations and vegetation [35]. The lack of standardized datasets for training and validating ML models in soil resistivity characterization further complicates their application. Most existing studies rely on limited datasets, often collected under controlled laboratory conditions [36,37], which may not fully represent the complexities of natural soils. Thus, there is a growing need for field-based datasets that incorporate continuous moisture variabilities, suction ranges, and soil temperature conditions, along with robust methodologies for data preprocessing and model evaluation. In this study, we aimed to address these gaps by exploring different ML models for predicting the field resistivity of clayey soil using sensor-based moisture, suction, and temperature data. Specifically, we evaluated the performance of models such as linear regression (LR) models, decision tree regressors (DTRs), RFs, SVMs, and ANNs in characterizing soil resistivity under variable field conditions. We also investigated the role of feature selection, data normalization, and model tuning in enhancing prediction accuracy and generalizability.

The study was conducted on evapotranspiration (ET) covers located in the City of Denton Landfill, Texas, USA, where six large-scale (1219 cm by 1219 cm) prototype ET covers were constructed using fine-grained soil (CH). Field hydrologic data were collected using state-of-the-art moisture and temperature sensors and tensiometers and calibrated to ensure reliability and consistency. Field electrical resistivity tests were conducted periodically using an advanced resistivity meter to portray electrical resistivity tomography (ERT), followed by data extraction from the ERT. The datasets were then integrated into a structured ML pipeline, encompassing data preprocessing, model training, and performance evaluation using cross-validation.

2. Electrical Resistivity Tomography (ERT)

Electrical resistivity tomography (ERT) is a geophysical imaging technique used to characterize the subsurface by mapping variations in electrical resistivity. It has emerged as a powerful method for characterizing subsurface structures and processes due to its non-destructive nature and ability to provide spatially distributed data. This method provides valuable insights into geological, hydrogeological, and environmental systems and has applications in a wide range of disciplines, including geotechnical engineering, archeology, and environmental monitoring.

ERT involves injecting a controlled current into the ground using electrodes and measuring the resulting voltage differences. The basic measurement principle relies on Ohm’s Law [38]:

R = V/I

(1)

where V is the voltage, I is the current, and R is the resistance. The configuration of electrodes influences the depth of investigation and resolution. Common electrode arrays include Wenner, Schlumberger, and dipole-dipole configurations, each with its unique advantages in spatial resolution and depth sensitivity [3]. By systematically varying the positions of current and potential electrodes, ERT collects apparent resistivity data over a grid, which reflects the subsurface resistivity distribution [39].
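As a minimal sketch of how Equation (1) feeds into an apparent resistivity value, the snippet below applies the standard geometric factor for a dipole-dipole array, K = π n (n + 1)(n + 2) a, where a is the dipole length and n is the dipole separation factor. The function name and the voltage, current, and spacing values are illustrative assumptions and are not measurements from this study.

```python
import math


def dipole_dipole_apparent_resistivity(voltage_v, current_a, spacing_m, n):
    """Apparent resistivity (Ohm-m) for a dipole-dipole array.

    voltage_v : measured potential difference (V)
    current_a : injected current (A)
    spacing_m : dipole length a (m)
    n         : dipole separation factor (integer multiple of a)
    """
    resistance = voltage_v / current_a  # Ohm's law, Equation (1)
    geometric_factor = math.pi * n * (n + 1) * (n + 2) * spacing_m
    return geometric_factor * resistance


# Hypothetical reading: 50 mV measured for 100 mA injected, a = 0.1524 m, n = 1
rho_a = dipole_dipole_apparent_resistivity(0.05, 0.1, 0.1524, 1)
print(f"Apparent resistivity: {rho_a:.3f} Ohm-m")
```

The apparent values computed this way are inputs to the inversion and are not the true subsurface resistivities.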

The collected apparent resistivity data are processed to estimate the true resistivity distribution. This requires solving an inverse problem, where numerical algorithms convert surface measurements into a resistivity model of the subsurface. Techniques like least-squares inversion, with regularization to stabilize the solution, are commonly employed. However, the inversion process is inherently non-unique, meaning multiple subsurface resistivity distributions can explain the same dataset.

3. Related Work

Machine learning techniques for field resistivity characterization have gained significant attention due to their ability to integrate various geophysical measurements, such as resistivity, moisture content, and soil suction, thereby enhancing the accuracy and efficiency of characterizing soil behavior. Several studies focused on the relationship between these parameters and their implications in geotechnical and geo-environmental engineering. Recent methodologies have employed advanced geophysical techniques, including ERT, which can provide high-resolution spatial data on soil properties [40]. Combined with ML algorithms, these methods allow for more robust resistivity predictions based on soil moisture and suction data.

The application of various ML algorithms in soil resistivity prediction has been a focal point in recent studies. Algorithms such as SVMs, RFs, and ANNs have been particularly effective in modeling complex non-linear relationships among the input variables, namely soil moisture, suction, and soil temperature [41,42]. For example, Driba et al., 2024 [43], demonstrated the potential of machine learning models to enhance the prediction of spatial variations in wetland soil properties, such as soil moisture content and soil organic matter (SOM). Because field data were limited, they used synthetic data to train an eXtreme Gradient Boosting (XGBoost) algorithm for predicting soil property distributions based on geophysical measurements and soil samples. The study, conducted in Ohio, USA, analyzed correlations between electrical conductivity, soil moisture, and soil organic matter from 22 core samples, with the primary objective of geospatially mapping wetland soil properties. Ozcep et al., 2009 [44], utilized an artificial neural network approach to examine the relationship between soil moisture content and electrical resistivity. Laboratory testing generated 148 datasets for the study, and the model achieved an R² value of 0.88, demonstrating the method’s effectiveness. However, the study was limited by a small dataset, minimal input variables, and constraints imposed by the controlled laboratory environment, which were not accounted for. In a more extensive study, Zamanian et al., 2024 [45], employed large-scale observations in Texas, USA, to predict geotechnical properties using soil resistivity values. This research utilized a deep learning model with three hidden layers trained on 842 observations to investigate the association between electrical resistivity and geotechnical properties. Key geotechnical properties, such as moisture content and unit weight, were identified through Spearman’s correlation and feature importance analyses. However, the study excluded soil suction and noted that resistivity values were predominantly clustered around 100 Ohm-m. Moreover, the SVM has been highlighted for its capability to handle high-dimensional spaces and its effectiveness in achieving high accuracy rates in classification tasks related to soil types based on resistivity data [41]. Model comparisons have also been reported in which RF outperformed both the SVM and the ANN in robustness and accuracy when predicting soil resistivity using field data [42,46]. Applying hyperparameter tuning and cross-validation techniques within these ML models further enhances their predictive capabilities and generalization performance in varied soil conditions [46].

4. Materials and Methods

4.1. Field Investigation

The field investigation was conducted on six prototype evapotranspiration (ET) covers (lysimeters). The lysimeters were constructed at the City of Denton MSW Landfill, Denton, TX, USA, as shown in Figure 1. The study area lies in a semi-humid region that received an average annual precipitation of around 1500 mm during the study period. Construction of the lysimeters started on 17 June 2014 and, after 4.5 months of extensive earthwork and instrumentation, was completed on 1 November 2014. The lysimeters were located on top of an existing landfill cell with an intermediate cover.

4.1.1. Description of the Lysimeters

Six large-scale lysimeters with dimensions of 1219 cm × 1219 cm × 122 cm were constructed side by side. Three were constructed on a nearly flat section with a 2% slope and three were constructed on a sloped section with a 25% slope. The lysimeters were covered with 92 cm of compacted clay, overlain by a 30.5 cm vegetation surface layer. Figure 2 shows the plan and section (schematic) of the lysimeters.

A 1219 cm tall embankment was constructed of clayey soil so the lysimeter pits could be excavated without contact with the underlying waste mass. After the construction of the embankment, the soil was excavated again in the lysimeter locations. The excavated areas were approximately 1219 cm by 1219 cm and nearly 122 cm deep. Subgrades of the excavated areas were compacted to provide an adequately smooth surface for the placement of the geomembrane. The geomembrane was placed on the subgrade and along the sidewalls of the excavations. Geocomposite drains were placed over the geomembrane at the bottom and along the sidewalls of the lysimeters. Then, 92 cm of compacted clay was placed in each lysimeter in lifts of approximately 30.5 cm to obtain the required 95% of maximum dry density (MDD). The soil was wetted and compacted with a sheepsfoot compactor. The soil in each lysimeter was compacted dry of optimum (95% of the MDD on the dry side) rather than wet of optimum. Dry-side compaction was performed to limit the potential for desiccation cracking in the compacted soil layer. A nuclear density gauge (NDG) was used to ensure the required compaction level in the lysimeters was met. Approximately 30.5 cm of topsoil was placed over the compacted clay layer in each lysimeter. The topsoil was compacted to a lower density than the compacted clay layer to retain the ET cover mechanism.

4.1.2. Description of the ERT Equipment, Field Testing, and Data Acquisition

Electrical resistivity measurements were performed on the constructed lysimeters from early January 2015 to the end of December 2017. Field resistivity tests were conducted monthly; however, during the summer months, the frequency of these tests was increased to a weekly schedule. A SuperSting R8/IP multi-channel resistivity meter was utilized during the field investigation (Figure 3). The resistivity tomography method involves electrode spacing that is contingent upon various parameters, including the required resolution for site investigations, the dimensions of the objects under examination, and the necessary depth of penetration for the site assessments. Improved resolution can be attained by utilizing reduced electrode spacing; however, this will result in a decreased penetration depth. A greater penetration depth can be achieved with increased electrode spacing while maintaining the same number of electrodes. The multi-channel resistivity equipment can perform ERT tests with a maximum of 56 electrodes. However, 28 electrodes were utilized to perform resistivity imaging for this study based on the area of the lysimeters and the depth of investigation to be covered. The cover soil depth for each lysimeter was 122 cm, and the geomembrane was placed at the bottom of the lysimeter. Therefore, the spacing between the electrodes was fixed at 15.24 cm intervals, considering the cover soil depth and the geomembrane position. A 427 cm transect was fixed in each lysimeter, and an ERT test was performed along that transect every time. A 12 V battery was employed to conduct the ERT in the field.

Different types of array configurations are available based on the respective positions of the potential electrodes and the current electrodes. In this study, the dipole–dipole array configuration was utilized for data analysis. The apparent resistivity data were collected and stored in a raw format following the completion of the ERT test. The raw data were extracted from the SuperSting R8/IP meter utilizing AGI Administrator software and transformed into a readable format suitable for analysis with AGI EarthImager 2D software (Version 2.1). An inverted resistivity section was generated from the measured apparent resistivity pseudo-section utilizing the same software. Inversion estimates the subsurface resistivity distribution from the measured apparent resistivity data. EarthImager 2D can execute forward modeling, damped least squares inversion, smooth model inversion, and robust inversion. A robust inversion model was utilized in the current study. During the inversion process, a total of 8 iterations were conducted. The error reduction achieved was 3%, and the maximum RMS error was constrained to 6%. The minimum resistivity value was established at 1 Ohm-m, whereas the maximum value was designated as 50,000 Ohm-m. The resolution factor was set to 0.2, and the horizontal-to-vertical roughness ratio was established at 1. The default value of 10 was used for the robust data conditioner. Selected ERTs conducted in the study area are presented in Figure 4.

4.1.3. Field Instrumentation

To closely monitor the soil moisture and suction in the field, 48 moisture and temperature sensors (Decagon 5TM soil moisture and temperature sensors) and 12 tensiometers (Decagon MPS-2) (Figure 5a,b) were installed in the lysimeters. Eight moisture and temperature sensors were installed in each lysimeter. Each lysimeter was divided into two nests, an east nest and a west nest, with four sensors in each nest. In each nest, a hole was bored with a motorized auger to a depth of almost 99 cm. The sensors were placed in the hole at 23 cm intervals. The section of the installed moisture and temperature sensors is shown in Figure 6. The last sensor was positioned at the bottom of the vegetative layer, 30.5 cm below the surface, which is the top of the compacted layer. The wires of the sensors were connected to a data logger (Figure 5c) located just outside of the lysimeter.

Twelve tensiometers were installed in all the lysimeters, two in each (Figure 6). Tensiometers were installed in the east nest of all the test sections. Similar approaches were used to install the tensiometers in all the lysimeters. One tensiometer was placed at a 30.5 cm depth, and the other was at a 76 cm depth from the surface, collocated with the moisture sensors. The wires of the tensiometers were connected to the corresponding data loggers (Figure 5c).

4.1.4. Soil Characteristics

Soil samples were collected from all six lysimeters, at different depths, during the construction of the lysimeters. From each of the soil lifts, three buckets (20 L per bucket) of disturbed samples were collected (Figure 7) and shipped to the laboratory for characterization according to ASTM standards. The soil was classified as Fat Clay (CH) according to the results of sieve analysis and Atterberg limit tests, following the Unified Soil Classification System (USCS). The maximum dry density and the corresponding optimum moisture content were determined following the Standard Proctor Compaction test procedure. The average basic properties of the cover soil are presented in Table 1.

4.2. Machine Learning (ML) Model

In this study, we developed five different machine learning models to predict soil resistivity from the collected field data. The models include linear regression, decision tree regressor, random forest, support vector machine, and artificial neural network models. Each model was selected based on its capacity to capture different features of the data. The collected data were pre-processed before training to enhance prediction accuracy. The steps shown in Figure 8 were followed to develop the model for each algorithm. A brief description of each algorithm is provided after the flowchart.

In each model, the data were divided into training and testing sets. For all the models, 70% of the data was allocated for training, while the remaining 30% was used for testing. The performance of the models was compared based on key evaluation metrics, such as the coefficient of determination (R²), root mean square error (RMSE), bias, and unbiased root mean square error (ubRMSE). The following equations define the evaluation metrics (Equations (2)–(5)):

R^{2} = 1 - \frac{\sum_{i = 1}^{N} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i = 1}^{N} (y_{i} - \bar{y})^{2}}

(2)

\mathrm{RMSE} = \sqrt{\frac{\sum_{i = 1}^{N} (y_{i} - \hat{y}_{i})^{2}}{N}}

(3)

\mathrm{Bias} = \frac{\sum_{i = 1}^{N} (\hat{y}_{i} - y_{i})}{N}

(4)

\mathrm{ubRMSE} = \sqrt{\mathrm{RMSE}^{2} - \mathrm{Bias}^{2}}

(5)

where y_i represents the observed values; \hat{y}_i represents the model-estimated values; \bar{y} represents the mean of the observations; and N represents the number of observations.
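The four metrics in Equations (2)–(5) can be computed together, as in the following minimal sketch. The function name and the observed and predicted values are hypothetical and are used only to show the arithmetic; the guard inside the square root is an added assumption to absorb floating-point round-off.

```python
import math


def evaluation_metrics(observed, predicted):
    """Return R2, RMSE, Bias, and ubRMSE following Equations (2)-(5)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((y - yh) ** 2 for y, yh in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    r2 = 1.0 - ss_res / ss_tot                      # Equation (2)
    rmse = math.sqrt(ss_res / n)                    # Equation (3)
    bias = sum(yh - y for y, yh in zip(observed, predicted)) / n  # Equation (4)
    # Equation (5); max() guards against a tiny negative value from round-off
    ubrmse = math.sqrt(max(rmse ** 2 - bias ** 2, 0.0))
    return {"R2": r2, "RMSE": rmse, "Bias": bias, "ubRMSE": ubrmse}


# Hypothetical example: every prediction is 2 units too high
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 22.0, 32.0, 42.0]
metrics = evaluation_metrics(observed, predicted)
print(metrics)
```

In this example the error is entirely systematic, so the RMSE equals the bias and the ubRMSE is zero, which shows what the unbiased metric removes.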

4.2.1. Linear Regression (LR)

The LR model considers the linear relationship between the dependent and independent variables. It was implemented as the baseline model to compare the performance of more complex models. The LR model was optimized using ordinary least squares to minimize the residual sum of squares between the observed and predicted outcomes. The LR model is presented in Equation (6).

y = Xβ + ϵ

(6)

where y is the target variable, X is the matrix of features, β represents the coefficients, and ϵ is the error term.
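For a single feature, the ordinary least squares solution of Equation (6) reduces to a closed-form slope and intercept. The sketch below shows that special case only; the function name and the moisture and resistivity values are hypothetical, and the study itself used several input features.

```python
def fit_simple_ols(x, y):
    """Ordinary least squares for one feature: y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x); this minimizes the residual sum of squares
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical, perfectly linear data (volumetric moisture vs. resistivity in Ohm-m)
moisture = [0.10, 0.20, 0.30, 0.40]
resistivity = [80.0, 60.0, 40.0, 20.0]
intercept, slope = fit_simple_ols(moisture, resistivity)
print(f"resistivity = {intercept:.1f} + ({slope:.1f}) * moisture")
```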

4.2.2. Decision Tree Regressor (DTR)

The DTR creates a tree-like structure where each node represents a decision rule and branches represent possible outcomes. It splits the data into subsets based on the most significant feature at each node, aiming to reduce the variance in the outcome variable. The DTR model was implemented using the Gini impurity metric. Gini impurity, G, for a node is given by the following equation (Equation (7)):

G = 1 - \sum_{i = 1}^{n} {p_{i}}^{2}

(7)

where p_i is the proportion of class i in the node.
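Equation (7) is defined on class proportions, whereas the variance mentioned above is computed directly on a continuous target. The following sketch evaluates both node measures side by side. The function names, class labels, and values are hypothetical, and the sketch does not claim to reproduce the splitting routine used in this study.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node, Equation (7): 1 - sum of squared class proportions."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def node_variance(values):
    """Population variance of a continuous target within a node."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


# Hypothetical nodes: one pure, one evenly mixed between two classes
g_pure = gini_impurity(["low", "low", "low", "low"])
g_mixed = gini_impurity(["low", "low", "high", "high"])

# Hypothetical continuous resistivity values (Ohm-m) falling in one node
var_node = node_variance([20.0, 40.0])
print(g_pure, g_mixed, var_node)
```

A candidate split is preferred when the size-weighted impurity (or variance) of the child nodes is lower than that of the parent node.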

4.2.3. Random Forest (RF)

The RF model was built by averaging multiple decision trees, each trained on different subsets of the data, to reduce overfitting. The random forest prediction is the average of M decision trees, as described in the following equation (Equation (8)):

y = \frac{1}{M} \sum_{m = 1}^{M} T_{m} (X)

(8)

where M is the total number of trees and T_m is the prediction from each tree. By aggregating predictions, random forest reduces variance and achieves more stable and accurate results, making it highly effective for handling large datasets and capturing complex interactions among features.
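The averaging step in Equation (8) can be illustrated with three hand-built one-split trees standing in for T_m. The thresholds and leaf values below are hypothetical and are not fitted to any data; in a real random forest each tree would be trained on a bootstrap sample.

```python
def make_stump(threshold, left_value, right_value):
    """Return a one-split 'tree' that predicts resistivity from moisture content."""
    def predict(moisture):
        return left_value if moisture < threshold else right_value
    return predict


# Three hypothetical trees with slightly different split points and leaf values
trees = [
    make_stump(0.20, 70.0, 25.0),
    make_stump(0.25, 65.0, 20.0),
    make_stump(0.30, 60.0, 15.0),
]


def forest_predict(trees, x):
    """Equation (8): average the predictions of all M trees."""
    return sum(tree(x) for tree in trees) / len(trees)


pred_dry = forest_predict(trees, 0.15)  # all trees take the left branch
pred_mid = forest_predict(trees, 0.22)  # trees disagree, so the average smooths the step
pred_wet = forest_predict(trees, 0.35)  # all trees take the right branch
print(pred_dry, pred_mid, pred_wet)
```

The mid-range input shows how disagreement between individual trees is smoothed into an intermediate prediction.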

4.2.4. Support Vector Machine (SVM)

The SVM is a supervised machine learning technique primarily used for classification, though it can also be applied to regression tasks. The SVM works by finding a hyperplane (or a set of hyperplanes in higher dimensions) that best separates data points of different classes with the maximum possible margin. The objective of the SVM algorithm is to maximize this margin between the classes while minimizing classification errors. Mathematically, the SVM aims to solve the optimization problem:

\min_{\omega, b} \frac{1}{2} \| \omega \|^{2} \quad \text{subject to} \quad y_{i} (\omega \cdot x_{i} + b) \geq 1

(9)

where ω is the weight vector perpendicular to the hyperplane, b is the bias term, x_i represents the data points, and y_i denotes the class labels (+1 or −1) for each data point i. The term ‖ω‖ refers to the magnitude of the weight vector, and minimizing ‖ω‖²/2 is equivalent to maximizing 1/‖ω‖, the distance between the hyperplane and the closest points (support vectors) of each class, thereby creating a robust decision boundary. Non-linearly separable data can be handled by transforming data to a higher-dimensional space using kernel functions, allowing the SVM to find an optimal separation even in complex datasets.
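The constraint and the margin in Equation (9) can be checked numerically for a candidate hyperplane, as sketched below. The weight vector, bias, points, and labels are hypothetical, and the snippet only verifies feasibility and computes the margin distance; it does not solve the optimization problem.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def satisfies_constraints(w, b, points, labels):
    """Check y_i (w . x_i + b) >= 1 for every point, as required by Equation (9)."""
    return all(y * (dot(w, x) + b) >= 1.0 for x, y in zip(points, labels))


def support_distance(w):
    """Distance 1 / ||w|| from the hyperplane to the closest admissible points."""
    return 1.0 / math.sqrt(dot(w, w))


# Hypothetical 2D example separated by the vertical line x1 = 0
w = [1.0, 0.0]
b = 0.0
points = [(1.0, 0.0), (2.0, 1.0), (-1.0, 0.0), (-3.0, 2.0)]
labels = [1, 1, -1, -1]

feasible = satisfies_constraints(w, b, points, labels)
distance = support_distance(w)
print(feasible, distance)
```

Scaling the weight vector down would widen the margin but violate the constraints for the points lying exactly at distance 1, which is why those points act as support vectors.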

4.2.5. Artificial Neural Network (ANN)

The ANN model consists of interconnected neurons organized in layers—an input layer, hidden layers, and an output layer. Each neuron in a hidden layer applies a weighted sum of its inputs followed by an activation function. The network adjusts its weights and biases during training to minimize the error between predicted and actual outputs. ANNs are particularly effective for modeling complex, non-linear relationships in the data. The output of a neuron in a hidden or output layer is calculated according to the following equation (Equation (10)):

z = \sum_{i = 1}^{n} w_{i} x_{i} + b

(10)

where w_i are the weights, x_i are the inputs, and b is the bias term. The activation function (e.g., ReLU, Sigmoid) is then applied to z. The architecture diagram is presented in Figure 9, according to which the ANN model was executed.
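A single neuron following Equation (10) can be sketched as a weighted sum passed through an activation function. The inputs, weights, and bias below are hypothetical values chosen for illustration and do not correspond to the trained network in Figure 9.

```python
import math


def relu(z):
    """Rectified linear unit activation."""
    return max(0.0, z)


def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias, activation):
    """Equation (10) followed by the activation: returns (z, activation(z))."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return z, activation(z)


# Hypothetical normalized inputs (moisture, suction, temperature) and parameters
inputs = [0.5, 0.2, 0.3]
weights = [-2.0, 1.0, 1.0]
bias = 0.5

z, a_sigmoid = neuron_output(inputs, weights, bias, sigmoid)
_, a_relu = neuron_output(inputs, weights, bias, relu)
print(z, a_sigmoid, a_relu)
```

During training, the weights and bias would be adjusted by backpropagation so that the network output approaches the measured resistivity.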

5. Results

5.1. Scatter Plots of Different Variables

To have a general understanding of the relationship of resistivity with other variables, scatter plots were adopted. Figure 10a shows the scatter plot of resistivity and soil moisture content. The strong relationship between moisture content and soil resistivity is well documented and is also evident from the scatter plot. The graph demonstrates a clear inverse correlation, indicating that resistivity decreases as moisture content increases. At a higher moisture content (e.g., 0.3–0.4), resistivity is low, clustering between 10 and 30 Ohm-m. As moisture content decreases (e.g., <0.2), resistivity increases significantly, reaching values up to 90 Ohm-m. The points follow a distinct downward trend, forming a non-linear inverse relationship.

Figure 10b shows the scatter plot of resistivity and soil matric suction. The plot shows the relationship is more scattered and does not show a clear trend. At low resistivity values (<20 Ohm-m), suction varies widely, ranging from 0 to over 1500 kPa. Also, at higher resistivity values (e.g., >60 or 70 Ohm-m), suction values are well-scattered, with fewer points reaching higher suction levels. This scatter plot of resistivity versus suction (Figure 10b) reveals a more complex and scattered relationship, indicating a weak or indirect correlation. It is well understood that soil matric suction increases as moisture content decreases, correlating with higher resistivity values. However, the scatter plot in the data suggests that suction alone may not be a straightforward predictor of resistivity, as it is influenced by soil texture, compaction, and pore size distribution.

In Figure 10c, the scatter plot of resistivity versus soil temperature is depicted from the field monitoring data. This graph shows a positive correlation between resistivity and soil temperature (°C), though the trend is less pronounced compared to moisture content. At lower temperatures (e.g., <10 °C), resistivity is generally lower, clustering below 40 Ohm-m. As temperature increases (e.g., >20 °C), resistivity values increase, with several points exceeding 80 Ohm-m. The relationship appears non-linear, with resistivity stabilizing at high temperatures. At higher temperatures, the mobility of ions in the pore water increases, typically leading to higher conductivity (lower resistivity). However, in drier soils, resistivity increases with temperature due to the evaporation of water and reduced ionic content, as seen in this plot (Figure 10c). Therefore, the influence of temperature on resistivity is dependent on moisture availability. In soils with low moisture content, temperature effects may dominate, while in moist soils, the impact of temperature is less significant compared to the effects of moisture.

From the field-measured data, the clearest relationship with a well-defined inverse correlation was observed in resistivity versus soil moisture content. Resistivity versus suction exhibited significant variability, indicating that suction alone cannot explain resistivity changes. It can be best interpreted alongside moisture content and other variables. Resistivity versus soil temperature shows a positive trend but with more variability than moisture content. The observed trend reflects the combined influence of temperature and moisture availability. The variability in the scatter plots highlights the need for multi-parameter models to interpret resistivity to increase accuracy.

5.2. Correlation Among Variables

Correlation analysis, which may be performed using either the Pearson correlation or the Spearman rank-order correlation, evaluates the connection between two variables. Quantifying the strength of a linear connection between variables is a popular application of the Pearson technique, which is often used in the domains of science and engineering studies. In comparison, Spearman rank-order correlation is a nonparametric measure of the strength and direction of the association between two ranked variables. It is very effective, particularly when the variables do not meet the assumptions of linearity or normal distribution required by the Pearson correlation coefficient.

The two measures differ primarily in the type of relationship they evaluate. The Pearson correlation measures the degree of linearity in a relationship, while the Spearman correlation measures whether a monotonic relationship exists. A linear relationship exists when changes in one variable are proportional to changes in the other. In a monotonic relationship, the variables tend to change together, although not necessarily at a constant rate. The Pearson correlation can be calculated directly from the raw data, whereas the Spearman correlation requires the data to be converted into ranked values before analysis.
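The distinction between the two coefficients can be illustrated with a minimal sketch. The moisture and resistivity values below are hypothetical (an assumed exponential decay, not the field data), and the helper names `pearson`, `ranks`, and `spearman` are introduced here for illustration only.

```python
import math

def pearson(x, y):
    """Pearson correlation computed directly from the raw values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranked values."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: resistivity decays exponentially with moisture content,
# so the relationship is monotonic but not linear.
moisture = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40]
resistivity = [100.0 * math.exp(-8.0 * m) for m in moisture]

r_pearson = pearson(moisture, resistivity)
r_spearman = spearman(moisture, resistivity)
print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
```

For these assumed values, the Spearman coefficient reaches −1 because the ordering is perfectly reversed, while the Pearson coefficient is weaker in magnitude because the decay is curved rather than straight.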

Since the nature of the association was unclear before the investigation, both forms of correlation were studied. When there is a small number of data pairs, the correlation values need to be near 1 or −1 to reach statistical significance. On the other hand, when there are many data pairs, correlations that are close to zero may be deemed very significant. The correlation matrix provides insights into the relationships between different features in the dataset. In the heatmap or correlation matrix shown in Figure 11, the correlation between moisture content, suction, temperature, and resistivity is visualized. Moisture content and resistivity have a strong negative correlation of −0.88, indicating that resistivity tends to decrease as moisture content increases. The plot suggests that moisture content is crucial in determining resistivity, which could be vital for predictive modeling.
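The dependence of statistical significance on the number of data pairs can be sketched with the standard t-statistic for a correlation coefficient, t = r·sqrt((n − 2)/(1 − r²)), which under a null hypothesis of zero correlation and approximately normal data follows a Student's t distribution with n − 2 degrees of freedom. The sample sizes below are hypothetical and chosen only to contrast a small and a large dataset; the function name is an assumption for illustration.

```python
import math

def correlation_t_statistic(r, n):
    """t-statistic for testing whether a correlation r from n pairs differs from zero."""
    if n < 3:
        raise ValueError("at least three data pairs are required")
    if abs(r) >= 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Same correlation (0.34, as reported for suction), two hypothetical sample sizes.
t_small = correlation_t_statistic(0.34, 10)
t_large = correlation_t_statistic(0.34, 300)

# Two-sided 5% critical values are roughly 2.31 for 8 degrees of freedom
# and about 1.97 for 298 degrees of freedom.
print(f"n = 10:  t = {t_small:.2f}")
print(f"n = 300: t = {t_large:.2f}")
```

With only 10 pairs, the same coefficient falls well short of the critical value, whereas with 300 pairs it exceeds it comfortably, which is the behavior described above.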

The scatter plot of suction versus resistivity indicates a weaker relationship. The data points are widely scattered (Figure 10b) with no clear trend, reflecting a weak correlation of 0.34. This suggests that while suction may influence resistivity, it is less significant than other variables such as moisture content. The scatter plot of temperature versus resistivity shows a positive trend, where higher temperatures are associated with increased resistivity values. This observation is consistent with the weak-to-moderate correlation of 0.41 between the two variables. The relationship indicates that temperature can affect resistivity, though not as strongly as moisture content.

5.3. Evaluation of Machine Learning (ML) Models

The performance of each ML model was evaluated, and the results were compared based on R², RMSE, bias, and ubRMSE. The models demonstrated varying levels of performance, depending on the complexity of the relationships in the data. During the training process, no overfitting or underfitting was observed. This was achieved through proper hyperparameter tuning and the use of cross-validation, ensuring that each model generalized well to unseen data. The details of each model are described in the following sections. As the data were normalized before modeling, the actual and predicted values ranged from 0 to 1.
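The normalization and cross-validation steps mentioned above can be sketched as follows. This is a minimal illustration only: min-max scaling is assumed because the values are stated to range from 0 to 1, the number of folds (5) is an assumption, and the resistivity values and function names are hypothetical.

```python
import random

def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("cannot scale a constant series")
    return [(v - lowest) / (highest - lowest) for v in values]

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx

# Hypothetical resistivity readings in Ohm-m.
resistivity = [12.0, 25.0, 40.0, 55.0, 83.0, 31.0, 18.0, 64.0, 47.0, 90.0]
scaled = min_max_scale(resistivity)
splits = list(k_fold_indices(len(scaled), k=5))

for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold_number}: {len(train_idx)} training, {len(test_idx)} test samples")
```

In a stricter workflow, the scaling limits would typically be computed from the training fold only and then applied to the held-out fold, so that no information from the test samples leaks into training.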

5.3.1. Linear Regression

Linear regression provided a baseline model, capturing linear relationships, but it was limited in handling complex patterns in the data. The LR performed reasonably well, with an R² value of 0.7397 and an RMSE of 0.14913. However, it struggled with non-linear relationships, resulting in moderate errors. The plot (Figure 12) shows a considerable number of data points deviating from the 45° diagonal (dashed line), indicating that while this method works well for simple relationships, it fails to capture more complex patterns. This can further be understood from the estimated bias. The LR model yielded the largest bias magnitude (−0.0287) among the ML models, indicating that the LR model assumed linearity even where the data were non-linear. The scatter plots of resistivity against temperature, suction, and moisture show that only moisture has a clearly interpretable relationship with resistivity. In contrast, no clear trend was observed for suction or temperature; LR is unable to capture this complex non-linear behavior, which explains the lower R² value.

5.3.2. Decision Tree Regressor

After the linear regression, this study approached decision tree regressor modeling. While it captured non-linear interactions, the performance was limited due to overfitting and a lower generalization capability than ensemble models. The DTR struggled to capture the complexity of the data, with an R² value of 0.5614 and a higher RMSE of 0.19357. In addition, the DTR model had the largest ubRMSE (0.1935), suggesting that it struggled with random errors. The scatter plot in Figure 13 shows significant variance around the 45° diagonal line, indicating that the model’s predictions deviate substantially from the actual values. The low accuracy can be attributed to overfitting and the small dataset. As there were fewer than 300 data points, the DTR tended to over-split the data, creating a model that did not generalize well. In addition, overfitting occurs because the tree creates highly specific splits that align closely with the noise in the training data rather than capturing the true underlying patterns. The data used in the study were from all seasons (spring, summer, and winter). As the behavior of soil also depends on seasonal variation and vegetation, sub-grouping of such data is required for each season. If the dataset has an uneven distribution of target values or contains outliers, a decision tree may become biased toward the most densely sampled range or overreact to the outliers. A larger dataset with seasonal sub-groups might increase accuracy under this modeling technique.
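The overfitting mechanism described above can be illustrated with a toy sketch, which is not the model used in the study. It relies on a stated simplification: for a single input feature with distinct values and midpoint split thresholds, a fully grown regression tree places one training sample in each leaf, so its prediction reduces to a nearest-neighbor lookup. The data are synthetic (an assumed exponential moisture-resistivity curve with added noise), and all names are hypothetical.

```python
import bisect
import math
import random

def fit_full_tree_1d(x_train, y_train):
    """Return a predictor equivalent to a fully grown 1-D regression tree.

    Assumption: inputs are distinct and thresholds sit midway between
    neighboring training inputs, so every leaf holds exactly one sample.
    """
    pairs = sorted(zip(x_train, y_train))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    thresholds = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def predict(x):
        # Count the thresholds below x to locate the leaf containing x.
        return ys[bisect.bisect_left(thresholds, x)]

    return predict

def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    squared = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

rng = random.Random(42)

def sample(n):
    """Synthetic normalized data: exponential decay plus Gaussian noise."""
    xs = [rng.uniform(0.10, 0.40) for _ in range(n)]
    ys = [math.exp(-8.0 * x) + rng.gauss(0.0, 0.05) for x in xs]
    return xs, ys

x_train, y_train = sample(40)
x_test, y_test = sample(40)

predict = fit_full_tree_1d(x_train, y_train)
train_rmse = rmse(y_train, [predict(x) for x in x_train])
test_rmse = rmse(y_test, [predict(x) for x in x_test])
print(f"training RMSE: {train_rmse:.4f}")
print(f"test RMSE:     {test_rmse:.4f}")
```

The tree reproduces every training point exactly, including its noise, so the training error is zero while the error on new samples is not. This gap is the signature of overfitting that depth limits, pruning, or ensemble averaging aim to reduce.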

5.3.3. Random Forest

The random forest technique was adopted after the decision tree model. As the decision tree overfitted the data, a method such as RF has the potential to improve accuracy because it combines multiple decision trees. It achieved stronger predictive performance by averaging multiple decision trees, reducing overfitting, and increasing generalization power. The RF demonstrated relatively robust performance, with an R² of 0.7363 and an RMSE of 0.15008. Regarding the ubRMSE and bias, the RF model performed similarly to the LR model, with a smaller bias (−0.01847 versus −0.02866) and a slightly higher random error (ubRMSE = 0.148936 versus 0.146347). A closer look at the plot presented in Figure 14 reveals that the values are closer to the 45° line at the lower and upper range, while in the middle, they are more scattered. The reason can be attributed to the fact that RF combines predictions from many decision trees, and each tree captures different aspects of the data through random sampling of the data and features. In addition to reducing the likelihood of overfitting, this averaging technique may also smooth out complex patterns, resulting in improved predictions at the extreme ends of the range (low and high values), where the data points are either more distinct or fewer.

As mentioned before, the dataset was not sub-grouped by season. Based on previous studies [47], resistivity values show more variation in the drier months than in the wetter months. During the wetter months, resistivity values tend to cluster despite variations in temperature and suction levels. In other words, in the summer months, slight temporal variation can change the resistivity values, whereas in the winter months, it takes considerable variation in the ambient conditions to alter the resistivity of the soil. As such, in the middle range, where there is often more data density, the RF model might struggle to capture subtle variations because of averaging effects.

5.3.4. Support Vector Machine

The support vector machine technique was adopted after RF modeling. The SVM model exhibited robust performance, especially in classification tasks and with smaller datasets, demonstrating an effective management of intricate decision boundaries. Figure 15 shows the actual and model-predicted values. The SVM model delivered one of the best performances, with an R² of 0.7698 and a lower RMSE of 0.14023 compared to the LR, DTR, and RF models’ RMSE. In addition, the SVM showed a balance of low bias (−0.00135) and relatively low ubRMSE (0.140226), making it one of the better-performing models.

The SVM aims to find the optimal boundary that separates the data into different regions, maximizing the margin between different classes or between predicted and actual values in regression tasks. In addition, unlike the RF, which averages predictions and may struggle in densely populated regions, the SVM can effectively balance the influence of different regions by focusing on maximizing the margin. However, the R² below 0.8 can be attributed to the absence of a clear trend of resistivity with temperature and suction. Even though resistivity varied with these two variables, there was no clear trend, unlike with moisture. As such, even this more advanced modeling technique struggled to improve model accuracy considerably.

5.3.5. Artificial Neural Network

Finally, the artificial neural network model was adopted for soil resistivity prediction. It delivered predictive accuracy similar to that of the SVM. The ANN model demonstrated strong predictive performance, with an R² value of 0.7876 (about 2.3% higher than the SVM in relative terms) and a low RMSE of 0.13505. The ANN model also had the lowest ubRMSE (0.134976) and a small bias (−0.00449, second only to the SVM in magnitude), indicating minimal random and systematic errors. The predicted values closely follow the 45° diagonal line, as seen in Figure 16, indicating a good fit between actual and predicted values. The ANN effectively captured non-linear relationships in the data, making it one of the top-performing models along with the SVM. The SVM tends to perform exceptionally well with smaller datasets, as it does not require as much data to generalize. In addition, the dataset has a small number of features (inputs), so the SVM and ANN may perform comparably because the need for extensive feature extraction (an ANN strength) is reduced.

6. Discussion

6.1. Relationship Between Soil Resistivity and Hydrologic Variables

The scatter plots in Figure 10 show a strong inverse correlation between moisture content and resistivity. Soil moisture content plays a critical role in electrical conductivity. Water contains dissolved ions such as salts that contribute to the soil’s ability to conduct electricity. Higher moisture content increases the connectivity of water-filled pores, forming continuous conductive pathways. When soil moisture increases, ionic concentration in the pore water becomes a dominant factor in reducing resistivity. Conversely, when the soil is drier, the resistivity is higher due to limited ion mobility in disconnected water films. As a result, the amount of water in the soil changes the structure of the pores directly. This makes more conductive networks available, which makes the relationship between soil moisture and resistivity more deterministic and less affected by other factors, as pronounced in this study shown in Figure 10a.

On the other hand, the relationship between matric suction and resistivity is scattered (Figure 10b), showing no clear trend. Matric suction is indirectly related to moisture content. As suction increases, moisture content decreases. Different soil types such as clay, silt, and sand have different moisture retention characteristics. Matric suction is influenced by the soil texture and pore structure, which adds complexity and variability. In this study, the test sections were constructed with CH-type soil, with almost 40% clay, 47% silt, and 13% sand. The variability in pore structure may have been influenced by these soil constituents, which suggests that matric suction might not accurately reflect the changes in soil resistivity. As a result, the relationship between suction and resistivity was not as consistent as that of moisture content. We anticipate that a different soil matrix under field conditions may display a different scatter plot of resistivity versus suction. Like the suction scatter plot, Figure 10c shows a scattered distribution, with no strong correlation between resistivity and soil temperature. In porous media, resistivity is sensitive to changes in both temperature and relative humidity, whereas nonporous materials show stronger resistivity changes with temperature alone. This behavior has been noted for materials such as sandstone and marble, where temperature alone has a stronger correlation with resistivity. However, in field conditions, temperature changes are often accompanied by changes in moisture content induced by evapotranspiration, which tends to overshadow the effect of temperature. In addition, field conditions may have high variability in ionic concentration and soil composition. This coupling complicates isolating the direct effect of temperature on resistivity in the field, as reflected in Figure 10c.

6.2. Evaluation of Model Performance Metrics

To evaluate the efficiency of the five machine learning models (LR, DTR, RF, SVM, and ANN), four performance metrics were used: R², RMSE, bias, and ubRMSE. The R² indicates the goodness-of-fit of the model and identifies how well the model captures variability in the data. An R² value of one indicates a perfect prediction. The RMSE captures the overall magnitude of prediction errors. Lower RMSE values indicate better model performance. Bias reveals systematic over- or underprediction by the model. The ubRMSE is a variant of the RMSE that isolates and measures the random errors in predictions, removing the systematic bias from the calculation. It is obtained from the RMSE by removing the bias contribution (ubRMSE² = RMSE² − bias², where the bias is the systematic difference between predictions and actual values). The ubRMSE metric therefore captures the variability in the errors due to random or stochastic factors. A lower ubRMSE indicates that the predictions are closer to the actual values, with little randomness. Separating the bias and ubRMSE helps distinguish between systematic errors (captured by bias) and random errors (captured by ubRMSE). Together, these four metrics describe both the accuracy and reliability of the models. Table 2 lists the four metrics for the five ML models.

The R² values indicated that the ANN (R² = 0.788) performed the best, followed closely by the SVM (R² = 0.770). The LR (R² = 0.740) and RF (R² = 0.736) also demonstrated reasonable accuracy, which is notable for LR given its simplicity. The DTR significantly underperformed, with the lowest R² value (0.561).

The RMSE and ubRMSE followed a trend consistent with R². The ANN model achieved the lowest RMSE (0.135), followed by the SVM (0.140) and LR (0.149). The DTR had the highest RMSE (0.194), demonstrating its poorer predictive accuracy. The ubRMSE values were also close to the RMSE for all the models, indicating that random errors dominated the overall error, with little systematic bias in the predictions. For instance, the ANN model’s ubRMSE (0.135) and RMSE (0.135) are almost identical (Table 2), suggesting minimal systematic errors in the prediction.
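The four metrics and the relationship between RMSE, bias, and ubRMSE can be sketched as follows. The function names and the small sample of actual and predicted values are hypothetical; bias is taken here as the mean of predicted minus actual, so a negative value means underprediction. The last lines recompute the ubRMSE of two models from the RMSE and bias values listed in Table 2.

```python
import math

def errors(actual, predicted):
    """Prediction errors, defined as predicted minus actual."""
    return [p - a for a, p in zip(actual, predicted)]

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum(e ** 2 for e in errors(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def rmse(actual, predicted):
    """Root mean square error: overall error magnitude."""
    e = errors(actual, predicted)
    return math.sqrt(sum(v ** 2 for v in e) / len(e))

def bias(actual, predicted):
    """Mean error: the systematic over- or underprediction."""
    e = errors(actual, predicted)
    return sum(e) / len(e)

def ubrmse(actual, predicted):
    """Unbiased RMSE: RMSE of the errors after removing their mean."""
    e = errors(actual, predicted)
    mean_e = sum(e) / len(e)
    return math.sqrt(sum((v - mean_e) ** 2 for v in e) / len(e))

def ubrmse_from_summary(rmse_value, bias_value):
    """ubRMSE recovered from reported RMSE and bias values."""
    return math.sqrt(max(rmse_value ** 2 - bias_value ** 2, 0.0))

# Hypothetical normalized values, for illustration only.
actual = [0.10, 0.30, 0.50, 0.70, 0.90]
predicted = [0.12, 0.25, 0.48, 0.66, 0.84]
print(f"R2={r_squared(actual, predicted):.4f}  RMSE={rmse(actual, predicted):.4f}  "
      f"bias={bias(actual, predicted):.4f}  ubRMSE={ubrmse(actual, predicted):.4f}")

# RMSE and bias values taken from Table 2.
lr_ubrmse = ubrmse_from_summary(0.149127, -0.02866)
rf_ubrmse = ubrmse_from_summary(0.150077, -0.01847)
print(f"LR ubRMSE: {lr_ubrmse:.6f}   RF ubRMSE: {rf_ubrmse:.6f}")
```

The recomputed values agree with the ubRMSE column of Table 2 to within rounding, which confirms that the small biases contribute very little to the total error.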

Bias values for all models were relatively low, demonstrating minimal systematic deviation from the true values. The DTR, SVM, and ANN exhibited small bias magnitudes (−0.00306, −0.00135, and −0.00449, respectively), while the LR model showed the largest bias (−0.02866). The negative bias values in all models indicate that, on average, the predictions slightly underestimate the actual values. This systematic underestimation could result from the characteristics of the dataset, such as an uneven distribution of target values. However, these bias values are small enough to have a limited impact compared to the random error component.

Despite the ANN’s superior performance metrics, the difference between its performance and that of LR is not significant. This is likely due to the limited size of the dataset. Smaller datasets reduce the opportunity for complex models like the ANN to demonstrate their full potential. The LR, being simpler, is less prone to overfitting and can perform competitively when the data does not strongly demand non-linear modeling. Additionally, small datasets can limit the ability of the ANN to adequately learn intricate patterns of data, narrowing the performance gap with simpler models. This is reflected in the R² and RMSE values, where the ANN and LR are quite similar.

The ANN model performed best, achieving the highest R² and lowest RMSE, with minimal bias and random errors. However, the relatively small dataset constrained its advantage over simpler models like LR, which also achieved competitive results. The RF performed similarly to LR, with a slightly higher RMSE and ubRMSE but a smaller bias. The SVM model emerged as another strong contender, with good predictive power and minimal error, balancing simplicity and accuracy. However, in dealing with large datasets, the ANN may be a more advantageous choice over the SVM, even if both models yield similar R² values. The ANN is designed to capture complex non-linear relationships within data and, with more data points, tends to improve its performance by refining the weights and biases in its layered structure through backpropagation. The ANN technique scales well with larger datasets because its multi-layer architecture can learn and represent intricate patterns, allowing it to approximate almost any function given sufficient training data. In contrast, the SVM often faces computational limitations with large datasets, as the algorithm involves quadratic optimization that becomes increasingly complex with more support vectors, leading to higher training times and memory requirements. Therefore, the ANN model may offer greater scalability and adaptability in scenarios with abundant data, making it a potentially better choice as dataset size increases. The DTR, on the other hand, lagged significantly behind the other models, reflecting its limitations in this scenario. Given the limited dataset size, future work could explore augmenting the data or applying more extensive cross-validation to evaluate model performance better. Additionally, ensemble methods like boosting could be tested to improve the performance of tree-based models.

7. Conclusions

Based on field instrumentation in Texas, USA, soil resistivity values were collected along with soil moisture, temperature, and suction. The study attempted to predict resistivity values using several machine learning techniques. A limitation of the study is the lack of a large dataset, which is essential for more advanced modeling. In addition, the data were not sub-grouped by season, which might help describe regional differences in the model. The key findings of the study are summarized below:

○: In terms of variable relationships, a strong inverse correlation between soil moisture content and resistivity (correlation coefficient of −0.88) was observed. A weak positive correlation between suction and resistivity was noted (0.34), indicating that suction alone is insufficient as a predictor. In addition, soil temperature and resistivity showed a weak positive correlation (0.41), influenced by moisture availability.
○: Artificial neural network and support vector machine models were the most effective, with coefficients of determination above 0.76 and low root mean square error (RMSE) values (0.135 and 0.140, respectively).
○: Linear regression and the decision tree regressor underperformed because of their limitations in capturing non-linear relationships and overfitting, respectively.
○: Random forest provided better generalization than decision trees but struggled with subtle variations in mid-range data.
○: The absence of standardized datasets and the limited data size constrained model accuracy and generalization. Seasonal variability significantly impacts soil resistivity, which necessitates data subgrouping for better predictions.

Author Contributions

Conceptualization, M.J.B.A., A.A. and A.G.; methodology, M.J.B.A. and A.A.; software, A.A. and A.G.; validation, M.J.B.A. and A.A.; formal analysis, M.J.B.A., A.A. and A.G.; investigation, M.J.B.A. and A.A.; resources, M.J.B.A. and A.A.; data curation, M.J.B.A.; writing—original draft preparation, M.J.B.A., A.A. and A.G.; writing—review and editing, M.J.B.A. and A.A.; visualization, M.J.B.A. and A.A.; supervision, M.J.B.A. and A.A.; project administration, M.J.B.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing does not apply to this article.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

1. Mitchell, J.K.; Soga, K. Fundamentals of Soil Behavior, 3rd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2005; ISBN 978-0-471-46302-3.
2. Fredlund, D.G.; Rahardjo, H. Soil Mechanics for Unsaturated Soils; Wiley: New York, NY, USA, 1993; ISBN 978-0-471-85008-3.
3. Samouëlian, A.; Cousin, I.; Tabbagh, A.; Bruand, A.; Richard, G. Electrical Resistivity Survey in Soil Science: A Review. Soil Tillage Res. 2005, 83, 173–193.
4. McCarter, W.J. The Electrical Resistivity Characteristics of Compacted Clays. Géotechnique 1984, 34, 263–267.
5. Abu-Hassanein, Z.S.; Benson, C.H.; Blotz, L.R. Electrical Resistivity of Compacted Clays. J. Geotech. Eng. 1996, 122, 397–406.
6. Ekwue, E.I.; Bartholomew, J. Electrical Conductivity of Some Soils in Trinidad as Affected by Density, Water and Peat Content. Biosyst. Eng. 2011, 108, 95–103.
7. Kalinski, R.; Kelly, W. Estimating Water Content of Soils from Electrical Resistivity. Geotech. Test. J. 1993, 16, 323–329.
8. Giao, P.H.; Chung, S.G.; Kim, D.Y.; Tanaka, H. Electric Imaging and Laboratory Resistivity Testing for Geotechnical Investigation of Pusan Clay Deposits. J. Appl. Geophys. 2003, 52, 157–175.
9. Huisman, J.A.; Hubbard, S.S.; Redman, D.; Annan, P.A. Soil Water Content Measurements with Ground-Penetrating Radar: A Review. In Proceedings of the EGS-AGU-EUG Joint Assembly, Nice, France, 6–11 April 2003; p. 5210. Available online: https://hdl.handle.net/11245/1.226163 (accessed on 3 November 2024).
10. Robinson, D.A.; Jones, S.B.; Wraith, J.M.; Or, D.; Friedman, S.P. A review of advances in dielectric and electrical conductivity measurement in soils using time domain reflectometry. Vadose Zone J. 2003, 2, 444–475.
11. Corwin, D.L.; Lesch, S.M. Characterizing soil spatial variability with apparent soil electrical conductivity: I. Survey protocols. Comput. Electron. Agric. 2005, 46, 103–133.
12. Noborio, K. Measurement of Soil Water Content and Electrical Conductivity by Time Domain Reflectometry: A Review. Comput. Electron. Agric. 2001, 31, 201–252.
13. Lu, N.; Likos, W.J. Unsaturated Soil Mechanics; John Wiley & Sons: Hoboken, NJ, USA, 2004; ISBN 978-0-471-44731-3.
14. Topp, G.C.; Davis, J.L.; Annan, A.P. Electromagnetic Determination of Soil Water Content: Measurements in Coaxial Transmission Lines. Water Resour. Res. 1980, 16, 574–582.
15. Keller, G.V.; Frischknecht, F.C. Electrical Methods in Geophysical Prospecting. In International Series of Monographs on Electromagnetic Waves; Pergamon Press: Oxford, UK, 1966; ISBN 978-0-08-011525-2.
16. Klein, K.A.; Santamarina, J.C. Electrical conductivity in soils: Underlying phenomena. J. Environ. Eng. Geophys. 2003, 8, 263–273.
17. Cordero-Vázquez, C.Y.; Delgado-Rodríguez, O.; Cisneros-Almazán, R.; Peinado-Guevara, H.J. Determination of Soil Physical Properties and Pre-Sowing Irrigation Depth from Electrical Resistivity, Moisture, and Salinity Measurements. Land 2023, 12, 877.
18. Michot, D.; Thomas, Z.; Adam, I. Nonstationarity of the electrical resistivity and soil moisture relationship in a heterogeneous soil system: A case study. Soil 2016, 2, 241–255.
19. Gao, Y. Soil-water retention behavior of compacted soil with different densities over a wide suction range and its prediction. Comput. Geotech. 2017, 91, 17–26.
20. Najdi, A.; Encalada, D.; Mendes, J.; Prat, P.C.; Ledesma, A. Evaluating innovative direct and indirect soil suction and volumetric measurement techniques for the determination of soil water retention curves following drying and wetting paths. Eng. Geol. 2023, 322, 107179.
21. Nadler, A.; Frenkel, H. Determination of soil solution electrical conductivity from bulk soil electrical conductivity measurements by the four-electrode method. Soil Sci. Soc. Am. J. 1980, 44, 1216–1221.
22. Gupta, S.C.; Hanks, R.J. Influence of water content on electrical conductivity of the soil. Soil Sci. Soc. Am. J. 1972, 36, 855–857.
23. Revil, A.; Glover, P.W.J. Theory of ionic-surface electrical conduction in porous media. Phys. Rev. B 1997, 55, 1757.
24. Albrecht, B.; Benson, C. Effect of Desiccation on Compacted Natural Clays. J. Geotech. Geoenviron. Eng. 2001, 127, 67–75.
25. Campbell, G.S.; Anderson, R.Y. Evaluation of Simple Transmission Line Oscillators for Soil Moisture Measurement. Comput. Electron. Agric. 1998, 20, 31–44.
26. Phene, C.J.; Clark, D.A.; Cardon, G.E.; Mead, R.M. Soil Matric Potential Sensor Research and Applications. In Advances in Measurement of Soil Physical Properties: Bringing Theory into Practice; Soil Science Society of Amer: Madison, WI, USA, 1992; Volume 30, pp. 263–280.
27. Shao, W.; Yue, W.; Zhang, Y.; Zhou, T.; Zhang, Y.; Dang, Y.; Wang, H.; Feng, X.; Chao, Z. The Application of Machine Learning Techniques in Geotechnical Engineering: A Review and Comparison. Mathematics 2023, 11, 3976.
28. Zhang, W.; Li, H.; Li, Y.; Liu, H.; Chen, Y.; Ding, X. Application of deep learning algorithms in geotechnical engineering: A short critical review. Artif. Intell. Rev. 2021, 54, 5633–5673.
29. Puri, N.; Prasad, H.D.; Jain, A. Prediction of geotechnical parameters using machine learning techniques. Procedia Comput. Sci. 2018, 125, 509–517.
30. Baghbani, A.; Choudhury, T.; Costa, S.; Reiner, J. Application of artificial intelligence in geotechnical engineering: A state-of-the-art review. Earth-Sci. Rev. 2022, 228, 103991.
31. Moghadas, D.; Badorreck, A. Machine learning to estimate soil moisture from geophysical measurements of electrical conductivity. Near Surf. Geophys. 2019, 17, 181–195.
32. Kundu, S.K.; Dey, A.K.; Sapkota, S.C.; Debnath, P.; Saha, P.; Ray, A.; Khandelwal, M. Advanced predictive modelling of electrical resistivity for geotechnical and geo-environmental applications using machine learning techniques. J. Appl. Geophys. 2024, 231, 105557.
33. Santie, P.A.W.; Wilopo, W.; Faris, F. Slope Stability Analysis Using Electrical Resistivity Tomography and Limit Equilibrium Method: A Case Study from Girimulyo, Kulon Progo. J. Appl. Geol. 2024, 9, 37–50.
34. Li, C.; Wei, L.; Xu, Q.; Yang, L.; Li, J.; Wan, X. Structural Detection and Stability Monitoring of Deep Strata on a Slope Using High-Density Resistivity Method and FBG Strain Sensors. Appl. Sci. 2024, 14, 3272.
35. Nalakurthi, N.V.S.R.; Abimbola, I.; Ahmed, T.; Anton, I.; Riaz, K.; Ibrahim, Q.; Banerjee, A.; Tiwari, A.; Gharbia, S. Challenges and Opportunities in Calibrating Low-Cost Environmental Sensors. Sensors 2024, 24, 3650.
36. Piegari, E.; Di Maio, R. Estimating soil suction from electrical resistivity. Nat. Hazards Earth Syst. Sci. 2013, 13, 2369–2379.
37. Kong, L.W.; Sayem, H.M.; Zhang, X.W.; Yin, S. Relationship between Electrical Resistivity and Matric Suction of Compacted Granite Residual Soil. In Proceedings of the PanAm Unsaturated Soils, Dallas, TX, USA, 12–15 November 2017; pp. 430–439.
38. Millikan, R.A.; Bishop, E.S. Elements of Electricity: A Practical Discussion of the Fundamental Laws and Phenomena of Electricity and Their Practical Applications in the Business and Industrial World; American Technical Society: Orland Park, IL, USA, 1917.
39. Bai, L.; Huo, Z.; Zeng, Z.; Liu, H.; Tan, J.; Wang, T. Groundwater flow monitoring using time-lapse electrical resistivity and Self Potential data. J. Appl. Geophys. 2021, 193, 104411.
40. Terry, N.; Day-Lewis, F.D.; Lane, J.W.; Johnson, C.D.; Werkema, D. Field Evaluation of Semi-Automated Moisture Estimation from Geophysics Using Machine Learning. Vadose Zone J. 2023, 22, e20246.
41. Raczko, E.; Zagajewski, B. Comparison of Support Vector Machine, Random Forest and Neural Network Classifiers for Tree Species Classification on Airborne Hyperspectral APEX Images. Eur. J. Remote Sens. 2017, 50, 144–154.
42. Ozsagir, M.; Erden, C.; Bol, E.; Sert, S.; Özocak, A. Machine learning approaches for prediction of fine-grained soils liquefaction. Comput. Geotech. 2022, 152, 105014.
43. Driba, D.L.; Emmanuel, E.D.; Doro, K.O. Predicting Wetland Soil Properties Using Machine Learning, Geophysics, and Soil Measurement Data. J. Soils Sediments 2024, 24, 2398–2415.
44. Ozcep, F.; Yildirim, E.; Tezel, O.; Asci, M.; Karabulut, S. Correlation Between Electrical Resistivity and Soil-Water Content Based on Artificial Intelligent Techniques. Int. J. Phys. Sci. 2010, 5, 47–56.
45. Zamanian, M.; Asfaw, N.; Chavda, P.; Shahandashti, M. Deep learning for exploring the relationship between geotechnical properties and electrical resistivities. Transp. Res. Rec. 2024, 2678, 659–672.
46. Chala, A.T.; Ray, R. Assessing the Performance of Machine Learning Algorithms for Soil Classification Using Cone Penetration Test Data. Appl. Sci. 2023, 13, 5758.
47. Alam, M.J. Evaluation of Plant Root on the Performance of Evapotranspiration (ET) Cover System. Ph.D. Thesis, University of Texas at Arlington, Arlington, TX, USA, 2017.

Figure 1. (a) Geographical location of the City of Denton Landfill; (b) landfill footprint.

Figure 2. (a) Side-by-side location of the six lysimeters (dimensions: 1219 cm × 1219 cm × 122 cm). The flat surface lysimeters had a 2% slope, and the sloped section had a 25% slope; (b) section schematic of the lysimeter. The geomembrane (highlighted red), overlain by a geocomposite drainage layer (highlighted by a dotted blue line), was deployed over the compacted excavated subgrades. Each cover comprised 92 cm of compacted clay topped by a 30.5 cm vegetation layer. The sensors installed at each of the lysimeters include moisture and temperature sensors (as rectangle boxes) and tensiometers (as square boxes).

Figure 3. Electrical resistivity equipment (R8/IP resistivity meter) deployed in the field.

Figure 4. Resistivity tomography results of a few of the lysimeters.

Figure 5. Multiple (a) Decagon 5TM soil moisture and temperature sensors and (b) Decagon MPS-2 tensiometers were installed to measure the volumetric moisture content and matric suction of the soil. The sensors were placed at 23 cm intervals and connected to (c) a data logger positioned outside the lysimeter. All moisture sensors and tensiometers were programmed to record synchronized data.

Figure 6. Section of moisture and temperature sensors.

Figure 7. Soil sample collection (disturbed soil sample).

Figure 8. Methodology of the machine learning process.

Figure 9. Artificial neural network architecture.

Figure 10. Scatter plots of (a) resistivity and soil moisture content, (b) resistivity and soil suction, and (c) resistivity and soil temperature.

Figure 11. Correlation matrix of the input variables.

Figure 12. Measured vs. predicted plot for linear regression.

Figure 13. Measured vs. predicted plot for decision tree.

Figure 14. Measured vs. predicted plot for random forest.

Figure 15. Measured vs. predicted plot for support vector machine.

Figure 16. Measured vs. predicted plot for artificial neural network.

Table 1. Experimental test program on soil samples.

Property	Component	Value
Grain Size Distribution (%)	Sand	13
Grain Size Distribution (%)	Silt	46.5
Grain Size Distribution (%)	Clay	40.5
Atterberg Limits (%)	Liquid Limit	56
Atterberg Limits (%)	Plastic Limit	26.5
Atterberg Limits (%)	Plasticity Index	29.5
Specific Gravity, G_s		2.76
Maximum Dry Density, γ_d (kN/m³)		16.88
Optimum Moisture Content (%)		17.9
Saturated Hydraulic Conductivity, k_sat (cm/s)		8.72 × 10⁻⁷
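
The index properties in Table 1 can be cross-checked with simple arithmetic: the grain-size fractions should sum to 100%, and the plasticity index is by definition the liquid limit minus the plastic limit. The following is a minimal sketch of that check using the values copied from Table 1; the variable and function names are illustrative assumptions and do not come from the article.

```python
# Values copied from Table 1 (all in percent).
grain_size_pct = {"sand": 13.0, "silt": 46.5, "clay": 40.5}
liquid_limit = 56.0
plastic_limit = 26.5
reported_plasticity_index = 29.5


def plasticity_index(ll: float, pl: float) -> float:
    """Plasticity index, defined as liquid limit minus plastic limit."""
    return ll - pl


# Sum of the sand, silt, and clay fractions; expected to be 100%.
total_fraction = sum(grain_size_pct.values())

# Plasticity index recomputed from the two Atterberg limits.
computed_pi = plasticity_index(liquid_limit, plastic_limit)

print(f"Grain-size total: {total_fraction:.1f}%")
print(f"Computed PI: {computed_pi:.1f}% (reported: {reported_plasticity_index}%)")
```

Both checks agree with the tabulated values, so the table is internally consistent.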

Table 2. Comparison of different model metrics.

Model	R²	RMSE	Bias	ubRMSE
LR	0.739665	0.149127	−0.02866	0.146347
DTR	0.561386	0.193566	−0.00306	0.193542
RF	0.736337	0.150077	−0.01847	0.148936
SVM	0.769792	0.140233	−0.00135	0.140226
ANN	0.787592	0.135050	−0.00449	0.134976
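
The four metrics in Table 2 are not independent. Under the commonly used decomposition RMSE² = Bias² + ubRMSE², the unbiased RMSE can be recomputed from the RMSE and bias columns. The sketch below assumes that definition (this part of the article does not state the formula) and checks it against the values copied from Table 2; the function names and the tolerance are illustrative choices.

```python
import math

# Values copied from Table 2: model -> (RMSE, bias, reported ubRMSE).
TABLE_2 = {
    "LR": (0.149127, -0.02866, 0.146347),
    "DTR": (0.193566, -0.00306, 0.193542),
    "RF": (0.150077, -0.01847, 0.148936),
    "SVM": (0.140233, -0.00135, 0.140226),
    "ANN": (0.135050, -0.00449, 0.134976),
}


def unbiased_rmse(rmse: float, bias: float) -> float:
    """ubRMSE under the assumed decomposition RMSE^2 = bias^2 + ubRMSE^2."""
    return math.sqrt(rmse**2 - bias**2)


def check_table(table: dict, tol: float = 5e-6) -> dict:
    """Return {model: (recomputed ubRMSE, True if within tol of reported)}."""
    out = {}
    for model, (rmse, bias, reported) in table.items():
        recomputed = unbiased_rmse(rmse, bias)
        out[model] = (recomputed, abs(recomputed - reported) < tol)
    return out


results = check_table(TABLE_2)
for model, (value, ok) in results.items():
    print(f"{model}: recomputed ubRMSE = {value:.6f}, matches table: {ok}")
```

The recomputed values agree with the reported ubRMSE column to within rounding of the tabulated inputs, which suggests the table follows this decomposition. Because every bias is small relative to its RMSE, ubRMSE and RMSE rank the models identically, with ANN lowest and DTR highest.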

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
