The second stage focused on processing the raw data to generate the thematic layers. Nine thematic layers were developed for the study area: precipitation, drainage stream density (DSD), geomorphology, geology, curve number (CN), total dissolved solids (TDS), elevation, slope and distance from major fractures. Various processing techniques and statistical algorithms were applied in a spatial context to develop these layers. The next subsection describes each thematic layer.
In the third stage, all the thematic layers were processed to model and map the most suitable dam locations in Sharjah. Because the thematic factors are expressed in different units, they cannot be combined directly. Therefore, all the thematic layers were standardised by reclassifying each into five classes using the natural breaks (Jenks) technique [32]. Each class was ranked on a scale from one to nine according to its relation to a suitable location for dam construction. Two main approaches were followed to determine the weighting: the AHP and ML techniques. AHP modelling determines the weights from the literature and expert opinion. Conversely, the ML approach depends on ground truth data about the available groundwater in the study area [8]. The weightings resulting from the two approaches were moderated to obtain a new thematic layer weighting, which was used to generate the DSSM for Sharjah following the equation below:
$$\mathrm{DSSM} = \sum_{k \in f} W_k \, r_k$$
where
$k$ = element of the thematic layer set,
$f$ = set of all the thematic layers,
$W_k$ = weight of thematic layer $k$ and
$r_k$ = rating of the subclasses of thematic layer $k$.
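The weighted combination of ratings defined above can be illustrated with a minimal Python sketch. The layer names, the 2 x 2 rasters and the weights below are hypothetical values chosen for illustration, and the requirement that the weights sum to one is an assumption rather than something stated in this study.

```python
from typing import Dict, List

Raster = List[List[int]]  # rows of class ratings on the 1-9 scale


def weighted_overlay(ratings: Dict[str, Raster],
                     weights: Dict[str, float]) -> List[List[float]]:
    """Return the per-cell weighted sum of ratings: sum over k of W_k * r_k."""
    if set(ratings) != set(weights):
        raise ValueError("every layer needs exactly one weight")
    # Assumption: the weights are normalised so that they sum to 1.
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    layers = list(ratings)
    n_rows = len(ratings[layers[0]])
    n_cols = len(ratings[layers[0]][0])
    score = [[0.0] * n_cols for _ in range(n_rows)]
    for name in layers:
        grid = ratings[name]
        for i in range(n_rows):
            for j in range(n_cols):
                score[i][j] += weights[name] * grid[i][j]
    return score


# Hypothetical 2 x 2 rating rasters and weights, for illustration only.
demo_ratings = {
    "precipitation": [[9, 7], [5, 3]],
    "slope": [[1, 3], [5, 7]],
}
demo_weights = {"precipitation": 0.75, "slope": 0.25}
demo_score = weighted_overlay(demo_ratings, demo_weights)
```

Each cell of `demo_score` is the suitability score for that location. In practice the same sum would run over all nine reclassified layers.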
Subsequently, the DSSM was validated against existing dams in the study area, and the weighting could be fine-tuned at this point to improve accuracy. Finally, the drainage streamlines (vector format) were overlaid on the DSSM to select the best locations for dams in Sharjah.
3.1. Data Used
This study adopted nine parameters for determining suitable sites for dam construction: precipitation, drainage density, geomorphology, geology, CN, TDS, slope, elevation and distance from major fractures. Brief descriptions of these parameters are given below:
Precipitation: Figure 3a depicts the annual total rainfall, obtained from the National Centre of Meteorology, UAE, for the period 2003–2017 and used to prepare the rainfall map. For Sharjah, the annual total varied from a minimum of 73 mm to a maximum of 92 mm. Precipitation is highest in the north-eastern part of the study area. High-precipitation zones are considered appropriate sites for dam construction. The inverse distance weighted (IDW) interpolation method was applied to generate the spatial layer. The IDW equation is as follows:
$$Z_0 = \frac{\sum_{i=1}^{n} z_i \, d_i^{-N}}{\sum_{i=1}^{n} d_i^{-N}}$$
where
$Z_0$ is the estimation value of variable $z$ in point $i$,
$z_i$ is the sample value in point $i$,
$d_i$ is the distance of the sample point to the estimated point,
$N$ is the coefficient that determines weight according to distance, and
$n$ is the total number of predictions for each validation case [33].
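As an illustration of IDW interpolation, the following Python sketch estimates the value at one target point from a set of samples. The station coordinates are hypothetical, and the default power of 2 is an assumed value, since the coefficient used in this study is not stated here. Only the 73 mm and 92 mm totals are taken from the text.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (x, y, observed value z_i)


def idw_estimate(x: float, y: float, samples: List[Sample],
                 power: float = 2.0) -> float:
    """Estimate the value at (x, y) as a distance-weighted mean of the samples."""
    numerator = 0.0
    denominator = 0.0
    for sx, sy, z in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return z  # the target coincides with a sample point
        w = d ** (-power)  # closer samples receive larger weights
        numerator += w * z
        denominator += w
    if denominator == 0.0:
        raise ValueError("at least one sample is required")
    return numerator / denominator


# Two hypothetical stations reporting the minimum and maximum annual totals (mm).
stations = [(0.0, 0.0, 73.0), (10.0, 0.0, 92.0)]
midpoint_rain = idw_estimate(5.0, 0.0, stations)
```

A point equidistant from both stations receives the plain average of the two totals. Moving the target towards either station pulls the estimate towards that station's value.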
Hydrology and drainage stream density: Hydrology plays a vital role in understanding the structure of the available surface and sub-surface water resources. Streams originate in the north-eastern part of the UAE and flow westwards to join the sea. Thus, first-order streams are located in the eastern part, and higher-order streams are located in the western part (Figure 3b). First-order streams are generally considered when locating suitable dam sites [18]. DSD is the total length of streams of all orders divided by the area of the drainage basin and indicates the closeness of the spacing of channels [23]. Lithology is a major indicator of percolation rate and determines the quality of a drainage network. The structure of a drainage network helps gauge the characteristics of a water holding zone [19]. The drainage density of the study area ranges from 0 to 0.58 per km² (Figure 3c). The suitability of a dam site increases with drainage density because of its relationship with surface runoff and permeability. A high drainage density indicates a high prospect of groundwater and increased suitability for locating a dam site [25]. The drainage streams and basins must also be mapped to prepare the drainage density raster layer.
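The definition of DSD as total stream length divided by basin area can be written as a short Python sketch. The stream lengths and basin area below are hypothetical and were chosen only so that the result matches the upper end of the range reported for the study area.

```python
from typing import Iterable


def drainage_density(stream_lengths_km: Iterable[float],
                     basin_area_km2: float) -> float:
    """Total length of streams of all orders divided by the basin area (km per km^2)."""
    if basin_area_km2 <= 0:
        raise ValueError("basin area must be positive")
    return sum(stream_lengths_km) / basin_area_km2


# Hypothetical basin: three stream segments inside a 50 km^2 drainage basin.
demo_density = drainage_density([12.0, 8.5, 8.5], 50.0)
```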
Geomorphology: This thematic layer has seven classes: fan deposits, high dunes, low dunes, mountains, sand, urban areas and vegetation (Figure 3d). The urban areas class was included in the geomorphology layer because it describes the physical surface of the urban centre in Sharjah (mainly asphalt for roads and concrete for buildings). Urban areas are dominant along the western part of the study area near the shores [8]. The eastern part of the study area is mostly covered by fan deposits, whilst the middle part is predominantly covered by low and high dunes.
Geology: The produced geology layer consists of alluvium, gabbro, limestone, metamorphic, ophiolite and sand classes (Figure 3e). As the map indicates, most of the study area consists of sand. The abundance of alluvium and sand in the study area provides suitable locations for dam construction.
Curve number: The CN is used to predict direct runoff or retention. The CN depends on the soil type, the effects of land use and cover and the hydrogeological condition. For this study, CNs were obtained for the mentioned conditions (Figure 3f). A high CN indicates low suitability for locating dam sites.
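The link between the CN and direct runoff can be illustrated with the standard SCS curve number relation in metric units. This relation is not stated in this study, so the sketch below only shows how a CN is commonly converted to a runoff depth. The initial abstraction ratio of 0.2 is the conventional default, and the rainfall depths are hypothetical.

```python
def scs_runoff_mm(precip_mm: float, cn: float, ia_ratio: float = 0.2) -> float:
    """Direct runoff depth (mm) from a rainfall depth using the SCS-CN relation."""
    if not 0 < cn <= 100:
        raise ValueError("CN must lie in the interval (0, 100]")
    s = 25400.0 / cn - 254.0   # potential maximum retention, mm
    ia = ia_ratio * s          # initial abstraction, mm
    if precip_mm <= ia:
        return 0.0             # all rainfall is retained, no direct runoff
    return (precip_mm - ia) ** 2 / (precip_mm - ia + s)


# Hypothetical 100 mm storm over two surfaces with different curve numbers.
runoff_cn80 = scs_runoff_mm(100.0, 80.0)
runoff_cn60 = scs_runoff_mm(100.0, 60.0)
```

For the same rainfall, the surface with the higher CN retains less water and produces more direct runoff.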
Total dissolved solids: TDS is a fundamental parameter for assessing water quality. A high TDS value indicates a high concentration of dissolved salts and minerals and unsuitability for potable use. Given the high salinity of the region, the sea water of the Arabian Gulf poses a major obstacle to potable use. For the study area, TDS was measured in parts per million (ppm). The TDS values for Sharjah range from 657 to 2530 ppm (Figure 3g). The map was prepared from TDS values reported by the UAE Ministry of Environment and Water (2015).
Elevation: A digital elevation model (DEM) with a resolution of 30 m was acquired for Sharjah, and the elevation raster ranged from 0 to 413 m above sea level (Figure 3h). Elevation influences the best dam location because it affects water accumulation and movement [34]. Low elevations were considered suitable for dam construction because both precipitated water and groundwater are more likely to accumulate there [8,35].
Slope: The slope affects the velocity of both surface water and groundwater. The lower the slope, the higher the possibility of water accumulation. Like elevation, the slope map was derived from the DEM at a 30 m resolution (Figure 3i). For Sharjah, the slope ranged up to 58 degrees. Water velocity increases with slope. A slope under 5% is suitable for the water holding capacity of the proposed dam construction [22].
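The slope range above is reported in degrees, whereas the suitability threshold is given in percent, so a conversion between the two units is useful when applying the threshold to the slope raster. The sketch below uses the usual definition of percent slope as 100 times rise over run. The function names are illustrative and are not part of the original workflow.

```python
import math


def percent_to_degrees(slope_percent: float) -> float:
    """Convert a slope given as rise over run in percent to an angle in degrees."""
    return math.degrees(math.atan(slope_percent / 100.0))


def degrees_to_percent(slope_degrees: float) -> float:
    """Convert a slope angle in degrees to rise over run in percent."""
    return 100.0 * math.tan(math.radians(slope_degrees))


def meets_slope_threshold(slope_degrees: float, max_percent: float = 5.0) -> bool:
    """Return True when a slope in degrees does not exceed the percent threshold."""
    return degrees_to_percent(slope_degrees) <= max_percent


threshold_deg = percent_to_degrees(5.0)   # the 5% threshold expressed in degrees
steepest_pct = degrees_to_percent(58.0)   # the steepest reported slope in percent
```

Under this definition, a 5% slope corresponds to an angle of slightly under 3 degrees, and a 58 degree slope corresponds to roughly 160%.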
Major fracture Euclidean distance: In the Northern UAE, major fractures are mostly concentrated in the mountainous regions (Figure 3j). Stream flow depends on the major fractures and moves from higher to lower elevations. A minimum distance of 100 m from tectonic faults and fractures should be maintained when considering suitable locations for dam construction. Areas with faults should be excluded from the consideration of dam site suitability [18].
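The Euclidean distance layer and the 100 m exclusion rule can be sketched in Python as follows. The sketch assumes that the fractures have been rasterised onto a grid with 30 m cells, matching the DEM resolution, and it uses a brute-force search that suits only small grids. A GIS distance tool would normally be used for a full raster.

```python
import math
from typing import List


def distance_to_fractures(fracture_mask: List[List[int]],
                          cell_size_m: float = 30.0) -> List[List[float]]:
    """Distance (m) from each cell centre to the nearest fracture cell."""
    fracture_cells = [(i, j)
                      for i, row in enumerate(fracture_mask)
                      for j, value in enumerate(row) if value]
    if not fracture_cells:
        raise ValueError("the mask contains no fracture cells")
    distances = []
    for i, row in enumerate(fracture_mask):
        distances.append([
            cell_size_m * min(math.hypot(i - fi, j - fj)
                              for fi, fj in fracture_cells)
            for j in range(len(row))
        ])
    return distances


def outside_buffer(distance_m: List[List[float]],
                   min_distance_m: float = 100.0) -> List[List[bool]]:
    """True for cells at least min_distance_m away from every fracture."""
    return [[d >= min_distance_m for d in row] for row in distance_m]


# Hypothetical one-row raster with a single fracture cell at its left edge.
demo_distance = distance_to_fractures([[1, 0, 0, 0, 0]])
demo_allowed = outside_buffer(demo_distance)
```

Only cells flagged `True` would remain candidates for dam sites under the 100 m rule.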
3.3. Machine Learning
ML techniques use computational algorithms and statistical methods to develop models without a predetermined model form. In other words, ML teaches the computer to do what humans naturally do, i.e., learn from experience and examples. As the quality of the experience and the examples increases, the resulting ML models improve. ML techniques are of two main types: supervised learning and unsupervised learning. The former tackles regression and classification problems, and the latter deals mainly with clustering problems. In this paper, we utilised three supervised ML techniques to classify the best locations to build new dams in Sharjah: random forest (RF), gradient boosted trees (GBT) and support vector machine (SVM). The RF technique is an advanced version of the decision tree method, which uses a tree-like model structure for classification through a multiple splitting process. RF is an ensemble ML technique that combines multiple tree models to improve prediction accuracy. GBT is an ensemble tree-based model that works in two steps. First, it uses subsets of the original database to generate a series of averagely performing models. Second, it ‘boosts’ their performance by combining them using a certain cost function. Finally, the SVM technique uses kernel functions to transform the input data into a high-dimensional feature space, where linear modelling is then employed to handle complex nonlinear relationships [40]. SVM is best known for its capacity in classification problems [41].
We established four databases, ranging from 1000 to 2500 data samples across the study area, to train the ML algorithms. The independent parameters are the thematic layers developed in this study, and the dependent parameter is the groundwater potential. The groundwater potential map (GPM) was developed in an earlier study for the Northern Emirates [8]. The GPM data considered in this study were categorised into three groups: high, medium and low. All ML modelling was conducted using the RapidMiner software (https://rapidminer.com/). Initial modelling revealed that the database with 2000 data samples attained the best accuracy. Therefore, all the ML results described in this section are based on the 2000-sample database.
Each ML model was fine-tuned to achieve its best accuracy. The best RF model has 100 trees with a maximum depth of seven, whereas the best GBT model has 100 trees with a maximum depth of two. The best-scoring SVM has a gamma of 0.001 and a C of 100. Overall, RF had the best accuracy of the three techniques (Table 4): it reached 76.5% with a standard deviation of ±2.4%, whereas the GBT and the SVM reached 76% and 72.7%, respectively.
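The modelling in this study was carried out in RapidMiner. For readers who wish to experiment with comparable settings, the sketch below configures the three classifiers in scikit-learn, which is a substitute tool and not the software used here. Only the tree counts, depths, gamma and C are taken from the text. The RBF kernel, the feature scaling step, the five-fold cross-validation and the synthetic data set are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_models(seed: int = 0) -> dict:
    """Classifiers configured with the hyperparameters reported in the text."""
    return {
        "RF": RandomForestClassifier(n_estimators=100, max_depth=7,
                                     random_state=seed),
        "GBT": GradientBoostingClassifier(n_estimators=100, max_depth=2,
                                          random_state=seed),
        # Assumption: RBF kernel with standardised inputs.
        "SVM": make_pipeline(StandardScaler(),
                             SVC(kernel="rbf", gamma=0.001, C=100)),
    }


def compare_models(X, y, folds: int = 5, seed: int = 0) -> dict:
    """Mean and standard deviation of cross-validated accuracy per model."""
    results = {}
    for name, model in build_models(seed).items():
        scores = cross_val_score(model, X, y, cv=folds, scoring="accuracy")
        results[name] = (float(scores.mean()), float(scores.std()))
    return results


# Synthetic stand-in for the sample database: nine features, three classes.
X_demo, y_demo = make_classification(n_samples=300, n_features=9,
                                     n_informative=5, n_redundant=0,
                                     n_classes=3, random_state=0)
demo_results = compare_models(X_demo, y_demo)
```

Because the data are synthetic, the accuracies produced by this sketch are not comparable with those reported for the study area.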
Table 5 shows the confusion matrix of the RF technique. The ‘True High’ accuracy reaches almost 80%.
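Per-class figures of this kind are read from a confusion matrix as shown in the sketch below. The counts are hypothetical and are not the values in Table 5, and the layout (rows for predicted classes, columns for true classes) is an assumption about how the matrix is arranged.

```python
from typing import Dict, List, Tuple


def per_class_metrics(matrix: List[List[int]], labels: List[str]
                      ) -> Tuple[float, Dict[str, float], Dict[str, float]]:
    """Overall accuracy, per-class recall and per-class precision.

    matrix[i][j] counts samples predicted as labels[i] whose true class is labels[j].
    """
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    recall: Dict[str, float] = {}
    precision: Dict[str, float] = {}
    for i, label in enumerate(labels):
        true_total = sum(matrix[r][i] for r in range(n))  # column sum
        pred_total = sum(matrix[i])                       # row sum
        recall[label] = matrix[i][i] / true_total if true_total else 0.0
        precision[label] = matrix[i][i] / pred_total if pred_total else 0.0
    return correct / total, recall, precision


# Hypothetical counts for the three GPM classes (not the values in Table 5).
demo_labels = ["high", "medium", "low"]
demo_matrix = [
    [80, 15, 5],    # predicted high
    [15, 70, 20],   # predicted medium
    [5, 15, 75],    # predicted low
]
demo_accuracy, demo_recall, demo_precision = per_class_metrics(demo_matrix, demo_labels)
```

Under this layout, the share of truly high samples that the model recovers is the recall of the high class, computed from the first column.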
Figure 4 presents the weight of each input parameter (the thematic layers) with respect to the dependent parameter (GPM) according to feature engineering techniques. The weights are generally comparable with the AHP weights, which are based on the literature and expert opinion, with one key difference: the ML approach assigned zero weight to slope, indicating that slope contributes nothing to the prediction, whereas the AHP assigned slope a 5% weighting. Given our knowledge of the study area, we attribute the exclusion of slope from the ML weighting to the almost negligible variation in slope within the study area.