Multi-Model Ensemble Approach For Soybean Crop Yield Estimation (Kharif-2023) in Latur District at Macroscale Level
Abstract:- Crop area estimation is a critical aspect of agricultural monitoring and management, providing essential information for decision-making in the agricultural sector. This study was carried out by the GIS and Remote Sensing team of Semantic Technologies and Agritech Services Pvt. Ltd., Pune, during Kharif-2023. The methodology given in the YESTECH manual under the Pradhan Mantri Fasal Bima Yojana (PMFBY) was followed throughout. Latur district has faced increasing weather-based yield losses over the last few years. In this case study we estimated the yield of the soybean crop at Revenue Circle (RC) level for agriculture-based stakeholders, insurance companies, and government policy makers. A multi-model approach is beneficial over a single-model yield estimation approach because it combines the individual estimates into an ensemble yield for more reliable forecasting of crop yield. Accuracy at RC level was within the range given in the YESTECH manual. Overall, the results show that such a model is one of the better approaches for supporting the decisions of insurance-based stakeholders in rainfed regions, where negative consequences for soybean productivity under different climate change scenarios have been observed.

Keywords:- Remote Sensing, GIS, NPP, Machine Learning, DSSAT-4.8, Soybean, Latur, Yield Simulation, Revenue Circle, Soybean productivity.

I. INTRODUCTION

Agriculture is the backbone of global economies, providing sustenance and livelihoods for billions of people. The ability to accurately predict crop yield is paramount for effective resource management, risk mitigation, and informed decision-making. Traditional methods, reliant on historical data and manual observations, often fall short in addressing the dynamic nature of modern agricultural challenges. The integration of advanced technologies has ushered in a new era in agriculture, enabling a more nuanced and precise understanding of crop dynamics. Software applications, remote sensing, GIS, and AI/ML algorithms work synergistically to process vast datasets, analyse patterns, and predict crop yields with unprecedented accuracy.

Unpredictable rainfall, rising temperatures, and extreme weather events like hailstorms and strong winds threaten crop growth and yields. Farmers shift to drought-resistant crops, rely heavily on irrigation, face soil degradation, and experience economic vulnerability due to unstable production.

In the contemporary agricultural landscape, the accurate estimation of crop yield has emerged as a critical aspect influencing various sectors including insurance, the economy, government policies, and ultimately the welfare of farmers. Traditional methods of crop yield estimation are often marred by limitations in accuracy and efficiency. However, the integration of advanced technologies, such as sophisticated software, remote sensing, GIS (Geographic Information System), and artificial intelligence (AI) and machine learning (ML) techniques, has substantially improved the precision and reliability of crop yield estimation.

Importance of Accurate Crop Yield Estimation:

Insurance Sector: Accurate crop yield estimates play a pivotal role in the insurance sector, enabling precise risk assessment and facilitating the development of tailored insurance products.
Economic Implications: Crop yield estimates are fundamental to economic forecasting, impacting commodity markets, trade agreements, and pricing mechanisms.
Government Policies: Governments rely on accurate crop yield estimates to formulate effective agricultural policies. This includes allocation of subsidies, distribution of resources, and planning for strategic interventions during periods of adverse weather conditions or pest outbreaks.
Public Welfare and Food Security: Accurate crop yield estimates are integral to ensuring food security and public welfare.
Farmers' Wellbeing: For farmers, precise crop yield estimates translate into enhanced planning and risk management. Access to reliable information empowers farmers to make informed decisions regarding crop
selection, resource allocation, and market participation, thereby improving overall farm productivity and livelihoods.

The adoption of advanced methods for crop yield estimation is a transformative step towards building agricultural resilience in the face of evolving challenges. The synergy between software applications, remote sensing, and GIS technologies empowers stakeholders across sectors to make informed decisions, fostering a sustainable and prosperous future for agriculture. By recognizing the multifaceted implications of accurate crop yield estimation, societies can work collaboratively to strengthen the foundations of global food security, economic stability, and the welfare of farming communities.

This report delves into the significance of employing advanced methods for estimating crop yield and highlights their implications across diverse domains.

II. MATERIAL AND METHODS

A. Study Area
The study was carried out at Semantic Technologies and Agritech Services Pvt. Ltd., Pune, during the kharif season of 2023 as part of a specific assignment. All revenue circles (RCs) in Latur district of Maharashtra state were used as experimental sites. Field-level data were collected through ground truthing and crop cutting experiments.

B. Geography and Climate for Latur District:
Latur district covers an area of 7,157 sq km. Annual rainfall averages 520 mm, with the Kharif season receiving 350-390 mm. Kharif temperatures range from 33-37°C (maximum) and 22-25°C (minimum), with an average relative humidity of 70-80%. The district is located between latitudes 17°52′N and 18°50′N and longitudes 75°16′E and 76°42′E. Elevations range from 400 to 800 m. Soil and drainage: Vertisols dominate the region, and the flat topography poses drainage challenges, leading to waterlogging during heavy rainfall. Situated on the Karnataka-Maharashtra border, the district is surrounded by Osmanabad in the south, Beed in the west, Parbhani and Nanded in the north, and Bidar district of Karnataka in the east. Soybean, cotton, jowar, bajra, tur, and sugarcane are the major crops grown in the district.
A multi-model approach was used for the estimation of crop yield, as described below. RC-wise yield of the soybean crop in tonnes/hectare during the kharif season of 2023 was estimated by each of the following methods.

Semi Physical Net Primary Productivity (NPP):

Data and Materials Used:
The data and materials used in this study are as follows:

In the rare case that the air temperature falls below Tmin, the Tscalar value automatically becomes 0.

Light Use Efficiency (ℇ):
The light use efficiency (LUE) used for the soybean crop in this study was 1.78.

Crop Mask
The crop mask was derived from Sentinel-1 synthetic aperture radar (SAR) data obtained from the European Space Agency (ESA) Copernicus Hub. Using the R programming language, we applied the Random Forest algorithm to generate the crop mask, with hyperparameter tuning and contingency matrix analysis. This methodology was applied systematically across the specified crops within the targeted area of interest.

In the accuracy assessment, the crop mask achieved an accuracy of 90% to 95% across all cultivated crops and districts, indicating a high level of precision in delineating and classifying the specified crops. The combination of the Random Forest algorithm, hyperparameter tuning, and contingency matrix analysis produced a reliable crop mask for agricultural monitoring and management within the designated study area.

Calculation of NPP and Grain Yield:
The final Net Primary Productivity (NPP) and grain yield were computed using the equation below. The NPP sum was multiplied by the Harvest Index (0.45) to estimate per-pixel yield.

NPP = PAR * FAPAR * ℇ * Tstress * Wstress (following the logic of the Monteith equation, 1972).

The same methodology was followed by Upasana Singh et al. (2023), who reported the same results for all data used to run the model.

Crop Simulation Model-DSSAT
A crop simulation model is a mathematical equation, or set of equations, that represents the behaviour of a system. We used CROPGRO for the soybean crop. It consists of various subroutines, viz. the water balance, phenology, nitrogen, and growth and development subroutines.

Data Input to Model
Materials, methods, and all file processing followed the procedures of Hoogenboom, G., et al. (2019, 2024) and Jones, J.W. (2003). The minimum data requirements for operation, calibration, and validation of the crop models are described below.
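Before turning to the DSSAT inputs, the semi-physical calculation described earlier (NPP = PAR * FAPAR * ℇ * Tstress * Wstress, summed over the season and multiplied by the harvest index) can be illustrated with a minimal Python sketch. The light use efficiency (1.78) and harvest index (0.45) are the values stated in this study; the function names, the per-period input structure, and the sample numbers are hypothetical, and units are assumed to be consistent across inputs. The temperature scalar is simply set to 0 when air temperature is below Tmin, as noted in the text; how the scalar is computed otherwise is not specified here, so it is passed in as an input.

```python
LUE_SOYBEAN = 1.78     # light use efficiency stated in the study
HARVEST_INDEX = 0.45   # harvest index stated in the study


def temperature_scalar(t_air, t_min, t_scalar):
    """Return 0 when air temperature is below Tmin, else the supplied scalar."""
    return 0.0 if t_air < t_min else t_scalar


def npp(par, fapar, t_stress, w_stress, lue=LUE_SOYBEAN):
    """NPP for one period: PAR * FAPAR * LUE * Tstress * Wstress."""
    return par * fapar * lue * t_stress * w_stress


def pixel_yield(periods, harvest_index=HARVEST_INDEX):
    """Sum NPP over all periods of the season and apply the harvest index."""
    total_npp = sum(npp(**period) for period in periods)
    return total_npp * harvest_index


# Hypothetical inputs for one pixel over three compositing periods.
season = [
    {"par": 80.0, "fapar": 0.40, "t_stress": 1.0, "w_stress": 0.9},
    # Air temperature (12) is below Tmin (15), so the scalar becomes 0.
    {"par": 90.0, "fapar": 0.70,
     "t_stress": temperature_scalar(12.0, 15.0, 0.8), "w_stress": 1.0},
    {"par": 85.0, "fapar": 0.60, "t_stress": 0.95, "w_stress": 0.8},
]

print(round(pixel_yield(season), 2))
```

In this sketch the second period contributes nothing to the seasonal sum because its temperature scalar is zero, which is the behaviour the text describes for temperatures below Tmin.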
Input files
The files are organized into input, output, and simulation experiment performance data files. The experiment performance files are needed only when simulated results are to be compared with data recorded in a particular experiment. In some cases, they can be used as input files to reset some variables during the course of a simulation run. The input files are further divided into those dealing with the experiment, weather and soil, and the characteristics of different genotypes. Similarly, output files are further divided into those dealing with the overview, summary, growth, water, carbon, and nitrogen balance.

Soil properties directory file: The file SOIL.SOL contained the list of different soils with their physical and chemical properties.
Soil profile initial condition file: The soil profile initial condition file contained the initial values of soil water, soil reaction, and soil nitrogen. Data pertaining to this situation were entered.
Irrigation management file: The irrigation management file provides for the date, amount (mm), and applied depth (cm) of each irrigation. Irrigation data pertaining to this situation were entered.
Fertilizer management file: The fertilizer management file contained the date, form, and amount of nitrogen application. Information on fertilizer application was entered in the file accordingly.
Treatment management file: The treatment management file contained the description of each treatment under a separate title and serial number. The file also contained dates of planting and emergence, plant population at seeding and at emergence, planting method, planting distribution, row spacing, row direction, planting depth, planting material, transplant age, plants per hill, dates of beginning, etc.

Crop Cultivars Directory File
For soybean, the file CRGRO048 contained the list of different cultivars with their genetic coefficients. The modified genetic coefficients, viz. CSDVAR, PPSEN, EMG-FLW, FLW-FSD, FSD-PHM, WTPSD, SDPDVR, SDFDUR, PODDUR, THRESH, SDPRO, and SDLIP, were used. The variety selected was JS-335, which is widely grown in this area.

The genetic coefficients are the most important parameters: they represent the genetic characteristics of the cultivar, on which the crop phenology, biomass production and partitioning, and yield potential of the crop depend. However, actual performance is also controlled by external factors.

Running the crop model: Once all the required files had been created, the model was run for all the crop cultivars. Each model run created output files.

Machine Learning:
The methodology and processing of the model are described below in detail.

Data Collection and Ground Truthing:

Collect remote sensing data (optical and radar imagery) for the study area, covering the growing season of the crops.
Collect ground truth data through field surveys using the CropTech App (prepared by the company) for accurate calibration and validation.
Use Crop Cutting Experiments (CCE) with smart sampling methods to efficiently estimate crop parameters.

Pre-process the remote sensing data to correct for atmospheric interference and geometric distortions.
Apply image enhancement techniques to improve the visual quality of the images.
Employ supervised or unsupervised classification algorithms to extract crop masks for soybean fields.

Generation of Spectral Indices and use of RADAR Backscatter:

Calculate vegetation indices (e.g., NDVI, NDRE, GNDVI) from the optical remote sensing data to assess crop health and vigour.
Utilize backscatter data from radar imagery (VV, VH) to analyse surface roughness and other relevant crop information.

Training and Testing Models (Machine Learning):

Divide the dataset into training and testing sets, ensuring no overlap between the two.
Evaluate the model's performance on the testing dataset using evaluation metrics like accuracy, F1-score, and root mean squared error (RMSE).

Model Validation and Final Result:

Validate the trained model using independent ground truth data collected during the growing season for soybean.
Assess the model's accuracy and generalization ability to ensure reliable yield estimation.
Obtain the final crop yield estimation results for soybean in the study area.
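The vegetation indices named in the spectral-index step (NDVI, NDRE, GNDVI) all share the standard normalized-difference form, which can be sketched in Python as follows. The reflectance values and helper names are hypothetical, and which sensor bands supply the green, red, red-edge, and near-infrared reflectances is not specified in this study.

```python
def normalized_difference(a, b):
    """Generic normalized difference (a - b) / (a + b).

    Returns 0.0 when both inputs sum to zero; this guard is a choice made
    for this sketch to avoid division by zero.
    """
    total = a + b
    return 0.0 if total == 0 else (a - b) / total


def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return normalized_difference(nir, red)


def gndvi(nir, green):
    """Green Normalized Difference Vegetation Index."""
    return normalized_difference(nir, green)


def ndre(nir, red_edge):
    """Normalized Difference Red Edge index."""
    return normalized_difference(nir, red_edge)


# Hypothetical surface reflectances for a single pixel.
pixel = {"green": 0.08, "red": 0.05, "red_edge": 0.20, "nir": 0.45}

indices = {
    "NDVI": ndvi(pixel["nir"], pixel["red"]),
    "GNDVI": gndvi(pixel["nir"], pixel["green"]),
    "NDRE": ndre(pixel["nir"], pixel["red_edge"]),
}
for name, value in indices.items():
    print(name, round(value, 3))
```

Per-pixel values of this kind, together with the VV and VH backscatter, would form the feature columns on which the regression models are trained.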
Individual Model Generation:

Machine Learning Approach:

Utilize various algorithms such as linear regression, Random Forest, Extra Trees, k-nearest neighbours, and neural networks.
Train these models on the dataset, ensuring proper validation and calibration.

Crop Simulation Approach

Use well-calibrated crop simulation models such as DSSAT.
Simulate the growth and yield of crops using these models based on the provided input data.

Semi-physical Models:
A semi-physical model in remote sensing and GIS combines physical principles with remotely sensed data to estimate or predict biophysical parameters such as crop yield and biomass. These models are often used to monitor and manage natural resources, as well as to assess the impacts of climate change and other environmental stressors.

Model Averaging: Calculate the simple mean of predictions from the ML, semi-physical, and crop simulation (CSM) models.
Weighted Averaging: Assign weights based on individual model performance and calculate the weighted average of predictions.
Stacking: Use a meta-model that takes predictions from the individual models as inputs and predicts the final yield.
Voting: Each model votes for a final yield prediction, and the most frequent prediction is taken.

Model Validation

Split the dataset into training, validation, and test sets to avoid overfitting and ensure generalizability.
Use metrics like Root Mean Squared Error (RMSE) and R-squared (R2) for evaluation.
Assess performance using the test dataset and ground truth data.

Quality Control

Calculate the normalized RMSE between the observed yield and the ensemble model's estimated yield.
Ensure RMSE does not exceed acceptable thresholds, refining the model if necessary.

Validation
The accuracy of our model was evaluated against crop cutting experiment (CCE) data of PMFBY (Pradhan Mantri Fasal Bima Yojana) for the crop season kharif-2023.

The results and conclusions for the different methods/models used to estimate the yield of the soybean crop in Latur district of Maharashtra, Revenue-Circle wise, are as follows.
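The evaluation and quality-control metrics named in the methodology (RMSE, normalized RMSE, and R-squared) can be computed as in the Python sketch below. Normalizing RMSE by the mean observed yield is an assumption, since the normalization is not specified in the text; the function names, the sample yields, and the threshold value are hypothetical placeholders and are not taken from the YESTECH manual.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between paired observations and predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def nrmse_percent(observed, predicted):
    """RMSE as a percentage of the mean observed value (assumed normalization)."""
    mean_observed = sum(observed) / len(observed)
    return 100.0 * rmse(observed, predicted) / mean_observed


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed (CCE) and estimated yields in tonnes/hectare.
observed = [2.0, 1.5, 2.5, 2.0]
estimated = [2.2, 1.4, 2.3, 2.1]

HYPOTHETICAL_NRMSE_LIMIT = 30.0  # placeholder threshold, not from the manual

score = nrmse_percent(observed, estimated)
print(round(rmse(observed, estimated), 3), round(score, 2),
      round(r_squared(observed, estimated), 2))
print("within limit" if score <= HYPOTHETICAL_NRMSE_LIMIT else "refine model")
```

The same functions can be applied per revenue circle or pooled over the district, depending on the level at which the accuracy requirement is stated.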
Ahamadpur, Andhori, and Kingaon demonstrated relatively higher actual yields, surpassing 2 tonnes per hectare.
Hadolati, Belkund, Bhada, Killari, and Shirur Tajband had comparatively lower actual yields, falling below 2 tonnes per hectare.
Semi-Physical (NPP) Yield also showed variation, with some areas like Ujani, Chakur, and Latur displaying lower performance.
Notable outliers include Halgara, which had exceptionally high actual yields, and Nalgir, which had a notably high Semi-Physical Yield.
The dataset suggests a need for closer examination of factors influencing yield discrepancies between regions, such as soil quality and agricultural practices.
Overall, most regions achieved yields above 2 tonnes per hectare, indicating satisfactory performance for soybean cultivation in 2023.
Regional variations in climate and agricultural techniques likely influenced the observed differences in soybean yields. Similar results were reported by Xiao, X., et al. (2006) and Yao, Y., et al. (2021).

B. Crop Simulation Model DSSAT-4.8

The highest yielding RCs are Halgara and Wadhavana (Bk), both with an average yield of 2.89 tonnes per hectare.
The lowest yielding RCs are Nalgir and Tandulja, with average yields of 0.77 tonnes per hectare and 1.18 tonnes per hectare, respectively.
Fig 9: Soybean Yield in T/ha by DSSAT for Latur during Kharif 2023
There is significant variation in yield between RCs: the highest yielding RCs produce more than four times the yield of the lowest yielding RCs, and some RCs have double the yield of others.
The results suggest the potential benefits of using DSSAT for predicting soybean crop yields, although specific environmental factors and RC conditions may influence the accuracy of the predictions. Jadhav, S. D., et al. (2018), Bhosale, A. D., et al. (2015), and Deshmukh, S. D., et al. (2013) reported similar results for soybean.

C. Machine Learning

Using CCE yield and the different indices under study, the machine learning model achieved an accuracy of 82%. Among the methods tested, Support Vector Regression (SVR) showed the highest accuracy.
Fig 10: Soybean Yield in T/ha by ML for Latur during kharif 2023
Ahamadpur, Andhori, and Kingaon exhibited relatively stable yields around 2 tonnes per hectare in both actual and ML projections.
Shirur Tajband, Chakur, and Nalegaon showed lower yields, indicating potential challenges in those regions.
Halgara stood out with remarkably high actual yield but substantially lower ML yield, suggesting potential discrepancies in the ML model for that area.
Optimization of ML models can enhance predictive accuracy and contribute to better-informed agricultural practices.
The results indicate the potential of ML to improve yield predictions and optimize crop management strategies.
D. Ensemble Model:
Fig 11: Soybean Yield in T/ha by Ensemble Model for Latur during Kharif 2023
Table 4: Weightage assigned to the different models by the statistical approach during kharif 2023.
Model Used | DSSAT Yield | Semi-Physical Yield | Machine Learning Yield
Weightage (%) | 36.72 | 33.03 | 30.25
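As a rough illustration of how the Table 4 weightages combine the three model outputs, the Python sketch below computes a weighted average and, for comparison, a simple mean. The weights are those in Table 4 and the sample yields are the Ahamadpur row of Table 5; the function and variable names are hypothetical, and the exact procedure used to produce the published ensemble values may differ.

```python
# Weightages from Table 4 (percent).
WEIGHTS_PERCENT = {
    "dssat": 36.72,
    "semi_physical": 33.03,
    "machine_learning": 30.25,
}


def weighted_ensemble(yields, weights=WEIGHTS_PERCENT):
    """Weighted average of model yields, normalized by the sum of weights."""
    total_weight = sum(weights.values())
    return sum(yields[name] * w for name, w in weights.items()) / total_weight


def simple_mean(yields):
    """Unweighted model average, for comparison."""
    return sum(yields.values()) / len(yields)


# Model yields (tonnes/hectare) for Ahamadpur RC, from Table 5.
ahamadpur = {"dssat": 2.31, "semi_physical": 2.22, "machine_learning": 2.10}

print(round(weighted_ensemble(ahamadpur), 2))  # 2.22
print(round(simple_mean(ahamadpur), 2))        # 2.21
```

For this row the weighted average reproduces the tabulated ensemble yield of 2.22 tonnes/hectare; other rows may differ in the second decimal because the tabulated inputs are rounded.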
The Ensemble Yield represents a combination of the three predictive models or methods above to estimate soybean crop yield.
Regions like Tandulja, Hisamabad, and Sakol exhibited large negative percent error, indicating considerable discrepancies between actual and predicted yields.
Conversely, regions like Walandi showed positive percent error, meaning the ensemble yield was below the field CCE yield there (an underestimate), whereas Shelgaon showed only a small negative error (-3).
Ensemble Yield aligned closely with actual yield in some regions, such as Dewrjan and Borol, where the percent error was minimal or zero.
The summary reveals the potential of ensemble techniques in predicting soybean yields, though adjustments may be needed to enhance accuracy in regions with high percent error.
Understanding and minimizing percent error can facilitate better decision-making for farmers and policymakers, optimizing agricultural practices and resource allocation.
Continuous refinement and validation of predictive models can contribute to more reliable yield forecasts, supporting sustainable soybean production in Latur district. Similar results were reported by Md Didarul Islam et al. (2023), Liujun Xiao et al. (2022), and Ayan Das et al. (2023) for both machine learning and ensemble approaches.
Table 5: Estimated Yield of Soybean Crop (tonnes/hectare) with Different Models and Percent Error of the Ensemble Model for Year 2023
District | Tehsil | RC | Field CCE Yield | DSSAT Yield | Semi-Physical Yield | Machine Learning Yield | Ensemble Yield | % Error
Latur | Ahmadpur | Ahamadpur | 1.69 | 2.31 | 2.22 | 2.10 | 2.22 | -31
Latur | Ahmadpur | Andhori | 2.14 | 2.28 | 2.43 | 2.06 | 2.27 | -7
Latur | Ahmadpur | Hadolati | 1.77 | 2.30 | 1.64 | 2.10 | 2.01 | -14
Latur | Ahmadpur | Khandali | 1.58 | 2.66 | 2.01 | 2.08 | 2.28 | -44
Latur | Ahmadpur | Kingaon | 2.08 | 2.33 | 2.73 | 2.06 | 2.40 | -15
Latur | Ahmadpur | Shirur tajband | 1.35 | 2.05 | 2.01 | 1.33 | 1.84 | -37
Latur | Ausa | Ausa | 2.12 | 2.23 | 2.29 | 2.15 | 2.23 | -5
Latur | Ausa | Belkund | 1.84 | 2.18 | 1.68 | 2.07 | 1.97 | -7
Latur | Ausa | Bhada | 2.19 | 2.11 | 1.87 | 2.15 | 2.04 | 7
Latur | Ausa | Killari | 2.00 | 2.27 | 1.73 | 2.06 | 2.02 | -1
Latur | Ausa | Kinithot | 2.28 | 2.42 | 2.14 | 2.12 | 2.24 | 2
Latur | Ausa | Lamjana | 2.18 | 2.53 | 1.84 | 2.12 | 2.18 | 0
Latur | Ausa | Matola | 2.08 | 2.09 | 1.72 | 2.08 | 1.96 | 6
Latur | Ausa | Ujani | 2.71 | 2.51 | 1.70 | 2.08 | 2.11 | 22
Latur | Chakur | Ashta | 1.82 | 2.51 | 2.42 | 2.08 | 2.36 | -30
Latur | Chakur | Chakur | 1.30 | 2.21 | 2.68 | 1.29 | 2.13 | -63
Latur | Chakur | Nalegaon | 1.34 | 2.60 | 1.99 | 1.23 | 2.02 | -51
Latur | Chakur | Shelgaon | 2.26 | 2.32 | 2.48 | 2.12 | 2.32 | -3
Latur | Chakur | Wadwal (Na) | 1.28 | 2.26 | 1.97 | 1.24 | 1.89 | -47
Latur | Chakur | Zari Bk | 1.35 | 2.74 | 2.41 | 1.28 | 2.24 | -66
Latur | Deoni | Borol | 2.04 | 2.06 | 2.01 | 2.15 | 2.07 | -2
Latur | Deoni | Deoni | 1.41 | 2.33 | 2.23 | 1.25 | 2.01 | -42
Latur | Deoni | Walandi | 2.58 | 2.54 | 1.98 | 1.23 | 1.99 | 23
Latur | Jalkot | Ghonasi | 2.06 | 2.19 | 1.94 | 2.15 | 2.09 | -1
Latur | Jalkot | Jalkot | 2.37 | 2.22 | 2.05 | 2.11 | 2.13 | 10
Latur | Latur | Babhalgaon | 1.57 | 2.05 | 1.77 | 2.06 | 1.95 | -24
Latur | Latur | Chincholi (Bk) | 2.26 | 2.10 | 2.10 | 2.14 | 2.11 | 7
Latur | Latur | Gategaon | 1.93 | 2.26 | 2.03 | 2.13 | 2.15 | -11
Latur | Latur | Harangul (Bk) | 1.98 | 2.10 | 1.89 | 2.08 | 2.02 | -2
Latur | Latur | Kanheri | 2.01 | 2.32 | 1.70 | 2.11 | 2.05 | -2
Latur | Latur | Kasarkheda | 1.97 | 2.57 | 2.23 | 2.07 | 2.32 | -18
Latur | Latur | Latur | 2.14 | 2.14 | 1.68 | 2.09 | 1.97 | 8
Latur | Latur | Murud (Bk) | 2.20 | 2.53 | 1.88 | 2.10 | 2.19 | 1
Latur | Latur | Tandulja | 1.18 | 2.33 | 2.36 | 2.15 | 2.29 | -94
Latur | Nilanga | Ambulga (Bk) | 2.47 | 2.48 | 2.47 | 2.12 | 2.38 | 4
Latur | Nilanga | Aurad (Sha) | 2.06 | 2.06 | 2.05 | 2.09 | 2.07 | 0
Latur | Nilanga | Bhutmugli | 1.99 | 2.11 | 1.87 | 2.08 | 2.02 | -1
Latur | Nilanga | Halgara | 2.89 | 3.87 | 1.91 | 1.32 | 2.50 | 14
Latur | Nilanga | Kasar Balkunda | 2.12 | 2.59 | 1.65 | 2.13 | 2.13 | -1
Latur | Nilanga | Kasar Shirashi | 2.28 | 2.65 | 1.92 | 2.07 | 2.24 | 2
Latur | Nilanga | Madansuri | 2.02 | 2.29 | 1.75 | 2.06 | 2.04 | -1
Latur | Nilanga | Nilanga | 2.02 | 2.23 | 1.82 | 2.12 | 2.05 | -2
Latur | Nilanga | Nitur | 2.53 | 2.61 | 2.46 | 2.14 | 2.43 | 4
Latur | Nilanga | Panchincholi | 2.39 | 2.53 | 2.25 | 2.14 | 2.33 | 3
Latur | Renapur | Karepur | 1.23 | 2.26 | 2.66 | 1.30 | 2.15 | -74
Latur | Renapur | Palsi | 1.39 | 2.25 | 2.00 | 1.93 | 2.08 | -49
Latur | Renapur | Pangaon | 2.18 | 2.53 | 2.20 | 2.09 | 2.29 | -5
Latur | Renapur | Poharegaon | 2.30 | 2.55 | 2.06 | 2.18 | 2.28 | 1
Latur | Renapur | Renapur | 1.39 | 2.65 | 2.11 | 1.31 | 2.11 | -52
Latur | Shirur-Anantpal | Hisamabad | 1.09 | 2.45 | 2.15 | 1.29 | 2.04 | -86
Latur | Shirur-Anantpal | Sakol | 1.32 | 2.66 | 2.52 | 1.27 | 2.24 | -70
Latur | Shirur-Anantpal | Shirur Anantpal | 1.53 | 2.06 | 2.18 | 1.29 | 1.90 | -24
Latur | Udgir | Dewrjan | 2.16 | 2.38 | 1.93 | 2.15 | 2.16 | 0
Latur | Udgir | Her | 2.39 | 2.30 | 2.48 | 2.15 | 2.32 | 3
Latur | Udgir | Mogha | 2.12 | 2.22 | 2.03 | 2.17 | 2.14 | -1
Latur | Udgir | Nagalgaon | 2.40 | 2.59 | 2.20 | 2.15 | 2.34 | 2
Latur | Udgir | Nalgir | 1.57 | 0.77 | 2.37 | 2.15 | 1.70 | -8
Latur | Udgir | Tondar | 2.21 | 2.08 | 2.33 | 2.18 | 2.20 | 0
Latur | Udgir | Udgir | 2.36 | 2.19 | 2.53 | 2.12 | 2.29 | 3
Table 6: Average Estimated Yield and Percent Error of All Approaches for Soybean Crop, Year 2023.
Methods | Field CCE | DSSAT Yield | Semi-Physical Yield | Machine Learning Yield | Ensemble Yield
Yield (T/ha) | 1.97 | 2.34 | 2.11 | 1.93 | 2.15
% Error | - | -19 | -7 | 2 | -9
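The percent error figures in Tables 5 and 6 appear consistent with (field CCE yield − estimated yield) / field CCE yield × 100, so that a negative value means the model estimate is above the field yield. This sign convention is inferred from the tabulated values rather than stated in the text, so the Python sketch below should be read as an assumption; the function and variable names are hypothetical.

```python
def percent_error(field_yield, estimated_yield):
    """Signed percent error relative to the field (CCE) yield.

    Negative values mean the estimate exceeds the field yield.
    """
    return 100.0 * (field_yield - estimated_yield) / field_yield


# District-average yields (tonnes/hectare) from Table 6.
FIELD_CCE_MEAN = 1.97
MODEL_MEANS = {
    "DSSAT": 2.34,
    "Semi-Physical": 2.11,
    "Machine Learning": 1.93,
    "Ensemble": 2.15,
}

errors = {
    name: round(percent_error(FIELD_CCE_MEAN, estimate))
    for name, estimate in MODEL_MEANS.items()
}
for name, value in errors.items():
    print(name, value)
```

Run on the Table 6 averages, this reproduces the tabulated values of -19, -7, 2, and -9 for the DSSAT, semi-physical, machine learning, and ensemble yields respectively.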