Abstract
Crop yields are affected by various factors, including weather, nutrients and management practices. Predicting yields over large areas in a timely and accurate manner while accounting for these factors is essential for mitigating climate risk and ensuring food security, particularly in light of climate change and the escalation of extreme climatic events. In this study, we integrated multi-source data, namely satellite-derived vegetation indices (VIs), satellite-derived climatic variables (land surface temperature (LST) and rainfall), weather station records and field surveys, to build one multiple linear regression (MLR) model, three machine learning models (XGBoost, support vector regression (SVR) and random forest (RF)) and one deep learning model (deep neural network, DNN) for predicting oil palm yield at block level within an oil palm plantation. A time-series moving average and backward elimination feature selection were applied at the pre-processing stage. Model performance was then compared using evaluation metrics, and the final spatial prediction map was generated from the best-performing model. DNN achieved the best performance with both the selected predictors (R² = 0.91; RMSE = 2.92 t ha⁻¹; MAE = 2.56 t ha⁻¹; MAPE = 0.09) and the full set of predictors (R² = 0.76; RMSE = 3.03 t ha⁻¹; MAE = 2.88 t ha⁻¹; MAPE = 0.10). In addition, advanced ensemble machine learning (ML) techniques such as XGBoost may be used as a supplementary approach for oil palm yield prediction at the block level. Among the five models, MLR recorded the lowest performance. Using backward elimination to identify the most significant predictors improved the performance of all models: R² increased by 5–26%, while RMSE decreased by 3–31%, MAE by 7–34% and MAPE by 1–15%.
After backward elimination, the DNN achieved the highest prediction accuracy of all the models, with a 14% increase in R², an 11% decrease in RMSE, a 32% decrease in MAE and a 1% decrease in MAPE. By integrating data from multiple sources, our study developed efficient and accurate models for timely prediction of oil palm yield over a large area. These models would help plantation managers estimate oil palm yields and speed up decision-making for sustainable production.
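The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of R², RMSE, MAE and MAPE; the function name `regression_metrics` and the sample yields are hypothetical and are not taken from the study, whose exact implementation is not given here.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE and MAPE for paired observed and predicted yields.

    Standard definitions are assumed; the study's own code is not available.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when an observed value is zero")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n

    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),                   # same unit as the yield
        "mae": sum(abs(e) for e in errors) / n,          # same unit as the yield
        "mape": sum(abs(e / t) for e, t in zip(errors, y_true)) / n,  # unitless ratio
    }


# Hypothetical block-level yields in t/ha, for illustration only.
observed = [20.0, 25.0, 30.0]
predicted = [22.0, 24.0, 27.0]
metrics = regression_metrics(observed, predicted)
```

RMSE and MAE carry the unit of the yield (t ha⁻¹), whereas R² and MAPE are dimensionless; MAPE is returned here as a fraction, so 0.08 corresponds to 8%.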
Data availability
In accordance with the research agreement between UPM and FGV, the datasets generated during and/or analysed during the present study are not publicly available and cannot be disclosed to third parties for the time being.
Acknowledgements
We wish to express our gratitude to the Ministry of Higher Education (MOHE), Malaysia for funding through the Long-Term Research Grant Scheme (LRGS) of the Malaysian Research University Network (MRUN). The work falls under the research programme ‘A Big Data Analytics Platform for Optimizing Oil Palm Yield Via Breeding by Design’ (Grant Number: 203.PKOMP.6770007) as a specific project: ‘Geoinformatics Data for Palm Oil Yield Prediction Using Machine Learning’ (Vote No: 6300268-10801). We also thank the team from FGV for contributing their expertise to the research and for providing the field data.
Funding
This research is supported by the Ministry of Higher Education (MOHE), Malaysia.
Author information
Contributions
YA analysed and interpreted the remote sensing data for oil palm yield prediction, along with other pertinent aspects, using machine learning models, and drafted the manuscript. HZMS, the main supervisor, supervised the research and reviewed the manuscript. YPL commented on previous versions of the manuscript. SAB and MUUMJ provided the historical yield and agronomic information. HSL and RA participated in the management of the research and checked the manuscript for clarity. TSY proofread the manuscript. HA, SJH, NNCY, MRH, YY, SAM, and MNS were also involved in managing the research and checking the manuscript. All authors read and approved the final manuscript.
Ethics declarations
The manuscript is original, has not been published before, and is not currently being considered for publication elsewhere.
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this article.
Additional information
Communicated by H. Babaie.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ang, Y., Shafri, H.Z.M., Lee, Y.P. et al. Oil palm yield prediction across blocks from multi-source data using machine learning and deep learning. Earth Sci Inform 15, 2349–2367 (2022). https://doi.org/10.1007/s12145-022-00882-9
DOI: https://doi.org/10.1007/s12145-022-00882-9