Optimization and Application of XGBoost Logging Prediction Model for Porosity and Permeability Based on K-means Method
Abstract
1. Introduction
2. Methodology
2.1. Overview of Regional Geology
2.2. Method Process
- (1) Data preparation;
- (2) XGBoost model establishment: (i) logging feature and label extraction: feature variables suitable for machine learning were selected and extracted from the raw data to build the dataset; (ii) dataset division: the dataset was randomly split into a training set (80%) and a test set (the remaining 20%); (iii) feature combination optimization: an exhaustive search over combinations of feature variables was used to mitigate multicollinearity among them, and the combination giving the best XGBoost performance was selected to provide a better feature representation for the model; (iv) hyper-parameter optimization: random search was used to explore the hyper-parameters over a wide range, after which they were adjusted manually in combination with grid search to refine the search and improve the stability and performance of the model; (v) model establishment: porosity and permeability XGBoost models were built from the results of the feature combination and hyper-parameter optimization steps; (vi) model evaluation: model performance was evaluated with the mean absolute error (MAE) and the coefficient of determination (R²);
- (3) K-means optimization model: input samples were grouped by K-means clustering so that differences within each cluster are minimized and differences between clusters are maximized; porosity and permeability XGBoost prediction models were then built and evaluated for each group;
- (4) Model determination: the performance of the models from steps (2) and (3) was compared, and the porosity and permeability prediction models best suited to the study area were selected;
- (5) Model application: the final model from step (4) was applied to the S13 layer in the study area to analyze the planar distribution of porosity and permeability.
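The dataset division and feature combination optimization sub-steps of the XGBoost workflow can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the authors' code: `score_fn` is a hypothetical callback that would train a model (for example an XGBoost regressor) on the given subset of logging curves and return a score where higher is better, such as R² on held-out data; the function names and the fixed random seed are illustrative.

```python
import itertools
import random
from typing import Callable, Sequence


def split_80_20(n_samples: int, seed: int = 42) -> tuple[list[int], list[int]]:
    """Randomly split sample indices into an 80% training set and a 20% test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed only for reproducibility
    cut = round(0.8 * n_samples)
    return indices[:cut], indices[cut:]


def exhaustive_feature_search(
    features: Sequence[str],
    score_fn: Callable[[tuple[str, ...]], float],
    min_size: int = 1,
) -> tuple[tuple[str, ...], float]:
    """Score every feature subset with at least `min_size` curves; return the best one.

    `score_fn` is a hypothetical callback: in practice it would fit a model
    (e.g. an XGBoost regressor) on the subset and return a score where
    higher is better.
    """
    best_subset: tuple[str, ...] = ()
    best_score = float("-inf")
    for size in range(min_size, len(features) + 1):
        for subset in itertools.combinations(features, size):
            score = score_fn(subset)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score


if __name__ == "__main__":
    train_idx, test_idx = split_80_20(100)
    print(len(train_idx), len(test_idx))  # 80 20

    # Toy scorer (illustration only): rewards AC and DEN, penalizes subset size.
    def toy_score(subset):
        return len({"AC", "DEN"} & set(subset)) - 0.1 * len(subset)

    print(exhaustive_feature_search(["AC", "CAL", "DEN"], toy_score))
```

Because every subset is fitted, the cost grows as 2^n − 1: the 13 conventional logging curves (AC to U) in the statistics table give 8,191 candidate subsets. Scoring subsets on a validation split or by cross-validation, rather than on the final test set, keeps the reported test score unbiased.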
2.3. Data Description
Target Labels and Feature Variables
2.4. Evaluation Indicators
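The two indicators named in the method process are conventionally defined as MAE = (1/n) Σ |y_i − ŷ_i| and R² = 1 − Σ (y_i − ŷ_i)² / Σ (y_i − ȳ)², where y_i is a measured value, ŷ_i the corresponding prediction and ȳ the mean of the measured values. A dependency-free sketch of both follows; the function names are illustrative.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE: average absolute difference between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot  # undefined (division by zero) if y_true is constant


if __name__ == "__main__":
    measured = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.1, 1.9, 3.2, 3.8]
    print(mean_absolute_error(measured, predicted), r_squared(measured, predicted))
```

R² can become negative on a test set when predictions are worse than simply predicting the mean, whereas MAE keeps the units of its target, so MAE values for POR, PERM and transformed permeability targets are not on a common scale.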
2.5. Principle of Extreme Gradient Boosting Tree
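For reference, the standard XGBoost formulation is summarized below; it describes the general method, not a result of this study. At boosting round t a new tree f_t is added to the previous prediction, with η the learning_rate (a shrinkage factor applied to each tree after it is fitted). The tree is chosen to minimize a regularized objective, approximated to second order up to a constant; g_i and h_i are the first and second derivatives of the loss l with respect to the previous prediction, T is the number of leaves, w_j the weight of leaf j, I_j the set of samples falling in leaf j, and γ and λ regularization parameters. In this notation, min_child_weight is a lower bound on the sum of h_i in each child node, and for the squared-error loss (h_i = 1) it acts as a minimum number of samples per leaf.

```latex
\begin{aligned}
\hat{y}_i^{(t)} &= \hat{y}_i^{(t-1)} + \eta\, f_t(\mathbf{x}_i) \\
\mathcal{L}^{(t)} &= \sum_{i=1}^{n} l\!\left(y_i,\; \hat{y}_i^{(t-1)} + f_t(\mathbf{x}_i)\right) + \Omega(f_t),
\qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2} \\
\mathcal{L}^{(t)} &\simeq \sum_{i=1}^{n} \left[ g_i f_t(\mathbf{x}_i) + \frac{1}{2} h_i f_t^{2}(\mathbf{x}_i) \right] + \Omega(f_t),
\qquad g_i = \left.\frac{\partial l(y_i, \hat{y})}{\partial \hat{y}}\right|_{\hat{y} = \hat{y}_i^{(t-1)}},
\quad h_i = \left.\frac{\partial^{2} l(y_i, \hat{y})}{\partial \hat{y}^{2}}\right|_{\hat{y} = \hat{y}_i^{(t-1)}} \\
w_j^{*} &= -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}
\end{aligned}
```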
2.6. Principles of the K-means Method
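K-means (Lloyd's algorithm) partitions the samples into k clusters by alternating two steps: assign each sample to its nearest centroid, then move each centroid to the mean of its members; each pass does not increase the within-cluster sum of squared distances J = Σ_j Σ_{x ∈ C_j} ‖x − μ_j‖². The pure-Python sketch below assumes random initialization from the data (library implementations such as scikit-learn's `KMeans` default to k-means++ initialization) and z-score standardization of each logging curve beforehand. The scaling step is an assumption not stated in the outline, but Euclidean distances on raw curves would be dominated by high-variance logs such as RLLD (variance 5751.95 in the statistics table) rather than DEN (0.03).

```python
import random
from statistics import fmean, pstdev
from typing import Sequence

Point = Sequence[float]


def standardize(rows: list[Point]) -> list[list[float]]:
    """Z-score each column so that no single logging curve dominates the distance."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: list[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each sample.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [fmean(col) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    labels, centers = kmeans(standardize(samples), k=2)
    print(labels)
```

Lloyd's algorithm only finds a local optimum, so practical implementations rerun it from several initializations and keep the solution with the lowest J.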
3. Results
3.1. XGBoost Model Evaluation
3.2. K-means Optimized XGBoost Model Evaluation
4. Discussion
- Data preprocessing methods for the XGBoost permeability prediction model
- Optimization effect of the K-means method on the XGBoost model
- Effectiveness of the XGBoost porosity and permeability models in application
- Model limitations
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
AC | Acoustic (interval transit time) logging |
CAL | Caliper logging |
CNL | Compensated neutron logging |
DEN | Density logging |
GR | Gamma ray logging |
PE | Photoelectric factor logging |
RLLD | Deep laterolog resistivity logging |
RLLS | Shallow laterolog resistivity logging |
RT | Resistivity logging |
SP | Spontaneous potential logging |
K | Potassium logging |
TH | Thorium logging |
U | Uranium logging |
POR_L | Log-interpreted porosity |
PERM_L | Log-interpreted permeability |
POR | Measured porosity |
PERM | Measured permeability |
SH | Shale logging |
SAND | Sand logging |
XGBoost | Extreme gradient boosting |
Logging Curves | Mean | First Quartile | Median | Third Quartile | Min | Max | Standard Deviation | Variance |
---|---|---|---|---|---|---|---|---|
AC (μs/m) | 216.20 | 203.68 | 212.68 | 225.65 | 181.86 | 379.57 | 19.17 | 367.71 |
CAL (in) | 25.04 | 23.14 | 24.89 | 26.07 | 21.30 | 39.68 | 2.48 | 6.14 |
CNL (%) | 11.12 | 6.17 | 9.27 | 13.51 | 0.13 | 50.59 | 7.05 | 49.73 |
DEN (g/cm³) | 2.53 | 2.49 | 2.56 | 2.62 | 1.29 | 2.85 | 0.16 | 0.03 |
GR (API) | 66.96 | 38.78 | 59.17 | 85.31 | 15.64 | 437.66 | 35.80 | 1282.35 |
PE | 2.91 | 2.21 | 2.55 | 3.00 | 0.79 | 13.70 | 1.45 | 2.11 |
RLLD (Ω·m) | 67.07 | 28.25 | 45.88 | 73.25 | 5.37 | 679.66 | 75.83 | 5751.95 |
RLLS (Ω·m) | 60.68 | 25.65 | 42.45 | 68.13 | 5.51 | 666.82 | 66.98 | 4487.29 |
RT (Ω·m) | 66.59 | 28.16 | 45.96 | 72.95 | 0.00 | 676.38 | 75.80 | 5747.06 |
SP (mV) | 56.52 | 47.81 | 57.45 | 65.93 | −7.18 | 113.31 | 14.33 | 205.47 |
K (%) | 1.32 | 0.69 | 1.18 | 1.79 | 0.11 | 4.32 | 0.77 | 0.60 |
TH (mg/L) | 8.91 | 4.94 | 7.60 | 11.62 | 1.51 | 42.88 | 5.15 | 26.53 |
U (mg/L) | 2.36 | 1.36 | 1.88 | 2.88 | 0.22 | 15.78 | 1.52 | 2.31 |
POR_L (%) | 4.55 | 2.16 | 4.69 | 6.74 | 0.00 | 15.20 | 3.16 | 9.97 |
PERM_L (10⁻³ μm²) | 0.33 | 0.01 | 0.11 | 0.40 | 0.00 | 11.37 | 0.72 | 0.52 |
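Each row of the statistics table can be reproduced from the raw samples of one logging curve. The sketch below uses Python's `statistics` module and assumes sample (n − 1) standard deviation and variance and linearly interpolated quartiles (`method="inclusive"`, the same convention as the pandas default); the outline does not state which conventions were used. The variance column is the square of the standard deviation (for GR, 35.80² ≈ 1281.6, matching 1282.35 up to rounding of the standard deviation).

```python
from statistics import fmean, median, quantiles, stdev, variance
from typing import Sequence


def describe_curve(values: Sequence[float]) -> dict[str, float]:
    """Summary statistics in the column order of the logging-curve table."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")  # linear interpolation
    return {
        "mean": fmean(values),
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "min": min(values),
        "max": max(values),
        "std": stdev(values),          # sample standard deviation (n - 1)
        "variance": variance(values),  # equals std ** 2
    }


if __name__ == "__main__":
    print(describe_curve([1.0, 2.0, 3.0, 4.0, 5.0]))
```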
Model | Feature Combination | MAE of Training Set | MAE of Test Set | R² of Training Set | R² of Test Set |
---|---|---|---|---|---|
POR | [AC, CAL, CNL, DEN, SP, RLLS, PE, GR] | 0.0075 | 0.0195 | 0.97 | 0.68 |
PERM | [AC, CAL, SP, RLLS, PE, GR, RLLD] | 0.0003 | 0.0006 | 0.99 | 0.31 |
LOG(PERM) | [AC, CAL, CNL, DEN, SP, RLLS, PE, GR, RLLD] | 0.3757 | 0.0920 | 0.97 | 0.57 |
Arctan(PERM) | [CAL, CNL, DEN, SP, PE, GR, RLLD] | 0.0932 | 0.0244 | 0.96 | 0.50 |
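The LOG(PERM) and Arctan(PERM) rows train XGBoost on transformed permeability, which compresses the long right tail visible in the statistics table (PERM_L median 0.11 versus maximum 11.37 × 10⁻³ μm²). Below is a sketch of the forward and inverse transforms, assuming a base-10 logarithm and a small hypothetical offset `EPS` (the minimum permeability is 0.00, so log10 alone would fail); neither choice is stated in the outline. Predictions made in the transformed space must be mapped back before they are compared with measured permeability.

```python
import math
from typing import Iterable

EPS = 1e-4  # hypothetical offset (in 10^-3 um^2) so that zero permeability can be logged
HALF_PI = math.pi / 2


def log_transform(perm: Iterable[float]) -> list[float]:
    return [math.log10(p + EPS) for p in perm]


def inverse_log_transform(z: Iterable[float]) -> list[float]:
    return [10.0 ** v - EPS for v in z]


def arctan_transform(perm: Iterable[float]) -> list[float]:
    # Maps [0, inf) onto [0, pi/2): nearly linear for small values, saturating for large ones.
    return [math.atan(p) for p in perm]


def inverse_arctan_transform(z: Iterable[float]) -> list[float]:
    # Clip model outputs into [0, pi/2) first; tan() diverges as its argument approaches pi/2.
    return [math.tan(min(max(v, 0.0), HALF_PI - 1e-9)) for v in z]


if __name__ == "__main__":
    perm = [0.0, 0.01, 0.11, 0.40, 11.37]  # values from the PERM_L row (10^-3 um^2)
    print(inverse_log_transform(log_transform(perm)))
    print(inverse_arctan_transform(arctan_transform(perm)))
```

The clipping in the inverse arctan matters because a regressor can output values outside the range of the forward transform, and tan() would otherwise return huge or negative permeabilities.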
Algorithm | Optimized Hyper-Parameter | Description |
---|---|---|
XGBoost | learning_rate | Shrinks the contribution of each tree; the smaller the learning rate, the smaller the impact of each tree and the more stable the training (usually at the cost of needing more trees) |
XGBoost | max_depth | Maximum depth of each tree; smaller values make the model less prone to overfitting |
XGBoost | min_child_weight | Minimum sum of instance weights (hessian) required in a child node; larger values make the model more conservative and help prevent overfitting to the training set |
XGBoost | n_estimators | Number of decision trees; more trees generally improve the fit, but too many can lead to overfitting |
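The hyper-parameter optimization sub-step, a wide random search followed by a finer grid search, can be sketched over the four hyper-parameters listed in the table. This is an illustration under assumptions: the candidate values in `SEARCH_SPACE` are hypothetical (loosely based on the ranges reported in the result tables), `score_fn` stands in for training and validating an XGBoost model, and the automated local grid replaces the manual adjustment described in the outline. In practice, scikit-learn's `RandomizedSearchCV` and `GridSearchCV` wrapped around `xgboost.XGBRegressor` serve the same purpose.

```python
import itertools
import random
from typing import Callable

Params = dict[str, float]

# Hypothetical candidate values for each hyper-parameter (ordered lists).
SEARCH_SPACE: dict[str, list[float]] = {
    "learning_rate": [0.01, 0.02, 0.05, 0.07, 0.1, 0.2],
    "max_depth": list(range(10, 101, 10)),
    "min_child_weight": list(range(1, 21)),
    "n_estimators": [10, 20, 40] + list(range(100, 1001, 100)),
}


def random_search(score_fn: Callable[[Params], float], space, n_trials=50, seed=0):
    """Sample random configurations across the whole space; keep the best one."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = score_fn(params)
        if score > best_score:
            best, best_score = params, score
    return best, best_score


def local_grid(space, center: Params, radius: int = 1) -> list[Params]:
    """All combinations of values within `radius` positions of `center` on each axis."""
    axes = {}
    for name, values in space.items():
        i = values.index(center[name])
        axes[name] = values[max(0, i - radius): i + radius + 1]
    names = list(axes)
    return [dict(zip(names, combo)) for combo in itertools.product(*axes.values())]


def grid_search(score_fn: Callable[[Params], float], grid: list[Params]):
    """Exhaustively score a small grid; return the best configuration and its score."""
    scored = [(params, score_fn(params)) for params in grid]
    return max(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy objective (illustration only), peaking at lr=0.07, depth=50, mcw=9, trees=300.
    def toy_score(p):
        return -(abs(p["learning_rate"] - 0.07) + abs(p["max_depth"] - 50) / 100
                 + abs(p["min_child_weight"] - 9) / 20 + abs(p["n_estimators"] - 300) / 1000)

    coarse, coarse_score = random_search(toy_score, SEARCH_SPACE, n_trials=30)
    fine, fine_score = grid_search(toy_score, local_grid(SEARCH_SPACE, coarse))
    print(coarse, coarse_score)
    print(fine, fine_score)  # never worse than the coarse result: the grid contains it
```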
Model | Feature Combination | Hyper-Parameter Values | MAE of Training Set | MAE of Test Set | R² of Training Set | R² of Test Set |
---|---|---|---|---|---|---|
POR | [AC, CAL, CNL, DEN, SP, RLLS, PE, GR] | learning_rate: 0.1; max_depth: 50; min_child_weight: 9; n_estimators: 300 | 0.0008 | 0.0671 | 0.98 | 0.72 |
PERM | [AC, CAL, SP, RLLS, PE, GR, RLLD] | n_estimators: 100 | 0.0017 | 0.0084 | 0.99 | 0.32 |
LOG(PERM) | [AC, CAL, CNL, DEN, SP, RLLS, PE, GR, RLLD] | learning_rate: 0.02; max_depth: 90; min_child_weight: 9; n_estimators: 600 | 0.0205 | 0.3543 | 0.97 | 0.62 |
Arctan(PERM) | [CAL, CNL, DEN, SP, PE, GR, RLLD] | learning_rate: 0.02; max_depth: 50; min_child_weight: 9; n_estimators: 300 | 0.0225 | 0.0893 | 0.96 | 0.52 |
Model | Group | Feature Combination | MAE of Training Set | MAE of Test Set | R² of Training Set | R² of Test Set |
---|---|---|---|---|---|---|
POR | 0 | [AC, CAL, CNL, DEN, SP, RLLS, PE, GR] | 0.0744 | 0.0544 | 0.98 | 0.68 |
POR | 1 | [CAL, CNL, SP, RLLS, PE, GR, RT] | 0.0007 | 0.0563 | 0.99 | 0.84 |
PERM | 0 | [AC, CNL, RLLS, RLLD] | 0.0023 | 0.0085 | 0.98 | 0.32 |
PERM | 1 | [AC, CAL, DEN, RT] | 0.0008 | 0.0131 | 0.99 | 0.79 |
LOG(PERM) | 0 | [CAL, CNL, DEN, SP, RLLS, PE, GR, RLLD] | 0.0969 | 0.3761 | 0.96 | 0.58 |
LOG(PERM) | 1 | [CNL, SP, RLLS, RLLD, RT] | 0.0010 | 0.2493 | 0.99 | 0.83 |
Arctan(PERM) | 0 | [CAL, CNL, DEN, SP, RLLS, PE, GR, RT] | 0.0246 | 0.0915 | 0.96 | 0.54 |
Arctan(PERM) | 1 | [CNL, SP, PE, GR, RT] | 0.0009 | 0.0530 | 0.99 | 0.85 |
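The grouped models in the table above follow a cluster-then-regress pattern: samples are first assigned to a K-means group, and a separate model is trained and applied within each group. A minimal sketch of that routing follows, assuming new samples are assigned to the nearest cluster centroid; `make_model` is a hypothetical factory for any regressor with `fit`/`predict` methods (in practice an `xgboost.XGBRegressor`), and the trivial `MeanRegressor` exists only to keep the example dependency-free. The per-group feature subsets reported in the table (for example [AC, CAL, DEN, RT] for PERM group 1) could be applied by selecting columns inside each group's model; that step is omitted here.

```python
from statistics import fmean
from typing import Any, Callable, Sequence

Row = Sequence[float]


class MeanRegressor:
    """Trivial stand-in model that predicts the training mean (illustration only)."""

    def fit(self, X: list[Row], y: list[float]) -> "MeanRegressor":
        self.mean_ = fmean(y)
        return self

    def predict(self, X: list[Row]) -> list[float]:
        return [self.mean_ for _ in X]


class ClusterwiseRegressor:
    """One regressor per K-means group; each sample is routed to its nearest centroid."""

    def __init__(self, centroids: list[Row], make_model: Callable[[], Any]):
        self.centroids = centroids
        self.make_model = make_model
        self.models: dict[int, Any] = {}

    def _nearest(self, x: Row) -> int:
        return min(
            range(len(self.centroids)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, self.centroids[j])),
        )

    def fit(self, X: list[Row], y: list[float]) -> "ClusterwiseRegressor":
        groups: dict[int, tuple[list[Row], list[float]]] = {}
        for x, target in zip(X, y):
            xs, ys = groups.setdefault(self._nearest(x), ([], []))
            xs.append(x)
            ys.append(target)
        for j, (xs, ys) in groups.items():
            self.models[j] = self.make_model().fit(xs, ys)  # fit() returns the model
        return self

    def predict(self, X: list[Row]) -> list[float]:
        # Raises KeyError if a sample falls in a group that had no training data.
        return [self.models[self._nearest(x)].predict([x])[0] for x in X]


if __name__ == "__main__":
    X = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    y = [1.0, 3.0, 10.0, 20.0]
    model = ClusterwiseRegressor([(0.0, 0.5), (10.0, 10.5)], MeanRegressor).fit(X, y)
    print(model.predict([(0.0, 0.2), (10.0, 10.0)]))  # [2.0, 15.0]
```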
Model | Group | Feature Combination | Hyper-Parameter Values | MAE of Training Set | MAE of Test Set | R² of Training Set | R² of Test Set |
---|---|---|---|---|---|---|---|
POR | 0 | [AC, CAL, CNL, DEN, SP, RLLS, PE, GR] | learning_rate: 0.07; max_depth: 100; min_child_weight: 20; n_estimators: 500 | 0.0006 | 0.0685 | 0.99 | 0.73 |
POR | 1 | [CAL, CNL, SP, RLLS, PE, GR, RT] | n_estimators: 17 | 0.0135 | 0.0552 | 0.99 | 0.85 |
PERM | 0 | [AC, CNL, RLLS, RLLD] | n_estimators: 100 | 0.0023 | 0.0085 | 0.98 | 0.32 |
PERM | 1 | [AC, CAL, DEN, RT] | n_estimators: 500 | 0.0008 | 0.0131 | 0.99 | 0.79 |
LOG(PERM) | 0 | [CAL, CNL, DEN, SP, RLLS, PE, GR, RLLD] | learning_rate: 0.01; max_depth: 40; min_child_weight: 9; n_estimators: 800 | 0.0570 | 0.3520 | 0.98 | 0.61 |
LOG(PERM) | 1 | [CNL, SP, RLLS, RLLD, RT] | n_estimators: 31 | 0.0482 | 0.2494 | 0.99 | 0.83 |
Arctan(PERM) | 0 | [CAL, CNL, DEN, SP, RLLS, PE, GR, RT] | learning_rate: 0.1; max_depth: 80; min_child_weight: 15; n_estimators: 100 | 0.0211 | 0.0842 | 0.96 | 0.58 |
Arctan(PERM) | 1 | [CNL, SP, PE, GR, RT] | n_estimators: 37 | 0.0009 | 0.0530 | 0.99 | 0.85 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, J.; Wang, R.; Jia, A.; Feng, N. Optimization and Application of XGBoost Logging Prediction Model for Porosity and Permeability Based on K-means Method. Appl. Sci. 2024, 14, 3956. https://doi.org/10.3390/app14103956
Zhang J, Wang R, Jia A, Feng N. Optimization and Application of XGBoost Logging Prediction Model for Porosity and Permeability Based on K-means Method. Applied Sciences. 2024; 14(10):3956. https://doi.org/10.3390/app14103956
Chicago/Turabian Style: Zhang, Jianting, Ruifei Wang, Ailin Jia, and Naichao Feng. 2024. "Optimization and Application of XGBoost Logging Prediction Model for Porosity and Permeability Based on K-means Method" Applied Sciences 14, no. 10: 3956. https://doi.org/10.3390/app14103956
APA Style: Zhang, J., Wang, R., Jia, A., & Feng, N. (2024). Optimization and Application of XGBoost Logging Prediction Model for Porosity and Permeability Based on K-means Method. Applied Sciences, 14(10), 3956. https://doi.org/10.3390/app14103956