Machine learning subsurface flow equations from data

Chang, Haibin; Zhang, Dongxiao

doi:10.1007/s10596-019-09847-2

Original Paper
Open access
Published: 12 July 2019

Computational Geosciences, Volume 23, pages 895–910 (2019)



Abstract

Governing equations of physical problems are traditionally derived from conservation laws or physical principles. However, for some complex problems such first-principle derivations cannot be carried out. As data acquisition and storage capabilities have grown, data-driven methods have attracted considerable attention, and several recent works have addressed how to learn dynamical systems and partial differential equations from data. Along this line, this work investigates how to discover subsurface flow equations from data with a machine learning technique, the least absolute shrinkage and selection operator (LASSO). Learning of the single-phase groundwater flow equation and of the contaminant transport equation is demonstrated. Because the parameters of subsurface formations are usually heterogeneous, we propose, for the first time, a procedure for learning partial differential equations with heterogeneous model parameters. Equation learning requires derivatives calculated from discrete data, and we discuss how to calculate these derivatives from noisy data. In a series of cases, the proposed data-driven method learns the subsurface flow equations satisfactorily.
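To make the sparse-regression idea concrete, here is a minimal sketch, not the authors' implementation. It synthesizes data from a 1-D diffusion equation u_t = D u_xx, builds a small candidate library with the finite-difference stencils described in Appendix A, and fits LASSO coefficients by cyclic coordinate descent. The test equation, the library terms, the grid, the penalty `lam`, the solver, and the names `build_library` and `lasso_cd` are all illustrative assumptions.

```python
import math

D_TRUE = 0.5                          # hypothetical diffusivity used only to synthesize data
NX, DT = 64, 1e-3                     # assumed grid points on [0, 2*pi) and time step
DX = 2 * math.pi / NX
NAMES = ["u", "u_x", "u_xx", "u*u_x"]  # hypothetical candidate library


def u_exact(x, t):
    """Two decaying Fourier modes: an exact solution of u_t = D_TRUE * u_xx."""
    return (math.exp(-D_TRUE * t) * math.sin(x)
            + math.exp(-4 * D_TRUE * t) * math.sin(2 * x))


def build_library(times):
    """Rows of candidate terms and the target u_t, from finite differences."""
    theta, ut = [], []
    for t in times:
        now = [u_exact(i * DX, t) for i in range(NX)]
        prev = [u_exact(i * DX, t - DT) for i in range(NX)]
        for i in range(1, NX - 1):                   # skip points lacking neighbours
            u = now[i]
            ux = 0.5 * (now[i + 1] - now[i - 1]) / DX
            uxx = (now[i + 1] - 2 * u + now[i - 1]) / DX ** 2
            theta.append([u, ux, uxx, u * ux])
            ut.append((u - prev[i]) / DT)            # backward difference in time
    return theta, ut


def soft_threshold(z, lam):
    """Proximal operator of lam * |z|."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def lasso_cd(theta, y, lam, sweeps=2000):
    """Minimize (1/2n)||y - theta xi||^2 + lam ||xi||_1 by coordinate descent."""
    n, p = len(theta), len(theta[0])
    # Precompute the Gram matrix and correlations so each sweep is cheap.
    gram = [[sum(r[j] * r[k] for r in theta) / n for k in range(p)] for j in range(p)]
    b = [sum(r[j] * yi for r, yi in zip(theta, y)) / n for j in range(p)]
    xi = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = b[j] - sum(gram[j][k] * xi[k] for k in range(p) if k != j)
            xi[j] = soft_threshold(rho, lam) / gram[j][j]
    return xi


if __name__ == "__main__":
    theta, ut = build_library([0.01 * k for k in range(1, 21)])
    for name, c in zip(NAMES, lasso_cd(theta, ut, lam=1e-3)):
        print(f"{name:6s} {c: .4f}")
```

With these settings the u_xx coefficient should come out close to D_TRUE and the other coefficients close to zero; the small residual values reflect finite-difference truncation error and LASSO shrinkage. Two Fourier modes are used because a single mode would make u and u_xx exactly collinear, so the library could not distinguish them.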


References

[1] Bear, J.: Dynamics of Fluids in Porous Media. New York: Environmental Science Series (1972)
[2] Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2(1), 183–202 (2009). https://doi.org/10.1137/080716542
[3] Bongard, J., Lipson, H.: Automated reverse engineering of nonlinear dynamical systems. Proc. Natl. Acad. Sci. USA 104(24), 9943–9948 (2007). https://doi.org/10.1073/pnas.0609476104
[4] Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3(1), 1–122 (2010). https://doi.org/10.1561/2200000016
[5] Bruno, O., Hoch, D.: Numerical differentiation of approximated functions with limited order-of-accuracy deterioration. SIAM J. Numer. Anal. 50(3), 1581–1603 (2012). https://doi.org/10.1137/100805807
[6] Brunton, S.L., Proctor, J.L., Kutz, J.N.: Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl. Acad. Sci. USA 113(15), 3932–3937 (2016). https://doi.org/10.1073/pnas.1517384113
[7] Chang, H., Zhang, D.: Identification of physical processes via combined data-driven and data-assimilation methods. J. Comput. Phys. 393, 337–350 (2019). https://doi.org/10.1016/j.jcp.2019.05.008
[8] Chartrand, R.: Numerical differentiation of noisy, nonsmooth data. ISRN Applied Mathematics 2011, 1–11 (2011). https://doi.org/10.5402/2011/164564
[9] Cullum, J.: Numerical differentiation and regularization. SIAM J. Numer. Anal. 8(2), 254–265 (1971). https://doi.org/10.1137/0708026
[10] Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.J.: Least angle regression. Ann. Stat. 32(2), 407–451 (2004). https://doi.org/10.1214/009053604000000067
[11] Figueiredo, M.A.T., Nowak, R.D., Wright, S.J.: Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems. IEEE J. Sel. Top. Sign. Proces. 1(4), 586–597 (2007). https://doi.org/10.1109/JSTSP.2007.910281
[12] Hastie, T., Tibshirani, R.J., Friedman, J.H.: The elements of statistical learning: data mining, inference, and prediction. New York: Springer Series in Statistics (2009). https://doi.org/10.1007/978-0-387-84858-7
[13] Hesterberg, T., Choi, N.H., Meier, L., Fraley, C.: Least angle and l1 penalized regression: a review. Statistics Surveys 2, 61–93 (2008). https://doi.org/10.1214/08-SS035
[14] Jauberteau, F., Jauberteau, J.L.: Numerical differentiation with noisy signal. Appl. Math. Comput. 215(6), 2283–2297 (2009). https://doi.org/10.1016/j.amc.2009.08.042
[15] Knowles, I., Le, T., Yan, A.: On the recovery of multiple flow parameters from transient head data. J. Comput. Appl. Math. 169(1), 1–15 (2004). https://doi.org/10.1016/j.cam.2003.10.013
[16] Mangan, N.M., Brunton, S.L., Proctor, J.L., Kutz, J.N.: Inferring biological networks by sparse identification of nonlinear dynamics. IEEE Transactions on Molecular Biological and Multi-Scale Communications 2(1), 52–63 (2016). https://doi.org/10.1109/TMBMC.2016.2633265
[17] Mangan, N.M., Kutz, J.N., Brunton, S.L., Proctor, J.L.: Model selection for dynamical systems via sparse regression and information criteria. Proceedings of the Royal Society A-Mathematical Physical and Engineering Sciences 473(2204), 16 (2017). https://doi.org/10.1098/rspa.2017.0009
[18] Meng, J., Li, H.: An efficient stochastic approach for flow in porous media via sparse polynomial chaos expansion constructed by feature selection. Adv. Water Resour. 105, 13–28 (2017). https://doi.org/10.1016/j.advwatres.2017.04.019
[19] Ramos, G., Carrera, J., Gómez, S., Minutti, C., Camacho, R.: A stable computation of log-derivatives from noisy drawdown data. Water Resour. Res. 53(9), 7904–7916 (2017). https://doi.org/10.1002/2017WR020811
[20] Rosset, S., Zhu, J.: Piecewise linear regularized solution paths. Ann. Stat. 35(3), 1012–1030 (2007). https://doi.org/10.1214/009053606000001370
[21] Rudy, S.H., Brunton, S.L., Proctor, J.L., Kutz, J.N.: Data-driven discovery of partial differential equations. Sci. Adv. 3(4), e1602614 (2017). https://doi.org/10.1126/sciadv.1602614
[22] Schaeffer, H.: Learning partial differential equation via data discovery and sparse optimisation. Proceedings of the Royal Society A-Mathematical Physical and Engineering Sciences 473(2197), 20160446 (2017). https://doi.org/10.1098/rspa.2016.0446
[23] Schmidt, M., Lipson, H.: Distilling free-form natural laws from experimental data. Science 324(5923), 81–85 (2009). https://doi.org/10.1126/science.1165893
[24] Tibshirani, R.J.: The lasso problem and uniqueness. Electronic Journal of Statistics 7(1), 1456–1490 (2013). https://doi.org/10.1214/13-EJS815
[25] Zou, H.: The adaptive Lasso and its oracle properties. J. Am. Stat. Assoc. 101(476), 1418–1429 (2006). https://doi.org/10.1198/016214506000000735


Acknowledgments

This work is partially funded by the National Natural Science Foundation of China (Grant Nos. U1663208 and 51520105005) and the National Science and Technology Major Project of China (Grant Nos. 2017ZX05009-005 and 2016ZX05037-003). The link to the open-source MATLAB code is provided in Hesterberg et al. [13]. The other computer codes and data used are available upon request from the corresponding author.

Author information

Authors and Affiliations

ERE and BIC-ESAT, College of Engineering, Peking University, Beijing, 100871, China
Haibin Chang & Dongxiao Zhang


Corresponding author

Correspondence to Dongxiao Zhang.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix: A

This appendix provides the finite difference schemes used to calculate the derivatives in the candidate library.

The finite difference schemes for calculating the derivatives in Eq. 10 are

$$ \begin{array}{@{}rcl@{}} \frac{\partial u}{\partial t}(x,t_{k} )&=&(u(x,t_{k} )-u(x,t_{k-1} ))/{\Delta} t,\\ \frac{\partial u}{\partial x}(x_{i} ,t)&=&(0.5u(x_{i+1} ,t)-0.5u(x_{i-1} ,t))/{\Delta} x, \\ \frac{\partial^{2}u}{\partial x^{2}}(x_{i} ,t)&=&(u(x_{i+1} ,t)-2u(x_{i} ,t)+u(x_{i-1} ,t))/({\Delta} x)^{2}, \\ \frac{\partial^{3}u}{\partial x^{3}}(x_{i} ,t)&=&(0.5u(x_{i+2} ,t)-u(x_{i+1} ,t)+u(x_{i-1} ,t)\\ &&-0.5u(x_{i-2} ,t))/({\Delta} x)^{3}. \end{array} $$

(A.1)

where Δt and Δx are the time and space step sizes, respectively. These stencils require data at neighboring points in space or time; data near the boundaries, where the required neighbors are unavailable, are not used for learning the PDE.
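The 1-D stencils of Eq. A.1 can be sketched in plain Python as follows. The function names and the list-based data layout are assumptions for illustration. Interior points are returned together with their indices, so boundary points whose stencils would reach outside the data are dropped, as described above.

```python
def fd_derivatives_1d(u, dx):
    """Central differences of Eq. A.1 for u_x, u_xx and u_xxx.

    Only indices 2 .. len(u) - 3 are kept, because the third-derivative
    stencil needs u[i - 2] .. u[i + 2]; points nearer the boundary are skipped.
    """
    idx = list(range(2, len(u) - 2))
    ux = [0.5 * (u[i + 1] - u[i - 1]) / dx for i in idx]
    uxx = [(u[i + 1] - 2 * u[i] + u[i - 1]) / dx ** 2 for i in idx]
    uxxx = [(0.5 * u[i + 2] - u[i + 1] + u[i - 1] - 0.5 * u[i - 2]) / dx ** 3
            for i in idx]
    return idx, ux, uxx, uxxx


def backward_time_derivative(u_now, u_prev, dt):
    """First-order backward difference of Eq. A.1: (u(t_k) - u(t_{k-1})) / dt."""
    return [(a - b) / dt for a, b in zip(u_now, u_prev)]


if __name__ == "__main__":
    dx = 0.1
    xs = [i * dx for i in range(11)]
    idx, ux, uxx, uxxx = fd_derivatives_1d([x ** 3 for x in xs], dx)
    print(uxxx)   # each entry is 6 up to rounding: the stencil is exact for cubics
```

The cubic test function is a convenient check: the second- and third-derivative stencils are exact for it, while the first-derivative stencil returns 3x² + Δx², exposing its O(Δx²) truncation error.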

The finite difference schemes for computing the derivatives in Eq. 13 are

$$ \begin{array}{@{}rcl@{}} \frac{\partial u}{\partial t}(x,t_{k} )&=&(u(x,t_{k} )-u(x,t_{k-1} ))/{\Delta} t,\\ \frac{\partial u}{\partial x}(x_{i,j} ,t)&=&(0.5u(x_{i+1,j} ,t)-0.5u(x_{i-1,j} ,t))/{\Delta} x, \\ \frac{\partial u}{\partial y}(x_{i,j} ,t)&=&(0.5u(x_{i,j+1} ,t)-0.5u(x_{i,j-1} ,t))/{\Delta} y, \\ \frac{\partial^{2}u}{\partial x^{2}}(x_{i,j} ,t)&=&(u(x_{i+1,j} ,t)-2u(x_{i,j} ,t)+u(x_{i-1,j} ,t))/({\Delta} x)^{2}, \\ \frac{\partial^{2}u}{\partial y^{2}}(x_{i,j} ,t)&=&(u(x_{i,j+1} ,t)-2u(x_{i,j} ,t)+u(x_{i,j-1} ,t))/({\Delta} y)^{2}, \\ \frac{\partial^{2}u}{\partial x\partial y}(x_{i,j} ,t)&=&(0.5\times (0.5u(x_{i+1,j+1} ,t)-0.5u(x_{i+1,j-1} ,t)) \\ &&-0.5\times (0.5u(x_{i-1,j+1} ,t)-0.5u(x_{i-1,j-1} ,t)))/({\Delta} x{\Delta} y). \end{array} $$

(A.2)

where Δx and Δy are the spatial step sizes in the x and y directions, respectively.
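A plain-Python sketch of the 2-D stencils in Eq. A.2 follows. The function name, the nested-list layout u[i][j] (i along x, j along y), and the dictionary output are assumptions for illustration. Nodes whose stencils would reach outside the grid are skipped, in line with the boundary treatment of the 1-D case.

```python
def fd_derivatives_2d(u, dx, dy):
    """Derivatives of Eq. A.2 on a grid u[i][j] (i along x, j along y).

    Returns a dict mapping each interior node (i, j), with
    1 <= i <= nx - 2 and 1 <= j <= ny - 2, to its derivative estimates.
    """
    nx, ny = len(u), len(u[0])
    out = {}
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            out[(i, j)] = {
                "u_x": 0.5 * (u[i + 1][j] - u[i - 1][j]) / dx,
                "u_y": 0.5 * (u[i][j + 1] - u[i][j - 1]) / dy,
                "u_xx": (u[i + 1][j] - 2 * u[i][j] + u[i - 1][j]) / dx ** 2,
                "u_yy": (u[i][j + 1] - 2 * u[i][j] + u[i][j - 1]) / dy ** 2,
                # Central difference in y, nested inside a central difference in x.
                "u_xy": (0.5 * (0.5 * u[i + 1][j + 1] - 0.5 * u[i + 1][j - 1])
                         - 0.5 * (0.5 * u[i - 1][j + 1] - 0.5 * u[i - 1][j - 1]))
                        / (dx * dy),
            }
    return out


if __name__ == "__main__":
    dx, dy = 0.1, 0.2
    # u = x*y + x^2, so u_xy = 1, u_xx = 2 and u_yy = 0 everywhere.
    grid = [[(i * dx) * (j * dy) + (i * dx) ** 2 for j in range(5)] for i in range(6)]
    print(fd_derivatives_2d(grid, dx, dy)[(2, 2)])
```

The mixed-derivative stencil simplifies to (u₊₊ − u₊₋ − u₋₊ + u₋₋)/(4ΔxΔy); writing it as nested central differences mirrors how Eq. A.2 presents it.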

The finite difference scheme for calculating ∂K/∂y in Eq. 15 is

$$ \begin{array}{@{}rcl@{}} \frac{\partial K}{\partial y}(x_{i,j} )&=&\left( 2/\left( \frac{1}{K(x_{i,j+1} )}+\frac{1}{K(x_{i,j} )}\right)\right.\\ &&-\left.2/\left( \frac{1}{K(x_{i,j} )}+\frac{1}{K(x_{i,j-1} )}\right)\right)/{\Delta} y. \end{array} $$

(A.3)
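Eq. A.3 differences the harmonic-mean conductivities at the two faces j ± 1/2 rather than the nodal values. A minimal Python sketch for one grid column follows; the helper names `harmonic_interface` and `dK_dy` are hypothetical.

```python
def harmonic_interface(k_a, k_b):
    """Harmonic mean of two adjacent block conductivities (interface value)."""
    return 2.0 / (1.0 / k_a + 1.0 / k_b)


def dK_dy(K_col, dy):
    """Eq. A.3 along one grid column K_col[j] (fixed i), for j = 1 .. n - 2."""
    return [(harmonic_interface(K_col[j + 1], K_col[j])
             - harmonic_interface(K_col[j], K_col[j - 1])) / dy
            for j in range(1, len(K_col) - 1)]


if __name__ == "__main__":
    print(dK_dy([1.0, 2.0, 4.0], 1.0))   # about 1.333, i.e. 8/3 - 4/3
    print(0.5 * (4.0 - 1.0) / 1.0)        # 1.5 from a plain central difference of K
```

For K = (1, 2, 4) and Δy = 1 the harmonic form gives 4/3, whereas a plain central difference of the nodal values gives 1.5. The harmonic mean is dominated by the less conductive block of each pair, which is why the two estimates differ.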

The finite difference scheme for calculating ∂(K∂h/∂y)/∂y, with h written as the generic variable u, is

$$ \begin{array}{@{}rcl@{}} &&\frac{\partial }{\partial y}\left( K\frac{\partial u}{\partial y}\right)(x_{i,j} ,t)\\&=&\left( 2/\left( \frac{1}{K(x_{i,j+1} )}+\frac{1}{K(x_{i,j} )}\right)(u(x_{i,j+1} ,t)-u(x_{i,j} ,t))\right.\\ &&-\left.2/\left( \frac{1}{K(x_{i,j} )}+\frac{1}{K(x_{i,j-1} )}\right)(u(x_{i,j} ,t)-u(x_{i,j-1} ,t))\right)/({\Delta} y)^{2}. \end{array} $$

(A.4)
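The scheme in Eq. A.4 is a conservative (flux-difference) form: the face term at j + 1/2 used for node j is the same term subtracted at node j + 1. The sketch below, with hypothetical helper names and one grid column stored as a list, computes it. The usage lines check that for constant K it reduces to K times the usual second difference, and that summing over the interior leaves only the two outermost face terms.

```python
def harmonic_interface(k_a, k_b):
    """Harmonic mean of two adjacent block conductivities (interface value)."""
    return 2.0 / (1.0 / k_a + 1.0 / k_b)


def div_k_grad_y(K_col, u_col, dy):
    """Eq. A.4: conservative d/dy(K du/dy) at j = 1 .. n - 2 of one grid column."""
    out = []
    for j in range(1, len(u_col) - 1):
        k_up = harmonic_interface(K_col[j + 1], K_col[j])   # K at face j + 1/2
        k_dn = harmonic_interface(K_col[j], K_col[j - 1])   # K at face j - 1/2
        face_up = k_up * (u_col[j + 1] - u_col[j])           # K times jump across j + 1/2
        face_dn = k_dn * (u_col[j] - u_col[j - 1])           # K times jump across j - 1/2
        out.append((face_up - face_dn) / dy ** 2)
    return out


if __name__ == "__main__":
    dy = 0.5
    u = [(dy * j) ** 2 for j in range(6)]              # u = y^2, so u_yy = 2
    print(div_k_grad_y([3.0] * 6, u, dy))              # each entry is 6 up to rounding

    K = [1.0, 10.0, 1.0, 5.0, 2.0]                     # made-up heterogeneous column
    h = [0.0, 1.0, 3.0, 2.0, 5.0]
    interior = sum(div_k_grad_y(K, h, dy)) * dy ** 2
    boundary = (harmonic_interface(K[4], K[3]) * (h[4] - h[3])
                - harmonic_interface(K[1], K[0]) * (h[1] - h[0]))
    print(interior, boundary)                          # equal up to rounding
```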

For the contaminant transport problem, the finite difference scheme for calculating the dth-order derivative is

$$ \frac{\partial^{d}u(x)}{\partial x^{d}}=\sum\limits_{l=-L}^{L} {\frac{a_{l} u(x+l{\Delta} x)}{({\Delta} x)^{d}}} , $$

(A.5)

where d > 0 denotes the order of the derivative, L denotes the number of data points on each side of the considered location, and a_l denotes the coefficient of the point at offset l. When d is odd, a_−l = −a_l; when d is even, a_−l = a_l. Tables 3, 4, and 5 list the coefficient values for the first three derivative orders with different choices of L.

Table 3 Coefficient values for calculating the first-order derivative (d = 1)


Table 4 Coefficient values for calculating the second-order derivative (d = 2)


Table 5 Coefficient values for calculating the third-order derivative (d = 3)


The time derivative of u is also calculated using Eq. A.5, with x replaced by t.
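The numerical values of Tables 3 to 5 are not reproduced in this copy. As a hedged sketch, assuming the tabulated a_l are the standard maximal-order central-difference weights on 2L + 1 points, they can be generated by requiring the stencil in Eq. A.5 to match the Taylor expansion: Σ_l a_l l^m = d! when m = d and 0 otherwise, for m = 0, …, 2L (so d ≤ 2L is needed). The function name is hypothetical; exact rational arithmetic via Python's `fractions.Fraction` avoids round-off. For L = 1 and L = 2 this reproduces the stencils of Eq. A.1.

```python
from fractions import Fraction
from math import factorial


def central_coefficients(d, L):
    """Weights a_{-L..L} with sum_l a_l * l**m = d! * [m == d] for m = 0..2L."""
    if not 0 < d <= 2 * L:
        raise ValueError("need 0 < d <= 2L")
    offsets = list(range(-L, L + 1))
    n = len(offsets)
    # Augmented Vandermonde system: row m holds the m-th Taylor moment condition.
    A = [[Fraction(l) ** m for l in offsets] + [Fraction(factorial(d) if m == d else 0)]
         for m in range(n)]
    # Gauss-Jordan elimination in exact arithmetic.
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return dict(zip(offsets, (row[-1] for row in A)))


if __name__ == "__main__":
    print({l: str(a) for l, a in central_coefficients(1, 1).items()})
    # {-1: '-1/2', 0: '0', 1: '1/2'}
    for d in (1, 2, 3):
        print(d, {l: str(a) for l, a in central_coefficients(d, 2).items()})
```

The computed weights satisfy the symmetry stated above (antisymmetric for odd d, symmetric for even d), and the truncation error of the resulting stencil is O((Δx)^(2L+1−d)) or better.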

Appendix: B

In this appendix, we explain why the coefficient K and ∂K/∂y in the PDE of Eq. 29 need to be calculated using Eqs. 26 and A.3, respectively.

Consider the term ∂(K∂h/∂y)/∂y. Following the discretization used in MODFLOW, this term is calculated as

$$ \begin{array}{@{}rcl@{}} &&\frac{\partial }{\partial y}\left( K\frac{\partial u}{\partial y}\right)(x_{i,j} ,t)\\&=&\left( \left( K\frac{\partial u}{\partial y}\right)(x_{i,j+1/2} ,t)-\left( K\frac{\partial u}{\partial y}\right)(x_{i,j-1/2} ,t)\right)/{\Delta} y. \end{array} $$

(B.1)

Adding and subtracting intermediate terms, we have

$$ \begin{array}{@{}rcl@{}} &&\frac{\partial}{\partial y}\left( K\frac{\partial u}{\partial y}\right)(x_{i,j}, t)\\ &=&\left( {\begin{array}{l} K(x_{i,j+1/2} )\frac{\partial u}{\partial y}(x_{i,j+1/2} ,t)-K(x_{i,j+1/2} )\frac{\partial u}{\partial y}(x_{i,j} ,t)+K(x_{i,j+1/2} )\frac{\partial u}{\partial y}(x_{i,j} ,t) \\ -K(x_{i,j-1/2} )\frac{\partial u}{\partial y}(x_{i,j} ,t)+K(x_{i,j-1/2} )\frac{\partial u}{\partial y}(x_{i,j} ,t)-K(x_{i,j-1/2} )\frac{\partial u}{\partial y}(x_{i,j-1/2} ,t) \end{array}} \right)/{\Delta} y \\ &=&\frac{1}{2}K(x_{i,j+1/2} )\frac{\partial^{2}u}{\partial y^{2}}(x_{i,j+1/4} ,t)+\frac{\partial K}{\partial y}(x_{i,j} )\frac{\partial u}{\partial y}(x_{i,j} ,t)+\frac{1}{2}K(x_{i,j-1/2} )\frac{\partial^{2}u}{\partial y^{2}}(x_{i,j-1/4} ,t). \end{array} $$

(B.2)

If ∂²u/∂y²(x_{i,j+1/4}, t) and ∂²u/∂y²(x_{i,j−1/4}, t) are both close to ∂²u/∂y²(x_{i,j}, t), we have

$$ \begin{array}{@{}rcl@{}} &&\frac{\partial }{\partial y}\left( K\frac{\partial u}{\partial y}\right)(x_{i,j} ,t)\\&\approx& \frac{1}{2}(K(x_{i,j+1/2} )+K(x_{i,j-1/2} ))\frac{\partial^{2}u}{\partial y^{2}}(x_{i,j} ,t)\\&&+\frac{\partial K}{\partial y}(x_{i,j} )\frac{\partial u}{\partial y}(x_{i,j} ,t), \end{array} $$

(B.3)

where

$$ \frac{\partial K}{\partial y}(x_{i,j} )=(K(x_{i,j+1/2} )-K(x_{i,j-1/2} ))/{\Delta} y. $$

(B.4)

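MODFLOW calculates the conductivity at the interface between adjacent grid blocks as the harmonic mean of the two block values, which ensures flux continuity. The interface values K(x_{i,j±1/2}) in Eqs. B.3 and B.4 are therefore harmonic means: substituting them into Eq. B.4 gives Eq. A.3 for ∂K/∂y, and the coefficient of ∂²u/∂y² in Eq. B.3 becomes the average of the two harmonic-mean interface conductivities.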
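The argument can be checked numerically. The sketch below (plain Python, hypothetical function names, made-up K and u values) compares the conservative form of Eq. B.1 with harmonic interface conductivities against the expanded form of Eq. B.3, where K̄ = (K(x_{i,j+1/2}) + K(x_{i,j−1/2}))/2, ∂K/∂y follows Eq. B.4 (equivalently Eq. A.3), and u_y and u_yy use the central stencils of Eq. A.2. With these stencils the two sides agree to rounding error, because multiplying out the right-hand side of Eq. B.3 leaves exactly [K(x_{i,j+1/2})(u_{j+1} − u_j) − K(x_{i,j−1/2})(u_j − u_{j−1})]/(Δy)². For contrast, an assumed naive alternative that uses the nodal K(x_{i,j}) and a plain central difference of K does not reproduce the flux form when K varies between blocks.

```python
def harmonic_interface(k_a, k_b):
    """Harmonic mean of two adjacent block conductivities (interface value)."""
    return 2.0 / (1.0 / k_a + 1.0 / k_b)


def flux_form(K, u, dy):
    """Eq. B.1 with harmonic interface K (MODFLOW-style discretization)."""
    return [(harmonic_interface(K[j + 1], K[j]) * (u[j + 1] - u[j])
             - harmonic_interface(K[j], K[j - 1]) * (u[j] - u[j - 1])) / dy ** 2
            for j in range(1, len(u) - 1)]


def expanded_form(K, u, dy):
    """Eq. B.3: K_bar * u_yy + dK/dy * u_y, built from interface conductivities."""
    out = []
    for j in range(1, len(u) - 1):
        k_up = harmonic_interface(K[j + 1], K[j])
        k_dn = harmonic_interface(K[j], K[j - 1])
        k_bar = 0.5 * (k_up + k_dn)                    # averaged interface K
        dk_dy = (k_up - k_dn) / dy                     # Eq. B.4, i.e. Eq. A.3
        u_yy = (u[j + 1] - 2 * u[j] + u[j - 1]) / dy ** 2
        u_y = 0.5 * (u[j + 1] - u[j - 1]) / dy
        out.append(k_bar * u_yy + dk_dy * u_y)
    return out


def naive_form(K, u, dy):
    """Hypothetical alternative: nodal K and a plain central difference of K."""
    return [K[j] * (u[j + 1] - 2 * u[j] + u[j - 1]) / dy ** 2
            + 0.5 * (K[j + 1] - K[j - 1]) / dy * 0.5 * (u[j + 1] - u[j - 1]) / dy
            for j in range(1, len(u) - 1)]


if __name__ == "__main__":
    K = [1.0, 10.0, 1.0, 5.0, 2.0]     # strongly heterogeneous column (made-up values)
    u = [0.0, 1.0, 3.0, 2.0, 5.0]
    print(flux_form(K, u, 1.0))
    print(expanded_form(K, u, 1.0))     # matches the line above to rounding
    print(naive_form(K, u, 1.0))        # noticeably different
```

In a learning setting, this suggests that data generated by a harmonic-mean flux discretization are consistent with a learned PDE only if K and ∂K/∂y are evaluated from the interface values in the same way.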

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article

Cite this article

Chang, H., Zhang, D. Machine learning subsurface flow equations from data. Comput Geosci 23, 895–910 (2019). https://doi.org/10.1007/s10596-019-09847-2


Received: 13 December 2018
Accepted: 14 June 2019
Published: 12 July 2019
Issue Date: October 2019
DOI: https://doi.org/10.1007/s10596-019-09847-2

