Python and Data Analytics in Oil and Gas
Welcome Everyone!
1. Industry 4.0: Data volume in the oil and gas industry has been growing rapidly because of
advances in technology. Rather than waste this data, we can use it to build machine learning and
deep learning models that solve real problems. Python is well suited to this because of its
"superpowers" (libraries).
Hello Buddies
How everyone doing
Hello
UPES
<class 'int'>
<class 'float'>
<class 'str'>
<class 'str'>
In [4]: a = 5
b = 'Petroleum'
In [5]: a
Out[5]: 5
In [6]: b
Out[6]: 'Petroleum'
Out[7]: 'UPES'
In [8]: # casting Type
x = '566'
type(x)
Out[8]: str
In [9]: y = float(x)
y
Out[9]: 566.0
In [10]: type(y)
Out[10]: float
Mathematical Operations
In [11]: x = 45
y = 21
print(x+y)
print(x-y)
print(x*y)
print(x/y)
66
24
945
2.142857142857143
441
In [15]: print(x**2+y**2)
2466
Strings
6.0 : Float
"6" : String
In [17]: # Operations
print('Petroleum'+'Engineering')
print('Spam'*3)
PetroleumEngineering
SpamSpamSpam
In [18]: print(4*3)
print(4*'3')
12
3333
In [19]: type(4*'3')
Out[19]: str
In [21]: porosity
Out[21]: '0.2'
In [22]: type('porosity')
Out[22]: str
In [25]: type(porosity)
Out[25]: float
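The porosity cells above show the same value first as text and then as a number. A minimal sketch of that conversion, assuming the value arrives as the string '0.2' (as in Out[21]); the name porosity_text is introduced here only for illustration:

```python
# A porosity value read from a file or typed by a user usually arrives as text
porosity_text = '0.2'
print(type(porosity_text))

# float() converts the text to a number so it can be used in calculations
porosity = float(porosity_text)
print(type(porosity))

# int() truncates toward zero; it does not round
whole_part = int(566.9)
print(whole_part)
```

Note that type('porosity') inspects the literal text 'porosity', while type(porosity) inspects the value stored in the variable.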
In [28]: print(2!= 3)
True
In [29]: print(2 == 3 )
False
In [30]: print(2 = 3 )
'=' is used for assigning a value to a variable, '==' is used for comparison, so print(2 = 3) raises a SyntaxError
True
False
Indentation
In [38]: if Pr > Pb:  # Reservoir pressure greater than bubble point pressure: undersaturated oil reservoir
             print('Your reservoir is Undersaturated Oil Reservoir')
         elif Pr == Pb:
             print('Your reservoir is Saturated Oil Reservoir')
         else:
             print('Gas has been evolved from your reservoir, so your reservoir has both phases: Oil and Gas')
Gas has been evolved from your reservoir, so your reservoir has both phases: Oil and Gas
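Python uses indentation to decide which lines belong to each branch. The sketch below wraps the same comparison in a function so every branch can be exercised; the function name classify_reservoir and the pressures passed to it are hypothetical, chosen only for illustration:

```python
def classify_reservoir(pr, pb):
    """Classify an oil reservoir from reservoir pressure pr and bubble point pressure pb (same units)."""
    if pr > pb:
        # Indented lines form the body of this branch
        return 'Undersaturated Oil Reservoir'
    elif pr == pb:
        return 'Saturated Oil Reservoir'
    else:
        return 'Both phases: Oil and Gas'


# Hypothetical pressures in psi, one call per branch
print(classify_reservoir(4000, 3000))
print(classify_reservoir(3000, 3000))
print(classify_reservoir(2500, 3000))
```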
While loop: repeats a block of code again and again for as long as the condition is true.
The code in the body of the while loop is executed repeatedly. Each repetition is called an iteration.
In [40]: ## While loop Can be used to stop iteration after a specific input
else:
print('Access Granted')
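The cell above is incomplete in this export, so the following is only a sketch of the idea it describes: keep looping until a specific input arrives. To keep it runnable without typing, the attempts come from a list instead of input(); the function name check_access, the secret value, and the attempts are all hypothetical:

```python
def check_access(attempts, secret='UPES'):
    """Step through the attempts until one matches the secret or none are left."""
    attempts = iter(attempts)
    guess = next(attempts, None)
    denied = 0
    # The loop keeps running while the guess is wrong and attempts remain
    while guess is not None and guess != secret:
        denied += 1
        guess = next(attempts, None)
    granted = guess == secret
    print('Access Granted' if granted else 'No attempts left')
    return granted, denied


print(check_access(['abc', 'xyz', 'UPES', 'ignored']))
```

The loop stops at the first correct attempt, so anything after it is never read.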
Continue: jumps back to the top of the while loop instead of stopping it.
It ends the current iteration and continues with the next one.
In [43]: pressure = 3500
         while pressure < 4500:
             pressure += 100
             if pressure == 4000:
                 print('Skipping 4000 psi')
                 continue
             print(f'Pressure is {pressure} psi')
Lists
Used to store multiple items in a single variable
Square brackets [] are used for creating lists
Mutable: the length, the elements, and the element values can all be changed
In [45]: porosity
In [46]: type(porosity)
Out[46]: list
Out[47]: 0.3
In [48]: porosity[-1]
Out[48]: 0.67
In [50]: porosity
In [52]: len(porosity)
Out[52]: 6
In [53]: # Empty lists are widely used for populating later with calculated values
         specificgravity_of_crudes = [0.8,0.7,0.85,0.76,0.91,0.64,0.65]
         denWater = 62.4
In [54]: CrudeDensity = []
         i = 0
         while i < len(specificgravity_of_crudes):
             denO = denWater*specificgravity_of_crudes[i]
             CrudeDensity.append(denO)  # append: adds an item to the end of an existing list
             i += 1
         CrudeDensity
In [55]: # insert: like append, but we can insert a new item at any position in the list.
         a = [1,2,3,4,5,6,7]
         a.insert(4,'PETROLEUM')
         a
print(subset1)
subset2 = superset[:]
print(subset2) #Skipping a part also works for first and end indices.
[1, 3, 5]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4, 5, 6, 7, 8]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
In [57]: #Reversing the list
reverse_set = superset[-1::-1]
print(reverse_set)
[9, 8, 7, 6, 5, 4, 3, 2, 1]
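Slices follow the pattern list[start:stop:step], where stop is excluded and any part can be left out. The slicing cell is only partly visible in this export, so the slices below are an assumption; they are chosen to give outputs like the ones shown, using the same superset list:

```python
superset = [1, 2, 3, 4, 5, 6, 7, 8, 9]

subset1 = superset[0:5:2]      # start at index 0, stop before index 5, take every 2nd item
subset2 = superset[:]          # no start or stop: a copy of the whole list
subset3 = superset[:-1]        # everything except the last item
reverse_set = superset[::-1]   # a step of -1 walks the list backwards

print(subset1)
print(subset2)
print(subset3)
print(reverse_set)
```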
Out[58]: [1, 2, 3, 4, 5, 6]
In [59]: #Multiply
a*3
Out[59]: [1, 2, 3, 1, 2, 3, 1, 2, 3]
In [60]: a*b
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-60-8ce765dcfa30> in <module>
----> 1 a*b
Tuples
A tuple is a collection which is ordered and unchangeable
Parentheses () are used to create tuples
Immutable: cannot be changed after creation, so a tuple suits important data that should not be modified by accident
In [62]: #Ordered
oilprd[1]
Out[62]: 1000
In [63]: #Immutable
oilprd[3] = 2332
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-63-8ca89081a59e> in <module>
1 #Immutable
----> 2 oilprd[3] = 2332
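The error above can be caught and inspected. In this sketch only oilprd[1] == 1000 comes from the notebook; the other production values are hypothetical placeholders:

```python
# Hypothetical production values; only the second one (1000) appears in the notebook
oilprd = (1200, 1000, 950, 870)

try:
    oilprd[3] = 2332          # tuples do not allow item assignment
except TypeError as err:
    message = str(err)
    print('TypeError:', message)

# To "change" a tuple, build a new one from the pieces you want to keep
updated = oilprd[:3] + (2332,)
print(updated)
```

The original tuple is untouched; updated is a separate object.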
Dictionaries
Helps store data with labels
print(rock_properties)
In [65]: #Access
rock_properties['poro']
Out[65]: 0.25
In [66]: #Change
rock_properties['lithology'] = 'Shale'
In [67]: rock_properties
In [68]: #Saturation
rock_properties['WaterSat'] = 0.25
In [69]: rock_properties
Out[70]:
poro perm Lithology
0 0.1 50 S
1 0.2 100 S
2 0.3 150 C
3 0.5 200 S
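The dictionary operations in this section (access, change, add a key) can be collected into one runnable sketch. The 'poro' value, the 'Shale' update, and the 'WaterSat' entry follow the cells above; the 'perm' value and the starting lithology are hypothetical:

```python
# Keys act as labels for the stored values
rock_properties = {'poro': 0.25, 'perm': 150, 'lithology': 'Sandstone'}

# Access a value by its key
print(rock_properties['poro'])

# Change an existing value
rock_properties['lithology'] = 'Shale'

# Assigning to a new key adds it to the dictionary
rock_properties['WaterSat'] = 0.25

# get() returns None (or a chosen default) instead of raising KeyError for a missing key
oil_sat = rock_properties.get('OilSat')

# Loop over label/value pairs
for name, value in rock_properties.items():
    print(name, '=', value)
```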
Sets
Curly braces are used, just like dictionaries, but a set holds only unique values, so duplicates are dropped
In [71]: a = {1,2,3,4,5,6,7,1,2,2}
In [72]: a
Out[72]: {1, 2, 3, 4, 5, 6, 7}
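Because duplicates are dropped, a set is a quick way to find the distinct values in a list. A small sketch using the Lithology codes from the table earlier in these notes; the variable names are introduced here for illustration:

```python
# Lithology codes as they appear in the table (with repeats)
lithology_codes = ['S', 'S', 'C', 'S']

# Converting to a set keeps one copy of each value
unique_codes = set(lithology_codes)
print(unique_codes)

# Membership tests on a set are fast and read naturally
has_c = 'C' in unique_codes
print(has_c)

# The set from the cell above: ten entries, seven distinct values
a = {1, 2, 3, 4, 5, 6, 7, 1, 2, 2}
print(len(a))
```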
for loops
The tool with which we can use the power of computers (iteration): a for loop runs the same block of code once for each item in a sequence
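As a sketch, here is the crude density calculation from the Lists section written with a for loop instead of a while loop. The data and names are the ones used there; 62.4 is taken to be the density of water in lb/ft3, which is an assumption about the intended units:

```python
specificgravity_of_crudes = [0.8, 0.7, 0.85, 0.76, 0.91, 0.64, 0.65]
denWater = 62.4  # assumed to be lb/ft3

CrudeDensity = []
# The loop variable takes each specific gravity in turn; no index counter is needed
for sg in specificgravity_of_crudes:
    CrudeDensity.append(denWater * sg)

# enumerate() supplies the position alongside the value when both are wanted
for i, den in enumerate(CrudeDensity):
    print(f'Crude {i}: {den:.2f}')
```

Compared with the while version, there is no counter to initialise or increment, so there is less to get wrong.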
Functions
Use of Function
In [78]: c
Out[78]: 7
In [79]: # Once we return from a function, it stops executing; any code written after the return is never executed
         def f(x,y,z):
             return x/y + z
             print('Hello')
In [80]: f(3,4,5)
Out[80]: 5.75
In [82]: api(0.9)
In [84]: pythhyp(3,4)
Out[84]: 5.0
In [85]: c = pythhyp(3,4)
Lambda Function
Single line function
In [87]: api_lambda(0.9)
Out[87]: 25.72222222222223
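The definitions of api, pythhyp and api_lambda are not visible in this export. The sketch below is a reconstruction that reproduces the outputs shown (5.0 and 25.72...), assuming api is the standard API gravity relation, API = 141.5/SG - 131.5, and pythhyp is the Pythagorean hypotenuse:

```python
def api(sg):
    """API gravity from specific gravity (assumed formula: 141.5/SG - 131.5)."""
    return 141.5 / sg - 131.5


def pythhyp(a, b):
    """Hypotenuse of a right triangle with sides a and b."""
    return (a**2 + b**2) ** 0.5


# The same API calculation as a single-line lambda function
api_lambda = lambda sg: 141.5 / sg - 131.5

print(api(0.9))
print(pythhyp(3, 4))
print(api_lambda(0.9))
```

A lambda is convenient for short expressions like this; anything longer is easier to read as a def.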
Day 2
Numpy
1. NumPy stands for Numerical Python; it is the core numerical package for Python.
2. NumPy is fast because its array operations run in compiled C code, so NumPy operations help in
computational efficiency.
3. NumPy covers much of the array and matrix mathematics that MATLAB users are used to.
4. It handles data manipulation of arrays and matrices and can be used for multidimensional data preparation.
5. Core Python has no built-in multidimensional array type; the list is the closest built-in structure.
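To illustrate points 2 and 5, the sketch below repeats the crude density calculation from Day 1 with a NumPy array. One expression replaces the whole loop; the variable names are illustrative and 62.4 is assumed to be water density in lb/ft3:

```python
import numpy as np

# Specific gravities from the Day 1 list example, now as an array
sg = np.array([0.8, 0.7, 0.85, 0.76, 0.91, 0.64, 0.65])
den_water = 62.4  # assumed lb/ft3

# Multiplying an array by a number applies the operation to every element at once
crude_density = den_water * sg

print(type(crude_density))
print(crude_density)
```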
In [4]: type(permarr)
Out[4]: numpy.ndarray
arr = np.array(t)
arr
Dimensions in array
In [7]: ar0
Out[7]: array(34)
In [8]: #Checking Dimension of array using ndim attribute
ar0.ndim
Out[8]: 0
In [11]: arpor
In [12]: arpor.ndim
Out[12]: 1
In [14]: porosity2d
In [15]: porosity2d.ndim
Out[15]: 2
In [17]: perm.ndim
Out[17]: 3
In [18]: perm
[[ 1, 2, 3, 2],
[ 3, 4, 5, 5]],
In [19]: porosity2d
In [20]: porosity2d.shape
Out[20]: (2, 5)
In [21]: perm
[[ 1, 2, 3, 2],
[ 3, 4, 5, 5]],
In [22]: perm.shape
Out[22]: (3, 2, 4)
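The ndim and shape results above can be reproduced with placeholder arrays. The shapes match the notebook outputs, but the values are hypothetical (zeros, or made-up porosities) because the original cells are not fully visible:

```python
import numpy as np

ar0 = np.array(34)                    # a single value: 0 dimensions
arpor = np.array([0.1, 0.2, 0.3])     # hypothetical porosities: 1 dimension
porosity2d = np.zeros((2, 5))         # 2 rows x 5 columns: 2 dimensions
perm = np.zeros((3, 2, 4))            # 3 blocks of 2 rows x 4 columns: 3 dimensions

for name, arr in [('ar0', ar0), ('arpor', arpor), ('porosity2d', porosity2d), ('perm', perm)]:
    # ndim is the number of axes; shape gives the length along each axis
    print(name, arr.ndim, arr.shape)
```

ndim is always equal to len(shape), which is a handy check when an array does not behave as expected.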
In [23]: # Make an array of pressures ranging from 0 to 5000 psi with a step size of 500 psi
         pressures = np.arange(0,5500,500)
In [24]: pressures
Out[24]: array([   0,  500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000])
In [25]: pressures.ndim
Out[25]: 1
In [27]: saturations.ndim
Out[27]: 1
Indexing Arrays
Accessing elements
In [28]: perm = np.array([[[1,2,3],[2,3,4]],[[34,45,56],[34,78,23]]])
In [29]: perm
In [30]: perm[0]
In [31]: perm[0][1]
In [32]: perm[0][1][1]
Out[32]: 3
Slicing Arrays
In [33]: perm = np.array([[13,23,32,43,54],[43,55,60,7,8],[12,34,45,77,87],[2,55,39,82,49]])
In [34]: perm
In [37]: perm
In [42]: perm[1:4]
In [40]: perm[1:4,1:4]
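The outputs of the two slicing cells are missing from this export. A sketch of what those slices select, using the same perm array; the names rows and block are introduced here for illustration:

```python
import numpy as np

perm = np.array([[13, 23, 32, 43, 54],
                 [43, 55, 60, 7, 8],
                 [12, 34, 45, 77, 87],
                 [2, 55, 39, 82, 49]])

# One slice: rows 1 to 3 (the stop index 4 is excluded), all columns
rows = perm[1:4]

# Two slices separated by a comma: rows 1 to 3 and columns 1 to 3
block = perm[1:4, 1:4]

# A colon alone means "everything along this axis": here, the first column
first_column = perm[:, 0]

print(rows.shape)
print(block)
print(first_column)
```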
In [45]: np.random.randint(250,350,(3,4,3))
In [47]: porosity
[[0.53506026, 0.5134192 ],
[0.22483723, 0.50576969],
[0.61361707, 0.40238323],
[0.10464277, 0.2748988 ]],
[[0.73888073, 0.61586745],
[0.24838684, 0.92115323],
[0.73716043, 0.06420523],
[0.3844945 , 0.83562484]],
[[0.74130977, 0.32222591],
[0.05728948, 0.83661929],
[0.48840097, 0.68103858],
[0.11205602, 0.22901724]],
[[0.05598509, 0.41528224],
[0.10596222, 0.93729523],
[0.93885962, 0.02993201],
[0.55515742, 0.46881714]]])
Additional*
In [50]: x = np.array(range(10))
y = np.array(range(10))
z = np.array(range(5))
por = np.random.normal(loc = 0.5,scale = 0.15,size =(len(x), len(y), len(z)) )
print(por.shape)
(10, 10, 5)
In [54]: a*b
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-54-8ce765dcfa30> in <module>
----> 1 a*b
In [55]: ara*arb
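Multiplying two lists fails, while multiplying two arrays works element by element. A minimal sketch of the contrast, assuming a and b are the lists [1, 2, 3] and [4, 5, 6] suggested by the Day 1 outputs:

```python
import numpy as np

a = [1, 2, 3]
b = [4, 5, 6]

# Lists support repetition (a*3) but not list-by-list multiplication
try:
    a * b
except TypeError as err:
    list_error = str(err)
    print('TypeError:', list_error)

# Arrays multiply elementwise: 1*4, 2*5, 3*6
ara = np.array(a)
arb = np.array(b)
product = ara * arb
print(product)
```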
In [56]: 4551/0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-56-c82d36dbfcfb> in <module>
----> 1 4551/0
In [57]: np.array(42523454)/0
C:\Users\acer\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in true_divide
"""Entry point for launching an IPython kernel.
Out[57]: inf
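Plain Python stops with an error on division by zero, while NumPy warns and returns inf so that a large array calculation can carry on. A sketch of handling both cases explicitly; np.errstate is used here only to silence the warning for the demonstration:

```python
import numpy as np

# Plain Python raises an exception that can be caught
try:
    4551 / 0
except ZeroDivisionError as err:
    print('Python:', err)

# NumPy returns inf; errstate suppresses the RuntimeWarning inside this block
with np.errstate(divide='ignore'):
    result = np.array(42523454) / 0

print(result)
print(np.isinf(result))
```

When inf values matter downstream, np.isinf or np.isfinite can be used to find and filter them.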
Out[60]: ['Solarize_Light2',
'_classic_test_patch',
'bmh',
'classic',
'dark_background',
'fast',
'fivethirtyeight',
'ggplot',
'grayscale',
'seaborn',
'seaborn-bright',
'seaborn-colorblind',
'seaborn-dark',
'seaborn-dark-palette',
'seaborn-darkgrid',
'seaborn-deep',
'seaborn-muted',
'seaborn-notebook',
'seaborn-paper',
'seaborn-pastel',
'seaborn-poster',
'seaborn-talk',
'seaborn-ticks',
'seaborn-white',
'seaborn-whitegrid',
'tableau-colorblind10']
##plotting IPR
plt.figure(figsize = (9,6))
plt.plot(flowrates,pwf,c = "red",linewidth=3)
plt.xlabel("Flowrate(stb/day)")
plt.ylabel("pwf(psia)")
plt.grid(True)
plt.title("Vogel's IPR for Saturated Reservoir")
In [65]: vogelipr()
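The body of vogelipr() is not visible in this export. The sketch below shows how the flowrates and pwf arrays for such a plot could be computed, assuming the standard Vogel relation q/qmax = 1 - 0.2(pwf/pr) - 0.8(pwf/pr)^2. The function name vogel_ipr, its parameters, and the example values are hypothetical:

```python
import numpy as np


def vogel_ipr(pr, qmax, n_points=50):
    """Return (flowrates, pwf) for Vogel's IPR.

    pr   : average reservoir pressure (psia)
    qmax : maximum flowrate at pwf = 0 (stb/day)
    """
    pwf = np.linspace(0, pr, n_points)
    ratio = pwf / pr
    flowrates = qmax * (1 - 0.2 * ratio - 0.8 * ratio**2)
    return flowrates, pwf


# Hypothetical reservoir: 3000 psia and 1500 stb/day maximum rate
flowrates, pwf = vogel_ipr(pr=3000, qmax=1500)
print(flowrates[0], flowrates[-1])
```

The returned arrays can be passed straight to plt.plot(flowrates, pwf) as in the plotting lines above.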
2. Pressure Profile
In [66]: def pressureprof():
             # All inputs are in oilfield units
             re = float(input('Outer radius of Reservoir(ft): '))
             rw = float(input('Wellbore Radius(ft): '))
             Pwf = float(input('Bottomhole Pressure(PSI): '))
             h = float(input('Net Pay Thickness(ft): '))
             k = float(input('Average Reservoir Permeability(mD): '))
             q = float(input('Flowrate(STB/Day): '))
             mu = float(input('Oil Viscosity(cp): '))
             B = 1  # formation volume factor, taken as 1
             r = np.linspace(rw,re,500)
             Pressure = []
             for i in range(len(r)):
                 # Pressure rises with ln(r/rw) moving away from the wellbore
                 P = Pwf + (141.2*q*mu*B*(np.log(r[i]/rw))/k/h)
                 Pressure.append(P)
             plt.figure(figsize = [8,6])
             plt.plot(r,Pressure)
             plt.xlabel('r(ft)')
             plt.ylabel('P(r), Psi')
             plt.title('Reservoir Pressure Profile')
             plt.grid(True)
In [67]: pressureprof()
re = 3000
rw = 0.5
r = np.linspace(rw,re,500)
pe = 4000
B = 1
h = 30 #ft
P = pe - (141.2*q*mu*B*(np.log(re/r))/k/h)
y_min = P[np.where(r==rw)]
plt.plot(r,P,linewidth=4)
plt.axhline(y_min,linewidth=3,color='red')
plt.ylim(0,5000)
plt.xlabel('r(ft)')
plt.ylabel('P(r), Psi')
plt.grid(True)
return r,P
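The function header for the lines above is missing from this export. As a sketch, the same formula can be wrapped in a function that takes its inputs as arguments and works on the whole radius array at once. The name pressure_profile is hypothetical; re, rw, pe, B and h follow the values above, while q, mu and k are made-up example values:

```python
import numpy as np


def pressure_profile(pe, re, rw, q, mu, k, h, B=1.0, n_points=500):
    """Pressure versus radius, measured down from the pressure pe at the outer radius re.

    Oilfield units: pressures in psi, radii and h in ft, q in STB/day, mu in cp, k in mD.
    """
    r = np.linspace(rw, re, n_points)
    # np.log works on the whole array, so no loop is needed
    P = pe - 141.2 * q * mu * B * np.log(re / r) / (k * h)
    return r, P


# q, mu and k below are hypothetical example values
r, P = pressure_profile(pe=4000, re=3000, rw=0.5, q=200, mu=1.5, k=50, h=30)
print(P[0], P[-1])  # pressure at the wellbore and at the outer boundary
```

Passing the inputs as arguments rather than through input() makes the function easy to reuse inside loops or interactive widgets.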
In [71]: display(w)
Pandas
Pandas is the MS Excel of Python, but more powerful. This library helps us import, create, and work with data in the form of tables.
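A minimal sketch of creating and querying a table, using the poro/perm/Lithology values from the table shown on Day 1. The names rock and s_rows are introduced here for illustration:

```python
import pandas as pd

# A dictionary of equal-length lists becomes a table: keys are column names
rock = pd.DataFrame({
    'poro': [0.1, 0.2, 0.3, 0.5],
    'perm': [50, 100, 150, 200],
    'Lithology': ['S', 'S', 'C', 'S'],
})

print(rock.shape)             # (rows, columns)
print(rock['poro'].mean())    # column statistics in one call

# Boolean filtering: keep only the rows where Lithology is 'S'
s_rows = rock[rock['Lithology'] == 'S']
print(s_rows)
```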
Out[72]:
phi perm lith
1 0.40 20 shale
Out[73]:
phi perm lith Saturation
Out[75]:
DATEPRD NPD_WELL_BORE_CODE NPD_WELL_BORE_NAME ON_STREAM_HRS AVG_DOWNHOLE_
In [76]: volve.head()
Out[76]:
DATEPRD NPD_WELL_BORE_CODE NPD_WELL_BORE_NAME ON_STREAM_HRS AVG_DOWNHOLE_PRE
In [77]: #shape
volve.shape
In [81]: volve_pf12.head()
Out[81]:
DATEPRD NPD_WELL_BORE_CODE NPD_WELL_BORE_NAME ON_STREAM_HRS AVG_DOWNHOLE_P
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3056 entries, 1911 to 4966
Data columns (total 19 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 DATEPRD 3056 non-null object
1 NPD_WELL_BORE_CODE 3056 non-null int64
2 NPD_WELL_BORE_NAME 3056 non-null object
3 ON_STREAM_HRS 3056 non-null float64
4 AVG_DOWNHOLE_PRESSURE 3050 non-null float64
5 AVG_DOWNHOLE_TEMPERATURE 3050 non-null float64
6 AVG_DP_TUBING 3050 non-null float64
7 AVG_ANNULUS_PRESS 3043 non-null float64
8 AVG_CHOKE_SIZE_P 3012 non-null float64
9 AVG_CHOKE_UOM 3056 non-null object
10 AVG_WHP_P 3056 non-null float64
11 AVG_WHT_P 3056 non-null float64
12 DP_CHOKE_SIZE 3056 non-null float64
13 BORE_OIL_VOL 3056 non-null float64
14 BORE_GAS_VOL 3056 non-null float64
15 BORE_WAT_VOL 3056 non-null float64
16 BORE_WI_VOL 0 non-null float64
17 FLOW_KIND 3056 non-null object
18 WELL_TYPE 3056 non-null object
dtypes: float64(13), int64(1), object(5)
memory usage: 477.5+ KB
In [83]: volve_pf12.set_index(pd.to_datetime(volve_pf12['DATEPRD']),inplace = True)
In [84]: volve_pf12.head()
Out[84]:
DATEPRD NPD_WELL_BORE_CODE NPD_WELL_BORE_NAME ON_STREAM_HRS
DATEPRD
Out[85]: <AxesSubplot:ylabel='DATEPRD'>
In [86]: #Accessing a column
volve_pf12['AVG_DOWNHOLE_PRESSURE']
Out[86]: DATEPRD
2008-02-12 308.056
2008-02-13 303.034
2008-02-14 295.586
2008-02-15 297.663
2008-02-16 295.936
...
2016-09-13 0.000
2016-09-14 0.000
2016-09-15 0.000
2016-09-16 0.000
2016-09-17 0.000
Name: AVG_DOWNHOLE_PRESSURE, Length: 3056, dtype: float64
In [87]: volve_pf12[['AVG_DOWNHOLE_PRESSURE']]
Out[87]:
AVG_DOWNHOLE_PRESSURE
DATEPRD
2008-02-12 308.056
2008-02-13 303.034
2008-02-14 295.586
2008-02-15 297.663
2008-02-16 295.936
... ...
2016-09-13 0.000
2016-09-14 0.000
2016-09-15 0.000
2016-09-16 0.000
2016-09-17 0.000
In [88]: a =volve_pf12[['AVG_DOWNHOLE_PRESSURE','BORE_OIL_VOL']]
In [89]: a
Out[89]:
AVG_DOWNHOLE_PRESSURE BORE_OIL_VOL
DATEPRD
In [92]: volve_pf12['AVG_DOWNHOLE_PRESSURE']['2008-02-14']
Out[92]: 295.586
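Setting a datetime index is what makes date-based lookups like the one above possible. The sketch below rebuilds a five-row sample from the pressures printed earlier, since the Volve file itself is not part of these notes; the name sample is hypothetical:

```python
import pandas as pd

# Five rows copied from the AVG_DOWNHOLE_PRESSURE output above
sample = pd.DataFrame({
    'DATEPRD': ['2008-02-12', '2008-02-13', '2008-02-14', '2008-02-15', '2008-02-16'],
    'AVG_DOWNHOLE_PRESSURE': [308.056, 303.034, 295.586, 297.663, 295.936],
})

# Convert the text dates to real timestamps and use them as the row labels
sample = sample.set_index(pd.to_datetime(sample['DATEPRD']))

# Look up a single day
one_day = sample.loc[pd.Timestamp('2008-02-14'), 'AVG_DOWNHOLE_PRESSURE']
print(one_day)

# Slice a date range; with label slices both end dates are included
window = sample.loc['2008-02-13':'2008-02-15', 'AVG_DOWNHOLE_PRESSURE']
print(window)
```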
In [95]: #Plotting the values with value of date on x axis
#inbuilt plot function of pandas
volve_pf12.plot(figsize = (12,10),subplots= True)
Thank You