Machine Learning Guide for Oil
and Gas Using Python
Hoss Belyadi
Obsertelligence, LLC
Alireza Haghighat
IHS Markit
Table of Contents
Cover image
Title page
Copyright
Biography
Acknowledgment
Introduction
Artificial intelligence
Data mining
Machine learning
Anaconda introduction
Anaconda installation
Jupyter Notebook interface options
Creating a string
Defining a list
Creating a dictionary
Creating a tuple
Creating a set
If statements
For loop
Nested loops
List comprehension
Defining a function
Introduction to pandas
Conditional selection
Pandas groupby
Pandas joining
Pandas operation
Dropping NAs
Filling NAs
Numpy introduction
Data visualization
Introduction
Dimensionality reduction
Chapter 4. Unsupervised machine learning: clustering algorithms
K-means clustering
Hierarchical clustering
Outlier detection
Overview
Linear regression
Logistic regression
K-nearest neighbor
Decision tree
Random forest
Backpropagation technique
Data partitioning
Deep learning
Convolution
Activation function
Pooling layer
Cross-validation
Save-load models
Fuzzy set
Genetic algorithm
Notices
Knowledge and best practice in this field are constantly
changing. As new research and experience broaden our
understanding, changes in research methods, professional
practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their
own experience and knowledge in evaluating and using
any information, methods, compounds, or experiments
described herein. In using such information or methods
they should be mindful of their own safety and the safety
of others, including parties for whom they have a
professional responsibility.
ISBN: 978-0-12-821929-4
Keywords
Anaconda installation; Artificial Intelligence; Data mining;
Jupyter Notebook; Machine learning; Numpy library; Pandas
library; Python
Introduction
Artificial Intelligence (AI) and machine learning (ML) have grown in
popularity across many industries. Corporations, universities,
governments, and research groups have recognized the potential of
AI and ML applications to automate processes
while improving predictive capabilities. This potential
is a remarkable game changer in many industries. AI
advancements such as self-driving cars, fraud detection,
speech recognition, spam filtering, and Amazon's and Facebook's product
and content recommendations have generated massive
amounts of net asset value for various corporations. The energy
industry is at the beginning phase of applying AI to different
applications. The rise in popularity in the energy industry is due to
new technologies such as sensors and high-performance computing
services (e.g., Apache Hadoop, NoSQL, etc.) that enable big data
acquisition and storage in different fields of study. Big data refers to
a quantity of data that is too large to be handled (i.e., gathered,
stored, and analyzed) using common tools and techniques, e.g.,
terabytes of data. The number of publications in this domain has
exponentially increased over the past few years. A quick search of
oil and gas publications from the past few years in the Society
of Petroleum Engineers' OnePetro or the American Association of
Petroleum Geologists (AAPG) attests to this
fact. As more companies realize the value added through
incorporating AI into daily operations, more creative ideas will
emerge. The intent of this book is to provide a step-by-step, easy-to-
follow workflow on various applications of AI within the energy
industry using Python, a free open source programming language.
As one continues through this book, one will notice the incredible
work that the Python community has accomplished by providing
various libraries to perform ML algorithms easily and efficiently.
Therefore, our main goal is to share our knowledge of various ML
applications within the energy industry with this step-by-step guide.
Whether you are new to data science and programming or at
an advanced level, this book is written to be accessible.
We will use many examples throughout the book that can be
followed using Python. The primary user interface that we will use
in this book is “Jupyter Notebook,” and the process of downloading the
Anaconda package is explained in detail in the following sections.
Artificial intelligence
Terminologies such as AI, ML, big data, and data mining are used
interchangeably across different organizations. Therefore, it is
crucial to understand the true meaning of each terminology before
diving deeper into various applications. AI is simply the use of
machine or computer intelligence rather than human or animal
intelligence. It is a branch of computer science that studies the
simulation of human intelligence processes such as learning,
reasoning, problem-solving, and self-correction by computers.
Creating intelligent machines that work, react, and mimic cognitive
functions of humans is the primary goal of AI. Examples of AI
include email classification (categorization), smart personal
assistants such as Siri, Alexa, and Google, automated respondents,
process automation, security surveillance, fraud detection and
prevention, pattern and image recognition, product recommendation
and purchase prediction, smart searches, sales, volumes, and
business forecasting, advertisement targeting, news feed
personalization, terrorist activity detection, self-driving cars, health
diagnostics, mortgage default prediction, house pricing prediction,
robo-advisors (automated portfolio managers), and virtual travel
assistants. As this list shows, the field of AI is growing, with
extraordinary potential for decades to come. In addition, the
demand for data science jobs has also exponentially grown in the
past few years, with companies searching for computer
scientists, mathematicians, data scientists, and engineers who have
postgraduate and preferably PhD degrees from accredited
universities.
Data mining
Data mining is a term used in computer science and is
defined as the process of extracting specific information that is
hidden in a database and not explicitly available to the user,
using a set of different techniques such as ML. It is also called
knowledge discovery in databases (KDD). Teaching someone how
to play basketball is ML; however, using someone to find the best
basketball centers is data mining. Data mining is used by ML
algorithms to find links between various linear and nonlinear
relationships. Data mining is often used to help collect data on
various aspects of the business such as nonproductive time, sales
trend, production key performance indicators, drilling data,
completions data, stock market key indicators and information, etc.
Data mining can also be used to go through websites, online
platforms, and social media to collect and compile information
(Belyadi et al., 2019).
Machine learning
ML is a subset of AI. It is defined as the use of a collection of
algorithms to teach computers to find patterns in data to be used for
future prediction and forecasting or as a quality check for
performance optimization. ML provides computers the ability to
learn without being explicitly programmed. Some of the patterns
may be hidden and therefore, finding those hidden patterns can add
significant shareholder value to any organization. Please note that
data mining deals with searching specific information while ML
focuses on performing a certain task. In Chapter 2 of this book,
various types of ML algorithms will be discussed. Also note that
deep learning is a subset of machine learning in which multi-layer
neural networks are used for various purposes including but not
limited to image and facial recognition, time series forecasting,
autonomous cars, language translation, etc. Examples of deep
learning algorithms are the convolutional neural network (CNN) and
the recurrent neural network (RNN), which will be discussed with various
oil and gas (O&G) applications in Chapter 6.
Anaconda introduction
It is highly recommended to download Anaconda, the standard
platform for Python data science which includes many of the
necessary libraries with its installation. Most libraries used in this
book are already preinstalled with Anaconda, so they don't need to
be downloaded individually. The libraries that are not preinstalled
in Anaconda will be mentioned throughout the chapters.
Anaconda installation
To install Anaconda, go to Anaconda's website
(www.anaconda.com) and click on “Get Started.” Afterward, click
on “Download Anaconda Installers” and download the latest
version of Anaconda for either Windows or Mac. The Anaconda
distribution includes over 250 packages, some of which will be used
throughout this book. Without Anaconda, most
libraries must be installed separately using the command prompt
window. Therefore, it is highly advisable to install Anaconda to
avoid installing the majority of the libraries used in this
book one by one. Please note that while the majority of the libraries are installed
with Anaconda, some libraries
must be installed separately using the command prompt
or Anaconda prompt window. For those libraries that are not
preinstalled, simply open “Anaconda prompt” from the “start”
menu and type in “pip install (library name),” where “library name”
is the name of the library to be installed. Once
Anaconda has been successfully installed, search for “Jupyter
Notebook” under the start menu. Jupyter Notebook is a web-based,
interactive computing notebook environment. Jupyter Notebook
loads quickly, is user-friendly, and will be used throughout this
book. There are other user interfaces such as Spyder, JupyterLab, etc.
Fig. 1.1 shows the Jupyter Notebook window after opening. Simply
go into “Desktop” and create a folder called “ML Using Python.”
Afterward, go to the created folder (“ML Using Python”) and click
on “New” in the top right-hand corner as illustrated in Fig. 1.2.
You have now officially launched a new Jupyter Notebook and are
ready to start coding as shown in Fig. 1.3.
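Before running “pip install” for a library, it can help to check from inside a notebook cell whether that library is already available. Below is a minimal sketch using Python's standard importlib module. The function name is_installed and the second library name are made up for illustration, and functions themselves are covered later in this chapter.

```python
import importlib.util

def is_installed(library_name):
    """Return True if the named library can be found in this environment."""
    return importlib.util.find_spec(library_name) is not None

# "json" ships with Python; the second name is a made-up placeholder
print(is_installed("json"))
print(is_installed("not_a_real_library_123"))
```

If the check returns False for a library that is needed, the “pip install” step described above applies.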
As displayed in Fig. 1.4, the top left-hand corner indicates the
Notebook is “Untitled.” Simply click on “Untitled” and name the
Jupyter Notebook “Python Fundamentals.”
FIGURE 1.1 Jupyter Notebook window.
Pressing “shift + tab” two, three, and four times will keep
expanding the argument window until it occupies half of the
page.
The remainder of a division can also be found using the “%” sign. For
example, the remainder of 13 divided by 2 is 1.
13%2
Python output=1
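As a small sketch of how the remainder relates to the rest of the division, the lines below compute the remainder, the whole-number quotient, and both at once. The variable names are chosen for illustration only.

```python
remainder = 13 % 2           # remainder of 13 divided by 2
quotient = 13 // 2           # whole-number part of the division
both = divmod(13, 2)         # (quotient, remainder) in a single call
is_even = (13 % 2 == 0)      # a remainder of 0 would mean an even number
print(remainder, quotient, both, is_even)
```

A common use of “%” is exactly this kind of even/odd check.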
Creating a string
To create a string, single or double quotes can be used.
x='I love Python'
Python output='I love Python'
x="I love Python"
Python output='I love Python'
To index a string, bracket ([]) notation along with the element
number can be used. It is crucial to remember that indexing in
Python starts with 0. Let's assume that variable name y is defined as
a string “Oil_Gas.” y[0] means the first element in Oil_Gas, while
y[5] means the sixth element in Oil_Gas since indexing starts with 0.
y="Oil_Gas"
y[0]
Python output='O'
y[5]
Python output='a'
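Beyond single elements, the same bracket notation can return a range of characters (a slice) or count from the end of the string. The following is a short sketch using the same string; the variable names other than y are chosen for illustration.

```python
y = "Oil_Gas"
first = y[0]        # first element
sixth = y[5]        # sixth element, since indexing starts with 0
last = y[-1]        # negative indices count from the end
head = y[0:3]       # elements 0, 1, and 2 (the end index is excluded)
tail = y[4:]        # from element 4 to the end
length = len(y)     # number of characters in the string
print(first, sixth, last, head, tail, length)
```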
Defining a list
A list can be defined as follows:
list=['Land','Geology','Drilling']
list
Python output=['Land', 'Geology', 'Drilling']
list.append('Frac')
list
Python output=['Land', 'Geology', 'Drilling', 'Frac']
To replace elements of the list by index, the
following lines can be used:
list[0]='Reservoir_Engineer'
list[1]='Data_Engineer'
list[2]='Data_Scientist'
list[3]='Data Enthusiast'
list
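One caution worth illustrating: naming a variable “list” hides Python's built-in list() function for the rest of the session. The sketch below repeats the same operations with a different variable name (“disciplines” is a name chosen here, not one from the text) and adds a few other common list operations.

```python
# "disciplines" is a name chosen for this sketch; the text above uses "list"
disciplines = ['Land', 'Geology', 'Drilling']
disciplines.append('Frac')               # add one element to the end
disciplines[0] = 'Reservoir_Engineer'    # replace the first element
count = len(disciplines)                 # number of elements
last_item = disciplines[-1]              # negative index counts from the end
removed = disciplines.pop()              # remove and return the last element
letters = list('abc')                    # the built-in list() is still usable
print(disciplines, count, last_item, removed, letters)
```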
Creating a dictionary
Thus far, we have covered strings and lists; the next topic is
the dictionary. A dictionary is defined using curly braces ({}).
Below, a dictionary named “a” was created for various
ML models and their respective scores.
a={'ML_Models':['ANN','SVM','RF','GB','XGB'],'Score':
[90,85,95,90,100]}
a
Python output={'ML_Models': ['ANN', 'SVM', 'RF', 'GB', 'XGB'],
'Score': [90, 85, 95, 90, 100]}
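Once a dictionary exists, its values are retrieved by key. The sketch below reuses the dictionary “a” from above and pairs each model with its score. The names score_by_model and best_model are introduced here for illustration only.

```python
a = {'ML_Models': ['ANN', 'SVM', 'RF', 'GB', 'XGB'],
     'Score': [90, 85, 95, 90, 100]}

keys = list(a.keys())                 # the dictionary's keys, in insertion order
models = a['ML_Models']               # look up a value by its key
scores = a['Score']

# Pair each model name with its score in a new dictionary
score_by_model = dict(zip(models, scores))
best_model = max(score_by_model, key=score_by_model.get)
print(keys, score_by_model, best_model)
```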
Creating a tuple
As opposed to lists that use brackets, tuples use parentheses to
define a sequence of elements. One of the advantages of using a list
is that items can be assigned; however, tuples do not support item
assignment, which means they are immutable. For instance, let's
create a list and replace one of its elements and examine the same
concept with tuples. As shown below, a list with 4 elements of 100,
200, 300, and 400 was created. The first element, 100, was replaced
with “New” and the new list is as follows:
list=[100,200,300,400]
list[0]='New'
list
Python output=['New', 200, 300, 400]
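To examine the same concept with a tuple, the sketch below attempts the same item assignment on a tuple holding the same four numbers. The variable names are chosen for illustration. Python raises a TypeError, which is caught here so the cell finishes running.

```python
numbers_tuple = (100, 200, 300, 400)

try:
    numbers_tuple[0] = 'New'       # item assignment is not supported for tuples
    assignment_worked = True
except TypeError as error:
    assignment_worked = False
    print('TypeError:', error)

print(numbers_tuple)               # the tuple is unchanged
```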
Creating a set
A set contains only unique elements, which means that defining the same
numbers multiple times will only return the unique numbers and
will not show the repeated numbers. Curly braces ({}) can be
used to generate a set as follows. As displayed, the generated output
only has 100, 200, and 300 even though each number was entered twice.
set={100,200,300,100,200,300}
set
Python output={100, 200, 300}
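A few common set operations are sketched below. The variable names are chosen for illustration (the text above uses “set” as a variable name, which hides the built-in set() function, so a different name is used here).

```python
values = {100, 200, 300, 100, 200, 300}
unique_count = len(values)           # duplicates are dropped
from_list = set([1, 1, 2, 3, 3])     # set() removes duplicates from a list
common = values & {300, 400}         # intersection: elements in both sets
combined = values | {400}            # union: elements in either set
empty_set = set()                    # note: {} alone creates an empty dictionary
print(unique_count, from_list, common, combined, empty_set)
```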
If statements
If statements are among the most important concepts in any
programming language. Let's start with a simple example: if 100 is
equal to 200, print “Good Job!”; otherwise, print “Not Good!” Make
sure the print statements following “if 100 == 200:” and “else:”
are indented; otherwise, an error will be raised. The “Tab”
key can be used to indent in Jupyter Notebook. Please note that
by convention, each indentation level in Python is 4 spaces.
if 100==200:
    print('Good Job!')
else:
    print('Not Good!')
Python output=Not Good!
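In the example below, the first condition (X > Y) is not the case. The
next condition, defined with “elif,” is if Z < Y to print “SO SO,” which
is again not the case. Therefore, the term “else” is used to define all
other cases, and the output would be “BAD.”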
X=100
Y=200
Z=300
if X>Y:
    print('Good')
elif Z<Y:
    print('SO SO')
else:
    print('BAD')
Python output=BAD
if X>Y:  # assumed condition, following the previous example
    A=X+Y
elif Z<Y:
    B=X+Y+Z
else:
    C=2*(X+Y+Z)
C
Python output=1200
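The same if/elif/else pattern is often used to sort a value into categories. Below is a minimal sketch with a made-up production rate and made-up cutoffs, chosen only for illustration. It also shows the difference between “==” (comparison) and “=” (assignment).

```python
# Hypothetical rate and cutoffs, chosen only for illustration
rate = 150
if rate >= 200:
    label = 'High'
elif rate >= 100:
    label = 'Medium'
else:
    label = 'Low'
print(label)

# "==" compares two values, while "=" assigns a value to a name
is_equal = (100 == 200)
print(is_equal)
```

Conditions are checked from top to bottom, and only the first one that is true runs.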
For loop
A for loop is another very useful tool in any programming language
and allows for iterating through a sequence. Let's define i to be a
range between 0 and 5 (excluding 5). A for loop is then written to
print 0 through 4. As shown below, “for x in i” is the same as
“for x in range(0,5)”
i=range(0,5)
for x in i:
    print(x)
Python output=
0
1
2
3
4
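A for loop can also build up a result instead of only printing. The sketch below adds up the same numbers and uses enumerate() to obtain each element's position along with its value. The variable names and the crew names in the second loop are chosen for illustration.

```python
# Sum the numbers 0 through 4 with a for loop
total = 0
for x in range(0, 5):
    total += x
print(total)

# enumerate() returns the position and the element together
crews = ['Crew_A', 'Crew_B', 'Crew_C']
positions = []
for position, name in enumerate(crews):
    positions.append((position, name))
print(positions)
```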
Nested loops
A nested loop refers to a loop inside another loop. In various ML
optimization programs such as grid search (which will be
discussed), nested loops are very common to optimize the ML
hyperparameters. Below is a simple example of a nested loop:
Performance=["Low_Performing", "Medium_Performing",
"High_Performing"]
Drilling_Crews=["Drilling_Crew_1", "Drilling_Crew_2",
"Drilling_Crew_3"]
for x in Performance:
    for y in Drilling_Crews:
        print(x, y)
Python output=Low_Performing Drilling_Crew_1
Low_Performing Drilling_Crew_2
Low_Performing Drilling_Crew_3
Medium_Performing Drilling_Crew_1
Medium_Performing Drilling_Crew_2
Medium_Performing Drilling_Crew_3
High_Performing Drilling_Crew_1
High_Performing Drilling_Crew_2
High_Performing Drilling_Crew_3
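To connect nested loops to the grid search idea mentioned above, here is a toy sketch that tries every combination of two settings and keeps the best one. The scoring function, the setting names (depth, n_trees), and their values are all made up for illustration. A real grid search would train and evaluate an ML model instead of calling toy_score.

```python
from itertools import product

def toy_score(depth, n_trees):
    # Made-up score standing in for a model's validation score
    return -(depth - 4) ** 2 - (n_trees - 200) ** 2 / 10000

depths = [2, 4, 6]
tree_counts = [100, 200, 300]

best_score = None
best_params = None
for depth in depths:                 # outer loop: first setting
    for n_trees in tree_counts:      # inner loop: second setting
        score = toy_score(depth, n_trees)
        if best_score is None or score > best_score:
            best_score = score
            best_params = (depth, n_trees)
print(best_params, best_score)

# itertools.product generates the same combinations without nesting
pairs = list(product(depths, tree_counts))
print(len(pairs))
```

With three values for each setting, the nested loops evaluate 3 x 3 = 9 combinations, which is why grid search becomes expensive as more settings are added.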
Let's try another for loop example, this time creating an empty list
first and then performing basic algebra on a list of numbers as
follows:
i=[10,20,30,40]
out=[]
for x in i:
    out.append(x**2+200)
print(out)
Python output=[300, 600, 1100, 1800]
List comprehension
List comprehension is another powerful way of performing
calculations quickly. The calculations listed above could have been
simplified into the following list comprehension:
[x**2+200 for x in i]
Python output=[300, 600, 1100, 1800]
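The sketch below places the loop version and the list comprehension side by side to confirm they give the same result, and adds an optional “if” condition to keep only some elements. The variable names other than i are chosen for illustration.

```python
i = [10, 20, 30, 40]

# Loop version
loop_result = []
for x in i:
    loop_result.append(x**2 + 200)

# Equivalent list comprehension
comprehension_result = [x**2 + 200 for x in i]

# A condition at the end keeps only the elements that satisfy it
filtered = [x**2 + 200 for x in i if x > 20]
print(loop_result, comprehension_result, filtered)
```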
Defining a function
The next concept in Python is defining a function. A function is
defined using “def,” and “return” sends back the result, for example
the result of a mathematical equation:
def linear_function(n):
    return 2*n+20
linear_function(20)
Python output=60
As shown above, first use the keyword “def” followed by any desired
function name and its argument in parentheses. Afterward, return the
desired equation. Finally, call the function by name with the number
to be used in the calculation.
Below is another example:
def Turner_rate(x):
    return x**2+50
Turner_rate(20)
Python output=450
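Functions can also take more than one argument and give some of them default values. The sketch below extends linear_function from above. The parameter names slope and intercept are introduced here for illustration and are not part of the original example.

```python
def linear_function(n, slope=2, intercept=20):
    """Return slope*n + intercept; the defaults reproduce 2*n+20."""
    return slope * n + intercept

default_result = linear_function(20)                # uses slope=2, intercept=20
steeper_result = linear_function(20, slope=3)       # overrides only the slope
no_offset_result = linear_function(20, intercept=0)
print(default_result, steeper_result, no_offset_result)
```

Default values keep simple calls short while still allowing the equation to be adjusted when needed.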
Introduction to pandas
Pandas is one of the most widely used libraries in Python, and it is
essentially designed to replicate the Excel spreadsheet format in Python.
The primary role of pandas is data manipulation and analysis, and it
is heavily used for data preprocessing before implementing ML
models. Building various ML models becomes much easier after
learning the fundamentals of the pandas and numpy (which will be
discussed next) libraries. To start off, let's create a dictionary and
convert that dictionary into a pandas table format as follows:
dictionary={'Column_1':[10,20,30],'Columns_2':
[40,50,60],'Column_3':[70,80,90]}
dictionary
Python output={'Column_1': [10, 20, 30], 'Columns_2': [40, 50,
60], 'Column_3': [70, 80, 90]}