Learn Autonomous Programming with Python Utilize Python’s capabilities in artificial intelligence, machine learning, deep... (P Divadkar, Varun)

The document is a comprehensive guide titled 'Learn Autonomous Programming with Python,' focusing on leveraging Python for artificial intelligence, machine learning, deep learning, and robotic process automation. It includes detailed chapters on various topics such as web scraping, automating Excel, and intelligent automation, providing practical use cases and exercises. The author, Varun P Divadkar, draws on over 12 years of experience in the tech industry to offer insights and knowledge to both students and professionals looking to enhance their Python programming skills.
Learn Autonomous Programming with Python
Utilize Python’s capabilities in
artificial intelligence, machine
learning, deep learning and
robotic process automation

Varun P Divadkar

www.bpbonline.com
First Edition 2024

Copyright © BPB Publications, India

ISBN: 978-93-55517-630

All Rights Reserved. No part of this publication may be
reproduced, distributed or transmitted in any form or by any
means or stored in a database or retrieval system, without
the prior written permission of the publisher, with the
exception of the program listings, which may be entered,
stored and executed in a computer system, but they cannot
be reproduced by means of publication, photocopy,
recording, or by any electronic or mechanical means.

LIMITS OF LIABILITY AND DISCLAIMER OF WARRANTY

The information contained in this book is true and correct to
the best of the author’s and publisher’s knowledge. The author
has made every effort to ensure the accuracy of this
publication, but the publisher cannot be held responsible for
any loss or damage arising from any information in this
book.

All trademarks referred to in the book are acknowledged as
properties of their respective owners but BPB Publications
cannot guarantee the accuracy of this information.
www.bpbonline.com
Dedicated to
Every earnest reader of this book
About the Author

In a diverse career spanning 12+ years across six
multinational organizations, Varun P Divadkar has
contributed to a variety of global techno-functional
engagements, bringing his delivery expertise in Python
programming applications to cutting-edge domains like
Artificial Intelligence, Machine Learning, Deep Learning and
Robotic Process Automation. His background as a qualified
Mechanical Engineer with a gold medal from MIT has given
him strong logical reasoning abilities and quantitative
aptitude. This has further enabled him to grasp key
mathematical and technical concepts quickly and thereby
mentor students and professionals in their exploration of
these cutting-edge domains.
He began his career with the prestigious Tata group in
the manufacturing domain of the engineering services
sector, where he was involved in the automation of
engineering designs. Thereafter, he spent 5+ years with the
financial consulting giants Ernst & Young and Deloitte, where
he had the privilege of working out of the global financial
epicenter of Wall Street, Manhattan, New York and the
culturally significant hub of Los Angeles, California. Within
India, he has worked across major commercial centers like
Mumbai, Bengaluru, Chennai, Hyderabad and Pune, thus
broadening his professional network. More recently, he
has been with three banking organizations and currently
leads the Machine Learning and Generative AI initiative at a
large global commercial bank as Vice President.
About the Reviewer

Jyant, a passionate data enthusiast with a postgraduate
degree in Artificial Intelligence from Great Lakes, began
his IT journey with Accenture, where he honed his skills
in SAP BODS, SQL, Python, and Excel, catering to US-based
retail clients. Later, at Impact Analytics, he delved into a
diverse tech stack, including R, Python, SQL, AWS, GCP,
Gurobi, and more, while refining his interpersonal skills.
During his tenure at Sapient, Jyant led teams in delivering a
spectrum of products, from price optimization to movie
recommendations and credit scoring models, employing a
versatile toolkit encompassing Python, R, advanced SQL,
Superset, Airflow, Git, Azure, PySpark, and others.
Currently, as a leader at Zscaler, Jyant spearheads the
development of two pivotal models: an attrition model and
an innovative chatbot powered by AWS Lex. His expertise
extends to LLM, RAG, and ReAct prompt integration,
alongside utilizing SQLAlchemy to enhance the chatbot's
capabilities.
Jyant excels in tackling diverse challenges and is adept at
creating regression models for promotion optimization,
utilizing Seaborn and Matplotlib for extensive EDA, and
applying CNN and ResNet-50 for image attribute tagging. He
is also experienced at employing NLP for data extraction,
developing movie recommendation engines with LightFM
neural network models, and crafting intricate price
optimization and credit scoring models using bank and
retail data.
Acknowledgement

I was extremely thrilled when BPB Publications reached out
to me with the opportunity to write this wonderful book. I
would like to thank all the reviewers, technical experts,
and editors of BPB Publications for their valuable support
and guidance throughout this journey and for making it a
worthwhile experience.
I would like to acknowledge the generosity and consistent
effort of the worldwide open-source data science community
for providing a plethora of libraries and free reading
material online that enable one to leverage their salient
learnings and incorporate valuable inputs.
Last but not least, I would like to thank all the interested
readers who have instilled in me the motivation and
inspiration that led to the making of this book.
Preface

The contemporary industry has witnessed an
unprecedented growth of technology characterized by the
advent of big data, Internet of Things (IoT), Artificial
Intelligence, Machine Learning, Deep Learning and Robotic
Process Automation. This has created an inevitable need for
professionals to upskill themselves in order to stay
relevant as the industry evolves.
This book has been designed with the sole purpose of
providing a launching pad to students and professionals for
breaking into these domains by leveraging the salient
powers of Python.
As one progresses through this book, one shall embark on
an exciting learning curve, first getting familiar with
preliminary domain-specific concepts before picking up the
Python editor to write code. The initial chapter is a
refresher that demonstrates the importance of Python in the
current scenario, and the chapters that follow concentrate
on individual topics. Every chapter follows a consistent
sequence: an overview of fundamental concepts followed by
corresponding Python exercises. Wherever necessary,
detailed guidance has also been provided for the setup and
installation of required software.
Each chapter ends with an industry-relevant use case
in Python. These use cases are golden exercises for building
up one's potential to add value and have been designed
keeping in mind the demands of today’s industry and the
salient skills and abilities required to fulfil them.
This book is intended for both students and professionals
who want to take their Python programming ability to the
next level by adding industry-relevant value. This book
should also serve as a handy reference for quick revision
of the basic concepts of the specific domains discussed
within the book.
This book shall definitely get one enthused about the arena
of Python automation and channel one's thought process in
the appropriate direction for continuing to build value-added
applications in Python.
Chapter 1: Why Python for Automation? – This chapter
provides an overview of why Python is the chosen favorite
programming language when it comes to building value
added applications in popular domains like Artificial
Intelligence, Machine Learning, Deep Learning. It shall
discuss the inherent advantages associated with Python as
an open source language and then highlight frequently used
Python libraries.
Chapter 2: RPA Foundations – This chapter introduces
the reader to fundamental concepts of Robotic Process
Automation (RPA) and discusses the various components of
RPA. Thereafter, an overview of industry leaders in RPA like
UiPath, Automation Anywhere and Blue Prism has been
provided. The chapter shall introduce the reader to the Python
‘rpa’ library and end with an interesting practical use case
with it.
Chapter 3: Getting Started with AI/ML in Python – This
chapter gets the reader kickstarted with fundamental
concepts in Artificial Intelligence, Machine Learning and
Deep Learning. Classical machine learning algorithms have
been discussed followed by an introduction to deep
learning, neural networks, types of neural networks and
their applications like Natural Language Processing.
Chapter 4: Automating Web Scraping – This chapter
familiarizes the reader with the process of extracting data
from the web using the Python libraries ‘requests’ and ‘Beautiful
Soup’. Data being a central component of all applications,
this chapter presents a useful approach to leverage the web
to access data.
Chapter 5: Automating Excel and Spreadsheets – This
chapter revisits the arena of Excel automation by
highlighting the central role that Python plays in speeding
up the process. The libraries ‘openpyxl’, ‘xlwings’ and
‘XlsxWriter’ shall be discussed individually followed by a
practical use case in Python which shall illustrate the beauty
of collectively utilizing them in combination.
Chapter 6: Automating Emails and Messaging – This
chapter ventures into email and social media
messaging by demonstrating the power of Python libraries
in sending swift messages through Gmail and WhatsApp.
Chapter 7: Working with PDFs and Images – This
chapter expands the capabilities of Python by showcasing
the ability to read data from PDF documents. The chapter
further discusses the concept of Optical Character
Recognition (OCR) and illustrates how ‘pytesseract’ and
‘OpenCV’ enable one to achieve it in practice by reading
data from images.
Chapter 8: Mechanizing Applications, Folders and
Actions – This chapter introduces the ‘os’ and ‘shutil’
modules in Python that enable one to automate the process
of reading, writing and moving files and folders and is
followed by the ‘PyAutoGUI’ and ‘PyWinAuto’ libraries that
are used to automate mouse and keyboard operations on a computer.
Chapter 9: Intelligent Automation Part 1: Using
Machine Learning – This chapter consolidates the
idea of machine learning by taking a deep dive into various
machine learning algorithms and demonstrates their smooth
implementation using Python libraries. This chapter covers
both supervised and unsupervised learning methods and
ends with a practical use case in Python that makes the user
conversant with the process of independently building
machine learning Python applications.
Chapter 10: Intelligent Automation Part 2: Using
Deep Learning – This chapter revisits the concept of a
neural network and implements a basic neural network from
scratch in Python. The concept of backpropagation has
been illustrated using the perceptron. Thereafter the
libraries ‘TensorFlow’ and ‘PyTorch’ have been discussed
followed by a discussion on Natural Language Processing
(NLP). An interesting practical use case in Python concludes
the chapter.
Chapter 11: Automating Business Process Workflows
– This chapter begins with an introduction to the concept of
orchestration which is basically workflow automation or can
also be thought of as the automation of automations.
Thereafter the ‘luigi’ and ‘prefect’ modules shall be
discussed that enable one to achieve orchestration.
Chapter 12: Hyperautomation – This chapter discusses
the novel concept of hyperautomation which is basically the
application of Machine Learning, RPA and other AI tools in
tandem with regular automation. This chapter shall revisit
the topics of NLP and RPA from a different perspective.
Chapter 13: Python and UiPath – This chapter builds on
the basic concepts of RPA tool UiPath to illustrate the salient
role that Python plays in enhancing its capabilities. This
chapter shall be a detailed walkthrough of the setup and
environment required to achieve UiPath integration with
Python.
Chapter 14: Architecting Automation Projects – This
chapter gives an overview of the various components that
need to be taken care of while architecting an automation
project. Virtual environments and the ‘pip’
command have been discussed in detail in this chapter.
Chapter 15: The PyScript Framework – This chapter
introduces the reader to an ongoing project in Python called
the ‘PyScript’ framework which essentially enhances the
potential of an HTML page by allowing Python code to be
embedded within it. This chapter shall be exciting for
developers who have a background in JavaScript and shall
provide them with a functionality to experiment with a
different perspective.
Chapter 16: Test Automation in Python – This chapter
concludes the book by discussing a topic that also usually
comes at the end of the development cycle, which is
testing. The chapter elaborates on ‘Selenium’, ‘PyTest’ and
the Robot Framework, which enable smooth automation of
the testing process.
Code Bundle and Coloured Images
Please follow the link to download the
Code Bundle and the Coloured Images of the book:

https://rebrand.ly/nh3wzsj
The code bundle for the book is also hosted on GitHub at
https://github.com/bpbpublications/Learn-Autonomous-Programming-with-Python.
In case there’s an update to the code, it will be updated on
the existing GitHub repository.
We have code bundles from our rich catalogue of books and
videos available at https://github.com/bpbpublications.
Check them out!

Errata
We take immense pride in our work at BPB Publications and
follow best practices to ensure the accuracy of our content
to provide an engaging reading experience to our
subscribers. Our readers are our mirrors, and we use their
inputs to reflect and improve upon human errors, if any, that
may have occurred during the publishing processes
involved. To let us maintain the quality and help us reach
out to any readers who might be having difficulties due to
any unforeseen errors, please write to us at:
errata@bpbonline.com
Your support, suggestions and feedback are highly
appreciated by the BPB Publications’ Family.
Did you know that BPB offers eBook versions of
every book published, with PDF and ePub files
available? You can upgrade to the eBook version at
www.bpbonline.com and as a print book customer,
you are entitled to a discount on the eBook copy.
Get in touch with us at:
business@bpbonline.com for more details.
At www.bpbonline.com, you can also read a
collection of free technical articles, sign up for a
range of free newsletters, and receive exclusive
discounts and offers on BPB books and eBooks.

Piracy
If you come across any illegal copies of our works in
any form on the internet, we would be grateful if you
would provide us with the location address or website
name. Please contact us at
business@bpbonline.com with a link to the
material.

If you are interested in becoming an author
If there is a topic that you have expertise in, and you
are interested in either writing or contributing to a
book, please visit www.bpbonline.com. We have
worked with thousands of developers and tech
professionals, just like you, to help them share their
insights with the global tech community. You can make
a general application, apply for a specific hot topic
that we are recruiting an author for, or submit your
own idea.

Reviews
Please leave a review. Once you have read and used
this book, why not leave a review on the site that you
purchased it from? Potential readers can then see and
use your unbiased opinion to make purchase
decisions. We at BPB can understand what you think
about our products, and our authors can see your
feedback on their book. Thank you!
For more information about BPB, please visit
www.bpbonline.com.

Join our book’s Discord space


Join the book’s Discord Workspace for Latest updates,
Offers, Tech happenings around the world, New Release and
Sessions with the Authors:
https://discord.bpbonline.com
Table of Contents

1. Why Python for Automation?
Introduction
Structure
Objectives
Python as an open-source language
Python’s repository of extensive libraries
Python as a high-level language
Portability aspect of Python
Salient advantages of Python
Conclusion

2. RPA Foundations
Introduction
Structure
Objectives
History of Robotic Process Automation
What is RPA
Components of RPA
Various RPA tools in the market
Comparison between various RPA tools
RPA Python package
Practical use case of RPA with Python
Conclusion

3. Getting Started with AI/ML in Python
Introduction
Structure
Objectives
Background and history of AI
Machine learning concepts
Supervised and unsupervised learning
Popular Python Libraries for ML
Reinforcement learning
Deep learning
Introduction to neural networks
Types of neural networks
Natural language processing
Transformers and large language models
Conclusion

4. Automating Web Scraping
Introduction
Structure
Objectives
What is web scraping
Popular Python libraries for web scraping
The requests module in Python
The Beautiful Soup Library
Inspecting the web page
Extracting information from the Web Page
Legal considerations of web scraping
Practical use case in Python
Conclusion

5. Automating Excel and Spreadsheets
Introduction
Structure
Objectives
Need for automating Excel using Python
Introduction to openpyxl library
Open and modify an existing workbook
Access a cell using Range name
Merging cells
Looping through cells
Working with Excel formulae using openpyxl
Create charts using openpyxl
Styling a chart
Other Python libraries for Excel automation
Comparison summary of Python libraries
Practical use case in Python
Conclusion

6. Automating Emails and Messaging
Introduction
Structure
Objectives
Prerequisites for Gmail automation
Turning on 2-step verification for Gmail
Getting app password
Sending a Gmail message using Python
Automating WhatsApp messaging
Practical use case in Python
Conclusion

7. Working with PDFs and Images
Introduction
Structure
Objectives
PyPDF library
Read a PDF file using PyPDF2
Rotate and merge PDF files
Working with images using the PIL library
Optical character recognition
Working with OpenCV
Practical use case in Python
Conclusion

8. Mechanizing Applications, Folders and Actions
Introduction
Structure
Objectives
The os module in Python
The shutil module in Python
Copy and move a file using shutil
Move files based on extension using shutil
Using the PyAutoGUI Library
Implementing basic mouse functions using PyAutoGUI
Implementing basic keyboard functions using
PyAutoGUI
Exploring message box functions using PyAutoGUI
Practical use case in Python
Conclusion

9. Intelligent Automation Part 1: Using Machine Learning
Introduction
Structure
Objectives
Implementing supervised machine learning algorithms
using Python
Linear regression
Key concepts in Machine Learning models
Logistic regression
K nearest neighbors
Naïve Bayes
Support vector machines
Decision trees
Implementing unsupervised learning algorithms using
Python
Dimensionality reduction
Principal component analysis
Linear discriminant analysis
K means clustering
Practical use case in Python
Conclusion

10. Intelligent Automation Part 2: Using Deep Learning
Introduction
Structure
Objectives
Implementing a neural network in Python
Backpropagation
Popular Python libraries for deep learning
Deep learning applications
Natural language processing
Practical use case in Python
Conclusion

11. Automating Business Process Workflows
Introduction
Structure
Objectives
Understanding a business process workflow
Introduction to orchestration
Automation versus orchestration: Differences
Orchestration platforms available in market
Achieving orchestration with Python
Prefect
Luigi
Practical use case in Python
Conclusion

12. Hyperautomation
Introduction
Structure
Objectives
Defining hyperautomation: What it is and why it matters
The hyperautomation cycle: Key steps and processes
Exploring typical use cases for hyperautomation
Enhancing document understanding with optical
character recognition
Implementing conversational agents: The role of
chatbots
Advancing efficiency with robotic process automation
Navigating the challenges of hyperautomation
Practical use case in Python
Conclusion

13. Python and UiPath
Introduction
Structure
Objectives
Setting up the Python environment in UiPath
Exploring Python activities in UiPath
Creating the Python script
Integrating Python with UiPath
Conclusion

14. Architecting Automation Projects
Introduction
Structure
Objectives
Introduction to virtual environment
Setting up a virtual environment
Virtual environment directories at a glance
Additional considerations involving a virtual
environment
Python PIP revisited
Performing basic operations using pip
Working with the requirements.txt file
Using Docker for containerization
Conclusion

15. The PyScript Framework
Introduction
Structure
Objectives
Introduction to PyScript
Creating a basic webpage using PyScript
Adding working Python code to the webpage
Using third party libraries with PyScript
Referencing external Python files in PyScript
Conclusion

16. Test Automation in Python
Introduction
Structure
Objectives
Introduction to Selenium
Setting up the Selenium Python API
Exploring web automation with Selenium Python API
Pytest library
Advantages and limitations of Pytest
Python Robot Framework
Running test cases in the Python Robot Framework
Conclusion

Index
Chapter 1
Why Python for Automation?

Introduction
This chapter introduces the reader to the significance of
Python as an open-source language and explains why it is
such a powerful tool for autonomous programming. The
chapter emphasizes Python’s inherent flexibility,
adaptability, and user-friendly nature, attributed to its
being a high-level language with a straightforward syntax.
Further into the chapter, Python’s rich assortment of
libraries, such as ‘Pandas’, ‘NumPy’ and ‘Matplotlib’, which
are pivotal for machine learning, will be discussed. In
subsequent chapters, we will delve deeper into the practical
implementations of these libraries.

Structure
The chapter covers the following topics:
• Python as an open-source language
• Python’s repository of extensive libraries
• Python as a high-level language
• Portability aspect of Python
• Salient advantages of Python

Objectives
At the end of this chapter, you will clearly understand why
Python has been chosen as the language for autonomous
programming despite the availability of numerous other
alternatives from the myriad pool of programming
languages.

Python as an open-source language
Congratulations on choosing the best place to begin your
journey of learning autonomous programming with Python!
As the book’s name implies, the crux of the story is built on
the Python language. This chapter will tell you precisely why
Python has been chosen as an optimum tool to construct
the monument of autonomous programming.
Let us keep Python aside for a moment to understand the
first advantage it has. Imagine you wish to create a short
application that would automatically produce some Excel
reports out of a few raw chunks of Excel data. You consult
your friend, a professional in writing algorithms in C++, Java
and Visual Basic. He advises you to use the API of any of
these languages and go ahead with building your
application. As soon as you agree with your friend, the first
roadblock you may stumble upon is the realization that this
exercise cannot always be performed free of cost and that
you may need to purchase licenses for certain tools or
platforms. Now in search of hope elsewhere, you research a
bit about Python and realize that Python does not have any
such cost constraint!
And that is the first gigantic advantage of using Python.
Python is an open-source language, which means that the
source code is freely available, distributable, and modifiable
by users. This makes it a highly preferred language for
developing and maintaining code and sharing it with
communities globally. This openness is also one of the main
reasons why so many libraries have been built for Python,
some of which we shall soon discuss.

Python’s repository of extensive libraries
Python provides extensive libraries, which are essentially
reusable pieces of code that data scientists use to build
their applications. The most popular libraries are listed
below with some short descriptions:
• NumPy: This is one of the most popular Python libraries and is used to work with large, multi-dimensional arrays and matrices. Its name is an abbreviation for Numerical Python. This library forms the core of scientific computing, constitutes a building block for other libraries such as SciPy, and interoperates closely with deep learning libraries like TensorFlow.
• Pandas: This library is one of the key offerings of Python. The pandas DataFrame is the fundamental unit that enables one to work with data in a tabular format. It allows one to read data from raw sources as a DataFrame, perform transformations on it using multiple built-in methods and export the data to various other applications.
• Matplotlib: Data visualization has become the need of the hour for gaining quick insights into raw as well as transformed data. The matplotlib library does a great job of enabling one to derive valuable insights from data by providing the ability to plot a variety of charts, such as pie charts, histograms, and scatter plots.
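As a quick illustration of the first two libraries, the following minimal sketch (assuming NumPy and pandas are installed, for example via pip) sums the columns of a small NumPy matrix and then aggregates a pandas DataFrame by a category column. The region names and unit figures are hypothetical sample data, not taken from any later chapter.

```python
import numpy as np
import pandas as pd

# NumPy: vectorized arithmetic on a 2 x 3 matrix, with no explicit Python loop
matrix = np.array([[1, 2, 3], [4, 5, 6]])
column_totals = matrix.sum(axis=0)  # adds the two rows element by element

# pandas: tabular data with named columns (hypothetical sample data)
sales = pd.DataFrame(
    {
        "region": ["North", "South", "North", "South"],
        "units": [10, 15, 5, 20],
    }
)

# Group the rows by region and total the units sold in each group
units_by_region = sales.groupby("region")["units"].sum()

print(column_totals)
print(units_by_region)
```

Neither step needs an explicit loop: NumPy and pandas perform the iteration internally, which is a large part of why they remain fast on big datasets.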
Other Python libraries that are primarily used in data
science, machine learning and deep learning applications
are summarized below and will be discussed in detail in the
relevant chapters to follow. This is not an exhaustive list,
and the exercises in later chapters introduce more libraries.
• TensorFlow: This is an open-source library for numerical computation and large-scale machine learning, widely used to build and train deep learning models.
• Beautiful Soup: This is a popular Python library for web scraping. Web scraping is the process of automating the gathering of data from the web. Beautiful Soup helps achieve this by enabling data to be pulled out of HTML and XML files.
• Scrapy: This is an open-source Python framework used for web crawling and web scraping.
• Json: This is a built-in Python module (‘json’) that provides useful tools for working with JSON (JavaScript Object Notation) data.
• SciPy: SciPy is an abbreviation for Scientific Python. It is built on NumPy and is used for scientific computing.
• Scikit-learn: This is commonly used for implementing machine learning algorithms like regression, classification, and clustering.
• PyPDF: This is used for reading and transforming PDF files.
• Openpyxl: Python library used for reading and writing Excel files, and hence for Excel automation.
• Pywhatkit: Python library that is used to send WhatsApp messages.
• OpenCV: Python library used for computer vision applications.
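Because ‘json’ ships with Python’s standard library, it is the easiest of these libraries to try right away. The sketch below round-trips a small report summary through ‘json.dumps’ and ‘json.loads’; the dictionary keys and values are invented purely for illustration.

```python
import json

# A Python dictionary to serialize (hypothetical sample data)
report = {"name": "monthly_sales", "rows": 3, "ok": True}

# json.dumps converts a Python object into a JSON-formatted string;
# sort_keys=True makes the key order deterministic
payload = json.dumps(report, sort_keys=True)

# json.loads parses the JSON string back into Python objects
restored = json.loads(payload)

print(payload)
```

Note how Python’s ‘True’ becomes JSON’s lowercase ‘true’ in the serialized string and is converted back on parsing.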

Python as a high-level language
Returning to Java and C++, one needs to be conversant with a
comparatively verbose syntax and, particularly in C++, with
memory management considerations to develop applications
in these languages. C++, for instance, is often described as
a middle-level language for this reason. However, Python
tells a different story. The syntax is extremely simple, so
much so that calling it an extended version of pseudocode
would not be an exaggeration!
This is one of the main reasons why Python is a preferred
tool for programmers worldwide to develop open-source
projects in machine learning and deep learning. Python gives
the developer the flexibility and luxury to focus on the
algorithm without worrying about curly braces or semicolons
to delimit statements, since code blocks are defined by
indentation; brackets are still used for certain standard
data types such as lists and dictionaries.
Despite this ease of syntax, Python is a full-fledged
Object-Oriented Programming (OOP) language and supports
the key attributes of object-oriented languages, namely
abstraction, encapsulation, polymorphism, and inheritance.
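The following sketch, which uses only the standard library, shows the four OOP attributes mentioned above in a few lines of Python, and also shows that blocks are delimited purely by indentation. The ‘Report’, ‘TextReport’ and ‘CsvReport’ classes are hypothetical examples, not classes from any library.

```python
from abc import ABC, abstractmethod


class Report(ABC):
    """Abstraction: declares what every report must do, not how."""

    def __init__(self, title):
        # Encapsulation: the leading underscore marks this attribute as
        # internal by convention; callers use the read-only property below
        self._title = title

    @property
    def title(self):
        return self._title

    @abstractmethod
    def render(self):
        """Each concrete subclass must provide its own rendering."""


class TextReport(Report):  # Inheritance: reuses __init__ and title
    def render(self):
        return f"Report: {self.title}"


class CsvReport(Report):  # Inheritance
    def render(self):
        return f"title\n{self.title}"


# Polymorphism: the same call, render(), behaves differently per class
reports = [TextReport("Sales"), CsvReport("Sales")]
outputs = [report.render() for report in reports]
print(outputs)
```

Calling ‘Report("Sales")’ directly would raise a ‘TypeError’, because a class that still has an unimplemented abstract method cannot be instantiated.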

Portability aspect of Python
Portability of a programming language is the ability to run
code written in that language on different machines or
platforms without modification. For example, if you write
Python code on a Windows machine, in most cases you do not
have to make any changes to it to run it on Linux or macOS,
as long as a Python interpreter is installed there. That is
the beauty of Python as a portable language. In certain
cases, however, care needs to be taken to avoid
platform-specific dependencies or behaviors, such as
hardcoded file path separators. Nevertheless, Python’s
emphasis on portability and its vast standard library make
it easier to write cross-platform code compared to many
other languages.
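One common way to avoid hardcoded path separators is the standard library ‘pathlib’ module, which joins path components using the separator of whichever operating system runs the code. The directory and file names in this minimal sketch are hypothetical.

```python
import os
import platform
from pathlib import Path

# The "/" operator joins components with the correct separator for the OS,
# so the same line works unchanged on Windows, Linux and macOS
report_path = Path("reports") / "2024" / "summary.xlsx"

# os.sep reveals the separator in use: "\" on Windows, "/" elsewhere
print(platform.system(), os.sep, report_path)
```

The printed path differs by platform, but the code that builds it does not, which is exactly the kind of care the paragraph above recommends.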

Salient advantages of Python
Other significant advantages of Python are the following:
• Dynamically typed language: Python is a dynamically typed language, which means that one need not specify the data type of a variable beforehand. The data type is determined at run time from the value the variable refers to.
• Large community: Python has a huge developer community worldwide that is invested in assisting and guiding one another in growing the language and its ecosystem. Hence, it is possible to obtain quick support for anything related to Python.
• Support for multiple programming paradigms: Python supports multiple programming paradigms, viz., imperative, functional, procedural, and object-oriented, thus adding to the flexibility available to the user.
• Scalable: Python can be used to build applications that handle large amounts of data, which makes it a scalable programming language.
• Embeddable: It is possible to execute Python code from an application written in another language, such as C or C++, thus making Python embeddable.
• Extensible: New functionality can be added to Python by writing code in another programming language, such as C or C++, and linking it with the Python code, thus making Python an extensible language.
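The first and third advantages above can be seen directly in a short script. The sketch below, which uses only the standard library, rebinds one variable to values of different types and then computes the same sum of squares in an imperative style and in a functional style.

```python
from functools import reduce

# Dynamic typing: no type declaration; the type follows the current value
value = 42
first_type = type(value).__name__
value = "forty-two"  # the same name now refers to a string
second_type = type(value).__name__

numbers = [1, 2, 3, 4]

# Imperative/procedural style: an explicit loop with a mutable accumulator
total_loop = 0
for n in numbers:
    total_loop += n * n

# Functional style: map squares each item, reduce folds them into one sum
total_functional = reduce(lambda acc, sq: acc + sq, map(lambda n: n * n, numbers), 0)

print(first_type, second_type, total_loop, total_functional)
```

Both styles produce the same result; which one to use is largely a matter of readability for the task at hand.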

Conclusion
Now that we know why Python is the chosen favorite, what
else are we waiting for! In the next chapter, we will
introduce you to the exciting realm of Robotic Process
Automation (RPA) technology and witness the power of
Python in executing RPA processes independently. Without
further delay, let us kick-start our journey by moving to the
next chapter on RPA foundations!

Chapter 2
RPA Foundations

Introduction
This chapter builds a foundation in Robotic Process
Automation (RPA), one of the hallmarks of today’s industry.
For example, an HR department can use RPA to source
candidates and screen their resumes automatically, saving
hours of manual effort. The chapter takes the reader through
the historical background and progress of robotic technology
and how it has led the industry to the current state of RPA.
The components of RPA are explained in a simple manner to
familiarize the reader with the concepts. Industry leaders in
the RPA market, such as Automation Anywhere, UiPath and
Blue Prism, are described along with their relative
advantages and disadvantages. The end of the chapter
highlights the role that Python plays in enhancing the
potential of RPA and discusses a practical use case showing
how Python can be used to achieve the same kind of
automation that these tools provide.

Structure
The chapter covers the following topics:
• History of Robotic Process Automation
• What is RPA
∘ Components of RPA
∘ Various RPA tools in the market
∘ Comparison between various RPA tools
• RPA Python package
• Practical use case of RPA with Python

Objectives
By the end of this chapter, you will have a practical
understanding of Robotic Process Automation (RPA), its
various offerings in the market and how Python can be used
efficiently to achieve RPA.

History of Robotic Process Automation
The current era is often called Industry 4.0, a term that
refers to the integration of digital technology and the
Internet of Things (IoT) into manufacturing. Automation has
been a part of every earlier industrial revolution as well. As
far as the automation of computer processes is concerned,
the advent of the internet, cloud computing and artificial
intelligence has given a strong impetus to niche solutions for
automating workflows and business processes. Although
early automation started with screen scraping software, the
industry has come a long way and now has market leaders in
RPA that provide automation as a service. Popular names in
this category are Blue Prism, Automation Anywhere and
UiPath.
What is RPA
Robotic Process Automation is a technology that enables
the construction of software robots (known as bots) which
automatically perform the routine manual operations that a
human performs on a computer. These operations include
opening a website, clicking buttons, navigating web pages,
entering credentials and many more. As we see, all these
operations are manual and repetitive, consume a lot of time
and are prone to errors caused by human oversight and
fatigue.
RPA solves exactly this problem by enabling automated
execution of the entire repetitive business process workflow.
Also, since an RPA bot does not suffer from human
shortcomings like errors and fatigue, it can save an
organization a tremendous amount of time and effort.

Components of RPA
We have now understood that RPA is software that enables
the creation of bots for process automation. Like any other
software, RPA has certain components which are common to
all the RPA products in the market. Let us look at each of
these components:
• Recorder: This is a basic and user-friendly, yet important, component of an RPA tool. It is similar to the Record Macro feature that Microsoft Excel provides. With this feature, the user records a manual process while performing it. The recorder captures the salient elements that the user navigates on the screen, like buttons, web page URLs and dropdowns, and treats them as objects. It also captures the properties of these objects. After the recording is complete, the user can edit these properties and make them generic by passing them into variables, thus transforming a hardcoded recording into a dynamic bot! The figure below is a screenshot of the recorder feature in UiPath:

Figure 2.1: A screenshot of the app/web recorder in UiPath

• Development studio: This is the primary component of an RPA tool. A development studio is the actual interface or environment that contains the various features needed to develop a software bot. These features are listed below:
∘ Dashboard with a Graphical User Interface (GUI) for navigating features.
∘ Various types of recording functionalities.
∘ Standard templates to choose from, with a drag and drop feature.
The Figure 2.2 below shows the interface of the Development Studio in UiPath:
Figure 2.2: Development studio in UiPath

• Plugins and extensions: Plugins or extensions expand the original capability of the RPA tool by enabling its integration with an external program or third-party software.
• Bot runner: This is the component that runs the bot created in the development studio (called the Bot Creator in Automation Anywhere).
• Control center: A control center is used to centrally manage, schedule and control the bots.

Various RPA tools in the market


There are numerous companies that offer RPA solutions in
the market. However, this section discusses only the three
market leaders: UiPath, Automation Anywhere and Blue
Prism.
• UiPath: This is one of the leading RPA tools for desktop and web automation. A big advantage of this tool is that it provides a Community Edition for users who want to learn the tool. Even a person with no knowledge of coding can use it, because it offers drag and drop features for creating bots. The primary components of UiPath are UiPath Studio, UiPath Robot and UiPath Orchestrator. These shall be discussed in detail in a later chapter on UiPath.
• Automation Anywhere: This is a web-based automation tool which also has a free version called Community Edition. The three main components of Automation Anywhere are:
∘ Bot Creator: This is the component in which the user creates the bot.
∘ Control room: This is the interface that is used to centrally manage the bots.
∘ Bot runner: This is the component that is used to run the bots after they have been created in the Bot Creator. It also reports the bot execution status to the control room.
• Blue Prism: This is an RPA tool that helps organizations automate manual and repetitive processes. It is a platform-independent tool. Unlike Automation Anywhere and UiPath, it does not have a community edition, but it has a learning version which is free for 180 days. One disadvantage of this tool is that it does not have a recording functionality like UiPath and Automation Anywhere. The main components of Blue Prism are:
∘ Object studio: This is where Blue Prism objects are created, or bots are built.
∘ Process studio: This is similar to object studio, but here one can define the flow of the process.
∘ Control room: This is where the processes are controlled.
Apart from the three market leaders discussed above, there
are numerous other RPA tools in the market like Pega,
OpenConnect, WorkFusion, Contextor, Kryon, OnviSource,
AutomationEdge and Foxtrot.

Comparison between various RPA tools
Considering that there are multiple RPA tools to choose from
in the market, finalizing an RPA platform can be a tough
decision for an organization. However, certain important
parameters can be taken into account while selecting the
optimum RPA tool: recording feature, availability of a free
version, drag and drop functionality, architecture and cost.
The comparison between Automation Anywhere, Blue Prism
and UiPath based on these parameters is shown in the Table
2.1 below:

Parameter           Automation Anywhere                     Blue Prism            UiPath
Macro recording     Yes                                     No                    Yes
Community edition   Yes                                     No                    Yes
Drag and drop       Not for all tasks (requires scripting)  Yes                   Yes
Architecture        Client server                           Client server         Web-based orchestrator
Cost                High acquisition cost                   High deployment cost  Entry-level pricing
Table 2.1: Comparison of RPA tools in market

RPA Python package


There is an open-source Python package named rpa that
allows the developer to perform automation in the same way
that an RPA tool would.
Let us perform a simple task of opening the Google search
web page and typing some characters in the textbox. To
perform this task, open any Python IDE, such as Spyder.
To use this library, we first need to install it using the
following command:
pip install rpa
It then needs to be imported, as shown below, in order to
use it in a Jupyter notebook or any Python IDE:
import rpa as r
Next, type the code below into your Python editor:
r.init(visual_automation=False, chrome_browser=True)
r.url('https://www.google.com/')
r.type('//*[@name="q"]', 'news')
r.click('//*[@name="btnK"]')
r.wait(5) # ensure results are fully loaded
r.snap('page', 'Captured_Image.png')
r.close()
Now, let us try to understand what each line does.
The first line is:
r.init(visual_automation=False, chrome_browser=True)
Here, the init() method is called, which initializes the
TagUI process. If you are wondering what TagUI is, it is the
open-source automation tool that runs in the background
when we use the Python rpa package.
We also observe that the function has the two arguments
listed below:
visual_automation=False, chrome_browser=True
These are in fact the default values of the arguments. So
even if we do not specify them and just call r.init(), the
Python code would run in the same way.
If we change the first argument to
visual_automation=True, it enables visual automation
(interacting with the screen using images and OCR) along
with the keyboard and mouse functions of the package. For
detailed information on this package, along with its different
functions and use cases, please visit the GitHub repository
at https://github.com/tebelorg/RPA-Python.
The next argument, chrome_browser=True, specifies that
the Chrome browser should be started. This argument is
True by default. If you do not need a browser at all, for
example when automating only desktop applications, this
argument can be set to False.
The second line is:
r.url('https://www.google.com/')
This specifies that the URL which needs to be opened is the
Google Search page.
The next line is:
r.type('//*[@name="q"]', 'news')
The figure below shows a screenshot of the word ‘news’
being typed into the Google search page:

Figure 2.3: The word ‘news’ being typed into Google Search page

This line uses the type() method, which types characters
into a web page element. The element is specified in the
first argument, '//*[@name="q"]', which is the XPath of the
search textbox in the Google window. XPath, which stands
for XML Path Language, is a query language whose
expressions identify elements in an XML or HTML
document, in this case the search textbox. This particular
expression means “any element (*), anywhere in the page
(//), whose name attribute equals q”. We shall see how to
obtain XPaths in the next section containing the practical
use case. The second argument, 'news', specifies the text
that needs to be entered.
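To see how an XPath expression such as //*[@name="q"] picks out an element, the sketch below evaluates equivalent expressions with Python’s built-in xml.etree.ElementTree module, which supports a limited subset of XPath. The markup is a simplified, hypothetical stand-in for a search page, not Google’s actual HTML. Note that ElementTree requires a relative path (.//) where the browser accepts //.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a search page (not Google's real HTML).
page = """
<html>
  <body>
    <form>
      <input type="text" name="q"/>
      <input type="submit" name="btnK" value="Google Search"/>
    </form>
  </body>
</html>
"""

root = ET.fromstring(page.strip())

# './/*[@name="q"]': any element, at any depth below <html>, whose
# name attribute is "q". This mirrors the XPath //*[@name="q"] used by rpa.
search_box = root.find('.//*[@name="q"]')
button = root.find('.//*[@name="btnK"]')

print(search_box.tag)          # input
print(button.get("value"))     # Google Search
```

In the browser, the same idea applies to the live page: the XPath is matched against the page’s HTML to find the single element that rpa should type into or click.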
The next line is:
r.click('//*[@name="btnK"]')
This line instructs the code to click the button titled Google
Search which shall produce the search results. The XPath
identifier for this button is //*[@name="btnK"].
After this, the next line is:
r.wait(5)
This line pauses the program for 5 seconds so that the
results page can finish loading. This is necessary because if
the program moves on to the next line before the action from
the previous line has completed, the next step may fail or, in
this case, capture an incompletely loaded page.
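A fixed wait such as r.wait(5) is simple but blunt: it always waits the full time, and it may still be too short on a slow connection. A common alternative is to poll for a condition until a timeout expires. The helper below is a hypothetical, library-independent sketch of that idea; wait_until is not part of the rpa package. Pairing it with the package’s present() function (which reports whether an element is currently on the page) is an assumption mentioned only in a comment and is not exercised here.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll condition() until it returns a truthy value or `timeout`
    seconds elapse. Returns True on success and False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in condition for illustration: it becomes true on the third check,
# much as a results page might appear after a short delay.
checks = {"count": 0}


def results_loaded():
    checks["count"] += 1
    return checks["count"] >= 3


print(wait_until(results_loaded, timeout=5, interval=0.01))  # True

# With the rpa package (not run here), the condition could test for an
# element, e.g. lambda: r.present('//*[@id="search"]'), where the XPath is a
# hypothetical placeholder for the results container.
```

The program continues as soon as the page is ready, yet still gives up after a bounded time instead of hanging forever.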
The next line is:
r.snap('page', 'Captured_Image.png')
This line uses the snap() method to take a screenshot of
the page that is currently open in the browser window. The
first argument, 'page', refers to the whole web page. The
snapshot is saved as an image file named
Captured_Image.png, which is provided as the second
argument of the method. Since this is a relative file name,
the image is saved in the current working directory of the
program, which is usually the folder from which the Python
file is run. The screenshot displays the search results
obtained after typing the word news in the Google Search box.
The last line, r.close(), shuts down the TagUI process and
the browser that it started.
In this way, we saw how the Python rpa package can
automate regular manual processes that would otherwise be
done with RPA tools like UiPath, Automation Anywhere and
Blue Prism. Here we took a simple example of opening a web
page, producing a search result, taking a screenshot, and
saving the screenshot as an image. The rpa package is an
excellent tool for those who are familiar with programming
and want to unleash the potential of RPA by writing code.
Let us now take a slightly more complex, real-world use case:
visiting a training registration web page and filling in the
details of every participant from an Excel file. In this use
case, we build on the experience acquired in the exercise
above and use Python loops and pandas dataframes to
achieve our goal.

Practical use case of RPA with Python


Let us apply what we learned in the previous exercise to
implement a practical RPA use case with the Python rpa
package.
The only input data required for this exercise is the
Training.xlsx Excel workbook. Save the workbook anywhere
on your machine; in this example, it is saved as
C:\Training.xlsx. The Figure 2.4 below shows the table in
the workbook that we are interested in:

Figure 2.4: Participant data for enrollment in trainings

The website that we will use is
https://codenboxautomationlab.com/registration-form/,
which contains the sample training enrollment form shown
in the Figure 2.5 below:
Figure 2.5: Training registration form

Our aim in this exercise is to register every participant from
the Excel table for their respective training by filling in the
details in the web form. Performing this process manually
would be very time-consuming, as every single detail would
have to be copied from the Excel workbook and pasted into
the form. We shall use the Python rpa package, coupled with
the power of Python loops and the pandas library, to
automate this process from start to end.
Let us import the pandas library and read the input Excel
workbook as a dataframe using the lines of code below:
import pandas as pd
input_df = pd.read_excel(r"C:\Training.xlsx")
This converts the table from the Excel workbook into a
pandas dataframe and stores it in the variable input_df.
Note that reading .xlsx files with pandas requires the
openpyxl package to be installed (pip install openpyxl).
Next, we loop through every row of the dataframe and
capture the values from the columns First Name, Last
Name, Email, Course and Enrollment Month in the variables
first_name, last_name, email, course and month
respectively, as shown in the code below:
for i in range(len(input_df)):
    first_name = input_df.loc[i, "First Name"]
    last_name = input_df.loc[i, "Last Name"]
    email = input_df.loc[i, "Email"]
    course = input_df.loc[i, "Course"]
    month = input_df.loc[i, "Enrollment Month"]
The objective now is, in every iteration of the loop, to open
the web page and fill in the details from these variables by
identifying the screen elements with the Python rpa
package.
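The loop above reads each row with input_df.loc[i, ...], which relies on the dataframe having the default 0, 1, 2, ... index. An alternative sketch converts each row into a dictionary with to_dict(orient="records") and maps the Excel column names onto the variable names used in the text. The two rows below are hypothetical placeholders standing in for Training.xlsx (Figure 2.4 holds the real data), and participant_records is an illustrative helper name.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel(r"C:\Training.xlsx").
input_df = pd.DataFrame(
    {
        "First Name": ["Jane", "John"],
        "Last Name": ["Doe", "Roe"],
        "Email": ["jane.doe@example.com", "john.roe@example.com"],
        "Course": ["Course A", "Course B"],
        "Enrollment Month": ["January", "February"],
    }
)

# Excel column name -> variable name used in the text
COLUMNS = {
    "First Name": "first_name",
    "Last Name": "last_name",
    "Email": "email",
    "Course": "course",
    "Enrollment Month": "month",
}


def participant_records(df):
    """Yield one dict per row, keyed by the variable names above.
    Values are converted to str because they are typed into a web form."""
    for row in df.to_dict(orient="records"):
        yield {var: str(row[col]) for col, var in COLUMNS.items()}


for p in participant_records(input_df):
    print(p["first_name"], p["last_name"], p["email"], p["course"], p["month"])
```

Because to_dict works row by row regardless of the index labels, this version also works after the dataframe has been filtered or sorted.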
Let us continue within the loop with the following lines of
code:
r.init()
r.url('https://codenboxautomationlab.com/registration-form/')
As we saw in the example from the previous section, these
two lines initialize the TagUI process and open the web page
URL https://codenboxautomationlab.com/registration-form/.
Next, we enter the information from the variables into the
respective fields of the web page. The first three fields,
First Name, Last Name and Email, are textboxes.
We need the XPath of each of the text fields in order to pass
its reference to the code. To find it, right-click the textbox
and select Inspect. The HTML code for the textbox is
highlighted, as shown in the Figure 2.6 below:

Figure 2.6: HTML code for field ‘First Name’

We find that the name property has the value fname, which
can be used as an identifier in the type() method. The code
below enters the value from the variable first_name into the
field First Name:
r.type('//*[@name="fname"]', first_name)
In a similar manner, we could enter the values from
variables last_name and email into fields Last Name and
Email respectively by capturing their identifiers using the
name property. The two lines of code below would execute
that for us.
r.type('//*[@name="lname"]', last_name)
r.type('//*[@name="email"]', email)
Next, we select the correct course for each participant from
the dropdown. This value has been captured in the variable
course. As before, we capture the element identifier from the
HTML code and use it in the code as shown below. Note that
since this field is a dropdown, we use the select() method
instead of the type() method.
r.select('//*[@name="nf-field-22"]', course)
Similarly, we select the month of enrollment from the
dropdown, corresponding to the value in the variable
month:
r.select('//*[@name="nf-field-24"]', month)
For the remaining field How do you know about us?* which
is a radio button, we shall select ‘Others’ as the default
option for every participant. The code below would do that
for us. Note that we use the click() method as this is a
radio button.
r.click('//*[@id="nf-field-23-6"]')
Finally, we need to click the Register button at the bottom
of the web page. This button could not be shown in the
earlier snapshot due to space constraints, so it is shown in
the Figure 2.7 below:

Figure 2.7: Register button


As usual, we capture the element identifier for the button
and use the click() method, as shown below:
r.click('//*[@id="nf-field-15"]')
To ensure that the next page of the website loads after we
click the Register button, we add a wait time of 10 seconds
after the above line of code is executed. Without this wait,
the code would go on to close the browser and start the next
iteration of the loop before the confirmation page had
loaded. The wait time of 10 seconds is added using the line
of code below:
r.wait(10)  # ensure the confirmation page is fully loaded
Finally, we close the TagUI process in every iteration using
the line of code below.
r.close()
This procedure then repeats for each iteration of the loop:
the web page is loaded, the values from the variables are
entered into the respective fields, and the TagUI process is
closed.
At the end of each iteration, you will see a window
confirming the registration of that participant. As there are
five rows in the Excel table, there are five iterations. The
Figure 2.8 below shows the confirmation window:
Figure 2.8: Confirmation window

We have now seen each section of the code separately. In
case you need the entire code so that it can be readily
pasted into the editor, here it is:
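import pandas as pd
import rpa as r
# Read the participant table from the Excel workbook
input_df = pd.read_excel(r"C:\Training.xlsx")
for i in range(len(input_df)):
    # Capture the current participant's details
    first_name = input_df.loc[i, "First Name"]
    last_name = input_df.loc[i, "Last Name"]
    email = input_df.loc[i, "Email"]
    course = input_df.loc[i, "Course"]
    month = input_df.loc[i, "Enrollment Month"]
    # Start TagUI and open the registration form
    r.init()
    r.url('https://codenboxautomationlab.com/registration-form/')
    # Fill in the text fields
    r.type('//*[@name="fname"]', first_name)
    r.type('//*[@name="lname"]', last_name)
    r.type('//*[@name="email"]', email)
    # Choose the course and enrollment month from the dropdowns
    r.select('//*[@name="nf-field-22"]', course)
    r.select('//*[@name="nf-field-24"]', month)
    # Select 'Others' for "How do you know about us?", then click Register
    r.click('//*[@id="nf-field-23-6"]')
    r.click('//*[@id="nf-field-15"]')
    r.wait(10)  # ensure the confirmation page is fully loaded
    r.close()  # shut down TagUI and the browser before the next participant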
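The listing above performs every step inline. One way to make it easier to test, and to guarantee that the browser is closed even when a step fails, is to separate the list of steps from their execution. The sketch below is a hypothetical refactoring, not the method used in this chapter: build_actions, run_actions and DryRunBot are illustrative names. It assumes only the rpa functions already used above (init, url, type, select, click, wait, close), called with the same arguments, and it does not check the return value of init().

```python
FORM_URL = "https://codenboxautomationlab.com/registration-form/"


def build_actions(record):
    """Turn one participant row (a dict keyed by the Excel column names)
    into an ordered list of (method_name, args) steps. The XPaths are the
    ones used in the listing above."""
    return [
        ("url", (FORM_URL,)),
        ("type", ('//*[@name="fname"]', str(record["First Name"]))),
        ("type", ('//*[@name="lname"]', str(record["Last Name"]))),
        ("type", ('//*[@name="email"]', str(record["Email"]))),
        ("select", ('//*[@name="nf-field-22"]', str(record["Course"]))),
        ("select", ('//*[@name="nf-field-24"]', str(record["Enrollment Month"]))),
        ("click", ('//*[@id="nf-field-23-6"]',)),  # 'Others' radio button
        ("click", ('//*[@id="nf-field-15"]',)),    # Register button
        ("wait", (10,)),                            # let the confirmation load
    ]


def run_actions(bot, actions):
    """Start a session on `bot`, run each step, and always close the
    session, even if a step raises an exception."""
    bot.init()
    try:
        for method_name, args in actions:
            getattr(bot, method_name)(*args)
    finally:
        bot.close()


class DryRunBot:
    """Hypothetical stand-in that records calls instead of driving a browser."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only reached for names not found normally, i.e. the bot methods.
        def method(*args):
            self.calls.append((name, args))
        return method


sample = {
    "First Name": "Jane",
    "Last Name": "Doe",
    "Email": "jane.doe@example.com",
    "Course": "Course A",
    "Enrollment Month": "January",
}
bot = DryRunBot()
run_actions(bot, build_actions(sample))
for call in bot.calls:
    print(call)

# With the real package (requires Chrome; not run here):
#   import rpa as r
#   for record in input_df.to_dict(orient="records"):
#       run_actions(r, build_actions(record))
```

The dry run prints the exact sequence of calls that would be sent to the rpa module, so the form-filling logic can be checked without opening a browser. The try/finally block ensures that r.close() runs even if, for example, an element is not found.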

Conclusion
In this chapter, we first went through the fundamental
concepts of Robotic Process Automation tools and then
compared various tools available in the market based on
important parameters. These tools offer drag and drop
functionality that enables a person to execute RPA processes
without much knowledge of coding. However, for Python
enthusiasts who want to achieve the same automation
through scripting, we saw, with a practical use case, how the
open-source rpa package for Python does wonders in
automating common processes like filling in web page
forms.
We are now ready to begin an exciting journey through the
contemporary era’s most glamorous technological
buzzwords: Artificial Intelligence and Machine Learning!
In the next chapter, we will get acquainted with the
fundamental concepts of Artificial Intelligence and Machine
Learning. We will also study commonly used machine
learning algorithms and then get introduced to deep
learning and neural networks. Eager as we are, let us move
on to the next chapter!
Chapter 3
Getting Started with
AI/ML in Python

Introduction
This chapter introduces the reader to fundamental concepts
of artificial intelligence and machine learning. The chapter
begins with the historical evolution of artificial intelligence
and discusses its current and future scenario. The reader is
then introduced to the definition and basic concepts of
machine learning, followed by machine learning algorithms
grouped into various categories, including supervised and
unsupervised learning. A look at various Python libraries
and their applications in AI/ML frameworks gives the reader
an idea of the significance of Python in AI/ML
implementations. Later sections discuss deep learning and
its applications, and the chapter ends with a summary of
neural networks, natural language processing and
transformers.

Structure
The chapter covers the following topics:
• Background and history of AI
• Machine learning concepts
• Supervised and unsupervised learning
• Popular Python libraries for ML
• Reinforcement learning
• Deep learning
• Introduction to neural networks
∘ Types of neural networks
• Natural language processing
• Transformers and large language models

Objectives
The main intention of this chapter is to give the reader a
basic theoretical understanding of machine learning and
deep learning concepts in an easy way, which will enable the
reader to apply this knowledge to independently model
machine learning algorithms using Python in later chapters.
Since this chapter is foundational, it focuses more on theory
than on writing Python code. By the end of this chapter, you
will have a clear grasp of key principles in artificial
intelligence, machine learning, and deep learning, setting the
stage for hands-on applications.

Background and history of AI


Most of us have watched the movie Terminator. That is how
most of us visualize the concept of artificial intelligence (AI),
and why should we not? In fact, the ultimate aim of AI is to
create a machine that can precisely mimic a human in terms
of thinking and feeling. Even though we are still far away
from that ideal, the rise of chatbots based on large language
models (LLMs), like ChatGPT, is slowly offering hope of
taking mankind towards that scenario, which is termed
Artificial General Intelligence (AGI) or Strong AI. The
current level of AI is called Weak AI or Narrow AI, because
it lacks the capabilities needed to completely replicate a
human being, like understanding emotions or sarcasm.
Further, any form of AI will only be as intelligent as the
dataset on which it is trained. Let us have a look at the
history of AI, from early trends to current developments,
and try to understand what the future holds for this domain.
In 1950, the English mathematician and computer scientist
Alan Turing proposed a test to determine whether a machine
can mimic a human in terms of thinking. This test, called the
Turing Test, remains a well-known benchmark of the
artificial intelligence that a machine possesses. In this test, a
person interacts simultaneously with another human and a
machine. Based on the responses received from the human
and the machine, the person who is interacting must
distinguish the machine’s responses from the human’s. If the
person fails to distinguish between the machine’s responses
and the human’s responses, the machine is considered to
have passed the Turing Test.
Since then, incremental attempts have been made to enhance
the ability of machines to mimic humans. With the rise of big
data, cloud computing and the Internet of Things (IoT),
computational capabilities have grown manifold, making it
possible to train models on massive amounts of data in less
time. Just as smartphones initially came into the world as a
luxury product and then became an inseparable part of our
lives, AI is currently a buzzword but is expected to become
part of our everyday lives in a similar manner; it is only a
matter of time! Contemporary use cases of AI like
autonomous self-driving cars, face recognition technology
and intelligent chatbots like ChatGPT are all a result of this
continuous improvement in AI, driven by enhanced
computational capabilities. The future of AI promises to be
exciting as well as daunting, with the rise of technologies like
deepfakes that manipulate videos and the use of AI-generated
models to replicate real humans or film stars!
Whenever the discussion hovers around AI, one cannot
avoid the supplementary topics of machine learning, deep
learning and the broader purview of discussion called data
science. Let us quickly have a look at each of these, where
they fit into the wider scheme of things and how they are
related to each other.
Machine learning is the process through which a machine
learns by itself by finding patterns between the input data
and the output data, so that it can predict the output when a
new input is provided to it. How is this related to artificial
intelligence? To put it simply, artificial intelligence is the
objective that we are trying to realize, and Machine
Learning (ML) is the means through which we realize that
objective. AI is the noun and ML is the verb.
Coming to deep learning (DL), it is a form of machine
learning that uses neural networks with many successive
layers, loosely inspired by the way the human brain learns.
We shall look into neural networks in detail in later
sections.
Data science is a broad term that deals with all areas of the
data lifecycle like data gathering, data scraping, data
engineering, data mining and data transformation to name
a few. Since most of these processes also form a part of
building ML models, there is always some overlap between
data science, AI, ML and DL.
The Venn diagram in Figure 3.1 below should provide more
clarity on this relationship and overlap:

Figure 3.1: Relationship between artificial intelligence, machine learning, deep learning and data science.

Machine learning concepts


Now that we know where machine learning fits into the
larger picture of Artificial Intelligence, let us now try to
understand it in greater depth. So, what exactly is machine
learning? Let us try to understand this concept analytically.
We are all familiar with mathematical formulae and
equations. Suppose we are given the equation y = 3x+4.
What do we understand from this equation? Firstly, we see
that the equation has two variables. One is the independent
variable x and the other one is the dependent variable y. We
could also say that x is the input variable and y is the output
variable. In order to get the output variable y, we need to
multiply the input variable x by 3 and add 4 to the product.
In other words, there exists a relationship or mapping
between the input variable x and the output variable y. In
the language of mathematics, we say that y is a function of
x and represent it as y=f(x). Now, if we are provided with
any arbitrary input and are asked to find the output, we
would be able to do it easily using the formula y = 3x+4.
Suppose we are provided with the following set of inputs:
{2,3,9,6,10,15}. We can insert each of these values into the
equation y=3x+4 and get the corresponding value of y. Let
us take the first value of 2. Putting it in the equation
y=3x+4 gives us y=3*2+4. Hence, y=6+4=10.
Hence, for the first input value of 2, the output is 10. In a
similar manner, we can calculate the respective outputs for
the remaining set of inputs. The Table 3.1 below shows the
final output values mapped against each of the inputs:
Inputs Outputs
2 10
3 13
9 31
6 22
10 34
15 49
Table 3.1: Final Outputs mapped against the inputs
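The rules-based calculation behind Table 3.1 can be written directly as code: the rule y = 3x+4 is known in advance, and the program simply applies it to every input. The function name f just follows the y = f(x) notation from the text.

```python
def f(x):
    """The known rule from the text: y = 3x + 4."""
    return 3 * x + 4


inputs = [2, 3, 9, 6, 10, 15]
outputs = [f(x) for x in inputs]

for x, y in zip(inputs, outputs):
    print(x, y)
# 2 10
# 3 13
# 9 31
# 6 22
# 10 34
# 15 49
```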

So, what exactly did we do here? We already knew a
function that maps the input to the output and had a set of
inputs. Using these two, we could easily obtain the output for
each input. In other words, we already had a set of rules that
mapped the input to the output. This is called rules-based
programming, and it is commonly used in all our regular
computer applications, from a calculator to complex
applications like office software and games. Machine
learning works the other way around. Let us take a different
example to understand this.
Consider a case where we already have a set of inputs and a
set of outputs as shown in the Table 3.2 below:
Inputs Outputs
5 15
20 60
3 9
7 21
150 450
100 300
Table 3.2: Inputs and outputs

Now suppose we are given an arbitrary value, say 50, and
are asked to find the corresponding output. Would we be
able to do so? The first question that arises is: what is the
relationship between the input and the output? All that we
have is a set of inputs and a set of outputs. Is there a way to
find a relationship between them? A quick glance at the
inputs and their corresponding outputs reveals that each
output is three times the input: 5 multiplied by 3 is 15, 20
multiplied by 3 is 60, and so on. Now we seem to have found
the mapping function as well, which is y=3x. Let us insert
the value 50 into this function to get the output: y = 3*50 =
150. So, what exactly did we do in this approach? We already
had the inputs and outputs but did not know the function,
and we had to find the function before we could compute the
output for an arbitrary input. This is exactly what machine
learning is!
As opposed to rules-based programming, where we already
have the function, the job of machine learning is to find the
function that maps the input to the output, so that the
correct output can then be predicted for any new input. In
this case, the inputs and outputs were simple numbers, but
they could be anything, like complex images. Also, in this
example we found the function by mere observation, but
there are various machine learning algorithms that do the
job of finding this relationship. These algorithms fall into
categories like regression, classification, and clustering,
which we shall see in this chapter as well as in the chapters
to come.
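To make “finding the function” concrete, the sketch below recovers the rule from the data in Table 3.2 by computation instead of by observation. It uses ordinary least squares for a straight line y = m*x + c (the idea behind linear regression, described later in this chapter), written from scratch in plain Python; fit_line is an illustrative name, not a library function.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + c: choose m and c so that the
    sum of squared vertical distances from the points to the line is smallest.
    m = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
    c = y_mean - m * x_mean
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


# Inputs and outputs from Table 3.2
inputs = [5, 20, 3, 7, 150, 100]
outputs = [15, 60, 9, 21, 450, 300]

m, c = fit_line(inputs, outputs)
print(round(m, 6), round(c, 6))   # 3.0 0.0  -> the learned rule is y = 3x
print(round(m * 50 + c, 6))       # 150.0    -> prediction for x = 50
```

The algorithm arrives at the same function, y = 3x, that we found by inspection, and then uses it to predict the output for the new input 50.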
Now that we have conceptually understood what machine
learning is, let us delve deeper to understand the different
types of machine learning.

Supervised and unsupervised learning
Depending on whether the training data is labelled, machine
learning can be classified into Supervised Learning and
Unsupervised Learning. Let us try to understand each of
these:
Supervised learning: In the example from the previous
section, every item in the input data set was accompanied by
a corresponding item in the output data set. A data set in
which every input is accompanied by a corresponding output
is called a ‘labelled’ data set. When a machine is trained on
such a labelled data set, the type of machine learning is
called ‘supervised learning’, because the training is guided,
or supervised, by the presence of the output data. Typical
examples of supervised algorithms are linear regression,
logistic regression, k nearest neighbors, Naïve Bayes,
decision trees, random forests and support vector machines.
Let us take a quick look at each of them to understand the
idea behind them; we shall implement them in detail using
Python in later chapters.
• Linear regression: This is the process of drawing a straight line that best fits a set of points. Linear regression is also sometimes called least squares regression, because the criterion chosen to fit the line is that the sum of the squares of the vertical distances of the points from the line should be as small as possible. The example in the previous section, where we found by observation that y=3x, was a classic example of linear regression.
• Logistic regression: Although this algorithm includes the word regression in its name, it is a classification algorithm. It is used when we wish to predict the value of a categorical variable. For example, if we want to know whether the output is True or False, 1 or 0, On or Off, Yes or No, we use logistic regression. When there are two possible outcomes, it is called binary logistic regression. Logistic regression can also be extended to multinomial and ordinal logistic regression when there are more than two possible outcomes. Logistic regression uses the sigmoid function, which maps any real number to a value between 0 and 1. The Figure 3.2 below shows the sigmoid function, which is given by the following equation:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Figure 3.2: Sigmoid function


• K nearest neighbors: This is a classification algorithm that assigns a new data point to a class based on the classes of its nearest neighbors. The data set fed to the algorithm is a labelled data set consisting of points as the input and, as the labelled output, the class to which each point belongs. To classify a new, unknown point, the algorithm finds the k labelled points closest to it and assigns the class that occurs most often among these k neighbors; hence the name k nearest neighbors. The distance measure is usually the Euclidean distance that we use in Cartesian coordinate systems. Please refer to the following Figure 3.3:

Figure 3.3: K nearest neighbors

• Naïve Bayes: This is a classification algorithm based on Bayes’ theorem of conditional probability, which is expressed mathematically as shown below:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Here, P(A | B) is the probability of event A given that event B has occurred. This theorem has many applications in situations where we want to take a decision based on the probability of an event given the occurrence of another event. As the name ‘Naïve’ implies, the algorithm assumes that each of the features resulting in the outcome of an event makes an independent contribution, that is, the features are treated as independent of one another given the class. The most popular application of this algorithm is in classifying an email as spam or not spam.
• Decision tree: A decision tree algorithm is the process of fitting a decision-making thought process to a labelled data set. Similar to the way in which we think, a decision tree starts with a root node and branches further into decision nodes and leaf nodes. The following Figure 3.4 shows the root node, decision nodes and leaf nodes of a sample decision tree describing the thought process while buying a house:

Figure 3.4: Sample decision tree to finalize the decision of buying a house
• Random forests: This is an extension of the decision tree algorithm in which we combine the results of multiple decision trees to arrive at the final result. The limitation of a single decision tree is the possibility of overfitting. Overfitting is the phenomenon where an algorithm fits the training data set so rigidly, including its noise, that it fails to perform equally well on the testing data set. Random forest is an example of an Ensemble Learning algorithm that uses the Bagging technique. Ensemble Learning is the technique of improving results by combining the results of several models. Bagging (short for bootstrap aggregating) is the technique where different random samples of the training data are fed to the algorithm to obtain a different result for each sample. In this case, random samples of the training data are used to build multiple decision trees, which produce multiple outputs. The final result is decided by a majority vote: the most frequently occurring output is chosen as the result.
• Support vector machine: This is a classification algorithm that creates an optimal hyperplane that separates the data set in the best possible way into categories or classes based on their distinguishing attributes. If the data set is two-dimensional with two classes, the hyperplane is a straight line. The algorithm works by choosing the points from both classes that are closest to the line and trying to maximize the distance between these points and the line, called the margin. These points are called ‘Support Vectors’, hence the name support vector machine.
Figure 3.5: Support vector machine
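As a rough illustration of the k nearest neighbors idea described above, the following from-scratch sketch classifies a new point by a majority vote among its k closest labelled points, using Euclidean distance via `math.dist` (Python 3.8+). The data points, labels and the function name `knn_classify` are hypothetical; this is not a library implementation.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, new_point, k=3):
    """Classify new_point by majority vote among its k nearest labelled points."""
    # Euclidean distance from the new point to every labelled point
    distances = [(math.dist(p, new_point), label)
                 for p, label in zip(train_points, train_labels)]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most common class among the k nearest neighbors wins
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical labelled data: two groups of 2-D points
points = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
labels = ["red", "red", "red", "blue", "blue", "blue"]

print(knn_classify(points, labels, (2, 2)))   # prints red
print(knn_classify(points, labels, (5, 6)))   # prints blue
```

Choosing an odd k for two classes avoids ties in the vote; with more classes, `Counter.most_common` simply returns one of the tied classes.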

Unsupervised learning: This is the type of machine learning where the input data set is not labelled into categories, so the algorithm has to learn on its own from the input. The most popular type of algorithm in this category is clustering. K Means clustering is one type of clustering algorithm, where the algorithm groups the data points into K different clusters over repeated iterations. DBSCAN is another popular clustering algorithm. Hierarchical clustering is a clustering method that builds a hierarchy (a tree) of nested clusters, which is useful whenever the clusters need to be organized in levels.
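The K Means idea can be sketched in a few lines of plain Python: assign every point to its nearest centroid, move each centroid to the mean of its cluster, and repeat until nothing changes. This is a simplified sketch that assumes the first k points as starting centroids (practical implementations usually use random or k-means++ initialization); the data and the name `k_means` are illustrative.

```python
import math


def k_means(points, k, iterations=10):
    # Simplifying assumption: use the first k points as the starting centroids
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: put every point in the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters


data = [(1, 1), (8, 8), (1, 2), (2, 1), (9, 8), (8, 9)]
centroids, clusters = k_means(data, 2)
print(centroids)
print(clusters)
```

With this data the two groups around (1, 1) and (8, 8) are found after two passes, with centroids at roughly (1.33, 1.33) and (8.33, 8.33).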

Popular Python Libraries for ML


Let us now look at some popular machine learning libraries in Python that we shall be using to implement ML algorithms in later chapters. In this section, we shall only go through them at a high level to understand their unique significance and relative differences; their actual implementation in Python will be done in later chapters:
• Scikit Learn: This is one of the most popular Python libraries for supervised and unsupervised machine learning. This library has a separate class for every algorithm, and each of those classes has a method called fit() that fits the algorithm to the data set.
• TensorFlow: This is an advanced deep learning library used to perform complex mathematical computations on tensors. A tensor is basically a multidimensional array with a uniform data type. Although this is a Python library, the actual mathematical operations are performed in C++; Python simply directs these pieces of code and holds them together. TensorFlow can be used to train and run deep neural networks. TensorFlow has also integrated Keras through the tf.keras module.
• OpenCV: This is an open-source computer vision library, usable from Python, for computer vision and image processing applications. Computer vision is the process by which a computer is made to understand images and videos and extract insightful information from them to take further decisions. Popular applications of computer vision are face recognition, autonomous self-driving cars and image generation. Optical Character Recognition (OCR) is another application of OpenCV that is used for text detection.
• Keras: This is a high-level neural network library built in Python. It does not perform the low-level computations by itself but uses backend libraries like TensorFlow, Theano and CNTK to do them.
• Theano: This is another popular deep learning library that is used to perform fast mathematical computations on multidimensional arrays. It is built on top of NumPy and works faster on a GPU than on a CPU. The Keras library also uses Theano at the back end to perform low-level computations.
• PyTorch: This is another useful and popular library for deep neural networks, which have proved useful in the machine understanding of natural language. Natural language processing (NLP) mechanisms work on models based on Recurrent Neural Networks (RNN) and Recursive Neural Networks. PyTorch makes these complicated implementations fast and simple.
• NLTK: NLTK is the abbreviation for Natural Language Toolkit and is popularly used in building chatbots, the most common NLP application. It provides a variety of modules and functions useful in performing common NLP tasks like stemming, lemmatization, and tokenization.
• Spacy: This is another library used for NLP that is newer than NLTK. It offers additional features compared to NLTK, such as word vector support, and Spacy is known to be faster than NLTK.
That was an overview of the most popular Python libraries currently being used to develop most machine learning and deep learning applications. It has been overwhelming for sure! Before we delve further into the topics of neural networks and deep learning, let us familiarize ourselves with a very useful concept that forms the core of many self-learning mechanisms that we see today, including popular chatbots like ChatGPT by OpenAI. This is the concept of reinforcement learning, which we shall see in the next section. So let us jump to the next section without wasting further time!
Reinforcement learning
In earlier sections, we have seen two types of machine learning: supervised learning, where labelled data sets are provided, and unsupervised learning, where there are no labelled data sets and the algorithm has to learn on its own from the data by grouping together items with similar attributes. There is another mechanism that enables a machine to learn on its own based on the outcome of its previous outputs, or in other words, from ‘experience’. Reinforcement learning systems use algorithms that receive feedback to determine whether an output decision was correct, wrong, or neutral. Hence, reinforcement learning is an autonomous self-learning mechanism. The Figure 3.6 below shows the reinforcement learning mechanism:

Figure 3.6: Reinforcement learning mechanism

As shown in the figure above, the agent sends out an action to the environment. The environment responds with a new state together with feedback, usually in the form of a reward (or a penalty), and the agent adjusts its next action based on this feedback.
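The agent-environment loop can be sketched in its simplest setting, a so-called multi-armed bandit, where the state never changes, so only actions and rewards appear. Everything here is an illustrative assumption: the environment `BanditEnvironment`, its average rewards, and the epsilon-greedy strategy (explore at random occasionally, otherwise pick the action that currently looks best). It is not the method used to fine-tune ChatGPT.

```python
import random


class BanditEnvironment:
    """Hypothetical environment: three actions with different average rewards."""

    def __init__(self, seed=0):
        self.means = [1.0, 2.0, 5.0]
        self.rng = random.Random(seed)

    def step(self, action):
        # Feedback for the chosen action: its mean reward plus a little noise
        return self.means[action] + self.rng.uniform(-0.5, 0.5)


def train_agent(env, n_actions=3, steps=2000, epsilon=0.1, seed=1):
    rng = random.Random(seed)
    estimates = [0.0] * n_actions  # the agent's current estimate of each action's value
    counts = [0] * n_actions
    for _ in range(steps):
        # Explore occasionally; otherwise exploit the best-looking action
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])
        reward = env.step(action)  # feedback from the environment
        counts[action] += 1
        # Running average: nudge the estimate toward the observed reward
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates


estimates = train_agent(BanditEnvironment())
best_action = max(range(3), key=lambda a: estimates[a])
print("estimated values:", estimates)
print("best action:", best_action)
```

The agent starts with no knowledge, yet after enough rounds of acting and receiving feedback it settles on the action with the highest reward.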
Reinforcement learning, along with supervised learning, is popularly used to fine-tune the large language models (LLMs) behind chatbots like ChatGPT.
Now that we are familiar with machine learning algorithms
and the corresponding Python libraries, let us start with the
exciting journey of deep learning!

Deep learning
The name deep learning is derived from the many (deep) layers of neurons that are trained. The central mechanism of deep learning is an Artificial Neural Network (ANN), which is loosely intended to mimic the way neurons learn in the human brain. Deep learning involves training multiple layers of a neural network, which is achieved by adjusting the weights of every neuron. More on this in later chapters.
One might wonder why one needs a deep neural network in spite of the availability of all these machine learning algorithms. The reason is that in numerous instances, especially with complex data such as images, audio and text, machine learning algorithms fall short of the required accuracy, whereas deep neural networks can do an exceptional job of achieving the desired result. Due to the rise of powerful and efficient computing systems and the availability of large data sets, deep learning has become a popular method of training complex multi-layered neural networks. Let us now try to understand the basic concepts of a neural network.

Introduction to neural networks


The fundamental unit of any neural network is the
perceptron as shown in Figure 3.7 below:
Figure 3.7: Basic architecture of a perceptron

Let us try to understand this architecture in detail:


• x1, x2, x3 and x4 are the input values that go into the perceptron. In this case we have taken four inputs; however, a perceptron can have as many inputs as the problem requires.
• w1, w2, w3 and w4 are the weights that are associated with each of these inputs. Every input is multiplied by its associated weight, so the operation performed is w1*x1 + w2*x2 + w3*x3 + w4*x4. Mathematically, this expression is known as the dot product of the weights and the inputs.
• b0 is the bias term. This value is added to the dot product obtained above in order to offset it by a constant value. This term can have a positive value, a negative value or even a zero value.
• The final expression thus becomes w1*x1 + w2*x2 + w3*x3 + w4*x4 + b0. We see in the diagram that there is an activation function at the end, before the final output is produced. We will discuss activation functions in detail later. For the time being, we can think of an activation function as a trigger that decides whether or not the perceptron is activated.
• After the expression consisting of inputs, weights and the bias is evaluated, the activation function is applied to it to decide the final output of the perceptron.
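The perceptron computation just described (dot product, plus bias, then activation) can be sketched as follows. The input values, weights and bias are made-up numbers, and the step function used as the ‘trigger’ is an assumption; other activation functions are discussed later in this chapter.

```python
def weighted_sum(inputs, weights, bias):
    # w1*x1 + w2*x2 + ... + wn*xn + b0 (dot product plus bias)
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def step(value):
    # Assumed "trigger": fire (1) if the weighted sum is positive, else stay off (0)
    return 1 if value > 0 else 0


def perceptron(inputs, weights, bias):
    return step(weighted_sum(inputs, weights, bias))


x = [1.0, 0.5, -1.0, 2.0]   # x1..x4 (illustrative values)
w = [0.4, -0.2, 0.3, 0.1]   # w1..w4 (illustrative values)
b0 = -0.1                   # bias (illustrative value)

print(weighted_sum(x, w, b0))   # approximately 0.1
print(perceptron(x, w, b0))     # 1
```

Changing the bias to -0.5 pushes the weighted sum below zero, and the same inputs then leave the perceptron deactivated (output 0).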
Now that we have understood the basic architecture of a
perceptron, let us apply this further to see how we can
construct a simple neural network out of multiple
perceptrons. The Figure 3.8 below shows the architecture of
a simple neural network:

Figure 3.8: Basic architecture of a Neural Network

As observed in the figure above, a neural network is composed of several perceptrons or nodes. As the basic constituent of a neural network, the perceptron will hereafter be referred to as a node.
We observe that the neural network consists of three layers: the Input Layer, the Hidden Layer and the Output Layer. Each of these layers can have one or many nodes. In this case, we have three nodes in the input layer, four nodes in the hidden layer and two nodes in the output layer. It is important to note that in this case we have just one hidden layer, but a neural network may have multiple hidden layers. Moreover, the number of input and output nodes is usually set by the number of input features and outputs of the problem, whereas the number of hidden nodes depends upon the complexity of the data that the neural network is trained on. However, we also use certain rules of thumb in most cases, like the number of hidden nodes should be roughly 2/3 of the sum of the number of input and output nodes.
Now, we would be interested in knowing how this neural
network works!
We have already seen how a single node operates. In a similar way, we have three input nodes with their three input values x1, x2, x3. We observe from the diagram that these inputs are fed to the four nodes in the hidden layer. Hence, each of the nodes in the hidden layer would be fed with the same three inputs, but with different weights, as listed below:
• y1 would be fed with inputs x1, x2, x3 with weights w11, w12, w13.
• y2 would be fed with inputs x1, x2, x3 with weights w21, w22, w23.
• y3 would be fed with inputs x1, x2, x3 with weights w31, w32, w33.
• y4 would be fed with inputs x1, x2, x3 with weights w41, w42, w43.
As seen earlier, there would also be four bias terms b1, b2,
b3 and b4 that would be added to these outputs. The final
expressions that would be fed to each of the nodes are as
below. Also, remember that each of these expressions would
be subject to an activation function before producing the
final output:
• y1 = w11*x1 + w12*x2 + w13*x3 + b1.
• y2 = w21*x1 + w22*x2 + w23*x3 + b2.
• y3 = w31*x1 + w32*x2 + w33*x3 + b3.
• y4 = w41*x1 + w42*x2 + w43*x3 + b4.
Now, these hidden-layer values y1, y2, y3 and y4 (after their activation function has been applied) would be fed to the output-layer nodes z1 and z2 with their respective weights and biases, as shown in the expressions below; the output nodes in turn apply an activation function:
• z1 = v11*y1 + v12*y2 + v13*y3 + v14*y4 + c1.
• z2 = v21*y1 + v22*y2 + v23*y3 + v24*y4 + c2.
In this way, the final outputs of the network z1 and z2 are
produced from the inputs x1, x2 and x3 by applying
transformations through the hidden layer with the help of
weights, biases, and activation functions.
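The same computation for the 3-4-2 network can be written as a small forward pass. In this sketch each row of `W` holds one hidden node’s weights, so `W[0]` is [w11, w12, w13] as in the expression for y1, and each row of `V` holds one output node’s weights. The weight values, the choice of ReLU for the hidden layer and an identity (pass-through) activation on the output layer are illustrative assumptions; the function names are our own.

```python
def relu(v):
    return max(0.0, v)


def dense_layer(inputs, weights, biases, activation):
    # weights[j][i] connects input i to node j (e.g. weights[0][1] is w12 for y1)
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(x, W, b, V, c):
    y = dense_layer(x, W, b, relu)          # hidden layer: y1..y4
    z = dense_layer(y, V, c, lambda v: v)   # output layer: z1, z2 (identity activation)
    return y, z


x = [1.0, 2.0, 3.0]                          # x1, x2, x3
W = [[0.1, 0.2, 0.3],                        # w11, w12, w13
     [-0.1, -0.2, -0.3],                     # w21, w22, w23
     [0.5, 0.0, -0.5],                       # w31, w32, w33
     [1.0, 1.0, 1.0]]                        # w41, w42, w43
b = [0.0, 0.0, 0.0, 0.0]                     # b1..b4
V = [[1.0, 1.0, 1.0, 1.0],                   # v11..v14
     [0.5, 0.0, 0.0, -0.5]]                  # v21..v24
c = [0.0, 1.0]                               # c1, c2

y, z = forward(x, W, b, V, c)
print("hidden:", y)   # about [1.4, 0.0, 0.0, 6.0]
print("output:", z)   # about [7.4, -1.3]
```

Note how ReLU zeroes out y2 and y3 because their weighted sums are negative, so those hidden nodes contribute nothing to z1 and z2.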
The learning mechanism in any neural network is basically a self-learning mechanism. Based on how close the output is to the expected output, the neural network adjusts its weights and biases so that the output produced in the next iteration matches the expected output more closely than the one produced in the previous iteration. This process repeats until the output produced is satisfactory.
The learning in a neural network takes place through a technique called backpropagation. A loss function measures how far the network’s output is from the expected output; it is the quantity that we are trying to minimize in order to optimize the neural network. In backpropagation, the derivatives (gradients) of the loss with respect to every weight and bias are computed using the chain rule of derivatives, starting from the output layer and moving backward towards the input layer, and the weights and biases are then adjusted in the direction that reduces the loss.
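To see the chain rule at work in the smallest possible case, the sketch below trains a single linear neuron, prediction = w*x + b with no activation, on y=3x data using gradient descent on the mean squared error. This is a simplified single-neuron illustration of the backpropagation idea rather than a full multi-layer implementation; the learning rate, number of epochs, data points and the name `train_neuron` are assumptions.

```python
def train_neuron(xs, ys, learning_rate=0.05, epochs=2000):
    w, b = 0.0, 0.0                     # start from arbitrary initial values
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            prediction = w * x + b      # forward pass
            error = prediction - y
            # Chain rule on loss = error**2:
            #   d(loss)/d(prediction) = 2 * error
            #   d(prediction)/dw = x,  d(prediction)/db = 1
            grad_w += 2 * error * x
            grad_b += 2 * error
        # Average over the data set (mean squared error) and step downhill
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


w, b = train_neuron([1, 2, 3, 4], [3, 6, 9, 12])
print("learned weight:", w)
print("learned bias:", b)
```

After training, w ends up very close to 3 and b very close to 0, so the neuron has learned the same y=3x relationship that we spotted by observation earlier in this chapter.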
Let us now take a deeper look at the concept of an activation function, which is very important in a neural network. An activation function transforms the output of a node. Because it is non-linear, it allows the network to learn relationships that a plain weighted sum (a straight line) cannot capture, and because it is differentiable (at least almost everywhere), it allows backpropagation to be performed. It can also be used to suppress irrelevant components of the output and retain only the relevant ones. The Figure 3.9 below shows a straight-line function. Suppose we want the node to be activated only when the output is greater than zero and to remain deactivated when the output is less than zero. In this case, we apply the Rectified Linear Unit (ReLU) function.

Figure 3.9: Straight line prior to applying activation function

The ReLU function retains the output only if the value is greater than zero; otherwise, it converts the value to zero. Mathematically, it is expressed as follows (refer to Figure 3.10):
$$\mathrm{ReLU}(x) = f(x) = \max(0, x)$$
Figure 3.10: Straight line after applying ReLU Activation Function

The figure above shows the transformed output after applying the ReLU activation function. Apart from ReLU, there are several other activation functions that can be used depending upon the requirement. Another widely used one is the sigmoid function, which we have already seen in logistic regression. The sigmoid function transforms any input into a smooth, non-linear output between 0 and 1. Because it is smooth and its derivative, $\sigma(x)\,(1 - \sigma(x))$, is simple to compute, it is convenient for performing backpropagation using the chain rule of derivatives.
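Both activation functions discussed above are short functions in Python. This sketch also includes the sigmoid derivative, which is what backpropagation needs, and writes the sigmoid in a numerically stable form so that very large negative inputs do not overflow `math.exp`. The function names are our own.

```python
import math


def relu(x):
    # Keep positive values, turn everything else into zero
    return max(0.0, x)


def sigmoid(x):
    # Numerically stable form: avoid computing exp of a large positive number
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_derivative(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)


for value in (-2.0, 0.0, 3.0):
    print(value, relu(value), sigmoid(value), sigmoid_derivative(value))
```

The derivative is largest (0.25) at x = 0 and shrinks towards zero for large positive or negative inputs, which is worth remembering when we discuss why deep or recurrent networks can be hard to train.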

Types of neural networks


The Artificial Neural Network (ANN) that we saw in the
previous section can be divided into different types
depending upon the type of data that it is trained on and
the type of application that it is being used for. In this
chapter, we shall only take a look at the important ones at a
high level:
• Convolutional Neural Network: A Convolutional Neural Network (CNN), also known as ConvNet, is a special type of neural network in which a conventional Artificial Neural Network (ANN) is modified so that it can efficiently read and extract information from image files. A CNN has three types of layers: a convolutional layer, a pooling layer and a fully connected layer.
The convolutional layer is used to enable sparse interaction, as opposed to a regular neural network, where every output node interacts with every input node. While processing an image that contains millions of pixels, one needs far fewer parameters to extract meaningful information, such as edges, which can be detected from small regions of only tens or hundreds of pixels. A convolutional layer contains a kernel, which is smaller than the image and slides across the image to form a representation of each receptive region. This process of sliding is called convolution. The kernel is a small matrix, initialized to random values and then learned during training, that performs a dot product with an equally sized portion of the image it is interacting with, thus forming a representation of that region.
The pooling layer shrinks the output by replacing the values in each neighborhood with a summary statistic. This is required in order to reduce the computational time associated with a large output size. Most commonly, the maximum value from the neighborhood is reported, which is called max pooling.
The fully connected layer is just like a regular neural
network layer where nodes are fully connected with
preceding and succeeding layers.
The ReLU function is typically used as the activation function after the convolutional layers, whereas the final fully connected layer typically uses the softmax function as its activation function when the network performs classification. The Figure 3.11 below illustrates the structure of a CNN:

Figure 3.11: Convolutional Neural Network

• Recurrent Neural Network: In regular neural networks or convolutional neural networks, once the output is produced, the input is forgotten; in other words, the network does not retain the input in memory. This shortcoming is addressed by Recurrent Neural Networks (RNN), which prove useful in applications like text processing, natural language understanding, language translation and sequential output generation. In an RNN, each node keeps a hidden state that stores the value from the previous step. This enables the network to use information from prior inputs when processing the current input and producing the current output. RNNs have typically been used in applications like Google Translate and Siri. Figure 3.12 below shows the architecture of a Recurrent Neural Network:
Figure 3.12: Recurrent Neural Network

• Long Short-Term Memory: Long Short-Term Memory (LSTM) is a special type of Recurrent Neural Network (RNN) that addresses the problem of long-term dependencies associated with a traditional RNN. Traditional RNNs make good use of recent information but struggle to use information from many steps earlier in a sequence, because the training signal from those earlier steps fades away (the vanishing gradient problem). Hence, a traditional RNN cannot perform efficiently on information that must be remembered over long periods. An LSTM network contains a memory cell that can hold information for extended periods of time, making it suitable for speech recognition and language translation tasks.
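The two core CNN operations described above, sliding a kernel over an image and max pooling, can be sketched in plain Python on a tiny 5x5 ‘image’. The pixel values and the 2x2 kernel are made up. As in most deep learning libraries, the ‘convolution’ here does not flip the kernel (strictly speaking it is cross-correlation); stride 1 with no padding for the convolution and a 2x2 window with stride 2 for pooling are assumptions.

```python
def conv2d(image, kernel):
    # Slide the kernel over every position where it fits completely (stride 1, no padding)
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[a][b] * image[i + a][j + b] for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu_map(feature_map):
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    # Keep only the largest value in each non-overlapping size x size window
    return [[max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, size)]
            for i in range(0, len(feature_map) - size + 1, size)]


image = [
    [1, 2, 0, 1, 3],
    [0, 1, 3, 1, 0],
    [2, 1, 0, 2, 1],
    [1, 0, 1, 3, 2],
    [0, 2, 1, 0, 1],
]
kernel = [[1, 0],
          [0, -1]]   # responds to differences along the diagonal

features = conv2d(image, kernel)          # 4x4 feature map
pooled = max_pool(relu_map(features))     # 2x2 after ReLU and max pooling
print(features)
print(pooled)   # [[1, 1], [2, 2]]
```

The 25 input pixels are reduced to a 4x4 feature map and then to just 4 numbers, which is exactly the shrinking that keeps CNNs computationally manageable.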
We have now seen the main types of neural networks in this
section that give us an idea of the different variations that a
neural network can be subjected to in order to realize
different objectives.
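To make the recurrent ‘memory’ of an RNN concrete, here is a minimal single-unit sketch: at each step the hidden state is updated from the current input and the previous hidden state through tanh. The scalar weights `w_x`, `w_h` and `b` are arbitrary illustrative values and the function name is our own; real RNNs use weight matrices learned during training.

```python
import math


def rnn_forward(sequence, w_x=0.5, w_h=0.8, b=0.0):
    h = 0.0                      # hidden state starts empty
    states = []
    for x_t in sequence:
        # New state mixes the current input with the previous hidden state
        h = math.tanh(w_x * x_t + w_h * h + b)
        states.append(h)
    return states


with_early_input = rnn_forward([1.0, 0.0, 0.0])
without_it = rnn_forward([0.0, 0.0, 0.0])
print(with_early_input)   # the first input still echoes in the last state
print(without_it)         # [0.0, 0.0, 0.0]
```

Because the hidden state carries information forward, an input seen at the first step still affects the final state two steps later. It also fades a little at every step, which hints at why plain RNNs struggle with long-term dependencies.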
One of the most popular applications of RNNs and LSTMs is natural language processing (NLP), which is gaining momentum in the contemporary era due to applications like Siri, Alexa, Google Translate and, of course, intelligent chatbots like ChatGPT! The next section provides an overview of natural language processing.

Natural language processing


Natural language processing (NLP) is a branch of artificial intelligence in which computers are trained to understand human language the way it is spoken and written. NLP takes place in two phases:
• Data processing: This is the phase where text data is transformed or cleaned and brought into a format that machines can analyze. This includes stemming and lemmatization, where words are reduced to their root forms; tokenization, where the text data is broken down into smaller units; part of speech tagging, where each word in a text is labelled with its part of speech, such as noun, pronoun, adjective or verb; and stop word removal, where commonly used words are removed from the text so that only the key words that convey meaning are retained.
• Algorithmic processing: Once the data is processed as discussed above, it is fed to algorithms that learn from it. Simple rules may sometimes be used wherever possible, but most of the training happens with complex neural networks like the RNN and LSTM that we have seen earlier. These networks then learn from repeated training and feedback.
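The data processing steps listed above can be illustrated with a deliberately simplified pure-Python pipeline: lowercase tokenization with a regular expression, stop word removal against a tiny hand-written list, and a crude suffix-stripping ‘stemmer’. The stop word list, the stemming rule and the function names are assumptions made for illustration; libraries such as NLTK and Spacy provide far more complete tokenizers, stop word lists, stemmers and lemmatizers.

```python
import re

# Tiny illustrative stop word list (real lists contain hundreds of words)
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "are", "in"}


def tokenize(text):
    # Lowercase the text and keep runs of letters as tokens
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word):
    # Strip a few common suffixes, but only from reasonably long words
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    tokens = tokenize(text)
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("The cats are playing in the garden"))   # ['cat', 'play', 'garden']
```

Even this toy version shows why preprocessing matters: seven raw words shrink to three meaningful root forms that an algorithm can work with.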
Python provides popular packages like NLTK and Spacy for natural language processing. NLTK is popularly used in the creation of chatbots.
Transformers and large language models
Transformers have been gaining popularity in recent times because the world is witnessing the rise of intelligent chatbots like ChatGPT, which are based on the transformer architecture. GPT is the abbreviation for Generative Pre-Trained Transformer. Transformers are used wherever an input sequence has to be transformed into an output sequence, as in language translation or text-to-speech conversion. Unlike RNNs, transformers do not process the sequence through recurrence; instead, they make use of the ‘attention’ mechanism, which lets the model weigh how relevant every part of the input sequence is to every other part. The original transformer architecture consists of an encoder and a decoder. The encoder transforms the input data into a numerical representation, called the hidden state, which captures important information from the input data. This hidden state is then fed into the decoder network, which generates the output.
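The core of the attention mechanism, scaled dot-product attention, can be sketched for a single query: score the query against every key, scale by the square root of the vector size, turn the scores into weights with softmax, and take the weighted sum of the values. This is a simplified sketch, assuming one query, no learned projection matrices and a single attention head; the vectors and function names are illustrative.

```python
import math


def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(d)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Output is the attention-weighted average of the value vectors
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return weights, output


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]

weights, output = attention([1.0, 0.0], keys, values)
print(weights)   # more weight on the first key, which matches the query
print(output)
```

A query that resembles the first key draws most of its output from the first value vector, which is how attention lets each position focus on the parts of the sequence most relevant to it.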
The most popular application of the transformer architecture is the large language model (LLM). These are models that are pre-trained on voluminous amounts of text using self-supervised learning (often loosely called unsupervised learning), in which the model learns by itself, for example by predicting the next word in a sentence, without manually labelled data. ChatGPT, the latest buzzword, is an example of a Large Language Model. It is reported to have been trained on a massive data set of about 570 GB of text collected up to September 2021. This data consists of scientific journals, Wikipedia, literary works, books, and other sources.
Large language models can be used for generating content, summarizing articles, suggesting approaches to solve regular mathematics and physics problems, suggesting itineraries for a trip and much more!
In this way, neural networks trained and fine-tuned on large data sets give rise to powerful tools that assist us in a variety of tasks and applications.

Conclusion
This chapter has laid a strong foundation in artificial intelligence and machine learning concepts by taking us on a progressive journey, starting from the history of AI and basic ML concepts and moving on to deep learning, neural networks, NLP and Transformers. This foundation should now enable us to develop algorithms on our own using Python libraries in the later chapters that are specially devoted to machine learning and deep learning.
In order to develop and train good ML models, the first thing we need is a good data set. This data is obtained from numerous sources, the most common one being web pages, which makes extracting data from the web, or web scraping, an essential skill before starting any ML project. Extracting data from web pages is easy if the website provides an API. However, in the absence of such an API, one should not miss out on the opportunity to extract web page data. In the next chapter, we shall explore the concept of web scraping, which is the process of extracting data from web pages without using any API. Hence, without wasting further time, let us jump into the next chapter on web scraping using Python!
Chapter 4
Automating Web Scraping

Introduction
This chapter discusses the process of extracting data from
the web using Python. Data is the central part of all projects
in machine learning and most of the data for learning
projects or real-world applications is sourced from the web.
Hence, knowledge of web scraping techniques is
fundamental to the process of implementing projects in AI,
ML, or data science. This chapter introduces the reader to
web scraping and discusses some popular Python libraries
like Requests and Beautiful Soup that are required to
perform web scraping. The chapter continues with salient
features of web scraping like inspection and extraction. The
chapter ends with a real-world use case project.

Structure
The chapter covers the following topics:
• What is web scraping
• Popular Python libraries for web scraping
∘ The requests module in Python
∘ The Beautiful Soup Library
∘ Inspecting the web page
• Extracting information from the web page
• Legal considerations of web scraping
• Practical use case in Python

Objectives
This chapter will prepare the reader to comfortably use
Python libraries to perform web scraping for regular data
gathering tasks and familiarize the reader with commonly
used web scraping techniques.

What is web scraping


Web scraping is the automated process of extracting data from a web page by accessing the HTML of the page. Usually, if the data that we require is small, we conveniently copy and paste it. However, as the required data becomes voluminous, it becomes nearly impossible to copy and paste it manually every single time. On these occasions, web scraping techniques prove useful for extracting data from the web. Moreover, data found on web pages is in an unstructured format, which means it is not organized systematically. Web scraping extracts this unstructured data and stores it in a structured format. Web scraping can be used in many applications, like research and development, social media analysis or market data gathering.
An alternative way of extracting data from a website is to use its APIs, if they have been provided to developers. However, this is the conventional, classical method and does not come under the definition of web scraping. In web scraping, we directly access the page and extract data from it.

Popular Python libraries for web scraping
Python provides very useful libraries for performing web scraping. Below, we discuss the most popular ones, but in this chapter we shall focus only on Requests and Beautiful Soup.
• Requests: This library is usually used in the very first stage of web scraping, because the initial step is to send an HTTP request to the website’s server to retrieve the page from which data is to be extracted.
• Beautiful Soup: This is a popular Python library that parses HTML and XML documents into a tree structure that helps to identify and extract data.
• Scrapy: This is a web crawling library that crawls websites and extracts structured data from their pages. This library can also be used for data mining and automated testing.
• Selenium: This is an open-source framework that enables automated test cases for web browsers or web applications.
We shall now see Requests and Beautiful Soup in further
detail with examples in Python.

The requests module in Python


This is one of the most widely used Python libraries in web scraping and, as mentioned earlier, using it is usually the first step in web scraping. This module is used to send an HTTP request, which in turn returns a response object containing the content.
In order to use this library, the first thing that needs to be done is to install it using the following command in the Anaconda Prompt:
pip install requests
To learn web scraping in the exercises that follow, we shall use the site https://toscrape.com/, which is actually a sandbox for learning web scraping. Since our objective is to learn, we are bound to make multiple HTTP requests to the site. Hence, it is better to use a sandbox than an actual site, since multiple HTTP requests to the same site might get our IP blocked. We shall learn about the legal considerations of web scraping at the end of the chapter. Once we open the site by clicking on the link above, we see the main page of the site, as shown in the below Figure 4.1:
Figure 4.1: Main page of the sandbox website

Once Requests has been installed, the next step is to import it and call it using the following piece of code in the Python editor:
import requests

x = requests.get('https://toscrape.com/')
# A status code of 200 is returned whenever the request is OK; 404 means the page was not found.
if x.status_code == 200:
    print(x.text)
The above code first imports the requests library and then calls its get method with the URL as a parameter. Here, we want the contents of the toscrape.com web page, a web scraping sandbox, so we pass its URL. The Response object that is returned is stored in the variable x. If the status code is 200, we print the text property of x, which holds the body of the page as a string. The returned content is the raw HTML code of the web page.
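The basic call above has no timeout and only checks for status 200. The sketch below is one way, under our own assumptions, to make the request more robust: the helper name fetch_html, the 10-second timeout and the User-Agent string are illustrative choices, not values required by requests or by toscrape.com. It uses raise_for_status(), which turns 4xx/5xx responses into an exception, and catches requests.RequestException, the base class of the errors that requests raises.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page HTML as text, or None if the request fails.

    Sketch only: the timeout and User-Agent below are illustrative values.
    """
    headers = {"User-Agent": "learning-scraper/0.1 (practice project)"}
    try:
        response = requests.get(url, headers=headers, timeout=timeout)
        # Turn 4xx/5xx status codes (such as 404) into requests.HTTPError.
        response.raise_for_status()
    except requests.RequestException as exc:
        # Base class for connection errors, timeouts, invalid URLs and HTTP errors.
        print(f"Request failed: {exc}")
        return None
    return response.text
```

For example, html = fetch_html('https://toscrape.com/') returns the page HTML, or None if the site cannot be reached or answers with an error status.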

The Beautiful Soup Library


This library parses HTML and XML documents into a tree structure that can be searched and navigated.
Below is a link to the documentation of Beautiful Soup:
https://beautiful-soup-4.readthedocs.io/en/latest/
To use this library, install it using the following command (the package is published as beautifulsoup4 and imported in code as bs4):
pip install beautifulsoup4
We first need to fetch the web page content with the requests module and then convert it into a Beautiful Soup object. Hence, we shall slightly modify the earlier code and build upon it.
We retain the first two lines of the earlier piece of code as
shown below:
import requests

x = requests.get('https://toscrape.com/')
Next, we import the BeautifulSoup class from the bs4 package using the following code:
from bs4 import BeautifulSoup
Next, we write the following piece of code:
if x.status_code == 200:
    objSoup = BeautifulSoup(x.content, 'html.parser')
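Note that instead of using x.text as in the previous exercise, we use x.content. The former is the response body decoded into a Unicode string, whereas the latter is the raw response body in bytes. When given bytes, Beautiful Soup works out the character encoding of the document by itself.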
Here, we have created a BeautifulSoup object named objSoup by passing the following two arguments:
• x.content: the HTML content of the page obtained earlier, in bytes.
• 'html.parser': the name of the HTML parser that we want Beautiful Soup to use.
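To see the difference between the two inputs, the sketch below feeds the same small, made-up HTML document to Beautiful Soup once as a string (what x.text would hold) and once as UTF-8 bytes (what x.content would hold). The meta charset tag is included so that Beautiful Soup can find the encoding declared inside the bytes; both objects end up with the same title text.

```python
from bs4 import BeautifulSoup

# A tiny, made-up HTML document standing in for a downloaded page.
html_text = ('<html><head><meta charset="utf-8"><title>Café Sandbox</title>'
             '</head><body></body></html>')
html_bytes = html_text.encode("utf-8")   # what response.content would hold

soup_from_str = BeautifulSoup(html_text, "html.parser")
soup_from_bytes = BeautifulSoup(html_bytes, "html.parser")

print(soup_from_str.title.string)     # Café Sandbox
print(soup_from_bytes.title.string)   # Café Sandbox
```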
Beautiful Soup supports multiple parsers. The one used here, html.parser, is the HTML parser included in Python's standard library, so it needs no extra installation.
Table 4.1 below lists the parsers supported by Beautiful Soup and how each one is used:

No | Parser | Usage method
1 | html5lib | BeautifulSoup(x.content, 'html5lib')
2 | Python standard HTML parser | BeautifulSoup(x.content, 'html.parser')
3 | lxml HTML parser | BeautifulSoup(x.content, 'lxml')
4 | lxml XML parser | BeautifulSoup(x.content, 'xml')

Table 4.1: Parsers and their usage

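The lxml HTML and XML parsers require the lxml library (pip install lxml), and the html5lib parser requires the html5lib library (pip install html5lib). Only html.parser works without any extra installation.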
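Since lxml and html5lib may not be installed on every machine, one option is to try parsers in order of preference and fall back to html.parser. The sketch below assumes the preference order lxml, html5lib, html.parser (our own choice, not a rule of the library) and relies on bs4.FeatureNotFound, the exception Beautiful Soup raises when a requested parser is unavailable. The helper name make_soup is hypothetical.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred=("lxml", "html5lib", "html.parser")):
    """Build a BeautifulSoup object with the first parser that is installed.

    lxml is usually the fastest, html5lib the most lenient, and
    html.parser is always available because it ships with Python.
    """
    for parser in preferred:
        try:
            return BeautifulSoup(markup, parser)
        except FeatureNotFound:
            # Raised when the requested parser library is not installed.
            continue
    raise RuntimeError("No usable parser found")


soup = make_soup("<html><head><title>Scraping Sandbox</title></head></html>")
print(soup.title.string)   # Scraping Sandbox
```

Different parsers can build slightly different trees from malformed HTML, so it helps to stick to one parser within a project once it has been chosen.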

Inspecting the web page


To extract data from a web page effectively using Beautiful Soup, we need to know how the different elements of the page are organized, because navigating to a particular element requires knowing its location within the HTML tree. To see this, right-click anywhere on the page and select the last option, Inspect. Figure 4.2 below shows a snapshot of this process:

Figure 4.2: Selecting ‘Inspect’ by right clicking on the webpage

Selecting Inspect opens the browser's developer tools. We shall use Google Chrome for this exercise, but most modern browsers offer similar developer tools.
Once the developer tools are open, what we see is the Document Object Model (DOM) of the page. At first glance it looks like plain HTML, and it largely is: the DOM is the browser's tree of objects built from the HTML, which JavaScript (the language used to add behavior to web pages) can read and modify. For practical purposes, we can treat whatever we see in the developer tools as the HTML code of the web page. Keep in mind, however, that if a page builds content with JavaScript, the DOM may contain elements that are absent from the raw HTML returned by requests. Figure 4.3 shows the developer tools after clicking Inspect:

Figure 4.3: Developer’s tool visible at the right after clicking on ‘Inspect’

Extracting information from the Web Page
Now, let us try to extract some information from the web
page. Below is the code that we can type in the Python
editor:
import requests
from bs4 import BeautifulSoup

x = requests.get('https://toscrape.com/')
# A status of 200 is returned whenever a request is OK; 404 means not found.
if x.status_code == 200:
    objSoup = BeautifulSoup(x.content, 'html.parser')
    print(objSoup.title)       # prints the whole title tag
    print(objSoup.title.name)  # prints the name of the tag
The output thus produced would look like this:
<title>Scraping Sandbox</title>
title
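The sketch below, run on a made-up snippet instead of the live page, shows a few more attributes of the Tag object returned by objSoup.title: name gives the tag's name, string and get_text() give the text inside it, and parent moves one level up the tree.

```python
from bs4 import BeautifulSoup

# Made-up markup that mimics the <title> of the sandbox page.
html = "<html><head><title>Scraping Sandbox</title></head><body><p>Hi</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

tag = soup.title
print(tag)              # <title>Scraping Sandbox</title>  (the whole tag)
print(tag.name)         # title                            (the tag's name)
print(tag.string)       # Scraping Sandbox                 (the text inside it)
print(tag.get_text())   # Scraping Sandbox                 (all text, concatenated)
print(tag.parent.name)  # head                             (one level up the tree)
```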
Now, let us try to extract information from the table situated at the bottom of the web page (refer to Figure 4.4 below):

Figure 4.4: Table from the web page

As before, we right-click the text Microdata and pagination in the table and select Inspect, so that we can see where it sits within the HTML tree in the developer tools.
We observe from the HTML tree that within the body tag, a div tag with class = "container" holds three div tags with class = "row". The last of these contains a div tag with class = "col-md-10", within which there are two div tags with class = "col-md-6". The second of these two contains a table with class="table table-hover". Within this table is a tbody tag that contains several tr tags. The second tr tag contains two td tags, and the second td tag holds the text Microdata and pagination, which is what we wanted to find! (Browsers add a tbody tag to tables even when the raw HTML omits it, so it may not be present in what requests downloads.)
Figure 4.5 shows the HTML tree obtained after right-clicking the text Microdata and pagination and selecting Inspect:
Figure 4.5: HTML tree of the Web Page navigated to text ‘Microdata and pagination’
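Instead of tracing the path by hand, one can also let Beautiful Soup report it. The sketch below uses a simplified, made-up copy of the structure described above (the real page has more rows, attributes and whitespace), finds the text node with find(string=...) and walks up its .parents, printing each ancestor as a CSS-style name-and-class step. The path shows names and classes only, not which of several sibling rows is meant, and since html.parser does not insert the tbody tag that browsers add, tbody does not appear in it.

```python
from bs4 import BeautifulSoup

# Simplified, made-up markup following the path described in the text.
html = """
<body>
  <div class="container">
    <div class="row"></div>
    <div class="row"></div>
    <div class="row">
      <div class="col-md-10">
        <div class="col-md-6"><table class="table"></table></div>
        <div class="col-md-6">
          <table class="table table-hover">
            <tr><th>Endpoints</th><th>Description</th></tr>
            <tr><td>Default</td><td>Microdata and pagination</td></tr>
          </table>
        </div>
      </div>
    </div>
  </div>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

# Find the text node itself, then collect its ancestors from the inside out.
text_node = soup.find(string="Microdata and pagination")
path = []
for ancestor in text_node.parents:
    if ancestor is soup:   # stop at the BeautifulSoup object (document root)
        break
    classes = "".join("." + c for c in ancestor.get("class", []))
    path.append(ancestor.name + classes)

selector = " > ".join(reversed(path))
print(selector)
# body > div.container > div.row > div.col-md-10 > div.col-md-6 > table.table.table-hover > tr > td
```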

Let us try to arrive at this text through code. Type the code
below in the editor:
import requests
from bs4 import BeautifulSoup

x = requests.get('https://toscrape.com/')
if x.status_code == 200:
    objSoup = BeautifulSoup(x.content, 'html.parser')
We have created the BeautifulSoup object similar to the
previous exercise.
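Now, let us use the find method, which returns the first element that matches the given tag name and attributes. We first need to find the div tag with class="container", as this is the topmost tag within the body. Note that the keyword argument is written class_ with a trailing underscore, because class is a reserved word in Python. We do that using the code expression below: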
objSoup.find('div',class_="container")
Within this, there are three div tags with class = "row", but we need only the last one. How do we let Beautiful Soup know this? We first use the findAll method (also available under the newer name find_all), which returns a list of all matching tags, and then use indexing to pick the element that we want. Below is the modified code expression, where we use the index 2 to fetch the third element:
objSoup.find('div',class_="container").findAll('div',class_="row")[2]
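The difference between find and findAll can be seen on a small, made-up snippet with one container and three rows: find returns only the first match (or None when nothing matches), while findAll/find_all returns a list that can be indexed, including with negative indices.

```python
from bs4 import BeautifulSoup

# Made-up markup: one container holding three rows.
html = """
<div class="container">
  <div class="row">first</div>
  <div class="row">second</div>
  <div class="row">third</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", class_="container")   # first match only

rows = container.find_all("div", class_="row")     # list of every match
print(len(rows))            # 3
print(rows[2].get_text())   # third  (index 2 is the third element)
print(rows[-1].get_text())  # third  (negative index: last element)

# findAll is the older camelCase name for find_all; both give the same list.
legacy = container.findAll("div", class_="row")
print(legacy == rows)       # True

# find returns None when nothing matches, so check before chaining calls.
print(soup.find("table"))   # None
```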
Now that we have understood how to navigate through the HTML tree using the find and findAll methods along with indexing, let us write the complete statement that navigates to the final text element that we require. Execute the final code below in the Python editor:
import requests
from bs4 import BeautifulSoup

x = requests.get('https://toscrape.com/')
if x.status_code == 200:
    objSoup = BeautifulSoup(x.content, 'html.parser')
    print(objSoup.find('div', class_="container")
          .findAll('div', class_="row")[2]
          .find('div', class_="col-md-10")
          .findAll('div', class_="col-md-6")[1]
          .find('table', class_="table table-hover")
          .findAll('tr')[1].findAll('td')[1].text)
We observe that the output obtained is Microdata and pagination. That is exactly what we wanted! However, don't you think this method of navigation was long, tedious and unnecessarily complicated? Is there not a more direct and shorter way to get this done? Of course there is! Since the data that we want lies in a table, why not simply search for all tables in the HTML tree? Let us get a list of all elements with the tag table using the code below:
objSoup.findAll('table')
The result contains two tables, since the web page has a total of two tables. To get the second (last) table, we can use indexing as shown below, where index 0 means the first element and index 1 the second:
Now that we have arrived at the table, we can continue with the rest of the expression as we did previously. The final expression looks like this:
objSoup.findAll('table')[1].findAll('tr')[1].findAll('td')[1].text
Given below is the complete code for reference:
import requests
from bs4 import BeautifulSoup

x = requests.get('https://toscrape.com/')
if x.status_code == 200:
    objSoup = BeautifulSoup(x.content, 'html.parser')
    print(objSoup.findAll('table')[1].findAll('tr')[1].findAll('td')[1].text)
Output: Microdata and pagination
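As an alternative to chained findAll calls, Beautiful Soup also accepts CSS selectors through select(), which relies on the soupsieve package that is installed along with beautifulsoup4. The sketch below runs both approaches on made-up markup shaped like the sandbox table; the selector string is our own and would need adjusting if the real page's classes differ.

```python
from bs4 import BeautifulSoup

# Made-up markup with two tables; only the second carries the classes.
html = """
<table><tr><td>Other table</td></tr></table>
<table class="table table-hover">
  <tr><th>Endpoints</th><th>Description</th></tr>
  <tr><td>Default</td><td>Microdata and pagination</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Approach from the text: chained find_all calls with indexing.
chained = soup.find_all("table")[1].find_all("tr")[1].find_all("td")[1].text
print(chained)                   # Microdata and pagination

# CSS selector: inside a table having both classes, every td that is the
# second td among its siblings; select() always returns a list.
cells = soup.select("table.table.table-hover td:nth-of-type(2)")
print([c.text for c in cells])   # ['Microdata and pagination']
```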

Legal considerations of web scraping


There is no clear blueprint or legal statement on web scraping that declares the practice unethical, and discussing the grey areas in that regard is beyond the scope of this book; the topic is best left to the reader's discretion and further research. In practice, many organizations use web scraping to analyze customer sentiment, competitor activities, market pricing and much more. Keep in mind, however, that websites generally do not welcome other parties scraping data from their pages. If a scraping bot is detected frequently, the visitor's IP address may be blocked from visiting the site further, so one needs to be careful and considerate while scraping and avoid any malicious intent. If a company can prove that a scraping activity has caused damage to its infrastructure, it may take legal action. Most websites provide a file called robots.txt at the root of the site, which can be viewed by appending /robots.txt to the site's base URL. It contains directives such as User-agent, which names the crawler a group of rules applies to, and Disallow, which lists paths that crawler should not visit; together, they indicate the extent to which crawling is allowed on the site.
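Python's standard library can read robots.txt rules for us through urllib.robotparser. The sketch below parses a made-up robots.txt from a string (in practice, set_url() and read() would download the site's real file) and asks whether a hypothetical bot named learning-scraper may fetch two example.com URLs. can_fetch() only reports what the file's rules say.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content; a real file lives at the site's root, e.g.
# https://example.com/robots.txt, and could be loaded with set_url() + read().
robots_txt = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

bot = "learning-scraper"   # hypothetical name for our scraper
allowed = parser.can_fetch(bot, "https://example.com/catalogue/page-1.html")
blocked = parser.can_fetch(bot, "https://example.com/private/data.html")
delay = parser.crawl_delay(bot)   # seconds to wait between requests, if given

print(allowed, blocked, delay)    # True False 5
```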

Practical use case in Python


Let us apply everything we have learnt to a practical use case in Python. We shall scrape the same website https://toscrape.com/ that we used in the earlier exercises. Last time, we wanted to extract only the single line of text Microdata and pagination. In this exercise, we shall extract all the values in the first column, titled Endpoints, of the table on the web page shown in Figure 4.4.
As always, we shall import the required libraries and create
a BeautifulSoup object:
import requests
from bs4 import BeautifulSoup

x = requests.get('https://toscrape.com/')
if x.status_code == 200:
    objSoup = BeautifulSoup(x.content, 'html.parser')
Next, we shall access the last (second) table of the web
page using the code below:
tbl = objSoup.findAll('table')[1]
We have seen previously that this table has multiple tr tags
that embed the text elements and links within it. We use the
code below to get a list of all the tr tags:
tbltr = tbl.findAll('tr')
If we inspect the HTML tree in the developer tools, we find that, except for the first row of the table (the column header containing the text Endpoints), every row follows the same structure of two td tags. The first td tag contains the text that we want, and the second td tag corresponds to the next column of the table, which is what we fetched in the previous exercise. We now use a Python loop to fetch the entire list of values using the code below:
i = 0
for element in tbltr:
    if not i == 0:
        print(element.findAll('td')[0].text)
    i += 1
Here, the counter i is increased by one on every iteration, so only the first row (where i is 0, the header) is skipped.
Given below is the entire code for reference. Let us run it in
the Python editor and observe the output:
import requests
from bs4 import BeautifulSoup

x = requests.get('https://toscrape.com/')
if x.status_code == 200:
    objSoup = BeautifulSoup(x.content, 'html.parser')
    tbl = objSoup.findAll('table')[1]
    tbltr = tbl.findAll('tr')
    i = 0
    for element in tbltr:
        if not i == 0:
            print(element.findAll('td')[0].text)
        i += 1
Below is exactly the output that we were looking forward to:
Default
Scroll
JavaScript
Delayed
Tableful
Login
ViewState
Random
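A slightly more idiomatic variant of the loop above skips the header with list slicing (rows[1:]) instead of a counter, and collects the values in a list, which is handy if they are to be stored later, for example in a spreadsheet. The sketch runs on a made-up table of the same shape; the second-column texts are placeholders.

```python
from bs4 import BeautifulSoup

# Made-up table with the same shape: one header row, then two-cell rows.
html = """
<table class="table table-hover">
  <tr><th>Endpoints</th><th>Description</th></tr>
  <tr><td>Default</td><td>first description</td></tr>
  <tr><td>Scroll</td><td>second description</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
rows = soup.find("table").find_all("tr")

# rows[1:] skips the header row, so no counter variable is needed.
endpoints = [row.find_all("td")[0].get_text(strip=True) for row in rows[1:]]
print(endpoints)   # ['Default', 'Scroll']
```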
Conclusion
This chapter has started us on the practical journey of data science, because without reliable training data we cannot build anything substantial. Web pages are a major source of data, and hence web scraping with the requests module and the Beautiful Soup library is an important skill when building challenging data science models and projects. We hope you have enjoyed the exercises and practical use cases in this chapter and now feel comfortable exploring web scraping further on your own!
Now that we have the data, where should it be stored? One option is a database; another, of course, is a spreadsheet! Microsoft Excel is a widely used spreadsheet application for storing, reading, transforming and manipulating data. Wouldn't it be worthwhile to know how to automate commonly performed Excel tasks? Fortunately, Python provides useful libraries to do exactly that! So, without wasting further time, let us turn to the next chapter to learn how to use Python to automate Microsoft Excel spreadsheets!

Join our book’s Discord space


Join the book’s Discord Workspace for Latest updates,
Offers, Tech happenings around the world, New Release and
Sessions with the Authors:
https://discord.bpbonline.com
Chapter 5
Automating Excel and Spreadsheets

Introduction
Microsoft Excel automation is one of the most common automation exercises performed by organizations and individuals, as Excel is the most widely used spreadsheet tool. The chapter begins with the openpyxl library, a Python library to read/write Excel 2010 xlsx/xlsm/xltx/xltm files, and then takes the reader through automating common tasks such as creating a workbook, opening a workbook, and performing various operations on a workbook and worksheet. The chapter continues with the description and usage of other Python libraries used for Excel automation, such as xlwings and XlsxWriter, and briefly discusses the relative advantages of each. The chapter ends with a real-world use case project involving Excel automation with Python.

Structure
The chapter covers the following topics:
• Need for automating Excel using Python
• Introduction to openpyxl library
∘ Open and modify an existing workbook
∘ Access a cell using range name
∘ Merging cells
∘ Looping through cells
∘ Working with Excel formulae using openpyxl
∘ Create charts using openpyxl
∘ Styling a chart
• Other Python libraries for Excel automation
∘ Comparison summary of Python libraries
• Practical use case in Python

Objectives
This chapter introduces the reader to creating basic and intermediate scripts that automate regular Excel processes using open-source Python libraries. Thereafter, the reader will be able to explore these Python modules independently and create custom scripts for more complicated tasks.

Need for automating Excel using Python
Microsoft Excel spreadsheets have been among the most widely used office tools for decades, primarily due to their ease of use and secondarily due to the variety of other features they offer. Most end users of Excel perform tasks manually, typically applying formulae and dragging them down, or copying and pasting data or tables from one sheet to another. Sooner or later, these tasks become repetitive and monotonous, and this is exactly where one starts to perceive the need for automation.
Fortunately, Microsoft Excel comes with Visual Basic for Applications (VBA), a built-in programming language with its own editor, which enables the automation of small and large tasks in Excel spreadsheets through the APIs that Excel provides for this purpose.
The advantage of VBA is that it involves no additional cost once the user has purchased Excel, since its editor comes built into Excel. However, VBA can be slow: if a program loops through a large data set with many rows and columns, navigating row by row and cell by cell can take a long time to complete, making the process inefficient.
This is exactly where Python comes into the picture with its powerful computational capabilities. Fast tabular manipulation of data through data frames and matrices is one of Python's strengths. Python also provides open-source libraries such as openpyxl, xlwings and XlsxWriter that offer a variety of methods and properties to read, write, transform and analyze data in Excel spreadsheets.
Moreover, Excel VBA is best suited to tasks that stay within the Microsoft Office environment, whereas Python automation also works when the task involves importing data into Excel from other sources or exporting data from Excel to other systems.
Hence, it is well worth becoming comfortable with automating regular Excel tasks using Python. In this chapter, we shall focus on the openpyxl library and discuss the other libraries at a high level, along with the relative advantages of each.

Introduction to openpyxl library


openpyxl is an open-source Python library that provides the functionality to read, write, modify and format Excel documents. The advantage of a Python library like openpyxl is that, unlike VBA, it can operate on an Excel document without opening the Excel application; Excel does not even need to be installed. To use openpyxl, first install the library using the command below:
pip install openpyxl
Let us try a quick task of creating a workbook, writing
something in there and then saving the workbook. Execute
the code below in the Python editor:
from openpyxl import Workbook

objWorkbook = Workbook()
objWorksheet = objWorkbook.worksheets[0]
objWorksheet.cell(1, 1, 'This is my first task using Openpyxl')
objWorkbook.save("Learn_Excel_Automation.xlsx")
Let us try to understand what we did in the above code:
1. Firstly, we imported the Workbook class from the openpyxl library using the following line of code:
from openpyxl import Workbook
2. Next, we created a new Workbook() object and stored it in a new variable named objWorkbook. This initialized objWorkbook to a blank Excel workbook:
objWorkbook = Workbook()
3. Thereafter, we accessed the first worksheet of the newly created workbook using the worksheets property, which returns a list of all the worksheets in the workbook. We accessed the first worksheet with the list index 0:
objWorksheet = objWorkbook.worksheets[0]
4. Next, we wrote the text This is my first task using Openpyxl in the first cell of the worksheet by calling the cell method of the worksheet:
objWorksheet.cell(1,1,'This is my first task using Openpyxl')
5. Here, cell takes the following parameters: row, column and value. In this case, row is 1, column is 1 and value is This is my first task using Openpyxl.
6. Lastly, we saved the workbook in the current working directory using the following line of code:
objWorkbook.save("Learn_Excel_Automation.xlsx")
We have created our first Excel workbook, modified it and saved it. If you check the directory from which you ran the script (the current working directory, usually the folder in which your Python file is saved), you will find an Excel workbook named Learn_Excel_Automation.xlsx. On opening the file, you would see the text This is my first task using Openpyxl written in the first cell of the first worksheet, as shown in Figure 5.1 below:
Figure 5.1: First exercise using openpyxl
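The same steps can be checked end to end. The sketch below writes to a temporary folder so that no stray files are left behind, uses wb.active (the active sheet, which for a new workbook is the first one), renames it, saves the file and reloads it with load_workbook to confirm that the text was stored. The sheet title Tasks and the temporary path are our own illustrative choices.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Temporary folder and file name are illustrative choices.
path = os.path.join(tempfile.mkdtemp(), "Learn_Excel_Automation.xlsx")

wb = Workbook()
ws = wb.active            # for a new workbook this is worksheets[0]
ws.title = "Tasks"        # rename the default sheet (titled "Sheet")
ws.cell(row=1, column=1, value="This is my first task using Openpyxl")
wb.save(path)

# Reopen the saved file and read the value back to confirm the round trip.
reloaded = load_workbook(path)
print(reloaded.sheetnames)              # ['Tasks']
print(reloaded["Tasks"]["A1"].value)    # This is my first task using Openpyxl
```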

Open and modify an existing workbook
We shall now see how to modify an existing workbook. Let
us open the same workbook
Learn_Excel_Automation.xlsx that we saved in the
previous exercise and then modify its contents, all using
Python. Type the code below in the Python editor:
from openpyxl import load_workbook

objWorkbook = load_workbook("Learn_Excel_Automation.xlsx")
objWorksheet = objWorkbook.worksheets[0]

objWorksheet.cell(3, 1, 'This is my second task in openpyxl')

objWorkbook.save("Learn_Excel_Automation.xlsx")
Here, we used the load_workbook function to load the workbook Learn_Excel_Automation.xlsx. The method of accessing the worksheet and entering values in the cell remains the same. Note that openpyxl also allows one to modify the contents of a cell using an alternative method, shown below:
objWorksheet.cell(3,1).value = 'This is my second task in openpyxl'
Here, value is both a get and a set property, which means that we can use it to read the value of a particular cell as well as to assign a value to it.
On opening the workbook, we would see the text that we
wrote as shown in the Figure 5.2 below:

Figure 5.2: Modifying contents of an existing Excel workbook
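The sketch below, on a new in-memory workbook, shows value used both ways and one more convenience: passing value= to cell() writes the value and returns the Cell object in one call. Calling cell() again for the same position returns the same Cell object, and an empty cell's value is None.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

cell = ws.cell(row=3, column=1)                     # the Cell object at A3
cell.value = "This is my second task in openpyxl"   # set
print(cell.value)                                   # get: the text just written
print(cell.coordinate)                              # A3

# cell() with a value argument writes and returns the cell in one call.
same = ws.cell(row=3, column=1, value="Overwritten")
print(same is cell, ws["A3"].value)                 # True Overwritten

print(ws.cell(row=10, column=10).value)             # None (empty cell)
```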

Access a cell using range name


Previously, we saw how to access a cell and modify its
contents using worksheet.cell.
Now, we shall have a look at another method to execute the
same task using the code below. We shall load the same
workbook that we saved in the previous exercise:
from openpyxl import load_workbook

objWorkbook = load_workbook("Learn_Excel_Automation.xlsx")
objWorksheet = objWorkbook.worksheets[0]
objWorksheet['A5'] = 'I have modified the contents of this cell using the range name!'
objWorkbook.save("Learn_Excel_Automation.xlsx")
Here, we used objWorksheet['A5'] to access the cell at address A5 and modify its contents. On running the code in the Python editor and opening the workbook, we would see the text written in cell A5, as shown in Figure 5.3 below:

Figure 5.3: Using name range to modify contents of a workbook
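Range names work for blocks of cells too. In the sketch below (a new in-memory workbook), ws['A1:B2'] returns a tuple of rows, each a tuple of Cell objects, and the coordinate attribute converts a row/column position back into its range name.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

ws["A5"] = "Written through the range name"   # same as ws.cell(5, 1).value = ...
print(ws["A5"].value)

# A range such as "A1:B2" returns a tuple of rows, each a tuple of cells.
coords = [[c.coordinate for c in row] for row in ws["A1:B2"]]
print(coords)                                 # [['A1', 'B1'], ['A2', 'B2']]

# From row/column numbers back to the range name.
print(ws.cell(row=5, column=1).coordinate)    # A5
```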

Merging cells
To see how to merge cells, we shall open the workbook saved in the earlier exercise and merge a range of its cells. Type the code below into the Python editor:
from openpyxl import load_workbook
from openpyxl.styles import Alignment

objWorkbook = load_workbook("Learn_Excel_Automation.xlsx")
objWorksheet = objWorkbook.worksheets[0]

objWorksheet.merge_cells('A1:D2')
cell = objWorksheet.cell(1, 1)
cell.alignment = Alignment(horizontal='center', vertical='center')
objWorkbook.save("Learn_Excel_Automation.xlsx")
In this code, the merge_cells method merges the cells within range A1:D2 into a single cell. The text in the first (top-left) cell, This is my first task using Openpyxl, is retained in the merged cell. To align the text at the center, we imported the Alignment class and passed the parameters horizontal='center', vertical='center'. You would agree that this is the same as manually performing ‘Merge and Center’ in Excel; here, we have used openpyxl to automate this process.
On opening the saved workbook, you would see the cell
range A1:D2 merged into one cell as shown in the Figure 5.4
below:

Figure 5.4: Merging cells using openpyxl merge_cells
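One detail worth knowing is what happens to the other cells in a merged range. The sketch below, on a new in-memory workbook, writes text into B1 before merging A1:D2; after the merge only the top-left value survives, the remaining cells become read-only MergedCell placeholders, merged_cells.ranges lists the merged area, and unmerge_cells() undoes the merge.

```python
from openpyxl import Workbook
from openpyxl.styles import Alignment

wb = Workbook()
ws = wb.active
ws["A1"] = "This is my first task using Openpyxl"
ws["B1"] = "This text will be discarded"   # not the top-left cell of the range

ws.merge_cells("A1:D2")
ws["A1"].alignment = Alignment(horizontal="center", vertical="center")

merged = [str(r) for r in ws.merged_cells.ranges]
print(merged)            # ['A1:D2']
print(ws["A1"].value)    # This is my first task using Openpyxl
print(ws["B1"].value)    # None: the other cells became MergedCell placeholders

ws.unmerge_cells("A1:D2")                 # undo the merge
print(len(ws.merged_cells.ranges))        # 0
```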

Looping through cells


Let us now loop through cells and enter content in cells
while looping.
We shall open a blank Excel workbook and manually create the table shown in Figure 5.5. The table is a summary of the scores obtained in Mathematics, Science and English by five students, ‘A’, ‘B’, ‘C’, ‘D’ and ‘E’. Thereafter, we shall save the workbook under the name ‘Student Table.xlsx’ and close it:

Figure 5.5: Student table


Now, copy the code below into the Python editor:
from openpyxl import load_workbook
from openpyxl.styles import Alignment

objWorkbook = load_workbook("Student Table.xlsx")
objWorksheet = objWorkbook.worksheets[0]

dict_scores = {'A': {'Mathematics': 45, 'Science': 55, 'English': 75},
               'B': {'Mathematics': 75, 'Science': 87, 'English': 87},
               'C': {'Mathematics': 90, 'Science': 95, 'English': 58},
               'D': {'Mathematics': 100, 'Science': 28, 'English': 80},
               'E': {'Mathematics': 35, 'Science': 75, 'English': 90}}
col = 2
for key in dict_scores:
    student = dict_scores[key]
    row = 2
    for subject in student:
        score = student[subject]
        objWorksheet.cell(row, col).value = score
        objWorksheet.cell(row, col).alignment = \
            Alignment(horizontal='center', vertical='center')
        row += 1
    col += 1
objWorkbook.save("Student Table.xlsx")
Here, we first create a dictionary named dict_scores that
stores the scores of every student in each of the subjects
Mathematics, Science and English. This dictionary contains
keys ‘A’, ‘B’, ‘C’, ‘D’ and ‘E’ that correspond to the students.
Every value corresponding to this key is also a dictionary
that contains keys as subjects ‘Mathematics’, ‘Science’,
‘English’ and values as the respective scores obtained by
the student in these subjects. Next, we loop through each
student key and within this loop we further loop through
every subject key to obtain the scores of every student in
that subject.
Study the loop in the code carefully to see how the loop
navigates through individual columns for each student and
through every row within each column while entering the
scores. After running the code, open the workbook Student
Table.xlsx. You would observe that the scores have been
entered as shown in the Figure 5.6 below:

Figure 5.6: Students table populated with scores using Python loop

In this section, we saw how worksheet.cell() in openpyxl can accept row and column parameters as dynamic inputs, enabling one to navigate through the worksheet and populate values using Python loops.
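The same table can also be filled without manual row and column counters. The sketch below builds it in a new workbook using enumerate(), and takes the row order from an explicit subjects list rather than relying on the key order of each student's dictionary. The header text Subject in cell A1 is our assumption about the layout in Figure 5.5.

```python
from openpyxl import Workbook

dict_scores = {
    'A': {'Mathematics': 45, 'Science': 55, 'English': 75},
    'B': {'Mathematics': 75, 'Science': 87, 'English': 87},
    'C': {'Mathematics': 90, 'Science': 95, 'English': 58},
    'D': {'Mathematics': 100, 'Science': 28, 'English': 80},
    'E': {'Mathematics': 35, 'Science': 75, 'English': 90},
}
subjects = ['Mathematics', 'Science', 'English']

wb = Workbook()
ws = wb.active
ws.cell(row=1, column=1, value='Subject')           # assumed header text
for row, subject in enumerate(subjects, start=2):
    ws.cell(row=row, column=1, value=subject)

# enumerate() supplies the column and row numbers, so no manual counters.
for col, (student, scores) in enumerate(dict_scores.items(), start=2):
    ws.cell(row=1, column=col, value=student)
    for row, subject in enumerate(subjects, start=2):
        ws.cell(row=row, column=col, value=scores[subject])

rows = list(ws.iter_rows(min_row=1, max_row=4, values_only=True))
for values in rows:
    print(values)
# ('Subject', 'A', 'B', 'C', 'D', 'E')
# ('Mathematics', 45, 75, 90, 100, 35)
# ('Science', 55, 87, 95, 28, 75)
# ('English', 75, 87, 58, 80, 90)
```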
Next, we shall see how openpyxl can be used to populate
Excel formulas that we regularly use.

Working with Excel formulae using openpyxl
In this section, we would learn how to populate an Excel
formula in a particular cell using openpyxl library and
thereafter we shall also learn how to drag the formulae to
adjacent cells.
We shall use the same workbook Student Table.xlsx that
we saved in the previous exercise. Copy the code below in
the Python editor:
from openpyxl import load_workbook
from openpyxl.styles import Alignment

objWorkbook = load_workbook("Student Table.xlsx")
objWorksheet = objWorkbook.worksheets[0]
objWorksheet['B5'] = "=ROUND(AVERAGE(B2:B4),0)"
objWorksheet['B5'].alignment = Alignment(horizontal='center',
                                         vertical='center')
objWorkbook.save("Student Table.xlsx")
Here, the line of code that inserts an Excel formula at cell B5 is:
objWorksheet['B5']="=ROUND(AVERAGE(B2:B4),0)"
The formula calculates the average of the scores earned by student A in all three subjects and then rounds the result to the nearest integer. Note that openpyxl only writes the formula into the cell; the value itself is calculated by Excel when the workbook is opened. Once you execute the code, open the workbook and observe cell B5: the average rounded to the nearest integer is populated in that cell. For student A, the average of the scores 45, 55 and 75 earned in Mathematics, Science and English respectively is (45 + 55 + 75) / 3 = 175 / 3 ≈ 58.33. Rounding it to the nearest integer gives 58, which is exactly the value that we see in cell B5, as shown in Figure 5.7 below:
Figure 5.7: Average score of student A calculated in cell B5 using openpyxl
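Because openpyxl writes only the formula text, reading the cell back in Python returns that text rather than 58. The sketch below (a temporary file with our own illustrative name) makes this visible: load_workbook returns the formula by default, and with data_only=True it returns the value Excel cached the last time it saved the file, which is None for a file written only by openpyxl, since openpyxl never calculates formulas.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

path = os.path.join(tempfile.mkdtemp(), "formula_demo.xlsx")  # illustrative name

wb = Workbook()
ws = wb.active                       # default sheet, titled "Sheet"
for row, score in enumerate([45, 55, 75], start=2):   # student A's scores
    ws.cell(row=row, column=2, value=score)
ws["B5"] = "=ROUND(AVERAGE(B2:B4),0)"
print(ws["B5"].data_type)            # f  (stored as a formula, not a number)
wb.save(path)

# Default load: the formula text comes back unchanged.
formula_text = load_workbook(path)["Sheet"]["B5"].value
print(formula_text)                  # =ROUND(AVERAGE(B2:B4),0)

# data_only=True returns the value cached by Excel on its last save;
# a file written only by openpyxl has no cached value.
cached = load_workbook(path, data_only=True)["Sheet"]["B5"].value
print(cached)                        # None

# The same calculation in Python, for comparison with cell B5 in Excel.
print(round(sum([45, 55, 75]) / 3))  # 58
```

Python's round() rounds exact halves to the nearest even number (round(2.5) is 2), whereas Excel's ROUND rounds them away from zero, so the two can disagree for averages ending in .5.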

Now that we have successfully populated the formula in one cell, the next thing we would want to do is drag the formula to the remaining cells C5, D5, E5 and F5, so that we can compute the average scores of students B, C, D and E as well, rounded to the nearest integer.
When we work with Excel spreadsheets manually, we simply copy a cell and paste it into another cell, and the formula's references are adjusted automatically. However, we need a different workaround when performing this exercise through code using openpyxl. Fortunately, openpyxl provides a class called Translator that does this task for us.
Add the line below to the previous code:
from openpyxl.formula.translate import Translator
This line imports the Translator class. Now let us use it with the code below:
objWorksheet['C5'] = Translator("=ROUND(AVERAGE(B2:B4),0)", "B5").translate_formula("C5")
The Translator class takes two parameters: the formula to be translated and the cell where the formula originally belongs (its origin). Thereafter, its translate_formula method takes as an argument the cell to which we want to transfer the formula. In this case, it is C5, because we want to transfer the formula to cell C5. The complete code is given below; paste it into a fresh editor.
In case the workbook Student Table.xlsx is open, please close it before running the code (on Windows, saving fails while Excel has the file open):
from openpyxl import load_workbook
from openpyxl.styles import Alignment
from openpyxl.formula.translate import Translator

objWorkbook = load_workbook("Student Table.xlsx")
objWorksheet = objWorkbook.worksheets[0]
objWorksheet['B5'] = "=ROUND(AVERAGE(B2:B4),0)"
objWorksheet['C5'] = Translator("=ROUND(AVERAGE(B2:B4),0)",
                                "B5").translate_formula("C5")
objWorksheet['C5'].alignment = Alignment(horizontal='center',
                                         vertical='center')
objWorkbook.save("Student Table.xlsx")
After executing the above code, open the workbook
Student Table.xlsx. We would observe that the formula
from cell B5 has been updated to cell C5 (refer to Figure
5.8):
Figure 5.8: Formula from cell B5 updated to cell C5 using openpyxl Translator.

Cell C5 correctly reflects the formula =ROUND(AVERAGE(C2:C4),0), which makes it clear that the formula from cell B5 has been correctly dragged to C5. Similarly, we can use the Translator class to update the formula in cells D5, E5 and F5.
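The sketch below shows how Translator shifts references relative to the origin cell, much like copying a cell in Excel: moving one column right changes B to C, moving one row down changes 2:4 to 3:5, and references marked absolute with $ stay fixed. The formula =B2/$B$5 is a made-up example.

```python
from openpyxl.formula.translate import Translator

translator = Translator("=ROUND(AVERAGE(B2:B4),0)", origin="B5")

# Relative references shift by the distance between origin and destination.
print(translator.translate_formula("C5"))  # =ROUND(AVERAGE(C2:C4),0)
print(translator.translate_formula("F5"))  # =ROUND(AVERAGE(F2:F4),0)
print(translator.translate_formula("B6"))  # =ROUND(AVERAGE(B3:B5),0)

# References marked absolute with $ are left alone, as in Excel.
mixed = Translator("=B2/$B$5", origin="C2")
print(mixed.translate_formula("C3"))       # =B3/$B$5
```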
However, would this be the most efficient way of updating the formulae in subsequent cells? Here, we have just three more cells to update, but suppose we had one hundred such cells: would we call Translator one hundred more times and write a hundred lines of additional code? Definitely not! That is where the power of Python loops comes into the picture.
This time, we start from cell C5 and loop through four cells (C5, D5, E5 and F5), updating the formula in each, so that we end at cell F5. The row remains the same, which is 5; only the column changes. In this case, the translate_formula method just needs to know the column letter, whether it is ‘C’, ‘D’, ‘E’ or ‘F’. The code below shows how we accomplish this task:
row = 5
col = 3
translator = Translator("=ROUND(AVERAGE(B2:B4),0)", "B5")
for i in range(4):
    str_col_letter = objWorksheet.cell(row, col).column_letter
    objWorksheet[str_col_letter + '5'] = translator.translate_formula(str_col_letter + "5")
    objWorksheet.cell(row, col).alignment = Alignment(horizontal='center', vertical='center')
    col += 1
objWorkbook.save("Student Table.xlsx")
Here, we initialize the row variable to the value 5 which
remains constant. The col variable is initialized to 3 but gets
incremented as we loop through successive columns. Note
that we are looping through a range of 4 since we need to
navigate four more columns to update the formula from
column B. Then we declare the translator object. Note that
we have declared the translator object outside the loop
here since it needs to be initialized only once and simply
called in the loop thereafter.
The variable str_col_letter is an important highlight of this
code. We use the column_letter attribute of the cell object
which returns the letter name of the column. For example,
cell(5,3) corresponds to row 5 and column 3. In an Excel
spreadsheet, this cell corresponds to range C5 so the
column_letter attribute shall return C in this case. In a
similar way, we capture the column_letter attributes for
the successive cells as we loop through them and
concatenate them with the row number to get the range name. Since the row number is constant (5), we simply pass the string str_col_letter + '5' to the translate_formula method of the translator object. The full code is given below. Close the
workbook Student Table.xlsx if it is already open, paste
the code below into the Python editor and run the code
(refer to Figure 5.9):
from openpyxl import load_workbook
from openpyxl.styles import Alignment
from openpyxl.formula.translate import Translator

objWorkbook = load_workbook("Student Table.xlsx")
objWorksheet = objWorkbook.worksheets[0]
objWorksheet['B5'] = "=ROUND(AVERAGE(B2:B4),0)"
row = 5
col = 3
translator = Translator("=ROUND(AVERAGE(B2:B4),0)", "B5")

for i in range(4):
    str_col_letter = objWorksheet.cell(row, col).column_letter
    objWorksheet[str_col_letter + '5'] = translator.translate_formula(str_col_letter + "5")
    objWorksheet.cell(row, col).alignment = Alignment(horizontal='center', vertical='center')
    col += 1
objWorkbook.save("Student Table.xlsx")
Figure 5.9: Excel formula dragged to other cells using openpyxl Translator with loops

We observe that the formula has been correctly updated up to cell ‘F5’. In this way, we have learnt how to update formulae using the Translator class of the openpyxl library and Python loops.
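The loop above reads column_letter from a cell object only to learn the letter of a column. A minimal sketch of an alternative, assuming you would rather not touch cells for that, uses openpyxl.utils.get_column_letter to turn a column number straight into its letter. The helper name fill_formula_right is hypothetical, and the workbook is created in memory, so nothing is written to disk.

```python
from openpyxl import Workbook
from openpyxl.formula.translate import Translator
from openpyxl.utils import get_column_letter


def fill_formula_right(ws, formula, origin, row, first_col, last_col):
    """Copy `formula` (written for cell `origin`) into `row`, columns
    first_col..last_col (1-based), shifting relative references."""
    translator = Translator(formula, origin=origin)
    for col in range(first_col, last_col + 1):
        target = f"{get_column_letter(col)}{row}"  # e.g. column 3 -> "C5"
        ws[target] = translator.translate_formula(target)


wb = Workbook()
ws = wb.active
ws["B5"] = "=ROUND(AVERAGE(B2:B4),0)"
fill_formula_right(ws, "=ROUND(AVERAGE(B2:B4),0)", "B5", row=5, first_col=3, last_col=6)

for letter in "BCDEF":
    print(letter + "5", ws[letter + "5"].value)
```

Each printed formula should refer to its own column (C2:C4 for C5, and so on up to F2:F4 for F5), which is exactly what dragging the formula in Excel would produce.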
Now, we shall try to make this information more useful by
making it visually appealing. In the next section, let us
unleash the functionality of openpyxl that enables us to
create charts.

Create charts using openpyxl


We shall use the table from the same workbook ‘Student
Table.xlsx’ to create charts. Let us plot a bar chart that
shows the average score obtained by each student.
Copy the code below into the Python editor:
import openpyxl
from openpyxl.chart import BarChart, Reference
objWorkbook = openpyxl.load_workbook('Student Table.xlsx')
objWorksheet = objWorkbook.worksheets[0]
# Row 5, columns B to F: the average score of each student
values = Reference(objWorksheet, min_col=2, min_row=5, max_col=6, max_row=5)
# Row 1, columns B to F: the student names
titles = Reference(objWorksheet, min_col=2, min_row=1, max_col=6, max_row=1)
chart = BarChart()
# from_rows=True makes the whole row a single data series
chart.add_data(values, from_rows=True)
chart.set_categories(titles)
objWorksheet.add_chart(chart, "H7")
objWorkbook.save("Student Table.xlsx")
First, we import the BarChart and Reference classes from openpyxl.chart.
A Reference object tells the chart which cells hold the data and which hold the labels.
We create a variable named ‘values’ that refers to the data we want to plot in the bar chart. In this case, it is row 5 from column B to column F, which holds the averages we are interested in. We specify this with the parameters min_col=2, min_row=5, max_col=6, max_row=5.
We also want to use the first row from column B to column F, which holds the names of the students ‘A’, ‘B’, ‘C’, ‘D’ and ‘E’, as labels. We do this by creating a variable named ‘titles’ with the parameters min_col=2, min_row=1, max_col=6, max_row=1.
Thereafter, we create the chart variable as a BarChart object and pass values to its add_data method. Because the data lies in a single row, we also pass from_rows=True so that openpyxl builds one series from the whole row instead of one series per column.
We then set the labels for the X axis with the set_categories method, passing the titles variable as the parameter.
Finally, we use the add_chart method of the worksheet object to add the chart to the worksheet and specify where to place it. In this case, we have specified H7 as the cell at which the top-left corner of the chart is anchored.
After running the code, we open the workbook Student
Table.xlsx. We observe that the bar chart has been created
as shown in the Figure 5.10 below:

Figure 5.10: Bar Chart showing the average score of each student
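openpyxl's add_data(data, from_rows=False, titles_from_data=False) decides how a range is split into series: by default each column becomes its own series, while from_rows=True turns each row into one series. Because the averages sit in a single row, the one-series form is what gives one bar per student under a single legend entry. The sketch below shows the difference on an in-memory workbook; the helper name and the scores are illustrative placeholders, not the book's data.

```python
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference


def build_average_bar_chart(ws, data_row=5, label_row=1, first_col=2, last_col=6):
    """Bar chart with ONE series taken from a single worksheet row."""
    values = Reference(ws, min_col=first_col, max_col=last_col,
                       min_row=data_row, max_row=data_row)
    labels = Reference(ws, min_col=first_col, max_col=last_col,
                       min_row=label_row, max_row=label_row)
    chart = BarChart()
    chart.add_data(values, from_rows=True)  # the whole row -> one series
    chart.set_categories(labels)            # student names on the X axis
    return chart


wb = Workbook()
ws = wb.active
# Placeholder data: names in B1:F1, averages in B5:F5
for col, (name, score) in enumerate(zip("ABCDE", [70, 80, 65, 90, 75]), start=2):
    ws.cell(row=1, column=col, value=name)
    ws.cell(row=5, column=col, value=score)

chart = build_average_bar_chart(ws)
ws.add_chart(chart, "H7")
print(len(chart.series))  # 1

# Same range with the default from_rows=False: one series per column
per_column = BarChart()
per_column.add_data(Reference(ws, min_col=2, max_col=6, min_row=5, max_row=5))
print(len(per_column.series))  # 5
```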

Styling a chart
The chart that we have created in the previous exercise is a
basic chart. However, we would also want to explore the
functionality that openpyxl provides in adding various styles
to the chart like color, border, and format.
Copy the code below into the Python editor:
import openpyxl
from openpyxl import load_workbook
from openpyxl.chart import BarChart,Reference
objWorkbook = openpyxl.load_workbook('Student Table.xlsx')
objWorksheet = objWorkbook.worksheets[0]
chart = objWorksheet._charts[0]
chart.style = 8
objWorkbook.save("Student Table.xlsx")
Close the workbook ‘Student Table.xlsx’ and run the code.
On opening the workbook, we observe from the Figure 5.11
below that the chart color has been changed:

Figure 5.11: Exploring different chart colors using the ‘style’ attribute

Here, we have accessed the chart through the _charts attribute of the worksheet and used the index 0, as there is just one chart in the worksheet. (The leading underscore marks _charts as an internal attribute of openpyxl rather than a documented public one.) Thereafter, we assigned the value 8 to the style attribute of the chart object to obtain the color scheme shown in Figure 5.11. Each number corresponds to one of Excel's built-in chart styles; exploring them is left to the reader.
Let us also add titles to the X axis and Y axis so that it is easy to see that they correspond to students and their scores respectively. Close the Student Table.xlsx workbook and run the code below in the editor. We observe that the X axis and Y axis have been assigned the labels Students and Scores respectively (refer to Figure 5.12):
import openpyxl
from openpyxl import load_workbook
from openpyxl.chart import BarChart,Reference
from openpyxl.chart.layout import Layout, ManualLayout
objWorkbook = openpyxl.load_workbook('Student Table.xlsx')
objWorksheet = objWorkbook.worksheets[0]
chart = objWorksheet._charts[0]
chart.style = 8
chart.x_axis.title = 'Students'
chart.y_axis.title = 'Scores'
chart.legend.position = 'tr'
objWorkbook.save("Student Table.xlsx")

Figure 5.12: Adding Labels to the X Axis and Y Axis

Here, we have used the title attribute of the x_axis and y_axis objects of the chart to assign labels to the axes. We have also used the position attribute of the chart's legend object to set where the Series 1 legend is placed. We have assigned it the value tr, which stands for top right, and as Figure 5.12 shows, the legend has indeed moved to the top right.
Finally, let us add a title to the chart. We just need to add
one extra line of code which is
chart.title = "Average Scores of Students".
Close the workbook Student Table.xlsx and run the code
below in the Python editor:
import openpyxl
from openpyxl import load_workbook
from openpyxl.chart import BarChart,Reference
from openpyxl.chart.layout import Layout, ManualLayout
objWorkbook = openpyxl.load_workbook('Student Table.xlsx')
objWorksheet = objWorkbook.worksheets[0]
chart = objWorksheet._charts[0]
chart.style = 8
chart.x_axis.title = 'Students'
chart.y_axis.title = 'Scores'
chart.legend.position = 'tr'
chart.title = "Average Scores of Students"
objWorkbook.save("Student Table.xlsx")
Open the workbook Student Table.xlsx and observe the
chart.
We find that the chart now has a title Average Scores of
Students as shown in the Figure 5.13 below:

Figure 5.13: Title ‘Average Scores of Students’ added to the chart
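The styling steps in this section can be gathered into one reusable function. A minimal sketch follows; the function name style_chart is hypothetical. It works on any openpyxl chart object, whether freshly created as below or one read back from a worksheet, such as objWorksheet._charts[0] (keeping in mind that _charts is an internal attribute).

```python
from openpyxl.chart import BarChart


def style_chart(chart, title, x_title, y_title, style=8, legend_position="tr"):
    """Apply this section's styling steps to an openpyxl chart object."""
    chart.style = style                      # one of Excel's built-in chart styles
    chart.title = title                      # chart title
    chart.x_axis.title = x_title             # X axis label
    chart.y_axis.title = y_title             # Y axis label
    chart.legend.position = legend_position  # 'tr' places the legend top right
    return chart


chart = style_chart(BarChart(), "Average Scores of Students", "Students", "Scores")
print(chart.style, chart.legend.position)
```

Keeping the styling in one function means every chart in a report gets the same look, and a change of style number or legend position is made in a single place.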

With this, we complete our set of tutorials on charts! The exercises that we went through should give the reader a sufficient understanding to further explore the functionality that openpyxl provides for creating and styling charts.
We have seen how openpyxl is an extremely useful library that gives one the ability to read, edit, transform and analyze Excel spreadsheets. There are numerous other modules in openpyxl that provide additional functionality to the user, and the foundation laid in this chapter should give the reader a sufficient basis to explore them independently.
With this, we shall now proceed to explore two other open-source Python libraries that can be used in place of openpyxl to automate regular tasks in Excel: xlwings and xlsxwriter.

Other Python libraries for Excel automation
We shall explore the xlwings and xlsxwriter libraries in this
section.
• xlwings: To use this library, one first needs to install it using the command below:
pip install xlwings
This library is especially useful when it comes to Excel formulae. Let us insert formulae into the same Student Table.xlsx workbook.
In cell G2 of the workbook, let us insert the formula =ROUND(AVERAGE(B2:F2),0), which calculates the average score in a particular subject across all students. We also want to drag this formula down to cell G4. Copy the code below into the Python editor:
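import xlwings as xlw
# Start a hidden Excel instance; add_book=False avoids an extra blank workbook
obj_excel_app = xlw.App(visible=False, add_book=False)
# Open the workbook in that hidden instance
objWorkbook = obj_excel_app.books.open('Student Table.xlsx')
objWorksheet = objWorkbook.sheets[0]
formula = '=ROUND(AVERAGE(B2:F2),0)'
objWorksheet.range("G2,G3,G4").formula = formula
objWorkbook.save()
objWorkbook.close()
# Terminate the Excel instance created above
obj_excel_app.kill()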

Note: Here, we first initialize a variable called obj_excel_app to the App object of xlwings and set its visible property to False. xlwings works by actually starting an Excel application and opening the workbook in real time while it performs tasks on the workbook. This is unlike openpyxl, where everything can be executed without opening the Excel application. If we set the visible property to True, which is the default, the user can watch the workbook being opened by xlwings.

Observe that xlwings provides a simple way to insert a formula: we just assign the formula string to the formula property of the range object. In the range object, we simply specify which cells we want to assign the formula to; in this case, G2, G3 and G4.
On running the code, we find that the formulae have been successfully assigned to cells G2, G3 and G4, as shown in Figure 5.14:

Figure 5.14: Using xlwings to calculate average across a subject by all students

Lastly, we save and close the workbook. Finally, we also kill the Excel application object obj_excel_app that we created at the beginning. This step is necessary because the Excel application object remains open in the background even after the workbook is closed, which might cause issues when trying to manually open another workbook.
In this way, we have learnt how xlwings provides a simple way to insert formulae into an Excel worksheet.
• xlsxwriter: The first step to using this library is to install it using the command below:
pip install xlsxwriter
Let us try a sample task in xlsxwriter: entering text in the first cell of the worksheet. Copy and run the code below in the Python editor:
import xlsxwriter as xr
objWorkbook = xr.Workbook('Learn_Writer.xlsx')
objWorksheet = objWorkbook.add_worksheet('Exercise')
objWorksheet.write('A1', 'This is my first tutorial in xlsxwriter!')
# The file is actually written to disk when close() is called
objWorkbook.close()
Figure 5.15 below shows the output obtained on running the code above:

Figure 5.15: Using xlsxwriter to write text in a worksheet

We find that we were able to write the text This is my first tutorial in xlsxwriter! in cell A1 of the worksheet named Exercise in the workbook Learn_Writer.xlsx, which we created using xlsxwriter. Here, the write method of the worksheet object is really helpful, as it accepts both the cell reference and the content to write as parameters.
It is important to note that xlsxwriter can only be used to create new Excel workbooks; it cannot open and edit workbooks that already exist. However, it is a very useful library when it comes to speed and memory usage. It also provides very useful features like formatting, autofilters, inserting images and creating charts. In fact, it supports more Excel features than any of the other Python libraries for Excel automation.
Another big advantage of this library is that the files it produces have a high degree of fidelity with files produced by Microsoft Excel; in most cases, they are 100% equivalent. Hence, apart from the one shortcoming that it cannot edit existing Excel workbooks, this library is great for Excel automation on all other fronts.
In this section, we explored the xlwings and xlsxwriter libraries, each of which has its own advantages over openpyxl for specific requirements. In the next section, we shall go through a quick comparison table of these libraries, which the reader can use as a ready reference when deciding which library best suits a given Excel automation task.
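To make the xlsxwriter features mentioned above (cell formatting, filtering and charts) concrete, here is a minimal sketch. The function name write_scores_report, the file name xlsxwriter_demo.xlsx and the scores are hypothetical placeholders; only standard xlsxwriter calls (add_format, write_row, autofilter, add_chart, insert_chart) are used.

```python
import xlsxwriter


def write_scores_report(path):
    """Create a NEW workbook (xlsxwriter cannot edit existing ones) with a
    formatted header, an autofilter and a native Excel column chart."""
    workbook = xlsxwriter.Workbook(path)
    sheet = workbook.add_worksheet("Scores")
    header = workbook.add_format({"bold": True, "border": 1, "bg_color": "#DDEBF7"})
    cell = workbook.add_format({"border": 1})

    rows = [("A", 70), ("B", 80), ("C", 65), ("D", 90), ("E", 75)]  # placeholder data
    sheet.write_row(0, 0, ["Student", "Score"], header)
    for r, (name, score) in enumerate(rows, start=1):
        sheet.write_row(r, 0, [name, score], cell)

    # Filter drop-downs on the header row, covering A1:B6
    sheet.autofilter(0, 0, len(rows), 1)

    chart = workbook.add_chart({"type": "column"})
    chart.add_series({
        "name": "Score",
        "categories": ["Scores", 1, 0, len(rows), 0],  # A2:A6
        "values": ["Scores", 1, 1, len(rows), 1],      # B2:B6
    })
    sheet.insert_chart("D2", chart)
    workbook.close()  # the .xlsx file is written to disk here


write_scores_report("xlsxwriter_demo.xlsx")
```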

Comparison summary of Python libraries
Below is a table that compares all the libraries we have studied so far on a few salient parameters. Table 5.1 should be handy when deciding on the most suitable Python library for a specific Excel automation task:

| Parameter | openpyxl | xlwings | xlsxwriter |
| --- | --- | --- | --- |
| Launching Excel | Works at the backend without launching the Excel application. | Launches the Excel application while performing tasks. | Creates a new Excel workbook but works at the backend without launching the Excel application. |
| Unique advantages | Ability to create, read, edit and write Excel files with a variety of features. | Provides short and smart execution for specific Excel features like formulae. | Covers numerous Excel features that are not covered by other Python libraries. |
| Shortcomings | Runs slowly when working with large datasets or formulae. | It opens the Excel application while performing tasks; the user needs to set the ‘visible’ property of the App object created by xlwings to ‘False’ to hide it. | It cannot open or edit an already existing workbook; it can only be used to create fresh workbooks. |
Table 5.1: Comparison summary of Python libraries for Excel automation

With this, we have completed our learning and exercises on Excel automation with Python. Let us now apply what we have learnt to a practical use case and then conclude the chapter.

Practical use case in Python


Let us apply all that we learnt in the previous exercises to implement a practical use case in Python that makes use of all three libraries: xlsxwriter, openpyxl and xlwings.
Let us create a new Excel workbook called Stock Prices.xlsx, in which we shall analyze the prices of five stocks, ‘A’, ‘B’, ‘C’, ‘D’ and ‘E’, over the months of January to December.
To begin with, we need to create a new workbook and enter this data in a tabular format. We do this by creating a dictionary in Python and then using xlsxwriter to transfer the dictionary values to the Excel worksheet. Copy and run the code below in the Python editor:
import xlsxwriter as xw
objWorkbook = xw.Workbook('Stock Prices.xlsx')
objWorksheet = objWorkbook.add_worksheet('Prices')

dict_Stock_Prices = {
    'A': [357, 457, 187, 831, 779, 338, 129, 508, 407, 748, 511, 609],
    'B': [14908, 13408, 17103, 18886, 19828, 12098,
          17080, 16850, 15023, 12405, 15469, 13800],
    'C': [60, 64, 54, 93, 87, 74, 96, 92, 83, 85, 70, 88],
    'D': [1667, 1962, 1845, 1535, 1753, 1767,
          1551, 1893, 1715, 1707, 1627, 1532],
    'E': [2181, 2333, 2265, 2274, 2739, 2601,
          2569, 2520, 2744, 2234, 2836, 2230]}

month_list = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
stock_list = ['A', 'B', 'C', 'D', 'E']

# border 1 = thin border, border 2 = medium (thicker) border
objFormat_1 = objWorkbook.add_format({'border': 1})
objFormat_2 = objWorkbook.add_format({'border': 2})

# Stock names down column A (from A2), month names across row 1 (from B1)
objWorksheet.write_column('A2', stock_list, objFormat_2)
objWorksheet.write_row('B1', month_list, objFormat_2)

# Zero-based indices: row 1 is Excel row 2, column 1 is column B
row = 1
for stock in dict_Stock_Prices:
    stock_prices = dict_Stock_Prices[stock]
    objWorksheet.write_row(row, 1, stock_prices, objFormat_1)
    row += 1

objWorkbook.close()
Here, we first prepare a dictionary whose keys are the stock names ‘A’, ‘B’, ‘C’, ‘D’, ‘E’ and whose values are lists containing the stock prices from Jan to Dec.
We also prepare two lists: month_list, containing the month names, and stock_list, containing the stock names.
Next, we create two format variables, objFormat_1 and objFormat_2, which specify the cell borders and are passed as arguments to the write methods of the worksheet object. Here, 'border': 1 is a thin border, while 'border': 2 is a medium (bolder) border.
Next, we call the write_column and write_row methods of the worksheet object and pass the variables stock_list and month_list respectively, along with the border format variable objFormat_2.
It is interesting to note how xlsxwriter provides useful methods like write_row and write_column that let one pass an entire row or column of data at once instead of looping through individual values. Likewise, we saw how xlsxwriter provides excellent formatting features, such as creating regular and bold borders, with a single line of code.
At the end of the code, we use the write_row method while looping through every stock to enter its monthly values. Here, we use objFormat_1, which is the regular border.
After running the code, we open the saved workbook and
observe that the dataset that we require has been created
in the tabular format as shown in the Figure 5.16 below:

Figure 5.16: Raw data entered in a tabular format using ‘xlsxwriter’ library of Python.

In this way, we leveraged the special features of xlsxwriter to create a new workbook with a raw data table and format it with borders.
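One way to confirm that the table landed where the later steps expect it (stock names in column A, months in row 1, prices from B2 onwards) is to read it back with openpyxl. The sketch below assumes both libraries are installed; the function names and the file name prices_check.xlsx are hypothetical, and the data is a shortened subset (two stocks, three months) of the dictionary above.

```python
import xlsxwriter
from openpyxl import load_workbook


def write_prices(path, prices, months):
    """Same layout as the exercise: names in column A, months in row 1,
    one row of monthly prices per stock starting at column B."""
    workbook = xlsxwriter.Workbook(path)
    sheet = workbook.add_worksheet("Prices")
    sheet.write_column("A2", list(prices))
    sheet.write_row("B1", months)
    for row, values in enumerate(prices.values(), start=1):
        sheet.write_row(row, 1, values)
    workbook.close()


def read_prices(path):
    """Read the table back with openpyxl into {stock: [prices, ...]}."""
    sheet = load_workbook(path)["Prices"]
    result = {}
    for name, *values in sheet.iter_rows(min_row=2, values_only=True):
        result[name] = list(values)
    return result


sample = {"A": [357, 457, 187], "B": [14908, 13408, 17103]}
write_prices("prices_check.xlsx", sample, ["Jan", "Feb", "Mar"])
print(read_prices("prices_check.xlsx") == sample)  # True
```

Because xlsxwriter cannot read files, pairing it with openpyxl like this is a convenient way to check its output programmatically.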
Now, we shall use xlwings on the same workbook to perform calculations using Excel formulae. We shall calculate the average price of each stock over the year and insert the values in column N. Copy and run the code below in the editor:
import xlwings as xlw
# Hidden Excel instance without an extra blank workbook
obj_excel_app = xlw.App(visible=False, add_book=False)
# Open the workbook in that hidden instance
objWorkbook = obj_excel_app.books.open('Stock Prices.xlsx')
objWorksheet = objWorkbook.sheets[0]
formula = '=ROUND(AVERAGE(B2:M2),0)'
objWorksheet.range("N2,N3,N4,N5,N6").formula = formula
objWorkbook.save()
objWorkbook.close()
obj_excel_app.kill()
We have performed a similar exercise in the earlier section
where we discussed the xlwings library. We do the same
here. After running the code, we open the workbook Stock
Prices.xlsx and observe that the average values have been
calculated and inserted in column N as shown in Figure 5.17
below:

Figure 5.17: Average values of stocks calculated and populated in column N using xlwings.
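If you want to cross-check the values in column N from Python, note a subtle difference: Python's built-in round() rounds exact halves to the nearest even number (round(2.5) gives 2), whereas Excel's ROUND rounds halves away from zero (2.5 becomes 3). A minimal sketch that mimics =ROUND(AVERAGE(...),0) uses the decimal module, whose ROUND_HALF_UP mode also rounds ties away from zero. The function name is hypothetical; the price lists are taken from the dictionary in this use case.

```python
from decimal import Decimal, ROUND_HALF_UP


def excel_round_average(values):
    """Mimic Excel's =ROUND(AVERAGE(range),0) for a list of integers."""
    average = Decimal(sum(values)) / Decimal(len(values))  # exact, no float error
    # ROUND_HALF_UP: ties go away from zero, like Excel's ROUND
    return int(average.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


prices = {
    "A": [357, 457, 187, 831, 779, 338, 129, 508, 407, 748, 511, 609],
    "C": [60, 64, 54, 93, 87, 74, 96, 92, 83, 85, 70, 88],
}
for stock, values in prices.items():
    print(stock, excel_round_average(values))  # A 488, then C 79

print(round(2.5), excel_round_average([2, 3]))  # 2 3
```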

Finally, we shall use openpyxl to plot a pie chart of the average stock prices in column N. Copy and run the code below in the editor:
import openpyxl
from openpyxl.chart import PieChart, Reference
objWorkbook = openpyxl.load_workbook('Stock Prices.xlsx')
objWorksheet = objWorkbook.worksheets[0]
# Stock names in A2:A6 become the slice labels
labels = Reference(objWorksheet, min_col=1, min_row=2, max_row=6)
# Average prices in N2:N6 (column 14) give the slice sizes
data = Reference(objWorksheet, min_col=14, min_row=2, max_row=6)
chart = PieChart()
chart.add_data(data, titles_from_data=False)
chart.set_categories(labels)
chart.title = "Average Price of Stocks"
objWorksheet.add_chart(chart, "C10")
objWorkbook.save("Stock Prices.xlsx")
Open the file Stock Prices.xlsx after running the code. We find that the pie chart has been created for the data in column N (refer to Figure 5.18):

Figure 5.18: Pie chart of the average price of stocks plotted using openpyxl.
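A pie chart is often easier to read when each slice shows its share of the total. The sketch below builds the same kind of pie chart in a fresh in-memory workbook, sets the title once before adding the chart, and additionally turns on percentage labels through openpyxl's DataLabelList (an addition that is not part of the exercise above). The helper name, file name pie_demo.xlsx and the values are illustrative placeholders, not the book's results.

```python
from openpyxl import Workbook
from openpyxl.chart import PieChart, Reference
from openpyxl.chart.label import DataLabelList


def build_pie(ws, label_col, value_col, first_row, last_row, title):
    """Pie chart with slice labels from label_col and sizes from value_col."""
    labels = Reference(ws, min_col=label_col, min_row=first_row, max_row=last_row)
    data = Reference(ws, min_col=value_col, min_row=first_row, max_row=last_row)
    chart = PieChart()
    chart.add_data(data, titles_from_data=False)
    chart.set_categories(labels)
    chart.title = title                  # set once, before the chart is added
    chart.dataLabels = DataLabelList()
    chart.dataLabels.showPercent = True  # label each slice with its share
    return chart


wb = Workbook()
ws = wb.active
# Placeholder values: names in A2:A6, averages in N2:N6
for row, (name, avg) in enumerate(zip("ABCDE", [100, 200, 300, 400, 500]), start=2):
    ws.cell(row=row, column=1, value=name)
    ws.cell(row=row, column=14, value=avg)

pie = build_pie(ws, label_col=1, value_col=14, first_row=2, last_row=6,
                title="Average Price of Stocks")
ws.add_chart(pie, "C10")
wb.save("pie_demo.xlsx")
```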

This practical use case illustrates how specific Python libraries can be chosen selectively and conveniently to meet specific objectives. With this, we end this chapter on Excel automation using Python. It has surely been a long and demanding chapter, but an equally engaging one!

Conclusion
Since Excel spreadsheets are at the heart of most office work, and a large share of business data is imported into and maintained in spreadsheets, spreadsheet automation is a great way to kick-start one's journey in Python automation. We hope this chapter has achieved that objective!
Our next target is to delve into the world of messaging. With the advent of social media, communication now happens through many channels, most notably WhatsApp, alongside Gmail, the most popular email platform. It is going to be exciting to see how Python can automate messaging on these platforms, which we will learn about in the next chapter.

Join our book’s Discord space


Join the book’s Discord Workspace for Latest updates,
Offers, Tech happenings around the world, New Release and
Sessions with the Authors:
https://discord.bpbonline.com
Chapter 6
Automating Emails and Messaging

Introduction
The rise of digital communication and social media has created a need for instant messaging. Gmail has been the most widely used web-based email service for personal and light professional use, and this chapter starts with a basic demonstration of automating email creation and sending in Gmail using Python. The last decade has also witnessed the rise of WhatsApp as the most sought-after messaging app; it has almost displaced regular SMS and become an indispensable part of people's lives thanks to its quick messaging features for individuals and groups. This chapter will shed some light on automating the sending of WhatsApp messages to individuals and groups using the third-party PyWhatKit library for Python. The chapter then ends with a practical use case in Python.

Structure
The chapter covers the following topics:
• Prerequisites for Gmail automation
∘ Turning on 2-step verification
∘ Getting app password
∘ Sending a Gmail message
• Automating WhatsApp messaging
• Practical use case in Python

Objectives
By the end of this chapter, we shall master the essentials of
email and messaging automation using Python. We shall
learn to send emails with Gmail and messages on WhatsApp
through hands-on examples. Equipped with these skills, we
shall be ready to apply Python’s automation capabilities to
streamline communication tasks for personal and
professional use.

Prerequisites for Gmail automation


To automate creating emails and sending them through Gmail, we will use Python's built-in email, ssl and smtplib modules. However, certain settings need to be activated in Gmail before we proceed to the actual automation in Python. The subsequent sections discuss those.

Turning on 2-step verification for Gmail
Follow the steps below to turn on 2-step verification for Gmail:
1. Go to the site https://myaccount.google.com/. The left side of the page displays a list of options. Select the Security option, as highlighted in Figure 6.1:

Figure 6.1: Options on website https://myaccount.google.com/

2. Scroll down until you come across the section How you sign in to Google. The very first option in the list is 2-Step Verification; click on that option (refer to Figure 6.2):

Figure 6.2: How you sign-in to Google

3. After you click that option, another page loads. Click on the Get started button at the bottom. You will be asked to enter your password to sign in (refer to Figure 6.3):

Figure 6.3: Gmail 2 Step Verification page

4. If it is the first time, it might ask for a phone number and offer radio buttons to choose whether to receive the verification code through Text message or Phone call, as shown in Figure 6.4. Enter the appropriate details and press NEXT:
Figure 6.4: Gmail 2 Step Verification details

5. Enter the verification code that you received through the text message or phone call and press NEXT (refer to Figure 6.5):
Figure 6.5: Verification code

6. Once the verification is done, you will see a window with an option at the bottom right that says Turn On. Click on that option (refer to Figure 6.6):

Figure 6.6: Gmail 2 step verification final window

7. You will be asked to enter your password once again. After entering the password, you will see a page with the Turn On button, as shown in Figure 6.7 and Figure 6.8. Click on the Turn On button:

Figure 6.7: Turn on Gmail 2 Step Verification

Figure 6.8: Gmail 2 Step Verification done

8. Finally, once 2-Step Verification has been turned on, you will see the option Turn Off, as shown in the figure below, which confirms that 2-Step Verification is now active for the Gmail account.

Note: Some of the figures in this section show only a selected portion of the actual page. This has been done for security reasons, to conceal the actual credentials, such as the email address, password and phone number, used while performing this exercise.

Getting app password


To send email through Gmail from Python code, we do not use the regular password that we use while logging in. Instead, we use an app password that Gmail provides; app passwords are available only once 2-Step Verification is turned on, which is why we enabled it first. To get the app password, follow the steps below:
1. Go to the URL https://myaccount.google.com/apppasswords. You will see a dropdown called Select App; select its last option, Other (Custom Name) (refer to Figure 6.9):

Figure 6.9: App passwords

2. You will see a textbox asking you to enter a name for the app. Enter any name, such as Python_Learning (refer to Figure 6.10):
Figure 6.10: Enter name for app

3. Click Generate. A page appears with the password shown in the yellow block on the top right-hand side. The password has been covered in the image, and every user gets a unique password. Copy the password and save it somewhere, as we will need it in our code (refer to Figure 6.11):

Figure 6.11: App password for Gmail account


Sending a Gmail message using Python
After following all steps in the previous exercises, we have
finally arrived at the section where we would be writing the
Python code!
Copy the code below into the Python editor:
from email.message import EmailMessage
import ssl as objSSL
import smtplib

strSender = 'XXXXXXXX@gmail.com'
strReceiver = 'XXXXXXXX@gmail.com'
strPassword = 'XXXXXXXX'

strSubject = 'Learning Email Sending with Python'
strBody = """ Learning how to send emails using Python. """

objEmail = EmailMessage()
objEmail['From'] = strSender
objEmail['To'] = strReceiver
objEmail['Subject'] = strSubject
objEmail.set_content(strBody)

objContext = objSSL.create_default_context()

# The with block closes the connection to the server when it finishes
with smtplib.SMTP_SSL('smtp.gmail.com', 465, context=objContext) as objSMTP:
    objSMTP.login(strSender, strPassword)
    objSMTP.sendmail(strSender, strReceiver, objEmail.as_string())
Most of the code is self-explanatory. For the string variable strSender, enter the Gmail ID for which you performed the 2-step verification and generated the app password. For strReceiver, enter the email ID of the person to whom you want to send the message, and for strPassword, enter the app password that we generated and saved in the previous section.
We use Python's ssl module to secure the connection: ssl.create_default_context() returns a context that encrypts the traffic and verifies the server's certificate, so we know we are really talking to Gmail's server. SSL stands for Secure Sockets Layer; in practice, the connection uses TLS, its modern successor, although the module keeps the older name.
To send the mail, we use the smtplib module, which implements the Simple Mail Transfer Protocol (SMTP). We create objSMTP as an instance of the SMTP_SSL class of the smtplib module, specifying smtp.gmail.com as the email server, 465 as the port and objContext as the context. Finally, we call the login method with the sender's email ID and the app password, and the sendmail method with the sender, the receiver and the message, to send the email.
Having completed this successfully, we are done with
everything required to send an email through Gmail!
To test this yourself, you might use another Gmail ID of your own as the receiver and check, after running the code, whether the email has arrived in that inbox.
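A common refinement is to keep the app password out of the source code and to separate building the message from sending it. The sketch below is one way to do that, not the chapter's method: the environment variable names GMAIL_ADDRESS and GMAIL_APP_PASSWORD and both function names are hypothetical, and it uses send_message, which reads the From and To addresses from the message headers instead of taking them as separate arguments like sendmail.

```python
import os
import smtplib
import ssl
from email.message import EmailMessage


def build_message(sender, receiver, subject, body):
    """Assemble a plain-text email with the usual headers."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = receiver
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_gmail(msg, sender, app_password):
    """Send over SMTP with implicit TLS on port 465; the with block closes
    the connection even if login or sending raises an exception."""
    context = ssl.create_default_context()  # verifies the server certificate
    with smtplib.SMTP_SSL("smtp.gmail.com", 465, context=context) as smtp:
        smtp.login(sender, app_password)
        smtp.send_message(msg)


# Only send when both (hypothetical) environment variables are set
sender = os.environ.get("GMAIL_ADDRESS")
password = os.environ.get("GMAIL_APP_PASSWORD")
if sender and password:
    send_via_gmail(build_message(sender, sender, "Test from Python", "Hello!"),
                   sender, password)
```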
Now that we are familiar with creating and sending emails
through Gmail, let us move forward to another exciting topic
on automation of WhatsApp messaging!
Automating WhatsApp messaging
The third-party PyWhatKit library for Python can be conveniently used to automate sending WhatsApp messages. Before we start using this library, we need to install it using the command below:
pip install pywhatkit
Next, type the code below into the Python editor:
import pywhatkit
pywhatkit.sendwhatmsg_instantly("XXXXXXXXXXXXX", "Hello")
The code is short and contains just two lines. In the first line, we import the pywhatkit library, and in the second line, we call the sendwhatmsg_instantly method, which takes two arguments: the phone number of the receiver (including the country code, prefixed with +) and the message to be sent.
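pywhatkit expects the phone number in international format with a leading "+", which is why the use case later in this chapter prepends "+" to the number read from Excel. Numbers coming from a spreadsheet are often ints or floats (for example 919876543210.0) or contain spaces and dashes, so a small normalising helper can help. This is a minimal sketch under the assumption that the stored number already includes the country code; the function name and the sample numbers are hypothetical.

```python
def to_whatsapp_number(raw):
    """Normalise a phone number that already includes the country code
    into '+<digits>'. Accepts str, int or float (as read from Excel)."""
    if isinstance(raw, float):
        raw = int(raw)  # e.g. 919876543210.0 -> 919876543210
    digits = "".join(ch for ch in str(raw) if ch.isdigit())
    if not digits:
        raise ValueError(f"no digits in phone number: {raw!r}")
    return "+" + digits


print(to_whatsapp_number("+91 98765-43210"))  # +919876543210
print(to_whatsapp_number(919876543210.0))     # +919876543210
```

The result can then be passed as the first argument, for example pywhatkit.sendwhatmsg_instantly(to_whatsapp_number(value), "Hello").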
On running the code, the WhatsApp Web page loads in the browser and, the first time, asks the user to log in by scanning a QR code. To do so, open WhatsApp on your phone, go to Settings, select Linked Devices and tap the Link a Device button. This opens a scanner that lets you scan the QR code and log in to WhatsApp Web.

Note: This needs to be done only the first time, since the user will automatically be logged in to WhatsApp Web the next time the code runs.

We have now sent a WhatsApp message to a contact using Python! However, WhatsApp groups are used just as frequently as messages to individual contacts, so it is equally important to know how to send a WhatsApp message to a group using Python code.
Fortunately, there are few differences. Instead of sendwhatmsg_instantly, we use the sendwhatmsg_to_group_instantly method and pass the group ID, rather than a phone number, as the first argument.
To get the group ID, follow the steps below. Note that you can fetch the group ID yourself only for groups that you created (where you are the admin). For other groups, you would need to ask the admin to fetch the group ID using the same steps:
1. Open the group chat.
2. Click on the three dots at the top right. A dropdown will appear; select Group Info (refer to Figure 6.12):
Figure 6.12: WhatsApp Group Chat options

3. On selecting Group Info, a page opens that contains information about the group and the names of all members. Scroll down to the bottom of the page, and when you see an option that says Invite to group via link, click on it (refer to Figure 6.13):

Figure 6.13: WhatsApp Group Info options

4. On clicking that option, a page loads where you will see a link in the format https://chat.whatsapp.com/XXXXXXXXXXXXXXXXXXXXXX. The last part, XXXXXXXXXXXXXXXXXXXXXX, is the ID we are looking for. It is an alphanumeric string consisting of 22 characters.
Now that we have the group ID, the next step is to execute the piece of code below in the Python editor:
import pywhatkit
pywhatkit.sendwhatmsg_to_group_instantly("XXXXXXXXXXXXXXXXXXXXXX", 'Hello!')
On running the above code, you will find that the message Hello! has been sent to the WhatsApp group.
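Rather than copying the ID out of the invite link by hand, it can be extracted programmatically. A minimal sketch using only the standard library follows; the function name and the sample link are hypothetical, and the check is deliberately loose (alphanumeric, any length) rather than enforcing exactly 22 characters.

```python
from urllib.parse import urlparse


def group_id_from_invite(link):
    """Return the ID part of a https://chat.whatsapp.com/<ID> invite link."""
    parsed = urlparse(link.strip())
    if parsed.netloc != "chat.whatsapp.com":
        raise ValueError(f"not a WhatsApp invite link: {link!r}")
    group_id = parsed.path.strip("/").split("/")[-1]  # drops any ?query part
    if not group_id.isalnum():
        raise ValueError(f"unexpected group ID in link: {link!r}")
    return group_id


print(group_id_from_invite("https://chat.whatsapp.com/AbCdEfGhIjKlMnOpQrStUv"))
```

The returned ID can then be passed to pywhatkit.sendwhatmsg_to_group_instantly in place of the XXXX placeholder.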
We have now covered the process of automating individual
and group WhatsApp messaging through Python. Let us now
apply all that we have learnt in this chapter to implement a
practical use case in Python.

Practical use case in Python


In this use case, we would be taking a scenario where we
have a dataset consisting of a list of people with their
names, WhatsApp numbers, email addresses and dates of
birth. We need to write a program that would automatically
send birthday emails and WhatsApp birthday messages to
all those members in the list whose birthday falls on today’s
date.
1. Create a new Excel workbook and prepare a table in a new worksheet, in the format shown below. Enter the names, WhatsApp numbers, email addresses and dates of birth of a few contacts in the respective columns.

Note: As this is a test exercise, use only the contact numbers and email addresses of people you are closely associated with. You may also use your own alternative email IDs for testing. This precaution helps avoid unnecessary spamming.

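2. Figure 6.14 below provides a guideline for preparing a table to store names and contact details. Please follow the same format before running the code in the exercise. Enter each WhatsApp number with its country code but without a leading +, since the code adds the + itself: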

Figure 6.14: Format for preparing names and contact details

3. After entering data in the format above, save the Excel workbook with the name Data.xlsx. Thereafter, copy the code below into the Python editor:
# Importing required libraries
import time
import datetime
import smtplib
import ssl as objSSL
from email.message import EmailMessage

import pandas as pd
import pywhatkit

# Reading the Excel workbook into a dataframe
df = pd.read_excel("Data.xlsx")
# Making sure the Birthday column holds dates, not text
df['Birthday'] = pd.to_datetime(df['Birthday'])

today = datetime.datetime.now().strftime("%m-%d")
strSender = 'XXXXXXXXX'        # sender's Gmail address
strPassword = 'XXXXXXXXXXX'    # Gmail app password
strSubject = 'Happy Birthday!'
strBody = """Wish you a very happy birthday!"""

# Looping through every row of the table
for _, row in df.iterrows():
    birthday = row['Birthday'].strftime("%m-%d")

    # Checking if the birth date (month and day) is the same as today
    if birthday == today:
        strName = str(row['Name'])
        strWhatsAppNo = row['WhatsApp No']
        pywhatkit.sendwhatmsg_instantly("+" + str(strWhatsAppNo),
                                        "Happy birthday " + strName +
                                        "! This is a test message sent using Python.")
        time.sleep(15)

        strReceiver = row['Email ID']
        print(strWhatsAppNo)
        print(strReceiver)

        objEmail = EmailMessage()
        objEmail['From'] = strSender
        objEmail['To'] = strReceiver
        objEmail['Subject'] = strSubject
        objEmail.set_content(strBody)

        objContext = objSSL.create_default_context()
        # The with block closes the SMTP connection once the email is sent
        with smtplib.SMTP_SSL('smtp.gmail.com', 465,
                              context=objContext) as objSMTP:
            objSMTP.login(strSender, strPassword)
            objSMTP.sendmail(strSender, strReceiver, objEmail.as_string())
The code is largely self-explanatory thanks to the comments. Inside the loop over the rows is an if block. It checks whether the birthday stored in the current row matches today's date. Only the month and day are compared; the year of birth is ignored. If they match, the code sends a WhatsApp birthday message to the person's number. It also sends a corresponding birthday email to the email address in that row.
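Before sending real messages, it is worth checking the date-matching logic on its own. The following is a minimal sketch. It assumes each row is a dictionary with a Birthday date, such as the rows that df.to_dict("records") would produce. The helper name birthdays_on and the sample people are hypothetical.

```python
import datetime


def birthdays_on(people, day):
    """Return the people whose birth month and day match `day`.

    The year is ignored, which is equivalent to comparing the "%m-%d"
    strings used in the script above.
    """
    return [p for p in people
            if (p["Birthday"].month, p["Birthday"].day) == (day.month, day.day)]


# Hypothetical test data mirroring the Data.xlsx columns
people = [
    {"Name": "Asha", "Birthday": datetime.date(1990, 6, 15)},
    {"Name": "Ravi", "Birthday": datetime.date(1985, 12, 1)},
]

matches = birthdays_on(people, datetime.date(2024, 6, 15))
print([p["Name"] for p in matches])
```

A dry run like this prints who would be contacted without opening WhatsApp or logging in to Gmail. One design caveat: a 29 February birthday only matches in leap years with this comparison.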

Conclusion
This chapter has laid the foundation for automating Gmail
and WhatsApp which are commonly used mailing and
messaging tools respectively. We went through the
pywhatkit library and the SMTP protocol for Gmail. For the
curious Python programmer, there is still a lot to explore in
these two areas and experiment with!
In the next chapter, we shall explore the automation of PDF files and images; no discussion of automation would be complete without these two. PDF files are among the most common sources of raw data for automation and machine learning projects. Understanding how to extract data from PDF documents using Python is therefore of paramount importance. Images, in turn, are an unstructured data format that needs a different kind of workaround to extract data. So, without wasting any further time, let us turn to the next page and start this extremely interesting topic!

Join our book’s Discord space


Join the book’s Discord Workspace for Latest updates,
Offers, Tech happenings around the world, New Release and
Sessions with the Authors:
https://discord.bpbonline.com
Chapter 7
Working with PDFs and Images

Introduction
This chapter demonstrates reading data from PDF files and working with image files using Python. It begins with an introduction to the PyPDF library and takes the reader through functions for reading, merging, rotating, and splitting PDF files. It also discusses some limitations of the PyPDF library in specific cases. It then shows how to tackle those limitations using other Python libraries like Tabula and Textract. Thereafter, the reader is introduced to the PIL library in Python, which deals with reading and transforming image files. The chapter then continues with the very important topic of optical character recognition (OCR), which is implemented through the Pytesseract library. The last section discusses OpenCV, another popular Python package used for computer vision. The chapter ends with the demonstration of a practical use case in computer vision.

Structure
The chapter covers the following topics:
• PyPDF library
• Read a PDF file using PyPDF2
• Rotate and merge PDF files
• Working with images using the PIL library
• Optical character recognition
• Working with OpenCV
• Practical use case in Python

Objectives
This chapter shall equip you with the skills to adeptly
extract data and manipulate PDF files using Python. You will
gain hands-on experience with optical character recognition
to convert images to text, culminating in practical exercises
and a comprehensive use case that showcase Python’s
capabilities in processing digital documents and images.

PyPDF library
PyPDF is a Python package that allows one to read and transform PDF files. The original PyPDF was succeeded by PyPDF2, PyPDF3 and PyPDF4, all of which support Python 3. PyPDF2 was abandoned for a while but was later revived and maintained again. Its development has since continued under the package name pypdf, which keeps a very similar API. PyPDF4, on the other hand, is a separate fork that does not have full backward compatibility with PyPDF2. Hence, we shall focus only on PyPDF2 in this chapter.
In order to start using this library, the first step is its
installation. The library can be installed using the command
below:
pip install pypdf2
Read a PDF file using PyPDF2
We shall use the file Stock Prices.pdf for this exercise to
read data using Python. Copy the code below into the
Python editor:
import PyPDF2
pdfReader = PyPDF2.PdfReader('Stock Prices.pdf')
pageObj = pdfReader.pages[0]
print(pageObj.extract_text())
On running the code above, we get the output in the
console as shown below:
Below i s a table showing the price of five stocks A,B,C,D, and E from January to December. The first column
contains the stock names and the remaining columns contain the respective values of the stock from
January to December.

Stocks Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
A 357 457 187 831 779 338 129 508 407 748 511 609
B 14908 13408 17103 18886 19828 12098 17080 16850 15023 12405 15469 13800
C 60 64 54 93 87 74 96 92 83 85 70 88
D 1667 1962 1845 1535 1753 1767 1551 1893 1715 1707 1627 1532
E 2181 2333 2265 2274 2739 2601 2569 2520 2744 2234 2836 2230
Observe the above output carefully.
The first thing we notice is that the table at the bottom has lost the column lines that separate its columns. This makes the values hard to tell apart and the table difficult to read.
The second observation is that the output does give us the extracted text, but using it directly will cause problems. In the first sentence, Below i s a table..., the word is has been extracted incorrectly, with a space inserted between i and s. This can occasionally happen because the text in a PDF document is not stored as a continuous stream, as it is in a Word document. All that a PDF document knows is that it has a set of characters to be placed at certain positions relative to each other. This relative positioning can make it difficult for the parser to decide whether two characters belong to the same word or should be separated by a space. Each character in a PDF document is enclosed in a bounding box. The box's position is given by the X and Y coordinates of its lower-left and upper-right corners. Sometimes, the bounding boxes of two adjacent characters are further apart than is usual for adjacent characters. This makes it hard for the parser to tell that the characters belong to the same word.
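To make the bounding-box explanation concrete, here is a deliberately simplified illustration of how a parser might decide where spaces go. It is not PyPDF2's actual algorithm. The function join_characters, the (character, left, right) box format and the 0.25 gap ratio are all hypothetical choices for this sketch.

```python
def join_characters(chars, gap_ratio=0.25):
    """Rebuild one line of text from (character, x_left, x_right) boxes.

    Simplified heuristic (hypothetical): a space is inserted whenever the
    horizontal gap between two boxes exceeds gap_ratio times the width of
    the previous character.
    """
    text = ""
    prev_right = None
    prev_width = None
    for ch, left, right in sorted(chars, key=lambda c: c[1]):
        if prev_right is not None and left - prev_right > gap_ratio * prev_width:
            text += " "
        text += ch
        prev_right, prev_width = right, right - left
    return text


# Tightly spaced boxes are read as one word ...
print(join_characters([("i", 10, 13), ("s", 13.5, 18)]))
# ... but a slightly wider gap produces the "i s" seen in the PyPDF2 output
print(join_characters([("i", 10, 13), ("s", 15.5, 20)]))
```

The second call shows the failure mode described above. The characters are the same, but a wider gap between their boxes is enough to make a rule like this insert a space.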
Whatever the challenges, we need to find ways to address them. In such cases, alternative Python libraries can get the job done.
To address the first challenge, we can use a library called tabula, which is specially meant for extracting tables from PDF files. It is installed with pip install tabula-py and needs a Java runtime, since it wraps the Java tool tabula-java. Copy the code below into the Python editor:
from tabula import read_pdf
objTable = read_pdf("Stock Prices.pdf", pages="all")
df = objTable[0]
df.to_excel("Stock Prices.xlsx", index=False)
Here, we import the read_pdf function of the tabula library and pass it two arguments. The first argument is the name of the PDF document, Stock Prices.pdf. The second argument, pages, specifies which pages to read, and we pass all. This is a single-page document, but specifying all is good practice because it extracts tables from every page of the document. read_pdf returns a list of the tables it finds, which we store in the objTable variable.
The important part of this code is the creation of the df variable, which we obtain by indexing objTable. In this case, we have just one table, so we use objTable[0] to obtain the first table. Note that each element of this list is a pandas dataframe. This is extremely advantageous when we want to export or transform the data.
Here, we export the dataframe to an Excel file named Stock Prices.xlsx using the to_excel method of the dataframe object. On opening the Excel file, we see that the table has been neatly exported to a spreadsheet, as shown in the figure below:

Figure 7.1: Table from PDF exported to Excel document using tabula Python module
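Installing Java for tabula is not always an option. When the table layout is as regular as this one, the PyPDF2 text can sometimes be regrouped by counting tokens instead. Line wrapping does not change the order of the tokens. The following is a minimal sketch under strong assumptions: the table starts with the word Stocks, every row has a name followed by exactly twelve integer values, and nothing follows the table. The function parse_stock_table is hypothetical.

```python
def parse_stock_table(text, n_months=12):
    """Regroup whitespace-separated tokens into a header and data rows.

    Assumes (hypothetically) that the table begins with the token "Stocks",
    that each row is a name followed by n_months integers, and that the
    table runs to the end of the text. Line breaks are ignored by split().
    """
    tokens = text.split()
    start = tokens.index("Stocks")
    header = tokens[start:start + n_months + 1]
    body = tokens[start + n_months + 1:]
    rows = []
    for i in range(0, len(body), n_months + 1):
        name, *values = body[i:i + n_months + 1]
        rows.append([name] + [int(v) for v in values])
    return header, rows


# A wrapped excerpt, shortened to two months for readability
sample = "Stocks Jan Feb\nA 357 457\nB 14908\n13408"
header, rows = parse_stock_table(sample, n_months=2)
print(header)
print(rows)
```

This is far more fragile than tabula. A missing cell or an extra token shifts every following row, so treat it only as a fallback for simple, well-formed tables.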

To address the second challenge, we shall use yet another library. textract is a generic text-extraction library for Python. It can read text from many types of documents, such as files with the extensions .docx, .xlsx and .pdf. Copy the code below into the Python editor and note the output in the console:
import textract
text = textract.process('Stock Prices.pdf')
print(text)
Output:
b'Below is a table showing the price of five stocks A,B,C,D, and E from January to December. The first column\r\ncontains the stock names and the remaining columns contain the respective values of the stock from\r\nJanuary to December.\r\n\r\nStocks Jan\r\n\r\nFeb\r\n\r\nMar\r\n\r\nApr\r\n\r\nMay\r\n\r\nJun\r\n\r\nJul\r\n\r\nAug\r\n\r\nSep\r\n\r\nOct\r\n\r\nNov\r\n\r\nDec\r\n\r\nA\r\n\r\n357\r\n\r\n457\r\n\r\n187\r\n\r\n831\r\n\r\n779\r\n\r\n338\r\n\r\n129\r\n\r\n508\r\n\r\n407\r\n\r\n748\r\n\r\n511\r\n\r\n609\r\n\r\nB\r\n\r\n14908\r\n\r\n13408\r\n\r\n17103\r\n\r\n18886\r\n\r\n19828\r\n\r\n12098\r\n\r\n17080\r\n\r\n16850\r\n\r\n15023\r\n\r\n12405\r\n\r\n15469\r\n\r\n13800\r\n\r\nC\r\n\r\n60\r\n\r\n64\r\n\r\n54\r\n\r\n93\r\n\r\n87\r\n\r\n74\r\n\r\n96\r\n\r\n92\r\n\r\n83\r\n\r\n85\r\n\r\n70\r\n\r\n88\r\n\r\nD\r\n\r\n1667\r\n\r\n1962\r\n\r\n1845\r\n\r\n1535\r\n\r\n1753\r\n\r\n1767\r\n\r\n1551\r\n\r\n1893\r\n\r\n1715\r\n\r\n1707\r\n\r\n1627\r\n\r\n1532\r\n\r\nE\r\n\r\n2181\r\n\r\n2333\r\n\r\n2265\r\n\r\n2274\r\n\r\n2739\r\n\r\n2601\r\n\r\n2569\r\n\r\n2520\r\n\r\n2744\r\n\r\n2234\r\n\r\n2836\r\n\r\n2230\r\n\r\n\x0c'
Here, the table extracted at the bottom is not in a usable format. However, the table is not our concern in this step, since tabula already handles it. If we look at the first line of the output, we find that the word is is now extracted correctly. However, the result is a bytes object (note the b prefix) that needs to be decoded. The text also still contains carriage returns (\r) and newlines (\n). This shows that extracting text from a PDF is not straightforward. Whichever library we use, PyPDF2 or textract, some post-processing is usually needed.
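The post-processing mentioned above can be as small as decoding the bytes and collapsing the whitespace. The following is a minimal sketch that assumes the bytes are UTF-8 encoded. The helper name clean_extracted_text is hypothetical.

```python
import re


def clean_extracted_text(raw: bytes) -> str:
    """Decode extracted bytes and collapse all runs of whitespace.

    Assumes UTF-8 input. In Python's re module, \\s also matches \\r, \\n,
    tabs and the form feed (\\x0c) that marks the end of a page.
    """
    text = raw.decode("utf-8")
    return re.sub(r"\s+", " ", text).strip()


sample = b"Below is a table\r\ncontains\r\n\r\nStocks Jan\r\n\r\nFeb\r\n\r\n\x0c"
print(clean_extracted_text(sample))
```

With textract, this would be used as clean_extracted_text(textract.process('Stock Prices.pdf')). Collapsing everything to single spaces suits free-running prose. It deliberately throws away the line structure, so it is the wrong choice when that structure matters.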

Rotate and merge PDF files


Very often, we receive PDF files in landscape mode when we need them in portrait mode. This is when the ability to rotate a PDF comes in handy.
Copy the code below into the Python editor:
import PyPDF2

strFileName = 'Stock Prices.pdf'


strRotatedFileName = 'Stock Prices Rotated File.pdf'
intRotation = 90

objPDF = open(strFileName, 'rb')


objPDFReader = PyPDF2.PdfReader(objPDF)
objPDFWriter = PyPDF2.PdfWriter()
objPage = objPDFReader.pages[0]
objPage.rotate(intRotation)
objPDFWriter.add_page(objPage)
objRotatedFile = open(strRotatedFileName, 'wb')
objPDFWriter.write(objRotatedFile)
objPDF.close()
objRotatedFile.close()
What we are essentially doing here is rotating the first page of this PDF by 90 degrees clockwise and saving it as a new document named Stock Prices Rotated File.pdf. On running the code and opening the new file, we see that the page has indeed been rotated by 90 degrees clockwise, as shown in Figure 7.2 below:

Figure 7.2: PDF file page rotated by 90 degrees clockwise using Python code

Remember that the rotation is always clockwise. To rotate the page anticlockwise, simply add a minus sign to the rotation angle (for example, -90). In either case, the angle must be a multiple of 90 degrees; PyPDF2 raises an error otherwise.
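A quick way to reason about negative angles: rotating anticlockwise by 90 degrees leaves the page in the same orientation as rotating clockwise by 270 degrees. The following hypothetical helper, clockwise_equivalent, only illustrates that arithmetic. It is not part of PyPDF2.

```python
def clockwise_equivalent(angle):
    """Map any multiple of 90 to the equivalent clockwise angle (0, 90, 180 or 270)."""
    if angle % 90 != 0:
        raise ValueError("PDF page rotation must be a multiple of 90 degrees")
    return angle % 360  # Python's % keeps the result non-negative


print(clockwise_equivalent(-90))  # anticlockwise quarter turn
print(clockwise_equivalent(450))  # one full turn plus 90 degrees
```

So objPage.rotate(-90) and objPage.rotate(270) should leave the page facing the same way.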
Now that we are familiar with rotating a PDF, let us learn how to merge two PDF documents into a single document. We shall merge our original document, Stock Prices.pdf, with the newly created document, Stock Prices Rotated File.pdf. Copy the code below into the Python editor:
import PyPDF2

listPDFs = ['Stock Prices.pdf', 'Stock Prices Rotated File.pdf']
strMergedFileName = 'Stock Prices Merged File.pdf'
objPDFMerger = PyPDF2.PdfMerger()

# Append every PDF in the list, in order, to the merger
for objPDF in listPDFs:
    objPDFMerger.append(objPDF)

# Write the merged pages to a new file and release the open files
objPDFMerger.write(strMergedFileName)
objPDFMerger.close()
On running the code above, we find that a new file named
Stock Prices Merged File.pdf, has been created and the
file contains two pages. The first page is the page of the
original file Stock Prices.pdf and the second page is the
page of the file Stock Prices Rotated File.pdf.
In the above code, we have used the PdfMerger object of
the PyPDF2 module to get our task done. We create a list
called listPDFs and enter the original file and the rotated
file as elements of the list. The code then loops through
each of these files in the list and appends every file to the
PdfMerger object. At the end, we use the write method of
the PdfMerger object to save the merged pages to a new
file.

Working with images using the PIL library
The Python Imaging Library (PIL) is a third-party library for working with images. The original PIL is no longer maintained. Its actively maintained fork, Pillow, is the one we shall use in this section; it is still imported as PIL.
The image file that we will use can be found at the link below:
https://pixabay.com/photos/blue-tit-tit-bird-animal-feathers-7965696/
Please visit the link and download the image, which looks like the one shown in Figure 7.3 below:

Figure 7.3: Image of a bird that we would be using in our section to study

The image is of a beautiful bird, which we will use as our reference image file in the code. Once you visit the link, click on the Free Download button to download the file. Save it as Image of a Bird.jpg in the same location where you will save the Python script that follows.
Next, run the following command in the prompt in order to
be able to use the Pillow fork of the PIL library in our code:
pip install Pillow
While there are many things that one could do with images
using this Python library, in this section we shall focus only
on rotating an image and cropping an image.
To rotate an image, copy and run the code below into the
Python editor:
from PIL import Image
objImage = Image.open("Image of a Bird.jpg")
objImage = objImage.rotate(180)
objImage.save("Rotated Image of a Bird.jpg")
The code is quite self-explanatory. We import the Image module from the PIL library, open the image file using its open method, and save the opened image to a variable named objImage. Thereafter, we call the rotate method of objImage with 180 as the argument to rotate the image by 180 degrees. Note that, unlike PyPDF2, Pillow's rotate method turns the image counterclockwise. For a 180-degree rotation, the direction makes no difference. At the end, we save the image as a new file named Rotated Image of a Bird.jpg.
Now, we shall try to crop the image of the bird. Copy the
code below into the editor:
from PIL import Image
objImage = Image.open("Image of a Bird.jpg")
fltWidth,fltHeight = objImage.size
fltArea = (fltWidth/5, fltHeight/5, 4*fltWidth/5,
4*fltHeight/5)
objImage = objImage.crop(fltArea)
objImage.save("Cropped Image of a Bird.jpg")
Here, we obtain the width and height of the image using the size property and store them in the variables fltWidth and fltHeight respectively. The values depend on the size in which the image was downloaded. For a 1920 x 1280 image, the width would be 1920 and the height 1280. However, we do not have to worry about the actual size, since the crop area is computed from the width and height.
Next, we create a variable fltArea that holds the coordinates of the box used to crop the image. It contains the x and y coordinates of the box's top-left and bottom-right corners. We pass fltWidth/5, fltHeight/5 as the X and Y coordinates of the top-left corner. We pass 4*fltWidth/5, 4*fltHeight/5 as the X and Y coordinates of the bottom-right corner. Hence, we trim a fifth of the width from the left and right edges and a fifth of the height from the top and bottom, keeping the central part of the image.
After running the code, we find that a new file, Cropped Image of a Bird.jpg, has been created. It contains the cropped image shown in Figure 7.4 below:

Figure 7.4: Image of a Bird cropped using Python
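The crop box arithmetic from the bird example can be checked without opening any image. The following is a minimal sketch using the 1920 x 1280 size mentioned above. The helper name central_crop_box is hypothetical; it reproduces the fltWidth/5 ... 4*fltWidth/5 box.

```python
def central_crop_box(width, height, parts=5):
    """Return (left, top, right, bottom) keeping the middle of the image.

    One 1/parts slice is trimmed from each edge, matching the
    fltWidth/5 ... 4*fltWidth/5 box passed to Image.crop above.
    """
    left = width / parts
    top = height / parts
    right = (parts - 1) * width / parts
    bottom = (parts - 1) * height / parts
    return (left, top, right, bottom)


box = central_crop_box(1920, 1280)
print(box)  # (384.0, 256.0, 1536.0, 1024.0)
```

The kept region is 1152 x 768 pixels, that is, the central 60% of each dimension. In the script, this would be used as objImage.crop(central_crop_box(*objImage.size)).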

Now that we have learnt how to perform basic tasks with images using Python, let us familiarize ourselves with two very important topics in machine learning: optical character recognition and computer vision. The coming sections focus on these topics.

Optical character recognition


Optical character recognition (OCR), or text recognition, is a software technology that identifies text elements and characters inside an image or a scanned hard copy document. It converts them into machine-readable text that can be used for further processing. A common use of OCR is converting scanned PDF documents into text (.txt) files.
One of the most popular OCR tools used from Python is Tesseract, an open-source OCR engine. In this section, we shall walk through the usage and applications of Tesseract to perform simple OCR tasks.
To start using Tesseract, the first step is to download the Tesseract executable. On Windows, the installer can be downloaded from the location below:
https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-5.3.1.20230401.exe
At the time of writing, this link pointed to the latest 64-bit installer. Those who need other versions for 32-bit or 64-bit Windows can visit the link below for more options:
https://digi.bib.uni-mannheim.de/tesseract/
After downloading the setup, install it so that the executable ends up at the location below, which is the path used in the code of this section:
'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
Thereafter, we would want to install the Python wrapper for
tesseract using the command below:
pip install pytesseract
In this exercise, we shall take a sample image that contains text and try to extract that text using Pytesseract. We shall use the image Test_OCR.jpg, shown in Figure 7.5 below:
Figure 7.5: Image Test_OCR.jpg

Copy the code below into the Python editor:
import pytesseract
from PIL import Image

img = Image.open('Test_OCR.jpg')

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'

imagetext = pytesseract.image_to_string(img)

print(imagetext)
Let us try to understand the code step by step:
1. In the first two lines, we import the libraries that we require, namely pytesseract and the Image module of PIL.
import pytesseract
from PIL import Image
2. Next, we open the image using the open method of the Image module:
img = Image.open('Test_OCR.jpg')
3. Thereafter, an important line of code specifies the path where tesseract.exe has been installed. This step is required because pytesseract needs to know where the Tesseract executable is located:
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
4. Finally, we call the image_to_string function of the Pytesseract library with the image as an argument. We store the returned text in the variable imagetext and use print to display it in the console.
On running the code, we get the output shown below:
Learning to work with
Pytesseract
We have now successfully extracted text from an image. Note that this was a simple image, with black text on a plain white background. Things can get more complicated when both the background and the text are colored, which lowers the contrast between them. In such a scenario, we might have to preprocess the image a little for Tesseract to recognize the text.
We shall use the image Colored_Image_OCR.jpg for this
purpose, which resembles the one shown in the Figure 7.6
below:
Figure 7.6: Colored image to be applied to tesseract OCR

Let us run the same program that we ran previously, changing only the file name. Copy the code below into the Python editor:
import pytesseract
from PIL import Image

img = Image.open('Colored_Image_OCR.jpg')
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
imagetext = pytesseract.image_to_string(img)
print(imagetext)
On running the code above, we find that we get a blank
output. This means that tesseract was not able to figure out
any text in the image above that contains a colored
background and colored text.
We shall process the image to make it recognizable for Tesseract using the code below:
import pytesseract
from PIL import Image

img = Image.open('Colored_Image_OCR.jpg')
# Convert to grayscale ('L' mode) and save the processed copy
img = img.convert('L')
img.save('Processed_Colored_Image_OCR.jpg')
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
imagetext = pytesseract.image_to_string(Image.open('Processed_Colored_Image_OCR.jpg'))
print(imagetext)
On running the code above, we get the output below:
Learning to work with
Pytesseract
This shows that we have successfully extracted text from the image! How were we able to do this? If we read the code carefully, we see that just the two additional lines shown below provide the image processing that we require:
img = img.convert('L')
img.save('Processed_Colored_Image_OCR.jpg')
We call the convert method with L as its argument. In Pillow, mode L denotes an 8-bit grayscale (luminance) image. This call converts our color image to grayscale, retaining only the luminance of each pixel. The second line saves the result as Processed_Colored_Image_OCR.jpg. If we open this image, we find that it looks as shown in Figure 7.7 below:
Figure 7.7: Original Image processed to Grayscale

Pytesseract easily identifies the text within this grayscale image and gives us the desired text output. However, the example that we studied was a very simple case of image processing. Real-life images may be more complicated and may even contain noise that leads to erroneous results. In such cases, the required processing might be more complex and would need to be carried out in multiple steps.
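To make the L conversion concrete: Pillow's documentation states that converting a color image to mode L uses the ITU-R 601-2 luma transform, L = R * 299/1000 + G * 587/1000 + B * 114/1000. Pillow's own integer arithmetic may round slightly differently. A common next step in multi-step preprocessing is thresholding, which turns every gray pixel into pure black or white. The following pure-Python sketch illustrates both steps. The function names, the threshold of 128 and the sample pixel colors are hypothetical.

```python
def to_luma(r, g, b):
    """Grayscale value of an RGB pixel using the ITU-R 601-2 luma weights."""
    return round(r * 299 / 1000 + g * 587 / 1000 + b * 114 / 1000)


def binarize(value, threshold=128):
    """Map a grayscale value to pure black (0) or pure white (255)."""
    return 255 if value >= threshold else 0


# Hypothetical pixels: dark blue text on a light orange background
text_pixel = (30, 60, 150)
background_pixel = (250, 200, 120)

for name, pixel in [("text", text_pixel), ("background", background_pixel)]:
    gray = to_luma(*pixel)
    print(name, gray, binarize(gray))
# text 61 0
# background 206 255
```

In Pillow, the same rule could be applied to a whole image with img.convert('L').point(lambda p: 255 if p >= 128 else 0). A suitable threshold depends on the image, so 128 is only a starting point.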

Working with OpenCV


Open Source Computer Vision (OpenCV) is a library that is popularly used for computer vision applications; it is available in Python through the opencv-python package. Computer vision is a technology that enables computers to understand the content of images and videos. This helps them identify and classify the objects those images and videos contain. Contemporary technologies like face recognition and self-driving cars are the result of major developments in the field of computer vision. In this section, we shall understand some basic concepts of computer vision and then walk through a few Python examples in OpenCV.
Let us first try to understand how a computer represents images. The smallest unit of an image is a pixel. Images are basically of two kinds, as discussed below:
• Grayscale: These images contain only shades of gray, ranging from black to white. Each pixel holds a single value representing its intensity, typically from 0 (black) to 255 (white).
• RGB: These are colored images where each pixel is a combination of three values, one each for red, green and blue.
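The two kinds of images can be pictured as grids of numbers. In OpenCV, an image is a NumPy array. A grayscale image has shape (height, width), and a color image has shape (height, width, 3). One detail matters in practice: OpenCV's imread stores the three channels in B, G, R order rather than R, G, B. The following sketch mimics this with plain nested lists. The 2 x 2 pixel values and the bgr_to_rgb helper are hypothetical.

```python
# A 2 x 2 grayscale image: one intensity (0 = black, 255 = white) per pixel
gray = [[0, 128],
        [200, 255]]

# The same grid as a color image: three channel values per pixel,
# listed in the B, G, R order that OpenCV uses
bgr = [[(0, 0, 255), (0, 255, 0)],
       [(255, 0, 0), (255, 255, 255)]]


def bgr_to_rgb(pixel):
    """Reverse a (B, G, R) triple into (R, G, B) order."""
    b, g, r = pixel
    return (r, g, b)


print(bgr_to_rgb(bgr[0][0]))  # (255, 0, 0): pure red
```

For a whole OpenCV image, cv2.cvtColor(img, cv2.COLOR_BGR2RGB) performs this channel swap. It is needed when handing the image to a library that expects RGB.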
We have already seen an example of an RGB and a grayscale image in the previous exercise, where we converted an RGB color image into grayscale before passing it to Tesseract. To start using the OpenCV library, the first step is to install it using the following command:
pip install opencv-python
We shall now use OpenCV to read an image both as a
colored image and as a grayscale image. Let us use the
same image of the bird that we used in one of our previous
exercises. Copy and run the code below from the Python
editor:
import cv2
img = cv2.imread('Image of a Bird.jpg')
cv2.imshow('image', img)
cv2.waitKey(0)
# Close the image window once a key has been pressed
cv2.destroyAllWindows()
The code is simple and short, as explained below:
1. We first import the OpenCV module into our code using the line:
import cv2
2. Next, we call the imread function of cv2 with the image file name as an argument. This is the key line that reads the image file. Here, we read the image as a color image; the next exercise shows how to read an image in grayscale.
3. Thereafter, we call the imshow function of cv2 to display the image that was read. Finally, we call the waitKey function of cv2. It keeps the window open for a given number of milliseconds or, when passed 0 as here, until the user presses any key.
After running the code above, one observes that the image
gets opened on the screen as shown in the Figure 7.8 below:

Figure 7.8: Image read as an RGB color by OpenCV

Now, let us read the same image in grayscale. Most of the code remains the same except for a small change in the imread call. imread takes an optional second argument, an integer flag that selects color or grayscale reading. A value of 1 (cv2.IMREAD_COLOR) reads the image as a regular color image. This is the default, which is why we did not pass the argument in the last exercise. To read the image in grayscale, we pass 0 (cv2.IMREAD_GRAYSCALE) as the second argument. Let us try this out using the code below:
import cv2

# 0 is the value of cv2.IMREAD_GRAYSCALE
img = cv2.imread('Image of a Bird.jpg', 0)
cv2.imshow('image', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
On running the code above, we see that the image shown on the screen looks like the one in Figure 7.9 below, which confirms that cv2 has indeed read the image in grayscale:

Figure 7.9: Image read as a grayscale image by OpenCV

Practical use case in Python


Now, let us apply all that we have learnt to solve a practical use case in Python. We shall use the file Van.jpg, shown in Figure 7.10 below, and try to detect the text SCHOOL BUS contained in the image:

Figure 7.10: Image Van.jpg

Let us continue to use pytesseract with the code below and observe the output:
import pytesseract
from PIL import Image
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
imagetext = pytesseract.image_to_string(Image.open('Van.jpg'))
print(imagetext)
On running the code above, we obtain garbage output, as shown in Figure 7.11 below:

Figure 7.11: Garbage value of output obtained

As the figure shows, the output is meaningless. The image contains a lot of extra noise, which makes it difficult for Pytesseract to detect any text. To let Pytesseract focus only on the portion of the image that contains text, let us crop the image to the region of the text. We will continue to use the PIL library from the previous exercise for the cropping.
If we observe the image Van.jpg, we find that the text SCHOOL BUS is horizontally centered. The complete text occupies roughly the middle third of the width, leaving a third of the width at each end. Vertically, imagine dividing the height of the image into six equal sections. The top of the text lies at the end of the first section (one sixth of the height). The bottom of the text lies at the end of the second section (one third of the height).
Let us find out the original size of the image using the code
below:
from PIL import Image
objImage = Image.open("Van.jpg")
fltWidth,fltHeight = objImage.size
print(fltWidth,fltHeight)
Output:
5685 3927
From the output, we see that the original image has width
and height of 5685 and 3927 respectively. Here, we use
fltWidth and fltHeight as two variables to denote the
width and height of the image respectively.
Having understood the positioning of the text and the size of the original image, we can now derive the coordinates for cropping the image to the text portion as below:
X1 = fltWidth/3-100
X2 = 2*fltWidth/3+100
Y1 = fltHeight/6
Y2 = fltHeight/3
Here, we add a clearance buffer of 100 pixels to the X coordinates to prevent the cropping from cutting off any part of the text. The complete code for cropping is given below:
from PIL import Image
objImage = Image.open("Van.jpg")
fltWidth,fltHeight = objImage.size
fltArea = (fltWidth/3-100, fltHeight/6, 2*fltWidth/3+100,
fltHeight/3)
objImage = objImage.crop(fltArea)
objImage.save("Van-Cropped.jpg")
The cropped image is saved as Van-Cropped.jpg and looks as shown in Figure 7.12 below. The code has cleanly cropped the image, retaining only the text portion:

Figure 7.12: Cropped Image of Van
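As a worked check of the coordinates, the following sketch plugs the measured size of Van.jpg (5685 x 3927) into the same formulas. The helper name text_band_box is hypothetical; the 100-pixel margin is the buffer used above.

```python
def text_band_box(width, height, margin=100):
    """Crop box for text in the middle third horizontally and between
    1/6 and 1/3 of the height, padded by `margin` pixels left and right."""
    return (width / 3 - margin, height / 6, 2 * width / 3 + margin, height / 3)


print(text_band_box(5685, 3927))  # (1795.0, 654.5, 3890.0, 1309.0)
```

The resulting box is 2095 pixels wide and 654.5 pixels tall, a thin horizontal strip. This suits the single-line treatment applied in the next step. In the script, this is equivalent to objImage.crop(text_band_box(*objImage.size)).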

Now, let us make Pytesseract read the cropped image "Van-Cropped.jpg" instead of the original image 'Van.jpg' by running the code below:
import pytesseract
from PIL import Image
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
objImage = Image.open("Van-Cropped.jpg")
imagetext = pytesseract.image_to_string(objImage, config='--psm 7')
print(imagetext)
Output:
SCHOOL BUS
Note a small addition to the image_to_string call in the code above: the argument config='--psm 7'. It sets Tesseract's page segmentation mode to 7, which treats the image as a single line of text. That is exactly what the cropped image contains. For the full list of page segmentation modes and other options, please visit the Tesseract manual page at the link below:
https://github.com/tesseract-ocr/tesseract/blob/main/doc/tesseract.1.asc
With this, we have applied all that we learnt about working with images to capture the text SCHOOL BUS from the image file 'Van.jpg'! This brings this wonderful chapter to a conclusion.

Conclusion
With this chapter, we have covered almost all types of office
documents that we regularly work with on our machines. By
now, we should be comfortable with data extraction using
web scraping, automation using Excel spreadsheets,
working with text and tables in PDF files as well as detecting
text in images, all this using specific Python libraries to
serve the purpose.
In the next chapter, we shall cover another important
section that deals with the automation of regular desktop
operations like moving files and folders as well as mouse
movements and clicks. Let us now turn the page and
continue this journey with unhindered curiosity!

Join our book’s Discord space


Join the book’s Discord Workspace for Latest updates,
Offers, Tech happenings around the world, New Release and
Sessions with the Authors:
https://discord.bpbonline.com
Chapter 8
Mechanizing Applications, Folders and Actions

Introduction
This chapter begins with an introduction to the os and shutil modules in Python, which are powerful tools for automating the reading, writing, and moving of files and folders. Various functions of the shutil module are discussed along with their unique functionality. The chapter then continues with the PyAutoGUI library and the PyWinAuto library, which are used to automate regular mouse and keyboard operations on the computer, and discusses their comparative advantages and disadvantages. The chapter ends with a project that familiarizes the reader with the usage of these libraries.

Structure
The chapter covers the following topics:
• The os module in Python
• The shutil module in Python
∘ Copy and move a file using shutil
∘ Move files based on extension using shutil
• Using the PyAutoGUI library
∘ Implementing basic mouse functions using PyAutoGUI
∘ Implementing basic keyboard functions using PyAutoGUI
∘ Exploring message box functions
• Practical use case in Python

Objectives
By the end of this chapter, the reader will be able to comfortably navigate between folders and perform common tasks with files using the automation capabilities offered by Python libraries. The reader will also be able to automate common mouse and keyboard actions with Python.

The os module in Python


The os module is one of the most widely used modules in Python's standard library. Standard library modules are installed along with Python itself, so one does not need to install them separately using the pip command.
The os module provides functions that enable the user to interact with the operating system. One can rename files, create folders, get the current working directory, and perform many more tasks with files and folders using this module. Let us have a quick look at some of the useful functions of this module.
• Get current working directory: To get the current working directory, the os module provides the getcwd() function. Copy and run the code below in the Python editor:
import os

objCWD = os.getcwd()

print("The current working directory is:", objCWD)


On running the code above, the path of your current working directory is printed in the console window. This is the directory from which the Python process was started; many editors set it to the folder where your Python file is saved, but this is not guaranteed.
• Create a new directory: This functionality enables one to create a new directory (folder) within the current working directory. Copy and run the code below in the Python editor:
import os

if not os.path.exists('Test Folder'):
    os.mkdir('Test Folder')
On running the code above, you would find that a new
folder named 'Test Folder' has been created in your
working directory.
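• Delete a directory: This functionality can be used to delete an existing directory. Note that os.rmdir() removes only an empty directory; if the folder still contains files or subfolders, it raises an error, and shutil.rmtree() is needed to delete the folder together with its contents. Copy and run the code below in the Python editor: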
import os

try:
    os.rmdir('Test Folder')
except OSError as e:
    # Raised, for example, if the folder does not exist or is not empty
    print(f"An error occurred: {e}")
On running the code above, you would find that the folder named 'Test Folder' that we created earlier has now been removed.
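The three os functions above can be tried without touching any real folders. The sketch below is a minimal illustration, assuming you are happy to run it inside a temporary directory created by the standard tempfile module; the function name demo_os_functions and the file note.txt are made up for this example. It also shows the difference between os.rmdir(), which only removes an empty folder, and shutil.rmtree(), which removes a folder together with its contents.

```python
import os
import shutil
import tempfile


def demo_os_functions():
    """Exercise getcwd, mkdir and rmdir inside a throwaway directory."""
    original_cwd = os.getcwd()
    results = {}
    with tempfile.TemporaryDirectory() as sandbox:
        try:
            os.chdir(sandbox)  # work inside the sandbox so nothing real is touched
            results["cwd_is_sandbox"] = os.path.samefile(os.getcwd(), sandbox)

            # Create the folder only if it does not exist yet
            if not os.path.exists("Test Folder"):
                os.mkdir("Test Folder")
            results["created"] = os.path.isdir("Test Folder")

            # Put a file inside, so the folder is no longer empty
            with open(os.path.join("Test Folder", "note.txt"), "w") as fh:
                fh.write("hello")

            # os.rmdir refuses to delete a folder that still has contents
            try:
                os.rmdir("Test Folder")
                results["rmdir_non_empty"] = "removed"
            except OSError:
                results["rmdir_non_empty"] = "refused"

            # shutil.rmtree deletes the folder and everything inside it
            shutil.rmtree("Test Folder")
            results["exists_after_rmtree"] = os.path.exists("Test Folder")
        finally:
            os.chdir(original_cwd)  # restore the previous working directory
    return results


print(demo_os_functions())
```

Restoring the original working directory in the finally block matters on Windows, where a temporary folder cannot be deleted while it is still the current directory.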

The shutil module in Python


The shutil module is another module that comes with Python's standard library, so one does not have to install it using the pip command. This module enables one to work with files by copying, moving, and renaming them across folders. These are tasks that we regularly undertake by hand, but the shutil module lets us automate them when they need to be done in bulk. The sections that follow each cover one piece of functionality that the shutil module offers, with an example.

Copy and move a file using shutil


Let us now see how we can copy the contents of one file
into another file using shutil:
1. Create a new Excel workbook and name it Shutil_Test.xlsx. In cell A1, type the text Testing the shutil module for the first time! as shown in Figure 8.1 below. Save changes to the workbook:

Figure 8.1: Creation of workbook Shutil_Test.xlsx


Now, create a blank Excel workbook named Copy-Shutil_Test.xlsx and save it in the same directory as the earlier file. We shall write a small Python script to copy the contents of the file Shutil_Test.xlsx into the file Copy-Shutil_Test.xlsx using the copy() function of shutil. Strictly speaking, the blank workbook is optional: shutil.copy() creates the destination file if it does not exist and overwrites it if it does. Copy and run the code below in the Python editor:
import shutil
source = "Shutil_Test.xlsx"
destination ="Copy-Shutil_Test.xlsx"
dest = shutil.copy(source, destination)
After running the code above, open the workbook Copy-Shutil_Test.xlsx. We observe that the content of the workbook Shutil_Test.xlsx has been copied into the workbook Copy-Shutil_Test.xlsx as shown in Figure 8.2 below:

Figure 8.2: Contents of Shutil_Test.xlsx copied to Copy-Shutil_Test.xlsx

Now, we shall use the same copy() function of the shutil module to copy a file into a different folder. Everything else remains the same; the only difference is that instead of passing a file name as the second argument, we pass the destination folder. Create another folder named Move_Test in the same directory where the Shutil_Test.xlsx file is located. Copy and run the code below in the Python editor:
import shutil
source = "Shutil_Test.xlsx"
destination = "Move_Test/"
dest = shutil.copy(source, destination)
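After running the code above, we find that the file "Shutil_Test.xlsx" has been copied into the folder "Move_Test", as we would expect. Note that this is a copy: the original file remains in its place. To move the file instead, use shutil.move() with the same arguments.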
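To see the difference between copying and moving side by side, here is a minimal sketch that works in a temporary folder. It assumes a plain text file can stand in for Shutil_Test.xlsx (shutil copies bytes, so the file type does not matter), and the helper name copy_then_move and the folder Copy_Test are hypothetical. shutil.copy() returns the path of the new copy and leaves the source in place, whereas shutil.move() returns the new location and the source is gone afterwards.

```python
import os
import shutil
import tempfile


def copy_then_move(base_dir):
    """Contrast shutil.copy (source kept) with shutil.move (source removed)."""
    source = os.path.join(base_dir, "Shutil_Test.txt")
    with open(source, "w") as fh:
        fh.write("Testing the shutil module for the first time!")

    copy_dir = os.path.join(base_dir, "Copy_Test")
    move_dir = os.path.join(base_dir, "Move_Test")
    os.makedirs(copy_dir, exist_ok=True)
    os.makedirs(move_dir, exist_ok=True)

    # copy() returns the path of the new file; the source stays where it was
    copied_to = shutil.copy(source, copy_dir)
    source_after_copy = os.path.exists(source)

    # move() relocates the file; the source no longer exists afterwards
    moved_to = shutil.move(source, move_dir)
    source_after_move = os.path.exists(source)

    return (os.path.basename(copied_to), source_after_copy,
            os.path.basename(moved_to), source_after_move)


with tempfile.TemporaryDirectory() as tmp:
    print(copy_then_move(tmp))
```

Running it prints ('Shutil_Test.txt', True, 'Shutil_Test.txt', False): the file survives the copy but not the move.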

Move files based on extension using shutil
Suppose we want to move only the image files from a particular folder into another folder. The shutil module provides the move() function to get this done, and here we also use the os module. Before you copy and run the code below in the Python editor, save some image files with a .jpg or .JPG extension in the same directory as this Python file.
import os
import shutil

if not os.path.exists('Images'):
    os.mkdir('Images')

for objFile in os.listdir():
    if '.jpg' in objFile.lower():
        strPath = 'Images/' + objFile
        shutil.move(objFile, strPath)
On running the code above, we find that a new folder named Images has been created in our working directory and the image files that we earlier had in that directory have all been moved to the Images folder. This operation is similar to the Cut (Ctrl + X) and Paste (Ctrl + V) operation that we manually perform on our computer. Let us try to understand what we did in the code:
1. First, we import the os and shutil modules from Python's standard library.
import os
import shutil
2. Next, if a folder named Images does not already exist in the working directory, we create it using the mkdir() function of the os module.
if not os.path.exists('Images'):
    os.mkdir('Images')
3. Thereafter, we loop through every file in the working directory and use the if block to check whether the file name, converted to lowercase, contains .jpg. If it does, we move the file to the Images folder using the move() function of the shutil module.
for objFile in os.listdir():
    if '.jpg' in objFile.lower():
        strPath = 'Images/' + objFile
        shutil.move(objFile, strPath)
In a similar way, we can also move files with any other extension, such as .pdf, .xlsx, and .docx. We simply need to specify the required extension in the if block of the code.
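The substring test '.jpg' in objFile.lower() also matches names such as notes.jpg.txt, and os.listdir() returns folders as well as files. A slightly stricter variant, sketched below under the assumption that only regular files should be moved, compares the real extension returned by os.path.splitext() instead. move_by_extension is a hypothetical helper name, and the sample files are created in a temporary folder purely for the demonstration.

```python
import os
import shutil
import tempfile


def move_by_extension(src_dir, dest_dir, extensions):
    """Move regular files whose extension is in `extensions` (case-insensitive)."""
    wanted = {ext.lower() for ext in extensions}
    os.makedirs(dest_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        if not os.path.isfile(path):
            continue  # skip folders, including the destination folder itself
        _, ext = os.path.splitext(name)  # 'notes.jpg.txt' -> '.txt'
        if ext.lower() in wanted:
            shutil.move(path, os.path.join(dest_dir, name))
            moved.append(name)
    return moved


with tempfile.TemporaryDirectory() as tmp:
    for name in ["van.jpg", "BUS.JPG", "notes.jpg.txt", "report.pdf"]:
        open(os.path.join(tmp, name), "w").close()  # create empty sample files
    print(move_by_extension(tmp, os.path.join(tmp, "Images"), [".jpg"]))
```

This prints ['BUS.JPG', 'van.jpg']; notes.jpg.txt stays put because its real extension is .txt. Passing [".pdf", ".xlsx"] would move those types instead.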

Using the PyAutoGUI Library


The PyAutoGUI library for Python provides the capability to simulate regular mouse actions such as moving and hovering the cursor, left clicking, and right clicking, as well as pressing keys on the keyboard. The first step in using the library is to install it using the command below:
pip install pyautogui
Let us try some quick warm-up tasks with PyAutoGUI. Before positioning our mouse cursor, we might want to know the size of our screen so that we have an idea of its dimensions. PyAutoGUI provides the size() function to get this done. Copy and run the code below in the Python editor:
import pyautogui
print(pyautogui.size())
Output:
Size(width=1920, height=1080)
We observe that the output produced by the code reveals the size of the screen as width = 1920 and height = 1080 pixels on this machine; your values will depend on your display.
Next, let us see how we can capture the current position of
the mouse using the PyAutoGUI library. Copy and run the
code below into the Python editor:
import pyautogui
print(pyautogui.position())
Output:
Point(x=240, y=80)
We observe that the output shows the coordinates of the
cursor. Try this code for different positions of the cursor to
get different points as output coordinates. It is important to
note here that the origin is the top left point of the screen.
Hence, the Y coordinate is measured from the top. This is
unlike the Cartesian coordinate system where the positive Y
coordinate is measured from the bottom to the top.
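Because the origin sits at the top-left corner, converting a point from the familiar Cartesian convention (origin at the bottom-left, y increasing upwards) only requires flipping the y axis. The helper below is a hypothetical illustration, not part of PyAutoGUI, and it assumes pixel rows are numbered from 0 to height - 1.

```python
def cartesian_to_screen(x, y, screen_height):
    """Convert a bottom-left-origin point to top-left-origin screen coordinates."""
    # x is unchanged; y is measured from the top instead of the bottom.
    # Rows run from 0 to screen_height - 1, hence the "- 1".
    return x, (screen_height - 1) - y


# On a 1920 x 1080 screen, a point 999 pixels above the bottom edge
# is 80 pixels below the top edge.
print(cartesian_to_screen(240, 999, 1080))
```

This prints (240, 80), the same kind of point that pyautogui.position() reports.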
Now that we have seen some basic functions of PyAutoGUI
that have got us kickstarted on our journey of
understanding PyAutoGUI, let us now move forward with
some more interesting mouse functions in the next section
that would enable us to perform useful tasks with the
mouse.

Implementing basic mouse functions using PyAutoGUI
In this section, we shall implement basic mouse functions
that we usually carry out manually:
• moveTo(): This function enables one to move the cursor to a particular location on the screen, as specified by the coordinates. Copy and run the code below in the Python editor:
import pyautogui
pyautogui.moveTo(1000,500)
On running the code above, we find that the cursor immediately moves to the location on the screen with the coordinates (1000, 500).
There is a small addition that we can make to this function. If we want the cursor to move gradually so that the user can observe the movement in real time, the function accepts another argument called duration, specified in seconds. Copy and run the code below in the Python editor:
import pyautogui
pyautogui.moveTo(1000,500,duration=5)
We find that the cursor moves to the coordinates (1000, 500) this time as well; however, it takes 5 seconds to reach there since we have specified duration=5.
• click(): This is the generic click function that enables one to automate the regular left click, right click, middle click, and double click that we manually perform with the mouse. Copy and run the code below in the Python editor:
import pyautogui
pyautogui.click(1000, 800, 2,0,'left')
On running the code above, the cursor is brought to the coordinates (1000, 800) and performs a regular double left click, just like the one we execute with the mouse. Let us see how we achieve this.
The first two arguments of the click function are the X and Y coordinates, respectively, where we want to position the mouse on the screen to double click. The next argument is the number of clicks. In this case, we have specified 2 because we want a double click. The next argument is the time interval, in seconds, between two clicks. For a double click, the two clicks must follow each other in immediate succession; if the interval between them is too large, the system does not recognize the operation as a double click. Hence, we have specified the interval as 0 seconds. Finally, the last argument that we have specified is 'left', which means that the click operation is a left click. For a right click, we need to pass 'right', and for a middle click, 'middle'.
Although this generic function is useful, PyAutoGUI also provides special functions for specific mouse operations. These functions are listed below with their syntax:
∘ pyautogui.rightClick(x, y): This function shall
perform the right click operation at the given
coordinates, x and y.
∘ pyautogui.doubleClick(x, y): This function shall
perform the double click operation at the given
coordinates x and y, just like the one that we
performed in the exercise above using the generic
click() function.
∘ pyautogui.tripleClick(x, y): This function shall
perform the triple click operation at the given
coordinates x and y. Triple click is used typically
when we want to select the entire paragraph.
∘ pyautogui.middleClick(x, y): This function shall
perform the middle click operation at the given
coordinates x and y. Middle click is typically used
to open a given link in a new tab.
One could try these functions out and verify that they perform the same operations as the corresponding click() calls. These functions can be used as a simpler alternative to the click() function.
• scroll(): This is one of the most important functions, as scrolling is one of the most common operations that one performs while browsing sites, reading documents, and working on office tasks. Type the code below into the Python editor:
import pyautogui
pyautogui.scroll(-1000)
On running the code above, you would find that the system has scrolled down by 1000 units because we have specified -1000 as the scroll amount. The size of a unit depends on the platform and the application. To scroll up, we simply need to remove the minus sign.

Note: While performing this operation, it is important to have a document that spans more than one page; otherwise the scroll operation would have no visible effect, as there would be nothing to scroll.

• mouseDown() and mouseUp(): We often drag to select elements within a window while copying and pasting content. To begin the selection, we press down the mouse button, and once done, we release it. This process can be simulated with PyAutoGUI by using the mouseDown() and mouseUp() functions. Copy and run the code below in the Python editor:
import pyautogui
pyautogui.mouseDown(x=80, y=336, button='left')
pyautogui.mouseUp(x=328, y=361, button='left')
On running the code above, you would find that the area in your window from coordinates x=80, y=336 to x=328, y=361 has been selected and highlighted. As we have not instructed the code to open any other document, the Python editor itself would be the active application while the code runs. Hence, some portion of your code would be highlighted. The code assumes that your screen size is 1920 x 1080; otherwise, you might have to adjust the coordinates slightly.
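The coordinates in these examples were captured on a 1920 x 1080 screen. As a rough starting point on a different resolution, they can be scaled proportionally. The sketch below assumes that the content you are clicking scales with the screen, which is often not true for fixed-size windows, so re-checking with pyautogui.position() is still advisable; scale_point is a hypothetical helper name.

```python
def scale_point(x, y, actual_size, reference_size=(1920, 1080)):
    """Map a point recorded on a reference screen to another resolution."""
    ref_w, ref_h = reference_size
    act_w, act_h = actual_size
    # Scale each axis independently and round to whole pixels
    return round(x * act_w / ref_w), round(y * act_h / ref_h)


# Selection corners from the mouseDown()/mouseUp() example, on a 1366 x 768 screen
print(scale_point(80, 336, (1366, 768)))
print(scale_point(328, 361, (1366, 768)))
```

This prints (57, 239) and (233, 257). In a real script, the current size could be passed as pyautogui.size(), which returns a (width, height) pair.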

Implementing basic keyboard functions using PyAutoGUI
In this section, we shall implement basic keyboard functions
that we usually carry out manually:
• hotkey(): This function is used to press keys on the keyboard. The most commonly used keys are Enter, Escape, Shift, Ctrl, Alt, and combinations of these with other keys. The hotkey function can be used to press any of these keys either individually or in combination. For example, we could try out the piece of code below:
import pyautogui
pyautogui.hotkey('enter')
This code presses the Enter key.
We can also use this function to press a combination of keys, as shown in the code examples below:
∘ pyautogui.hotkey('ctrl','c')
This would perform the Ctrl + C task, which is
copy to clipboard.
∘ pyautogui.hotkey('ctrl','v')
This would perform the Ctrl + V task, which is
paste.
To use this function, one needs to pass the correct name of each key as a string argument. This requires one to know the exact key names that the function accepts. PyAutoGUI provides the KEYBOARD_KEYS list, which contains all the valid key names. Type and run the code below in the Python editor:
import pyautogui
print(pyautogui.KEYBOARD_KEYS)
Output:
['\t', '\n', '\r', ' ', '!', '"', '#', '$', '%', '&', "'", '(',
')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5',
'6', '7', '8', '9', ':', ';', '<', '=', '>', '?', '@', '[',
'\\', ']', '^', '_', '`', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h',
'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u',
'v', 'w', 'x', 'y', 'z', '{', '|', '}', '~', 'accept',
'add', 'alt', 'altleft', 'altright', 'apps',
'backspace', 'browserback',
'browserfavorites', 'browserforward',
'browserhome', 'browserrefresh',
'browsersearch', 'browserstop', 'capslock',
'clear', 'convert', 'ctrl', 'ctrlleft', 'ctrlright',
'decimal', 'del', 'delete', 'divide', 'down',
'end', 'enter', 'esc', 'escape', 'execute', 'f1',
'f10', 'f11', 'f12', 'f13', 'f14', 'f15', 'f16', 'f17',
'f18', 'f19', 'f2', 'f20', 'f21', 'f22', 'f23', 'f24',
'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9', 'final', 'fn',
'hanguel', 'hangul', 'hanja', 'help', 'home',
'insert', 'junja', 'kana', 'kanji', 'launchapp1',
'launchapp2', 'launchmail',
'launchmediaselect', 'left', 'modechange',
'multiply', 'nexttrack', 'nonconvert', 'num0',
'num1', 'num2', 'num3', 'num4', 'num5',
'num6', 'num7', 'num8', 'num9', 'numlock',
'pagedown', 'pageup', 'pause', 'pgdn', 'pgup',
'playpause', 'prevtrack', 'print', 'printscreen',
'prntscrn', 'prtsc', 'prtscr', 'return', 'right',
'scrolllock', 'select', 'separator', 'shift',
'shiftleft', 'shiftright', 'sleep', 'space', 'stop',
'subtract', 'tab', 'up', 'volumedown',
'volumemute', 'volumeup', 'win', 'winleft',
'winright', 'yen', 'command', 'option',
'optionleft', 'optionright']
As shown above, the output of the code provides a list of all the key names that the user can refer to when passing key names as arguments to the hotkey function. To find out how many key names the list contains, we can use the len() function. Running the code below shows that the list contains 194 key names (the exact count may vary between versions of PyAutoGUI).
import pyautogui
print(len(pyautogui.KEYBOARD_KEYS))
Output: 194
• typewrite(): This function enables one to simulate the typing that one manually performs on screen. To test this functionality, we shall open a blank Excel workbook and type something in cell A1, which is active by default. If we recollect from Chapter 3, Getting started with AI/ML in Python, the Python library xlwings provides the functionality to open a blank workbook and observe the activities going on screen in real time. So, we shall use xlwings to our advantage here. Copy and run the code below in the Python editor:
import pyautogui
import xlwings as xw
import time

objWorkbook = xw.Book()
objWorkbook.activate(steal_focus=True)

time.sleep(3)

pyautogui.typewrite('Hello World!',0.5)

pyautogui.hotkey('enter')
pyautogui.hotkey('up')
pyautogui.hotkey('ctrl','x')
pyautogui.hotkey('down')
pyautogui.hotkey('ctrl','v')
pyautogui.hotkey('esc')
Let us understand what we did in the code above:
1. We imported the required libraries as shown below:
import pyautogui
import xlwings as xw
import time
2. Thereafter, we opened a blank Excel workbook using the xlwings library and passed steal_focus=True to its activate() method to make sure that our blank Excel workbook becomes the active application window on the screen. The code below does it for us:
objWorkbook = xw.Book()
objWorkbook.activate(steal_focus=True)
3. Next, we add a delay of 3 seconds using the time.sleep function to give the Excel application sufficient time to open and remain active on the screen. Below is the code:
time.sleep(3)
4. Now, we use the typewrite function to type the text that we wish to enter in the active cell A1 using the code below:
pyautogui.typewrite('Hello World!',0.5)
Here, the second argument, 0.5, is the time interval in seconds between the typing of successive characters.
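5. Finally, we perform some common operations using keys. Pressing Enter commits the text and (with Excel's default settings) moves the selection down to A2, Up returns to A1, Ctrl + X cuts the cell, Down moves to A2, and Ctrl + V pastes the text there; Esc is pressed at the end. Together, these keys move the text from cell A1 to A2 using the code below: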
pyautogui.hotkey('enter')
pyautogui.hotkey('up')
pyautogui.hotkey('ctrl','x')
pyautogui.hotkey('down')
pyautogui.hotkey('ctrl','v')
pyautogui.hotkey('esc')
6. On running the code above, we find that a blank Excel workbook opens, after which the text 'Hello World!' is entered in cell A1. Next, this text is cut and pasted into cell A2. We achieved this entire procedure without using the Excel API to write values to cells; instead, we simulated keyboard typing. The final output is shown in Figure 8.3 below:

Figure 8.3: Cut Paste operation using typewrite and hotkeys functions
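Since hotkey() relies on exact key names, it can be worth checking a combination against pyautogui.KEYBOARD_KEYS before sending it, so that a typo such as 'control' is caught early. The helper below is a hypothetical sketch; to keep it runnable without a display, it takes the list of valid names as a parameter, and the short sample list stands in for the full KEYBOARD_KEYS list shown earlier, whose names are all lowercase.

```python
def invalid_keys(keys, valid_keys):
    """Return the key names in `keys` that do not appear in `valid_keys`."""
    valid = set(valid_keys)
    return [key for key in keys if key not in valid]


# Small stand-in for pyautogui.KEYBOARD_KEYS
sample_keys = ['c', 'v', 'ctrl', 'enter', 'esc', 'up', 'down']

print(invalid_keys(['ctrl', 'c'], sample_keys))     # nothing wrong
print(invalid_keys(['control', 'c'], sample_keys))  # 'control' is not a key name
```

The first call prints [] and the second prints ['control']. In practice, one would call invalid_keys(['ctrl', 'c'], pyautogui.KEYBOARD_KEYS) and only call pyautogui.hotkey() when the result is empty.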

Exploring message box functions using PyAutoGUI
Message boxes are commonly used to display the result of a task, to confirm a decision made by the user, or to offer the user choices before moving on to the next steps. PyAutoGUI provides functions for this. Let us go through examples of some functions in this section that illustrate the message box functionality of PyAutoGUI.
• alert(): This displays a simple alert message with an OK button. Copy and run the code below in the Python editor:
import pyautogui
pyautogui.alert(text='Welcome!', title='Test Message', button='OK')
On running the code above, a message box pops up with the title 'Test Message' and a message that says 'Welcome!'.
This function returns the text of the button that has been clicked. In this case, it would return OK. Note that any text can be used as the text of the button (refer to Figure 8.4):

Figure 8.4: alert() function

• confirm(): This function displays a message box with OK and Cancel buttons.
Copy and run the code below in the Python editor. We get the confirm message box as shown in Figure 8.5 below:
import pyautogui
# confirm() returns the text of the button that the user clicks
strChoice = pyautogui.confirm(text='Please confirm!', title='Test Message', buttons=['OK', 'Cancel'])
print(strChoice)
Figure 8.5: Confirm message box

Practical use case in Python


Now, let us apply a portion of our learning to implement a
practical use case in Python. Let us execute the task of
opening a Microsoft Word document and typing some
characters into the document, all through Python code! So
let us get started:
1. First, let us capture the coordinates of all the key positions on the screen where we want the mouse cursor to be placed in order to perform the click operations.
Place the cursor over the search box and run the code below in the Python editor. This gives us the coordinates of a point within the area of the search box (refer to Figure 8.6):

Figure 8.6: Search Box

import pyautogui
print(pyautogui.position())
Output:
Point(x=192, y=1058)
2. Next, keep the Python editor window open and type the text Word in the search bar so that MS Word appears in the search results. We want to capture the coordinates of a point on this search result icon so that we can click on it during automation.
As shown in Figure 8.7 below, the search result for MS Word appears roughly at the height of line number 10 in the editor. This gives us the position of the MS Word application. Place the cursor at line 10 in the Python editor and run the code below to get the coordinates:
import pyautogui
print(pyautogui.position())
Output:
Point(x=265, y=377)

Figure 8.7: Search for MS Word application

3. Now that we have all the coordinates, we can type the code below into the Python editor to observe how the use case is executed:
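import pyautogui
import time

# Double click the search box (coordinates captured in step 1) and search for Word
pyautogui.click(192, 1058, 2, 0, 'left')
pyautogui.typewrite('word')
# Glide to the MS Word search result (coordinates captured in step 2) and open it
pyautogui.moveTo(265, 377, duration=2)
pyautogui.click(265, 377, 2, 0, 'left')
time.sleep(3)  # give Word time to open before typing
pyautogui.typewrite('Python use case executed!', 0.5)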
4. On running the code above, we find that a new Word document is created and the text 'Python use case executed!' is typed into the document as shown in Figure 8.8 below, thus completing our use case and the chapter as well.

Figure 8.8: Text being typed into a Word document using the Python library PyAutoGUI

Surely, this was a helpful exercise.
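Because the coordinates in this use case are hard-coded for one screen, it can help to keep them as data and run them through a small function. The sketch below is a hypothetical pattern, not part of PyAutoGUI: the step tuples, run_steps() and RecordingBackend are names made up for illustration. Any object with click(), moveTo() and typewrite() methods can be passed as the backend, so the pyautogui module itself works for a real run, while the recorder allows a dry run that never touches the mouse.

```python
import time

# Hypothetical step list mirroring the use case above: (action, arguments)
STEPS = [
    ("click", {"x": 192, "y": 1058, "clicks": 2}),
    ("type", {"text": "word"}),
    ("move", {"x": 265, "y": 377, "duration": 2}),
    ("click", {"x": 265, "y": 377, "clicks": 2}),
    ("wait", {"seconds": 3}),
    ("type", {"text": "Python use case executed!", "interval": 0.5}),
]


def run_steps(steps, backend, sleep=time.sleep):
    """Run each step against `backend`, e.g. the pyautogui module itself."""
    for action, args in steps:
        if action == "click":
            backend.click(args["x"], args["y"], clicks=args.get("clicks", 1))
        elif action == "move":
            backend.moveTo(args["x"], args["y"], duration=args.get("duration", 0.0))
        elif action == "type":
            backend.typewrite(args["text"], interval=args.get("interval", 0.0))
        elif action == "wait":
            sleep(args["seconds"])
        else:
            raise ValueError(f"Unknown action: {action}")


class RecordingBackend:
    """Stand-in backend that records calls instead of driving the mouse."""

    def __init__(self):
        self.calls = []

    def click(self, x, y, clicks=1):
        self.calls.append(("click", x, y, clicks))

    def moveTo(self, x, y, duration=0.0):
        self.calls.append(("move", x, y))

    def typewrite(self, text, interval=0.0):
        self.calls.append(("type", text))


# Dry run: nothing moves and no real waiting happens
recorder = RecordingBackend()
run_steps(STEPS, recorder, sleep=lambda seconds: None)
for call in recorder.calls:
    print(call)
```

For a real run, the same list could be executed with import pyautogui followed by run_steps(STEPS, pyautogui). Keeping the coordinates in one list also makes it easier to re-capture them for a different screen.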

Conclusion
We have achieved something unique in this chapter when it
comes to practical utility of our learning. We have
accomplished a significant feat in desktop automation,
mastering common tasks such as file and folder
management and mouse and keyboard operations. We have
also explored the creation of message boxes, an essential
skill in interactive automation. With this chapter, we
conclude our exploration of desktop automation. Hereafter,
we shall focus on applying salient aspects of desktop
automation to enhance our existing applications as well as
to build new applications that increase the efficiency of
manual processes.
As we close this chapter, we take with us valuable automation skills that we shall apply in even more transformative ways. The skills you have learned here lay a foundation for the journey ahead, which leads us into the realm of machine learning, a rapidly evolving field that is reshaping technology in the 21st century and offering innovative solutions to complex problems. So, let us turn the page and dive into the world of machine learning, where we will explore how to harness its powerful algorithms to enhance the functionality of our applications and make significant strides in automation and data analysis!

Chapter 9
Intelligent Automation Part 1: Using Machine Learning

Introduction
This chapter takes a deep dive into the detailed implementation of machine learning algorithms. It introduces the reader to the concepts of supervised learning and unsupervised learning. Thereafter, the algorithms are also categorized into classification and regression algorithms. The chapter then delves into the types of supervised learning algorithms, such as linear regression, logistic regression, k-nearest neighbors, Naive Bayes, support vector machines, and decision trees. The key concept of gradient descent is also discussed in detail. To build an understanding of unsupervised learning, the crucial concepts of dimensionality reduction, principal component analysis, and linear discriminant analysis are covered before the unsupervised learning algorithms of k-means clustering and hierarchical clustering. An important topic on Python ML libraries is included at the end to acquaint the reader with the plethora of functionality that Python provides in the vast arena of machine learning. The chapter concludes with a use case in Python that shall make the reader conversant with building their own machine learning applications using Python.

Structure
The chapter covers the following topics:
• Implementing supervised machine learning algorithms
using Python
∘ Linear regression
• Key concepts in machine learning models
• Logistic regression
• K-nearest neighbors
• Naive Bayes
• Support vector machines
• Decision trees
• Implementing unsupervised learning algorithms using
Python
∘ Dimensionality reduction
∘ Principal component analysis
∘ Linear discriminant analysis
∘ K-means clustering
• Real world use case project

Objectives
In Chapter 3, Getting Started with AI/ML in Python, we went
through key concepts in machine learning and obtained a
theoretical understanding of machine learning algorithms. In
this chapter, we shall implement these algorithms using
Python, thus providing a launching pad to the reader for
independently creating and deploying customized machine
learning applications in the real world.

Implementing supervised machine learning algorithms using Python
In this chapter, we shall primarily be using the Scikit-learn
library to implement our machine learning algorithms.
Scikit-learn is an open-source Python library that supports
supervised and unsupervised machine learning. Before
using Scikit-learn, the first step is to install it using the
command below:
pip install scikit-learn
The hallmark of the Scikit-learn library is the fit() method. Every machine learning algorithm in Scikit-learn has its own class, and every such class has a method named fit(). This method takes the training data in the form of arrays as arguments and fits the model to the training data. In the case of supervised learning, the array of training data is accompanied by the corresponding array of labels, which makes two array arguments. In the case of unsupervised learning, the training data consists of a single input array, as the data here is unlabeled.
In this section, we shall be implementing supervised
machine learning algorithms using the Scikit-learn library of
Python. We shall start with the simplest one which is linear
regression.

Linear regression
We have already covered the theory behind linear
regression in Chapter 3, Getting Started with AI/ML in
Python. Now, we shall implement it using Python on some
sample data. We know that the goal of linear regression is to
fit a line to a sample of points, such that we could minimize
the sum of the squares of the vertical distances of the
points from the line. The predicted value y takes the form below:
y = c0 + c1*x1 + c2*x2 + c3*x3 + ... + cn*xn
Here, x1, x2, ..., xn are the input features, c1, c2, ..., cn are the coefficients, and c0 is the intercept. In linear regression, we obtain the optimum values of the coefficients and the intercept. By optimum values, we mean the set of values that minimizes the sum of the squared vertical distances between the actual points and the points predicted by the line.
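To make the quantity being minimized concrete, the sketch below evaluates the equation above for given coefficients and sums the squared residuals. The function names are hypothetical, and the data points are the same ones used in the Scikit-learn example that follows. A perfect fit such as y = 3x gives a cost of zero, while any other line gives a larger value.

```python
def predict(row, coefficients, intercept):
    """Evaluate y = c0 + c1*x1 + ... + cn*xn for one sample `row`."""
    return intercept + sum(c * x for c, x in zip(coefficients, row))


def sum_squared_residuals(rows, labels, coefficients, intercept):
    """Sum of (actual - predicted)^2 over all samples: what least squares minimizes."""
    return sum((y - predict(row, coefficients, intercept)) ** 2
               for row, y in zip(rows, labels))


rows = [[0], [3], [17], [23]]
labels = [0, 9, 51, 69]
print(sum_squared_residuals(rows, labels, [3.0], 0.0))  # the line y = 3x
print(sum_squared_residuals(rows, labels, [2.5], 1.0))  # a worse candidate line
```

This prints 0.0 for y = 3x and 167.75 for y = 1 + 2.5x; linear regression searches for the coefficients and intercept that make this number as small as possible.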
To implement linear regression using the Python library Scikit-learn, copy and run the code below in the Python editor:
from sklearn import linear_model
reg = linear_model.LinearRegression()

training_data = [[0], [3], [17], [23]]


labels = [0,9,51,69]

reg.fit(training_data, labels)

print(reg.coef_)
print(reg.intercept_)
Output:
[3.]
7.105427357601002e-15
Let us understand the code above. First, we imported the linear_model module from the Scikit-learn library (imported as sklearn). Next, we created an instance of the LinearRegression class and stored it in a variable named reg.
We created a sample 2D array and stored it in the variable training_data. Note that the training data here is always a 2D array: each row is a sample and each column is a feature (input parameter). Thereafter, we create a 1D array called labels, which holds the outputs corresponding to these inputs. Note that the number of elements in this array must always equal the number of samples (rows) in the training data array.
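The shape rules described above can be written down as a small check. The sketch below is hypothetical (Scikit-learn performs its own validation and raises an error on mismatched inputs); it turns a flat list of single-feature values into one-row-per-sample form and confirms that the number of samples matches the number of labels.

```python
def to_column(values):
    """Turn a flat list of single-feature values into one row per sample."""
    return [[v] for v in values]


def check_training_shapes(training_data, labels):
    """Return (n_samples, n_features), or raise ValueError on bad shapes."""
    if not training_data:
        raise ValueError("training_data is empty")
    n_features = len(training_data[0])
    if any(len(row) != n_features for row in training_data):
        raise ValueError("every sample must have the same number of features")
    if len(training_data) != len(labels):
        raise ValueError(f"{len(training_data)} samples but {len(labels)} labels")
    return len(training_data), n_features


print(check_training_shapes(to_column([0, 3, 17, 23]), [0, 9, 51, 69]))

try:
    check_training_shapes([[0], [3]], [0, 9, 51])
except ValueError as err:
    print(err)
```

The first call prints (4, 1), meaning four samples with one feature each, and the second prints "2 samples but 3 labels".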
Next, we use the fit method which takes in the
training_data and labels variables as arguments. This
method basically creates a best fit line based on least
squares regression. At the end, we use the coef_ and
intercept properties of the reg object to get the
coefficients and the intercept respectively, of the regression.
In this case, since every sample in the training data has only one feature, we get just one coefficient, because there is just one parameter multiplying an input in the equation. The coefficient that we have obtained is 3, which should also be obvious from the training_data and labels arrays, where every label is three times the corresponding training value. The intercept is expected to be zero, since the line y=3x fits these points perfectly. The intercept that we have obtained is 7.105427357601002e-15, which is zero apart from a tiny floating-point rounding error.
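To see where these numbers come from, the least-squares slope and intercept for a single feature can be computed directly with the textbook closed-form formulas. This is a minimal NumPy sketch on the same four points; it is not necessarily how Scikit-learn computes them internally (LinearRegression relies on a general least-squares solver).

```python
import numpy as np

# Same data as above: one feature (x) and its label (y)
x = np.array([0.0, 3.0, 17.0, 23.0])
y = np.array([0.0, 9.0, 51.0, 69.0])

# Closed-form least-squares estimates for y = slope * x + intercept
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

print(slope, intercept)  # prints: 3.0 0.0
```

Because every intermediate value here happens to be exact in binary floating point, the intercept comes out as exactly 0.0; a general-purpose solver can leave a tiny residue such as the 7.1e-15 seen above.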
Now that we have fitted a straight line to this set of data points, our next step is to use it to predict the output for new data points for which we do not have labels, that is, where the output is unknown. We use the predict method of the LinearRegression class for this purpose. Copy and run the code below in the Python editor:
# Importing necessary libraries
from sklearn import linear_model
import numpy as np

# Create an instance of the Linear Regression model
reg = linear_model.LinearRegression()

# Define training data and corresponding labels
# Training data represents the independent variables
training_data = np.array([[0], [3], [17], [23]])

# Labels are the dependent variable (output) for each training data point
labels = np.array([0, 9, 51, 69])

# Train the model using the training data and labels
reg.fit(training_data, labels)

# Define testing data for which we want to predict the output
testing_data = np.array([[21], [13], [15]])

# Use the trained model to make predictions on the testing data
output = reg.predict(testing_data)

# Print the predicted output
print(output)
Output:
[63. 39. 45.]
We see that the output is exactly what we expect from the equation y=3x: 3 × 21 = 63, 3 × 13 = 39 and 3 × 15 = 45.
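For a fitted linear model, predict simply evaluates the fitted equation, coefficient times input plus intercept. A quick check of that, repeating the training step so the snippet runs on its own:

```python
import numpy as np
from sklearn import linear_model

training_data = np.array([[0], [3], [17], [23]])
labels = np.array([0, 9, 51, 69])
reg = linear_model.LinearRegression().fit(training_data, labels)

testing_data = np.array([[21], [13], [15]])
# Evaluate X @ coef_ + intercept_ by hand for each row
manual = testing_data @ reg.coef_ + reg.intercept_
print(np.allclose(manual, reg.predict(testing_data)))  # True
```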

Key concepts in Machine Learning models
There are certain important concepts that we need to understand in order to have a better grasp of the theory behind building and optimizing machine learning models. These concepts are covered in the bullet points below:
• Gradient descent: This is the process that is used to optimize a model. Before understanding gradient descent, it is important to understand another term called the cost function. We saw in linear regression that the best fit line is the one where the sum of the squares of the residuals (the vertical distances between the points and the line) is the minimum. In other words, this is the quantity that we are trying to minimize. In machine learning terminology, this quantity is called the cost function, which is a measure of the difference between the actual and the predicted values.
Coming back to gradient descent, the process can be pictured on a curve of the cost function plotted against the model parameters. We start at an arbitrary point on this curve and find the derivative of the cost function at that point. Next, we choose a small positive number called the learning rate and multiply it by the derivative thus obtained. We then subtract this product from the coordinate of the point that we started from. The result becomes the starting point for the next iteration, and we repeat the same process over multiple iterations. At every iteration, we calculate the value of the cost function. Since we are descending, the value of the cost function should keep reducing, provided the learning rate is not too large. Once we reach a point where the cost function does not reduce any further, or remains roughly constant, we say that the algorithm has converged and we have reached a minimum of the cost function (for some models this may be a local rather than the global minimum). The parameters at which the cost function is minimum are selected as the best fit, or optimum, parameters of the model. The gradient descent update can be expressed mathematically as:
X’ = X – γ∇
Where,
∘ X = Current point (the arbitrary starting point in the first iteration)
∘ γ = Learning rate
∘ ∇ = Derivative of the cost function at X
∘ X’ = Resultant point used in the next iteration
• Bias versus variance: Bias is the error that arises when a model makes oversimplified assumptions in order to fit itself to the training data. A model with high bias fails to capture the underlying pattern in the data set; this is usually called underfitting. On the contrary, when a model tries too hard to fit itself to every single point in the training data set, it loses its ability to generalize to other data. This (called overfitting) gives rise to high variance, meaning that the model's predictions change a lot depending on the particular training data and deviate widely from the expected output on new data. The tradeoff is that a model that is too simplistic has high bias but low variance, whereas a model that is too complex (overfitted) has low bias but high variance, as it struggles to make predictions on testing data that looks different from the training data. A good machine learning model is one that finds a sweet spot between bias and variance.
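The update rule X’ = X – γ∇ can be illustrated on the same four points used for linear regression. This is a sketch under stated assumptions: the cost is the mean squared error, the starting point is w = b = 0, and the learning rate of 0.002 and the 20,000 iterations were picked by hand for this data. A fixed iteration count stands in for the "cost stops decreasing" check to keep the code short.

```python
import numpy as np

def gradient_descent(x, y, learning_rate=0.002, n_iters=20000):
    """Fit y ≈ w * x + b by gradient descent on the mean squared error cost."""
    w, b = 0.0, 0.0            # arbitrary starting point
    n = len(x)
    costs = []
    for _ in range(n_iters):
        residuals = (w * x + b) - y
        costs.append(np.mean(residuals ** 2))       # cost function at the current point
        grad_w = (2.0 / n) * np.dot(residuals, x)   # derivative of the cost w.r.t. w
        grad_b = (2.0 / n) * np.sum(residuals)      # derivative of the cost w.r.t. b
        # Update rule X' = X - learning_rate * derivative, applied to each parameter
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, costs

x = np.array([0.0, 3.0, 17.0, 23.0])
y = np.array([0.0, 9.0, 51.0, 69.0])
w, b, costs = gradient_descent(x, y)
print(round(w, 4), round(b, 4))
print(costs[0], costs[-1])
```

The result converges to the same line, y = 3x, that LinearRegression found. If the learning rate is set too high (for this data, roughly above 0.0048), the updates overshoot and the cost grows instead of shrinking.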

Logistic regression
In order to study logistic regression using Scikit-learn, we will use the Iris dataset that is built into Scikit-learn. Let us familiarize ourselves with the Iris dataset before proceeding with the tutorial.
The Iris dataset consists of measurements of flowers from three species, with 50 samples of each, giving 150 samples in total. The features used in this dataset are Sepal Length (cm), Sepal Width (cm), Petal Length (cm) and Petal Width (cm), which are stored in an array named data. The label, stored in another array named target, represents the species of each flower: setosa, versicolor and virginica, numerically labelled as 0, 1 and 2 respectively. Now, we shall store this dataset in a pandas dataframe, which makes it easier to view and analyze. Copy and run the code below in the Python editor:
from sklearn.datasets import load_iris
import pandas as pd
iris = load_iris()
df = pd.DataFrame(iris.data,columns=iris.feature_names)
df['target'] = iris.target
print(df)
Here, we first import the load_iris function from the datasets module of the Scikit-learn library and call it, storing the returned dataset in the variable iris. Thereafter, we read the data into a pandas dataframe, setting its columns parameter to the feature_names key of the Iris dataset. Next, we add another column for the target variable by assigning it the value of the target key of the Iris dataset. On running the code above, we get the output shown below:
     sepal length (cm)  sepal width (cm)  ...  petal width (cm)  target
0                  5.1               3.5  ...               0.2       0
1                  4.9               3.0  ...               0.2       0
2                  4.7               3.2  ...               0.2       0
3                  4.6               3.1  ...               0.2       0
4                  5.0               3.6  ...               0.2       0
..                 ...               ...  ...               ...     ...
145                6.7               3.0  ...               2.3       2
146                6.3               2.5  ...               1.9       2
147                6.5               3.0  ...               2.0       2
148                6.2               3.4  ...               2.3       2
149                5.9               3.0  ...               1.8       2

[150 rows x 5 columns]
The dataframe contains the entire Iris dataset, 150 rows and 5 columns including the column for the target variable, although pandas displays only the first and last few rows and elides some of the columns.
Now, let us implement logistic regression with Scikit-learn using the Iris dataset. We have already been through the theory behind logistic regression in Chapter 3, so we will not revisit it here. To refresh the concept briefly, the logistic regression algorithm fits a sigmoid (S-shaped) curve that classifies the dataset into two categories. The value of the curve at any point is an estimate of the probability that the sample belongs to category 1 rather than category 0.
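The sigmoid curve is the logistic function σ(z) = 1 / (1 + e^(-z)), applied to a linear score z of the features. A minimal NumPy sketch of how a score becomes a probability and then a class; the scores are made-up example values, and the 0.5 cut-off is the usual default decision rule rather than something specified in this chapter.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Example linear scores z = w·x + b (illustrative values, not from a fitted model)
z = np.array([-4.0, -1.0, 0.5, 4.0])
probabilities = sigmoid(z)

# Predict class 1 when the probability exceeds 0.5 (equivalently, when z > 0)
predicted_class = (probabilities > 0.5).astype(int)
print(probabilities.round(3))
print(predicted_class)  # [0 0 1 1]
```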
For this tutorial, we shall use only the first two species of flowers (setosa and versicolor), as we are classifying the data into two categories. Using the code below, we first create the input data and the labels from the Iris data:
from sklearn.datasets import load_iris
import pandas as pd
iris = load_iris()
X = iris.data[:100]
Y = iris.target[:100]
From the code above, we see that we take the first 100 rows
of the data. Next, we train the model using the
LogisticRegression classifier as shown in the code below:
from sklearn.linear_model import LogisticRegression
LR = LogisticRegression()
LR.fit(X,Y)
Now that we have fit the sigmoid curve to the dataset using
the fit method, the next step is to use the curve to predict
the output for a new row of data as shown in the code
below:
X_predict = [6.3,2.5,4.3,1.5]
Y_predict = LR.predict([X_predict])
print(Y_predict)
Output:
[1]
We see that the predictor has assigned this sample to category 1, which is the versicolor flower. But how do we know whether the predictor's output is correct? We can test this by dividing the input dataset into training data and testing data. We accomplish this using the train_test_split function available in the model_selection module of the Scikit-learn library, as shown in the code below:
from sklearn.model_selection import train_test_split
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X,Y,
test_size=0.3, random_state=100)
Here, we assign a value of 0.3 to the test_size parameter, which means that 30 percent of the data is allocated to the testing sample, whereas the remaining 70 percent is retained as the training sample.
An important parameter to note here is random_state. Here, we have assigned it a value of 100. However, this parameter could take any other arbitrary value like 345, 400 or 1043. The only thing to keep in mind is that one should not change the value of this parameter in subsequent runs of the code if one wishes to retain the same split of training and testing data. The train_test_split function shuffles the data points randomly and then divides them into training and testing sets according to the test_size parameter; the random_state value seeds that shuffle. If we change the value of the random_state parameter, or do not specify any value for it, train_test_split will allocate data points to the training and testing sets differently, although the sizes of the two sets stay the same as long as test_size is unchanged.
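The effect of random_state can be checked directly. This sketch splits the same 100 Iris rows twice with random_state=100 and compares the pieces, then splits once more with a different seed (345, one of the example values mentioned above).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, Y = iris.data[:100], iris.target[:100]

# Two calls with the same random_state produce identical splits
split_a = train_test_split(X, Y, test_size=0.3, random_state=100)
split_b = train_test_split(X, Y, test_size=0.3, random_state=100)
same = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same)  # True

# A different random_state keeps the sizes but generally picks different rows
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size=0.3, random_state=345)
print(len(X_Train), len(X_Test))
```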
Finally, we train the model on the training data only and
thereafter test it on one of the sample rows from the testing
data using the code below:
LR.fit(X_Train,Y_Train)
Y_Predict = LR.predict([X_Test[23]])

print('X_Test:' + str(X_Test[23]))
print('Y_Test:' + str(Y_Test[23]))
print('Y_Predict:' + str(Y_Predict))
Output:
X_Test:[5. 3.5 1.6 0.6]
Y_Test:0
Y_Predict:[0]
We observe that the predicted value matches the actual value, which is 0. A single correct prediction is encouraging, but it does not by itself establish the accuracy of the model; for that, the model should be scored on every row of the testing data. One could try this on other sample rows of the testing data to confirm whether the predicted value matches the actual value. Another useful exercise would be to choose another combination of flowers and try implementing the same algorithm.
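A fairer check than a single row is to score the entire testing set. A minimal sketch that recreates the same split and uses accuracy_score together with the predict_proba method of LogisticRegression, which returns the probability read off the fitted sigmoid curve for each class:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = load_iris()
X, Y = iris.data[:100], iris.target[:100]  # setosa (0) and versicolor (1) only
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size=0.3, random_state=100)

LR = LogisticRegression()
LR.fit(X_Train, Y_Train)

# Score every row of the testing set instead of a single sample
Y_Predict = LR.predict(X_Test)
accuracy = accuracy_score(Y_Test, Y_Predict)

# predict_proba returns [P(class 0), P(class 1)] for each row
probabilities = LR.predict_proba(X_Test[:3])
print('Accuracy:', accuracy)
print(probabilities.round(3))
```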
In this way, we have implemented logistic regression using
the Scikit-learn library of Python.
In the next section, we shall implement another interesting
classification algorithm which is k nearest neighbors.

K nearest neighbors
To recap, k nearest neighbors is a classification algorithm that assigns a new point to the class that is most common among the k training points nearest to it, where nearness is measured by a distance (Euclidean distance by default in Scikit-learn). We shall use the same Iris dataset to implement this algorithm in Python. Copy and run the code below in the Python editor:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data
Y = iris.target

# Hold out 30 percent of the samples for testing
X_Train, X_Test, Y_Train, Y_Test = train_test_split(
    X, Y, test_size=0.3, random_state=100)

# Classify using the 12 nearest neighbors
knn = KNeighborsClassifier(n_neighbors=12)
knn.fit(X_Train, Y_Train)

Y_Predict = knn.predict([X_Test[15]])
print('Y_Predict:' + str(Y_Predict))
print('Y_actual:' + str(Y_Test[15]))
Output:
Y_Predict:[1]
Y_actual:1
We observe that the predicted value of the algorithm matches the actual target value of the selected sample row. An important decision here is the choice of the parameter k. A common rule of thumb is k = sqrt(N), where N is the number of samples in the training dataset; with the 105 training samples used here, that gives a value of about 10.
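To make the distance-and-vote idea concrete, here is a from-scratch sketch of the k nearest neighbors prediction step using Euclidean distance, combined with the k = sqrt(N) rule of thumb. The helper knn_predict is hypothetical, written for illustration; it is not part of Scikit-learn, and ties in the vote are simply resolved in favor of the label counted first.

```python
import math
from collections import Counter

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

def knn_predict(X_train, y_train, x_new, k):
    """Classify x_new by a majority vote among its k nearest training points (Euclidean distance)."""
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]          # indices of the k closest points
    votes = Counter(y_train[nearest].tolist())   # count the class labels among them
    return votes.most_common(1)[0][0]            # most frequent label wins

iris = load_iris()
X_Train, X_Test, Y_Train, Y_Test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=100)

# Rule of thumb: k ≈ sqrt(N), where N is the number of training samples
k = round(math.sqrt(len(X_Train)))  # sqrt(105) ≈ 10.2, so k = 10
print('k =', k)
print('Predicted:', knn_predict(X_Train, Y_Train, X_Test[15], k), 'Actual:', Y_Test[15])
```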

Naïve Bayes
Let us implement Naïve Bayes using Scikit-learn. We learnt in Chapter 3, Getting Started with AI/ML in Python, that Naïve Bayes is a classification algorithm based on Bayes' theorem, which deals with conditional probability, that is, the probability of an event given that another event has occurred. The algorithm is called naïve because it assumes that the features are independent of one another within each class. Copy and run the code below in the Python editor:
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data
Y = iris.target

X_Train, X_Test, Y_Train, Y_Test = train_test_split(
    X, Y, test_size=0.3, random_state=100)

NB = GaussianNB()
NB.fit(X_Train, Y_Train)

Y_Predict = NB.predict([X_Test[5]])
print('Y_Predict:' + str(Y_Predict))
print('Y_Test:' + str(Y_Test[5]))
Output:
Y_Predict:[2]
Y_Test:2
We observe that the predicted value matches the actual
value in the testing data. We can also check the accuracy of
the model on all points of the testing data using the
accuracy_score function of the metrics module of the
Scikit-learn library. Copy and run the code below in the
Python editor:
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data
Y = iris.target

X_Train, X_Test, Y_Train, Y_Test = train_test_split(
    X, Y, test_size=0.3, random_state=100)

NB = GaussianNB()
NB.fit(X_Train, Y_Train)

from sklearn.metrics import accuracy_score


Y_Predict_Test_Sample = NB.predict(X_Test)
accuracy = accuracy_score(Y_Test, Y_Predict_Test_Sample)
print('Accuracy:', accuracy)
Output:
Accuracy: 0.9555555555555556
An accuracy of 0.9556 (95.56%) means that 43 of the 45 predicted values in the testing data match the actual values.
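Under the hood, Gaussian Naïve Bayes applies Bayes' theorem, P(class | x) ∝ P(class) × P(x | class), and (the naïve part) treats each feature as an independent normal distribution within each class. The sketch below recomputes these pieces with NumPy and checks that the resulting predictions agree with GaussianNB on the same split. It is an illustration under those assumptions, not Scikit-learn's source code; the small variance floor mirrors GaussianNB's default var_smoothing of 1e-9.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
X_Train, X_Test, Y_Train, Y_Test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=100)

classes = np.unique(Y_Train)
# Prior P(class), plus per-class feature means and variances from the training data
priors = np.array([np.mean(Y_Train == c) for c in classes])
means = np.array([X_Train[Y_Train == c].mean(axis=0) for c in classes])
variances = np.array([X_Train[Y_Train == c].var(axis=0) for c in classes])
variances += 1e-9 * X_Train.var(axis=0).max()  # small variance floor

def log_posterior(x):
    """Unnormalised log P(class | x) for every class, assuming independent Gaussian features."""
    log_likelihood = -0.5 * np.sum(np.log(2 * np.pi * variances)
                                   + (x - means) ** 2 / variances, axis=1)
    return np.log(priors) + log_likelihood

manual = np.array([classes[np.argmax(log_posterior(x))] for x in X_Test])
sklearn_pred = GaussianNB().fit(X_Train, Y_Train).predict(X_Test)
print('Predictions agree:', np.array_equal(manual, sklearn_pred))
```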
Support vector machines
Recalling what we learnt in Chapter 3, Getting Started with AI/ML in Python, a support vector machine is a supervised classification algorithm that separates discrete classes by drawing an optimal hyperplane between them, namely the one with the largest margin to the nearest points of each class. To implement it using Scikit-learn, copy and run the code below in the Python editor:
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data
Y = iris.target

X_Train, X_Test, Y_Train, Y_Test = train_test_split(
    X, Y, test_size=0.3, random_state=100)

SVC_Model = SVC(kernel='linear', random_state=0)
SVC_Model.fit(X_Train, Y_Train)
Y_Predict = SVC_Model.predict(X_Test)

from sklearn.metrics import confusion_matrix

# Rows are the actual classes, columns the predicted classes
cm = confusion_matrix(Y_Test, Y_Predict)
print(cm)
Output:
[[16 0 0]
[ 0 11 0]
[ 0 0 18]]
The output produced is a matrix. At the end of the code, we have used a new way of measuring accuracy called the confusion matrix. It summarizes the performance of a classification algorithm by counting, for each actual class (rows), how many samples were predicted as each class (columns); in the two-class case, these counts are the true positives, true negatives, false positives and false negatives. Correct predictions lie on the diagonal, so here all 45 test samples (16 + 11 + 18) have been classified correctly. The confusion matrix is one of the most common tools for evaluating the accuracy of a classification algorithm.
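In Scikit-learn's confusion_matrix, row i counts the samples whose actual class is i and column j the samples predicted as class j, so correct predictions sit on the diagonal. This short sketch derives per-class precision and recall and the overall accuracy from the matrix printed above:

```python
import numpy as np

# Confusion matrix printed above: rows = actual class, columns = predicted class
cm = np.array([[16, 0, 0],
               [0, 11, 0],
               [0, 0, 18]])

correct = np.diag(cm)                  # correctly classified samples per class
precision = correct / cm.sum(axis=0)   # of everything predicted as class i, the fraction that was i
recall = correct / cm.sum(axis=1)      # of every actual class i, the fraction predicted as i
accuracy = correct.sum() / cm.sum()

print('Precision per class:', precision)
print('Recall per class:', recall)
print('Accuracy:', accuracy)           # 1.0: all 45 test samples lie on the diagonal
```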

Decision trees
We have already studied the theory behind decision trees in
Chapter 3 - Getting Started with AI/ML in Python. We shall
now use the same Iris dataset for implementing decision
trees in Python.
Copy and run the code below in the Python editor:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

iris = load_iris()
X = iris.data
Y = iris.target

X_Train, X_Test, Y_Train, Y_Test = train_test_split(
    X, Y, test_size=0.3, random_state=100)

DT_Model = DecisionTreeClassifier()
DT_Model.fit(X_Train, Y_Train)

Y_Predict = DT_Model.predict(X_Test)
# accuracy_score expects the true labels first, then the predictions
accuracy = accuracy_score(Y_Test, Y_Predict)
print(accuracy)
Output:
0.9555555555555556
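From the accuracy score of about 95.6%, we see that the decision tree classifier has been fairly accurate in its predictions. Note that DecisionTreeClassifier breaks ties between equally good splits at random, so without a fixed random_state the score may vary slightly between runs.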
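One advantage of decision trees is that the learned model can be read as a set of if/else rules. A sketch using export_text from sklearn.tree; the max_depth=3 limit and random_state=0 are illustrative choices (not part of the example above) that keep the printout short and reproducible, so its accuracy may differ slightly from the unrestricted tree.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_Train, X_Test, Y_Train, Y_Test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=100)

# random_state fixes tie-breaking between equally good splits, so the tree is reproducible
DT_Model = DecisionTreeClassifier(max_depth=3, random_state=0)
DT_Model.fit(X_Train, Y_Train)

# Print the learned if/else rules using the Iris feature names
rules = export_text(DT_Model, feature_names=list(iris.feature_names))
print(rules)
```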

Implementing unsupervised learning algorithms using Python
Having familiarized ourselves with implementing supervised learning algorithms using the Scikit-learn library of Python, we shall now implement unsupervised learning algorithms using Scikit-learn. To summarize, unsupervised learning algorithms are applied to an unlabeled dataset, and they group together samples whose feature values are similar. We shall study k means clustering and hierarchical clustering as the unsupervised learning algorithms in this section.
However, prior to implementing them in Scikit-learn, it is important to understand a few more theoretical concepts in the context of machine learning modeling. These concepts are dimensionality reduction, principal component analysis and linear discriminant analysis, which we shall study in the sections to follow.

Dimensionality reduction
Dimensionality reduction is an important part of machine learning. A dataset with too many features can make the model complex, resulting in overfitting, where the model fits the training data too tightly and shows a large variance on the testing data. Hence, it is often essential to reduce the number of features, or dimensions, in a dataset to only those that are required or that can represent the entire dataset. This technique of reducing the number of features in a dataset while preserving the essence of the data is called dimensionality reduction. There are different ways in which dimensionality reduction can be achieved:
• Feature selection: In this method, only a subset of features is selected from the original set, such that they are the most relevant to the problem that is being solved.
• Feature extraction: In this method, the original features are transformed in such a way that the new features are fewer in number and still represent the original dataset. Principal component analysis (PCA) and linear discriminant analysis (LDA) are popular methods in this category. In the next sections, we shall learn about these methods.
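Feature selection can be tried on the Iris data with SelectKBest from Scikit-learn's feature_selection module. In this sketch, scoring features with the ANOVA F-test (f_classif) and keeping k=2 of them are illustrative choices; other scoring functions are available.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

iris = load_iris()
X, Y = iris.data, iris.target

# Keep the 2 features with the highest ANOVA F-score with respect to the species label
selector = SelectKBest(score_func=f_classif, k=2)
X_Selected = selector.fit_transform(X, Y)

kept = [name for name, keep in zip(iris.feature_names, selector.get_support()) if keep]
print(X_Selected.shape)  # (150, 2)
print(kept)
```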

Principal component analysis


Principal component analysis is a dimensionality reduction technique in which an orthogonal transformation is applied in order to transform the data to a new coordinate system. This technique identifies a set of orthogonal axes called principal components in such a way that the first principal component captures the maximum variance in the data, the second principal component captures the second greatest variance while being orthogonal to the first component, and so on. A lot of matrix algebra goes into principal component analysis, but we will not go into that detail here, since we are restricting ourselves to the Python implementation of PCA. It is important to note that PCA depends only on the feature set and not on the labels. In other words, PCA is an unsupervised learning technique. To implement PCA on the Iris dataset, copy and run the code below in the Python editor:
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

iris = load_iris()
X = iris.data
Y = iris.target

X_Train, X_Test, Y_Train, Y_Test = train_test_split(
    X, Y, test_size=0.3, random_state=100)

# Standardize the features: fit the scaler on the training data only
sc = StandardScaler()
X_Train = sc.fit_transform(X_Train)
X_Test = sc.transform(X_Test)

# n_components is not specified, so all components are kept
pca = PCA()
X_Train_PCA = pca.fit_transform(X_Train)
X_Test_PCA = pca.transform(X_Test)

# Share of the total variance captured by each component
explained_variance = pca.explained_variance_ratio_
print(explained_variance)
Output:
[0.72201925 0.23393497 0.03914811 0.00489767]
Let us now try to understand the code above:
• First, we load the Iris dataset and split it into training and testing data, as we did for all the other algorithms before.
• Next, we scale the data using the StandardScaler class of the preprocessing module of the Scikit-learn library. This standardizes each feature to zero mean and unit variance by subtracting the feature mean and dividing the result by the standard deviation. Standard scaling puts the features on a comparable footing and removes discrepancies between features arising from factors like units of measurement. We use the fit_transform method of the StandardScaler class on the training dataset and the transform method on the testing dataset, so that the testing data is scaled using statistics learned from the training data only.
• Thereafter, we use the PCA class of the decomposition module of the Scikit-learn library. Its first parameter, n_components, is the number of principal components to keep. In our code, we have not specified it, so all components are kept by default. We use the fit_transform method of the PCA class on the training dataset and the transform method on the testing dataset.
• Finally, we use the explained_variance_ratio_ attribute of the fitted PCA object, which gives the fraction of the total variance that each principal component is responsible for. From the output, we see that the first principal component accounts for the largest share of the variance, around 72%, and the second principal component for the second largest share, around 23%. Together, they account for around 96% of the variance. The remaining two components account for only about 4% and 0.5% respectively.
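Much of that matrix algebra reduces to an eigendecomposition: the principal components are the eigenvectors of the covariance matrix of the scaled data, and each eigenvalue is the variance along its component. A minimal NumPy sketch that recomputes the explained variance ratios this way and compares them with the explained_variance_ratio_ attribute of PCA:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_Train, X_Test, Y_Train, Y_Test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=100)
X_Train = StandardScaler().fit_transform(X_Train)

# Covariance matrix of the scaled features (4 x 4)
cov = np.cov(X_Train, rowvar=False)
# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
eigenvalues = np.linalg.eigvalsh(cov)[::-1]   # largest variance first

# Each eigenvalue is the variance along its component; normalise to get the ratio
manual_ratio = eigenvalues / eigenvalues.sum()
sklearn_ratio = PCA().fit(X_Train).explained_variance_ratio_

print(manual_ratio.round(4))
print(np.allclose(manual_ratio, sklearn_ratio))  # True
```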

Linear discriminant analysis


Unlike principal component analysis, which is an unsupervised learning technique, linear discriminant analysis is a supervised technique that considers both the data and its corresponding labels. LDA works by projecting the data onto new axes chosen so that the separation between the classes is maximized while the elements of each class stay closely grouped around the centroid of their class. In order to implement LDA on the same Iris dataset, copy and run the code below in the Python editor:
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = load_iris()
X = iris.data
Y = iris.target

X_Train, X_Test, Y_Train, Y_Test = train_test_split(
    X, Y, test_size=0.3, random_state=100)

sc = StandardScaler()
X_Train = sc.fit_transform(X_Train)
X_Test = sc.transform(X_Test)

# LDA is supervised, so fit_transform needs the labels as well
lda = LinearDiscriminantAnalysis()
X_Train_LDA = lda.fit_transform(X_Train, Y_Train)
X_Test_LDA = lda.transform(X_Test)

explained_variance = lda.explained_variance_ratio_
print(explained_variance)
Output:
[0.99073265 0.00926735]
As observed from the output, LDA has produced 2 components: the first component accounts for 99% of the variance between the classes, whereas the second component accounts for only around 1%.
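There are only two ratios because LDA can produce at most min(n_features, n_classes - 1) components, which is min(4, 2) = 2 for the three Iris species. This sketch makes that explicit with n_components=2 and also uses the fitted model's score method, since LinearDiscriminantAnalysis doubles as a classifier. Scaling is omitted here for brevity.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

iris = load_iris()
X_Train, X_Test, Y_Train, Y_Test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=100)

# At most min(n_features, n_classes - 1) = min(4, 3 - 1) = 2 components for Iris
lda = LinearDiscriminantAnalysis(n_components=2)
X_Train_LDA = lda.fit_transform(X_Train, Y_Train)
print(X_Train_LDA.shape)  # (105, 2)

# The fitted LDA model can also classify directly; score() returns the accuracy
print('Test accuracy:', lda.score(X_Test, Y_Test))
```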

K means clustering
K means clustering is one of the most commonly used unsupervised learning algorithms; it groups unlabeled points in a dataset into clusters of similar points. In the name k means, k is the number of clusters, which is chosen at the beginning of the algorithm, and means refers to the cluster centers, which are the means of their points. First, we choose k initial centroids, one for each cluster (for example, at random), and associate every point in the dataset with its nearest cluster based on the Euclidean distance between the point and the cluster centroid. The goal here is to optimize the position of these centroids. Next, we take the mean of all the points belonging to each cluster and move the cluster center to that mean. These two steps are repeated until the cluster assignments no longer change, at which point the algorithm has converged.
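The assign-then-average loop described above (often called Lloyd's algorithm) fits in a few lines of NumPy. This is a minimal sketch on made-up two-dimensional toy data; the helper simple_kmeans is hypothetical, written for illustration, and takes the starting centroids as an argument so the random initialization step is visible in the usage code.

```python
import numpy as np

def simple_kmeans(X, initial_centroids, max_iter=300):
    """Minimal k-means (Lloyd's algorithm) starting from the given centroids."""
    centroids = np.array(initial_centroids, dtype=float)
    k = len(centroids)
    for _ in range(max_iter):
        # Euclidean distance from every point to every centroid, shape (n_points, k)
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)  # assign each point to its nearest centroid
        # Move each centroid to the mean of its points (an empty cluster keeps its centroid)
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):  # centroids stopped moving: converged
            break
        centroids = new_centroids
    return labels, centroids

# Made-up toy data: two well-separated groups of points
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])

# Random initialisation: pick k = 2 distinct data points as the starting centroids
rng = np.random.default_rng(0)
initial = X[rng.choice(len(X), size=2, replace=False)]
labels, centroids = simple_kmeans(X, initial)
print(labels)
print(centroids)
```

Because the result depends on the starting centroids, an unlucky random start can converge to a poorer grouping. This is why the Scikit-learn code that follows uses init='k-means++' (a smarter seeding) and n_init=10, which runs the algorithm ten times and keeps the best result.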
We shall use the same Iris dataset to implement k means clustering in Python, so that we can assess the quality of the clustering against the known species. To implement the algorithm in Python using Scikit-learn, copy and run the code below in the Python editor:
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

iris = load_iris()
X = iris.data
Y = iris.target

# Sum of squared errors (inertia) for 1 to 10 clusters
sse = {}
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i,
                    init='k-means++',
                    max_iter=300,
                    n_init=10,
                    random_state=0)
    kmeans.fit(X)
    sse[i] = kmeans.inertia_

# Plot SSE against the number of clusters (elbow method)
plt.figure()
plt.plot(list(sse.keys()), list(sse.values()))
plt.xlabel("Number of clusters")
plt.ylabel("SSE")
plt.show()
Figure 9.1 shows a graph that we obtain after running the
code. This graph is used to obtain the optimum number of
clusters:
Figure 9.1: Finding the optimum number of clusters using the elbow method

Let us try to understand the code. We have created a loop that increases the number of clusters from 1 to 10. In each iteration, we read the sum of the squared errors (SSE) from the inertia_ attribute of the fitted model, which is the sum of squared distances of the points to their closest cluster center. We then plot a graph of the number of clusters versus the SSE values. This is called the elbow method of determining the optimum value of K. As per the elbow method, the optimum value is the point (the elbow) after which the SSE starts decreasing only slowly, in a roughly linear trend. We observe from the graph that beyond 3 clusters, the graph decreases in a roughly linear trend. Hence, the optimum value of K is 3, which we already know is correct, since this is a labelled dataset of three types of flowers.
Now, let us implement the K means clustering algorithm using K=3 clusters. Copy and run the code below in the Python editor (it reuses X and the KMeans import from the previous listing):
kmeans = KMeans(n_clusters=3,
init='k-means++',
max_iter=300,
n_init=10,
random_state=0)
y_kmeans = kmeans.fit_predict(X)

print(y_kmeans)
Output:
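[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 2 2 2 2 0 2 2 2 2
 2 2 0 0 2 2 2 2 0 2 0 2 0 2 2 0 0 2 2 2 2 2 0 2 2 2 2 0 2 2 2 0 2 2 2 0 2
 2 0]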
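We observe from the output that the algorithm has grouped the dataset into 3 clusters, numbered 0, 1 and 2. These cluster numbers are arbitrary and need not match the target labels: here, for example, cluster 1 corresponds to the setosa samples (the first 50 rows).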
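Because the Iris species are known, the clusters can be compared with them to assess the clustering. Since cluster numbers are arbitrary, a direct label-by-label comparison would be misleading; this sketch uses pd.crosstab and adjusted_rand_score from sklearn.metrics instead, which are insensitive to how the clusters happen to be numbered.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

iris = load_iris()
X, Y = iris.data, iris.target

kmeans = KMeans(n_clusters=3, init='k-means++', max_iter=300, n_init=10, random_state=0)
y_kmeans = kmeans.fit_predict(X)

# Cross-tabulate true species against cluster numbers to see how they line up
print(pd.crosstab(Y, y_kmeans, rownames=['species'], colnames=['cluster']))

# Adjusted Rand index: 1.0 means identical groupings, about 0 means no better than chance
ari = adjusted_rand_score(Y, y_kmeans)
print('ARI:', ari)
```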

Practical use case in Python


We shall now apply all that we have learnt so far to implement a practical use case in Python. For this use case, we shall use the IPL Auction Dataset from Kaggle, called IPL_Auction_2022_FullList.csv. It contains a list of all the players who were sold or went unsold in the 2022 IPL auction, along with their statistical information. The aim of this exercise is to train a model that predicts whether a player is likely to be sold or to go unsold in an IPL auction. The first step is to perform some exploratory data analysis. A section of the dataset is shown in Figure 9.2 below:

Figure 9.2: IPL Auction 2022 Dataset from Kaggle

From the figure of the dataset above, we see that there are many variables, like Country and State Association, that we do not require since they are unrelated to player performance. We need to focus only on the information that tells us about a player's past performance and ability, which makes a player more likely to be sold in an auction. The variables 'Specialism', 'Test caps', 'ODI caps', 'T20 caps' and 'IPL' summarize the role and experience of the player, and hence we select them as the features to train the model. The last column, 'Bid', is the target variable for the model and contains only two values, 'Sold' and 'Unsold'. Let us do all of this in a pandas dataframe. Copy and run the code below in the Python editor:
import pandas as pd
import numpy as np

df = pd.read_csv(r"IPL_Auction_2022_FullList.csv")
# Keep only the feature columns and the target column
df = df[['Specialism', 'Test caps', 'ODI caps', 'T20 caps', 'IPL', 'Bid']]
We observe that the column 'Specialism' contains information about whether a player is a batsman, bowler, wicket keeper or all-rounder. Though helpful, these values must be transformed to numeric codes, as the model cannot process string values.
We get that done using the code below:
mapping_dictionary = {'ALL-ROUNDER': 0, 'BATSMAN': 1,
                      'BOWLER': 2, 'WICKETKEEPER': 3}
df['Specialism'] = df['Specialism'].map(mapping_dictionary)
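One caveat with map: any value that is not a key in the dictionary becomes NaN, which LogisticRegression cannot handle. The values below are hypothetical, not taken from the Kaggle file, and simply show what to check for before training (for example with df['Specialism'].isna().sum()).

```python
import pandas as pd

# Hypothetical sample values standing in for the 'Specialism' column
specialism = pd.Series(['BATSMAN', 'BOWLER', 'ALL-ROUNDER', 'WICKETKEEPER', 'Batsman'])
mapping_dictionary = {'ALL-ROUNDER': 0, 'BATSMAN': 1, 'BOWLER': 2, 'WICKETKEEPER': 3}

codes = specialism.map(mapping_dictionary)
print(codes.tolist())       # [1.0, 2.0, 0.0, 3.0, nan]

# Any value missing from the dictionary (here the differently cased 'Batsman') becomes NaN
print(codes.isna().sum())   # 1
```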

Next, we create the feature and target arrays using the code below:
X = df.drop('Bid',axis=1).values
Y = df['Bid'].values
Our next step is to choose the right classification algorithm for this dataset. Since we are interested in only two outcomes, 'Sold' or 'Unsold', we can use a logistic regression model. As usual, we split the data into training and testing sets and then train the model on the training set.
Below is the implementation:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X_Train, X_Test, Y_Train, Y_Test = train_test_split(
    X, Y, test_size=0.3, random_state=100)
LR = LogisticRegression()
# Fit on the training split only, so the testing data stays unseen
LR.fit(X_Train, Y_Train)
Now that we have fitted the logistic regression model on the training dataset, our next task is to evaluate its accuracy on the testing dataset using the code below (the exact figures you obtain may differ slightly from the output shown):
from sklearn.metrics import accuracy_score
Y_Predict = LR.predict(X_Test)
accuracy = accuracy_score(Y_Test, Y_Predict)
print('Accuracy:', accuracy)

from sklearn.metrics import confusion_matrix

# Labels are sorted alphabetically: 'Sold' first, then 'Unsold'
tn, fp, fn, tp = confusion_matrix(Y_Test, Y_Predict).ravel()
print('True Negative:' + str(tn), 'False Positive:' + str(fp),
      'False Negative:' + str(fn), 'True Positive:' + str(tp), sep='\n')
Output:
Accuracy: 0.7288135593220338
True Negative:17
False Positive:42
False Negative:6
True Positive:112
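From the output above, we see that the model shows an accuracy of 72.88% on the testing data. This is reasonable, but only moderately better than the 66.67% (118 out of 177) that would be achieved by always predicting the more common outcome in the testing data.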
We have also obtained the individual values from the confusion matrix. Because confusion_matrix orders the labels alphabetically, 'Sold' is treated as the negative class and 'Unsold' as the positive class here. We see that the True Positive count is high and the False Negative count is low, which means that the model identifies most of the positive cases. However, the False Positive count is high compared to the True Negative count, which means that the model does a poor job of identifying negative cases: most actual negatives are predicted as positive. We can assess this using various performance metrics derived from the confusion matrix.
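Let us calculate the various performance metrics associated with the confusion matrix, as shown below:
• Precision = $\frac{TP}{TP + FP} = \frac{112}{112 + 42} = 72.73\%$
• Recall = $\frac{TP}{TP + FN} = \frac{112}{112 + 6} = 94.92\%$
• False Positive Rate = $\frac{FP}{FP + TN} = \frac{42}{42 + 17} = 71.19\%$
• False Negative Rate = $\frac{FN}{FN + TP} = \frac{6}{6 + 112} = 5.08\%$
• True Negative Rate = $\frac{TN}{TN + FP} = \frac{17}{17 + 42} = 28.81\%$
• Accuracy = $\frac{TP + TN}{TP + TN + FP + FN} = \frac{112 + 17}{177} = 72.88\%$
• F1 Score = $\frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = 82.35\%$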
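The same metrics can be computed in a few lines of Python from the four counts in the output above, which is handy when comparing several models. A minimal sketch:

```python
# Counts from the confusion matrix output above ('Unsold' is the positive class)
tn, fp, fn, tp = 17, 42, 6, 112

precision = tp / (tp + fp)
recall = tp / (tp + fn)                 # also called the true positive rate
false_positive_rate = fp / (fp + tn)
false_negative_rate = fn / (fn + tp)
true_negative_rate = tn / (tn + fp)     # also called specificity
accuracy = (tp + tn) / (tp + tn + fp + fn)
f1_score = 2 * precision * recall / (precision + recall)

for name, value in [('Precision', precision), ('Recall', recall),
                    ('False Positive Rate', false_positive_rate),
                    ('False Negative Rate', false_negative_rate),
                    ('True Negative Rate', true_negative_rate),
                    ('Accuracy', accuracy), ('F1 Score', f1_score)]:
    print(f'{name}: {value:.2%}')
```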
In this model, recall and the F1 score are high, and the low false negative rate is also desirable, while precision and accuracy are moderate. However, we see a high false positive rate, and consequently a low true negative rate, arising from the large number of false positives. It is worth trying other machine learning algorithms and calculating the same metrics to compare how accurately each model predicts the outcome.
With this exercise, we end this chapter and a wonderful
journey through machine learning algorithms using the
Scikit-learn library of Python. It is important to note that
apart from Scikit-learn, there are many other libraries like
Keras, TensorFlow and PyTorch that are used for specific
applications. However, the focus of this chapter has been
Scikit-learn which provides the novice learner with the
launching pad to begin the journey with machine learning.
This chapter should give us a strong foundation to proceed
with the next chapter on deep learning.

Conclusion
This chapter has given an overview of the Python
implementation of machine learning algorithms for the
novice learner and paved the way for further curiosity and
exploration using Scikit-learn. It introduces the reader to the powerful capabilities that Scikit-learn provides, which can be used to implement interesting machine learning projects. Apart from the exercises covered in this chapter, Scikit-learn offers a variety of other modules that the reader could explore to make fuller use of its machine learning capabilities. Real world problems can
also get increasingly complex when the number of variables
in a dataset is large or when the dataset itself is huge. This
could require selective usage of dimensionality reduction
techniques. It is important to understand that no model is
perfect, and that model improvement is an iterative
procedure.
No discussion around intelligent automation would be
complete without exploring deep learning. In the
contemporary era, deep learning applications have entered
every single layer of technological innovation such that they
promise to dominate the 21st century. The recent hype around Large Language Models (LLMs) and LLM-based chatbots like ChatGPT and Bard has taken the world of generative AI by storm.
The next chapter will introduce the reader to key concepts
in deep learning and show how neural networks are
implemented in Python. Related concepts like natural
language processing would also be covered. Hence, without
any further delay, we should flip the page and move to the
next chapter!

Join our book’s Discord space


Join the book’s Discord Workspace for Latest updates,
Offers, Tech happenings around the world, New Release and
Sessions with the Authors:
https://discord.bpbonline.com
Chapter 10
Intelligent Automation Part 2: Using Deep Learning

Introduction
This chapter provides a background on deep learning and neural
networks along with an implementation of a neural network in
Python. The study of deep learning is vast, and it is impossible to
discuss every topic within the scope of this book. Instead, this
chapter shows the role that Python plays in easing deep learning
computations. It revisits the concept of a neural network, which
forms the crux of deep learning, and walks the reader through the
step-by-step process of building one in Python. The key concept of
backpropagation is covered with an actual example in Python,
taking the perceptron as a starting point. The chapter continues
with a discussion of useful applications of deep learning, such as
natural language processing (NLP) and computer vision, and lists
popular Python libraries used in deep learning applications, such
as NLTK, spaCy and OpenCV, to name a few. The chapter then
concludes with a real-world use case in deep learning.

Structure
The chapter covers the following topics:
• Implementing a neural network in Python
• Backpropagation
• Popular Python libraries for deep learning
• Deep learning applications
• Natural language processing
• Practical use case in Python

Objectives
We have already seen the architecture of a neural network in
Chapter 3, Getting Started with AI/ML in Python, where we
observed that successive layers of nodes, consisting of inputs,
weights and biases combined with activation functions, are
iteratively responsible for the self-learning mechanism of a neural
network. We also went through different types of neural networks,
like the convolutional neural network and the recurrent neural
network. Towards the end, we will cover theoretical aspects of various
applications of deep learning, like computer vision, natural
language processing and long short-term memory. With this
background, we shall now plunge into the Python
implementation of a basic neural network and transform its outputs
using activation functions. The process of backpropagation is
covered in some detail in the context of a single perceptron.
Later, we touch upon the exciting field of natural language
processing, which provides a glimpse of the useful applications of
deep learning. By the end of this chapter, the reader should be
comfortable independently exploring and implementing deep
learning algorithms.

Implementing a neural network in Python


Recall that the basic element of a neural network is the
perceptron, as shown in Figure 10.1 below:
Figure 10.1: Basic element of a neural network: A perceptron

We shall begin with the implementation of the perceptron in
Python. We know that the output of a single perceptron is equal to
the dot product of the inputs (x1, x2, x3, x4) and their corresponding
weights (w1, w2, w3, w4), to which the bias is added at the end. Let
us first create separate variables for the inputs, weights and bias in
Python as shown below:
Inputs:
lst_inputs = [5, 3.7, 1.3, 2.75]
Weights:
lst_weights = [0.8, 0.5, -0.3, 1.0]
Bias:
dbl_bias = 3.5
Next, we multiply each element of the inputs with the
corresponding element of the weights in order to get the dot
product and then add the bias value as shown below:
dbl_output = (lst_inputs[0]*lst_weights[0]
              + lst_inputs[1]*lst_weights[1]
              + lst_inputs[2]*lst_weights[2]
              + lst_inputs[3]*lst_weights[3]) + dbl_bias

print(dbl_output)
Output:
11.71
On printing the output, the value obtained is 11.71. In this way, we
can implement a perceptron in Python.
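Hard-coding the four index positions only works for exactly four inputs. As a minimal sketch (the helper name perceptron_output is our own, not part of any library), the same weighted sum can be written once with zip and sum so that it handles any number of inputs:

```python
def perceptron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias (no activation applied)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Multiply each input by its matching weight, add them up, then add the bias
    return sum(x * w for x, w in zip(inputs, weights)) + bias

lst_inputs = [5, 3.7, 1.3, 2.75]
lst_weights = [0.8, 0.5, -0.3, 1.0]
dbl_bias = 3.5

# Rounded only to hide floating-point noise in the printed value
print(round(perceptron_output(lst_inputs, lst_weights, dbl_bias), 2))
```

The length check guards against silently dropping inputs, since zip stops at the shorter of the two lists.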
In a similar way, let us use Python to implement a basic multi-layer
neural network as shown in Figure 10.2 below:

Figure 10.2: Basic neural network

Here, we have three nodes in the input layer, four nodes in the
middle (hidden) layer and two nodes in the output layer. Let us
first focus on obtaining the values y1, y2, y3 and y4 at the four
nodes of the hidden layer. As the hidden layer contains four nodes
and the input layer contains three nodes, we need four lists of
three elements each as the weights. As the hidden layer contains
four nodes, we also need a single list of four elements to hold the
bias values. The code below represents these entities as variables:
lst_inputs = [3, 10, 9.7]

lst_weights_1 = [0.5, 0.9, 0.35]
lst_weights_2 = [0.1, 0.3, 0.93]
lst_weights_3 = [0.91, 0.72, 0.1]
lst_weights_4 = [1.0, 0.31, 0.75]

lst_bias = [0.2, 0.95, 1.0, 4.0]


Next, we calculate the value at each hidden node using the same
operation that we perform for individual perceptrons, as shown in
the code below:
y1 = lst_inputs[0]*lst_weights_1[0]+\
lst_inputs[1]*lst_weights_1[1]+\
lst_inputs[2]*lst_weights_1[2] + lst_bias[0]

y2 = lst_inputs[0]*lst_weights_2[0]+\
lst_inputs[1]*lst_weights_2[1]+\
lst_inputs[2]*lst_weights_2[2] + lst_bias[1]

y3 = lst_inputs[0]*lst_weights_3[0]+\
lst_inputs[1]*lst_weights_3[1]+\
lst_inputs[2]*lst_weights_3[2] + lst_bias[2]

y4 = lst_inputs[0]*lst_weights_4[0]+\
lst_inputs[1]*lst_weights_4[1]+\
lst_inputs[2]*lst_weights_4[2] + lst_bias[3]

print('y1 = ' + str(y1))


print('y2 = ' + str(y2))
print('y3 = ' + str(y3))
print('y4 = ' + str(y4))
Output:
y1 = 14.094999999999999
y2 = 13.270999999999997
y3 = 11.9
y4 = 17.375
In this way, we obtain the nodal outputs y1, y2, y3 and y4,
which then become the inputs for the output layer, computed in
the same way.
Although we successfully captured the outputs in the most
classical way, the process becomes tedious as the number of
layers, and of nodes in each layer, increases. We therefore need
a faster way of computing the outputs in Python. Fortunately, we
do have one: NumPy arrays!
Let us see how it is done.
Copy and run the code below in the Python editor:
import numpy as np

lst_inputs = [3, 10, 9.7]

arr_weights = [[0.5, 0.9, 0.35],

[0.1, 0.3, 0.93],

[0.91, 0.72, 0.1],

[1.0, 0.31, 0.75]]

lst_bias = [0.2, 0.95, 1.0, 4.0]

hidden_layer_output = np.dot(arr_weights, lst_inputs) + lst_bias

print(hidden_layer_output)
Output:
[14.095 13.271 11.9   17.375]
We observe that in this case the output is a NumPy array, and all
the values match the ones that we had individually obtained in the
earlier exercise. Essentially, what we have done here is create a
single variable for the weights and initialize it as a list of four
lists, each containing the three weights of one hidden node.
Thereafter, we use np.dot to obtain the dot product of the weights
and the inputs and then add the biases to this product. Is it not
simple to do?
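The same matrix product scales to several input samples at once. In this sketch the second input row [1, 2, 0.5] is a made-up sample added purely for illustration. Stacking samples as the rows of a matrix X and multiplying by the transposed weight matrix yields one row of hidden-layer outputs per sample, with the bias broadcast across every row:

```python
import numpy as np

# Weights and biases from the hidden-layer example above: 4 nodes x 3 inputs
arr_weights = np.array([[0.5, 0.9, 0.35],
                        [0.1, 0.3, 0.93],
                        [0.91, 0.72, 0.1],
                        [1.0, 0.31, 0.75]])
lst_bias = np.array([0.2, 0.95, 1.0, 4.0])

# Each row is one input sample; the second row is hypothetical
X = np.array([[3, 10, 9.7],
              [1, 2, 0.5]])

# (2 x 3) @ (3 x 4) -> (2 x 4); the bias vector is added to every row
hidden = X @ arr_weights.T + lst_bias
print(hidden.shape)   # (2, 4)
print(hidden[0])      # same values as the single-sample result above
```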
Now that we have successfully implemented the output of a
hidden layer in an artificial neural network, let us move to the
next step that we had discussed in Chapter 3, Getting Started with
AI/ML in Python, which is processing the output with an
activation function. One of the most common activation functions
is the rectified linear unit (ReLU) function, which returns the input
unchanged if it is positive and returns zero otherwise.
Mathematically, it can be expressed as below:
ReLU
f(x) = max(0, x)
Now, let us implement this activation function using Python. We
just observed that the outputs obtained for the four nodes of the
hidden layer are [14.095, 13.271, 11.9, 17.375]. For the node y1,
the output is 14.095. This is just one output value, obtained for a
particular combination of inputs and weights; other combinations
of inputs and weights would produce different values, some of
them negative. Let us assume a list of arbitrary output values,
including 14.095, and apply the ReLU function to check the final
output. Copy and run the code below in the Python editor:
import numpy as np

lst_Output = [0.5, -3.7, 14.095, 10.5, -2.8, -15, 0, 0.135]


lst_Activated_Output = np.maximum(0,lst_Output)

print(lst_Activated_Output)
Output:
[ 0.5 0. 14.095 10.5 0. 0. 0. 0.135]
We observe that all the values less than zero have been converted
to zero by ReLU.
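Putting the hidden layer and ReLU together gives a complete forward pass through the network of Figure 10.2. In this minimal sketch the helper names relu and dense are our own, and the output-layer weights and biases are hypothetical values invented only so that the code runs, since they are not specified above. For simplicity, no activation is applied at the output layer:

```python
import numpy as np

def relu(z):
    """Element-wise ReLU: negative values become 0."""
    return np.maximum(0, z)

def dense(inputs, weights, bias):
    """One fully connected layer; weights has one row per node."""
    return np.dot(weights, inputs) + bias

lst_inputs = [3, 10, 9.7]

# Hidden layer (4 nodes): values from the example above
w_hidden = [[0.5, 0.9, 0.35],
            [0.1, 0.3, 0.93],
            [0.91, 0.72, 0.1],
            [1.0, 0.31, 0.75]]
b_hidden = [0.2, 0.95, 1.0, 4.0]

# Output layer (2 nodes): HYPOTHETICAL weights and biases, for illustration only
w_output = [[0.2, -0.5, 0.1, 0.3],
            [-0.4, 0.6, 0.25, -0.1]]
b_output = [0.5, -1.0]

hidden = relu(dense(lst_inputs, w_hidden, b_hidden))  # 4 activated values
output = dense(hidden, w_output, b_output)            # 2 output-layer values
print(hidden)
print(output)
```

Each layer's output simply becomes the next layer's input, which is why deeper networks can be built by chaining the same two functions.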
Backpropagation
In this section, we shall study backpropagation, an important
concept behind the efficient training of neural networks.
Backpropagation propagates the error backwards from the output
nodes towards the input nodes. It does this by defining a loss
function and, using the chain rule of calculus, calculating the
impact (the gradient) of each weight of the neural network on that
loss. Gradient descent then uses these gradients to update the
weights. We have already studied gradient descent in Chapter 9,
Intelligent Automation Part 1: Using Machine Learning. Here, we
shall see how backpropagation works together with gradient descent
to find the values of the weights and biases that best accomplish
the task the neural network is intended to do.
In order to perform the mathematical calculations required for
backpropagation, this section assumes that the reader is
conversant with basic concepts of calculus like derivatives, partial
derivatives and the chain rule.
We are already familiar with the process of calculating the output
of a node in a neural network: we calculate the dot product of
the weights and the inputs and then add the bias term at the end.
This process is called the forward pass. For the neural network
shown in Figure 10.2, the output for node y1, calculated using
weights w11, w12, w13 and inputs x1, x2, x3 by the forward pass,
would be w11*x1 + w12*x2 + w13*x3 + b1, where b1 is the bias.
The outputs for the other nodes are calculated in a similar way.
The outputs calculated in this way are the predicted outputs.
However, when we train the neural network, we train it with the
intention of producing an expected output. A measure of the gap
between the expected output and the predicted output is called the
error, and the goal of training the neural network is to optimize the
weights in such a way that they minimize this error. Backpropagation
is the procedure that works out how each weight should change in
order to reduce this error.
As the purpose of this book is simply to demonstrate the
significance of Python in simplifying computations, we will not
go into the actual implementation of backpropagation for a fully
connected neural network, as the calculations would span many
pages and go beyond the scope of this book. However, we shall
acquaint ourselves with the general process of backpropagation in
the context of a single perceptron so that the reader can extend
the idea to a fully connected neural network.
As shown in Figure 10.3, the node receives four inputs x1, x2, x3
and x4 with weights w1, w2, w3 and w4 respectively. The bias value
is b0. Hence, the nodal sum is given by the dot product of the
inputs and weights added to the bias.
If z is this nodal sum, we have:
z = w1 * x1 + w2 * x2 + w3 * x3 + w4 * x4 + b0
Thereafter, we subject this sum to an activation function. Let us
choose the ReLU activation function, which is defined as:
ReLU(z) = max(0,z)
Hence, the final output of the node can be expressed in the
following way:
output = ReLU(w1*x1 + w2*x2 + w3*x3 + w4*x4 + b0)
Equivalently:
output = ReLU(sum(w1*x1, w2*x2, w3*x3, w4*x4, b0))
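As discussed earlier, the neural network tries to produce an
expected output. The gap between the expected output and the
predicted output is measured by the error, also called the cost
function. In this simple case, let us define the error as half the
squared difference between the two:
$$E = \frac{1}{2}\left(\text{output}_E - \text{output}\right)^2$$
Here, output_E is the expected output. The factor of 1/2 simply
cancels the 2 that appears when the square is differentiated.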


To perform backpropagation, we assess the impact of a particular
parameter on the total error. For this exercise, let us assess the
impact of weight w1 on the total error; the impact of the other
parameters can be assessed using the same process.
Recall that the partial derivative of a function of several variables
is its derivative with respect to one of the variables, keeping all
the others constant. Is this too much jargon? Let us work through
it mathematically to make it clearer.
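The impact of weight w1 on the error is given by the partial derivative:
$$\frac{\partial E}{\partial w_1}$$
Using the chain rule, this can be expressed as:
$$\frac{\partial E}{\partial w_1} = \frac{\partial E}{\partial\,\text{output}} \cdot \frac{\partial\,\text{output}}{\partial z} \cdot \frac{\partial z}{\partial w_1}$$
Let us calculate each of these terms separately:
1. $$\frac{\partial E}{\partial\,\text{output}} = -\left(\text{output}_E - \text{output}\right) = \text{output} - \text{output}_E$$
2. $$\frac{\partial\,\text{output}}{\partial z} = \begin{cases} 1 & \text{if } z > 0 \\ 0 & \text{otherwise} \end{cases}$$
3. $$\frac{\partial z}{\partial w_1} = x_1$$
We define all the functions from the forward pass in Python, as
shown below: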
import numpy as np

def ReLU(z):
    return np.maximum(0,z)

def Sum(lst_inputs,arr_weights,lst_bias):
    return np.dot(arr_weights, lst_inputs) + lst_bias

def Error(output_E,output):
    return (1/2)*(output_E-output) ** 2
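Similarly, we define the derivative functions needed for
backpropagation, one for each term of the chain rule:
def dError(output_E,output):
    # dE/d(output) = -(output_E - output) = output - output_E
    return output - output_E

def dRELU(z):
    # Derivative of ReLU: 1 for positive z, otherwise 0
    return 1. if z > 0 else 0.

def dSum_w1(lst_inputs,arr_weights):
    # dz/dw1 = x1, the first input
    return lst_inputs[0]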
Next, we initialize the set of inputs, weights and bias as shown
below:
lst_inputs = [3,10,9.7,0.95]

arr_weights = [0.5, 0.3, 0.93, 1.0]

lst_bias = [0.2]
As part of backpropagation, we also need to define the expected
output, in order to calculate the error, and the learning rate, in
order to perform gradient descent. We initialize these parameters
to the values shown below:
output_E = 15.00

learning_rate = 0.01
Thereafter we perform backpropagation using the code below:
output = ReLU(Sum(lst_inputs,arr_weights,lst_bias))[0]

Error_Value = Error(output_E,output)

dError_value = dError(output_E,output)

dRELU_value = dRELU(Sum(lst_inputs,arr_weights,lst_bias))

dSum_w1_value = dSum_w1(lst_inputs,arr_weights)

dError_w1 = dError_value*dRELU_value*dSum_w1_value
Finally, we calculate the new value of w1 using gradient
descent, as shown in the code below:
w1 = arr_weights[0]

w1_new = w1 - learning_rate*dError_w1
This value of w1_new is the new value of the weight w1 that we
would use in the next iteration of the forward pass and
backpropagation. These iterations continue until the error no
longer decreases appreciably and the weight converges to a
particular value. It is important to note that the learning rate is
a key parameter in this process: a learning rate that is too high
can overshoot the minimum and fail to converge, while one that is
too low makes convergence very slow.
The entire code is given below for reference. Copy and
paste the code in the Python editor:
#Backpropagation
import numpy as np

def ReLU(z):
    return np.maximum(0,z)

def Sum(lst_inputs,arr_weights,lst_bias):
    return np.dot(arr_weights, lst_inputs) + lst_bias

def Error(output_E,output):
    return (1/2)*(output_E-output) ** 2

def dError(output_E,output):
    # dE/d(output) = output - output_E
    return output - output_E

def dRELU(z):
    return 1. if z > 0 else 0.

def dSum_w1(lst_inputs,arr_weights):
    return lst_inputs[0]

lst_inputs = [3,10,9.7,0.95]
arr_weights = [0.5, 0.3, 0.93, 1.0]
lst_bias = [0.2]
output_E = 15.00
learning_rate = 0.01

# Forward pass
output = ReLU(Sum(lst_inputs,arr_weights,lst_bias))[0]
Error_Value = Error(output_E,output)

# Backward pass: chain rule for dE/dw1
dError_value = dError(output_E,output)
dRELU_value = dRELU(Sum(lst_inputs,arr_weights,lst_bias))
dSum_w1_value = dSum_w1(lst_inputs,arr_weights)
dError_w1 = dError_value*dRELU_value*dSum_w1_value

# Gradient-descent update of w1
w1 = arr_weights[0]
w1_new = w1 - learning_rate*dError_w1

print(round(w1_new, 5))
Output:
0.50987
Here, the predicted output (14.671) is below the expected output
(15.0), so the gradient dE/dw1 is negative and gradient descent
increases w1 slightly, from 0.5 to about 0.50987.
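The code above performs a single update of w1. A minimal sketch of the full iterative procedure, under the same assumptions (a single ReLU perceptron, the half squared error and the inputs above), updates all four weights and the bias in every iteration. Because z depends on each weight through its own input and on the bias with coefficient 1, the whole gradient is the shared factor (output - output_E) * ReLU'(z) multiplied by the inputs, plus that same factor for the bias. By our own arithmetic, moving all five parameters at once with the learning rate of 0.01 would overshoot the target by more than the original error at every step, so this sketch uses 0.001 instead; the 50 iterations are likewise an arbitrary choice.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

x = np.array([3, 10, 9.7, 0.95])      # inputs from the example above
w = np.array([0.5, 0.3, 0.93, 1.0])   # initial weights
b = 0.2                                # initial bias
output_E = 15.0                        # expected output
learning_rate = 0.001                  # smaller than 0.01 (see note above)
n_epochs = 50                          # arbitrary choice for this sketch

errors = []
for epoch in range(n_epochs):
    # Forward pass
    z = np.dot(w, x) + b
    output = relu(z)
    errors.append(0.5 * (output_E - output) ** 2)

    # Backward pass: dE/dz is shared by every parameter (chain rule)
    dE_dz = (output - output_E) * (1.0 if z > 0 else 0.0)
    grad_w = dE_dz * x      # dz/dw_i = x_i
    grad_b = dE_dz          # dz/db0 = 1

    # Gradient-descent update of all parameters at once
    w = w - learning_rate * grad_w
    b = b - learning_rate * grad_b

print(f"first error: {errors[0]:.6f}, last error: {errors[-1]:.2e}")
print("output after training:", round(float(relu(np.dot(w, x) + b)), 4))
```

Watching the errors list shrink towards zero is the practical signal, mentioned above, that the weights have converged.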

Popular Python libraries for deep learning


We have already seen the power of Scikit-learn in Chapter 9,
Intelligent Automation Part 1: Using Machine Learning, when we
studied machine learning, and it remains an excellent library for
classical machine learning applications. However, when we delve
into deep learning, the Python open-source community offers a
wide range of powerful libraries that provide the computational
power needed to train deep neural networks. In this section, we
shall discuss some of the popular libraries that are frequently
used in deep learning:
• TensorFlow: TensorFlow is an open-source library released by
Google that enables the powerful numerical computing required for
deep learning, and it is frequently used to create and train neural
networks. The Keras module built on top of TensorFlow lets one
create neural networks while retaining the flexibility to specify the
layers and activation functions in the network. Let us have a look
at this functionality using the Sequential class of the Keras module.
Prior to using TensorFlow, one needs to install it using the
command below:
pip install tensorflow
Now, copy and paste the code below into the Python editor:
#TensorFlow Tutorial
import tensorflow as tf
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu', input_shape=[4])])
model.summary()
Here, we create a single dense (fully connected) layer of 10 units
with ReLU activation, specifying the input_shape argument as [4],
which is the shape of each input fed to the neural network.
Thereafter, we use the model.summary() method to print the
details of the model in the console. On running the code, we
obtain the following output in the console:
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 dense (Dense)               (None, 10)                50
=================================================================
Total params: 50 (200.00 Byte)
Trainable params: 50 (200.00 Byte)
Non-trainable params: 0 (0.00 Byte)
We observe from the output that the model is "sequential" and the
layer is dense. For a dense layer, the number of parameters is
calculated using the equation below:
Param = Number of units * (Number of inputs + 1)
Each of the 10 units has one weight per input plus one bias, which
accounts for the + 1. On inserting the values, we obtain:
Param = 10 * (4 + 1) = 50
As observed in the Param column of the table, the number of
parameters is indeed 50.
In the last section of this chapter, the practical use case in
Python, we shall study how TensorFlow proves to be a powerful
solution for creating a neural network that classifies data from
our very familiar Iris dataset.
Until then, let us acquaint ourselves with another useful deep
learning library called PyTorch.
• PyTorch: PyTorch is another useful open-source Python library for
deep learning computations, originally built by Facebook (now
Meta). It is known for being easy to use, which has made it popular
with academics and researchers. It is extensively used in deep
learning applications such as computer vision and text processing.
Prior to using PyTorch, one needs to install it using the command
below:
pip install torch
The hallmark of PyTorch is the torch.nn module, which provides a
high-level API for building neural networks. We will not discuss
its implementation in detail here, but interested readers can find a
plethora of examples online that illustrate the usage of PyTorch.
However, it would be advantageous to take a quick glimpse of key
concepts related to PyTorch data types.
PyTorch can act as an effective replacement for NumPy arrays
when performing calculations. It provides the very useful tensor
data type, which acts as a generic n-dimensional array. Let us
take an example. Copy and run the code below in the Python
editor:
import torch
tsr = torch.tensor([[10,5],[3,1]])
print(tsr)
Output:
tensor([[10,  5],
        [ 3,  1]])
However, that is not all! Just wait until we see the beauty of
a tensor in PyTorch. In the section on backpropagation, we
observed how tedious it is to calculate the derivatives of certain
expressions at given points. That hassle is about to end with the
backward method of PyTorch! Copy and run the code below in the
Python editor:
import torch
x = torch.tensor(5.0, requires_grad=True)
y = x**3 +7*x + 9
y.backward()
print(x.grad)
Output:
tensor(82.)
What we did is quite simple. We initialized a tensor x to the
value 5.0 with requires_grad=True, which tells PyTorch to track
operations on x. Next, we created a dependent variable y as a
function of x given by the equation:
$$y = x^3 + 7x + 9$$
We can algebraically obtain the derivative of this function with
respect to x as shown below:
$$\frac{dy}{dx} = 3x^2 + 7$$
Next, we substitute x = 5 in the above expression to obtain the
value of the derivative:
$$\left.\frac{dy}{dx}\right|_{x=5} = 3(5)^2 + 7 = 82$$
PyTorch calculated the derivative of y at x = 5 when we called the
backward method on y, and stored it in the grad attribute of x. As
observed from the output, we get the same value of 82 in just two
lines! This is sufficient to get us kick-started with building our
own neural network!
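Both library examples above can be cross-checked with plain NumPy. The sketch below rests on stated assumptions: normally distributed random weights stand in for the values Keras would initialize, and the helper names dense_relu and central_difference are our own. The first part mimics what the Dense(10, activation='relu', input_shape=[4]) layer computes, a ReLU applied to the input times a 4 x 10 weight matrix plus 10 biases, and counts its parameters; the second part approximates the derivative that PyTorch's backward method returned, using a central difference.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. What Dense(10, activation='relu', input_shape=[4]) computes.
#    Random weights stand in for the ones Keras would initialize (assumption).
n_inputs, n_units = 4, 10
kernel = rng.normal(size=(n_inputs, n_units))   # one weight per (input, unit) pair
bias = np.zeros(n_units)                         # one bias per unit

def dense_relu(x):
    """Forward pass of a dense layer: relu(x @ kernel + bias)."""
    return np.maximum(0, x @ kernel + bias)

sample = rng.normal(size=(1, n_inputs))          # one made-up input sample
print(dense_relu(sample).shape)                  # one row of 10 unit outputs
print(kernel.size + bias.size)                   # total parameter count

# 2. A numerical check of the PyTorch gradient for y = x**3 + 7x + 9.
def f(x):
    return x**3 + 7 * x + 9

def central_difference(func, x, h=1e-5):
    """Approximate func'(x) with the symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)

print(round(central_difference(f, 5.0), 4))      # should be very close to 82
```

A finite-difference check like this is a common way to sanity-check hand-derived or automatically computed gradients, although it is far too slow to replace backpropagation for real networks.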

Deep learning applications


The advent of GPUs, combined with the computational and
mathematical capabilities of the latest Python deep learning
libraries, has given rise to remarkable applications in
contemporary artificial intelligence. Prominent among these are
computer vision and natural language processing.
Computer vision predominantly uses convolutional neural
networks (CNNs) to give computers the ability to identify
objects in images and videos. Natural language processing
gives computers the ability to understand and interpret
human language. Popular contemporary examples are Alexa
and Google Translate.
Advanced implementations of natural language processing have
given rise to large language models (LLMs), which have taken
the technological landscape by storm. The most popular examples
in this category are ChatGPT by OpenAI and Bard by Google.
We have previously studied specific examples of computer vision
in Chapter 7, Working with PDFs and Images, so we will not
revisit the topic here. However, it would be advantageous to
have a quick refresher on natural language processing, which is
exactly what we do in the section that follows.

Natural language processing


As discussed earlier, natural language processing enables
computers to understand and interpret human language and
generate responses that humans can understand. The ideal state
of this technology would be one that entirely replicates a human in
terms of cognitive ability and creativity; NLP is currently still far
from that ideal. The most popular NLP libraries are NLTK and
spaCy.
The NLP cycle consists of certain general steps that are usually
performed on text data in order to process and interpret it
further. These steps are as follows:
• Segmentation: This step breaks a paragraph down into its
constituent sentences. For example, the paragraph "AI is the
greatest innovation of the current century. However, AI needs to
be selectively used to benefit the world. Every technology has its
advantages and disadvantages." can be broken down into the
following:
∘ AI is the greatest innovation of the current century.
∘ However, AI needs to be selectively used to benefit the
world.
∘ Every technology has its advantages and disadvantages.
• Tokenization: This step breaks a sentence into its constituent
words. The first sentence, "AI is the greatest innovation of the
current century", could be broken down as AI, is, the, greatest,
innovation, of, the, current, century.
• Stemming: This step reduces a derived word to its base or root
form. For example, the word innovation in the first sentence could
be reduced to the root innov, which is not an actual word; other
words sharing this root include innovate and innovative.
• Lemmatization: This is similar to stemming, except that
lemmatization returns an actual word. For example, the word
greatest in the first sentence could be lemmatized to great.
• Removal of stop words: Words like a, an, is and the are
considered stop words and may need to be removed in order to
focus only on the key words in a sentence.
• Part-of-speech (POS) tagging: This step tags each word with the
part of speech it belongs to (noun, verb, adjective and so on),
which further helps make sense of the words in the sentence.
Python libraries like NLTK and spaCy provide ready-to-use
capabilities to implement all the salient steps of natural language
processing.
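The first few steps of this cycle can be sketched in plain Python with the standard re module. This is a toy illustration under stated assumptions: the stop-word list is tiny and hand-picked, and naive_stem is a hypothetical suffix stripper, not the Porter stemmer that NLTK provides. NLTK and spaCy handle abbreviations, punctuation and irregular words far more carefully, and lemmatization and POS tagging need a vocabulary or a trained model, so they are left to those libraries.

```python
import re

# A tiny hand-picked stop-word list (an assumption for illustration;
# NLTK and spaCy ship far larger lists)
STOP_WORDS = {"a", "an", "is", "the", "of", "to", "be", "has", "its", "and"}

# Hypothetical suffixes for a deliberately naive stemmer. This is NOT the
# Porter stemmer: it just strips the first matching suffix it finds.
SUFFIXES = ("ation", "ing", "es", "s")

def segment(paragraph):
    """Split a paragraph into sentences after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Return the word tokens of a sentence, dropping punctuation."""
    return re.findall(r"[A-Za-z0-9]+", sentence)

def remove_stop_words(tokens):
    """Keep only tokens that are not in STOP_WORDS (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

def naive_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.lower().endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

paragraph = ("AI is the greatest innovation of the current century. "
             "However, AI needs to be selectively used to benefit the world. "
             "Every technology has its advantages and disadvantages.")

for sentence in segment(paragraph):
    tokens = tokenize(sentence)
    keywords = remove_stop_words(tokens)
    print(sentence)
    print("  tokens:  ", tokens)
    print("  keywords:", keywords)
    print("  stems:   ", [naive_stem(t) for t in keywords])
```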

Practical use case in Python


We have finally arrived at the last and most exciting part of
every chapter: the practical use case in Python! We shall
apply all that we have learnt so far in implementing this use case.
This time, we will try something different: instead of pursuing a
new goal, let us achieve a familiar goal using a different method.
We shall use the same Iris dataset and classify the items in it
using a neural network instead of our classical machine learning
algorithms.
Let us load the dataset first from Scikit-learn using the code below:
from sklearn.datasets import load_iris
iris = load_iris()
Next, we split that into features and target and assign them to
variables X and Y respectively using the code below:
X = iris.data
Y = iris.target
It is always good practice to encode the target, as encoding
transforms labelled outputs into numerical outputs. For example,
if a dataset contains labelled values like Dog, Cat and Mouse, then
Scikit-learn's LabelEncoder, which numbers the classes in sorted
order, would transform Cat, Dog and Mouse to 0, 1 and 2
respectively. The Iris targets are already the integers 0, 1 and 2,
so here this step leaves them unchanged, but it matters whenever
the labels are text.
We perform encoding using the LabelEncoder class of Scikit-learn
as shown in the following code:
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
Y = encoder.fit_transform(Y)
Next, we perform one hot encoding of this encoded target output.
One hot encoding is the process of transforming each output into a
vector with a single 1 in the position of its class. Referring to the
earlier example, Cat (0) would be converted to [1, 0, 0], Dog (1)
to [0, 1, 0] and Mouse (2) to [0, 0, 1].
from sklearn.preprocessing import OneHotEncoder
# sparse_output replaced the older sparse argument in Scikit-learn 1.2
onehot_encoder = OneHotEncoder(sparse_output=False)
Y = Y.reshape(len(Y), 1)
Y = onehot_encoder.fit_transform(Y)
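To see what the two encoders do, here is a minimal NumPy sketch of the same idea using the hypothetical Dog, Cat and Mouse labels from the example above. It illustrates the mapping only and is not how Scikit-learn implements the encoders internally:

```python
import numpy as np

labels = np.array(["Dog", "Cat", "Mouse"])

# Label encoding: np.unique sorts the class names, so Cat=0, Dog=1, Mouse=2
classes, codes = np.unique(labels, return_inverse=True)

# One hot encoding: row i of the identity matrix is the vector for class i
one_hot = np.eye(len(classes), dtype=int)[codes]

for label, code, vec in zip(labels, codes, one_hot):
    print(label, code, vec)
```

Indexing the identity matrix with the integer codes is a compact way to build one hot vectors, and it makes clear that each vector has exactly one 1.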
Next, we split the data into training data and testing data using our
regular train_test_split function from the model_selection
module of the Scikit-learn library as shown in the code below:
from sklearn.model_selection import train_test_split
X_Train, X_Test, Y_Train, Y_Test = train_test_split(
    X, Y, test_size = 0.3, random_state=100)
We use a test size of 0.3, so 30% of the 150 samples (45 rows) are
held out for testing, and a random state of 100 so that the split is
reproducible.
This step takes us to the end of the data preparation phase. To
recap, we have loaded the Iris data, separated the target variable
from the features, encoded the target and thereafter one hot
encoded it. Finally, we split the data into training and testing
datasets.
Now comes the most interesting part: training the model!
However, which model do we choose? A neural network of our own!
Here comes TensorFlow to the rescue!
We use the Sequential class of the Keras module of
TensorFlow to create our neural network as shown in the code
below:
model = tf.keras.Sequential([
tf.keras.layers.Dense(10, activation='relu'),
tf.keras.layers.Dense(10, activation='relu'),
tf.keras.layers.Dense(3, activation='softmax')
])
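As evident from the arguments passed to Sequential, we have
created two hidden dense layers of 10 units each with ReLU
activation and an output layer of 3 units, one per Iris class, with
the softmax activation function, which converts the outputs into
class probabilities that sum to 1.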
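To see what the softmax output layer and the categorical cross-entropy loss (used when compiling the model below) actually compute, here is a minimal NumPy sketch. The raw scores [2.0, 1.0, 0.1] are made-up values, and this is a plain re-implementation of the formulas, not TensorFlow's own code:

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def categorical_crossentropy(y_true, y_prob, eps=1e-12):
    """Cross-entropy between a one hot target and predicted probabilities."""
    return -np.sum(y_true * np.log(np.clip(y_prob, eps, 1.0)))

logits = np.array([2.0, 1.0, 0.1])     # hypothetical raw outputs of the last layer
probs = softmax(logits)
y_true = np.array([1, 0, 0])           # one hot target: class 0

loss = categorical_crossentropy(y_true, probs)
print(probs.round(3), probs.sum())
print(loss)
```

Because the target is one hot, the loss reduces to minus the log of the probability assigned to the correct class, so it shrinks as the network grows more confident in the right answer.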
The next step after creating a neural network model is always to
compile it as shown in the code below:
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
This step specifies the optimizer (RMSprop), the loss function
(categorical cross-entropy, suited to one hot encoded targets) and
the metric to report (accuracy). As the next, and most important,
step, we train the model using the fit method as shown in the code
below:
model.fit(X_Train, Y_Train, batch_size=50, epochs=100)
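Here, an epoch is one complete pass over the training data. With a
batch size of 50, each epoch is split into batches and the weights
are updated once per batch, so the number of weight updates is a
multiple of the number of epochs.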
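The arithmetic behind epochs and batches can be checked directly. This sketch assumes Scikit-learn's rule of rounding a fractional test size up to a whole number of rows. With 150 Iris samples, it gives 105 training rows, which a batch size of 50 splits into 3 batches per epoch (the 3/3 shown in the training log later), and 300 weight updates over 100 epochs:

```python
import math

n_samples = 150        # rows in the Iris dataset
test_size = 0.3
batch_size = 50
epochs = 100

# train_test_split rounds a fractional test size up to whole rows
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

# The last batch of each epoch may be smaller than batch_size
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs

print(n_train, n_test, steps_per_epoch, total_updates)
```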
Next, we evaluate the model on the test data using the loss and
accuracy metrics, as shown in the code below:
loss, accuracy = model.evaluate(X_Test, Y_Test, verbose=0)
print('Loss:', loss)
print('Accuracy:', accuracy)
Finally, we would also like to know whether the actual values
match the predicted values. Since np.argmax returns the index of
the largest value in each row, it converts both the one hot targets
and the predicted probabilities back into class labels:
Y_Pred = model.predict(X_Test)

actual = np.argmax(Y_Test,axis=1)
predicted = np.argmax(Y_Pred,axis=1)

print(f"Actual: {actual}")
print(f"Predicted: {predicted}")
While this has so far been a piecemeal exercise, it would be
advantageous to have the entire code as a single script so that it
can be run in one go.
Copy and run the code below in the Python editor:
#Practical Use Case in Python
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder

import tensorflow as tf
import numpy as np

iris = load_iris()
X = iris.data
Y = iris.target

encoder = LabelEncoder()
Y = encoder.fit_transform(Y)

# sparse_output replaced the older sparse argument in Scikit-learn 1.2
onehot_encoder = OneHotEncoder(sparse_output=False)
Y = Y.reshape(len(Y), 1)
Y = onehot_encoder.fit_transform(Y)

X_Train, X_Test, Y_Train, Y_Test = train_test_split(
    X, Y, test_size = 0.3, random_state=100)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')
])

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.fit(X_Train, Y_Train, batch_size=50, epochs=100)

loss, accuracy = model.evaluate(X_Test, Y_Test, verbose=0)
print('Loss:', loss)
print('Accuracy:', accuracy)

# Convert one hot targets and predicted probabilities to class labels
Y_Pred = model.predict(X_Test)
actual = np.argmax(Y_Test,axis=1)
predicted = np.argmax(Y_Pred,axis=1)

print(f"Actual: {actual}")
print(f"Predicted: {predicted}")
Output:
Epoch 1/100
3/3 [==============================] - 1s 6ms/step - loss: 3.1403 - accuracy: 0.3238
Epoch 2/100
3/3 [==============================] - 0s 3ms/step - loss: 2.8077 - accuracy: 0.3524
Epoch 3/100
3/3 [==============================] - 0s 3ms/step - loss: 2.6317 - accuracy: 0.3714
Epoch 4/100
3/3 [==============================] - 0s 2ms/step - loss: 2.4918 - accuracy: 0.4476
Epoch 5/100
3/3 [==============================] - 0s 3ms/step - loss: 2.3969 - accuracy: 0.4381
...
Epoch 99/100
3/3 [==============================] - 0s 2ms/step - loss: 0.3325 - accuracy: 0.9238
Epoch 100/100
3/3 [==============================] - 0s 3ms/step - loss: 0.3292 - accuracy: 0.9238

Loss: 0.30189675092697144
Accuracy: 0.9333333373069763

2/2 [==============================] - 0s 2ms/step
Actual: [2 0 2 0 2 2 0 0 2 0 0 2 0 0 2 1 1 1 2 2 2 0 2 0 1 2 1 0 1 2 1 1 2 0 0 1 0
 1 2 2 0 1 2 2 0]
Predicted: [2 0 2 0 2 2 0 0 1 0 0 2 0 0 2 1 1 1 2 2 1 0 2 0 1 1 1 0 1 2 1 1 2 0 0 1 0
 1 2 2 0 1 2 2 0]
From the output, we observe that the model has performed very
well, with an accuracy of 93.33% on the test data. Note that the
loss and accuracy values might change every time we run the code,
because the network's weights are initialized randomly, and may
differ slightly from the ones obtained above. We also observe that
the actual output closely matches the predicted output: 42 of the
45 test samples are classified correctly, and the three errors are
class 2 (virginica) samples predicted as class 1 (versicolor). This
supports our use of this neural network to solve the classification
problem.
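The reported accuracy can be reproduced from the printed arrays themselves. In this sketch the two arrays are copied by hand from the output above; comparing them element by element gives the fraction of correct predictions and the positions of the mistakes:

```python
import numpy as np

# Arrays copied from the printed output above
actual = np.array([2, 0, 2, 0, 2, 2, 0, 0, 2, 0, 0, 2, 0, 0, 2,
                   1, 1, 1, 2, 2, 2, 0, 2, 0, 1, 2, 1, 0, 1, 2,
                   1, 1, 2, 0, 0, 1, 0, 1, 2, 2, 0, 1, 2, 2, 0])
predicted = np.array([2, 0, 2, 0, 2, 2, 0, 0, 1, 0, 0, 2, 0, 0, 2,
                      1, 1, 1, 2, 2, 1, 0, 2, 0, 1, 1, 1, 0, 1, 2,
                      1, 1, 2, 0, 0, 1, 0, 1, 2, 2, 0, 1, 2, 2, 0])

correct = actual == predicted          # boolean array, True where they agree
print(f"{correct.sum()} of {len(actual)} correct, accuracy = {correct.mean():.4f}")
print("misclassified positions:", np.flatnonzero(~correct))
```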

Conclusion
Nobody can deny that neural networks and deep learning require
extensive attention to detailed concepts and patience to master.
With a myriad of Python packages available, one can only imagine
the immense possibilities to create and train computationally
efficient neural networks that solve various complex problems.
These neural networks, with their intricate concepts and substantial
learning curve, represent a challenging yet immensely rewarding
field! It is fair to say that this chapter has been the most
intensive in terms of exploration and application, since it has
mixed theory and programming examples.
Nevertheless, smooth seas never made skillful sailors, an adage
that is all the more apt in the sea of deep learning! Through this
chapter, we have only scratched the surface of what is possible
with the numerous Python tools available for creating efficient
neural networks. Our journey through theory and practical
examples has shown that mastering deep learning requires both
dedication and practice. Although we have made only a small
attempt in this chapter to introduce the reader to deep learning and
arouse interest in Python implementations of deep learning, the
next bold step of regular practice, expansion and consolidation is
now in the able hands of the enthusiastic reader!
The previous few chapters were slight deviations from our usual
purview of desktop and file automation because they were intended
to cover core theory and related concepts in the scientific domain
of artificial intelligence, machine learning and deep learning. Now,
we shall shift our focus back to more application-centric themes.
In the upcoming chapter, we will delve into business process
workflows, exploring key concepts like orchestration and Python’s
pivotal role in automating business processes. Hence, without
further delay, let us turn the page to the next chapter!

Join our book’s Discord space


Join the book’s Discord Workspace for Latest updates, Offers, Tech
happenings around the world, New Release and Sessions with the
Authors:
https://discord.bpbonline.com
Chapter 11
Automating Business Process Workflows

Introduction
We have had extensive discussions on the topic of automation in the last several chapters of this book. Within desktop automation, we covered file and folder navigation and screen automation, and we went through the intricacies of automating processes in Excel documents, PDF documents and image documents. We also covered automation through the web, first getting familiar with data gathering using scraping techniques and thereafter exploring automated messaging to individuals and groups. The hallmark of all the automated processes discussed in these chapters was the way in which Python eliminates manual intervention from them, thus playing a crucial role in enhancing an organization’s efficiency. However, these processes do not run in isolation in any organization. The process workflow of any business or organization is composed of many such processes that run in a particular sequence and get triggered conditionally at appropriate times. They might also run in tandem with other processes. This is exactly where the need arises for a framework that can align these processes sequentially and efficiently along the business workflow. This coordination is called orchestration. In this chapter, we will introduce you to orchestration, underscoring how Python can be leveraged to implement and enhance this vital aspect of business process management.

Structure
The chapter covers the following topics:
• Understanding a business process workflow
• Introduction to orchestration
• Automation versus orchestration: Differences
• Orchestration platforms available in market
• Achieving orchestration with Python
∘ Prefect
∘ Luigi
• Practical use case in Python

Objectives
By the end of this chapter, the reader shall be conversant with the concept of orchestration in a business process workflow and shall learn to keep the broader picture of orchestration in mind while automating regular tasks. A summary of the orchestration frameworks available in the market will also be presented.
The reader shall also gain a fair understanding of implementing orchestration in a workflow using Python, since this chapter illustrates the concept through specific examples in Python. The Python modules that this chapter covers are prefect and luigi. It shall also compare the two, based on their relative advantages and disadvantages.
As with every other chapter, this chapter ends with a practical use case in Python, enabling the reader to apply salient concepts learned in the chapter.

Understanding a business process workflow
A workflow can be defined as a sequence of tasks that defines the process cycle of an organization. Consider an organization with ten different processes. Each of these ten processes is triggered at its own time and in its own course within the organization’s business process. Figure 11.1 below will make this concept clearer:
Figure 11.1: Concept of a business process workflow

As shown in the above figure, a business process workflow is, in general, a series of processes that run in a sequence. We observe that Process 1 is followed by Process 2. The output of Process 2 is subjected to a conditional logic: if the resulting Boolean value is True, Process 3 is executed, followed by Process 4, which ends the workflow. If the Boolean value is False, Process 2 progresses directly to Process 4, which ends the workflow.
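The control flow of Figure 11.1 can be written down in plain Python to make the branching explicit. In this sketch the four process functions are hypothetical placeholders that only record their own names, and the rule "output greater than a threshold" is an assumed stand-in for whatever business rule a real workflow would apply to the output of Process 2.

```python
def process_1(log):
    log.append("Process 1")
    return 7  # hypothetical output handed over to Process 2


def process_2(value, threshold, log):
    log.append("Process 2")
    return value > threshold  # hypothetical rule producing the Boolean


def process_3(log):
    log.append("Process 3")


def process_4(log):
    log.append("Process 4")  # final process: ends the workflow


def run_workflow(threshold):
    """Run Process 1 -> 2 -> (3 only if True) -> 4 and return the execution log."""
    log = []
    value = process_1(log)
    if process_2(value, threshold, log):  # conditional logic on Process 2's output
        process_3(log)
    process_4(log)
    return log


print(run_workflow(threshold=5))   # Boolean is True: Process 3 runs
print(run_workflow(threshold=10))  # Boolean is False: jump straight to Process 4
```

Even in this tiny form, the sequencing and the branch live in one place, which is the responsibility that an orchestration framework takes over at a larger scale.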
The diagram is just for representational purposes and shows a very simple workflow. Actual workflows in an organization could consist of multiple intricate and complex processes linked together. It is also important to note here that some of these processes might be workflows themselves. A large organization would typically have a complex hierarchy of such processes and workflows culminating in a top-level workflow. This should provide some clarity on the concept of a business process workflow in an organization.
There are a few questions that we need to address before delving deeper into business process workflows.
The first question is whether we have a framework in place to trigger each of these processes based not only on their respective chronology within the workflow but also on conditional logic.
Before answering that, we need to address another question: Why would an organization automate the sequential alignment of a workflow when it can deploy individuals to trigger these tasks at will?
The answer to this question is simple. We automate them to eliminate human involvement in repetitive, rules-based tasks. Humans are better employed at tasks that require creativity and problem-solving than at activities that follow a specific rule or repetitive pattern. Besides, this also cuts down labor costs significantly.
Let us now return to the first question. Do we have a framework to streamline this group of automated tasks? The answer is yes, and that leads us to the main subject of this chapter, which is orchestration. Orchestration is an effective way to reduce manual errors in a workflow as well as to speed up the process. It also lowers the overall costs of the business by reducing defects in the workflow.
In this chapter, we shall implement orchestration with Python by using our own Python code snippets as stand-ins for business processes, since actual business processes would be too complex to cover here. So let us begin our journey with orchestration!

Introduction to orchestration
We have already looked at the general concept of a business process workflow, where individual processes are sequentially connected to each other and triggered at the appropriate time. Each of these individual processes might itself contain a series of automated steps. The process that manages this set of individual tasks to create an amalgamated workflow is called orchestration.
In orchestration, multiple technologies and platforms might be coordinated based on specific rules and circumstances. Orchestration also takes care of the alternate decisions that need to be made as circumstances change. This could range from simple to complex systems. A typical example of orchestration is building an application and deploying it so that it can serve multiple users, which could involve triggering multiple processes within the network. Almost all large projects that get deployed on the cloud involve orchestration of some sort.
It is important to note here that in orchestration, the constituent tasks or processes need not all be automated. While most of them might be automated, some might even be manual. Conceptually, orchestration is simply the effective coordination of the individual tasks that constitute a business process workflow. Today, this coordination is usually automated, thanks to the numerous platforms and technologies available for the purpose. In this book, we shall study orchestration in the context of our Python scripts. Hence, we shall orchestrate only those automated scripts that we regularly execute in Python.
Having understood the concept of orchestration, let us now take a quick look at the salient differences between automation and orchestration, as that will help us consolidate our understanding further. Even though automation and orchestration complement each other, there are certain subtle differences between them. The next section lists and elaborates on these differences.

Automation versus orchestration: Differences
Table 11.1 below summarizes the main differences between automation and orchestration:

| Aspect | Automation | Orchestration |
|---|---|---|
| Number of tasks | Automation is the process of eliminating human intervention from a single task. | Orchestration is the coordination (mostly automated) of multiple automated tasks, a few of which might be manual. |
| Complexity | Automation is relatively simple, as it is based on predefined rules for a single task. | Orchestration is complex, because it requires coordination and adaptation between multiple tasks in a workflow. |
| Deployment | An automated solution might be deployed in-house or locally on a desktop. | An orchestrated solution usually involves deployment on the network. |
| Typical use cases | IT processes, financial processes, designing processes | DevOps, cloud |
Table 11.1: Primary differences between automation and orchestration

Orchestration platforms available in market
While we shall be studying orchestration using Python in this chapter, it is helpful to know the tools available in the market that provide orchestration capabilities. Consider this analogous to our study of robotic process automation (RPA) tools in Chapter 2, RPA Foundations, prior to exploring the Python RPA module.
Let us have a quick glance at some of the popular platforms available in the market, along with their salient features:
• Kubernetes: This is a popular open-source system for automating the deployment, scaling, monitoring and management of software. It is essentially a container orchestration system. To understand what a container is, consider it to be a software package containing all the elements necessary to run in any environment. The concept of a container enables developers to focus solely on writing the programming logic of the software, while IT takes care of the deployment aspect. Kubernetes, which is also known as kube (or K8s), orchestrates such containers across multiple hosts.
There are certain basic components of a Kubernetes framework. They are mentioned below:
∘ Node: In Kubernetes, a node is a worker machine (physical or virtual) on which containerized workloads run and which maintains communication between its containers and the Kubernetes control plane.
∘ Cluster: A cluster is a set of nodes that share resources and run containerized applications.
∘ Pod: A pod is a group of one or more containers that are deployed together to a single node.
∘ Replication controller: A replication controller makes sure that a specified number of copies (replicas) of a pod is running in the cluster at any time, replacing pods that fail or are deleted. The decision of which node each pod runs on is made by the Kubernetes scheduler.
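To make the pod and replication controller concepts concrete, here is a minimal sketch of a ReplicationController manifest built as a Python dictionary and serialized to JSON, a format kubectl accepts alongside YAML. The name web-rc, the label app: web and the nginx image are illustrative assumptions. The manifest asks Kubernetes to keep three replicas of a one-container pod running; the scheduler, not the manifest, decides which nodes they land on.

```python
import json

labels = {"app": "web"}  # hypothetical label tying the pods to the controller

replication_controller = {
    "apiVersion": "v1",
    "kind": "ReplicationController",
    "metadata": {"name": "web-rc"},      # hypothetical name
    "spec": {
        "replicas": 3,                   # desired number of pod copies
        "selector": labels,              # which pods this controller manages
        "template": {                    # the pod that gets replicated
            "metadata": {"labels": labels},
            "spec": {
                "containers": [
                    {
                        "name": "web",
                        "image": "nginx",  # assumed public container image
                        "ports": [{"containerPort": 80}],
                    }
                ]
            },
        },
    },
}

manifest_json = json.dumps(replication_controller, indent=2)
print(manifest_json)
```

Saved to a file such as web-rc.json, this could be submitted with kubectl apply -f web-rc.json on a machine configured for a cluster. For new work, the Kubernetes documentation recommends a Deployment (which manages ReplicaSets) over a bare ReplicationController.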
• Amazon Elastic Kubernetes Service (Amazon EKS): This is a managed service that runs Kubernetes in the AWS cloud. Amazon EKS manages the Kubernetes control plane, which includes the scheduling of containers, so one does not need to install and operate it oneself, and it lets one leverage the wider AWS infrastructure. Moreover, applications running on Amazon EKS are fully compatible with those that run on a standard Kubernetes environment.
• OpenShift: Built on Red Hat Enterprise Linux and the Kubernetes engine, this is a cloud-based Platform as a service (PaaS) that provides infrastructure for building, managing, and deploying containerized applications. There are two variations of this platform:
∘ OpenShift Online: This is offered as Software as a service (SaaS).
∘ OpenShift Dedicated: This is a managed service.
• Nomad: This is another multi-OS orchestration platform, developed by HashiCorp, that runs on macOS, Windows and Linux. Nomad runs as a single binary with a small resource footprint (35MB). The advantage of using Nomad is that it can run containerized as well as non-containerized applications, with support for Windows applications, Java applications and virtual machines (VMs).
This overview of orchestration and its associated tools in the market should arouse our interest in exploring the orchestration capabilities offered by Python libraries. Hence, without further delay, let us move on to the next section!

Achieving orchestration with Python


Let us begin our journey of orchestration with Python. The first library that we shall explore is prefect, after which we shall cover luigi.

Prefect
The Python prefect library enables one to define a sequence of tasks in a flow and trace the dependencies among them. An example will make this clear. Let us take a very simple case of performing mathematical calculations by writing functions. The steps that we shall perform are as follows:
1. Take any two integers with arbitrary values.
2. Add them together.
3. Multiply the sum thus obtained by 2, thereby doubling it.
Let us try to program this procedure in Python. We basically have to define separate functions for adding the numbers to get the sum and for doubling the sum to get the result. Let us arbitrarily select 5 and 10 as the two numbers:
# Define the functions
def Add(intNum_1, intNum_2):
    return intNum_1 + intNum_2

def Double(intNum):
    return 2*intNum

# Initialize the integer values
intNum_1 = 5
intNum_2 = 10

# Perform the operations
Sum = Add(intNum_1, intNum_2)
Doubled_Sum = Double(Sum)

print(Doubled_Sum)
Output:
30
As observed, the program gives us the expected output of 30, which is obtained by adding 5 and 10 to get 15 and then doubling that sum. Now let us achieve the same result by expressing the same program as a workflow using the prefect library of Python.
Follow the steps below to start using prefect:
1. The first thing that needs to be done is the installation of the library. It is important to note that we shall be using an early version of prefect, prefect==1.0.0, while studying the examples in this chapter. Later major versions of prefect (2.x onwards) have syntax and methods that differ from the 1.x series; for example, they replace the Flow context manager used below with a @flow decorator. So, while installing, we install the earlier version using the command below:
pip install prefect==1.0.0
2. Next, we need to ensure that the Graphviz package has been installed for our machine and operating system from the location below:
https://graphviz.gitlab.io/_pages/Download/Download_windows.html
3. While installing, select the option to add the location of the bin directory of the Graphviz installation to the User Path when the installer prompts you to do so. After installing, add the path of the bin folder that contains the dot.exe file to the System path as well.
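4. Finally, we need to install the viz extra of prefect, which pulls in the Python graphviz package used for visualizing the flow at the end. Pinning the same version as in step 1 keeps the installation consistent. Install it using the command below:
pip install "prefect[viz]==1.0.0"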
5. We are now all set to use prefect. We require the task decorator and the Flow class from the prefect library, which we import using the line of code below:
from prefect import task, Flow
6. The first thing to understand is that the functions that we defined in the program for summing and doubling now have to be defined as tasks. Even the print operation that we performed at the end of the program to print the final answer needs to be defined in a separate function and denoted as a prefect task. The process to do this is simple. We retain the definitions of the functions as they are and simply add the line @task before each function definition, as shown in the code that follows:
@task
def Add(intNum_1, intNum_2):
    return intNum_1 + intNum_2

@task
def Double(intNum):
    return 2*intNum

@task
def Print_Value(strValue):
    print(strValue)
7. Now that we have defined all our tasks, our next step is to connect them in the required sequence. We do this by creating a Flow object named flow and calling our tasks inside the with block of the flow. The code below will make this concept clear:
with Flow('Mathematical-Operations') as flow:
    intNum_1 = 5
    intNum_2 = 10
    Sum = Add(intNum_1, intNum_2)
    Doubled_Sum = Double(Sum)
    Print_Value(Doubled_Sum)
8. This defines the entire flow. Our final step is to execute the flow using the run method of the Flow object, as shown in the code below:
flow.run()
9. After the flow is executed, we would like to visualize the flow graphically, which enables us to trace dependencies much like a flowchart. We do this using the visualize method, as shown in the code below:
flow.visualize(filename="Mathematical Operations")
10. Here, we observe that the visualize method accepts an argument named filename, through which we specify "Mathematical Operations" as the name of the file in which the flow is saved in graphical form. The file is written relative to the current working directory, which is typically the directory from which the Python script is run.
After running the entire code sequentially in a Python
editor from all the steps mentioned above, we get the
output in the console as shown below:
Output:
[2023-07-22 16:19:05+0530] INFO - prefect.FlowRunner | Beginning Flow run for 'Mathematical-Operations'
[2023-07-22 16:19:05+0530] INFO - prefect.TaskRunner | Task 'Add': Starting task run...
[2023-07-22 16:19:05+0530] INFO - prefect.TaskRunner | Task 'Add': Finished task run for task with final state: 'Success'
[2023-07-22 16:19:05+0530] INFO - prefect.TaskRunner | Task 'Double': Starting task run...
[2023-07-22 16:19:05+0530] INFO - prefect.TaskRunner | Task 'Double': Finished task run for task with final state: 'Success'
[2023-07-22 16:19:05+0530] INFO - prefect.TaskRunner | Task 'Print_Value': Starting task run...
30
[2023-07-22 16:19:05+0530] INFO - prefect.TaskRunner | Task 'Print_Value': Finished task run for task with final state: 'Success'
[2023-07-22 16:19:05+0530] INFO - prefect.FlowRunner | Flow run SUCCESS: all reference tasks succeeded
11. We observe that the execution of every task within the flow is recorded sequentially with a date and time stamp, similar to logging functionality. We also observe that the output value of 30 has been printed in the console.
12. We would now like to see the graphical flowchart of the workflow. If we go to the working directory of the Python script, we will find that a file named "Mathematical Operations.pdf" has been created. On opening the file, we observe a flowchart as shown in Figure 11.2 below:

Figure 11.2: Graphical representation of a workflow in prefect

13. We observe how the Add, Double and Print_Value functions are called one after another as prefect tasks, and how the intNum and strValue arguments act as dependencies in their execution. The complete code is shown below for reference:
from prefect import task, Flow

@task
def Add(intNum_1, intNum_2):
    return intNum_1 + intNum_2

@task
def Double(intNum):
    return 2*intNum

@task
def Print_Value(strValue):
    print(strValue)

with Flow('Mathematical-Operations') as flow:
    intNum_1 = 5
    intNum_2 = 10
    Sum = Add(intNum_1, intNum_2)
    Doubled_Sum = Double(Sum)
    Print_Value(Doubled_Sum)

flow.run()
flow.visualize(filename="Mathematical Operations")
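One point worth internalizing about the code above is that, inside a with Flow(...) block in prefect 1.x, calling a task does not execute it: the call records the task and its inputs as a node in the flow's graph, and flow.run() executes that graph later. The toy sketch below imitates this deferred behaviour with no third-party packages. ToyFlow, Node and toy_task are hypothetical names; this is not how prefect is implemented internally, only an illustration of the idea.

```python
class Node:
    """Placeholder for a task result that has not been computed yet."""
    def __init__(self, name):
        self.name = name


class ToyFlow:
    """Records task calls made inside its with block; executes them on run()."""
    active = None  # the flow currently being built, if any

    def __init__(self, name):
        self.name = name
        self.steps = []  # (node, function, args) in the order they were recorded

    def __enter__(self):
        ToyFlow.active = self
        return self

    def __exit__(self, exc_type, exc, tb):
        ToyFlow.active = None
        return False  # do not swallow exceptions

    def run(self):
        results = {}
        # Recording order is already a valid execution order: a node can only be
        # passed as an argument after the call that created it has been recorded.
        for node, func, args in self.steps:
            values = [results[a] if isinstance(a, Node) else a for a in args]
            results[node] = func(*values)
        return results


def toy_task(func):
    """Decorator: inside a ToyFlow, calling the function only records it."""
    def record(*args):
        if ToyFlow.active is None:
            raise RuntimeError("call toy tasks inside a 'with ToyFlow(...)' block")
        node = Node(func.__name__)
        ToyFlow.active.steps.append((node, func, args))
        return node
    return record


@toy_task
def Add(intNum_1, intNum_2):
    return intNum_1 + intNum_2


@toy_task
def Double(intNum):
    return 2 * intNum


with ToyFlow("Mathematical-Operations") as flow:
    Sum = Add(5, 10)           # no addition happens here; Sum is a Node
    Doubled_Sum = Double(Sum)  # records that Double depends on Add

results = flow.run()
print(results[Doubled_Sum])
```

Separating "build the graph" from "run the graph" is what lets an orchestrator draw the flowchart, retry individual steps, or schedule the same flow repeatedly.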
We hope this tutorial on prefect has been an interesting one! We have taken a very simple example, as the main aim was to clarify concepts on the execution of a workflow. An interested reader should revisit earlier Python exercises from this book and try converting them into prefect workflows. As workflows get more complicated, it certainly helps to observe the dependencies graphically in a prefect flowchart. Now, let us explore another interesting Python module called Luigi.
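The dependency picture that prefect draws is a directed acyclic graph, and the Python standard library can already reason about one. The sketch below (Python 3.9+ for graphlib) encodes the Mathematical-Operations flow by hand as a mapping from each task to the tasks it depends on, derives a valid execution order, and emits Graphviz DOT text that the dot program installed earlier could render. Writing the graph by hand is an assumption of this sketch; prefect builds its own graph internally.

```python
from graphlib import TopologicalSorter

# Each task mapped to the set of tasks whose output it consumes.
dependencies = {
    "Add": set(),
    "Double": {"Add"},
    "Print_Value": {"Double"},
}

# A valid run order: every task appears after all of its upstream tasks.
order = list(TopologicalSorter(dependencies).static_order())
print(order)

# Graphviz DOT text: one edge per dependency, upstream -> downstream.
edges = [f'  "{up}" -> "{task}";'
         for task, ups in dependencies.items() for up in sorted(ups)]
dot_source = "digraph flow {\n" + "\n".join(edges) + "\n}"
print(dot_source)
```

Saving dot_source to a file such as flow.dot and running dot -Tpdf flow.dot -o flow.pdf would produce a chart similar in spirit to Figure 11.2.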

Luigi
Luigi is a workflow management system, originally developed at Spotify, with a Python-based API for building complex data pipelines.
1. The first step before using luigi is to install the package using the command below:
pip install luigi
We shall use the same example of mathematical operations to understand how luigi works.
The fundamental building block of the luigi package is the Task class, which we import as a first step using the code below:
import luigi
from luigi import Task
luigi works by having us create a class that inherits from Task, define all the required functions within that class, and include a run method in the class that executes the workflow. We first define the class, named Mathematical_Operations, as shown below:
class Mathematical_Operations(Task):
    pass  # the body is filled in over the following steps
We observe that the class inherits from Task, which enables us to perform luigi activities within the class.
2. Next, we define two parameters, intNum_1 and intNum_2, within the class and initialize them to luigi.IntParameter(), luigi's integer-typed parameter. This enables us to initialize the class with dynamic parameters, similar to what a constructor does within a class, as shown below:
class Mathematical_Operations(Task):

    intNum_1 = luigi.IntParameter()
    intNum_2 = luigi.IntParameter()
3. Next, we define our regular Add function within the class, which utilizes the parameters intNum_1 and intNum_2. We also define the Double function, which doubles the value passed to it. The updated class is shown below:
class Mathematical_Operations(Task):

    intNum_1 = luigi.IntParameter()
    intNum_2 = luigi.IntParameter()

    def Add(self, intNum_1, intNum_2):
        return intNum_1 + intNum_2

    def Double(self, intNum):
        return 2*intNum
4. Now we come to the most important part, which is the run method. It is this method that executes the procedure within our workflow using all the functions that we defined within the class. The entire class along with the run method is shown below:
class Mathematical_Operations(Task):
    intNum_1 = luigi.IntParameter()
    intNum_2 = luigi.IntParameter()

    def Add(self, intNum_1, intNum_2):
        return intNum_1 + intNum_2

    def Double(self, intNum):
        return 2*intNum

    def run(self):
        Sum = self.Add(self.intNum_1, self.intNum_2)
        Doubled_Sum = self.Double(Sum)
        print(Doubled_Sum)
5. Finally, we call the build function of the luigi module, which executes the workflow, as shown below:
luigi.build([Mathematical_Operations(intNum_1=5, intNum_2=10)], local_scheduler=True)
We pass two arguments to the build function. The first argument is a list containing an instance of the class Mathematical_Operations, and the second argument is local_scheduler=True, which indicates that we want to run this task locally without connecting to luigi's central scheduler server (luigid).
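A caveat on parameter types: when a luigi task is launched from the command line, a plain luigi.Parameter() delivers its value to the task as a string, and with string inputs the + inside Add concatenates instead of adding. luigi.IntParameter() parses the value as an integer instead, which is why it is the safer choice for inputs such as intNum_1 and intNum_2. The stdlib-only snippet below reproduces both behaviours with the same Add and Double logic; the string arguments are an assumed stand-in for what a command line would supply.

```python
def Add(intNum_1, intNum_2):
    return intNum_1 + intNum_2


def Double(intNum):
    return 2 * intNum


# Values as a command line would deliver them to a plain luigi.Parameter().
as_strings = Double(Add("5", "10"))

# Values after integer parsing, as luigi.IntParameter() would provide them.
as_ints = Double(Add(int("5"), int("10")))

print(repr(as_strings))  # string concatenation, then string repetition
print(repr(as_ints))     # the intended arithmetic
```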
The entire piece of code is shown below for reference. Copy and run the code in the Python editor:
import luigi
from luigi import Task


class Mathematical_Operations(Task):

    intNum_1 = luigi.IntParameter()
    intNum_2 = luigi.IntParameter()

    def Add(self, intNum_1, intNum_2):
        return intNum_1 + intNum_2

    def Double(self, intNum):
        return 2*intNum

    def run(self):
        Sum = self.Add(self.intNum_1, self.intNum_2)
        Doubled_Sum = self.Double(Sum)
        print(Doubled_Sum)


luigi.build([Mathematical_Operations(intNum_1=5, intNum_2=10)], local_scheduler=True)
On running the code, we get the output as shown below:
===== Luigi Execution Summary =====

Scheduled 1 tasks of which:
* 1 ran successfully:
    - 1 Mathematical_Operations(intNum_1=5, intNum_2=10)

This progress looks :) because there were no failed tasks or missing dependencies

===== Luigi Execution Summary =====

30
We observe that the value 30, which is the result we expected, has been printed in the output console. The summary indicates which tasks have run. Since we had just one task class, Mathematical_Operations, here, the summary mentions just that task.
The complete console output would also include the entire log, with DEBUG and INFO statements as well as username details, which have not been shown here.
In this way, we have seen the basic framework of a luigi task. The official luigi documentation discusses several more features that enable one to incorporate more complex pipelines into it.
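Among those features, two shape most luigi pipelines: requires(), which names the tasks that must finish first, and output(), whose targets (such as luigi.LocalTarget files) let the default complete() check decide whether a task still needs to run. The stdlib-only toy below imitates that resolution loop so the idea can be tried without a scheduler. ToyTask, AddTask, DoubleTask, toy_build and the in-memory store dictionary (standing in for output files) are hypothetical names, not luigi's API.

```python
store = {}  # stands in for output targets such as files on disk


class ToyTask:
    def requires(self):
        return []        # upstream tasks that must be complete first

    def complete(self):
        return False     # subclasses check whether their output exists

    def run(self):
        raise NotImplementedError


class AddTask(ToyTask):
    def complete(self):
        return "sum" in store

    def run(self):
        store["sum"] = 5 + 10


class DoubleTask(ToyTask):
    def requires(self):
        return [AddTask()]

    def complete(self):
        return "doubled" in store

    def run(self):
        store["doubled"] = 2 * store["sum"]


def toy_build(task, ran):
    """Skip complete tasks; otherwise satisfy requirements, then run the task."""
    if task.complete():
        return ran                    # output already exists: nothing to do
    for upstream in task.requires():
        toy_build(upstream, ran)
    task.run()
    ran.append(type(task).__name__)
    return ran


first_run = toy_build(DoubleTask(), [])
second_run = toy_build(DoubleTask(), [])  # outputs now exist, so nothing runs
print(first_run, store["doubled"])
print(second_run)
```

Because completeness is judged by outputs, rerunning a pipeline after a crash only redoes the missing pieces, which is one of luigi's main attractions for long batch pipelines.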
We have now seen both the prefect and luigi modules and observed through examples how each enables workflow management. We shall now move towards the end of this chapter with a practical use case in Python.

Practical use case in Python


We have now arrived at the final section of this chapter, where we shall apply what we have learnt to implement a practical use case in Python.
We could use any of the important machine learning algorithms that we implemented using Python libraries in Chapter 9 as a use case to orchestrate using Python. Let us take the example of the Naïve Bayes algorithm that we implemented in Python and orchestrate it using the prefect library. We shall go through every section of the original code and transform it into a corresponding task that can be called in the prefect workflow.
The first thing that we would need to do is import all the
required libraries. First, we import the libraries that are
required for implementing the Naïve Bayes machine
learning algorithm as shown below:
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
Next, we import the libraries required for the prefect
workflow as shown below:
from prefect import task, Flow
If we recall, the first step is to load the Iris dataset using the
code below:
iris = load_iris()
X = iris.data
Y = iris.target
We need to wrap this step in a function in order to convert it into a prefect task, which we implement using the code below. Note the @task line before the function:
@task(nout=2)
def Load_Iris():
    iris = load_iris()
    X = iris.data
    Y = iris.target
    return X, Y
In the above code, note that we have also added an argument nout=2 to @task, which specifies that the function returns two values, X and Y, so that the result can be unpacked into two separate outputs within the flow.
The next step in the Naïve Bayes algorithm is to split the dataset into training and testing data using the code below:
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size=0.3, random_state=100)
Again, we execute this in the context of a prefect workflow using the code below:
@task(nout=4)
def Split_Data(X, Y):
    X_Train, X_Test, Y_Train, Y_Test = train_test_split(
        X, Y, test_size=0.3, random_state=100)
    return X_Train, X_Test, Y_Train, Y_Test
Here again, note nout=4, which tells @task that the function returns four values: X_Train, X_Test, Y_Train and Y_Test.
Finally, we implement the algorithm and calculate the
accuracy score using the code below:
NB = GaussianNB()
NB.fit(X_Train, Y_Train)
Y_Predict_Test_Sample = NB.predict(X_Test)
accuracy = accuracy_score(Y_Test, Y_Predict_Test_Sample)
print('Accuracy:', accuracy)
In the context of a prefect task, this would be as follows:
@task
def Execute_Naive_Bayes(X_Train, X_Test, Y_Train, Y_Test):
    NB = GaussianNB()
    NB.fit(X_Train, Y_Train)
    Y_Predict_Test_Sample = NB.predict(X_Test)
    accuracy = accuracy_score(Y_Test, Y_Predict_Test_Sample)
    print('Accuracy:', accuracy)
Now that we have defined all the tasks, we want to execute them sequentially in the prefect flow, as shown in the code below:
with Flow('Execute-Naive-Bayes') as flow:
    X, Y = Load_Iris()
    X_Train, X_Test, Y_Train, Y_Test = Split_Data(X, Y)
    Execute_Naive_Bayes(X_Train, X_Test, Y_Train, Y_Test)

flow.run()
flow.visualize(filename="Execute-Naive-Bayes")
Paste the import statements and the prefect implementations of the code one below the other, in the same sequence, in the Python editor, and run the code to observe the output shown below:
[2023-07-22 23:42:57+0530] INFO - prefect.FlowRunner | Beginning Flow run for 'Execute-Naive-Bayes'
[2023-07-22 23:42:57+0530] INFO - prefect.TaskRunner | Task 'Load_Iris': Starting task run...
[2023-07-22 23:42:57+0530] INFO - prefect.TaskRunner | Task 'Load_Iris': Finished task run for task with final state: 'Success'
[2023-07-22 23:42:57+0530] INFO - prefect.TaskRunner | Task 'Load_Iris[1]': Starting task run...
[2023-07-22 23:42:57+0530] INFO - prefect.TaskRunner | Task 'Load_Iris[1]': Finished task run for task with final state: 'Success'
[2023-07-22 23:42:57+0530] INFO - prefect.TaskRunner | Task 'Load_Iris[0]': Starting task run...
[2023-07-22 23:42:57+0530] INFO - prefect.TaskRunner | Task 'Load_Iris[0]': Finished task run for task with final state: 'Success'
[2023-07-22 23:42:57+0530] INFO - prefect.TaskRunner | Task 'Split_Data': Starting task run...
[2023-07-22 23:42:57+0530] INFO - prefect.TaskRunner | Task 'Split_Data': Finished task run for task with final state: 'Success'
[2023-07-22 23:42:57+0530] INFO - prefect.TaskRunner | Task 'Split_Data[3]': Starting task run...
[2023-07-22 23:42:57+0530] INFO - prefect.TaskRunner | Task 'Split_Data[3]': Finished task run for task with final state: 'Success'
[2023-07-22 23:42:57+0530] INFO - prefect.TaskRunner | Task 'Split_Data[0]': Starting task run...
[2023-07-22 23:42:57+0530] INFO - prefect.TaskRunner | Task 'Split_Data[0]': Finished task run for task with final state: 'Success'
[2023-07-22 23:42:57+0530] INFO - prefect.TaskRunner | Task 'Split_Data[1]': Starting task run...
[2023-07-22 23:42:57+0530] INFO - prefect.TaskRunner | Task 'Split_Data[1]': Finished task run for task with final state: 'Success'
[2023-07-22 23:42:57+0530] INFO - prefect.TaskRunner | Task 'Split_Data[2]': Starting task run...
[2023-07-22 23:42:57+0530] INFO - prefect.TaskRunner | Task 'Split_Data[2]': Finished task run for task with final state: 'Success'
[2023-07-22 23:42:57+0530] INFO - prefect.TaskRunner | Task 'Execute_Naive_Bayes': Starting task run...
Accuracy: 0.9555555555555556
[2023-07-22 23:42:57+0530] INFO - prefect.TaskRunner | Task 'Execute_Naive_Bayes': Finished task run for task with final state: 'Success'
[2023-07-22 23:42:57+0530] INFO - prefect.FlowRunner | Flow run SUCCESS: all reference tasks succeeded
We observe in the above console output that all tasks have been logged with a timestamp and that the final accuracy value of about 95.6% (43 of the 45 test samples classified correctly) has also been printed. Now, let us end this exercise with the most exciting part of prefect, which is the flowchart! In the working directory of the script, we find a file named Execute-Naive-Bayes.pdf. On opening the file, we find the flowchart diagram shown in Figure 11.3 below, which shows all tasks and their respective dependencies:

Figure 11.3: Graphical flow of Naïve Bayes algorithm in prefect
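Besides the flowchart, the timestamped console log shown earlier can be checked programmatically. This works because each task reports its final state in a fixed format. One use is to confirm that no task ended in a state other than 'Success'. The sketch below is a minimal illustration and assumes the Prefect 1.x log wording seen in that output. The regular expression and the helpers final_states and failed_tasks are our own and are not part of prefect.

```python
import re

# Assumes the Prefect 1.x wording seen in the console output of this exercise.
FINISHED = re.compile(
    r"Task '(?P<task>[^']+)': Finished task run for task "
    r"with final state: '(?P<state>\w+)'"
)

def final_states(log_text):
    """Map each task name in a Prefect-style log to its final state."""
    # Collapse all whitespace so that entries wrapped over several lines still match.
    flat = " ".join(log_text.split())
    return {m.group("task"): m.group("state") for m in FINISHED.finditer(flat)}

def failed_tasks(log_text):
    """Return the names of tasks whose final state is not 'Success'."""
    return [task for task, state in final_states(log_text).items() if state != "Success"]

sample_log = """
[2023-07-22 23:42:57+0530] INFO - prefect.TaskRunner | Task 'Split_Data[0]': Finished
task run for task with final state: 'Success'
[2023-07-22 23:42:57+0530] INFO - prefect.TaskRunner | Task 'Execute_Naive_Bayes':
Finished task run for task with final state: 'Success'
"""
print(final_states(sample_log))
print(failed_tasks(sample_log))
```

In a scheduled run, an empty list from failed_tasks could serve as a simple health check. A non-empty list names the nodes to investigate.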


In this way, we have successfully implemented a Naïve Bayes machine learning pipeline as a prefect workflow. The dependency diagram makes the important nodes of the pipeline easy to identify and handle. As a result, the failure of a particular node does not hinder the operation of the entire workflow, and the workflow is easier to manage.
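The idea that a failed node affects only the tasks that depend on it can be illustrated without prefect. The sketch below uses the standard library module graphlib (Python 3.9+) to run tasks in dependency order. It skips only the downstream tasks of a failure. This is a simplified model under our own assumptions, not prefect's actual scheduler, which reports such downstream tasks with its own states. The helper run_pipeline and the extra task Plot_Features are hypothetical.

```python
from graphlib import TopologicalSorter

def run_pipeline(dependencies, actions):
    """Run tasks in dependency order and return each task's final state.

    dependencies maps a task name to the set of task names it depends on.
    A task whose upstream tasks did not all succeed is skipped rather than run.
    """
    states = {}
    for task in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(task, set())
        if any(states[name] != "Success" for name in upstream):
            states[task] = "Skipped"
            continue
        try:
            actions[task]()
            states[task] = "Success"
        except Exception:
            states[task] = "Failed"
    return states

def simulated_failure():
    raise RuntimeError("simulated failure in Split_Data")

dependencies = {
    "Split_Data": {"Load_Iris"},
    "Execute_Naive_Bayes": {"Split_Data"},
    "Plot_Features": {"Load_Iris"},  # hypothetical independent branch
}
actions = {
    "Load_Iris": lambda: None,
    "Split_Data": simulated_failure,
    "Execute_Naive_Bayes": lambda: None,
    "Plot_Features": lambda: None,
}
states = run_pipeline(dependencies, actions)
print(states)
```

Here the failure in Split_Data causes Execute_Naive_Bayes to be skipped. The independent Plot_Features branch still runs to completion.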
The enthusiastic programmer will want to implement the same workflow using luigi as well! We leave this exercise to the avid reader.

Conclusion
After this chapter, managing workflows need no longer be a hassle for the Python programmer. The modules prefect and luigi discussed here serve as both a starting point and a launching pad for implementing simple and complex workflows. In the end, more complexity means more challenges, which in turn means more room for creativity! Workflow management has never been more exciting, thanks to Python!
In the next chapter, we shall gradually begin our transition to real-world use cases. We have already seen glimpses of these in every chapter, in the form of practical Python use cases. The next chapter's topic, however, is hyperautomation, a hallmark of the contemporary technological landscape. As the name suggests, the topic leaves no concept unattended: process automation, workflow orchestration, artificial intelligence and machine learning all play a part!
Hence, without further delay, let us turn the page!
Join our book’s Discord space
Join the book’s Discord Workspace for Latest updates,
Offers, Tech happenings around the world, New Release and
Sessions with the Authors:
https://discord.bpbonline.com
Chapter 12
Hyperautomation

Introduction
Until now, we have explored the intricacies of automating the desktop, the web, files, folders, data science algorithms and workflows. Our journey so far culminated in Chapter 11, Automating Business Process Workflows. There we did not just automate individual processes; we also combined sequential procedures into a workflow using Python libraries.
All these frontiers have been wonderful in their own ways. They can make a remarkable impact on the entire system, however, when used in tandem with robotic process automation (RPA) and with the cognitive intelligence of machine learning and other AI tools. That is exactly what hyperautomation is!
This chapter introduces the concept of hyperautomation and expands on the idea. It then covers salient examples in Python that illustrate real-world use cases.
The examples fall under topics studied earlier, such as natural language processing (NLP) and robotic process automation. Even so, revisiting these topics from a different perspective and in a different section should be an altogether novel experience.

Structure
The chapter covers the following topics:
• Defining hyperautomation: What it is and why it matters
∘ The hyperautomation cycle: Key steps and processes
∘ Exploring typical use cases for hyperautomation
∘ Enhancing document understanding with optical character recognition
• Implementing conversational agents: The role of chatbots
• Advancing efficiency with robotic process automation
• Navigating the challenges of hyperautomation
• Practical use case in Python

Objectives
By the end of this chapter, the reader will see hyperautomation as a term built on the same technologies covered in earlier chapters. Used in tandem, these technologies have the potential to create something new.
This chapter revisits topics like natural language processing, optical character recognition and robotic process automation. It views them from a different perspective, under the broader horizon of hyperautomation. It also tackles certain aspects of these technologies that were not covered earlier. The concept, design and application of a conversational agent, or chatbot, are explored using Python, adding a new dimension to natural language processing. Applications like email filtering are also discussed, further illustrating the use of machine learning for hyperautomation.
As with every other chapter, this one ends with a practical use case in Python featuring an interesting example of hyperautomation.

Defining hyperautomation: What it is and why it matters
Consider the history of industry. The first industrial revolution, called Industry 1.0, demonstrated the power of machines like the steam engine in achieving automation. The second, called Industry 2.0, went a step further in sophistication by adding electricity and the internal combustion engine. The third, called Industry 3.0, pioneered the use of computers, electronics and information technology (IT) to greatly accelerate the rate of automation. The fourth, called Industry 4.0, has the internet as its hallmark. Here, the internet of things (IoT) interconnects computing devices. The higher computational power of graphics processing units (GPUs) has made machine learning and deep learning algorithms practical.
Tracing the evolution of industry from the steam-powered automation of Industry 1.0 to the interconnected world of Industry 4.0, we see a pattern of constant innovation. Since innovation has driven tremendous growth in contemporary industry, an organization must do more than procedural automation. It must also incorporate cognitive intelligence into its business process workflows. Only then can it remain relevant in an ever-evolving business landscape and emerge as a competitor that is a cut above the rest.
To put it simply, traditional automation needs to be expanded to include cognitive thinking that adds intelligence to the business process and to the system. Today's business landscape demands more than procedural automation: it requires the integration of cognitive intelligence. This idea is termed hyperautomation, which basically means automating everything that can be automated. Hyperautomation represents this next leap, in which AI, ML and RPA are orchestrated to do more than automate tasks. They give systems sophisticated cognitive capabilities. This may sound idealistic and far from today's reality. Nevertheless, it has the potential to lift systems close to what we would define as ideal. Simply put, hyperautomation increases the capability of existing automation by adding artificial intelligence (AI), machine learning (ML) and robotic process automation. In other words, we define hyperautomation as the holistic automation of complex business processes, aiming to automate as much as possible by harnessing AI and ML. This concept is not just an ideal but a tangible goal that can significantly elevate operational efficiency and competitiveness.
We have already studied the concept of orchestration in the previous chapter. Another way of looking at hyperautomation is as the orchestration of multiple technologies. It combines process automation with robotic process automation and with cognitive technologies like artificial intelligence and machine learning. This cognitive intelligence adds sophistication and gives the business a competitive advantage.

The hyperautomation cycle: Key steps and processes
Every organization has to go through some key steps when planning its hyperautomation cycle. These steps vary from one organization to another, but some general steps are common to every organization. Figure 12.1 below shows them diagrammatically:

Figure 12.1: Steps in the hyperautomation cycle

We shall now elaborate on each step:


1. Identifying the goal: This is the most crucial, defining step of the hyperautomation process, which is why it comes first. A business needs a clear goal in mind before contemplating any implementation of hyperautomation. Adopting hyperautomation just for the sake of it, to follow a trend, or because other organizations use it is a recipe for disaster. A business should have a clear vision of its goals and of the challenges it is trying to overcome. Its adoption of hyperautomation then becomes well defined, selective and customized for the business.
2. Selection of key technologies: Once a business has identified the goal, the next step is to list the areas where hyperautomation would add significant value. For example, an e-commerce business might consider adopting recommendation algorithms to improve customer engagement and satisfaction. That might prompt it to incorporate sophisticated machine learning algorithms. Another business might simply want to free its talented workforce from repetitive manual work so that they can take on more creative and constructive tasks. Hence, it might opt for robotic process automation.
3. Assessment of capabilities: The goal and the technologies that would meet it are now known. Next, the organization needs to assess the time, money and resources it would need to implement those technologies.
4. Continuous improvement: Any process must be continuously monitored for feedback. This feedback enables the business to rectify errors and carry the lessons learned into the next implementation.
Exploring typical use cases for hyperautomation
With a good understanding of hyperautomation from the introductory sections, let us jump straight to some typical use cases where Python comes to the rescue! This chapter covers the commonly occurring applications below, which come under the purview of hyperautomation. We study each of them in the sections that follow:
• Document understanding using optical character recognition: We studied optical character recognition (OCR) in Chapter 7, Working with PDFs and Images, where we used Python's tesseract library to read text from images. In this chapter, we apply the same concept to read information from image documents. This requirement arises frequently, because important documents often hold information in an unstructured format. Text recognition using OCR greatly accelerates the automation cycle by reducing the human effort needed to extract that information.
• Conversational agents (chatbots): In our earlier discussion of natural language processing, we saw how the Python libraries NLTK and spaCy provide readily available methods for the salient features of an NLP pipeline. One of the many applications of NLP is a chatbot, or conversational agent, which has become an indispensable part of any business today. Chatbots handle customer interactions on most generic topics. This reduces unnecessary human involvement and frees people for work elsewhere. In this chapter, we shall build a quick chatbot using NLTK.
• Robotic process automation: Chapter 2, RPA Foundations, introduced the concept of robotic process automation. There we went through various RPA software available in the market. We also studied a practical use case in which the Python rpa package implements desktop and web RPA. In this chapter, we restrict ourselves to the Python rpa package and explore its capabilities further.

Enhancing document understanding with optical character recognition
In many business processes, a pivotal step is extracting key information from documents and processing it further. Most of the time, this information is not readily available in a structured format, so domain experts have to extract it manually. This consumes a lot of time and makes the process prone to human error. Optical character recognition can greatly reduce this manual effort, because it reads and interprets information directly from documents. This makes OCR a great catalyst for hyperautomation using AI.
In this section, we revisit the pytesseract Python library to read text from images. We covered the download and installation procedure for pytesseract in Chapter 7, Working with PDFs and Images, so we move straight to the Python implementation here.
The example file is Stock Prices.pdf, which we also used in Chapter 7 while working with the PyPDF2 library. This time, however, the file is an image PDF. Its text is not searchable, so the file cannot be used with PyPDF2. Hence, we use OCR to extract text from this image file.
It is important to note that Tesseract OCR does not read text directly from a PDF file. We therefore need to convert the PDF file into an image format, like .JPG, that Tesseract OCR can consume as input.
Let us open the Stock Prices.pdf file and observe its contents. As Figure 12.2 shows, it contains the same text and table as the text PDF Stock Prices.pdf. We used that file in Chapter 7, Working with PDFs and Images, while studying the PyPDF2 library. The only difference is that this is an image PDF, so its text cannot be searched or selected. Therefore, we use OCR to extract the text from it.
Figure 12.2 below shows the contents of the image PDF file Stock Prices.pdf:
Figure 12.2: Contents from image PDF file Stock Prices.pdf

As discussed, let us first transform the PDF file into an image file, a format that Tesseract OCR can consume. We use the Python library pdf2image for this. Before using the library, we install it with the command below:
pip install pdf2image
Note that pdf2image relies on the Poppler utilities, which must be installed separately. On Windows, you can pass their location to convert_from_path through its poppler_path argument.
Next, we import the required libraries as follows:
from pdf2image import convert_from_path
import pytesseract
Here, we use the convert_from_path function of the pdf2image library. Save the file Stock Prices.pdf in the same directory as the Python file. As the code below shows, we pass the file name to convert_from_path, which converts every page of the PDF document into an image that Tesseract can consume. The dpi argument sets the rendering resolution in dots per inch; its default value is 200.
str_pdf_path = "Stock Prices.pdf"
# Render every page at 500 dpi; the result is a list of PIL images, one per page
images = convert_from_path(str_pdf_path, dpi=500)
Now, we specify the path of the Tesseract executable, since pytesseract needs to call it:
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
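A hard-coded path works only on a machine where Tesseract was installed to that exact folder. Below is a minimal sketch of a more portable lookup. It assumes the executable is named tesseract (tesseract.exe on Windows) and can be found on the system PATH. The two Windows folders used in this chapter serve as fallbacks. The helper find_tesseract is hypothetical and is not part of pytesseract.

```python
import os
import shutil

# Hypothetical fallback locations (the Windows folders used in this chapter).
WINDOWS_CANDIDATES = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
)

def find_tesseract(candidates=WINDOWS_CANDIDATES):
    """Return a path to the tesseract executable, or None if none is found."""
    # Prefer whatever 'tesseract' resolves to on the system PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to the explicitly listed locations.
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None

print(find_tesseract())
```

If a path is found, it can be assigned to pytesseract.pytesseract.tesseract_cmd in place of the hard-coded string. A result of None suggests that Tesseract still needs to be installed.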
Next, we loop through the images and pass each one to pytesseract to read its text. We then concatenate the text from every page into a single output. The PDF document here has just one page, so there is only one image to iterate through.
Below is the code that performs the task:
text_output = ''
for image in images:
    # Run OCR on the page image and append its text to the combined output
    text = pytesseract.image_to_string(image)
    text_output += text + '\n'

print(text_output)
The entire code is shown below for reference. Copy and run the code in a Python editor and observe the output in the console:
from pdf2image import convert_from_path
import pytesseract

str_pdf_path = "Stock Prices.pdf"

# Convert every page of the PDF into an image (default resolution: 200 dpi)
images = convert_from_path(str_pdf_path)

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
text_output = ''
for image in images:
    text = pytesseract.image_to_string(image)
    text_output += text + '\n'

print(text_output)
Output:
Below is a table showing the price of five stocks
A,B,C,D, and E from January to December. The first
column
contains the stock names and the remaining columns
contain the respective values of the stock from
January to December.

Stocks | Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
A 357 457 187 831 779 338 129 508 407 748 511 609
C 60 64 54 93 87 74 96 92 83 85 70 88

D 1667 1962 1845 1535 1753 1767 1551 1893 1715 1707 1627 1532
E 2181 | 2333 | 2265 | 2274 | 2739 | 2601 | 2569 | 2520 | 2744 | 2234 | 2836 | 2230
The document contains a paragraph at the top and a table at the bottom. Every word of the paragraph has been extracted perfectly. In the table, however, the row for stock B has been omitted completely, although the rest of the table has been retained correctly. Tesseract has certain limitations when it comes to detecting and extracting tables.
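OCR can silently drop rows, as it did with stock B, so it helps to validate the extracted text against what we expect. The sketch below uses a hypothetical helper, check_stock_table, which is not part of pytesseract. It assumes each table row comes out of OCR as one line that starts with the stock name. It then reports stocks that are missing and rows that do not have twelve monthly values.

```python
def check_stock_table(ocr_text, expected=("A", "B", "C", "D", "E"), n_values=12):
    """Parse stock rows from OCR text and report missing or incomplete rows.

    Assumes each row is one line: a stock name followed by integers,
    optionally separated by '|' characters picked up from table borders.
    """
    rows = {}
    for line in ocr_text.splitlines():
        tokens = line.replace("|", " ").split()
        if tokens and tokens[0] in expected and all(t.isdigit() for t in tokens[1:]):
            rows[tokens[0]] = [int(t) for t in tokens[1:]]
    missing = [name for name in expected if name not in rows]
    incomplete = [name for name, values in rows.items() if len(values) != n_values]
    return rows, missing, incomplete

# Table rows from the OCR output above, one row per line.
sample = """Stocks | Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
A 357 457 187 831 779 338 129 508 407 748 511 609
C 60 64 54 93 87 74 96 92 83 85 70 88
D 1667 1962 1845 1535 1753 1767 1551 1893 1715 1707 1627 1532
E 2181 | 2333 | 2265 | 2274 | 2739 | 2601 | 2569 | 2520 | 2744 | 2234 | 2836 | 2230"""

rows, missing, incomplete = check_stock_table(sample)
print("Missing:", missing)
print("Incomplete:", incomplete)
```

For the output above, this check flags stock B as missing. An automated workflow could use such a check to route the document for manual review instead of passing incomplete data downstream.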
We try to tackle this limitation with another Python library called img2table. It detects the tables in a document and then applies OCR to their contents. Before using it, we install it with the command below:
pip install img2table
Next, we import the required libraries using the code below:
from img2table.document import PDF
from img2table.ocr import TesseractOCR
The code above shows that we use two classes from the img2table library: PDF, from its document module, and TesseractOCR, from its ocr module.
Next, we create the objects obj_pdf and obj_ocr from these classes, as shown in the code below. Before running the code, make sure that the exercise file Stock Prices.pdf is in the same directory as the Python code. Otherwise, give the explicit path of the file in the src argument of PDF:
obj_pdf = PDF(src="Stock Prices.pdf")

obj_ocr = TesseractOCR(lang="eng")
Next, we use the extract_tables method of the obj_pdf
object which extracts all tables within the document using
OCR. Here, we pass the obj_ocr object as an argument to
the ocr parameter of the extract_tables method:
obj_pdf_tables = obj_pdf.extract_tables(ocr=obj_ocr)
Thereafter, we use the to_xlsx method of the obj_pdf object to export the tables to an Excel workbook, as shown in the code below:
obj_pdf.to_xlsx("Extracted_Tables.xlsx", ocr=obj_ocr)
The to_xlsx method also takes the obj_ocr object through its ocr parameter. The first argument, "Extracted_Tables.xlsx", is the name of the output file.
The entire piece of code is included below for reference. Copy and run the code in the Python editor. An Excel file named "Extracted_Tables.xlsx" is created in the same directory as the code.
from img2table.document import PDF
from img2table.ocr import TesseractOCR

obj_pdf = PDF(src="Stock Prices.pdf")

obj_ocr = TesseractOCR(lang="eng")
obj_pdf_tables = obj_pdf.extract_tables(ocr=obj_ocr)
obj_pdf.to_xlsx("Extracted_Tables.xlsx", ocr=obj_ocr)
On opening the output Excel file "Extracted_Tables.xlsx",
we observe a table as shown in Figure 12.3 below:

Figure 12.3: Table from output Excel File “Extracted_Tables.xlsx”

This table looks much better than the one we obtained with the image_to_string method of the pytesseract library. However, the first column, Stocks, should hold the names of stocks A, B, C, D and E, yet it is entirely blank. Additionally, the Nov value for stock A is missing.
It is important to note that OCR is a complex exercise. No universal approach leads straight to the ideal solution, since every approach has its limitations. One usually needs to refine the extracted output by combining approaches. As an additional note, the OpenCV library can preprocess an image before OCR is applied, making it easier for the OCR tool to detect text. We leave this topic for the interested reader to explore further!
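Below is a sketch of what combining approaches can look like. It merges two partial extractions of the same table cell by cell. Gaps (None) in one result are filled from the other, and cells where the two results disagree are flagged for manual review. The helper merge_extractions and the small sample rows, with made-up labels and numbers, are hypothetical. In practice, the inputs might come from the pytesseract text parsed into rows and from the img2table output. Both would first need to be aligned on the same row labels and columns.

```python
def merge_extractions(primary, secondary):
    """Merge two {row_label: [cell, ...]} extractions of the same table.

    Cells that are None (not recognized) in the primary result are filled from
    the secondary one; when both contain different values, the primary value is
    kept and the disagreement is recorded as a conflict.
    """
    merged, conflicts = {}, []
    for label in sorted(set(primary) | set(secondary)):
        a = primary.get(label, [])
        b = secondary.get(label, [])
        row = []
        for i in range(max(len(a), len(b))):
            x = a[i] if i < len(a) else None
            y = b[i] if i < len(b) else None
            if x is None:
                row.append(y)
            else:
                row.append(x)
                if y is not None and y != x:
                    conflicts.append((label, i, x, y))
        merged[label] = row
    return merged, conflicts

# Hypothetical partial results: the first source misses row R2 and one cell of R1;
# the two sources disagree on the last cell of R3.
from_text = {"R1": [10, None, 30], "R3": [70, 80, 90]}
from_grid = {"R1": [10, 20, 30], "R2": [40, 50, 60], "R3": [70, 80, 99]}

merged, conflicts = merge_extractions(from_text, from_grid)
print(merged)
print(conflicts)
```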

Implementing conversational agents: The role of chatbots
A chatbot is a great application of natural language processing technology. It reduces the involvement of people in regular, repetitive interactions. Typically, these interactions take place between an organization and its clients, or between an organization's IT team and its employees. A chatbot does a great job of generating automated responses to common questions. Advanced chatbots even learn from previous interactions to refine future responses. With a chatbot in place, an organization saves the time, effort and resources needed to hire and train a person for these repetitive interactions. That person can instead take on tasks that require more creativity. A chatbot is therefore a perfect example of hyperautomation: it couples automation with an AI technology like NLP to accelerate the pace of an organization.
In this section, we create our own chatbot using the Python library NLTK. We saw the different steps of the natural language processing cycle in Chapter 10, Intelligent Automation Part 2: Using Deep Learning. A chatbot, being an application of natural language processing, utilizes some of these steps.
Typically, a chatbot is one of the two types listed below, or a combination of both:
• Rule based chatbot: A rule based, or scripted, chatbot generates responses from well-defined rules embedded or programmed into it. Before the advent of AI, early chatbots were predominantly rule based and customized for a particular requirement. Such chatbots may fail to understand questions outside the scope of their programmed logic.
• Self-learning chatbot: These chatbots, also called intelligent chatbots, use AI technology like NLP to learn on their own and refine their responses.
For regular requirements in organizations, a rule based chatbot handles most tasks. In this exercise, we use the Python library NLTK to create a simple rule based chatbot. It addresses new software installation requests and resolves issues with existing software. This exercise gives us a basic foundation on which to build more complex chatbots.
Before using NLTK, we first need to install the library using the command below:
pip install nltk
Next, we import the libraries that we require, as shown in the code below:
import nltk
from nltk.chat.util import Chat, reflections
Let us quickly understand Chat and reflections, which we have just imported.
• Chat: This is the basic class whose embedded logic lets the chatbot process text and produce responses. We shall soon see how to use it in code.
• Reflections: This is a dictionary that maps first-person words and phrases to their second-person counterparts and vice versa (for example, "my" to "your"). A pattern may capture part of the user's input. Before that captured text is inserted into a response, it is passed through this dictionary, so the reply reads naturally from the chatbot's point of view. This proves useful for many basic conversational phrases. One can also create a customized dictionary or add entries to this one. The command below displays the contents of this dictionary:
print(reflections)
Output:
{'i am': 'you are', 'i was': 'you were', 'i': 'you', "i'm": 'you are', "i'd": 'you would', "i've": 'you have', "i'll": 'you will', 'my': 'your', 'you are': 'I am', 'you were': 'I was', "you've": 'I have', "you'll": 'I will', 'your': 'my', 'yours': 'mine', 'you': 'me', 'me': 'you'}
Next, we create a list called pairs, which stores the rules we want to incorporate into the chatbot. Each entry pairs a regular expression with a list of possible responses. In a response, %1 is replaced by the text captured by the pattern's first group. We create a basic list here; it can be expanded with new rules and business logic:
# Each entry: [regular expression, list of possible responses]
pairs = [
    [
        r"My name is (.*)",
        ["Hello %1, How are you?",]
    ],
    [
        r"(I'm|good|good,|I am)",
        ["How can I help you?\nPlease select one from the following:"
         "\n1)New Software Installation\n2)Issue with Existing Software",]
    ],
    [
        r"(.*) Installation",
        ["Could you please mention the name of the software? Please type 'Software:' "
         "before mentioning the software name. Thanks!",]
    ],
    [
        r"(.*) Existing",
        ["Please select the issue from the following:"
         "\n1)Version needs an upgrade\n2)Uninstall Software",]
    ],
    [
        r"(Version|Uninstall)",
        ["Thanks for your patience. Someone from the IT team "
         "would reach out to you to resolve your issue. Is there anything else "
         "I could help you with?",]
    ],
    [
        r"Software:(.*)",
        ["Thanks for your patience. I am assigning this ticket to someone from "
         "the IT team who shall help you install %1. Is there anything else I could help you with?",]
    ],
    [
        r"No",
        ["Thank you. Have a great day!",]
    ],
]
The reader is advised to spend some time studying the patterns and responses in this list, since they show how the chatbot produces its answers. The reader is assumed to be familiar with regular expressions in Python.
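The sketch below reimplements the core matching idea in plain Python to show what happens to each user message. It reflects our simplified reading of how nltk.chat.util.Chat behaves, not NLTK's own code. Patterns are tried in order with re.match, so a pattern must match from the start of the input, and case is ignored. The first matching pattern wins. Each %1, %2 and so on in the response is replaced by the matching captured group, after that text passes through the reflections. When several responses are listed, NLTK picks one at random; the sketch simply takes the first. The names REFLECTIONS, reflect, respond and demo_pairs are hypothetical, and REFLECTIONS is only a small subset of NLTK's dictionary.

```python
import re

# Hypothetical subset of NLTK's reflections dictionary.
REFLECTIONS = {"i am": "you are", "i": "you", "my": "your", "you": "me", "your": "my"}

def reflect(text, reflections=REFLECTIONS):
    """Swap first- and second-person words in captured text (single pass)."""
    # Try longer keys first so that "i am" is replaced as a unit before "i".
    keys = sorted(reflections, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keys)) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: reflections[m.group(0).lower()], text.lower())

def respond(pairs, user_input, reflections=REFLECTIONS):
    """Return the reply of the first pattern that matches the start of the input."""
    for pattern, responses in pairs:
        match = re.match(pattern, user_input, re.IGNORECASE)
        if match:
            reply = responses[0]
            # Replace higher-numbered wildcards first so that %1 cannot clobber %10.
            for i in range(len(match.groups()), 0, -1):
                reply = reply.replace(f"%{i}", reflect(match.group(i) or "", reflections))
            return reply
    return None  # no rule matched

demo_pairs = [
    [r"My name is (.*)", ["Hello %1, How are you?"]],
    [r"I need (.*)", ["Why do you need %1?"]],
]
print(respond(demo_pairs, "My name is abc"))
print(respond(demo_pairs, "I need my laptop fixed"))
print(respond(demo_pairs, "What is the weather?"))
```

The second call shows reflections at work: "my laptop fixed" becomes "your laptop fixed" in the reply. The third call returns None because no rule matches, which is why rule based chatbots need a rule for every expected input.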
Finally, we instantiate the chatbot by passing pairs and reflections as arguments to the Chat class. We then call the converse method of the Chat class to start the conversation.
Before instantiating the Chat class, we also print an introductory statement so that the chatbot opens the conversation.
print("Hi, I'm your new chatbot friend and I'm here to help you! May I know your name?")

chat = Chat(pairs, reflections)
chat.converse(quit="No")
Note the argument quit="No" passed to the converse method. It tells the chatbot to close the conversation when the user types No, typically at the end of the conversation.
The entire code is provided below for reference. Copy and run the code in a Python editor:
import nltk
from nltk.chat.util import Chat, reflections

pairs = [
    [
        r"My name is (.*)",
        ["Hello %1, How are you?",]
    ],
    [
        r"(I'm|good|good,|I am)",
        ["How can I help you?\nPlease select one from the following:"
         "\n1)New Software Installation\n2)Issue with Existing Software",]
    ],
    [
        r"(.*) Installation",
        ["Could you please mention the name of the software? Please type 'Software:' "
         "before mentioning the software name. Thanks!",]
    ],
    [
        r"(.*) Existing",
        ["Please select the issue from the following:"
         "\n1)Version needs an upgrade\n2)Uninstall Software",]
    ],
    [
        r"(Version|Uninstall)",
        ["Thanks for your patience. Someone from the IT team "
         "would reach out to you to resolve your issue. Is there anything else "
         "I could help you with?",]
    ],
    [
        r"Software:(.*)",
        ["Thanks for your patience. I am assigning this ticket to someone from "
         "the IT team who shall help you install %1. Is there anything else I could help you with?",]
    ],
    [
        r"No",
        ["Thank you. Have a great day!",]
    ],
]

print("Hi, I'm your new chatbot friend and I'm here to help you! May I know your name?")
chat = Chat(pairs, reflections)
chat.converse(quit="No")
Output:
Hi, I'm your new chatbot friend and I'm here to help you! May I know your name?
>My name is abc
Hello abc, How are you?
>I'm good, thanks!
How can I help you?
Please select one from the following:
1)New Software Installation
2)Issue with Existing Software
>New Software Installation
Could you please mention the name of the software? Please type 'Software:' before mentioning the software name. Thanks!
>Software:xyz community edition
Thanks for your patience. I am assigning this ticket to someone from the IT team who shall help you install xyz community edition. Is there anything else I could help you with?
>No
Thank you. Have a great day!
Please note that when the code runs, only the output of the print statement appears at first, prompting the user for a name. The conversation then proceeds based on the user's responses. The sample conversation generated this way is shown above as the output. The reader is free to try different responses, which will produce different replies and a different conversation.
In this way, we have become acquainted with the basic infrastructure that NLTK neatly provides for creating a chatbot. Another interesting Python library for building chatbots is ChatterBot, which the interested reader may wish to explore!
Advancing efficiency with robotic process automation
Robotic process automation enables an organization to eliminate the involvement of people in repetitive processes. In the context of software, this could be desktop or web automation. Chapter 2, RPA Foundations, covered the basic concepts of robotic process automation. It also presented a practical use case in Python involving web automation with the rpa package. Other Python libraries, like shutil and PyAutoGUI from Chapter 8, Mechanizing Applications, Folders and Actions, also contribute to robotic process automation. They enable the automation of repetitive desktop processes.
In this chapter, we use the RPA functionality directly in the practical use case in Python. This completes our overview of the three hyperautomation use cases we intended to discuss: OCR, NLP chatbots and RPA. Before taking up the practical use case, let us move to the final section, which discusses the challenges of hyperautomation.

Navigating the challenges of hyperautomation
Although useful, hyperautomation comes with its own set of challenges that an organization needs to tackle. Some of them are discussed below:
• Resistance to change: Completely or partially replacing people with software or machines might make part of the existing workforce redundant. This provokes resistance to hyperautomation technologies like AI or RPA. Hence, a balance needs to be struck between technology and people.
• Misplaced priorities: To keep up with the prevailing trend, an organization might adopt the latest cutting-edge technology without regard to its specific business plan or organizational goals. Adopting hyperautomation just for the sake of it is the surest way to ask for trouble.
• Return on investment: An organization needs to ensure that the benefits of adopting hyperautomation exceed its costs, so that the return on investment (ROI) justifies the effort.
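The ROI condition can be made concrete with the common formula ROI = (benefits − costs) / costs. A positive value means the benefits exceed the costs. The figures below are purely hypothetical and only demonstrate the arithmetic.

```python
def roi(total_benefit, total_cost):
    """Return the return on investment as a fraction: (benefit - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost

# Hypothetical example: a bot saves 1,200 hours a year valued at 25 per hour,
# and building plus running it costs 20,000 a year.
annual_benefit = 1200 * 25  # 30,000
annual_cost = 20000
print(f"ROI: {roi(annual_benefit, annual_cost):.0%}")
```

Here the benefits of 30,000 exceed the costs of 20,000, giving an ROI of 50%. A negative result would signal that the initiative does not yet pay for itself.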

Practical use case in Python


It is time to apply what we have learnt! In this use case, we navigate to a web page and take a snapshot of an image that has text written on it. We save this snapshot in our directory and then apply OCR to read the text from the image. This use case therefore requires knowledge of both RPA and OCR, making it a perfect example of hyperautomation. Does this not sound interesting?
We navigate to the URL below, which contains an image with a message written on it. The image is shown in Figure 12.4:
https://c4.wallpaperflare.com/wallpaper/734/647/661/quote-motivational-wallpaper-preview.jpg
Figure 12.4: Image from webpage

We will now write a Python script using the rpa package that automatically navigates to the web page URL to open the image, takes a snapshot of the image, saves the snapshot to the same directory where the Python code file is saved, and then closes the web page window. As we studied all these steps in Chapter 2, RPA Foundations, we shall directly write the code as follows:
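import rpa as r

r.init()  # start the RPA session
r.url('https://c4.wallpaperflare.com/wallpaper/734/647/661/quote-motivational-wallpaper-preview.jpg')
r.snap('page', 'Captured_Image.png')  # save a snapshot of the whole page
r.wait(5)  # pause for 5 seconds before closing
r.close()  # end the RPA session and close the browser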
Now, we have a file named Captured_Image.png saved in the same directory as the Python code. This is a snapshot of the image that we get when we visit the URL.
Our next step is to use the Python library pytesseract, a wrapper around the Tesseract OCR engine, to read the text in the snapshot image. We are already familiar with the usage of this library, so we directly write the script as shown below:
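import pytesseract

# Location of the Tesseract executable on Windows (adjust to your installation)
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

image = 'Captured_Image.png'

# --psm 11: page segmentation mode for sparse text
text = pytesseract.image_to_string(image, lang='eng', config='--psm 11')

print(text)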
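Note that we have passed config='--psm 11' as the last argument of the image_to_string method of the pytesseract library. Page segmentation mode 11 tells Tesseract to look for sparse text, that is, as much text as possible in no particular order. Please refer to the Tesseract manual link provided in Chapter 7, Working with PDFs and Images, which discusses the config options in detail.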
The entire code is provided below for reference. Copy and run it in a Python editor:
import rpa as r
import pytesseract

# Step 1 (RPA): open the image URL, snapshot the page and close the browser
r.init()
r.url('https://c4.wallpaperflare.com/wallpaper/734/647/661/quote-motivational-wallpaper-preview.jpg')
r.snap('page', 'Captured_Image.png')
r.wait(5)
r.close()

# Step 2 (OCR): read the text from the saved snapshot
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
image = 'Captured_Image.png'
text = pytesseract.image_to_string(image, lang='eng', config='--psm 11')

print(text)
Output:
DO YOUR BEST
AND
FORGET THE REST
®
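The raw text returned by pytesseract can include blank lines and stray symbols, such as the ® above, that are not part of the quote. If the text is to be used further in an automated workflow, a small clean-up step may help. The helper clean_ocr_text below is a hypothetical illustration (it is not part of pytesseract), and the sample string only approximates the output shown above:

```python
def clean_ocr_text(raw: str) -> str:
    """Tidy raw OCR output: drop blank lines and lines with no letters or digits."""
    kept = []
    for line in raw.splitlines():
        stripped = line.strip()
        # Keep a line only if it contains at least one letter or digit;
        # this discards empty lines and isolated symbols such as a stray "®".
        if any(ch.isalnum() for ch in stripped):
            kept.append(stripped)
    return "\n".join(kept)


# Sample string approximating the output above (real spacing may differ)
raw_output = "DO YOUR BEST\n\nAND\n\nFORGET THE REST\n\n®\n"
print(clean_ocr_text(raw_output))
```

In the use case, one would call clean_ocr_text(text) on the string returned by image_to_string. The rule is deliberately simple: it would also discard a line of pure punctuation that genuinely belonged to the image.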
We find that pytesseract has given us the output that we
require! With this, we end the use case and the chapter!

Conclusion
As interesting as this chapter is, it did not introduce any
fundamentally new concept. In fact, the central message of
this chapter has been that a completely new territory called
hyperautomation could be brought into existence simply
by applying the same fundamental technologies at salient
points in the business workflow, thus refining the workflow
with cognitive intelligence and automation. This chapter
also discussed the challenges that an organization would
need to address while incorporating hyperautomation.
In the next chapter, we shall revisit the RPA tool UiPath once
again, this time with Python as the center of focus! The
combination of UiPath with Python provides wonderful
flexibility to the user which would be the primary area of
discussion in the next chapter. So, without wasting any more
time, let us turn the page and move to the next chapter!
Chapter 13
Python and UiPath

Introduction
Having covered the foundations of robotic process automation (RPA) in Chapter 2, RPA Foundations, we are now in a position to explore the role of Python in enhancing the efficiency and flexibility of RPA tools like UiPath. This chapter deals with this topic in depth, demonstrating the inclusion of Python scripting in an RPA workflow through a detailed, step-by-step guided walkthrough.
Even though RPA tools like UiPath provide specialized versions that do not require programming, for professionals who are not keen to program themselves, this chapter focuses on the customizable version of RPA software, which enables one to include a programmable script to enhance the efficiency and operations of the workflow.
In this chapter, we will discuss the different activities within UiPath that make up the sequence enabling Python integration with UiPath. Each of these activities has its own attributes, which we shall go through in detail.
A foundation in Python integration with UiPath further fuels the efficient implementation of hyperautomation, a concept that we studied earlier in Chapter 12, Hyperautomation.

Structure
The chapter covers the following topics:
• Setting up the Python environment in UiPath
• Exploring Python activities in UiPath
• Creating the Python script
• Integrating Python with UiPath

Objectives
This chapter assumes that the reader is already familiar and comfortable with navigating the basic functionalities of UiPath. Although Chapter 2, RPA Foundations, has provided an overview and summary of different RPA functionalities, it would be helpful to get acquainted with the primary concepts of UiPath before beginning this chapter, as its focus is the integration of Python with UiPath.
By the end of this chapter, the reader will be comfortable invoking Python scripts within a UiPath workflow, thus getting a glimpse of the power of Python in enhancing UiPath programs.
This chapter is heavy with images! This is not surprising, since it includes screenshots captured at every step of the process. The step-by-step guided navigation with elaborate images should make it easy for the reader to understand the salient steps in the procedure.
Setting up the Python environment in UiPath
In order to use Python scripts in the UiPath environment, we need to configure a few settings, as described in the steps and figures below:

Note that, as a prerequisite, .NET 5.0 needs to be installed before starting the exercise.
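The Python Scope activity, configured later in this chapter, asks for the folder containing the Python interpreter, the target architecture (x86 or x64) and the Python version. One way to look these up is to run a few lines with the interpreter that UiPath will use. The sketch below relies only on the standard library; interpreter_settings is a hypothetical helper name, and the pointer-size check reports the bitness of the Python build, not of Windows itself:

```python
import os
import struct
import sys


def interpreter_settings():
    """Return (folder, target, version) for the running Python interpreter."""
    # The Path property expects the folder that holds python.exe, not the .exe itself.
    folder = os.path.dirname(sys.executable)
    # Pointer size in bits: 32 for a 32-bit Python build, 64 for a 64-bit build.
    bits = struct.calcsize("P") * 8
    target = "x64" if bits == 64 else "x86"
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return folder, target, version


folder, target, version = interpreter_settings()
print("Path:   ", folder)
print("Target: ", target)
print("Version:", version)
```

For the 32-bit Python 3.6 installation used in this chapter, one would expect the script to report x86 and 3.6.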

1. Open UiPath Studio.
2. Create a new blank process with Windows compatibility and name it Python_Test, as shown in Figure 13.1:

Figure 13.1: New blank process in UiPath named Python_Test

3. Click the Manage Packages tab at the top left, as shown in Figure 13.2 below:
Figure 13.2: Manage Packages tab

4. In the Manage Packages window that opens, select All Packages and type Python in the search bar, as shown in Figure 13.3 below:

Figure 13.3: Manage Packages bar in UiPath

5. Select the package UiPath.Python.Activities, as shown in Figure 13.4 below, and click the Install button on the right:
Figure 13.4: Installation of UiPath.Python.Activities

6. After the installation, we can see a yellow circle next to the package name, indicating that the package has been installed, as shown in Figure 13.5 below:

Figure 13.5: Confirmation of Python package installation

7. Next, we click in the search box of the Activities panel and type Python, which populates the list of all Python activities within UiPath, as shown in Figure 13.6 below:
Figure 13.6: Python Activities in UiPath

Exploring Python activities in UiPath
In this section, we shall quickly describe the various Python activities that become available in UiPath after installing the Python package, as described in the previous section. After typing Python in the search box of the Activities panel, we find the list of all Python activities in UiPath, shown earlier in Figure 13.6.
Below, we discuss each of them and their respective functions. Later in this chapter, we will gain a practical understanding of them when we perform an actual exercise.
• Python scope: This activity sits at the first level of the hierarchy within the sequence. Here, we specify basic parameters like the Python installation directory and the Python version. All other Python activities are placed within this activity.
• Load Python script: This activity enables us to load an external Python script within UiPath. The purpose of this activity is to use Python code to perform a task within UiPath, which is essentially the hallmark of this chapter. Note that this activity is different from Run Python Script, which we discuss at the end.
• Invoke Python method: This activity enables one to reference a method or function from the Python script loaded using the Load Python Script activity. It lets one pass actual values from UiPath into the Python method as arguments and use the method to produce the intended output.
• Get Python object: This activity enables one to retrieve the value held in a Python object, such as the result returned by Invoke Python Method, as a specified data type that UiPath can work with.
• Run Python script: Using this activity, we directly run an external Python script from the UiPath environment.
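To make the roles of these activities more concrete, the sketch below mimics two of them in plain Python: loading a script file as a module (similar in spirit to Load Python Script) and calling one of its functions by name with a list of arguments (similar to Invoke Python Method). This is only an analogy built on the standard importlib module; it is not how UiPath implements these activities, and the names load_python_script and invoke_python_method are hypothetical:

```python
import importlib.util
from types import ModuleType
from typing import Any, List


def load_python_script(path: str) -> ModuleType:
    """Import a .py file by path and return it as a module object."""
    spec = importlib.util.spec_from_file_location("loaded_script", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load Python script: {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the script's top-level code once
    return module


def invoke_python_method(instance: ModuleType, name: str,
                         input_parameters: List[Any]) -> Any:
    """Call a function of a loaded module by name, passing a list of arguments."""
    func = getattr(instance, name, None)
    if not callable(func):
        raise AttributeError(f"No callable named {name!r} in the loaded script")
    # Each element of the list becomes one positional argument.
    return func(*input_parameters)
```

For example, with the UiPath_Test.py file created in the next section, invoke_python_method(load_python_script("UiPath_Test.py"), "Square", [5]) would return 25.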

Creating the Python script


Now that we have set up the Python environment in UiPath and have the necessary background on the various Python activities, we are ready for a practical walkthrough, which will give us an opportunity to explore these activities within UiPath.
Before we begin this exercise, the first thing we need is a Python program to work with. Let us create a Python file and write some functions in it.
Create a Python file named UiPath_Test.py and save the file to a local directory. To keep things simple, let us write one mathematical function that calculates the square of a number and another that adds two numbers.
Copy and paste the code below into the Python file UiPath_Test.py and save the changes:
# Calculates the square of an integer number
def Square(intNum):
    intSquare = intNum * intNum
    return intSquare


# Performs the addition of two integers
def Add(intNum1, intNum2):
    intSum = intNum1 + intNum2
    return intSum
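Before wiring the script into UiPath, it can be worth checking the two functions in plain Python, so that any later error can be traced to the UiPath side rather than to the script. In the minimal sketch below, the functions are repeated so that the block runs on its own, and the checks sit under an if __name__ == "__main__": guard so that they run only when the file is executed directly (for example, with python UiPath_Test.py). Whether UiPath's Load Python Script treats the file as an import in this sense is not covered in this chapter, so keeping the checks in a separate file is the safer option:

```python
# Same two functions as in UiPath_Test.py, repeated so this block runs on its own.
def Square(intNum):
    intSquare = intNum * intNum
    return intSquare


def Add(intNum1, intNum2):
    intSum = intNum1 + intNum2
    return intSum


if __name__ == "__main__":
    # Quick local checks using the same inputs as the UiPath exercise.
    assert Square(5) == 25
    assert Add(100, 150) == 250
    print("UiPath_Test.py functions behave as expected")
```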
We shall refer to each of these functions later in our exercise.

Integrating Python with UiPath


We shall now use the Python script that we created in the
previous section. Let us follow the steps below to
understand Python integration with UiPath:
1. Open the same blank process Python_Test that we created earlier. We have already installed the UiPath.Python.Activities package from the Manage Packages section. Next, we go to the Activities panel and type Python in the search box, which lists all the Python activities in UiPath. The first activity that needs to be introduced in the sequence is the Python Scope activity, as shown in Figure 13.7 below:
Figure 13.7: Python Scope activity

2. Next, we click the Python Scope activity within the sequence, which opens its properties window in the UiPath process, as shown in Figure 13.8:

Figure 13.8: Properties window for Python Scope activity

A few properties are of interest to us here, and we need to specify the right values for them. Below is the list of properties that we want to specify:
• Path: The first property that we need to specify is Path. This contains the path to the directory where Python has been installed. Note that for the exercises in this chapter specifically, the Python version used is the 32-bit Python 3.6. We click the icon with three horizontal dots at the end of the Path parameter to open the Expression Editor.
As shown in Figure 13.9, we enter the directory where the Python interpreter is installed. Note that the path should not include python.exe itself; we simply enter the folder in which the interpreter is placed. Below is the path that we enter as a string:
C:\Users\dell\AppData\Local\Programs\Python\Python36-32

Figure 13.9: Specifying the Path attribute in Expression Editor

• Target: Here, we have the option to choose x64 for 64-bit or x86 for 32-bit. We specify x86, as we are working with a 32-bit version, as mentioned before.
• Version: This attribute would provide us with a
dropdown to choose the Python version. We have
an option named Auto as well which enables
UiPath to automatically select the correct version
from the directory that we specified in the Path
attribute. We select Auto in our case.
3. Next, we drag the Load Python Script activity into the sequence, as shown in Figure 13.10. As soon as we drop the activity within the sequence, we see a red circle with an exclamation mark. When we hover the mouse over the red circle, a message saying None of the overload groups have all their required/optional activity arguments configured pops up. This message appears because we have not yet specified any arguments in the properties window for this activity. Please refer to the following Figure 13.10:

Figure 13.10: Load Python Script activity

4. The properties window for this activity is shown in Figure 13.11, where we mention the full path to the Python script UiPath_Test.py in the File attribute, after which the red circle disappears:

Figure 13.11: Properties window for Load Python Script activity

5. There is also another attribute in the Input section of the properties window, called Code, where we could directly paste the Python code instead of referring to the file, if it is a small block of code. However, we will not use the Code attribute in this exercise, even though our code block is small, because the File attribute is the more general approach as the Python script gets bigger.
6. The final attribute that we want to specify in the Load Python Script activity is the Result attribute in the Output section. Here, we store the result in a UiPath variable. We click in the textbox corresponding to the Result attribute and press Ctrl + K, which creates a Set Var: prefix, as shown in Figure 13.12 below:

Figure 13.12: Set Var for storing result in UiPath variable

7. Next, we type Result_Python next to Set Var: and press Enter, which stores the result of the Load Python Script activity in a variable named Result_Python of type Python Object. On opening the variable explorer, we observe that a variable named Result_Python of type Python Object has been created, as shown in Figure 13.13 below:

Figure 13.13: Result_Python variable created in Variable Explorer

8. We drag the next activity, Invoke Python Method, into our sequence, as shown in Figure 13.14 below:

Figure 13.14: Invoke Python Method activity in UiPath

9. This activity is the crux of the sequence, because it is here that we specify the Python method that we want to use. Here again, the red circle appears, because the activity is waiting for us to specify the Name attribute, which is the name of the method we want to use. We use the Square method from our Python file UiPath_Test.py to calculate the square of a number, so we specify “Square” in the textbox, as shown below in Figure 13.15:

Figure 13.15: Mentioning “Square” as the method in Invoke Python Method

10. Next, we explore the other attributes that we need to specify in the Invoke Python Method activity. We set them in the properties window, as shown in Figure 13.16, and discuss them thereafter:

Figure 13.16: Properties window for Invoke Python Method

• InputParameters: This attribute specifies the input that goes into the method Square. Here, we enter the value mentioned below into the textbox:
New List(Of Object)({5})
This value passes 5 as an argument to the Square method from our Python script, which calculates the square of 5.
• Instance: This attribute requires us to specify the instance of the Python script that the method belongs to. We stored the result of the previous activity, Load Python Script, in a variable named Result_Python, which is exactly what we enter here.
• Result: Similar to the previous activity, this activity also requires us to specify the Result attribute. We press Ctrl + K, which opens Set Var:, and type intResult, which creates a variable named intResult of type Python Object that stores the output obtained from the Python method.
11. Next, we drag the Get Python Object activity into our sequence, as shown in Figure 13.17 below:

Figure 13.17: Get Python Object activity in UiPath

As usual, we see the red circle, since we need to specify the necessary attributes. Before that, let us understand this activity a little better.
12. We have stored the output of the previous activity in a variable named intResult. However, the data type of this variable is Python Object, which cannot be directly used for further calculations. Hence, we need to convert it into an integer data type that can be used further. This activity thus converts the result obtained from the Python method in the Invoke Python Method activity into a data type that UiPath can work with. As shown in the properties window in Figure 13.18, we specify the required values, which are discussed thereafter:
Figure 13.18: Properties Window for Get Python Object activity

• Python object: This is the variable of data type Python Object that we want to convert to another data type. In this case, we specify intResult, the Python Object variable obtained from the Invoke Python Method activity.
• Type argument: This attribute provides a dropdown to select the data type to which we want to convert the Python Object variable. In this case, we select Int32, as we want to convert the variable to an integer.
• Result: Here, we store the converted value in a UiPath variable. As usual, we press Ctrl + K, which opens Set Var:, where we type intFinalResult; this creates a variable named intFinalResult of data type Int32 in the variable explorer.
13. Finally, we drag the Message Box activity into the sequence, as shown in Figure 13.19 below. This is not a Python activity, but a regular UiPath activity that displays a message box:
Figure 13.19: Message box activity in UiPath

14. As usual, the red circle appears, asking us to specify the Text property of the Message Box activity. As we want to display the result in the message box, we type intFinalResult.ToString in the text box, as shown in Figure 13.20:

Figure 13.20: Message box contents

15. With this, we have constructed our UiPath sequence and are all set to go! Save the sequence by pressing Ctrl + S.
16. Now, we run the UiPath sequence by clicking the Debug File icon at the top left and selecting Run File from the dropdown options, as shown in Figure 13.21 below:

Figure 13.21: Run File

17. On running the file, we get the Output Message Box shown in Figure 13.22, which is exactly what we expected after calculating the square of 5, namely 25:

Figure 13.22: Output Message Box

18. Now, if we recall, we defined two methods while creating the file UiPath_Test.py: Square, which we have used all this while, and Add. Let us run the entire sequence again, this time using the Add method.
19. The only change that we need to make is in the Invoke Python Method activity. Clicking the Invoke Python Method activity opens its properties window, where we enter the respective values, as shown in Figure 13.23 below:

Figure 13.23: Using Add method in the Invoke Python Method activity
20. As we are adding two numbers, we set the InputParameters attribute to New List(Of Object)({100,150}), which adds the numbers 100 and 150.
21. Similarly, in the Name attribute, we enter the name of the method as "Add".
22. Next, we save the sequence and run it the same way as before. We again get the Output Message Box, shown in Figure 13.24 below, which displays the expected output of 250, the sum of the two numbers 100 and 150:

Figure 13.24: Output Message Box
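One detail of the Get Python Object step is worth keeping in mind when moving beyond small inputs. Python integers have no fixed size, whereas Int32, the Type argument chosen earlier, holds only values from -2,147,483,648 to 2,147,483,647. Passing 50000 to Square, for instance, yields 2,500,000,000, which does not fit in Int32. How UiPath reports such a case is not covered in this chapter, so the sketch below simply performs the equivalent check in plain Python; to_int32 is a hypothetical helper, not a UiPath API:

```python
INT32_MIN = -2**31      # -2147483648
INT32_MAX = 2**31 - 1   # 2147483647


def to_int32(value: object) -> int:
    """Check that a Python result can be represented as a .NET Int32."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"Expected an int, got {type(value).__name__}")
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError(f"{value} is outside the Int32 range")
    return value


print(to_int32(5 * 5))        # fits: the Square example
print(to_int32(100 + 150))    # fits: the Add example
try:
    to_int32(50000 * 50000)   # 2,500,000,000 exceeds INT32_MAX
except OverflowError as err:
    print(err)
```

If larger results are expected, a wider .NET type such as Int64 would be needed in the Type argument, with the matching variable type in UiPath.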

With this simple exercise, we are done exploring the basic infrastructure required to integrate Python with UiPath. We should now be in a strong position to build more complex applications that selectively use existing Python scripts in a UiPath sequence, thus making the best use of the functionalities of both Python and UiPath.
It is left to the enthusiastic programmer to explore and experiment with the inclusion of Python scripts in other UiPath activities.
Hence, with this, we conclude this chapter!

Conclusion
We are now done and dusted with the integration of UiPath and Python! This chapter has been unique in a few respects. Firstly, a single section has spanned multiple pages due to its nature as a procedural walkthrough. Secondly, and as a consequence, every stage in the procedure required detailed screenshots, making the entire chapter image heavy. As mentioned in the introduction, the inclusion of Python in UiPath greatly accelerates the hyperautomation journey by making it easy to harness the power of Python in robotic process automation.
In the next chapter, we shall learn about the accessories that are required while working on automation projects. The chapter is about architecting automation projects and introduces the concept of a virtual environment. The pip command will be revisited in that chapter, along with the topic of Docker.
So, without further delay, let us turn the page to the next
chapter!

Join our book’s Discord space


Join the book’s Discord Workspace for Latest updates,
Offers, Tech happenings around the world, New Release and
Sessions with the Authors:
https://discord.bpbonline.com
Chapter 14
Architecting
Automation Projects

Introduction
While building an automation project, there are several components that one needs to take care of. These components, even when they are mere accessories, determine how efficiently the automation project works. This chapter discusses the various components required for building the architecture of an automation project in Python. Together, such accessory tools add to the overall effectiveness of the automation project, which makes it imperative for us to discuss them in this chapter.
This chapter is essentially a collection of discrete topics, which might not be directly related to each other, but which collectively prove valuable in the context of the overall automation task or project.
The process of setting up a virtual environment is discussed in this chapter. The pip command, which is the gateway to installing any package in Python, is revisited in greater detail, and the various functionalities it provides are discussed through syntax and examples. Further essentials like Docker are highlighted at the end of the chapter.

Structure
The chapter covers the following topics:
• Introduction to virtual environment
∘ Setting up a virtual environment
∘ Virtual environment directories
∘ Additional considerations
• Python PIP revisited
• Performing basic operations
∘ Working with the requirements.txt file
∘ Using Docker for containerization

Objectives
This chapter is the outcome of the realization that certain important accessories, which could not be covered in other chapters, need to be gathered in one place. It thus serves as a useful appendix for these accessory utilities.
By the end of this chapter, the reader shall have a firm grasp of the concepts of a virtual environment, the pip command and Docker. Many features like these have become part of regular usage, but most of us lack an understanding of the theory behind them. This chapter imparts that additional know-how on these supplementary topics, enabling the reader to effectively manage and architect future automation projects.
In this chapter, the reader shall become thoroughly acquainted with creating virtual environments in Python, using pip to perform various essential tasks, and using the requirements.txt file within a virtual environment.

Introduction to virtual environment


A virtual environment is a tool that provides the ability to keep separate dependencies for separate projects. This becomes particularly important in the case of multiple complex, large-scale projects requiring several dependencies.
One might not realize the importance of a virtual environment while working on small chunks of code. However, as the number of projects grows, along with the size of each project, a situation arises where one project uses a particular version of a specific library and another project uses a different version of the same library. Without virtual environments, every project refers to the same directory for third-party libraries, and that is where things get tricky. This is where virtual environments come into the picture as a useful tool for working on multiple Python projects that require separate dependencies. Figure 14.1 below illustrates the concept of a virtual environment in Python:
Figure 14.1: Concept of a Virtual Environment in Python

We observe that the illustration shows two projects, named Project 1 and Project 2, which require Python 3.6 and Python 3.10.10 respectively. They further require Pandas == 1.3 and Pandas == 2.0.0, respectively. The combination of Python 3.6 with Pandas 1.3 installed is incorporated into Virtual Environment 1. Similarly, the other combination of Python 3.10.10 with Pandas 2.0.0 installed is incorporated into Virtual Environment 2. Each of them separately serves its purpose for Project 1 and Project 2, respectively.
In this way, a virtual environment provides a customized
arena for every project based on its specific dependencies.
One could have as many virtual environments as may be
required.
A virtual environment also proves useful when it comes to sharing project dependencies with other members of the team. Consider a scenario where an individual within a team has been working on two different projects, Project A and Project B, that require different third-party Python libraries. Some of these might overlap between the two projects, while others might be unique to each project. In the absence of a virtual environment, all these libraries are stored in the same location in the global Python environment. The project then works well and becomes a success! That is great, but here comes the problem.
Other team members now need to work on Project A, and this individual has the task of providing them with the list of all dependencies associated with Project A only. The difficulty is that all the dependencies are stored in the same global environment, and it is not easy to work out which packages belong only to Project A. With a virtual environment, this issue does not exist, because all the dependencies related to a specific project reside within that project's virtual environment, so the list is easy to extract and share with anyone.
To summarize, a virtual environment is a folder structure
which provides one with everything that is required to run
an isolated Python environment. Now that we have
understood the concept of a virtual environment and the
advantages of having one, we shall move to the next
section where we set up a virtual environment of our own.

Setting up a virtual environment


In this section, we shall go through the process of creating a
Python virtual environment in Windows.
1. The first step to create a virtual environment in Windows is to install virtualenv using the pip command. virtualenv is a tool used to create isolated Python environments.
Type the command below in the prompt:
pip install virtualenv
2. Next, in order to create a virtual environment in the project directory, we first need to make the project directory our current directory using the command below:
cd project-directory
Here, project-directory represents the path of the Python project directory.
3. Next, we create a virtual environment called myenv in this directory using the command below:
virtualenv myenv
4. If we go to the project directory, we find that a new folder named myenv has been created inside it, which is exactly the virtual environment that we wanted to create.
We have now created the virtual environment!
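The steps above use the third-party virtualenv tool. Since Python 3.3, the standard library has also shipped a venv module that creates virtual environments with a similar folder layout; at the prompt, python -m venv myenv is the usual equivalent of step 3. The sketch below drives the same module from Python code, which can be handy inside automation scripts; create_env is a hypothetical helper name:

```python
import venv
from pathlib import Path


def create_env(env_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment using the standard-library venv module."""
    # with_pip=True bootstraps pip into the new environment via ensurepip.
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(env_dir)
    return Path(env_dir)
```

For example, create_env("myenv") run from the project directory would create a myenv folder alongside the code. Passing with_pip=False skips installing pip into the environment, which makes creation faster but means packages cannot be installed into it with its own pip.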

Virtual environment directories at a glance
Now that we have set up a virtual environment named myenv in the previous section, let us spend some time understanding the main folders that reside inside it.
If we go to the myenv folder, we observe the folders and files shown in Figure 14.2 below:
Figure 14.2: Folders inside a Virtual Environment

Inside the Lib folder, we find the site-packages folder, where all the libraries that we install reside.
Another folder, Scripts, contains all the executables, as shown in Figure 14.3.
In the root directory of the virtual environment myenv, we also see a file named pyvenv.cfg, which is a configuration file. It records options for the virtual environment, along with the location and version of the original Python installation from which the environment was created.
Figure 14.3: Executables in the Scripts folder
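The pyvenv.cfg file is plain text made of key = value lines, so it is easy to inspect from Python. The sketch below reads it into a dictionary and also checks whether the running interpreter is inside a virtual environment, by comparing sys.prefix (which points to the environment) with sys.base_prefix (which points to the original installation). The helper names are hypothetical, and the exact keys in the file vary between venv and virtualenv versions, though a home entry pointing to the base installation is standard:

```python
import sys
from pathlib import Path
from typing import Dict


def read_pyvenv_cfg(env_dir: str) -> Dict[str, str]:
    """Parse the key = value lines of a virtual environment's pyvenv.cfg file."""
    settings: Dict[str, str] = {}
    cfg = Path(env_dir) / "pyvenv.cfg"
    for line in cfg.read_text(encoding="utf-8").splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()
    return settings


def running_in_venv() -> bool:
    """True when the current interpreter runs inside a virtual environment."""
    # Inside venv (and virtualenv 20+), sys.prefix differs from sys.base_prefix.
    # Older virtualenv releases set sys.real_prefix instead.
    return sys.prefix != sys.base_prefix or hasattr(sys, "real_prefix")


print("Inside a virtual environment:", running_in_venv())
```

For example, read_pyvenv_cfg("myenv")["home"] would give the folder of the Python installation that myenv was created from.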

Additional considerations involving a virtual environment
In this section, we shall quickly add some finishing points to the topic of virtual environments before moving to the next section.
• Activating a virtual environment: We need to activate a virtual environment before using it. After creating the virtual environment as we did before, we activate it from the project directory using the command below:
.\myenv\Scripts\activate
• Installing packages in a virtual environment: To install packages in a virtual environment, we simply make sure that the environment is activated and then install the package using the regular pip command. All the packages are installed in the site-packages folder that resides within the Lib folder.
• Creating requirements.txt file: The requirements.txt file lists the packages, with their versions, installed in the virtual environment, which makes it a useful reference for other members of the team when recreating the same environment. We create the file using the command below:
pip freeze > requirements.txt
• Deactivating a virtual environment: Finally, once we are done using the virtual environment, we should deactivate it using the command below, so that the command prompt returns to the normal system settings:
deactivate
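Because requirements.txt is a simple text file, its contents can also be read programmatically, for instance to compare the pinned versions of two environments. The sketch below handles only simple name==version lines of the kind pip freeze typically writes, plus bare package names; parse_requirements is a hypothetical helper, and the sample contents (including the numpy version and the hand-added, unpinned pytz line) are made up for illustration:

```python
from typing import Dict, Optional


def parse_requirements(text: str) -> Dict[str, Optional[str]]:
    """Map package names to pinned versions in simple requirements.txt content.

    Only "name==version" and bare "name" lines are handled. Comments, blank
    lines and option lines such as "-r other.txt" are skipped; other
    specifiers (>=, ~=, "name @ URL") are outside the scope of this sketch.
    """
    packages: Dict[str, Optional[str]] = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):
            continue
        name, sep, version = line.partition("==")
        packages[name.strip()] = version.strip() if sep else None
    return packages


# Hypothetical file contents in the style produced by pip freeze
sample = """
# generated with: pip freeze > requirements.txt
numpy==1.24.2
pandas==2.0.0
pytz
"""
print(parse_requirements(sample))
```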

Python PIP revisited


Throughout the exercises in this book, we have regularly installed packages using pip. pip is the package manager for Python. It allows one to install those packages and dependencies that do not come pre-installed as part of the Python standard library. PIP is a recursive acronym for PIP Installs Packages.
Let us take an example where we want to install pandas. The regular syntax that we would follow is as below:
pip install pandas
However, if multiple Python versions are installed on the machine, this command does not reveal which Python installation this pip belongs to. In such a case, it is advantageous to use the python -m pip command, which runs pip as a module of the python interpreter being invoked and therefore makes it explicit which Python installation the package is installed into.
Using the python -m pip command, we would install pandas using the syntax below:
python -m pip install pandas
Whenever we install packages using pip, the pip installer looks them up on the Python Package Index (PyPI), the official repository of software for the Python programming language. Packages contributed by the Python community are published to PyPI, so by default every package installed using pip install is fetched from the PyPI repository.
However, pip can also install packages from a custom package index or directly from a GitHub repository.
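pip's --index-url option points it at a package index other than PyPI, and a VCS requirement of the form git+https://...git (optionally followed by @branch, @tag or @commit) installs straight from a Git repository, which requires Git to be installed. The sketch below only assembles those command lines; the index URL and the repository in it are hypothetical placeholders, not real locations, and the function names are our own.

```python
import sys

PIP = [sys.executable, "-m", "pip"]


def install_from_index(package: str, index_url: str) -> list:
    """Command that installs `package` from a custom package index instead of PyPI."""
    return PIP + ["install", "--index-url", index_url, package]


def install_from_git(repo_url: str, ref: str = "") -> list:
    """Command that installs a package straight from a Git repository over HTTPS.

    An optional branch, tag or commit can be pinned with pip's `@ref` suffix.
    """
    target = f"git+{repo_url}" + (f"@{ref}" if ref else "")
    return PIP + ["install", target]


# Hypothetical locations, used only to show the shape of each command
print(" ".join(install_from_index("pandas", "https://packages.example.com/simple")))
print(" ".join(install_from_git("https://github.com/example-user/example-repo.git", "v1.0")))
```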

Performing basic operations using pip
In this section, we shall skim through the regular operations
that we perform using the pip command and the various
functionalities provided by this command:
• Installing a package: This is the most common task
that is performed using the pip command. The syntax
for this task is as below:
pip install package
Here, package is a placeholder for the package that
we would want to install.
As mentioned earlier, running pip through a specific Python interpreter is a more precise way of installing the package, the syntax of which is as below:
python -m pip install package
To install a specific version of a package, the command below is used:
pip install package==version
• Displaying package information: To see information related to a package, such as its version, its dependencies and the packages that depend on it, we use the command below:
pip show package
As an example, when we try this for the pandas library, the results are populated as shown in Figure 14.4:
Figure 14.4: Detailed Information of pandas package using pip show command

From the image of the prompt above, we observe that the following basic information has been obtained about the pandas package:
• Version: 2.0.0
• Summary: Powerful data structures for data analysis,
time series, and statistics
• Location: c:\users\dell\anaconda3\lib\site-packages
• Requires: numpy, python-dateutil, pytz, tzdata
• Required-by: altair, camelot-py, gpt-index, gradio, tabula-py
• List all locally installed packages: To see the list of all packages installed locally, use the simple command below:
pip list
Figure 14.5 below is a snapshot of a section of the output generated in the prompt after entering the command pip list:
Figure 14.5: List of local packages getting displayed by pip list command

• Listing all packages installed with pip: In case one wants the list of installed packages in the name==version format that pip uses for requirement files (by default, the output leaves out pip itself), the command to be used is given below:
pip freeze
Figure 14.6 below is a snapshot of a section of the output generated in the prompt after entering the command pip freeze:
Figure 14.6: List of all externally installed packages displayed by pip freeze

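Unlike pip list, which prints a two-column table of package names and versions, pip freeze typically prints each package as a name==version line, which is exactly the format expected in a requirements.txt file.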
• Listing outdated packages: To see which installed packages have a newer version available on the package index, use the command below:
pip list --outdated
Figure 14.7 below is a snapshot of a section of the output generated in the prompt after entering the command pip list --outdated:
Figure 14.7: List of all outdated packages displayed by pip list --outdated

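• Upgrading a package: The command below can be used to upgrade a package to a recent version:
pip install --user --upgrade package
In order to upgrade to a specific version, use the command below:
pip install --user --upgrade package==version
• Downgrading a package: In order to downgrade a package from a higher version to a lower version, the command below could be used:
pip install --user package==version
In the upgrade and downgrade commands above, the --user flag installs the package into the per-user site-packages directory. It should be omitted inside an activated virtual environment, where pip refuses --user installs.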
• Uninstalling a package: In order to uninstall a package, the command below can be used:
pip uninstall package
It is important to note here that the pip uninstall package command does not uninstall the dependencies. In order to achieve that, one would first need to see the dependencies using the pip show command and then manually uninstall each dependency that no other package still requires.
• Verifying dependencies for installed packages: In order to verify that the installed packages have compatible dependencies, the command below could be used:
pip check
Figure 14.8 below is a snapshot of the output generated in the prompt after entering the command pip check:

Figure 14.8: Result from the command pip check

We observe from the snapshot above that pip check has listed the packages whose requirements are not satisfied by what is currently installed. We could take camelot as an example shown below:
camelot-py 0.9.0 has requirement pdfminer.six>=20200726, but you have pdfminer-six 20191110.
It tells us that camelot-py needs a more recent version of pdfminer.six than the one which is currently installed.
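The information that pip show, pip list and pip freeze print can also be read from inside a Python program with the standard-library importlib.metadata module (Python 3.8 or later). The sketch below is a read-only approximation, not a reimplementation of pip: for instance, its requires entry holds the raw requirement strings, including version specifiers and environment markers, whereas pip show prints bare package names. The function names are our own.

```python
from importlib import metadata


def show_package(name: str):
    """Rough analogue of `pip show`: name, version and declared requirements, or None."""
    try:
        dist = metadata.distribution(name)
    except metadata.PackageNotFoundError:
        return None
    return {
        "name": dist.metadata["Name"],
        "version": dist.version,
        # requires is None when the package declares no dependencies
        "requires": dist.requires or [],
    }


def list_packages() -> list:
    """Rough analogue of `pip list`: sorted (name, version) pairs of installed distributions."""
    pairs = {
        (dist.metadata.get("Name"), dist.version)
        for dist in metadata.distributions()
        if dist.metadata.get("Name")  # skip entries with broken metadata
    }
    return sorted(pairs, key=lambda pair: pair[0].lower())


def freeze_lines() -> list:
    """name==version lines in the style written by `pip freeze`."""
    return [f"{name}=={version}" for name, version in list_packages()]


if __name__ == "__main__":
    print(show_package("pandas"))  # None when pandas is not installed
    for line in freeze_lines()[:5]:
        print(line)
```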

Working with the requirements.txt file
Until now, we have installed individual packages using the pip command. However, projects often require multiple packages to be installed at once, and the same dependencies need to be installed by other team members as well. In such a scenario, pip provides a very handy way to install packages in bulk using the requirements.txt file.
We would first need to create a requirements.txt file in
order to work with it using pip. We shall go back to our
virtual environment myenv to create a requirements.txt
file.
Before creating the requirements.txt file, we need to make
sure that we install some sample packages in the virtual
environment using the pip install package command so
that after creating the requirements.txt file, the packages
automatically get listed in the file.
For this exercise, we install pandas in the virtual
environment myenv that we created earlier. As pandas
gets installed, it would automatically install the
dependencies numpy, python-dateutil, pytz and tzdata.
The dependency python-dateutil has its own dependency
package named six which is in turn automatically installed
by python-dateutil.
Next, we want to create a requirements.txt file that lists all these installed packages together so that the list can be used by other members of the team. Once again, the pip command provides this functionality. Before using it, we need to activate the virtual environment myenv using the commands below:
cd project-directory
This makes the directory project-directory the current
directory where we have previously created the virtual
environment myenv.
Next, we activate the virtual environment using the command below (this is the Windows syntax; on macOS or Linux the equivalent is source myenv/bin/activate):
.\myenv\Scripts\activate
Now, we create a requirements.txt file in the current directory, project-directory, using the command below:
pip freeze > requirements.txt
After running this command, a file named requirements.txt is created in project-directory, alongside the myenv folder.
On opening the file, we observe, as shown in Figure 14.9, that it contains a list of all the packages that have been installed in the virtual environment:
Figure 14.9: List of installed packages listed in the requirements.txt file

As shown in Figure 14.9 above, the requirements.txt file lists not only the package names but also the versions of the installed packages.
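Because each line written by pip freeze has the simple name==version shape, the file is easy to inspect from Python. The sketch below is deliberately limited to those lines and is not a full requirements-file parser (pip also accepts version ranges, options such as -r and direct URLs); the function name parse_requirements is our own.

```python
def parse_requirements(text: str) -> dict:
    """Map each package name to its pinned version for lines such as 'pandas==2.0.0'.

    Only the simple name==version lines written by pip freeze are parsed.
    Blank lines and full-line comments are skipped; any other line is kept
    verbatim as a key with the value None.
    """
    pins = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, version = line.partition("==")
        pins[name.strip() if sep else line] = version.strip() if sep else None
    return pins


# Illustrative contents; the version numbers are placeholders, not those of Figure 14.9
sample = """
# pinned by pip freeze
pandas==2.0.0
six==1.16.0
"""
print(parse_requirements(sample))  # {'pandas': '2.0.0', 'six': '1.16.0'}
```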
Now, let us finally put the requirements.txt file that we just created to use. First, we need to create a new virtual environment in order to test the functionality. We make project-directory the current directory again (running deactivate first if myenv is still active) and then enter the command below:
virtualenv test_env
This creates a virtual environment called test_env. Next, we
activate the environment using the command below:
.\test_env\Scripts\activate
Next, we type the command below which shall install all
packages from the requirements.txt file into the virtual
environment test_env.
pip install -r requirements.txt
If we go to the Lib folder and open the site-packages
folder in the virtual environment test_env, we find that all
the packages listed in the requirements.txt file have been
installed.
In this way, we have explored the functionality provided by
pip that enables one to install all packages listed in the
requirements.txt file.
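The same workflow can be scripted. The sketch below swaps the third-party virtualenv tool used above for the standard-library venv module, creates an environment, and then calls pip install -r through that environment's own interpreter, which avoids having to activate it. The function names create_env and install_requirements are hypothetical, and with_pip=True relies on the ensurepip module being available in the base Python installation.

```python
import subprocess
import sys
import venv
from pathlib import Path


def create_env(env_dir, with_pip: bool = True) -> Path:
    """Create a virtual environment with the standard venv module; return its Python path."""
    env_dir = Path(env_dir)
    venv.EnvBuilder(with_pip=with_pip).create(env_dir)
    # venv places executables in Scripts\ on Windows and in bin/ elsewhere
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_requirements(env_python, requirements="requirements.txt") -> None:
    """Same effect as activating the environment and running pip install -r."""
    subprocess.run(
        [str(env_python), "-m", "pip", "install", "-r", str(requirements)],
        check=True,  # raise CalledProcessError if any package fails to install
    )
```

For example, install_requirements(create_env("test_env")) run from project-directory would roughly mirror the virtualenv test_env, activate and pip install -r requirements.txt steps above.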

Using Docker for containerization


We have seen so far how a virtual environment in Python enables one to encapsulate the entire set of Python dependencies. Another concept is Docker, a containerization platform that packages the application and all related dependencies in the form of containers, thus ensuring that the application behaves the same in any environment, be it development, test or production. Unlike a virtual environment, which encapsulates only the Python dependencies, a Docker container also packages the operating-system-level user space that the application needs (system libraries, the Python runtime itself and other files), while sharing the kernel of the host machine. A Docker container provides code isolation, independence and portability, and is a common choice when deploying an application into production.
The purpose of this section on Docker is to provide a quick summary of the concept. It is left to the enthusiastic reader to explore containerization concepts in Docker and try the related commands thereafter.
With this, we end our discussion of architecting Python
automation projects and the chapter as well.

Conclusion
In this chapter, our focus has been on brushing up on accessory concepts that were not part of specific topics but are of immense importance for the architecture of automation projects. The concepts of virtual environments and the detailed examples of pip commands covered here should make automation projects easier to set up, share and reproduce.
In the next chapter, we shall explore another exciting capability of Python, the PyScript framework, an open-source framework that provides the functionality to create front-end web applications using Python. If that sounds interesting enough, let us turn the page to the next chapter!

Join our book’s Discord space


Join the book’s Discord Workspace for Latest updates,
Offers, Tech happenings around the world, New Release and
Sessions with the Authors:
https://discord.bpbonline.com
Chapter 15
The PyScript Framework

Introduction
In this chapter, we introduce the reader to PyScript, an interesting project that is still under active development but is attracting a lot of attention. PyScript is a Python framework that allows us to write Python code directly inside HTML. Its advantage is that it runs the Python code directly in the browser without infrastructural barriers: PyScript does not require any development environment other than a web browser like Chrome and a text editor like Notepad. Additionally, we do not install anything with the pip installer as we did in earlier chapters.
Python developers who have a background in JavaScript would find it exciting to experiment with the functionality provided by PyScript.
This chapter shall provide the basic awareness that is
necessary for the reader to explore this novel framework.
Structure
The chapter covers the following topics:
• Introduction to PyScript
• Creating a basic webpage using PyScript
• Adding working Python code to the webpage
• Using third party libraries with PyScript
• Referencing external Python files in PyScript

Objectives
This chapter explores a novel framework by taking the reader through several clear examples, each tied to a specific feature of PyScript. Each of these sections introduces a distinct concept which the reader shall find useful for further implementation.
The Python code in this chapter has been kept simple since the key focus here is to understand the salient concepts in PyScript. The sections and the corresponding exercises have been chosen so that each one clearly illustrates the concept it is meant to explain. The chapter first introduces PyScript and the basic concepts of the framework. Thereafter, specific exercises walk the reader through creating a basic webpage in PyScript, including working Python code in it, importing third party libraries and referencing external Python code.
By the end of this chapter, the reader would appreciate PyScript as a novel way to implement web applications alongside conventional web development frameworks.
Introduction to PyScript
PyScript is an open-source web framework that allows one to embed and execute Python code in the browser using HTML tags. It enables dynamic generation of pages with Python. For a long time, JavaScript has dominated frontend development because of its ability to run natively in the browser and interact with HTML and CSS. Then came WebAssembly, an open standard that defines a portable binary code format which browsers can execute. This allowed developers to run code written in languages like C and C++ in the browser, and Python soon entered the race with this framework.
PyScript was developed by Anaconda and is built on Pyodide, a port of CPython to WebAssembly. It is typically used when one wants to move a Python application from the backend to the frontend. Another great advantage is the ability to use scientific Python packages like scikit-learn, pandas and numpy, which ordinarily run only on the back end; PyScript allows one to use these modules in the front end. Finally, PyScript gives Python code a virtual, in-browser file system (provided by Pyodide) into which files can be fetched and then read with ordinary Python file operations. As we shall see later in this chapter, this does not bypass the browser's security restrictions on accessing files directly from the local disk.
Listed below are the basic prerequisites that the reader needs before beginning this chapter:
• Basic knowledge of HTML, CSS and JavaScript.
• The Chrome web browser, which is the browser suggested in the official documentation of PyScript.
In this chapter, we shall gradually witness the involvement
of Python for web development in the form of the PyScript
framework by starting with basic HTML applications and
increasing the complexity of the examples thereafter.
In the next section, we shall create a basic web page using
PyScript.

Creating a basic webpage using PyScript
In this section, we shall begin our journey with PyScript by creating a basic webpage. Before adding the PyScript framework, we need to create a basic HTML template. In order to create the template, open a blank text editor like Notepad and copy the basic HTML script shown below:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>PyScript Tutorial</title>
</head>
<body>

</body>
</html>
Save the file as PyScript_Tutorial.html.
Next, we need to add the link tag and the script tag, which are described below:
• Link tag: This links the stylesheet that formats the output of the Python code. It is given as below:
<link rel="stylesheet" href="https://pyscript.net/alpha/pyscript.css" />
• Script tag: This loads the JavaScript source file that works with the browser to interact with the Python runtime. It is given as below:
<script defer src="https://pyscript.net/alpha/pyscript.js"></script>
After incorporating both these tags, the head of the modified HTML script would look like the one below:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<link rel="stylesheet" href="https://pyscript.net/alpha/pyscript.css" />
<script defer src="https://pyscript.net/alpha/pyscript.js"></script>
<title> PyScript Tutorial </title>
</head>
However, this does not complete the application as we are
yet to add the important section where Python code shall be
written. We implement this using the py-script tag as
shown below:
<py-script>

#python code

</py-script>
It is important to note that the Python code written within the py-script tag must follow Python's indentation rules, exactly as it would in a .py file. So, with this knowledge, let us write our first Hello World script and modify the HTML document accordingly to incorporate the Python code within the py-script tag.
The modified HTML script is shown below:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<link rel="stylesheet" href="https://pyscript.net/alpha/pyscript.css" />
<script defer src="https://pyscript.net/alpha/pyscript.js"></script>
<title> PyScript Tutorial </title>
</head>
<body>
<py-script>
print('Hello World!')
</py-script>
</body>
</html>
Save the document PyScript_Tutorial.html after pasting
the modified script above. Close the document and open it
in a Chrome browser. We find that while opening, the
browser first displays a message that says Loading
Runtime and then displays the message Hello World! as
shown in Figure 15.1 below:
Figure 15.1: First webpage using PyScript

Hurrah! We have just created our first webpage using PyScript!

Adding working Python code to the webpage
Now that we have the basic HTML template ready with the PyScript placeholders, let us move to the next task of adding a working piece of Python code to the py-script tag.
We shall proceed with writing a small program that creates
a dictionary and then populates the values of the dictionary
on the webpage.
Let us create a sample dictionary that has the names of a few countries as keys and their capital cities as values, as shown below:
dict_countries = {'United States of America': 'Washington D.C.',
                  'India': 'New Delhi',
                  'Argentina': 'Buenos Aires',
                  'Japan': 'Tokyo',
                  'Germany': 'Berlin',
                  'France': 'Paris',
                  'Egypt': 'Cairo'}
Next, we shall print the values of the dictionary, which are the capital cities of the respective countries, as shown below:
for str_country in dict_countries:
    print(dict_countries[str_country])
We need to include this code within the py-script block in the HTML template PyScript_Tutorial.html that we saved earlier, and save the file under a different name, Countries_List.html. The modified HTML code is shown below:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<link rel="stylesheet" href="https://pyscript.net/alpha/pyscript.css" />
<script defer src="https://pyscript.net/alpha/pyscript.js"></script>
<title> PyScript Tutorial </title>
</head>
<body>
<py-script>
dict_countries = {'United States of America': 'Washington D.C.',
                  'India': 'New Delhi',
                  'Argentina': 'Buenos Aires',
                  'Japan': 'Tokyo',
                  'Germany': 'Berlin',
                  'France': 'Paris',
                  'Egypt': 'Cairo'}

for str_country in dict_countries:
    print(dict_countries[str_country])
</py-script>
</body>
</html>
Copy the code above into the HTML file and save it. Then open the file in the Chrome browser. We get the output as shown in Figure 15.2:

Figure 15.2: Output of dictionary values using PyScript

In this section, we have applied Python code with a dictionary and a loop inside the py-script block of the HTML code, thus introducing ourselves to mainstream Python coding within HTML.
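Outside the browser, the same loop can be written a little more directly with dict.values(), or with dict.items() when both the country and its capital are needed. This is an optional idiomatic variant in plain Python; nothing in PyScript requires it.

```python
dict_countries = {'United States of America': 'Washington D.C.',
                  'India': 'New Delhi',
                  'Argentina': 'Buenos Aires',
                  'Japan': 'Tokyo',
                  'Germany': 'Berlin',
                  'France': 'Paris',
                  'Egypt': 'Cairo'}

# .values() yields the capitals directly, in insertion order (guaranteed since Python 3.7)
for str_capital in dict_countries.values():
    print(str_capital)

# .items() yields (key, value) pairs when both the country and its capital are needed
for str_country, str_capital in dict_countries.items():
    print(f'{str_capital} is the capital of {str_country}')
```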
A good practice is to first write the Python code in a Python editor like Spyder, PyCharm or Visual Studio Code and check that all salient aspects of the code, like syntax and indentation, are correct before including it within the py-script block of the HTML code. If one pastes the code directly into the text editor and gets either of these aspects wrong, the code may fail to run, and it can be difficult to figure out the cause.
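The syntax and indentation check recommended above can also be automated. The sketch below uses only the standard ast and textwrap modules to report the first syntax or indentation error in a snippet before it is pasted into a py-script block. It catches syntax problems only, not runtime errors, and the name check_snippet is our own. It removes common leading indentation first, on the assumption that code copied out of an HTML tag may be uniformly indented.

```python
import ast
import textwrap


def check_snippet(source: str):
    """Return None if `source` is syntactically valid Python, else a short error description."""
    try:
        # Strip indentation shared by every line before parsing
        ast.parse(textwrap.dedent(source))
    except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
        return f"line {exc.lineno}: {exc.msg}"
    return None


good = "for c in {'India': 'New Delhi'}:\n    print(c)\n"
bad = "for c in {'India': 'New Delhi'}:\nprint(c)\n"  # loop body not indented
print(check_snippet(good))  # None
print(check_snippet(bad))
```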
In the next section, we shall take Python coding a step further by using third party libraries within PyScript.

Using third party libraries with PyScript
We have discussed that one of the advantages of the PyScript framework is the ability to use third party Python libraries like pandas, numpy and matplotlib in the front end. In this exercise, we shall import the numpy package into our Python code and include that code within the py-script tag.
Before writing the Python code within the py-script block of the HTML code, we need to include a new block called py-env in the HTML code and list the numpy module within that block, so that PyScript loads the numpy library before running the Python code that references it.
Below is the py-env block that we would need to include in
our HTML code:
<py-env>
- numpy
</py-env>
Next, we shall run the following program in our Python code using the numpy library:
1. Import the numpy module into Python.
2. Create a numpy array named matrix_1 with three rows and four columns where each element of the matrix is a random number between 0 and 1 multiplied by 10.
3. Create a numpy array named matrix_2 with three rows and four columns where each element of the matrix is a random number between 0 and 1 multiplied by 10.
4. Create a numpy array named matrix_sum with three rows and four columns which is the result of the sum of corresponding elements of matrix_1 and matrix_2.
Below is the Python code that would execute steps 1 to 4 that we discussed. Paste the code in a Python editor so that it can be checked for errors and indentation.
#Calculate the sum of two matrices
import numpy as np
# Two 3 x 4 matrices of random values in the range [0, 10)
matrix_1 = np.random.rand(3,4)*10
matrix_2 = np.random.rand(3,4)*10
# Element-wise sum of the two matrices
matrix_sum = matrix_1 + matrix_2
print('\nThis is a program to calculate the sum of two matrices having randomly generated values:\n')
print('\nThe first matrix is:\n')
print(matrix_1)
print('\nThe second matrix is:\n')
print(matrix_2)
print('\nThe matrix sum is:\n')
print(matrix_sum)
5. Next, we include this Python code within the py-script block of the HTML code. The complete HTML code is shown below. Paste this code in a text editor and save the file with the name Using_Numpy_With_PyScript.html.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<link rel="stylesheet" href="https://pyscript.net/alpha/pyscript.css" />
<script defer src="https://pyscript.net/alpha/pyscript.js"></script>
<title> PyScript Tutorial </title>
<py-env>
- numpy
</py-env>
</head>
<body>
<py-script>
import numpy as np

matrix_1 = np.random.rand(3,4)*10
matrix_2 = np.random.rand(3,4)*10
matrix_sum = matrix_1 + matrix_2
print('\nThis is a program to calculate the sum of two matrices having randomly generated values:\n')
print('\nThe first matrix is:\n')
print(matrix_1)
print('\nThe second matrix is:\n')
print(matrix_2)
print('\nThe matrix sum is:\n')
print(matrix_sum)
</py-script>
</body>
</html>
6. On opening the HTML file with the code above in a Chrome browser, we get the results as shown below in Figure 15.3:

Figure 15.3: Numpy arrays and the result obtained after adding them

Note that the arrays will look different at every run because the values are randomly generated numbers.
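If repeatable output is preferred while testing, for example to compare two runs of the program, the random values can be drawn from a seeded generator. The sketch below uses NumPy's default_rng, a newer interface than the np.random.rand calls in the exercise, and assumes a NumPy version that provides it (1.17 or later). The function name add_random_matrices is our own.

```python
import numpy as np


def add_random_matrices(seed=None, shape=(3, 4)):
    """Return two random matrices with values in [0, 10) and their element-wise sum.

    The same integer seed always reproduces the same pair of matrices.
    """
    rng = np.random.default_rng(seed)
    matrix_1 = rng.random(shape) * 10
    matrix_2 = rng.random(shape) * 10
    return matrix_1, matrix_2, matrix_1 + matrix_2


matrix_1, matrix_2, matrix_sum = add_random_matrices(seed=42)
print(matrix_sum.shape)  # (3, 4)
```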
Referencing external Python files in PyScript
Until now, we have studied how to include Python code in the py-script block of the HTML code and observe the results at the front end.
Let us now consider a scenario where the Python code gets voluminous, and writing all of it within the HTML file would make the file unwieldy. In such a case, PyScript provides a very useful functionality: referencing code from an external Python file. Using this functionality, we can store most of the Python code, typically the function definitions, in an external Python file. Thereafter, we import only the functions we need from the Python file and perform our tasks in the py-script block of the HTML code. An example will make this clear.
As our focus here would be on understanding the concept,
we would keep the Python script simple. Let us create a
Python script that defines two functions Square and Cube
that calculate the square and cube respectively of an
integer number. Below is the Python code which would do
that for us. Copy the code in a Python editor and save the
Python file with the name Perform_Math.py:
#Perform Math

#Calculate Square
def Square(intNum):
    intSquare = intNum * intNum
    return intSquare

#Calculate Cube
def Cube(intNum):
    intCube = intNum * intNum * intNum
    return intCube
Next, we want to reference the Python file in the py-script block and perform the calculations. Below is the code that we would include in the py-script block:
<py-script>
from Perform_Math import Square,Cube
intNum=5
intSquare = Square(intNum)
intCube = Cube(intNum)
print('\nThe number is:')
print(intNum)
print('\nThe square of the number is:')
print(intSquare)
print('\nThe cube of the number is:')
print(intCube)
</py-script>
We reference Perform_Math as a module and import the
functions Square and Cube into the Python code to
perform calculations and print the results thereafter.
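The import on the first line only works if Python can find Perform_Math.py in a folder on its module search path, sys.path. The plain-Python sketch below illustrates that requirement locally and makes no claim about how PyScript handles it internally: it writes a copy of the two functions to a temporary folder, adds that folder to sys.path and imports the module from there. The helper import_from_folder is a hypothetical name.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Same functions as Perform_Math.py, written out by the sketch for self-containment
PERFORM_MATH = '''\
def Square(intNum):
    return intNum * intNum

def Cube(intNum):
    return intNum * intNum * intNum
'''


def import_from_folder(folder, module_name):
    """Import `module_name` from `folder` by adding the folder to sys.path."""
    folder = str(folder)
    if folder not in sys.path:
        sys.path.insert(0, folder)
    importlib.invalidate_caches()  # let the import system notice newly written files
    return importlib.import_module(module_name)


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "Perform_Math.py").write_text(PERFORM_MATH)
    perform_math = import_from_folder(tmp, "Perform_Math")
    print(perform_math.Square(5), perform_math.Cube(5))  # 25 125
```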
Before running the HTML code, we need to add another block to the HTML file called py-config, which tells PyScript to fetch the Python file so that it can be imported. For this exercise, we shall save the Python file and the HTML file in the same folder.
Below is the additional block of code corresponding to the py-config tag that we need to add to the HTML code.
<py-config>
[[fetch]]
files = ["./Perform_Math.py"]
</py-config>
We also need to make a few more changes to the href and src attributes within the HTML code, where we specify the PyScript release. Unlike the previous exercises, where we used the /alpha/pyscript.js release, we link to the PyScript static assets of release 2022.12.1, a later release with which this example was found to work properly. Below is the modified block of the HTML code, in which we include the title as well:
<title>Referencing External Python Files</title>
<link rel="stylesheet" href="https://pyscript.net/releases/2022.12.1/pyscript.css" />
<script defer src="https://pyscript.net/releases/2022.12.1/pyscript.js"></script>
The complete HTML code has been given below for reference. Copy the code into a text editor like Notepad and save the file with the name Perform_Math.html. As mentioned earlier, make sure that the HTML file is saved in the same folder as the Python file.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>Referencing External Python Files</title>
<link rel="stylesheet" href="https://pyscript.net/releases/2022.12.1/pyscript.css" />
<script defer src="https://pyscript.net/releases/2022.12.1/pyscript.js"></script>
</head>
<body>
<py-config>
[[fetch]]
files = ["./Perform_Math.py"]
</py-config>
<py-script>
from Perform_Math import Square,Cube
intNum=5
intSquare = Square(intNum)
intCube = Cube(intNum)
print('\nThe number is:')
print(intNum)
print('\nThe square of the number is:')
print(intSquare)
print('\nThe cube of the number is:')
print(intCube)
</py-script>
</body>
</html>
On running the code in Chrome, we get an error as shown in
Figure 15.4 below:

Figure 15.4: Error received on running HTML file Perform_Math.html without server

The error message indicates that the Python file cannot be fetched when the HTML page is opened directly from the local directory (through a file:// address), because the browser blocks such requests. This is where the concept of a server comes into play: we need an HTTP server to serve the HTML page and the Python file.
Entering the command below into the prompt runs the server for us (the -d option requires Python 3.7 or later, and the server listens on port 8000 by default):
python -m http.server -d "directory_path"
Here, directory_path is a placeholder for the path of the folder where the HTML file and the Python file have been saved.
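The same server can also be started from Python, which is convenient when it should start and stop as part of a script. This minimal sketch uses http.server's SimpleHTTPRequestHandler with its directory argument (available since Python 3.7) and runs the server in a background thread; the function name start_server and the binding to 127.0.0.1 are our own choices.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def start_server(directory: str, port: int = 8000) -> ThreadingHTTPServer:
    """Serve the files in `directory` over HTTP from a background thread.

    Call .shutdown() and then .server_close() on the returned object to stop it.
    """
    # Bind the folder to serve into the request handler class
    handler = functools.partial(SimpleHTTPRequestHandler, directory=str(directory))
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, server = start_server(r"C:\path\to\folder") makes that folder available at http://127.0.0.1:8000, just like the command above; the path shown is a placeholder.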
After running the server, we go to Chrome and visit http://127.0.0.1:8000, which lists all the files within the directory as shown in Figure 15.5 below:
Figure 15.5: List of files in the directory where the HTTP server is run

We observe from Figure 15.5 above that the Python file Perform_Math.py and the HTML file Perform_Math.html have been listed along with other files from the exercises that we performed in the previous sections.
Next, we click on the file Perform_Math.html listed in the
directory from the rendered HTML page. As shown in Figure
15.6 below, we finally observe the results:
Figure 15.6: Output generated by the HTML file Perform_Math.html

In this final exercise, we have used a basic local web server, which opens the door to other tasks that require referencing files from a directory.
With this, we end our exploration of the PyScript framework! It has been an interesting endeavor exploring this novel framework, which is still under active development and is likely to change in future releases.

Conclusion
As we conclude this chapter, we have laid a solid
groundwork for understanding PyScript, a cutting-edge
framework that is reshaping how Python interacts with web
technologies. While we have unlocked many possibilities
with PyScript, it is essential to remember that it is a
dynamic and evolving tool. Staying abreast of its
developments will be key to leveraging its full potential in
the future.
With the end of this chapter, we begin our journey to the
concluding chapter of the book. Reflecting on our journey
through this book, we have traversed a diverse landscape of
automation—from file and folder automation to delving into
the realms of RPA, AI, ML, and deep learning. Each chapter
has contributed a piece to this expansive mosaic of Python
automation knowledge.
As we approach the final chapter, our focus shifts to test automation, a crucial aspect of software development. We will explore the Selenium framework, a cornerstone of browser automation, followed by insights into the Pytest framework and a discussion of the Robot Framework in Python. This concluding chapter promises to be the culminating point of our comprehensive exploration of Python automation.
It has been a marathon run so far, so one would not want to miss out on the finish line. Let us step forward into the final leg of our marathon, a journey that is as enlightening as it is empowering. Without further delay, here we go to the final chapter!

Chapter 16
Test Automation in Python

Introduction
So here we are at the last chapter of the book! We shall begin this chapter by delving into a topic that is most commonly addressed at the end of software development, which is testing. Testing software involves creating multiple test cases and performing checks. When this process is done manually, it becomes time consuming and inefficient. This is where Python comes to the rescue with open-source libraries that enable automation of this process.
We start by introducing Selenium, the world’s most popular
framework for automating web browsers. Python, among
other languages, is supported by Selenium, and we will
explore its capabilities through the Selenium Python
module. Following this, we will examine Pytest, another
powerful testing framework known for its simplicity and
scalability. Lastly, we will venture into the Python Robot
Framework, an interesting tool that offers a higher-level,
keyword-driven approach to test automation.
Structure
The chapter covers the following topics:
• Introduction to Selenium
∘ Setting up the Selenium
∘ Exploring web automation with Selenium Python API
• The Pytest library
∘ Advantages and limitations of Pytest
• Python Robot Framework
∘ Running test cases in the Python Robot Framework

Objectives
This concluding chapter of the book shall address the practically useful topic of test automation by introducing the reader to frameworks like Selenium, Pytest and the Python Robot Framework. By the end of this chapter, the reader shall not only have a basic understanding of these frameworks but shall also appreciate the major role that Python plays in test automation. Each of these sections covers practical examples that illustrate salient concepts and demonstrate the use of Python libraries in executing the required tasks.
The chapter shall aim to provide a foundation based on
which the reader would be able to independently explore
and implement further use cases using Python frameworks
like Selenium, Pytest and the Robot Framework, thus making
the reader get acquainted with an extremely useful skillset
that adds the final finishing touch to an automation project.
Introduction to Selenium
Selenium is an open-source software testing framework that is used to automate web applications. It works with all major browsers and most operating systems, and its scripts can be written in various programming languages like Python, Java, and C#.
In this chapter, we shall be focusing on the Chrome browser
and of course, the Python programming language.
There are four main components in Selenium as shown
below in Figure 16.1:

Figure 16.1: The Four Main Components of Selenium

Each of these components is discussed below:


• Selenium IDE: The Selenium integrated development environment (IDE) is a tool for creating Selenium tests. It is available as a browser extension (for Chrome, among others) and allows recording, editing, and debugging functional tests. It was also called Selenium Recorder in the past.
• Selenium RC: Selenium Remote Control (RC) is a server written in Java that made it possible to write automated tests for a web application in any of several programming languages. However, this component was later deprecated with the advent of Selenium WebDriver, which is the next component that we shall study.
• Selenium WebDriver: This component is the successor of Selenium Remote Control and sends commands directly to the browser. It does not need a special server to execute tests. It opens an instance of a browser and performs actions on various browser elements. It supports various programming languages like Java, Python and C#.
• Selenium Grid: This is a server that allows one to run test cases across different browsers, machines, and operating systems in parallel. This functionality makes it possible to spread the load of testing across several machines and is also useful when tests must run on browsers that are available only on specific platforms or operating systems.
In this chapter, we shall focus on browser automation using Selenium with Python. The selenium Python module is built to perform automated testing with Python, and this Python API gives us access to the WebDriver component of Selenium.
Through the Selenium Python API, we can drive several browsers like Chrome, Firefox, Internet Explorer and Edge on operating systems like Windows, Linux and macOS. Mobile platforms such as Android and iOS are usually automated through Appium, which builds on the same WebDriver protocol. In combination with Selenium Grid, the API also supports parallel test execution.
In the sections to follow, we shall look at the procedure to install Selenium and its associated web drivers and then go through a few examples in Python that walk through the Selenium functionality.
Setting up the Selenium Python API
In this section, we shall discuss the procedure to set up the Selenium Python API. Follow the steps below to set it up on your system:
1. The first step before using the selenium Python library is to install it using the pip command as shown below:
pip install selenium
Alternatively, we could use the command below to be specific about the Python interpreter that is being used:
python -m pip install selenium
2. The next step is the installation of the web drivers. Each browser, such as Firefox, Chrome, Internet Explorer or Edge, has its own driver. In this chapter, we shall focus only on Chrome and hence we download ChromeDriver, whose version should match the installed version of the Chrome browser. ChromeDriver for 32-bit and 64-bit Windows (version 116.0.5845.96) could be downloaded from the links below:
ChromeDriver for Windows 32-bit:
https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/116.0.5845.96/win32/chromedriver-win32.zip
ChromeDriver for Windows 64-bit:
https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/116.0.5845.96/win64/chromedriver-win64.zip
3. After downloading the web drivers, we need to extract them from the zip files and thereafter ensure that the folders containing their respective .exe files are added to the PATH system environment variable.
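Recent Selenium releases can make the manual download optional. Starting with Selenium 4.6, the package ships with Selenium Manager, which locates or downloads a matching driver when none is supplied. The sketch below assumes that behavior; the helper names has_selenium_manager and start_chrome are hypothetical and not part of the Selenium API, and the explicit Service path is kept only for setups that still use a manually downloaded chromedriver.exe.

```python
from typing import Optional


def has_selenium_manager(version: str) -> bool:
    """Return True if this Selenium version (e.g. "4.15.2") bundles Selenium Manager (4.6+)."""
    major, minor = (int(part) for part in version.split(".")[:2])
    # Compare as integer tuples so that 4.10 correctly counts as newer than 4.6
    return (major, minor) >= (4, 6)


def start_chrome(url: str, driver_path: Optional[str] = None):
    """Hypothetical helper: open Chrome at `url` and return the driver object."""
    # Imported lazily so that has_selenium_manager() works without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    if driver_path:
        # Manual setup: use the chromedriver.exe extracted from the zip file
        driver = webdriver.Chrome(service=Service(driver_path))
    else:
        # Selenium 4.6+: Selenium Manager resolves a matching driver automatically
        driver = webdriver.Chrome()
    driver.get(url)
    return driver
```

For example, running import selenium followed by has_selenium_manager(selenium.__version__) indicates whether the driver download step can be skipped, and start_chrome("https://google.co.in") then opens the page.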
We have now successfully set up the Selenium Python API. Next, let us experiment with some code. We shall try a simple procedure where we open an instance of the Chrome browser using Selenium. Copy and paste the code below in a Python editor:
#Open Chrome Browser Instance
from selenium import webdriver

# Webdriver object
objDriver = webdriver.Chrome()

#Open google.co.in window
objDriver.get("https://google.co.in")
Below is a quick explanation of the above code:
a. The first step is:
from selenium import webdriver
Here, we import the webdriver module of the selenium library, as we would be doing web automation. Recall that WebDriver is one of the main components of Selenium, as discussed in the earlier section.
b. The second step is:
objDriver = webdriver.Chrome()
Here, we create a driver object for the Chrome browser, which launches a new Chrome window.
c. The third step is:
objDriver.get("https://google.co.in")
Here, we pass the URL of the Google home page as an argument to the get method, which loads that page in the Chrome window.
On running the code above, the Chrome browser opens in a separate window as shown in Figure 16.2 below:

Figure 16.2: Automated opening of Chrome browser using Selenium

At the top of the Chrome window that has been opened, we observe a message that says, Chrome is being controlled by automated test software, which indicates that Chrome has detected that the browser instance was launched by automation.
To avoid this message being populated, we would need to
modify the code a little bit to specify Chrome options. The
entire modified code has been shown below for reference:
#Chrome browser instance with options
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

obj_Chrome_Options = webdriver.ChromeOptions()
# Hide the "Chrome is being controlled by automated test software" message
obj_Chrome_Options.add_experimental_option("excludeSwitches", ["enable-automation"])

# Path to the chromedriver.exe extracted earlier (adjust to your system)
obj_Service = Service(r"C:\Users\dell\chromedriver-win64\chromedriver.exe")

obj_Driver = webdriver.Chrome(options=obj_Chrome_Options, service=obj_Service)

obj_Driver.get("https://www.google.co.in/")
On running the code above, we get the regular Chrome
window without the earlier message. We have now
understood the setup and procedure to instantiate a
Chrome browser with Selenium and shall further explore
web automation with Selenium.

Exploring web automation with Selenium Python API
In this section, we shall try out a sample web browser automation task that shall enable the reader to further explore salient aspects of web automation. In this example, we shall visit the Wikipedia web page and type text into its search box to bring up the results page.
To locate web elements like text boxes, buttons or drop-downs, Selenium uses their element identifiers. These identifiers can be found using the Inspect option that appears in the context menu when we right-click the element in the browser window. This procedure has been discussed in detail in Chapter 2 – RPA Foundations. Here, we shall use the same procedure to identify elements in the browser window.
Selenium can identify web elements using various attributes like ID, Name, ClassName or XPath. This is accomplished using the By class provided by Selenium and the find_element method.
1. We import the By class using the code below:
from selenium.webdriver.common.by import By
2. Next, if we want to find an element by its ID, the syntax is shown below, where driver is the webdriver object (such as obj_Driver) and "id" is the value of the element's id attribute:
element = driver.find_element(By.ID, "id")
3. Text can be typed into a selected element using the send_keys method as shown below:
element.send_keys("Enter text")
Selenium also provides the Keys class, which enables us to simulate commonly used keyboard keys such as Enter. The Keys class is imported using the code below:
from selenium.webdriver.common.keys import Keys
4. Using this understanding, let us execute the task of visiting Wikipedia and typing the text Machine Learning into the search textbox.
The main element of interest here is the search textbox. As shown in Figure 16.3 below, the id attribute of the search text box is searchInput, which we shall use in our code:
Figure 16.3: Inspecting the HTML code for the search textbox of the Wikipedia page

Most of the code from the previous exercise that we performed to load the Chrome browser can be reused in this exercise, except for the URL that we pass as an argument to the get method of the driver. In this exercise, we use the URL https://www.wikipedia.org/ instead of https://www.google.co.in/ that we used in the previous exercise.
The updated chunk of code is shown below:
#Chrome browser instance with options
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

obj_Chrome_Options = webdriver.ChromeOptions()
obj_Chrome_Options.add_experimental_option("excludeSwitches", ["enable-automation"])
obj_Service = Service(r"C:\Users\dell\chromedriver-win64\chromedriver.exe")

obj_Driver = webdriver.Chrome(options=obj_Chrome_Options, service=obj_Service)

obj_Driver.get("https://www.wikipedia.org/")
5. Next, we import the additional classes that we discussed earlier. They are included below once again for reference. These lines can be added at the top of the module along with the other imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
6. Finally, we include the lines of code that perform the web element search, enter the text and thereafter press the Enter key. These lines are shown below along with their respective comments:
#Get the search textbox element
text = obj_Driver.find_element(By.ID, "searchInput")
#Type text into the textbox element
text.send_keys("Machine Learning")
#Press the enter key
text.send_keys(Keys.ENTER)
The entire code is shown below for reference. Copy the code into a Python editor and run it:
#Exploring browser automation with Selenium
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

obj_Chrome_Options = webdriver.ChromeOptions()
obj_Chrome_Options.add_experimental_option("excludeSwitches", ["enable-automation"])

obj_Service = Service(r"C:\Users\dell\chromedriver-win64\chromedriver.exe")

obj_Driver = webdriver.Chrome(options=obj_Chrome_Options, service=obj_Service)

obj_Driver.get("https://www.wikipedia.org/")

#Get the search textbox element
text = obj_Driver.find_element(By.ID, "searchInput")

#Type text into the textbox element
text.send_keys("Machine Learning")

#Press the enter key
text.send_keys(Keys.ENTER)
On running the code above, we get the output as shown in
Figure 16.4 below:
Figure 16.4: Output from Wikipedia with Machine Learning search

In this example, we located the web element using By.ID within the find_element method. However, Selenium can also locate web elements using other attributes. Below is the list of attributes provided by the By class that could be used to identify web elements using Selenium:
ID = "id"
NAME = "name"
XPATH = "xpath"
LINK_TEXT = "link text"
PARTIAL_LINK_TEXT = "partial link text"
TAG_NAME = "tag name"
CLASS_NAME = "class name"
CSS_SELECTOR = "css selector"
The corresponding find_element calls for each attribute are shown below for reference:
find_element(By.ID, "id")
find_element(By.NAME, "name")
find_element(By.XPATH, "xpath")
find_element(By.LINK_TEXT, "link text")
find_element(By.PARTIAL_LINK_TEXT, "partial link text")
find_element(By.TAG_NAME, "tag name")
find_element(By.CLASS_NAME, "class name")
find_element(By.CSS_SELECTOR, "css selector")
With this, we finish our exploratory tutorials on Selenium. These tutorials should provide a sufficient launching pad for the reader to explore further possibilities in web automation using Selenium. Let us now proceed to the next section!

Pytest library
Pytest is an open-source Python framework used for unit testing, which is the process of running automated tests to validate that a single component of a software application works as intended. While other Python frameworks like unittest and doctest are also used for this purpose, Pytest benefits from a large community that supports its open-source development. Hence, we shall discuss the Pytest framework in this chapter. In this section, we shall explore a few examples that illustrate the efficient usage and implementation of Pytest in unit testing.
The first step that needs to be performed prior to using the
library is its installation using the pip command as shown
below:
pip install pytest
Once the library is installed, we can move ahead with writing some code. In this section, we shall take a simple example of a test that checks whether a computed number equals a certain target number.
Let us create a simple function that calculates the square of a number. Below is the Python code for the function:
def Square(intNum):
    intSquare = intNum*intNum
    return intSquare
Copy the code above in a Python editor and save the file with the name Calculate_Square.py.
Next, we create another Python file within the same directory. In this new file, we import the first file as a module and use its Square function.
Copy the code below in a new file within the Python editor.
#Perform test
from Calculate_Square import Square

def test_Square():
    intNum = 10
    intSquare = Square(intNum)
    assert intSquare == 100
Now comes the important part. While saving the file, we should name it so that the name starts with test_ or ends with _test. By default, Pytest collects test files whose names match the patterns test_*.py or *_test.py in the current directory and its subdirectories. Let us save this file with the name Square_test.py.
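The discovery rule can be illustrated with a few lines of standard-library Python. This is only an approximation of Pytest's defaults (the python_files setting, test_*.py and *_test.py, plus the test prefix for function names), not Pytest's actual collection code, and the helper names are hypothetical.

```python
from fnmatch import fnmatchcase

# Pytest's default file patterns (configurable through the python_files setting)
DEFAULT_FILE_PATTERNS = ("test_*.py", "*_test.py")


def looks_like_test_file(filename: str) -> bool:
    """Approximate Pytest's default rule for which files to collect."""
    return any(fnmatchcase(filename, pattern) for pattern in DEFAULT_FILE_PATTERNS)


def looks_like_test_function(name: str) -> bool:
    """Approximate Pytest's default rule: collected function names start with 'test'."""
    return name.startswith("test")


for name in ["Square_test.py", "test_square.py", "Calculate_Square.py"]:
    print(name, "->", "collected" if looks_like_test_file(name) else "ignored")
```

This is why Square_test.py is picked up while Calculate_Square.py, which only holds the function under test, is ignored.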
Below is a quick explanation of the code in the file Square_test.py.
1. First, we import the file Calculate_Square as a module and import the Square function from it using the code below:
from Calculate_Square import Square
2. Next, we define a function named test_Square that checks whether the square of a number calculated using the imported Square function equals 100. Note that, by default, Pytest only collects functions whose names begin with test, which is why the function name is prefixed with test_.
Here, we define a variable intNum and initialize it to an integer value of 10 using the code below:
intNum = 10
Next, we pass intNum as an argument to the function Square to calculate the square of the value held in intNum, and assign the result to another variable named intSquare.
Finally, we use the all-important assert statement. assert is a Python keyword that checks whether a condition is true and raises an AssertionError if it is not. In this case, it checks whether the value held in the variable intSquare equals 100.
Next, let us test the assertion using the Pytest library.
3. Follow the steps below to run the test using the Pytest library:
a. Create a folder within the C:\ drive and name the folder Pytest_Tutorial.
b. Save both the files Calculate_Square.py and Square_test.py in this folder.
c. Open the command prompt and make C:\Pytest_Tutorial the current directory using the command below:
cd "C:\Pytest_Tutorial"
The snapshot of the prompt after entering the command above is shown in Figure 16.5 below:

Figure 16.5: Changing the current directory to C:\Pytest_Tutorial

d. Next, type the command pytest, which runs the test and validates the assertion that we made in the test_Square function of the Square_test.py file. The snapshot of the prompt after entering the pytest command is shown below in Figure 16.6:
Figure 16.6: Test successfully validated using pytest

We observe the message 1 passed in 0.07s, which tells us that the test has passed, meaning that the assertion we made in our function test_Square has been successfully validated.
Now, let us take the same example code to demonstrate how Pytest handles a failed assertion. We open the file Square_test.py and tweak the code a little. The only change we make this time is to assign a different value to the variable intNum: instead of 10, we assign it a value of 15. Clearly, the square of 15 does not equal 100, and hence this is a failed assertion, which we shall test using the same procedure. The updated code is shown below:
#Perform test
from Calculate_Square import Square

def test_Square():
    intNum = 15
    intSquare = Square(intNum)
    assert intSquare == 100
Save the file Square_test.py after replacing its contents with the code above.
Open the command prompt, follow the same procedure as for the earlier test and enter the pytest command. The test results are shown in the snapshot of the prompt in Figure 16.7 below:

Figure 16.7: Pytest results for a failed assertion

As shown in the figure above, Pytest reports the test as FAILED in its test summary because of an AssertionError, and its assertion rewriting also displays the values that were compared (225 and 100). In this way, we have explored the capability of Pytest in performing unit testing efficiently. In the next section, we shall go through the salient advantages of the Pytest library and its limitations.
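Rather than editing intNum and rerunning the test, Pytest can run one test function against several inputs with the pytest.mark.parametrize decorator. The sketch below inlines the Square function so it is self-contained (in the tutorial layout it would be imported from Calculate_Square), and the extra input values are illustrative. Running pytest on this file reports each input pair as a separate test, so one failing input does not hide the others.

```python
import pytest


def Square(intNum):
    # Same logic as Calculate_Square.py, inlined to keep this sketch self-contained
    return intNum * intNum


# Each (intNum, expected) tuple becomes its own test case when Pytest runs this file
@pytest.mark.parametrize(
    "intNum, expected",
    [(10, 100), (15, 225), (-4, 16), (0, 0)],
)
def test_Square(intNum, expected):
    assert Square(intNum) == expected
```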

Advantages and limitations of Pytest


Pytest is one of the most widely used frameworks for unit testing in Python, for the reasons mentioned below:
• Pytest has a vast development community, which makes it easy to find quick resolutions to emerging issues and errors.
• Pytest is extremely easy to learn, as is evident from the examples we just studied in the previous section. Python's plain assert statement forms the crux of a Pytest test, so the tester can focus on the condition being checked. Moreover, test files only need to be named test_*.py or *_test.py and test functions only need names beginning with test; Pytest itself does the job of finding them. All of this makes the learning curve for Pytest a gentle one.
• Pytest is free and does not have any licensing cost.
• Pytest can selectively run specific tests from a test file, for example by matching test names with the -k option or by selecting marked tests with the -m option.
• Compared to other testing libraries like unittest, the pytest library requires less code, which keeps tests compact for the tester.
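The selective-run feature mentioned above can be sketched as follows. The marker name slow is an assumption (any name can be used, and custom markers should be registered under markers in a pytest.ini file to avoid warnings), and the test names are hypothetical.

```python
import sys

import pytest


def Square(intNum):
    # Same logic as Calculate_Square.py, inlined to keep this sketch self-contained
    return intNum * intNum


def test_square_small_number():
    assert Square(3) == 9


@pytest.mark.slow  # custom marker; register "slow" in pytest.ini to avoid a warning
def test_square_large_number():
    assert Square(10**6) == 10**12


@pytest.mark.skipif(sys.platform != "win32", reason="example of a Windows-only check")
def test_square_windows_only():
    assert Square(2) == 4
```

With this file saved under a name such as Square_markers_test.py, the command pytest -k small runs only tests whose names contain small, pytest -m slow runs only the test marked slow, and pytest -m "not slow" runs the rest. The skipif marker skips the last test automatically on systems other than Windows.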
In spite of the numerous advantages that Pytest provides, it is important to note one possible limitation: tests written in Pytest's style rely on Pytest-specific conventions, so integrating them with, or migrating them to, other frameworks can be challenging and may require rewriting the tests.
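To make the comparison with unittest concrete, here is the same square test written with the standard library's unittest module, which needs a TestCase subclass and assertion methods such as assertEqual. Pytest can collect and run unittest-style tests like this one as well, which eases moving existing tests to Pytest; it is the reverse direction, from bare assert statements and Pytest-specific features to another framework, where rewriting tends to be needed. The test names below are illustrative.

```python
import unittest


def Square(intNum):
    return intNum * intNum


class TestSquare(unittest.TestCase):
    """unittest requires a class deriving from TestCase and self.assert* methods."""

    def test_square_of_ten(self):
        self.assertEqual(Square(10), 100)

    def test_square_of_fifteen(self):
        self.assertEqual(Square(15), 225)
```

Saved as test_square_unittest.py, this file runs with python -m unittest test_square_unittest or simply with pytest.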
With this, we finish our exploration of another interesting
framework called Pytest.
In the next section, we shall move to the Python Robot
Framework, which is an open-source automation framework
for Acceptance Testing and Robotic Process Automation.

Python Robot Framework


The Python Robot Framework is an open-source automation framework, installed through the robotframework package, for acceptance testing and robotic process automation. Acceptance testing is the process of evaluating software before releasing it to production, to make sure that it meets the expected quality and specifications. In this section, we shall explore the installation and setup of the Robot Framework and thereafter take up an example to illustrate its use.
1. The first step before using the library is to install it using the pip command as shown below:
pip install robotframework
2. Next, we use a dedicated editor called RIDE (Robot Framework IDE) to write test cases for the Robot Framework, although test files can also be written in any text editor. We install RIDE using the command below:
pip install robotframework-ride
3. After installing RIDE, we need to open the RIDE interface in order to navigate within its controls and write test cases.
To open RIDE, we first open the command prompt and type the command below:
ride.py
4. On entering this command, the RIDE editor opens as shown in Figure 16.8:
Figure 16.8: Interface of the RIDE Editor

5. Next, we want to create a new project to start writing test cases. However, before doing that, we create a new directory for our project. Let us create a new directory C:\Learn_Robot_Framework for this exercise.
6. To create a new project, click on File in the top left corner and select New Project as shown in Figure 16.9 below:

Figure 16.9: Creating a new project in RIDE


7. On selecting New Project, a new window opens as shown in Figure 16.10 below, which contains fields for browsing to the directory and assigning a name to the project:

Figure 16.10: New Project window in RIDE

Below are quick actions that we perform with this window:
• As shown in Figure 16.10 above, we browse to the
directory C:\Learn_Robot_Framework using the
Browse button and thereafter name the project
as Robot_Framework_Tutorial in the Name
field.
• These details are reflected automatically in the
Created Path field which reflects the entire path
of the project.
• At the right-hand side, we observe two sections
named Type and Format.
• The Type section contains two radio buttons File
and Directory. We observe that the File radio
button has been selected by default and we retain
this default selection.
• The Format section contains four radio buttons having labels ROBOT, TXT, TSV and HTML. These specify the format, and hence the extension, of the project file. Here, we shall create a file with the .robot extension, and hence we retain the default selection ROBOT here as well.
8. Next, we click the OK button at the bottom of the window. On clicking it, we observe that the project gets listed on the left-hand side of the window under Test Suites as shown in Figure 16.11:

Figure 16.11: Project Learn_Robot_Framework being reflected on the left side

9. Our next step is to create a test suite that will hold our test case. To do so, right-click on the project name under Test Suites and select the option New Suite as shown in Figure 16.12 below:
Figure 16.12: Creating New Suite in RIDE

10. On selecting New Suite, a window opens that contains options similar to the ones that we had while creating a new project, as shown in Figure 16.13 below:

Figure 16.13: New window for Add Suite

11. As shown in Figure 16.13 above, we assign the name First_Test_Case in the Name field for the suite that we are creating. The complete path automatically gets updated in the field Created Path. As before, we retain all default radio button selections on the right-hand side of this window.
12. On clicking the OK button, we observe that the suite gets listed in the Test Suites pane on the left-hand side under the project folder Learn_Robot_Framework as shown below in Figure 16.14:

Figure 16.14: First_Test_Case observed in Test Suites window

Running test cases in the Python Robot Framework
With the setup that we did in the previous section, we are now all set to explore writing test cases in the RIDE editor of the Python Robot Framework. In this exercise, we shall create a Python file containing a function that calculates the perfect square of an integer, and save the file in the project directory that we created earlier.
Open a blank file in the Python editor and copy the code
below:
def Calculate_Square(intNum):
    intSquare = intNum*intNum
    return intSquare
Save the file in the same directory of the project
Learn_Robot_Framework and name the file as
Calculate_Perfect_Square.py.
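As an optional variation, assuming Robot Framework 3.1 or later, the library function can declare its argument type with a Python type hint. Robot Framework then converts an argument such as the string 10 to an integer before calling the function, which would make the explicit int() conversion used later in this section unnecessary. This is a sketch of an alternative, not the approach followed in the steps below.

```python
def Calculate_Square(intNum: int) -> int:
    """Return the perfect square of intNum.

    The `int` annotation lets Robot Framework (3.1+) convert a string argument
    such as "10" into the integer 10 before this function is called.
    """
    intSquare = intNum * intNum
    return intSquare
```

With this version of the file, the keyword call in the test case could be written as ${answer} =    Calculate Square    ${intNumber}.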
Next, we shall proceed to writing the test case. Similar to what we did in our section on Pytest, we shall validate whether the square of an integer (10 in this exercise), computed using the function Calculate_Square, equals 100.
Follow the steps below to prepare the test case:
1. We need to import the Python file Calculate_Perfect_Square.py in order to be able to call its function Calculate_Square. We do this by clicking the Library button on the right-hand side of the Edit bar of the IDE window shown in Figure 16.14. On clicking the Library button, a window opens as shown in Figure 16.15 below:

Figure 16.15: Selecting library in the test case

2. We click on the Browse button, browse to the location of the Calculate_Perfect_Square.py file and select it. Next, we click on the OK button.
3. Now, we create an input variable that will be passed as an argument to the function Calculate_Square. To create the variable, click on the Create Scalar button on the right side of the Edit bar of the IDE shown in Figure 16.14. On clicking the Create Scalar button, a new window appears as shown below in Figure 16.16:
Figure 16.16: Scalar Variable window

We enter the variable name as intNumber and its value as shown in the image above.
4. Next, we click on the Text Edit tab, which opens the editor window as shown in Figure 16.17 below:

Figure 16.17: Text Edit tab in the test case

5. We observe that the imported module Calculate_Perfect_Square.py has been included under the Settings section and the variable intNumber with the value 10 has been listed under the Variables section.
6. Next, we write the actual test under a new section named Test Cases. We label the test Validation of Output, which is reflected in the left-hand pane under First Test Case. We call the function Calculate_Square from the module Calculate_Perfect_Square and pass the variable intNumber as an argument. The result is stored in the variable answer.
As shown in Figure 16.17, below is the line of code that calculates the value in the variable answer:
${answer} =    Calculate Square    ${{int(${intNumber})}}
There are a few important things to note here:
• A scalar variable is written as a $ sign followed by the variable name enclosed in curly brackets, for example ${answer}.
• While calling the Calculate_Square function, the underscore between the words Calculate and Square is replaced by a space. Robot Framework exposes Python functions as keywords, and keyword matching ignores case, spaces and underscores, so Calculate Square refers to Calculate_Square.
• Values defined in the Variables section are strings by default. To convert intNumber to an integer, we use the ${{...}} inline Python evaluation syntax: ${{int(${intNumber})}} evaluates the Python expression int(10). The result is then passed as an argument to the function Calculate_Square.
• Robot Framework separates the terms (cells) of a line with two or more spaces, and four spaces is the common convention. If the terms are not separated by enough space, they are not recognized as separate terms, resulting in errors.
7. Finally, we write the line that performs the actual validation:
Should Be Equal As Numbers    ${answer}    100
We use the Should Be Equal As Numbers keyword from Robot Framework's BuiltIn library, which takes two arguments, converts them to numbers and validates whether they are equal. If they are equal, the test passes; otherwise, it fails.
8. We are now all set to run the test case! As shown in Figure 16.18 below, go to the Run tab and select the Validation of Output checkbox on the left pane. Thereafter, click the Start button available in the Run tab. The test case starts running and, after finishing its run, populates the message log as observed at the bottom of the window in Figure 16.18 below:

Figure 16.18: Test case message log on successful completion

9. We can also view the report by clicking on the Report button at the top and the log by clicking on the Log button, as shown in Figure 16.19 and Figure 16.20 below:
Figure 16.19: Report of the test

Figure 16.20: Log of the test


That finishes our basic exploration of the Robot Framework! We leave it to the reader to explore the failure scenario by changing the value of the variable intNumber.

Conclusion
As we turn the final page of this book, we reach not just the end of a chapter but the culmination of an extensive and enriching journey through the world of Python programming and automation! This chapter, rich with images and detailed instructions, has strived to demystify test automation by introducing its important concepts while keeping the examples as simple as possible. We hope this concluding chapter rounds off our exploration of Python programming applications by shedding light on these salient concepts!
Throughout this book, the motive has been to keep the code snippets and the examples super simple so that the key focus of the reader remains on grasping the concepts.
We've ventured through a diverse landscape – from industrial automation to software testing, hyperautomation, orchestration, RPA, desktop and file automation, and delved into the depths of machine learning and deep learning. This book aims to serve as a ready manual for quick reference of key concepts in each of these topics. It is one of the few attempts at covering a wide variety of topics at a high level while still going sufficiently deep, striking a balance between breadth and depth.
As you continue on your Python automation journey, please revisit these pages whenever you seek clarity or inspiration. Remember, the journey of learning and exploration never truly ends – there is always more to discover and master. Last but not least, we thank you for joining us on this educational adventure! We wish you all the very best as you forge ahead, armed with new knowledge and insights. May your path in Python automation be as rewarding as it is enlightening!

Join our book’s Discord space


Join the book’s Discord Workspace for Latest updates,
Offers, Tech happenings around the world, New Release and
Sessions with the Authors:
https://discord.bpbonline.com
Index

A
algorithmic processing
Amazon Elastic Kubernetes Service (Amazon EKS)
Artificial General Intelligence (AGI)
artificial intelligence (AI)
about
history
Artificial Neural Network (ANN)
automation
versus orchestration
automation anywhere
bot creator
bot runner
control room
B
backpropagation
Beautiful Soup
bias
about
versus variance
Blue Prism
control room
object studio
process studio
bots
business process workflow
C
cell
accessing, with name range
merging
used, for looping
chart
creating, with openpyxl
styling
chat
chatbot
chat
reflections
role
rule based chatbot
self-learning chatbot
Community Edition
confusion matrix
conversational agents
implementing
convolution
convolutional layer
Convolutional Neural Network (CNN)
about
convolutional layer
fully connected layer
pooling layer
cost function
D
data processing
data science
decision tree
deep learning (DL)
about
application
Python libraries
dimensionality reduction
about
feature extraction
feature selection
Document Object Model (DOM)
E
Excel
automating, with Python
Excel formulae
working, with openpyxl
existing workbook
modifying
opening
external Python files
referencing, in PyScript
F
forward pass
fully connected layer
G
Gmail automation
2-step verification
app password, obtaining
Gmail message, sending with Python
prerequisites
Gmail message
sending, with Python
gradient descent
graphical processing unit (GPU)
Graphical User Interface (GUI)
grayscale image
H
hyperautomation
about
challenges
defining
document, enhancing with optical character recognition
process
use cases
I
images
working, with PIL library
information technology (IT)
integrated development environment (IDE)
Internet of Things (IoT)
J
JavaScript Object Notation (JSON)
K
Keras
keyboard functions
hotkey()
implementing, with PyAutoGUI library
typewrite()
K means clustering
K nearest neighbors
kube
Kubernetes
Kubernetes framework, components
cluster
node
pod
replication controller
L
large language models (LLMs)
learning rate
least squares regression
linear discriminant analysis (LDA)
linear regression
logistic regression
Long Short-Term Memory (LSTM)
looping
through cell
luigi library
luigi module
M
machine learning (ML)
about
concepts
Python libraries
machine learning (ML) models
bias
gradient descent
key concepts
variance
margin
matplotlib
message box functions
alert()
confirm()
exploring, with PyAutoGUI library
mouse functions
click()
implementing, with PyAutoGUI library
mouseDown()
mouseUp()
moveTo()
scroll()
N
Naïve Bayes
name range
used, for accessing cell
natural language processing (NLP)
about
algorithmic processing
data processing
lemmatization
part-of-speech (POS) tagging
segmentation
stemming
stop words removal
tokenization
Natural Language Toolkit (NLTK)
neural network
about
architecture
hidden layer
implementing, in Python
input layer
output layer
types
neural network, types
Convolutional Neural Network (CNN)
Long Short-Term Memory (LSTM)
Recurrent Neural Network (RNN)
nodes
Nomad
Numerical Python (NumPy)
O
Object-Oriented Programming (OOP)
OpenCV
openpyxl
used, for creating charts
used, for working with Excel
openpyxl library
OpenShift
OpenShift Dedicated
OpenShift Online
Open-Source Computer Vision (OpenCV)
about
working with
optical character recognition (OCR)
orchestration
about
versus automation
orchestration platforms
about
Amazon Elastic Kubernetes Service (Amazon EKS)
Kubernetes
Nomad
OpenShift
overfitting
P
pandas
PDF file
merging
reading, with PyPDF2 library
rotating
PIL library
used, for working with images
Platform as a service (PaaS)
pooling layer
principal component analysis
PyAutoGUI library
used, for exploring message box functions
used, for implementing basic mouse functions
used, for implementing keyboard functions
using
PyPDF2 library
used, for reading PDF file
PyPDF library
PyScript
about
basic webpage, creating
external Python files, referencing
third party libraries
Pytest library
about
advantages
limitation
Python
about
advantages
high-level language
integrating, with UiPath
libraries
portability aspect
used, for automating Excel
used, for implementing neural network
used, for implementing supervised machine learning algorithms
used, for implementing unsupervised learning algorithms
used, for sending Gmail message
Python activities
exploring, in UiPath
Python code
adding, to webpage
Python environment
setting up, in UiPath
Python Imaging Library (PIL)
Python libraries
Beautiful Soup library
for deep learning
for Excel automation
for machine learning (ML)
Keras
Natural Language Toolkit (NLTK)
OpenCV
PyTorch
requests module
Scikit Learn
spaCy
TensorFlow
Theano
web page, inspecting
web scraping
xlsxwriter library
xlwings library
Python orchestration
about
luigi module
Prefect library
Python OS module
about
functions
Python pip package
about
basic operation, performing
Docker containerization, using
requirements.txt file, working with
Python Robot Framework
about
test cases, running
Python script
creating
Python shutil module
about
file, copying
file, moving
used, for moving files based on extension
PyTorch
pywhatkit
R
random forests
Record Macro
Rectified Linear Unit (ReLU) function
Recurrent Neural Network (RNN)
reflections
reinforcement learning
Remote Control (RC)
RGB image
Robotic Process Automation (RPA)
about
history
Python package
tools
use case
Robotic Process Automation (RPA), components
about
bot runner
control center
development studio
plugins and extensions
recorder
Robotic Process Automation (RPA), tools
about
Automation Anywhere
Blue Prism
UiPath
rule based chatbot
S
Scientific Python (SciPy)
scikit-learn
Scrapy
Selenium
Selenium, components
Selenium GRID
Selenium IDE
Selenium RC
Selenium WebDriver
Selenium GRID
Selenium IDE
Selenium Python API
setting up
web automation, exploring
Selenium RC
Selenium Recorder
Selenium WebDriver
self-learning chatbot
software as a service (SaaS)
spaCy
supervised learning
about
decision tree
K nearest neighbors
linear regression
logistic regression
Naïve Bayes
random forests
support vector machine
supervised machine learning algorithms
implementing, with Python
support vector machines
T
tabula library
TensorFlow
textract library
text recognition
Theano
transformers
Turing Test
U
UiPath
about
Python activities, exploring
Python environment, setting up
used, for integrating Python
underfitting
unsupervised learning
unsupervised learning algorithms
dimensionality reduction
implementing, with Python
K means clustering
linear discriminant analysis (LDA)
principal component analysis
V
variance
about
versus bias
virtual environment
about
additional consideration
directories
setting up
Visual Basic for Applications (VBA)
W
web page
information, extracting
web scraping
about
legal statement
Python libraries
WhatsApp message
automating
X
xlsxwriter library
xlwings library
