Python Highway 2 Books in 1 The Fastest Way For Beginners To Learn Python Programming, Data Science and Machine Learning in 3 Days (Or Less) + Practical Exercises Included by Cox, Aaron

HIGHWAY TO PYTHON

2 Books in 1: The Fastest Way for Beginners to Learn Python
Programming, Data Science and Machine Learning in 3 Days (or less) +
Practical Exercises Included

Aaron Cox

Book 1 Python for Beginners
Book 2 Python Data Science


© Copyright 2020 by Aaron Cox - All rights reserved.
This eBook is provided with the sole purpose of providing relevant
information on a specific topic for which every reasonable effort has
been made to ensure that it is both accurate and reasonable.
Nevertheless, by purchasing this eBook, you consent to the fact that
the author, as well as the publisher, are in no way experts on the
topics contained herein, regardless of any claims as such that may
be made within. As such, any suggestions or recommendations that
are made within are done so purely for entertainment value. It is
recommended that you always consult a professional before
undertaking any of the advice or techniques discussed within.
This is a legally binding declaration that is considered both valid and
fair by both the Committee of Publishers Association and the
American Bar Association and should be considered as legally
binding within the United States.
The reproduction, transmission, and duplication of any of the content
found herein, including any specific or extended information, will be
done as an illegal act regardless of the end form the information
ultimately takes. This includes copied versions of the work, both
physical, digital, and audio, unless express consent of the Publisher
is provided beforehand. Any additional rights reserved.
Furthermore, the information that can be found within the pages
described forthwith shall be considered both accurate and truthful
when it comes to the recounting of facts. As such, any use, correct
or incorrect, of the provided information will render the Publisher free
of responsibility as to the actions taken outside of their direct
purview. Regardless, there are zero scenarios where the original
author or the Publisher can be deemed liable in any fashion for any
damages or hardships that may result from any of the information
discussed herein.
Additionally, the information in the following pages is intended only
for informational purposes and should thus be thought of as
universal. As befitting its nature, it is presented without assurance
regarding its continued validity or interim quality. Trademarks that are
mentioned are done without written consent and can in no way be
considered an endorsement from the trademark holder.
Python for Beginners
Introduction
Chapter 1: What is Python
Chapter 2: Why Python is the Easiest Language to Learn
Chapter 3: Installing the Interpreter
Chapter 4: Using the Python Shell, IDLE and Writing the First Program
Chapter 5: Variables and Operators
Chapter 6: Data Types in Python
Chapter 7: Making your program Interactive
Chapter 8: Making Choices and Decisions
Chapter 9: Functions and Models
Chapter 10: How to Work with Files
Chapter 11: Object Oriented Programming
Chapter 12: Math and binary
Chapter 13: Exercises
Conclusion

Python Data Science
Introduction
Chapter 1 Installing Python
Chapter 14: Python Libraries to Help with Data Science
Chapter 15: Python Functions
Chapter 16: The Basics of Working with Python
Chapter 17: Data Structures and the A* Algorithm
Chapter 18: Reading data in your script
Chapter 19: Manipulating data
Chapter 20: Probability – Fundamental – Statistics – Data Types
Chapter 21: Distributed Systems & Big Data
Chapter 22: Python in the Real World
Chapter 23: Linear Regression
Conclusion
Python for Beginners

THE CRASH COURSE TO LEARN PYTHON PROGRAMMING IN 3 DAYS (OR LESS).
MASTER ARTIFICIAL INTELLIGENCE FOR DATA SCIENCE AND MACHINE LEARNING
+ PRACTICAL EXERCISES.
Introduction:
Programming has come a long way. Although the world of programming
started quite some time ago, it was only a few decades ago that it
gained attention from computer experts across the globe. This shift
brought forward some great minds whose contributions shaped the
entire age of programming. We saw the great GNU project take shape
during this era. We came across the rather brilliant Linux. New
programming languages were born as well, and people certainly
enjoyed them.
While most of these programming languages worked, something was
missing. Surely, something could be done to make coding a less
tedious task. That is precisely what a new language, named after
Monty Python’s Flying Circus, did for the world. Coding became much
easier for programmers. The use of this language started gaining
momentum, and today it ranks among the most widely used
programming languages in the world.
This language was the brainchild of Guido van Rossum. First released
in 1991, Python has become a byword for efficient and user-friendly
programming. This language is what connected the dots and gave
programmers the ease of coding that they had been yearning for.
Naturally, the language was received well by the programming
community. Today, it is one of the most important languages for both
professionals and students who aim to excel in fields like machine
learning, automation, artificial intelligence, and so much more.
With real-life examples showing a wide variety of use, Python is now
living and breathing in almost every major social platform, web
application, and website. All of this sounds interesting and exciting at
the same time, but what if you have no prior knowledge about
programming? What if you have no understanding of basic concepts
and you wish to learn Python?
I am happy to report that this book will give you every possible
chance of learning Python and allow you to jump-start your journey
into the world of programming. It is meant for people who have zero
understanding of programming and may never have written a single
line of code before.
I will walk you through all the basic steps from installation to
application. We will look into various aspects of the language and
provide real-life examples to explain why those aspects matter. The
idea of this book is to prepare you as you learn the core concepts of
Python. After reading it, you should have no problem choosing your
path ahead. The basics will always remain the same, and this book
ensures that each of those basic elements is covered in the most
productive way possible. I will try to keep the learning process as
fun as I can without deviating from the learning itself.
Things You Need!
“Wait. Did you not say I don’t need to know anything about
programming?” Well, yes! You do not have to worry about
programming or its concepts at the moment, and when the time
comes, I will do my best to explain them. What is needed of you is
something a little more obvious.

Computer: Like I said, obvious! You need a machine of your own to
download and practice the material you learn from here. To make
the most of this book, practice as you read. This greatly increases
your confidence and allows you to keep a steady pace. The
specifications do not matter much. Most modern machines (2012 and
later) should be able to run each of the components without any
problem.
An internet connection: You will be required to download a few
files from the internet.
An Integrated Development Environment (IDE): If, for some reason,
you felt intimidated by this terminology, relax! I will guide you
through every step to ensure you have all of these and know what
they are all about. For now, just imagine this as a text editor.
A fresh mind: There is no point in learning if your mind is not
there with you. Be fresh, be comfortable. This may take a little
practice and a little time, but it will all be worth it.

That is quite literally all that you need. Before we go on to our very
first chapter and start learning the essentials, there is one more
thing I would like to clarify right away.
If you picked up a copy of this book, or are considering it, under
the impression that it will teach you all the basics of Python, good
choice! However, if you expect that by the end of it you will be a
fully trained professional with an understanding of things like
machine learning and other advanced Python fields, please understand
that this falls outside its scope.
This book is meant to serve as a guide, a crash course of sorts. To
learn more advanced methods and skills, you will first need to
establish command over all the basic elements and components of the
language. Once done, it is highly recommended to seek out resources
that are meant for advanced learning.
What I can recommend is that you continue practicing your code after
you have completed this book. Unlike driving and swimming, which you
will remember for the rest of your life even if you stop doing them,
Python continues to update itself. It is essential that you keep
yourself in practice and continue to code small programs like a
simple calculator, a number predictor, and so on. There are quite a
few exercises you can find online.
For advanced courses, refer to Udemy. It is one of the finest sources
to gain access to some exceptional courses and learn new
dimensions of programming, amongst many other fields.
Phew! Now that this is out of the way, I shall give you a minute to flex
your muscles, adjust your seat, have a glass of water; we are ready
to begin our journey into the world of Python.
Chapter 1: What is Python
Python is a multi-purpose language created by Guido van Rossum.
The language boasts a simple syntax that makes it easy for a new
learner to understand and use. This chapter will introduce the basics
of the Python language. Stay tuned.
Let’s get started!
Python is described as a general-purpose language. It has many
applications, so you can use it to accomplish many different tasks.
The syntax of the Python language is clean and the code is short.
Developers who have used Python at some point in their lives will
tell you how fun it was to code with. The beauty of Python is that it
lets you think more about the task at hand than about the language
syntax.
Some history of Python
The design of the Python language started back in the late 1980s,
and it was first released in February 1991.
Why was Python developed?
Guido van Rossum set out to design a new programming language
because he wanted a language that could offer a simple syntax, just
like the ABC language. This motivation led to the development of a
new language named Python.
But you may be wondering: why the name Python?
First, this language wasn’t named after the huge snake called
python. No! One of van Rossum’s interests was watching comedy. He
was a great fan of a British comedy series from the late 1960s and
early 1970s. As a result, the name of the language was borrowed from
“Monty Python’s Flying Circus.”
Properties of Python
Easy to learn – The syntax of Python is simple and beautiful.
Additionally, Python programmers enjoy writing its syntax more than
that of other languages. Python simplifies the art of programming and
allows the developer to concentrate on the solution instead of the
syntax. For a newbie, this is a great choice to start a programming
career.
Portability – Python offers you the ability to run the same Python
program on different platforms without making any changes.
Python is described as a high-level language – In other words, you
don’t need to be scared of tedious tasks such as memory
management and so on. Whenever you execute Python code, Python
automatically translates it into a form that your computer
understands. There is no need to worry about any lower-level
operations.
Object-oriented – Since it is an object-oriented language, it will allow
you to compute solutions for the most difficult problems. Object-
Oriented Programming makes it possible to divide a large problem
into smaller parts by building objects.
Has a huge standard library to compute common tasks – Python
ships with a large standard library for the programmer to use. As a
result, you will not have to write all the lines of code yourself.
Instead, you only import the relevant library.
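To make the point about the standard library concrete, here is a minimal sketch. It assumes nothing beyond a standard Python 3 installation; the variable names and the sample numbers are made up for illustration.

```python
# Both modules ship with Python, so nothing extra has to be installed.
import math
import statistics

# Area of a circle: math.pi saves us from typing the constant ourselves.
radius = 2.0
area = math.pi * radius ** 2

# Average of a few (made-up) scores without writing the loop by hand.
scores = [70, 80, 90]
average = statistics.mean(scores)

print(f"Area of the circle: {area:.2f}")
print(f"Average score: {average}")
```

The two import lines are all it takes to reuse code that somebody else has already written and tested.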
A Brief Application of Python
Web Applications
You can develop scalable web applications using frameworks and
content management systems (CMS) that are built on Python. Popular
environments for developing web applications include Pyramid,
Django, Django CMS, and Plone. Popular websites like Instagram,
Mozilla, and Reddit make heavy use of the Python language.
Scientific and Numeric Calculations
There are different Python libraries designed for scientific and
numeric calculations. Libraries such as NumPy and SciPy serve
general-purpose computing, and there are specially designed
libraries like AstroPy for astronomy, and so on.
Additionally, the Python language is widely applied in data mining,
machine learning, and deep learning.
A Great Language for Teaching Programmers
The Python language is an important tool used to demonstrate
programming to newbies and children. It is a great language that has
important capabilities and features, yet it is also one of the easiest
languages to learn because it has a simple syntax.
Building Software Prototypes
Compared to Java and C++, Python is a bit slow. It may not be a
great choice when resources are restricted and efficiency is
essential.
But Python is a powerful language for building prototypes. For
instance, you can use the Pygame library to develop the prototype of
your game first. If you like the prototype, you can decide to use C++
to develop the actual game.
Chapter 2: Why Python is the Easiest
Language to Learn
Python is an interpreted, object-oriented, dynamically typed,
high-level programming language. Since its birth in the early 1990s,
Python has gradually become widely used for system management
tasks and Web programming. With the continuous development of
artificial intelligence in particular, Python has become one of the
most popular programming languages.
The first benefit that you will notice with the Python language is that
it is easy to learn. This language was developed with the beginner in
mind, in the hopes of bringing more people into coding. Some of the
traditional languages were hard and bulky, and unless you were
really passionate about the work you were doing, you would probably
give up long before anything was done. But with the Python language,
things are a bit different. This language was designed to be easy to
learn and easy to read, which has made it possible for more people to
get into the world of coding.
Even though you will be pleasantly surprised by how easy it is to
learn the Python language, you will also find that it is a powerful
language. Don’t let its simplicity fool you; it has enough power to
get the work done, no matter how complex or hard the problem is.
Python can handle your basic coding needs, and it also has the power
to help you with things like machine learning and data analysis. If
you have spent any time working with these topics, you know that
they are not easy.
Python also has a lot of extensions and libraries that help it do
more. This is primarily how you get Python to work on those more
complex tasks. You add them simply by installing them on your
computer or system, and the Python language is ready to go when you
are. You can then handle algorithms, finish your data analysis, and
so much more. There are many Python data science libraries
available, depending on which step of the process you are working on
at the time.
Why is Python special?
There are hundreds of programming languages now available for
programmers to start with. However, according to statistics from a
survey done by Harvard computer scientists, Python is a leading
language among beginners. We will discuss some of the reasons that
make Python an understandable language for new programmers.
Python has the following major advantages over other programming
languages:
(1) The grammar is concise and clear, and the code is highly
readable. Python's syntax requires mandatory indentation, which is
used to reflect the logical relationship between statements and
significantly improves the readability of the program.
(2) Because it is simple and clear, it is also a programming language
with high development efficiency.
(3) Python is truly cross-platform. For example, the programs we
develop can run on Windows, Linux, and macOS systems. This is its
portability advantage.
(4) It has a large number of rich libraries and extensions. Python is
often nicknamed a glue language. It can easily connect various
modules written in other languages, especially C/C++. Using these
abundant third-party libraries, we can easily develop our
applications.
(5) The amount of code is small. Since a program written in Python
is usually much shorter than the same program in other languages,
there is less room for errors, which improves the quality of the
software to a certain extent.
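The mandatory indentation mentioned in point (1) can be sketched as follows. This is only an illustration; the function name `describe` and the sample values are hypothetical and not taken from the book.

```python
def describe(number):
    # Everything indented under "def" belongs to the function body.
    if number % 2 == 0:
        # This line is indented under "if", so it runs only when
        # the condition above is true.
        return "even"
    # Back at the function's indentation level: runs when the
    # condition was false.
    return "odd"


for value in [1, 2, 3]:
    # The indented line below is the body of the loop.
    print(value, "is", describe(value))
```

Because the indentation is part of the syntax, the layout you see on the screen always matches the logic Python actually executes.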
Python is very versatile and can be used in the following areas:
(1) Web page development;
(2) Visual (GUI) interface development;
(3) Network programming;
(4) System programming;
(5) Data analysis;
(6) Machine learning (Python has various libraries to support it);
(7) Web crawlers (such as those used by Google);
(8) Scientific calculation (Python is used in many aspects of
scientific calculation).
For example, Python is used in many Google services. YouTube is
also implemented in Python. The basic framework of the Wikipedia
Network initially is also implemented in Python.
How does python work?
The principle behind Python program execution is very simple. We all
know that programs written in compiled languages such as C/C++ need
to be converted from source files into the machine language used by
the computer, and that binary executable files are then formed after
linking by a linker. When running the program, the binary program is
loaded from the hard disk into memory and run.
For Python, however, the source code does not need to be compiled
into binary code ahead of time. Programs can be run directly from
the source code. The Python interpreter converts the source code into
bytecode and then forwards the compiled bytecode to the Python
virtual machine (PVM) for execution.
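To see the bytecode described above, the standard library's `dis` module can be used. The following is a minimal sketch; the function `add` is a made-up example, and the exact instruction names differ from one Python version to another.

```python
import dis


def add(a, b):
    return a + b


# The compiled bytecode lives on the function's code object as raw bytes.
raw_bytecode = add.__code__.co_code

# dis.get_instructions decodes those bytes into readable instructions.
opnames = [instruction.opname for instruction in dis.get_instructions(add)]

print("Raw bytecode length:", len(raw_bytecode))
print("Instructions:", opnames)
```

The printed instruction names are what the Python virtual machine works through, one after another, when `add` is called.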
When we run a Python program, the Python interpreter performs
two steps.
(1) Compiling the source code into bytecode
Compiled bytecode is a Python-specific representation. It is not
binary machine code, so the machine cannot execute it directly; the
Python virtual machine has to interpret it. This is also why Python
code cannot run as fast as C/C++.
If the Python process has write permission on the machine, it will
save the bytecode of the program as a file with the extension .pyc. If
Python cannot write the bytecode on the machine, the bytecode will
be generated in memory and automatically discarded at the end of
the program. When building a program, it is best to give Python
permission to write on the computer, so that as long as the source
code is unchanged, the generated .pyc file can be reused to improve
execution efficiency.
(2) Forwarding the compiled bytecode to the Python Virtual Machine
(PVM) for execution
PVM is short for Python Virtual Machine. It is Python's running
engine and part of the Python system. It is a large loop that
iteratively runs bytecode instructions, completing operations one
after another.
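The .pyc file from step (1) can also be produced by hand with the standard `py_compile` module. The sketch below is an illustration under assumptions: the file name `hello.py` is hypothetical, and a temporary folder is used so that nothing is left behind on your machine.

```python
import pathlib
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Write a tiny (hypothetical) source file to compile.
    source = pathlib.Path(folder) / "hello.py"
    source.write_text('print("Hello world")\n')

    # Compile the source to bytecode; the function returns the
    # path of the .pyc file it wrote.
    compiled_path = py_compile.compile(str(source), doraise=True)
    compiled_exists = pathlib.Path(compiled_path).exists()

    print("Bytecode saved to:", compiled_path)
```

In everyday use you rarely need to do this yourself, because Python writes these files automatically for the modules you import.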
Every Python program goes through this process when it is executed,
producing results that can then be analyzed and tested before the
program is deployed as a new application.
Chapter 3: Installing the Interpreter
Python has many free IDEs and environments available online. With
this variety of options, some programs are better than others.
Taking their shortfalls into account, the best software one can use
to practice Python programming is PyCharm Community Edition.
Python is a common programming language for application
development. Python’s design focuses on code readability and clear
programming for both small and big projects. You are able to run
modules and full applications from a massive library of resources on
the server. Python works on various operating systems, such as
Windows. Installing Python on a Windows server is a straightforward
process: download the installer, run it on your server, and make a
few configuration adjustments that make Python easier to use.
PyCharm is the software that I recommend to many of my students,
although Anaconda is another that I have found quite useful.
PyCharm Community Edition won’t offer you the extraordinary power
and capabilities that professional software will, but for beginners,
it’s more than adequate.
With that in mind, we need only download and install the software.
I will go through this process with you, step by step with pictures.
Step 1: Open your preferred internet browser (Google Chrome,
Firefox, etc.) and search for ‘PyCharm community edition’. You
should see the page link depicted in image 1.2 as your first result.
1.1: Searching PyCharm Community Edition

1.2: First Result for PyCharm Community Edition


Step 2: Once you click on the link, you should see a page like the
one depicted in image 2.1. From here, you can decide which version
of PyCharm you wish to download: Mac, Windows or Linux for your
OS, and Pro or Community edition depending on your preferred plan.
NOTE: I will be using Community Edition throughout.
2.1: PyCharm Download Page
Step 3: Once you have chosen your preferred OS and version of
PyCharm, click the ‘Download’ button. You should see the PyCharm
installer downloading at the bottom-left of your screen.
3.1: PyCharm Downloading

Step 4: Once the download is complete, click on the same box at the
bottom-left of your screen. If you no longer see the box or have
closed your browser, locate the downloaded installer in your
Downloads folder. Double-click the icon to start the installation.
4.1: PyCharm Download Finished

4.2: PyCharm Installer in Downloads Folder


Step 5: Now, we begin the installation. The process is simple, as you
need only click ‘Next’ and then ‘Install’ at the bottom of the
installation process boxes. However, I will be going through each
box.
5.1: Box 1 - Introduction

This first box simply introduces you to the installation process.
Click ‘Next’ to continue.
5.2: Box 2 - Install Location
The second box concerns where the software will be installed.
PyCharm is a relatively small program, requiring less than 1GB of
space. You may want to install PyCharm in a certain folder; by
clicking ‘Browse...’ you will be presented with an interface that
allows you to select which one.
RECOMMENDED: The default location is perfectly fine and should
cause no issues, as long as you have enough space for the program
on your system. I recommend you leave this option unaltered. Click
‘Next’ to continue.
5.3: Box 3 - Additional Installation Options
This step is purely optional, but I do recommend you create a
Desktop Shortcut by checking the box for ‘64-bit Launcher’, for ease
of use. What this will do is place an icon on your desktop, which you
can use to quickly start PyCharm without having to search for it in
the Start Menu.
Whether you have selected this option or not, click ‘Next’ to continue.
5.4: Box 4 - Start Menu Folder Selection
This step is similar to 5.2, as you can select the folder where the
software is installed, but in this case, it is the Start Menu folder. If
you have enabled the Desktop Shortcut as recommended in 5.3, this
step can be left without alteration.
However, if you want the application icon to be stored in a specific
folder, you can change that here.
RECOMMENDED: Once more, the default location is perfectly fine
and I recommend you leave it unaltered.
Click ‘Install’ to continue.
5.5: Box 5 - PyCharm Installing
PyCharm is installing and you are on your way to Python
programming! Leave your computer running until the installation is
complete.
5.6: Box 6 - PyCharm Finished Installing
PyCharm has been installed on your system and you have one more
option before clicking ‘Finish’. You can choose to run PyCharm now.
If you have unchecked this box and enabled the Desktop Shortcut,
you can find the following icon on your Desktop to start PyCharm.
5.7: PyCharm Desktop Shortcut Icon

Step 6: Once you have started PyCharm up, you should see the
following as depicted in image 6.1. For the first startup, PyCharm
asks you to accept standard terms & conditions before you can use
the program.
You can read through these or not, but in order to continue, check
the box that states you have read and accepted the terms of this
user agreement. Once checked, click ‘Continue’.
6.1: Accepting User Agreement

Step 7: The next box you should see is a common option in most
programming software. The software developers ask whether you allow
the software to send data on your usage to help with bug-fixing and
so on. For more details, they give you an option to read more about
it.
You can choose to provide this information or not; either way, you
still have full access to PyCharm.
7.1: Data Sharing Agreement
Step 8: We are in the final stages of this installation process. The
next few steps are more a matter of preference than anything else.
Once they are completed, you are ready to move on to creating a
project for coding in.
Choose a theme for your UI. I will be using Darcula, but you can use
whichever you like. Once selected, click ‘Next: Featured plugins’.
8.1: Theme Choosing

8.2: Featured Plugins


Once more, these are preference options. These plugins are more
for the experienced programmer and all are optional.
I won’t be using any additional plugins, so once you are ready, click
‘Start Using PyCharm’.
8.3: Finished and Ready to Start Creating Projects!
With that, you have finished installing and setting up PyCharm!
If you see the image in 8.3, you are ready to move on to creating a
project, an important last step before we start learning some code!
Chapter 4: Using the Python Shell, IDLE and
Writing the First Program
Once you have Python installed on your operating system, the next
step is to write and run a program with Python.
A program is a series of coded instructions that allow you to perform
specific tasks on your computer. These coded instructions are known
as source code; they are what the user or programmer types into the
computer.
The source code is written in the Python programming language.
Before the "central processing unit" (CPU) can execute it, the source
code has to be translated, and this is the job of a compiler or an
interpreter.
In summary, a compiler is a translator that transforms a program's
source code into machine language so that it can be executed,
typically as an executable file; this translation process is known
as compiling.
There is a difference between a compiler and an interpreter: a
compiler translates a whole program written in a programming
language into the machine code of the system, while an interpreter
translates and executes the program instruction by instruction and
does not store the result of this translation.
Therefore, source code can be executed in two ways: through a
compiler, or through an interpreter that executes it immediately.
When we open IDLE on our system, the first screen we see is called
the Shell; we can also think of it as the interpreter of our Python
language.
Every time we open the Shell, we will find a header, which always
has the same form and shows Python information such as the version
in use and its build date and time. This header tells us that we are
working with the Shell interpreter.
With a simple example, we can see how the Shell interpreter
translates Python language into machine language instruction by
instruction.
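The instruction-by-instruction behaviour described above can be imitated in a few lines. This is only a rough sketch of the idea, not how IDLE itself is implemented; the statements and the `"<shell>"` label are made up for illustration.

```python
# Two statements, as a user might type them into the Shell one by one.
statements = ["total = 2 + 3", "total = total * 10"]

# The namespace plays the role of the Shell's memory between statements.
namespace = {}

for statement in statements:
    print(">>>", statement)
    # "single" mode compiles one interactive statement at a time.
    code_object = compile(statement, "<shell>", "single")
    # Each statement is executed immediately, before the next one is read.
    exec(code_object, namespace)

print("total is now", namespace["total"])
```

Each statement is translated and run before the next one is even looked at, which is why the Shell can respond as soon as you press "Enter".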
By default, Python 3 is not installed on OS X at all. If you want to
use Python 3, you can install it using one of the installers
available on python.org. This is a good place to go because it will
install everything that you need to write and execute your code with
Python: the Python shell, the IDLE development tools, and the
interpreter. Unlike what happens with Python 2.X, these tools are
installed as a standard application in the Applications folder.
You have to make sure that you are able to run both the Python
IDLE and the Python shell on any computer that you are using.
Whether you can run them may depend on which version you have
chosen for your system. Check which version of Python is on your
system before proceeding, and then move on from there to determine
which IDLE and shell you need to work with.
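One way to check which version of Python you are running is to ask Python itself. The sketch below uses only the standard `sys` module; the variable names are chosen for illustration.

```python
import sys

# sys.version is the same text that appears in the Shell's header;
# it typically includes the version number and the build date.
print(sys.version)

# sys.version_info gives the individual numbers, which are easier to test.
major = sys.version_info.major
minor = sys.version_info.minor
is_python3 = major >= 3

print(f"Running Python {major}.{minor}")
print("This is Python 3 or newer:", is_python3)
```

If the last line prints False, you are running Python 2.X and should install Python 3 before continuing.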
We will now write a line of code in Python, starting with the phrase
every Python beginner knows, "Hello World", and we will do it in the
following way.
The syntax is written as follows:

print("Hello World")

Once we have written the instruction that we want the program to
execute, we only have to press the "Enter" key. The interpreter
translates the instruction right away: it does not wait to receive
another instruction, but executes as soon as we press the "Enter"
key.
An additional detail is that the interpreter can also be used from
the command prompt, which is available on Windows, Linux and Mac.
To use the interpreter from the command prompt, simply type the
word python and press the "Enter" key. This starts the Python
interpreter, and we know that we are in the interpreter because we
see the same header as before.
Now we can start to execute instructions written in Python:
--- print("Hello world"): the interpreter translates this line and
immediately shows us the result, Hello world.
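The same instruction can also be saved in a file and run as a program. Here is a minimal sketch; storing the text in a variable called `message` is my own choice for illustration, not something the instruction requires.

```python
# Store the text first so that it can be reused or changed in one place.
message = "Hello world"

# print() writes the text to the screen, just as it does in the Shell.
print(message)
```

Saved in a file (for example, a hypothetical hello.py), this can be run from the command prompt by typing python followed by the file name.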
Chapter 5: Variables and Operators
What are Variables?
A variable is nothing more than a reserved location in the memory, a
container if you like, where values are stored. The basic rules
relating to variables are:

Values can be strings, numbers or another data type
A variable is created when it is first assigned a value
A variable needs to be assigned before you can reference it
The value that you store in the variable may be updated or accessed at a later time
Variables do not require a declaration
The variable data type, for example, float, int, string, etc., will be decided by Python
The Python interpreter will allocate the required amount of memory based on the variable data type
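The rules above can be tried out in a few lines. This is a minimal sketch; the variable names and values are made up for illustration.

```python
# A variable is created the first time a value is assigned to it.
greeting = "Hello"      # a string
count = 3               # an integer
price = 2.5             # a float

# Python decides the data type; no declaration is needed.
print(type(greeting).__name__, type(count).__name__, type(price).__name__)

# The stored value can be read and updated later.
count = count + 1
count_as_text = str(count)
print(count, count_as_text)
```

Referencing a name that has never been assigned raises a NameError, which is the practical meaning of "a variable needs to be assigned before you can reference it".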

Naming Rules for Variables


Like many things in Python, variable names come under strict naming
rules:

A variable name must start with an underscore (_) or a letter (A to Z or a to z)
The other characters in the name may be underscores, letters or numbers
Variable names are case sensitive. For example, myname is a different variable to MyName
Variable names can be any length within reason
Reserved keywords cannot be used; a list of these can be found below
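One way to check a name against these rules is sketched below. It relies on the standard keyword module and on str.isidentifier(); the helper name is_valid_variable_name and the candidate names are assumptions made for this example. Note that isidentifier() also accepts non-English letters, so it is slightly more permissive than the A to Z rule stated above.

```python
import keyword

# Hypothetical candidate names, chosen only for illustration.
candidates = ["myname", "MyName", "_total", "total_2", "2total", "class"]

def is_valid_variable_name(name):
    """Return True if name follows the rules and is not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)

results = {name: is_valid_variable_name(name) for name in candidates}
for name, ok in results.items():
    print(name, "->", "valid" if ok else "invalid")

# keyword.kwlist holds the full list of reserved keywords.
print(len(keyword.kwlist) > 0)
```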

Basic Operators and Assignment Operators


The control flow of a program is the order that the code is executed
in and this is regulated through loops, conditional statements and
function calls. We are going to look first at Boolean and comparison
operators, followed by the if statement and all the variations of it.
Booleans and Comparison Operators
Boolean data types may be True or they may be False, nothing else.
Booleans are used to control program flow and to make
comparisons. They are representative of truth values that we
associate with mathematics, the logic side of it to be precise.
Booleans were named after George Boole, a mathematician, and the
word always starts with a capital B. By the same token, the two
values, True and False, also start with capital letters. The reason for
this is because, in Python, they are special values.
We are going to look at how these Booleans work, including
comparison operators, and logical operators. First, we look at the
comparison operators.
Comparison Operators
In computer programming, we use comparison operators as a way of
comparing two values and evaluating the comparison down to a single
value, the Boolean True or False. These are the comparison operators:
Operator    Description
==          is equal to
!=          is not equal to
<           is less or lower than
>           is greater or larger than
<=          is less or equal to
>=          is greater or equal to
To better understand the way they work, let’s look at an example
where we assigned two variables with integer values:
x=7
y=9
We can see from the example that, because x has been assigned a
value of 7, it is less than variable y, which has been assigned 9 as
a value.
Using these variables and the values that go with them, we can take
a better look at the comparison operators. We are going to write a
program that asks whether each of the operators will evaluate to
True or to False and then print the result. To help understand it even
more, we will ask for a string to be printed that shows us what is
happening.
x = 7
y = 9
print("x == y:", x == y)
print("x != y:", x != y)
print("x < y:", x < y)
print("x > y:", x > y)
print("x <= y:", x <= y)
print("x >= y:", x >= y)
The output would be:
x == y: False
x != y: True
x < y: True
x > y: False
x <= y: True
x >= y: False
If we follow the logic in Math, we can see that Python evaluated
each of these expressions as:
Is 7 (x) equal to 9 (y)? False
Is 7 not equal to 9? True
Is 7 less than 9? True
Is 7 greater than 9? False
Is 7 less than or equal to 9? True
Is 7 greater than or equal to 9? False
We used integer numbers for this example but we could just as
easily have used floats. We can also use strings with comparison
operators but do remember that they are case sensitive. Look at a
practical example of how strings are compared:
Sally = "Sally"
sally = "sally"
print("Sally == sally:", Sally == sally)
The output would be:
Sally == sally: False
The string “Sally” is not equal to the string “sally” because one
begins with a capital letter and the other doesn’t. If we were to add
another variable and assign it the value “Sally”, the two would
evaluate as equal:
Sally = "Sally"
sally = "sally"
also_Sally = "Sally"
print("Sally == sally:", Sally == sally)
print("Sally == also_Sally:", Sally == also_Sally)
The output would be:
Sally == sally: False
Sally == also_Sally: True
As well as these, we can use the < and > operators to compare
strings, and we can evaluate Booleans with comparison operators too:
t = True
f = False
print("t != f:", t != f)
Output
t != f: True
This has resulted in an evaluation that True does not equal False.
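The text mentions < and > for strings without showing them, so here is a small sketch. Python compares strings character by character using each character's numeric code, which is why an uppercase letter sorts before its lowercase form. The variable names are chosen only for this example.

```python
# Strings are compared character by character using their code points,
# so every uppercase ASCII letter sorts before every lowercase one.
apple_first = "apple" < "banana"
capital_first = "Sally" < "sally"
print(apple_first, capital_first)
print(ord("S"), ord("s"))

# Booleans can be compared too; True behaves like 1 and False like 0.
true_is_larger = True > False
print(true_is_larger, True == 1, False == 0)
```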
Note that there is a difference between these two operators - = and
==:
x = y # Sets x as equal to y
x == y # Evaluates if x is equal to y
The first one, =, is called an assignment operator. This will set a
value as being equal to another value. ==, on the other hand, is a
comparison operator and this evaluates if two separate values are
equal.
Logical Operators
We can make use of three different logical operators when we want
to combine or negate Boolean expressions. Each evaluates an
expression down to True or False, both Boolean values. Here is what
those operators are and what they do:
Operator    Description
and         Evaluates True if both values are true
or          Evaluates True if one or more values is true
not         Evaluates True only if the value is false
We use the logical operators to determine whether two or more
expressions are true or not. For example, we can use a logical
operator to see if a specific grade is a passing grade and to check
that a specific student has been registered in the course. If both
are True, the student is assigned a grade. Another example would be
to see if a user is an active and valid user at an online store,
based on whether they have made any purchases within the last 3
months or whether they have been extended store credit.
To better understand logical operators, look at the following
expressions:
print((9 > 7) and (1 < 3))   # Both of the original expressions evaluate to True
print((7 == 7) or (4 != 4))  # One of the original expressions evaluates to True
print(not (4 <= 2))          # The original expression evaluates to False
The output would be:
True
True
True
Let’s break this down:
In the first expression, print((9 > 7) and (1 < 3)), both 9 > 7 and
1 < 3 had to evaluate as True because we used the and operator, and
both of the statements are true.
In the second expression, print((7 == 7) or (4 != 4)), because
7 == 7 evaluates to True, it doesn’t matter that (4 != 4) evaluated to
False. We used the or operator so only one of them had to evaluate
True. If the and operator had been used instead, this would have been
False.
In the third expression, print(not (4 <= 2)), the use of the not
operator means that the False value that the expression returns is
negated and the output is True.
Now let’s look at some expressions where floating points are used
instead of integers. This time we want to see False as the evaluated
Boolean value.
print((-0.1 > 1.5) and (0.7 < 4.1))   # One of the original expressions evaluates to False
print((6.5 == 7.9) or (8.2 != 8.2))   # Both of the original expressions evaluate to False
print(not (-4.7 <= 0.2))              # The original expression evaluates to True
The output would be:
False
False
False
In this example:
The and operator returns False as soon as at least one of its
expressions evaluates to False
The or operator returns False only when both of its expressions
evaluate to False
The inner expression of the not operator has to be True, otherwise
the new expression cannot evaluate as False
Compound statements may also be written with the and, not and or
operators:
not ((-0.1 > 1.5) and ((0.7 < 4.1) or (0.2 == 0.2)))
Now, let's take a look at the innermost expression: (0.7 < 4.1) or
(0.2 == 0.2). This will evaluate as True because both of the
statements are True in mathematical terms.
Then, we take the value that was returned as True and put it into
the next expression: (-0.1 > 1.5) and (True). This will evaluate as
False because the first statement is False, and False and True must
always return False.
Lastly, the final expression, not (False), will evaluate as True so,
if we were to print all this out, the output would be:
True
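The same breakdown can be written out as code, one step per line. The intermediate names inner_or and inner_and are introduced only for this sketch.

```python
# Step-by-step evaluation of the compound expression, innermost part first.
inner_or = (0.7 < 4.1) or (0.2 == 0.2)    # True or True   -> True
inner_and = (-0.1 > 1.5) and inner_or     # False and True -> False
result = not inner_and                    # not False      -> True

print(inner_or, inner_and, result)

# The one-line version gives the same answer.
one_line = not ((-0.1 > 1.5) and ((0.7 < 4.1) or (0.2 == 0.2)))
print(one_line == result)
```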
Using Boolean Operators for Controlling Flow
To control how a program flows and what the outcome will be, we
use flow control statements, and these are made up of conditions and
clauses.
A condition will evaluate to True or False, and that gives us the
point in the program where a decision is made.
The clause is a code block and it comes after the condition. The
clause determines what the program does as a result. To clear it up,
if you had a construction of “if y is True, then do this”, the clause
is the “do this” part of it. The example below shows you the control
flow of a program through the comparison operators working together
with conditional statements.
grade = 75                     # example value
if grade >= 70:                # Condition
    print("Passing grade")     # Clause
else:
    print("Failing grade")
The program evaluates a student's grade and decides whether it is a
passing or a failing grade. If a student has a grade of 75, the
initial statement will evaluate as True and the Passing grade print
statement is triggered. If a student has a grade of 69, the initial
statement evaluates as False and the Failing grade print statement
will be executed.
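To check several grades at once, the condition and its clauses can be wrapped in a function. This is only a sketch: the function name grade_message, the passing_mark parameter and the sample grades are assumptions made for illustration.

```python
def grade_message(grade, passing_mark=70):
    """Return the message for one grade; passing_mark is an assumed default."""
    if grade >= passing_mark:      # condition
        return "Passing grade"     # clause run when the condition is True
    else:
        return "Failing grade"     # clause run when the condition is False

# Hypothetical grades, used only to exercise both branches.
for grade in [75, 69, 70]:
    print(grade, grade_message(grade))
```

Because the comparison is >=, a grade of exactly 70 counts as passing.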
Chapter 6: Data Types in Python
Having covered the basic operations that can be done in Python, we
now move on to a discussion of data types. Computer programming
languages have several different methods of storing and interacting
with data, and these different methods of representation are the data
types you’ll interact with. The primary data types within Python are
integers, floats, and strings. Values of these types can be collected
in different data structures: lists, tuples, and dictionaries. We’ll
get into data structures after we broach the topic of data types.

Integers
Integers in Python are no different from what you were taught in
math class: a whole number, or a number that possesses no decimal
point or fraction. Numbers like 4, 9, 39, -5, and 1215 are all
integers. Integers can be stored in variables just by using the
assignment operator, as we have seen before.

Floats
Floats are numbers that possess decimal parts. This makes
numbers like -2.049, 12.78, 15.1, and 1.01 floats. The method of
creating a float instance in Python is the same as declaring an
integer: just choose a name for the variable and then use the
assignment operator.

String
While we’ve mainly dealt with numbers so far, Python can also
interpret and manipulate text data. Text data is referred to as a
“string,” and you can think of it as the letters that are strung together
in a word or series of words. To create an instance of a string in
Python, you can use either double quotes or single quotes.
string_1 = "This is a string."
string_2 = 'This is also a string.'
However, while either double or single quotes can be used, it is
recommended that you use double quotes when possible. This is
because there may be times you need to nest quotes within quotes,
and placing single quotes within double quotes is the encouraged
convention.
Something to keep in mind when using strings is that numerical
characters surrounded by quotes are treated as a string and not as a
number.
# The 97 here is a string
Stringy = "97"
# Here it is a number
Numerical = 97

Type Casting in Python


The term “type casting” refers to the act of converting data from one
type to another type. As you program, you may often find that you
need to convert data between types. Python has three helpful
functions which allow quick and easy conversion between data types:
int(), float() and str().
All three of these functions convert what is placed within the
parentheses to the data type named outside the parentheses. This
means that to convert a float into an integer, you would write the
following:
int(3.9324)
Because integers are whole numbers, anything after the decimal
point in a float is dropped when it is converted into an integer. (Ex.
3.9324 becomes 3, 4.12 becomes 4.) Note that you cannot convert a
non-numerical string into an integer, so typing int("convert this")
would throw an error.
The float() function can convert integers or certain strings into
floats. Providing either an integer or an integer in quotes (a string
representation of an integer) will convert the provided value into a
float. Both 5 and "5" become 5.0.
Finally, the str() function is responsible for the conversion of
integers and floats to strings. Plug any numerical value into the
parentheses and get back a string representation of it.
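The conversions described above can be sketched as follows. The name conversion_error is introduced only for this example; the error raised for a non-numerical string is a ValueError.

```python
# int() drops everything after the decimal point (it does not round).
print(int(3.9324))
print(int(4.12))

# float() accepts integers and numeric strings.
print(float(5))
print(float("5"))

# str() gives a string representation of a number.
print(str(97) + "!")

# A non-numerical string cannot be converted and raises ValueError.
try:
    int("convert this")
except ValueError as error:
    conversion_error = str(error)
    print("Conversion failed:", conversion_error)
```

Note that the string "97" and the number 97 behave differently: "97" + "1" joins the text, while 97 + 1 adds the numbers.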

List
Lists are just collections of data. When you think about a list in
regular life, you often think of a grocery list or to-do list. These
lists are just collections of items, and that’s precisely what lists
in Python are: collections of items. Lists are convenient because
they offer quick and easy storage and retrieval of items.
Let’s say we have a bunch of values that we need to access in our
program. We could declare separate variables for all those values, or
we could store them all in a single variable as a list. Declaring a
list is as simple as using brackets and separating objects in the list
with commas. So, if we wanted to declare a list of fruits, we could do
that by doing the following:
fruits = ["apple", "pear", "orange", "banana"]
It’s also possible to declare an empty list by just using empty
brackets. You can add items to the list with a specific function, the
append function, append(). We can access the items in the list
individually by specifying the position of the item that we want.
Remember, Python is zero-based, and so the first item in the list is
at position 0. How do we select a value from a list? We just declare
a variable that references that specific position:
apple = fruits[0]
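Here is a short sketch of the list operations just described: indexing from zero, starting from an empty list, and adding items with append(). The shopping list is a made-up example.

```python
fruits = ["apple", "pear", "orange", "banana"]

first = fruits[0]        # positions start at 0
last = fruits[-1]        # negative positions count from the end

shopping = []            # an empty list
shopping.append("milk")  # append() adds an item to the end
shopping.append("bread")

fruits[1] = "grape"      # list items can be replaced in place
print(first, last, shopping, fruits, len(fruits))
```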

Tuple
Tuples are very similar to lists, but unlike lists, their contents cannot
be modified once they are created. The items that exist in the tuple
when created will exist for as long as the tuple exists. If it’s unclear
as to when tuples would be useful, they would be helpful whenever
you have a list of items that will never change. For example,
consider the days of the week. A list containing all the days of the
week won’t change. In practice, you are likely to use tuples far less
often than you will use lists, but it’s good to be aware of the
existence of tuples.
Functionally, tuples are declared and accessed very similarly to
lists. The major difference is that when a tuple is created,
parentheses are used instead of brackets.
this_is_a_tuple = ("these", "are", "values", "in", "a", "tuple")
The items can be accessed with brackets, just like a list.
word = this_is_a_tuple[0]
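The difference from a list shows up as soon as you try to change an item. The sketch below uses the days of the week mentioned earlier; the abbreviations and the name tuple_error are chosen for this example.

```python
days = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

first_day = days[0]          # read access works just like a list

try:
    days[0] = "Monday"       # tuples cannot be modified after creation
except TypeError as error:
    tuple_error = str(error)
    print("Not allowed:", tuple_error)

# A one-item tuple needs a trailing comma; parentheses alone are not enough.
single = ("only",)
print(first_day, type(single).__name__, len(days))
```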

Dictionary
Dictionaries hold data that can be retrieved with reference items, or
keys. Dictionaries can be confusing for first-time programmers but try
to imagine a bank filled with a number of safety deposit boxes. There
are rows and rows of these boxes, and the contents of each box can
only be accessed when the correct key is provided. Much like
opening a deposit box, the correct key must be provided to retrieve
the value within the dictionary. In other words, dictionaries contain
pairs of keys and the value that can be accessed with those keys.
When you declare a dictionary, you must provide both the data and
the key that will point to that data. The keys must be unique.
Evidently, it would be a problem if one key could open multiple
boxes, so keys in a dictionary cannot be repeated; you cannot have
two keys both named “key1”.
The syntax for creating a dictionary in Python is curly braces
containing the key on the left side and the value on the right side,
separated by a colon. To demonstrate, here’s an example of a
dictionary:
Dict_example = {"key1": 39}
If you want to create a dictionary with multiple items, all you need to
do is separate the items with commas.
Dict_example2 = {"key1": 39, "key2": 21, "key3": 54}
Dictionaries can also be declared by using the dict() function. You
could create the same dictionary as above by passing keys and their
values using the assignment operator and still separating them with
commas.
Dict_example3 = dict(key1 = 39, key2 = 21, key3 = 54)
Note that this method uses parentheses instead of curly braces and
doesn’t use quotes around the keys.
To access items within the dictionary, you need to supply the
appropriate key. The syntax for this in Python is dictionary["key"],
so in order to get 39 from the dictionary above, you would use this
syntax:
number = Dict_example3["key1"]
Since the syntax above selects the value associated with the passed
key, you might be able to guess that we can overwrite the data by
selecting the value we want and using an assignment operator.
Dict_example3["key1"] = 99
Much like how it is possible to create an empty list with just an
empty pair of brackets, we can also create an empty dictionary by
using empty curly braces when we declare the dictionary.
Dict_example4 = {}
To add data to a dictionary, all we need to do is create a new
dictionary entry and assign a value to it.
Dict_example4["key1"] = 109
To drop values from the dictionary, we use the del command
followed by the dictionary and the key we want to drop.
del Dict_example4["key1"]
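The dictionary operations above can be combined into one short sketch. The stock dictionary and its keys are hypothetical; get() is a standard dictionary method that returns a default value instead of raising an error when a key is missing.

```python
stock = {"apples": 39, "pears": 21}   # hypothetical keys and values

stock["plums"] = 54                   # add a new key-value pair
stock["apples"] = 99                  # overwrite an existing value
del stock["pears"]                    # remove a pair

# Indexing with a missing key raises KeyError; get() returns a default.
missing = stock.get("kiwis", 0)

# A repeated key in a literal keeps only the last value.
repeated = {"key1": 1, "key1": 2}

for key, value in stock.items():
    print(key, value)
print(missing, repeated)
```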
Chapter 7: Making your program Interactive
Input ()
When writing your program or creating an application, you may
require the users to enter an input such as their username and other
details. Python provides the input() function that helps you get and
process input from users. Other than entering input, you may require
the users to perform an action so that they may go to the next step.
For example, you may need them to press the Enter key on the
keyboard to be taken to the next step.
Example:
input("\n\n Press Enter key to Leave.")
Just type the above statement in the interactive Python interpreter
then hit the Enter key on the keyboard. You will be prompted to
press the Enter key.
The program waits for an action from the user before it proceeds to
the next step. Notice the use of \n\n, which are the characters that
create new lines. To create one new line, we use a single \n. In this
case, two blank lines will be created. That is how the Python input()
function works.
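input() always hands back what the user typed as a string, so numbers have to be converted before use. The sketch below shows this. The function names ask_age and next_year_message, and the read parameter that stands in for the keyboard, are assumptions made so the example can run without a user.

```python
def ask_age(read=input):
    """Ask for an age and return it as an integer.

    read is a stand-in parameter (an assumption of this sketch) so the
    function can be tried without a keyboard; it defaults to input().
    """
    text = read("How old are you? ")
    return int(text)          # input() always returns a string

def next_year_message(age):
    return "Next year you will be " + str(age + 1) + "."

# Simulate a user typing 30; in a real program call ask_age() with no argument.
age = ask_age(read=lambda prompt: "30")
print(next_year_message(age))
```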

Print ()
Python comes with many built-in functions. A good example of such
a function is the print() function, which we use for displaying
content on the screen. Beyond the built-in functions, it is possible
for us to create our own functions in Python. Such functions are
referred to as “user-defined functions”.
#!/usr/bin/python3
def functionExample():
    print('The function code to run')
    bz = 10 + 23
    print(bz)

functionExample()
Triple Quotes
Before we move into triple quotes, keep in mind that you can also
create a string like 'this'.
>>> 'this'
'this'
We can create the string without having to do anything with it; the
interpreter will tell you what it is. Next, enter the following into
the interpreter, typing three single quotes before and after.
Warning: mixing a double quote with a single quote will give an error
instead.
>>> '''line 1
... line 2'''
'line 1\nline 2'
Notice the \n for a new line. Try it with print in front of it and it
will put the string on two separate lines.
>>> print('''line 1
... line 2''')
line 1
line 2
If you enter \n inside a string, print will display the text that
follows it on a new line.
>>> print('I\ngo')
I
go
The results will be the same with '>>>'. Now a raw string with some
slight changes. Try it at the interpreter.
>>> r'string'
'string'
>>> r'string\string2'
'string\\string2'
>>> r"string\string2"
'string\\string2'
>>> print("string\string2")
string\string2
>>> r"""string\string2"""
'string\\string2'
>>> print(r"""string\string2""")
string\string2
The last example is to show that you can test your expressions as
your scripts progress towards complexity. The string examples
above will become more clear as you progress.
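The interpreter sessions above can also be checked from a script. This is a small sketch with made-up variable names; it shows that a triple-quoted string stores its line break as \n and that a raw string keeps its backslash exactly as typed.

```python
poem = """line 1
line 2"""                      # the line break is stored as \n

same_poem = "line 1\nline 2"   # the same string written with an escape
print(poem == same_poem)

raw = r"string\string2"        # raw string: the backslash is kept as typed
escaped = "string\\string2"    # ordinary string: the backslash is doubled
print(raw == escaped, len(raw))
print(repr(raw))               # repr() shows the doubled backslash
print(raw)                     # print() shows the actual characters
```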

Escape characters
Have a look at the following code:

print("\tHi there")
output:
        Hi there    # tabbed to the right

So what is with the \t?

The backslash ( \ ) character is used to escape characters that need
to be interpreted differently by Python. Sounds like a bit of a
mouthful, right?
Have another look at the output in the example above. Notice how
the text (Hi there) is tabbed to the right. Inserting the escape
character \t at the beginning of the string results in the string
being tabbed to the right.
Adding two escape characters \t would result in the string being
tabbed to the right twice:

print("\t\tHi there")

output:
                Hi there    # tabbed to the right twice

\n is another popular escape character. \n adds a new line to the
string.

print("I'm going for a walk\nin the park\nbecause it is a lovely\n\tday")
output:
I'm going for a walk
in the park                 # adds new lines
because it is a lovely
        day                 # adds new line & a tab to the right

No space is needed between the escape character and the text that
follows it, for example \thiya
Here are some of the most regularly used escape characters in
Python.
Escape character    Description
\n                  New Line
\t                  Horizontal Tab
\\                  Backslash
\'                  Single quote
\"                  Double quote
Let’s have a look at some more escape characters.
The following sentences would result in an error when printed:

print("I said "hello mate" and he totally ignored me")
print('He said he'd be there at 2pm')

In the first example Python thinks that the double quote before
hello is the end of the string, so the rest of the line causes a
syntax error. Likewise, in the second sentence Python takes the
apostrophe in the word he'd to mean the end of the string and throws
an error when it reads the text that follows. One solution is to use
single quotes around the string when you intend to use double quotes
inside it:
print('I said "hello mate" and he totally ignored me')

And use double quotes if you intend to use a lot of apostrophes, as
in he'd, there's etc., in your string:

print("He said he'd be there at 2pm but there's no sign of him")

Alternatively, you can use the escape character \

print ( "I said \"hello mate\" and he totally ignored me")


print ('He said he\'d be there at 2pm but there\'s no sign of him')
output:
I said "hello mate" and he totally ignored me
He said he'd be there at 2pm but there's no sign of him

Now what if you want just to print a backslash \ in Python? Yes, you
also must escape it.

print ("\") #will result in an error


print ("\\") #use a backslash to escape a backslash
output:
\
So how does this work in real life? Have a look at a snippet from a
sample food menu:
“Available drinks include tea\coffee\water”
To print this in Python we need to escape each backslash with the
escape character \

print ("Available drinks include tea\\coffee\\water")


output:
Available drinks include tea\coffee\water
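The escape characters covered in this chapter can be tried together in one sketch. The variable names are chosen for this example; the last line shows that each escape sequence is stored as a single character.

```python
menu = "Available drinks include tea\\coffee\\water"
quote = "I said \"hello mate\" and he totally ignored me"
reply = 'He said he\'d be there at 2pm'
walk = "I'm going for a walk\nin the park\n\tnow"

for text in (menu, quote, reply, walk):
    print(text)

# Each escape sequence counts as a single character in the string.
print(len("\t"), len("\n"), len("\\"), len("\""))
```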
Chapter 8: Making Choices and Decisions
Conditional Statements
In the code samples we have used until now, you may have noticed
that the code follows a strict pattern of execution: each program
runs from top to bottom. Did you know it is possible to alter this
order of execution? Say, for instance, you want the program to make
decisions on its own, performing different actions depending on the
situation that comes up, like printing “Good Morning” or “Good Night”
depending on the time of day.
This is possible in Python and can be achieved with the use of
control flow statements. There are three of these statements in
Python, namely the while, if, and for statements. Let’s discuss each
one of them briefly:

If Statements
The if statement serves as a means of taking control of how the
code that follows it is executed: in this case, a single statement or
an indented block of code. The if statement evaluates the expression
that follows it. Should the expression result in a value considered
to be true, the block is executed. If not, the whole block is
skipped. Doing this allows your Python script to make decisions on
its own based on a range of factors.
Syntax:
if expression:
    # code to run if the expression evaluates as true
Sample:
The following code would display x is bigger than y if x is greater
than y:
x = 5
y = 2
if x > y:
    print("x is bigger than y")

Inline If
The elif statement is used alongside the if...else statement to run
a block of code when one of several conditions is true. As the name
suggests, elif is a mixture of the else and if statements. As with
the else statement, the elif statement extends the if statement to
run another block in the event that the main if expression evaluates
as False. However, unlike the else statement, the elif statement runs
its block only when its own conditional expression evaluates as True.
So, put simply, whenever you wish to run a block of code when one of
many different conditions evaluates to true, the elif statement
should be used.
Syntax
if condition:
    # code to be run if this condition evaluates to true
elif condition:
    # code to be run if the first condition is false and this one is true
else:
    # code to be run if every condition evaluates to false
Sample:
The sample shown below produces “Good morning. Rise and shine!”
if the period of the day is morning, and “Good night! Sleep well.”
when it is night. Otherwise, it produces “Have a great day!”
t = "Morn"
if t == "Morn":
    print("Good morning. Rise and shine!")
elif t == "Ngt":
    print("Good night! Sleep well.")
else:
    print("Have a great day!")
When executed, the result shown below will be outputted:
Good morning. Rise and shine!
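In Python, a chain of decisions like this one is written with if, elif and else. The sketch below wraps the chain in a function so that all three branches can be tried; the function name greeting and the period labels "Morn" and "Ngt" are assumptions carried over from the sample for illustration.

```python
def greeting(period):
    """Return a greeting for a period label; the labels are assumed values."""
    if period == "Morn":
        return "Good morning. Rise and shine!"
    elif period == "Ngt":
        return "Good night! Sleep well."
    else:
        return "Have a great day!"

for label in ("Morn", "Ngt", "Noon"):
    print(label, "->", greeting(label))
```

Only the first matching branch runs; once a condition is true, the remaining elif and else branches are skipped.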

While Loop
This type of loop runs a specific block of code for as long as the
given condition remains true. Once the given condition is no longer
true, the loop ends right away.
This is quite a useful feature, as you may rely on such code to
process information quickly. To give you an idea, suppose the user
has to guess a number and has three tries. You want the program to
ask the user to guess the number. Once the user guesses the wrong
number, the program reduces the number of tries left from three to
two, informs the user that the number is wrong and then asks for
another guess. This continues until either the user guesses the right
number or all the guesses are used up and the user has failed to
identify the number.
Imagine just how many times you would have to write the code over
and over again. Now, thanks to Python, we just type it once
underneath the ‘while’ loop, and the rest is done for us.
Here’s what the syntax for the ‘while’ loop looks like:
while condition:
    code
    code

You begin by typing in the word ‘while’ followed by the condition. We
then add a colon, just like we did for the ‘if’ statement. Whatever
follows is indented to show that it belongs to the loop or the
statement.
Let us create a simple example from this. We start by creating a
variable. Let’s give this variable a name and a value like so:
x = 0
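The guessing game described above can be sketched with a while loop as follows. This is one possible version, not the author's original example: the function name play, the secret number and the guesses list (which stands in for keyboard input) are all assumptions made so the code runs on its own.

```python
def play(secret, guesses, max_tries=3):
    """Run the guessing game; return True if the secret was guessed.

    guesses is any iterable of numbers and stands in for keyboard input,
    an assumption made so the sketch runs without a user.
    """
    tries_left = max_tries
    guesses = iter(guesses)
    while tries_left > 0:              # loop while tries remain
        guess = next(guesses)
        if guess == secret:
            print("Correct!")
            return True
        tries_left = tries_left - 1    # one try used up
        print("Wrong number,", tries_left, "tries left")
    print("Out of tries")
    return False

won = play(secret=7, guesses=[3, 9, 7])
lost = play(secret=7, guesses=[1, 2, 3])
```

In an interactive program, next(guesses) would be replaced by int(input("Guess the number: ")).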

For Loop
In Python, the for...in statement is a looping statement that allows
users to iterate over a sequence of objects. That is, it is used to
go through every item that makes up a sequence. Take note that a
sequence refers to an ordered set of items. Let’s consider a short
code sample. Save the file under the name “for.py”:
for x in range(1, 7):
    print(x)
else:
    print('The for loop is completed')
Output:
$ python for.py
1
2
3
4
5
6
The for loop is completed
How the for statement Works:
In the code sample used above, we print out a sequence of numbers.
This sequence of numbers is generated with the help of the built-in
“range” function. We pass two numbers to it, and the “range” function
returns a sequence of numbers beginning from the first number up to
the second one. For instance, range(1,7) produces the sequence
[1, 2, 3, 4, 5, 6]. By default, range assumes a step count of 1. If
we add a third number to the range, it takes the place of the default
step count. For instance, range(1,7,2) produces the sequence
[1, 3, 5]. Take note that the range reaches up to the second number,
but does not include the second number itself. So, the second number
serves as a boundary the range never reaches or exceeds. Keep in mind
that the range() function generates its numbers one at a time. So, if
you need the full set of numbers at any point, use list() on the
range() function. For instance, list(range(7)) will result in the
sequence [0, 1, 2, 3, 4, 5, 6].
Moving on, the for loop steps in and begins iteration over the range:
for x in range(1,7) is the same as for x in [1, 2, 3, 4, 5, 6]. This
is similar to assigning each number in the sequence to x, one at a
time, and then running the block of code for every value of x. Here,
the block of code simply prints the value. Recall that the else part
of the loop is optional. When it is present, it is executed once
after the for loop has finished, unless the loop was ended early by a
break statement. Also, recall that for...in loops work on all
sequences. Here the sequence of numbers comes from the range
function, but it is possible to use any other sequence containing any
type of object.
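The range() calls described above, and a for loop over sequences other than a range, can be checked with this short sketch. The variable names are chosen for illustration.

```python
print(list(range(1, 7)))       # start 1, stop before 7
print(list(range(1, 7, 2)))    # the same range with a step of 2
print(list(range(7)))          # with one number, counting starts at 0

# for works on any sequence, not only on range().
total = 0
for number in [10, 20, 30]:
    total = total + number

letters = []
for letter in "abc":
    letters.append(letter.upper())

print(total, letters)
```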

Break
The break statement in Python is applied as a breakout strategy
from a loop statement. That is, it is used to stop the running of a
loop statement, even when the condition for looping remains True or
the sequence of objects has not undergone complete iteration. A
point worth noting is that when you apply the break statement to a
while or for loop, any else block attached to that loop remains
unexecuted.
Let’s consider a code sample. Save the file under the name
“break.py”:
while True:
    m = input('Enter something: ')
    if m == 'quit':
        break
    print('Length of the string is', len(m))
print('Completed')
When the code is executed, the result is as follows:
$ python break.py
Enter something: Python is easy to learn
Length of the string is 23
Enter something: When my work is over
Length of the string is 20
Enter something: You could make your work fun:
Length of the string is 29
Enter something: Hello, World!
Length of the string is 13
Enter something: quit
Completed
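The interaction between break and a loop's else block can be seen in the following sketch. The function find_first_negative and its sample lists are hypothetical, introduced only to show both outcomes.

```python
def find_first_negative(numbers):
    """Return a message describing the search for a negative number."""
    for number in numbers:
        if number < 0:
            message = "Found " + str(number)
            break                 # leave the loop; the else block is skipped
    else:
        message = "No negative number"   # runs only if break never happened
    return message

print(find_first_negative([4, 8, -3, 5, -1]))
print(find_first_negative([4, 8, 5]))
```

The loop stops at the first negative number, so -1 in the first list is never examined.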

Continue
In Python, the continue statement is used to tell the program to
skip the remaining statements in the current loop block and continue
with the next loop iteration. Let’s consider a code sample of the
continue statement in use. Save the file as continue.py.
while True:
    j = input('Write something: ')
    if j == 'quit':
        break
    if len(j) < 5:
        print('Entry is too small')
        continue
    print('Entry is of sufficient length')
    # Process other type of things here...
When the code sample above is executed, the result is as follows:
$ python continue.py
Write something: x
Entry is too small
Write something: 515
Entry is too small
Write something: vwxyz
Entry is of sufficient length
Write something: quit
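The same logic can be run without typing anything by looping over a prepared list. In this sketch the entries list stands in for what a user might type; it and the accepted list are assumptions made for illustration.

```python
entries = ["x", "515", "vwxyz", "hello world", "quit", "ignored"]  # sample input
accepted = []

for entry in entries:
    if entry == "quit":
        break                     # stop the loop completely
    if len(entry) < 5:
        print("Entry is too small:", entry)
        continue                  # skip the rest of this iteration
    accepted.append(entry)        # reached only for long enough entries

print(accepted)
```

continue moves on to the next item, while break ends the loop, so the entry after "quit" is never processed.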

Try & Except


The try except blocks was used to manage the error. However, you
or your user can still do something to screw your solution up. For
example:
>>> def div(dividend, divisor):
try:
print(dividend / divisor)
except:
print("Cannot Divide by Zero.")
>>> div(5, "a")
Cannot Divide by Zero.
>>> _
The statement prepared for the “except” block is not enough to justify
the error that was created by the input. Dividing a number by a string
does not warrant a “Cannot Divide by Zero.” message.
For this to work, you need to know more about how to use except
block properly. First of all, you can specify the error that it will
capture and respond to by indicating the exact exception. For
example:
>>> def div(dividend, divisor):
...     try:
...         print(dividend / divisor)
...     except ZeroDivisionError:
...         print("Cannot Divide by Zero.")
...
>>> div(5, 0)
Cannot Divide by Zero.
>>> div(5, "a")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in div
TypeError: unsupported operand type(s) for /: 'int' and 'str'
>>> _
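One way to build on the example above is to give each kind of error
its own except block, so that every message matches the problem that
actually occurred. The sketch below reuses the div name from the
text; returning the result (and None on failure) instead of printing
it is an assumption made here for illustration.

```python
def div(dividend, divisor):
    """Divide two values; return the result, or None if the division fails."""
    try:
        return dividend / divisor
    except ZeroDivisionError:
        # Raised when the divisor is 0
        print("Cannot divide by zero.")
    except TypeError:
        # Raised when one of the arguments is not a number, e.g. a string
        print("Both arguments must be numbers.")
    return None


print(div(10, 4))
div(5, 0)
div(5, "a")
```

Each call now reports the error that really happened, and the
program keeps running instead of stopping with a traceback.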
Chapter 9: Functions and Models
Functions of the regression analysis
• Trend forecasting
• Determining the strength of predictors
• Predicting an effect
Breaking down regression
There are two basic types of regression: linear regression and
multiple regression, although other methods exist for more complex
data and analysis. Linear regression uses one independent variable to
help forecast the outcome of a dependent variable. Multiple
regression, on the other hand, uses two or more independent variables
to predict a result.
Regression is very useful to financial and investment institutions
because it can be used to predict the sales of a particular product or
company based on previous sales, GDP growth, and many other
factors. The capital asset pricing model (CAPM) is one of the most
common regression models applied in finance. Linear and multiple
regression are each described by a simple formula.
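The formulae themselves did not survive in this copy of the text. As
a reminder, the standard textbook forms are sketched below; the
symbol names (Y, X, a, b, u) are a common convention and an
assumption here, not necessarily the notation the author used.

```latex
% Simple linear regression: one independent variable X
Y = a + bX + u

% Multiple regression: independent variables X_1, ..., X_t
Y = a + b_1 X_1 + b_2 X_2 + \dots + b_t X_t + u
```

Here Y is the dependent variable you are trying to predict, each X is
an independent variable, a is the intercept, each b is a slope, and u
is the regression residual (the error term).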

Choosing the best regression model


Selecting the right linear regression model can be very hard and
confusing, and trying to model it with only sample data does not make
it any easier. This section reviews some of the most popular
statistical methods for choosing a model, the challenges that you
might come across, and some practical advice for selecting the
correct regression model.
It always begins with a researcher who would like to understand the
relationship between a response variable and its predictors. The
research team charged with the investigation typically measures a lot
of variables but includes only a few in the model. The analysts try
to leave out the variables that are unrelated and keep only the ones
that have a real relationship with the response. Along the way, the
analysts try out many possible models.
Statistical methods to use to find the best regression model
If you want a good regression model, it is important to consider
the variables that you want to test as well as other variables that
can affect the response.
Adjusted R-squared and Predicted R-squared
Your model should generally have higher adjusted and predicted
R-squared values. These statistics help avoid a critical problem
with regular R-squared, which never decreases when you add a term.
• Adjusted R-squared increases only when a new term improves the
model more than would be expected by chance.
• Predicted R-squared is a form of cross-validation that indicates
how well your model generalizes to other data sets.
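To make the first of these statistics concrete, here is a minimal
sketch that computes R-squared and adjusted R-squared by hand. The
function names and the sample numbers are hypothetical, chosen only
for illustration; predicted R-squared is not shown because it needs
the model to be refitted with each observation left out.

```python
def r_squared(actual, predicted):
    """Share of the variation in the actual values explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize R-squared for the number of predictors in the model."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Made-up data: five observations and a model with one predictor
actual = [3.0, 5.0, 7.0, 9.0, 11.0]
predicted = [2.8, 5.1, 7.2, 8.9, 11.0]

r2 = r_squared(actual, predicted)
adj_r2 = adjusted_r_squared(r2, len(actual), 1)
print(round(r2, 4), round(adj_r2, 4))
```

Adjusted R-squared is always a little lower than R-squared, and the
gap grows as you add predictors that do not pull their weight.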
P-values for the Predictors
In regression, a low p-value denotes a statistically significant
term. “Reducing the model” refers to the process of including all
candidate predictors in the model and then removing the term with
the highest p-value, one at a time, until only significant
predictors remain.
Stepwise regression
This is an automated technique that can identify important
predictors during the exploratory stages of building a model.
Real World Challenges
There are different statistical approaches for choosing the best
model. However, complications still exist.
• The best model can only be as good as the variables measured by
the study.
• The sample data could be unusual, either by chance or because of
the data collection method. False positives and false negatives
are always possible when you work with samples.
• If you try enough models, you’ll find variables that appear
significant but are only correlated by chance.
• P-values can change depending on the specific terms included in
the model.
• Studies have found that best subset regression and stepwise
regression generally do not select the correct model.
Finding the correct Regression Model
Theory
Study the research done by other experts and use it to inform your
model. Before you start regression analysis, you should develop
ideas about which variables are the most significant. Building on
other people’s results also eases the process of collecting data.
Complexity
You may think that complex problems need a complex model. That is
not necessarily the case, because studies show that even a simple
model can provide an accurate prediction. When several models have
the same explanatory power, the simplest one is likely to be the
best choice. Start with a simple model and increase its complexity
only as needed.
How to calculate the accuracy of the predictive model
There are different ways in which you can compute the accuracy of
your model. Some of these methods include:
1. Divide the dataset into a training set and a test set. Build the
model on the training set and use the test set as a holdout sample
to measure how the trained model performs on data it has not seen.
Then compare the predicted values with the actual values by
computing an error measure such as the “Mean Absolute Percent
Error” (MAPE). As a rule of thumb, a MAPE of less than 10% indicates
a great model.
2. Another method is to calculate the “Confusion Matrix” to
compute the False Positive Rate and False Negative Rate. These
measures help you decide whether to accept the model or not. If you
consider the cost of the errors, they become a critical part of
that decision.
3. Computing the Receiver Operating Characteristic (ROC) curve, the
Area Under the Curve (AUC), or the Lift Chart are other methods
that you can use to decide whether to reject or accept a model.
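The first two methods can be sketched in a few lines of plain
Python. The helper names (mape, error_rates) and all of the numbers
below are made up for illustration; in practice the predictions
would come from a model trained on the training set and evaluated
on the held-out test set.

```python
def mape(actual, predicted):
    """Mean Absolute Percent Error, in percent. Assumes no actual value is zero."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


def error_rates(actual, predicted):
    """Return (false positive rate, false negative rate) for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    return fp / (fp + tn), fn / (fn + tp)


# Method 1: compare predictions with the held-out test values
test_actual = [100.0, 200.0, 400.0]
test_predicted = [110.0, 190.0, 400.0]
test_mape = mape(test_actual, test_predicted)
print("MAPE (%):", round(test_mape, 2))

# Method 2: error rates taken from the confusion matrix counts
labels = [1, 0, 1, 1, 0, 0, 1, 0]
guesses = [1, 0, 0, 1, 1, 0, 1, 0]
fpr, fnr = error_rates(labels, guesses)
print("False positive rate:", fpr)
print("False negative rate:", fnr)
```

Whether a given error rate is acceptable depends on what each kind
of mistake costs you, which is why the two rates are reported
separately.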
Chapter 10: How to Work with Files
The next thing that we need to focus on when working with Python is
knowing how to work with and handle files. You may be working with
some data that you want to store while ensuring that it stays
accessible for you to pull up and use when it is needed. You have
some choices in how you save the data, where it is going to be
found, and how it is going to behave in your code.
When you work with files, the data is saved on a disk, so you can
re-use it in your code over and over again as much as you would
like. This chapter covers what we need to do to ensure that files
behave the way that they should.
Python gives you several options for working with files. A good way
to think about this is to compare it to working on a file in Word.
At some point, you save the document that you are working on so
that it doesn’t get lost and you can find it again. Files in Python
are similar, but instead of saving pages as you did in Word, you
are going to save data from your code.
There are a few operations or methods that you can choose from when
it comes to working with files. Some of these options include:
• Closing a file you are working on.
• Creating a brand new file to work on.
• Seeking, which moves the current position to a new location
within an open file.
• Writing new content to a file that was created earlier.
Creating new files
The first task that we are going to look at is creating a file. It
is hard to do any of the other tasks if we don’t first have a file
in place. If you would like to make a new file and then add some
content to it, you first need to open the file from your code, for
example in a script that you run from IDLE. Then you can choose the
mode that you would like to use when you write to it.
When it comes to creating files in Python, there are three modes
that you can work with: append ('a'), exclusive creation ('x'), and
write ('w'). Any time that you would like to open a file and write
fresh content into it, you would use the write mode. This is the
easiest of the three to work with. It creates the file if it does
not exist yet, and if the file already exists, it erases the old
contents first.
The write() method is easy to use and lets you put any information
that you would like into the file. If you would like to see what
you can do with it, open up your editor and enter the following
code:
# file handling operations
# writing to a new file hello.txt
f = open('hello.txt', 'w', encoding='utf-8')
# write() does not add a line break, so each string ends with "\n"
f.write("Hello Python Developers!\n")
f.write("Welcome to Python World\n")
f.flush()
f.close()
From here, we need to discuss the directories that we are working
with. The default directory is always the current directory. You
can store the file in a different directory, but you have to change
to that directory first or include the directory in the file name
that you pass to open(), or the file isn’t going to end up where you
would like.
Whatever directory you were in when you ran the code is the one you
need to go back to when you want to find the file. If you would like
it to show up in a different directory, make sure that you move over
to that one before you run the code.
With the code that we wrote above, when you go to the current
directory (or the directory that you chose for this endeavor), you
will be able to open up the file and see the message that you wrote
out there.
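As a small illustration of how the directory affects where a file
ends up, the sketch below prints the current directory and then
writes hello.txt into a different folder by putting the folder in
the path. Using a temporary folder from the standard tempfile
module is an assumption made here so that the example cleans up
after itself; any folder you have permission to write to would do.

```python
import os
import tempfile

# The directory Python uses when you give open() a bare file name
print("Current directory:", os.getcwd())

# Write into another folder by including that folder in the path
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "hello.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("Hello Python Developers!\n")

    file_exists = os.path.exists(path)
    with open(path, encoding="utf-8") as f:
        content = f.read()

print("File was created:", file_exists)
print(content)
```

The with statement closes each file automatically, so no explicit
f.close() call is needed.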
For this one, we wrote a simple piece of code. You, of course, will
be writing code that is much more complicated as we go along, and
there are going to be times when you would like to overwrite what
is in that file or write several lines at once. This is possible in
Python, and it just needs a small change to the code. A good
example of what you can do includes:
# file handling operations
# writing to a new file hello.txt
f = open('hello.txt', 'w', encoding='utf-8')
f.write("Hello Python Developers!\n")
f.write("Welcome to Python World\n")
mylist = ["Apple\n", "Orange\n", "Banana\n"]
# writelines() is used to write multiple lines into the file;
# it does not add line breaks, so each item carries its own "\n"
f.writelines(mylist)
f.flush()
f.close()
The example above is a good one to use when you want to replace the
contents of a file that you worked on before and write several
lines in one go. Because the file is opened in write mode, whatever
hello.txt held previously is overwritten. The list here just has
some simple words, but you can write anything that you want to the
file; just use the syntax above and change it for what you need.
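The text mentions three modes but only demonstrates write mode. The
sketch below shows how all three behave on the same file. The file
name modes_demo.txt and the temporary folder are hypothetical
choices for this illustration.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "modes_demo.txt")

# 'x' creates a new file and refuses to touch an existing one
with open(path, "x", encoding="utf-8") as f:
    f.write("first line\n")

# 'a' keeps what is already there and adds to the end
with open(path, "a", encoding="utf-8") as f:
    f.write("second line\n")

with open(path, encoding="utf-8") as f:
    after_append = f.read()

# Opening with 'x' again fails because the file now exists
try:
    open(path, "x", encoding="utf-8")
    created_twice = True
except FileExistsError:
    created_twice = False

# 'w' throws away the old contents before writing
with open(path, "w", encoding="utf-8") as f:
    f.write("replaced\n")

with open(path, encoding="utf-8") as f:
    after_write = f.read()

print(after_append)
print(created_twice)
print(after_write)
```

Append mode is the safe choice when you want to add to a file
without losing what it already holds.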
What are the binary files?
One other thing that we need to focus on for a moment before
moving on is the idea of writing your data as a binary file. This
may sound a bit confusing, but it is a simple thing that Python
will allow you to do. A binary file stores raw bytes, the way a
sound or image file does, rather than text.
With Python, you can write any data that you want into a binary
file. It doesn’t matter what kind of data it was in the past, but
you do need to convert it to bytes first so that it can be read
back in the way that you want. The syntax that is needed for this
is below:
# write binary data to a file
# opening the file hello.dat in write binary mode
f = open('hello.dat', 'wb')
# binary files take bytes, so the text is written as byte strings
f.write(b"I am writing data in binary file!\n")
f.write(b"Let's write another list\n")
f.close()
If you use this code, it will create the binary file that you would
like. Binary mode is the right choice whenever the data is not
plain text, because the bytes are stored exactly as you wrote them
and can be read back unchanged when you need them.
Opening your file up
So far, we have worked with writing a new file and getting it saved,
and with writing a binary file as well. These examples covered
the basics of creating files so that you can pull them up any time
that you would like.
Now that this part is done, it is time to learn how to open a file
and read what is in it. When you are ready to see the steps that
are needed to open a file and use it, you will need the following
syntax.
# read binary data from a file
# opening the file hello.dat in read binary mode
with open("hello.dat", 'rb') as f:
    data = f.read()
    # the file holds bytes, so decode them to get text back
    text = data.decode('utf-8')
    print(text)
The output that you would get from running this, assuming that
hello.dat was written by the previous example, would be like the
following:
I am writing data in binary file!
Let's write another list
Seeking a position within a file
And finally, we need to take a look at how you can seek within the
files that you work with. We already looked at how to make files,
how to store them in different formats, and how to open them and
read them back. But there are times when you want to move to a
different location inside a file rather than always starting at
the beginning.
Every open file keeps track of a current position, which is where
the next read or write will take place. For example, if you are
reading a file and the data is not showing up the way that you
would like, perhaps because an earlier read already moved past the
part you need, then the seek option is the way to fix this.
With the seek() method, you can move the current position to any
offset in the file, and with the tell() method you can ask where
the position currently is. Calling f.seek(0), for instance, moves
back to the start of the file so that you can read it again.
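Here is a minimal sketch of seek() and tell() on a file object. The
file name seek_demo.dat and the temporary folder are hypothetical;
the file is opened in binary mode so that positions are plain byte
offsets.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "seek_demo.dat")

with open(path, "wb") as f:
    f.write(b"Hello Python Developers!")

with open(path, "rb") as f:
    first = f.read(5)      # read the first five bytes
    position = f.tell()    # the position is now just past them
    f.seek(6)              # jump to byte offset 6
    word = f.read(6)       # read six bytes starting from there
    f.seek(0)              # go back to the start of the file
    again = f.read(5)      # the same five bytes as the first read

print(first, position, word, again)
```

Note that seek() moves around inside one open file. Moving or
renaming the file itself on disk is a different job, handled by
functions such as os.rename().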
Working through all of the different methods that we have talked
about is going to help you to do a lot of different things inside
of your code. Whether you would like to make a new file, change
its contents, move around within the file, and more, you will be
able to do it all using the code that we have gone through.
Chapter 11: Object Oriented Programming
Python classes make Python an object-oriented language.
Object-oriented programming is one of the most effective approaches
to writing software. You write classes to represent real-life
objects in programs. A class allows you to define the general
behavior of a real-life object, and it is equipped with the
attributes of that object. You can add more traits along the way.
Any real-life object can be modeled with a class. Creating an
individual object from a class is known as instantiation, and the
objects you create are called instances. You will also write
classes that extend the functionality of existing classes.
Object-oriented programming allows you to create different objects.
You will be able to see the world as a programmer does. You can be
a creator of things that exist in your imagination. You will think
logically and write programs that allow you to complete your tasks
effectively and efficiently. Classes make life easier for you as
you move on to complex tasks in your programming life.
Many of the newer programming languages are object oriented. These
are easier to deal with and can be used in a variety of different
ways. Python is one of these object-oriented programming languages,
and you will be able to look at the objects and determine what they
represent. So, if you have a ball inside of the program or the
code, it should match up to the ball that you would find in real
life. This helps to keep things in order, and even a beginner will
be able to recognize how the objects work inside of the code.
With that being said, you will also need to look at the attributes
that are in your code. The attributes are what describe the object.
An excellent way to think about this is to pick out an object, such
as a box. The attributes would be the things that you would use to
describe the object. So, in this case, these are going to be brown,
big, sturdy, square, and so on. These should all make sense to
others who would look at the box and want to describe it. For
example, you would not want to add bouncing or flying to the box,
because these are not attributes that are usually given to a box.
Classes are also going to help you to organize the objects that you
are making. Objects that share the same attributes and behaviors
can all be created from the same class, so that they are easy to
find and work with later on. You are able to put anything that you
would like into a class, but it is better to keep the items inside
of the class closely related, so that they make more sense and the
code stays easy to work with. You may have to think this through a
little before you get started, but you should be able to organize
the objects that you are using into the right classes so that the
program does the work the way that you would like.
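The box described above can be turned into a very small class. This
is only a sketch to show how attributes map to code; the class name
Box, its attribute names, and the describe() method are all
hypothetical choices made for this illustration.

```python
class Box:
    """A simple model of a box, described by a few attributes."""

    def __init__(self, color, size, shape):
        # Attributes describe the object
        self.color = color
        self.size = size
        self.shape = shape

    def describe(self):
        """Return a short description built from the attributes."""
        return "A " + self.size + ", " + self.color + ", " + self.shape + " box"


my_box = Box("brown", "big", "square")
print(my_box.describe())
```

Notice that the class has no bounce() or fly() method, because
those are not things that a box does.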
Creating a Python Class
Python classes can be used to model anything, such as a bird, an
animal, or a human being. In the following example, I will be
working toward a superman class. I will give it certain behaviors
that you might have seen in a superman cartoon or movie. You can
study the code, understand it, and then create your own object on
its basis. Before I create a superman class, I will create an Eagle
class to make things easier for you. Let’s jump into the Python
editor. Each instance that is created from the Eagle class will
have a name and an age. After that, I will give certain behaviors
to the Eagle class, such as flying and attacking prey.
class Eagle():
    """This is a simple try to model an Eagle."""
    def __init__(self, ename, eage):
        """Time to kick name and age attributes."""
        self.ename = ename
        self.eage = eage
    def fly(self):
        """This will simulate the eagle flying to a command."""
        print(self.ename.title() + " is now flying high in the air.")
    def attack(self):
        """This will simulate it to attack a prey in response to a command."""
        print(self.ename.title() + " is attacking a rabbit!")
my_eagle = Eagle('Gamon', 5)
print("My eagle's name is " + my_eagle.ename.title() + ".")
print("The eagle is " + str(my_eagle.eage) + " years old.")
============= RESTART: C:/Users/saifia computers/Desktop/Python.py =============
My eagle's name is Gamon.
The eagle is 5 years old.
>>>
This is an Eagle class. The last three lines of the code create an
instance of the class and print its attributes. The instance is the
actual object on which the methods of the class will act. You can
create as many instances of a Python class as you want to.
class Eagle():
    """This is a simple try to model an Eagle."""
    def __init__(self, ename, eage):
        """Time to kick name and age attributes."""
        self.ename = ename
        self.eage = eage
    def fly(self):
        """This will simulate the eagle flying to a command."""
        print(self.ename.title() + " is now flying high in the air.")
    def attack(self):
        """This will simulate it to attack a prey in response to a command."""
        print(self.ename.title() + " is attacking a rabbit!")
my_eagle = Eagle('Gamon', 5)
print("My eagle's name is " + my_eagle.ename.title() + ".")
print("The eagle is " + str(my_eagle.eage) + " years old.")
my_eagle1 = Eagle('Timmy', 4)
print("My eagle's name is " + my_eagle1.ename.title() + ".")
print("The eagle is " + str(my_eagle1.eage) + " years old.")
my_eagle2 = Eagle('Flyer', 5)
print("My eagle's name is " + my_eagle2.ename.title() + ".")
print("The eagle is " + str(my_eagle2.eage) + " years old.")
============= RESTART: C:/Users/saifia computers/Desktop/Python.py =============
My eagle's name is Gamon.
The eagle is 5 years old.
My eagle's name is Timmy.
The eagle is 4 years old.
My eagle's name is Flyer.
The eagle is 5 years old.
>>>
I have added three instances this time. That’s how you can create
as many instances of a Python class as you want to. I have told
Python to create three eagles that have different names and ages.
When Python reads a line such as my_eagle = Eagle('Gamon', 5), it
calls the __init__() method to set up the new object. The
__init__() method is a special method that Python runs
automatically whenever you create a new instance of a class. There
are two leading and two trailing underscores in the name of the
method. I have given the __init__() method three parameters: self,
ename, and eage. I will add one more in the following example.
In the instances that I have created, I just passed the name and
age of the bird, which the __init__() method then stored as
attributes. I have also added two more methods: one to make the
eagle fly and the other to attack prey. You can add as many methods
as you like. In the following example, I will make the program a
bit more complex. Let’s see how it is done.
class Eagle():
    """This is a simple try to model an Eagle."""
    def __init__(self, ename, eage, ecolor):
        """Time to kick name and age attributes."""
        self.ename = ename
        self.eage = eage
        self.ecolor = ecolor
    def fly(self):
        """This will simulate the eagle flying to a command."""
        print(self.ename.title() + " is now flying high in the air.")
    def attack(self):
        """This will simulate it to attack a prey in response to a command."""
        print(self.ename.title() + " is attacking a rabbit!")
    def rest(self):
        """This will simulate it to rest in response to a command."""
        print(self.ename.title() + " is resting in the nest!")
my_eagle = Eagle('Gamon', 5, 'black')
print("My eagle's name is " + my_eagle.ename.title() + ".")
print("The eagle is " + str(my_eagle.eage) + " years old.")
my_eagle1 = Eagle('Timmy', 4, 'blue')
print("My eagle's name is " + my_eagle1.ename.title() + ".")
print("The eagle is " + str(my_eagle1.eage) + " years old.")
my_eagle2 = Eagle('Flyer', 5, 'grey')
print("My eagle's name is " + my_eagle2.ename.title() + ".")
print("The eagle is " + str(my_eagle2.eage) + " years old.")
============= RESTART: C:/Users/saifia computers/Desktop/Python.py =============
My eagle's name is Gamon.
The eagle is 5 years old.
My eagle's name is Timmy.
The eagle is 4 years old.
My eagle's name is Flyer.
The eagle is 5 years old.
>>>
I have added one more parameter to the __init__() method and one
more method to make the class more interactive. Now it is time to
call the methods to make the eagle do what I want it to do. I will
also add an eat() method, so that the eagle can fly, attack, eat,
and rest in the nest. It is really amazing to see it do the things
that you want it to do.
class Eagle():
    """This is a simple try to model an Eagle."""
    def __init__(self, ename, eage, ecolor):
        """Time to kick name and age attributes."""
        self.ename = ename
        self.eage = eage
        self.ecolor = ecolor
    def fly(self):
        """This will simulate the eagle flying to a command."""
        print(self.ename.title() + " is now flying high in the air.")
    def attack(self):
        """This will simulate it to attack a prey in response to a command."""
        print(self.ename.title() + " is attacking a rabbit!")
    def eat(self):
        """This will simulate it to eat in response to a command."""
        print(self.ename.title() + " is eating the rabbit!")
    def rest(self):
        """This will simulate it to rest in response to a command."""
        print(self.ename.title() + " is resting in the nest!")
my_eagle = Eagle('Gamon', 5, 'black')
print("My eagle's name is " + my_eagle.ename.title() + ".")
print("The eagle is " + str(my_eagle.eage) + " years old.")
my_eagle.fly()
my_eagle.attack()
my_eagle.eat()
my_eagle.rest()
my_eagle1 = Eagle('Timmy', 4, 'blue')
print("My eagle's name is " + my_eagle1.ename.title() + ".")
print("The eagle is " + str(my_eagle1.eage) + " years old.")
my_eagle1.fly()
my_eagle1.attack()
my_eagle1.eat()
my_eagle1.rest()
my_eagle2 = Eagle('Flyer', 5, 'grey')
print("My eagle's name is " + my_eagle2.ename.title() + ".")
print("The eagle is " + str(my_eagle2.eage) + " years old.")
my_eagle2.fly()
my_eagle2.attack()
my_eagle2.eat()
my_eagle2.rest()
============= RESTART: C:/Users/saifia computers/Desktop/Python.py =============
My eagle's name is Gamon.
The eagle is 5 years old.
Gamon is now flying high in the air.
Gamon is attacking a rabbit!
Gamon is eating the rabbit!
Gamon is resting in the nest!
My eagle's name is Timmy.
The eagle is 4 years old.
Timmy is now flying high in the air.
Timmy is attacking a rabbit!
Timmy is eating the rabbit!
Timmy is resting in the nest!
My eagle's name is Flyer.
The eagle is 5 years old.
Flyer is now flying high in the air.
Flyer is attacking a rabbit!
Flyer is eating the rabbit!
Flyer is resting in the nest!
>>>
When you want to call a method, you just need to give the name of
the instance (in my case my_eagle, my_eagle1, or my_eagle2),
followed by a dot and the name of the method with parentheses, as
in my_eagle.fly(). You can give any names to the attributes and
methods, but if they are descriptive, it will help you read through
the code quickly and identify what is wrong if you are receiving
error messages.
Chapter 12: Math and binary
Whether you’re using a simple command prompt, the Jupyter
development environment (check it out, it’s excellent!), or your
favorite Python file editor, you can import the math module by typing
“import math”. This loads the module and makes a number of
mathematical functions available to you. On a related note, other
modules can be imported the same way, so if you’re interested in
how you can extend Python in the future, search for the list of Python
modules. It is quite extensive and diversified, and many of them are
used in fields like data science and machine learning.
One of the functions you’ll have access to is the square root function
we mentioned earlier. This isn’t part of basic Python, but after
importing the module, you can now try the following example:
import math
print(sqrt(36))
Oops! You get an error (a NameError, because Python does not know
the name sqrt on its own). But why? We imported the math module,
didn’t we? Yes, we did, but the problem is we still wrote the
function the same way we would call normal, built-in functions.
When using module functions, we also need to include the name of
the module, followed by a dot and then the function. Here’s how:
import math
x = 36
print(math.sqrt(x))
6.0
Now we have the result of the function, which is 6.0.
The math module we imported adds a great many functions,
including trigonometric, logarithmic, and hyperbolic functions, as
well as constants such as pi. We’re not going to explore all of them
because learning mathematics is beyond the scope of this book, and
only specific fields require advanced notions beyond the standard
operations you learned. But if you’re interested nonetheless and you
have some math knowledge, you can play around with the following
functions:
math.cos(x): This returns the cosine of x, where x is in radians.
math.degrees(x): This function converts the angle x from radians
to degrees.
math.e: The value of e is a constant equal to 2.7182… and it doesn’t
require parentheses. The same goes for pi, which is another
constant. These are exceptions to the rule.
math.log(x, y): This returns the logarithm of x to base y. If you
leave out y, it returns the natural logarithm of x.
math.factorial(x): This returns the factorial of x.
There are a lot more functions included in the math module. If you
plan to pursue a path in data science or machine learning, you
should look them up and brush up on your skills in mathematics.
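The functions listed above can be tried out in one short script.
This is just a sketch for experimenting; the variable names are
hypothetical and the input values were picked so that the results
are easy to check by hand.

```python
import math

root = math.sqrt(36)                # square root of 36
cosine = math.cos(0)                # cosine of 0 radians
half_turn = math.degrees(math.pi)   # pi radians expressed in degrees
log_base_2 = math.log(8, 2)         # logarithm of 8 to base 2
natural_log = math.log(math.e)      # natural logarithm when no base is given
fact = math.factorial(5)            # 5 * 4 * 3 * 2 * 1

print(root, cosine, half_turn)
print(log_base_2, natural_log, fact)
```

Because most of these functions return floating-point numbers, the
results may show tiny rounding differences for other inputs.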
Binary and Text

Files can be classified into two distinct categories:


Binary Files: These files are used to store computer data in the form
of bytes. This is the computer’s native language, so whatever you
see in these files is unreadable to your eyes. Well, actually, you can
learn binary and find out what every combination of 0 and 1 means,
but realistically you don’t want to go through that. On a side note, if
you open this kind of file using a text editor, you’ll see a bunch of
unreadable gibberish. For instance, you can open an image file
inside a text editor and you’ll see characters, just not in any
arrangement that means something to you. Just don’t save and
overwrite that file as text, or you can damage it.
Binary files include the following examples: executable files (.exe,
.bin), images (.jpg, .gif), PDF documents, compressed zip files, mp3
audio files, videos, and fonts.
Text Files: These files are readable because they contain characters.
So when you open them inside an editor, you’ll see the text characters
you’re used to. However, this doesn’t mean you’ll understand what
you’re reading, because they might not be written in a language
that you know.
Text files include the following examples: simple text files like .txt
and .csv, source code files like your Python files, and data files
like .json or .xml.
The file types we mention are the most common ones you’re
undoubtedly familiar with. Every other file type also falls into one
of these two categories.
Before we get started with practical examples, take note that we’re
going to use Visual Studio Code as our coding editor instead of the
usual Jupyter notebook, Vim, or the online Python console. You may
have heard of its sibling, Visual Studio, being used with other
languages like C#, but Visual Studio Code also offers Python
support. You don’t have to use this editor, though. Any will do just
fine. The reason why we’re going to play around with it is that it
has a handy Explorer bar to show us the folder/directory we’re in.
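To see the difference between the two categories from inside
Python, the sketch below writes a short piece of text and then
reads the same file twice, once in text mode and once in binary
mode. The file name sample.txt, the temporary folder, and the
sample word are hypothetical choices for this illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Write four characters; the last one is not plain ASCII
with open(path, "w", encoding="utf-8") as f:
    f.write("café")

# Text mode gives back a str made of characters
with open(path, "r", encoding="utf-8") as f:
    as_text = f.read()

# Binary mode gives back the raw bytes stored on disk
with open(path, "rb") as f:
    as_bytes = f.read()

print(len(as_text), type(as_text))
print(len(as_bytes), as_bytes)
print(as_bytes.decode("utf-8") == as_text)
```

The text has four characters but the file holds five bytes, because
the accented letter takes two bytes in UTF-8. Decoding the bytes
with the right encoding turns them back into the original text.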
Chapter 13: Exercises
In your first program you had a single statement that was printed
with the print function. Keep in mind that you can also print any
number of values, even on the same line, even if they are held in
several variables. This is done with one of the most common
operations you will perform on strings, called concatenation. The
concept is simple: all it involves is linking multiple strings
together. Here’s a simple example:
charRace = "human"
charGender = "male"
print(charRace, charGender)
The output will be “human male”.
As you can see, we have two variables and each one of them holds
a string. We can print both of them by separating the variables with
commas when writing the print statement. Keep in mind that there
are multiple ways you can do this. For instance, if you don’t want to
use variables but you need to concatenate the strings, you can get
rid of the commas inside the print statement. You will notice a little
problem, though. Here’s the example:
print("school" "teacher")
The result is “schoolteacher”. What happened? We didn’t leave any
whitespace. Take note that whitespace can be part of a string just as
numbers and punctuation marks can. If you don’t leave a space, words
will be glued together. The solution is to simply add one blank space
before or after one of the strings, inside the quotes.
Next, let’s see what happens if you try to combine the two
methods and concatenate a variable together with a simple string.
print(charRace "mage")
This is what you will see:
  File "<stdin>", line 1
    print(charRace "mage")
                   ^
SyntaxError: invalid syntax
Congratulations, you got your first syntax error. What’s the problem
here? We tried to perform the concatenation without using any kind
of separator between the two different items. Leaving out the
separator only works between two quoted strings, not between a
variable and a string.
Let’s take a look at one more method frequently used to concatenate
a set of strings. Type the following:
x = “orc”
y = “ mage”
x+y
As you can see, you can apply the + operator when working with
string variables. In this case, we add x to y and achieve string
concatenation. Despite the symbol, no arithmetic takes place: Python
simply builds a new string that contains both pieces. This is a simple
method and works just fine for a handful of strings. Be aware,
though, that every + creates a brand new string, so gluing together a
large number of strings this way, for instance inside a loop, can
waste time and memory. Whenever you work on a more complex
project, code optimization becomes one of your priorities, and that
involves managing the system’s resources properly. Therefore, if you
have to concatenate a large number of string variables, prefer a
method built for the job, such as the join() string method.
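The concatenation methods discussed so far can be put side by side. The following is only a minimal sketch: the variable names race and role are made up for this illustration, and the last two methods (join() and f-strings) are standard Python features that go slightly beyond what this chapter has covered.

```python
# Illustrative names, invented for this sketch.
race = "orc"
role = "mage"

# 1. Two quoted strings placed side by side are glued together.
#    This only works for quoted strings, not for variables.
glued = "school" "teacher"

# 2. The + operator builds a new string out of existing ones.
plus_joined = race + " " + role

# 3. join() combines any number of strings, using the string it is
#    called on (here a single space) as the separator.
joined = " ".join([race, role])

# 4. An f-string places variables directly inside the text.
formatted = f"{race} {role}"

print(glued)
print(plus_joined)
print(joined)
print(formatted)
```

All three of the last methods produce the same text. For two or three strings, pick whichever reads best; join() tends to pay off when the number of pieces is large or not known in advance.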
We’re going to carry on with If and Then in a bit but before we do,
there’s one more thing to consider: comparing variables.
Sometimes it will be useful to look at one variable and then compare
that to another variable. For instance, we might want to compare a
string to a stored password if we’re asking someone to log in.
Alternatively, we might be trying to find out if someone is older or
younger than a certain age.
To do this, we have a few symbols and conventions. To ask if
something ‘equals’ something else, we will use the symbol ‘==’
(using ‘==’ compares two values, whereas a single ‘=’ assigns a
value to a variable). This is what will allow us to test certain
conditions for our IF, THEN statements. This way we can say ‘IF’
password is correct, ‘THEN’ proceed.
For example:
Password = "guest"
Attempt = "guest"
if Attempt == Password:
    print("Password Correct")
This essentially tests the imaginary password attempt against the
true password and only says ‘correct’ when the two strings are the
same. Notice that we aren’t actually using the word ‘then’ at any
point. In some programming languages (such as BASIC) you
actually do write ‘then’ but in most it is implicit. Anything that comes
after the colon is the ‘then’ part, which is just the same way that
loops work! Python is nice and consistent and it’s actually a very
attractive and simple language to look at when you code with it well…
(That’s right – programming languages can be attractive! In fact,
there is even such thing as ‘code poems’!)
We can also use an input to make this a bit more interactive!
Doing this is very easy:
Password = "guest"
Attempt = input("Please enter password: ")
if Attempt == Password:
    print("Password Correct")
Try entering the right password and you should be presented with
the correct message – congrats!
There’s just one problem at the moment, which is that our user will
still be able to get into the program if they get the password wrong!
And there is nothing to tell them that they answered incorrectly…
Fortunately, we can fix this with our next statement: ‘else’.
As you might already have guessed, ‘else’ simply tells us what to do
if the answer is not correct.
This means we can say:
Password = "guest"
Attempt = input("Please enter password: ")
if Attempt == Password:
    print("Password Correct")
else:
    print("Password Incorrect!")
Note that the ‘else’ statement moves back to be in-line with the initial
‘if’ statement. Try entering wrong passwords on purpose now and
the new program will tell you you’ve made a mistake!
Okay, so far so good! But now we have another problem: even
though our user is entering the password wrong and being told as
much, they are still getting to see whatever code comes next:
Password = "guest"
Attempt = input("Please enter password: ")
if Attempt == Password:
    print("Password Correct")
else:
    print("Password Incorrect!")
print("Secret information begins here...")
Of course this somewhat negates the very purpose of having a
password in the first place!
So now we can use something else we learned earlier – the loop!
And better yet, we’re going to use while True, break and continue.
Told you they’d come in handy!
Password = "guest"
while True:
    Attempt = input("Please enter password: ")
    if Attempt == Password:
        print("Password Correct")
        break
    else:
        print("Password Incorrect!")
        continue
print("Secret information begins here...")
Okay, this is starting to get a little more complex and use multiple
concepts at once, so let’s go through it!
Basically, we are now starting a loop that will continue until
interrupted. Each time that loop repeats itself, it starts by asking for
input and waits for the user to try the password. Once it has that
information, it tests the attempt to see if it is correct or not. If it is, it
breaks the loop and the program continues.
If it’s not? Then the loop refreshes and the user has another attempt
to enter their password!
We’ve actually gone on something of a tangent here but you may
recall that the title of this was ‘Comparing Variables’. What if we don’t
want to test whether two variables are the same? What if we want to
find out if one variable is bigger than another? We can ask if
something is ‘bigger’ using the symbol ‘>’ and ask whether it is
smaller using the ‘<’ symbol. This is easy to remember – just look at
the small end and the big end of the character!
Adding an equals sign will make this test inclusive. In other words
‘>=’ means ‘equal or bigger than’.
Likewise, we may also test if two strings are different. We do this like
so: ‘!=’ which basically means ‘not equal to’.
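Here is a short sketch that puts the comparison symbols to work on the age example mentioned earlier. The function name compare_ages and the values 17 and 18 are invented purely for illustration.

```python
def compare_ages(age, limit):
    """Describe how age relates to limit using comparison symbols."""
    if age == limit:
        return "exactly the limit"
    elif age < limit:
        return "younger than the limit"
    else:
        return "older than the limit"


# Hypothetical values, chosen only for this example.
age = 17
limit = 18

print(age != limit)   # are the two values different?
print(age <= limit)   # is age equal to or smaller than limit?
print(age >= limit)   # is age equal to or bigger than limit?
print(compare_ages(age, limit))
```

Each comparison produces either True or False, which is exactly what an ‘if’ statement needs in order to decide which branch to run.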
Using that last example, we can turn our password test on its head
and achieve the exact same end result:
Password = "guest"
while True:
    Attempt = input("Please enter password: ")
    if Attempt != Password:
        print("Password Incorrect!")
        continue
    else:
        print("Password Correct")
        break
print("Secret information begins here...")
Of course when you get programming you’ll find much more useful
ways to use this symbol!
Let’s Make Our First Game!
We’ve talked an awful lot of theory at this point so perhaps it’s time
for us to make our first game! It’s not going to be that much fun,
seeing as you’ll know the answer – but you can get your friends to
play it to impress them with your coding know-how (unfortunately, it’s
still not all that fun even then!).
The game is simply going to get the player to guess the number it is
thinking of and will then give clues to help them get there if they get it
wrong.
CorrectNumber = 16
while True:
    GuessedNumber = int(input("Guess the number I'm thinking of!"))
    if GuessedNumber == CorrectNumber:
        print("Correct!")
        break
    elif GuessedNumber < CorrectNumber:
        print("Too low!")
        continue
    elif GuessedNumber > CorrectNumber:
        print("Too high!")
        continue
print("You WIN!!!")
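One possible way to make the game more fun, even for the person who wrote it, is to let the computer pick the number. The sketch below is a variation on the game above, not a replacement for it. It assumes the standard random module, and the helper names check_guess, read_number and play, as well as the range 1 to 20, are made up for this example. It also copes with a player who types something that isn't a number, which would crash the original version.

```python
import random


def check_guess(guess, correct):
    """Return the clue for a single guess."""
    if guess == correct:
        return "Correct!"
    elif guess < correct:
        return "Too low!"
    else:
        return "Too high!"


def read_number(prompt):
    """Keep asking until the player types a whole number."""
    while True:
        text = input(prompt)
        try:
            return int(text)
        except ValueError:
            # int() raises ValueError for text such as "ten" or "".
            print("Please type a whole number.")


def play(low=1, high=20):
    """Run one game with a secret number between low and high."""
    correct = random.randint(low, high)  # both ends are included
    while True:
        guess = read_number("Guess the number I'm thinking of! ")
        clue = check_guess(guess, correct)
        print(clue)
        if clue == "Correct!":
            break
    print("You WIN!!!")
```

Type play() on a new line to start a game. Keeping the comparison in its own function, check_guess, means you can try it out on its own without having to play through a whole game.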
Conclusion
Now that we have come to the end, I hope you have gathered a
basic understanding of what machine learning is and how you can
build a machine learning model in Python. One of the best ways to
begin building a machine learning model is to practice the code, and
also try to write similar code to solve other problems. It is important
to remember that the more you practice, the better you will get. The
best way to go about this is to begin working on simple problem
statements and solve them using the different algorithms. You can
also try to solve these problems by identifying newer ways to solve
the problem. Once you get a hang of the basic problems, you can try
using some advanced methods to solve those problems.
Thanks for reading to the end!
Python Machine Learning may be the answer that you are looking for
when it comes to all of these needs and more. It is a simple process
that can teach your machine how to learn on its own, similar to what
the human mind can do, but much faster and more efficient. It has
been a game-changer in many industries, and this guide tried to
show you the exact steps that you can take to make this happen.
There is just so much that a programmer can do when it comes to
using Machine Learning in their coding, and when you add it
together with the Python coding language, you can take it even
further, even as a beginner.
The next step is to start putting some of the knowledge in
this guide to good use. There are a lot of great things that you can
do when it comes to Machine Learning, and when we can combine it
with the Python language, there is nothing that we can’t do when it
comes to training our machine or our computer.
This guide took some time to explore a lot of the different things that
you can do when it comes to Python Machine Learning. We looked
at what Machine Learning is all about, how to work with it, and even
a crash course on using the Python language for the first time. Once
that was done, we moved right into combining the two of these to
work with a variety of Python libraries to get the work done.
You should always work towards exploring different functions and
features in Python, and also try to learn more about the different
libraries like SciPy, NumPy, PyRobotics, and Graphical User
Interface packages that you will be using to build different models.
Python is a high-level language which is both interpreted and
object-oriented. This makes it easy for anybody to understand how
the language works. You can also extend the programs that you
build in Python onto other platforms. Most of the inbuilt libraries in
Python offer a variety of functions that make it easier to work with
large data sets.
You will now have gathered that machine learning is a complex
concept that can easily be understood. It is not a black box that has
undecipherable terms, incomprehensible graphs, or difficult
concepts. Machine learning is easy to understand, and I hope it has
helped you understand the basics of machine learning. You can now
begin working on programming and building models in Python.
Ensure that you diligently practice since that is the only way you can
improve your skills as a programmer.
If you have ever wanted to learn how to work with the Python coding
language, or you want to see what Machine Learning can do for you,
then this guide is the ultimate tool that you need! Take a chance to
read through it and see just how powerful Python Machine Learning
can be for you.
Python Data Science

THE COMPLETE GUIDE TO DATA ANALYTICS + MACHINE LEARNING
+ BIG DATA SCIENCE + PANDAS PYTHON. THE EASY WAY TO
PROGRAMMING (EXERCISES INCLUDED).
Introduction
Data Science might be a relatively new multi-disciplinary field.
However, its integral parts have been individually studied by
mathematicians and IT professionals for decades. Some of these
core elements include machine learning, graph analysis, linear
algebra, computational linguistics, and much more. Because of this
seemingly wild combination of mathematics, data communication,
and software engineering, the domain of data science is highly
versatile. Keep in mind that not all data scientists are the same.
Each one of them specializes based on competency and area of
expertise. With that in mind, you might be asking yourself now what's
the most important, or most powerful, tool for anyone aiming to
become a data scientist.
This book will focus on the use of Python because this tool is highly
appreciated within the community of data scientists, and it's easy to
start with. This is a highly versatile programming language that is
used in a wide variety of technical fields, including software
development and production. It is powerful, easy to understand, and
can handle any kind of program, whether small or complex.
Python started out in 1991, and it has nothing to do with snakes. As
a fun fact, this programming language loved by both beginners and
professionals was named this way because its creator was a big fan
of Monty Python, a British comedy group. If you're also one of their
fans, you might notice several references to them inside the code, as
well as the language's documentation. But enough about trivia -
we're going to focus on Python due to its ability to develop quick
experimentations and deploy scientific applications. Here are some
of the other core features that explain why Python is the way to go
when learning data science:
Integration: Python can integrate many other tools and even
code written in other programming languages. It can act as a
unifying force that brings together algorithms, data strategies,
and languages.
Versatility: Are you a complete beginner who never learned
any kind of programming language, whether procedural or
object-oriented? No problem, Python is considered by many
to be the best tool for aspiring data scientists to grasp the
concepts of programming. You can start coding as soon as
you learn the basics!
Power: Python offers every tool you need for data analysis
and more. There is an increasing number of packages and
external tools that can be imported into Python to extend its
usability. The possibilities are truly endless, and that is one of
the reasons why this programming language is so popular in
diverse technical fields, including data science.
Cross-Platform Compatibility: Portability is not a problem, no
matter the platform. Programs and tools written in Python will
work on Windows, Mac, as well as Linux and its many
distributions.

Python is a Jack of all trades, master of everything. It is easy to learn,
powerful, and easy to integrate with any other tools and languages,
and that is why this book will focus on it when discussing data
science and its many aspects. Now let’s begin by installing Python.
Chapter 1 Installing Python
Since many aspiring data scientists never used Python before, we’re
going to discuss the installation process to familiarize you with
various packages and distributions that you will need later.
Before we begin, it's worth noting that there are two major versions
of Python, namely Python 2 and Python 3. Python 3 is the one to
use. Some older data science code still relies on Python 2, but the
shift to version 3 has been building up for years. What's important to
keep in mind is that there are various compatibility issues between
the two versions. This means that if you write a program using
Python 2 and then run it inside a Python 3 interpreter, the code might
not work. The developers behind Python ended support for Python 2
on January 1, 2020, so version 3 is the only one that is being
developed and improved. With that being said, let's go through the
step-by-step installation process.
Step by Step Setup
Start by going to Python's webpage at www.python.org and
download Python. Next, we will go through the manual installation,
which requires several steps and instructions. It is not obligatory to
set up Python manually. However, this gives you great control over
the installation, and it's important for future installations that you will
perform independently, depending on each of your projects'
specifications. The easier way of installing Python is by automatically
installing a scientific data distribution, which sets you up with all the
packages and tools you may need (including a lot that you won't
need). Therefore, if you wish to go through the simplified installation
method, head down to the section about scientific distributions.
When you download Python from the developer's website, make
sure to choose the correct installer depending on your machine's
operating system. Afterward, simply run the installer. Python is now
installed. However, it is not quite ready for our purposes. We will now
have to install various packages. The easiest way to do this is to
open the command console and type "pip" to bring up the package
manager. The "easy_install" package manager is an older
alternative that has since been deprecated, and pip is widely
considered an improvement. If you run the command and nothing
happens, it means that you need to download and install the
package manager first. Just head to its website and go through a
basic installation process to get it. But why bother with a package
manager as a beginner?
A package manager like "pip" will make it a lot easier for you to
install/uninstall packages, or roll them back if the package version
causes some incompatibility issues or errors. Because of this
advantage of streamlining the process, most new Python
installations come with pip pre-installed. Now let's learn how to install
a package. If you chose "pip," simply type the following line in the
command console:
pip install <package_name>
If you chose "Easy Install," the process remains the same. Just type:
easy_install <package_name>
Once the command is given, the specified package will be
downloaded and installed together with any other dependencies it
requires in order to run. We will go over the most important packages
that you will require in a later section. For now, it’s enough to
understand the basic setup process.
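After installing a package, you may want to confirm from inside Python that it really is there. The snippet below is a small sketch that assumes Python 3.8 or newer, where the standard importlib.metadata module is available. The helper name installed_version is made up for this example.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Use the same names you would pass to "pip install".
for name in ("pip", "numpy"):
    print(name, installed_version(name))
```

A version number means the package is installed for the Python interpreter that ran the script. None means it still needs to be installed with pip.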

Scientific Distributions
As you can see in the previous section, building your working
environment can be somewhat time-consuming. After installing
Python, you need to choose the packages you need for your project
and install them one at a time. Installing many different packages
and tools can lead to failed installations and errors. This can often
result in a massive loss of time for an aspiring data scientist who
doesn't fully understand the subtleties behind certain errors. Finding
solutions to them isn't always straightforward. This is why you have
the option of directly downloading and installing a scientific
distribution.
Automatically building and setting up your environment can save you
from spending time and frustration on installations and allow you to
jump straight in. A scientific distribution usually contains all the
libraries you need, an Integrated Development Environment (IDE),
and various tools. Let’s discuss the most popular distributions and
their application.
Anaconda

This is probably the most complete scientific distribution offered by
Continuum Analytics (now Anaconda, Inc.). It comes with close to
200 packages pre-installed, including Matplotlib, Scikit-learn, NumPy,
pandas, and more (we'll discuss these packages a bit later).
Anaconda can be used on any machine, no matter the operating
system, and can be installed next to any other distributions. The
purpose is to offer the user everything they need for analytics,
scientific computing, and mass-processing. It's also worth
mentioning that it comes with its own package manager pre-installed,
ready for you to use in order to manage packages. This is a powerful
distribution, and luckily it can be downloaded and installed for free.
However, there is an advanced version that requires purchase.
If you use Anaconda, you will be able to access “conda” in order to
install, update, or remove various packages. This package manager
can also be used to create virtual environments (more on that later).
For now, let’s focus on the commands. First, you need to make sure
you are running the latest version of conda. You can check and
update by typing the following command in the command line:
conda update conda
Now, let’s say you know which package you want to install. Type the
following command:
conda install <package_name>
If you want to install multiple packages, you can list them one after
another in the same command line. Here’s an example:
conda install <package_number_1> <package_number_2> <package_number_3>
Next, you might need to update some existing packages. This can
be done with the following command:
conda update <package_name>
You also have the ability to update all the packages at once. Simply
type:
conda update --all
The last basic command you should be aware of for now is the one
for package removal. Type the following command to uninstall a
certain package:
conda remove <package_name>
This tool is similar to "pip" and "easy install," and even though it's
usually included with Anaconda, it can also be installed separately
because it works with other scientific distributions as well.
Canopy

This is another scientific distribution, popular because it’s aimed
towards data scientists and analysts. It also comes with around 200
pre-installed packages and includes the most popular ones you will
use later, such as Matplotlib and pandas. If you choose to use this
distribution instead of Anaconda, type the following command to
install it:
canopy_cli
Keep in mind that you will only have access to the basic version of
Canopy without paying. If you ever require its advanced features,
you will have to download and install the full version.
WinPython

If you are running on a Windows operating system, you might want
to give WinPython a try. This distribution offers similar features to
the ones we discussed earlier. However, it is community-driven. This
means that it's an open-source tool that is entirely free.
You can also install multiple versions of it on the same machine, and
it comes with an IDE pre-installed.

Virtual Environments
Virtual environments are often necessary because you are usually
locked to the version of Python you installed. It doesn’t matter
whether you installed everything manually or you chose to use a
distribution: you usually can’t have as many installations on the
same machine as you might want. One exception is the WinPython
distribution, which is available only for Windows machines and
allows you to prepare as many installations as you want. However,
you can create a virtual environment with the "virtualenv" tool. This
lets you create as many different installations as you need without
worrying about any kind of limitations. Here are a few solid reasons
why you should choose a virtual environment:
Testing grounds: It allows you to create a special
environment where you can experiment with different
libraries, modules, Python versions, and so on. This way, you
can test anything you can think of without causing any
irreversible damage.
Different versions: There are cases when you need multiple
installations of Python on your computer. There are packages
and tools, for instance, that only work with a certain version.
For instance, if you are running Windows, there are a few
useful packages that will only behave correctly if you are
running Python 3.4, which isn’t the most recent version.
Through a virtual environment, you can run different versions
of Python for separate goals.
Replicability: Use a virtual environment to make sure you can
run your project on any other computer or version of Python,
aside from the one you were originally using. You might be
required to run your prototype on a certain operating system
or Python installation, instead of the one you are using on
your own computer. With the help of a virtual environment,
you can easily replicate your project and see if it runs under
different circumstances.

With that being said, let’s start installing a virtual environment by
typing the following command:
pip install virtualenv
This will install "virtualenv". However, you will first need to make
several decisions before creating the virtual environment:
Python version: Decide which version you want “virtualenv” to
use. By default, it will pick up the one it was installed from.
Therefore, if you want to use another Python version, you
have to specify it by typing -p python3.4, for instance.
Package installation: By default, each new environment starts
without the packages you already have installed on your
system, so they have to be installed again for each
environment. This can lead to a loss of time and resources.
To avoid this issue, you can use the --system-site-packages
option to give the environment access to the packages
already available on your system.
Relocation: For some projects, you might need to move your
virtual environment to a different Python setup or even to
another computer. In that case, you will have to instruct the
tool to make the environment scripts work on any path. Older
releases of virtualenv offered the --relocatable option for this,
but it may not be available in the version you install.

Once you make all the above decisions, you can finally create a new
environment. Type the following command:
virtualenv myenv
This instruction will create a new directory called “myenv” inside the
location, or directory, where you currently are. Once the virtual
environment is created, you need to launch it. On Windows, type
these lines:
cd myenv
Scripts\activate
On macOS or Linux, type these lines instead:
cd myenv
source bin/activate
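Once an environment is activated, it can be reassuring to check from inside Python that you really are working in it. The snippet below is a minimal sketch using only the standard sys module. It assumes an environment created by a recent virtualenv release or by Python's built-in venv module, and the helper name in_virtual_environment is made up for this example.

```python
import sys


def in_virtual_environment():
    """Return True when Python is running from a virtual environment."""
    # Inside an environment, sys.prefix points at the environment's
    # directory, while sys.base_prefix still points at the original
    # Python installation. Outside one, the two are identical.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtual_environment())
```

If the second line reports False right after activation, the command console is most likely still using the system-wide Python rather than the one inside “myenv”.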

Necessary Packages
We discussed earlier that the advantages of using Python for data
science are its system compatibility and highly developed system of
packages. An aspiring data scientist will require a diverse set of tools
for their projects. The analytical packages we are going to talk about
have been highly polished and thoroughly tested over the years, and
therefore are used by the majority of data scientists, analysts, and
engineers.
Here are the most important packages you will need to install for
most of your work:
NumPy: This analytical library provides the user with support
for multi-dimensional arrays, including the mathematical
algorithms needed to operate on them. Arrays are used for
storing data, as well as for fast matrix operations that are
much needed to work out many data science problems.
Python wasn't meant for numerical computing. Therefore,
every data scientist needs a package like NumPy to extend
the programming language to include the use of many
high-level mathematical functions. Install this tool by typing
the following command: pip install numpy.
SciPy: You can't read about NumPy without hearing about
SciPy. Why? Because the two complement each other. SciPy
is needed to enable the use of algorithms for image
processing, linear algebra, matrices, and more. Install this
tool by typing the following command: pip install scipy.
pandas: This library is needed mostly for handling diverse
data tables. Install pandas to be able to load data from any
source and manipulate as needed. Install this tool by typing
the following command: pip install pandas.
Scikit-learn: A much-needed tool for data science and
machine learning, Scikit-learn is probably the most important
package in your toolkit. It is required for data preprocessing,
error metrics, supervised and unsupervised learning, and
much more. Install this tool by typing the following command:
pip install scikit-learn.
Matplotlib: This package contains everything you need to
build plots from an array. You also have the ability to visualize
them interactively. You don’t happen to know what a plot is? It
is a graph used in statistics and data analysis to display the
relation between variables. This makes Matplotlib an
indispensable library for Python. Install this tool by typing the
following command: pip install matplotlib.
Jupyter: No data scientist's toolkit is complete without Jupyter.
This package is essentially an IDE (though much more) used
in data science and machine learning everywhere. Unlike
IDEs such as Atom or RStudio, Jupyter can be used with
many programming languages. It is both powerful and
versatile because it provides the user with the ability to
perform data visualization in the same environment, and
allows customizable commands. Not only that, it also
promotes collaboration due to its streamlined method of
sharing documents. Install this tool by typing the following
command: pip install jupyter.
Beautiful Soup: This library extracts information from HTML
and XML files, such as web pages that you have access to
online. Install this tool by typing the
following command: pip install beautifulsoup4.
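One detail that often trips up beginners is that the name you give to pip is not always the name you use in an import statement. The sketch below checks which of the seven packages can be imported in your current Python. The helper name check_packages is made up for this example, and "jupyter_core" is assumed here as a representative module that the jupyter package brings along.

```python
import importlib.util

# Name used with "pip install" -> name used in an import statement.
PACKAGES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "jupyter": "jupyter_core",  # assumption: pulled in by jupyter
    "beautifulsoup4": "bs4",
}


def check_packages(packages=PACKAGES):
    """Return a dictionary saying which packages can be imported."""
    status = {}
    for pip_name, module_name in packages.items():
        # find_spec looks a module up without actually importing it.
        found = importlib.util.find_spec(module_name) is not None
        status[pip_name] = found
    return status


for pip_name, available in check_packages().items():
    print(pip_name, "installed" if available else "missing")
```

Anything reported as missing can be installed with the matching pip command from the list above.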

For now, these seven packages should be enough to get you started
and give you an idea of how to extend Python's abilities. You don't
have to overwhelm yourself just yet by installing all of them.
However, feel free to explore and experiment on your own. We will
mention and discuss more packages later in the book as needed to
solve our data science problems. But for now, we need to focus
more on Jupyter, because it will be used throughout the book. So
let’s go through the installation, special commands, and learn how
this tool can help you as an aspiring data scientist.
Using Jupyter
Throughout this book, we will use Jupyter to illustrate various
operations we perform and their results. If you didn’t install it yet,
let’s start by typing the following command:
pip install jupyter
The installation itself is straightforward: pip downloads Jupyter
together with everything it depends on, so there is no separate
installer to choose. Once the setup finishes, we can
run the program by typing the next line:
jupyter notebook
This will open an instance of Jupyter inside your browser. Next, click
on “New” and select the version of Python you are running. As
mentioned earlier, we are going to focus on Python 3. Now you will
see an empty window where you can type your commands.
You might notice that Jupyter uses code cell blocks instead of
looking like a regular text editor. That’s because the program will
execute code cell by cell. This allows you to test and experiment with
parts of your code instead of your entire program. With that being
said, let’s give it a test run and type the following line inside the cell:
In: print("I'm running a test!")
Now you can click on the play button that is located under the Cell
tab. This will run your code and give you output, and then a new
input cell will appear. You can also create more cells by hitting the
plus button in the menu. To make it clearer, a typical block looks
something like this:
In: < This is where you type your code >
Out: < This is the output you will receive >
The idea is to type your code inside the "In" section and then run it.
Jupyter fills in the "Out" section for you. In this book, the "Out" lines
printed next to an example show the result you should expect to
receive, so when you run the code yourself, you can compare the
true result against them. This way, you can also test to see if the
code gives you the result you expect.
Chapter 14: Python Libraries to Help with Data Science
Python is one of the best coding languages to work with when you
want to do data science. But the standard library that comes
installed with Python cannot handle all of the work that needs to be
done in this field. This doesn’t mean that you are stuck, though.
There are many extensions and other libraries that work with Python
and can do some wonderful things when it comes to data science.
When you are ready to start analyzing the data that you have
collected and learn some valuable insights from it, here are some of
the best libraries that work with Python.
NumPy and SciPy
The first Python libraries for data science that we are going to take
a look at are NumPy, short for Numerical Python, and SciPy. NumPy
is useful because it lays down the foundation that we need for
scientific computing in Python. It gives us functions that are
precompiled and fast to help with numerical and mathematical
routines as needed.
In addition to the benefits listed above, NumPy adds a powerful data
structure to Python: the multi-dimensional array. This makes it easier
for us to compute efficiently with matrices and arrays that are
multi-dimensional.
Scientific Python, which is known as SciPy, is closely linked with
NumPy, and you will rarely see one without the other. SciPy builds on
NumPy's arrays and extends them with useful functions for
minimization, regression, and more.
When you want to work with these two libraries, install the NumPy
library first and get it set up and ready to work with Python. From
there, you can install the SciPy library and start using Python for any
of your goals or projects that involve data science.
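To make this more concrete, here is a minimal sketch of the two libraries working together. It assumes both NumPy and SciPy are already installed. The numbers and the function being minimized are invented for this example.

```python
import numpy as np
from scipy import optimize

# A two-dimensional NumPy array (a 2 x 2 matrix) and a vector.
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
vector = np.array([1.0, 1.0])

# Matrix-vector multiplication runs in fast, precompiled code.
product = matrix @ vector

# SciPy builds on NumPy. Here it searches for the value of x that
# makes (x - 2) squared as small as possible.
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)

print(matrix.shape)
print(product)
print(result.x)
```

The search should settle very close to x = 2, the point where the squared expression reaches zero.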
Pandas
The second Python library that we can use to help out with data
science is Pandas, or Python Data Analysis Library. As the name
suggests, it is built for analyzing data.
Pandas is an open-source tool that provides us with data structures
that are easy to use and high in performance, and it comes with the
tools that you need to complete a data analysis in Python. You can
use this library for data analysis no matter what kind you would like
to do. Industries that like to work with this Python library include
engineering, social science, statistics, and finance.
The best part about using this library is that it is adaptable, which
helps us to get more work done. It also works with any kind of data
that you were able to collect for it, including uncategorized, messy,
unstructured, and incomplete data. Even once you have the data,
this library is going to step in and help provide us with all of the tools
that we need to slice, reshape, merge, and more all of the sets of
data we have.
Pandas comes with a variety of features that make it a good fit for
data science. Some of the best features of the Pandas library
include:
1. You can use the Pandas library to reshape the
structures of your data.
2. You can use the Pandas library to label series, as well
as tabular data, so that the data is aligned
automatically.
3. You can use the Pandas library for heterogeneous
indexing of the data, as well as systematic labeling of
the data.
4. You can use this library to identify and then fix any
data that is missing.
5. This library provides the ability to load and then
save data in more than one format.
6. You can easily take data structures that come out of
Python and NumPy and convert them into Pandas
objects.

Matplotlib
When you work on data science, after gathering and analyzing all of
the data that is available, you also need a good way to present that
information to others so they can gain the insights quickly.
Visualizations, chosen to suit the kind of data you are working with,
make it easier to see what information was gathered and how the
different parts fit together.
This is where Matplotlib comes in handy. It is a 2D plotting library
for Python that can produce publication-quality figures in a variety
of formats. It also offers interactive environments across many
platforms. This library can be used in Python scripts, the Python and
IPython shells, the Jupyter notebook, four graphical interface tool
kits, and many servers for web applications.
Matplotlib helps with data science by generating the visualizations
that we need for our data and for the results that we get out of the
data. This library can generate scatterplots, error charts, bar
charts, power spectra, histograms, and plots, to name a few. If you
need a chart or graph to go along with your data analysis, make sure
to check out what Matplotlib can do for you.
Scikit-Learn
Scikit-Learn is a Python module that provides many of the
state-of-the-art algorithms that are found in machine learning. It is
best suited to medium-scale unsupervised and supervised machine
learning problems, which gives it a wide range of applications.
Among the libraries that we have talked about in this guidebook,
Scikit-Learn is one of the best options in Python when it comes to
machine learning. This package focuses on bringing machine learning
to non-specialists using a general-purpose high-level language. Its
primary emphasis is on ease of use, performance, documentation, and
the consistency of the API.
Another benefit of this library is that it has a minimal number of
dependencies and is easy to distribute. You will find it in many
commercial and academic settings. Scikit-Learn exposes a consistent
and concise interface to the most common machine learning
algorithms, which makes it easier to add machine learning to the
data science that you are working with.
Theano
Theano is another great library to work with during data science, and
it is often seen as one of the highly-rated libraries to get this work
done. This library lets you define, optimize, and then evaluate many
different types of mathematical expressions that involve
multi-dimensional arrays in an efficient manner. It can make use of
GPUs and perform symbolic differentiation efficiently.
Theano is a great library to learn, but it does come with a learning
curve that is pretty steep, even for people who already know Python,
because declaring variables and building functions in Theano is quite
different from what you learn in ordinary Python.
However, this doesn’t mean that the process is impossible. It just
means that you need to take a bit longer to learn it. With some good
tutorials and examples, it is possible for someone who is brand new
to Theano to get the coding done. Familiarity with other Python
libraries, including Pandas and NumPy, makes this a bit easier as
well.
TensorFlow
TensorFlow, one of the best Python libraries for data science, is a
library that was released by Google Brain. It is written mostly in
C++ and includes bindings for Python, so performance is not something
that you need to worry about. One of its best features is a flexible
architecture, which allows the programmer to deploy it with one or
more GPUs or CPUs in a desktop, mobile, or server device, while
using the same API the whole time.
Not many, if any, of the other libraries covered in this chapter can
make this kind of claim. You do need to spend a bit more time
learning the API compared to some of the other libraries. Once you
have done so, you will find that it is possible to use TensorFlow to
implement the design of your network without having to fight through
the API like you do with other options.
Keras
Keras is an open-source library for Python that helps you build your
own neural networks through a high-level interface. It is fairly
minimalistic, which makes it easier to work with, and code written
with this library is simple and straightforward, while still offering
the high-level extensibility that you need. It can use TensorFlow,
Theano, or CNTK as the backend.
The API that comes with Keras is designed for humans rather than
machines, which makes it easier to use and puts the experience of the
user first.
Keras follows best practices for reducing cognitive load. It offers
consistent and simple APIs that minimize how many actions the user
has to take for many of the common parts of the code, and it provides
feedback that is actionable and clear if an error does show up.
In this library, a model is understood as a sequence or a graph of
standalone, fully-configurable modules that you can put together with
very few restrictions. Neural layers, optimizers, activation
functions, initialization schemes, cost functions, and regularization
schemes are all examples of the standalone modules that are combined
to create a new model. Keras also makes creating a new module simple,
and the existing modules provide lots of examples to work with.
Caffe
The final Python library that we will look at for data science is
Caffe. This is a good machine learning library to work with when you
want to focus on computer vision. Programmers use it to create deep
neural networks that recognize objects found in images, and it has
also been explored as a way to recognize visual style.
Caffe integrates seamlessly with GPU training and is highly
recommended any time you would like to train on images. Although this
library is preferred for research and academic work, it also has
plenty of scope for training models for production. Its expressive
architecture encourages application and innovation.
In this library, models are defined and optimized through
configuration, without hard coding. You can switch between the CPU
and the GPU by setting a single flag, train on a GPU machine, and
then deploy to commodity clusters or even to mobile devices.
These are just a few of the libraries that you can use with Python
when you want to explore data science. While the standard library
that comes with the original Python download cannot handle every part
of data science, you can easily download and add these other
libraries and see which steps they can help with when it comes to
gathering, cleaning, analyzing, and using your data.
Chapter 15: Python Functions
Python functions are a good way of organizing the structure of our
code. Functions group sections of code that are related. In any
programming language, their job is to improve the modularity of code
and make it possible to reuse code.
Python comes with many built-in functions. A good example is the
“print()” function, which we use for displaying content on the
screen. Beyond these, it is possible for us to create our own
functions in Python. Such functions are referred to as
“user-defined functions”.
To define a function, we use the “def” keyword, followed by the name
of the function and then parentheses (()).
The parameters, or input arguments, are placed inside the
parentheses. The function body, or code block, follows a colon (:)
and has to be indented. Note that by default, arguments are
positional. This means that they must be passed in the order in which
the parameters were defined.
Example:
#!/usr/bin/python3
def functionExample():
    print('The function code to run')
    bz = 10 + 23
    print(bz)
We have defined a function named functionExample. The parameters of
a function act like variables inside the function. Parameters are
added inside the parentheses, but the function above has none. When
you run the above code, nothing will happen, since we simply defined
the function and specified what it should do. The function can be
called as shown below:
#!/usr/bin/python3
def functionExample():
    print('The function code to run')
    bz = 10 + 23
functionExample()
It will print this:
The function code to run
That is how we can have a basic Python function.
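The earlier remark that arguments are positional by default can be
seen in a small sketch. The function name describe_order and its
parameters are made up for this illustration and do not appear in the
chapter.

```python
# Hypothetical helper, used only to illustrate argument order.
def describe_order(item, quantity):
    """Build a short text from the two arguments."""
    return f"{quantity} x {item}"

# Positional call: values are matched to parameters by position.
print(describe_order("apple", 3))

# Swapping the positions silently swaps the meaning of the values.
print(describe_order(3, "apple"))

# Keyword call: naming the parameters makes the order irrelevant.
print(describe_order(quantity=3, item="apple"))
```

Keyword arguments are optional, but they can make a call easier to
read when a function takes several values.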

Function Parameters
A function can take arguments, which are listed as parameters in its
definition. Example:
#!/usr/bin/python3
def additionFunction(n1, n2):
    result = n1 + n2
    print('The first number is', n1)
    print('The second number is', n2)
    print("The sum is", result)
additionFunction(10, 5)
The code prints the following result:
The first number is 10
The second number is 5
The sum is 15
We defined a function named additionFunction. The function takes two
parameters, namely n1 and n2. We have another variable named result,
which is the sum of the two function parameters. In the last
statement, we have called the function and passed the values for the
two parameters. The function calculates the value of the variable
result by adding the two numbers. We finally get the result shown
above.
Note that during our function definition, we specified two parameters,
n1 and n2. Try to call the function with either more than two
arguments or only one argument and see what happens. Example:
#!/usr/bin/python3
def additionFunction(n1, n2):
    result = n1 + n2
    print('The first number is', n1)
    print('The second number is', n2)
    print("The sum is", result)
additionFunction(5)
In the last statement in our code above, we have passed only one
argument to the function, that is, 5. The program gives an error when
executed:
TypeError: additionFunction() missing 1 required positional argument: 'n2'
The error message simply tells us one argument is missing. What if
we run it with more than two arguments?
#!/usr/bin/python3
def additionFunction(n1, n2):
    result = n1 + n2
    print('The first number is', n1)
    print('The second number is', n2)
    print("The sum is", result)
additionFunction(5, 10, 9)
We also get an error message:
TypeError: additionFunction() takes 2 positional arguments but 3 were given
The error message tells us the function expects two arguments but
we have passed 3 to it.
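If you would rather inspect these errors than have the program stop,
a try/except block can catch them. The sketch below is one way to do
that. It assumes a simplified additionFunction that returns the sum
instead of printing it, and the helper try_call is a name invented
for this illustration.

```python
def additionFunction(n1, n2):
    # Simplified variant: return the sum rather than print it.
    return n1 + n2

def try_call(*args):
    """Call additionFunction and return the sum or the error text."""
    try:
        return additionFunction(*args)
    except TypeError as error:
        # A wrong number of arguments raises TypeError.
        return "TypeError: " + str(error)

print(try_call(10, 5))      # correct number of arguments
print(try_call(5))          # one argument too few
print(try_call(5, 10, 9))   # one argument too many
```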
In most programming languages, parameters to a function can be
passed either by reference or by value. Python passes a reference to
the object itself, which is often called passing by object reference.
This means that if the object the parameter refers to is changed
inside the function, the same change is also visible to the caller.
Example:
#!/usr/bin/python3
def referenceFunction(ls1):
    print("List values before change: ", ls1)
    ls1[0] = 800
    print("List values after change: ", ls1)
    return
# Calling the function
ls1 = [940, 1209, 6734]
referenceFunction(ls1)
print("Values outside function: ", ls1)
The code gives this result:
List values before change:  [940, 1209, 6734]
List values after change:  [800, 1209, 6734]
Values outside function:  [800, 1209, 6734]
In this example, the function received a reference to the same list
object that the caller holds, so changing the first element inside
the function changed the caller's list as well.
In the next example, we again pass a reference, but the parameter is
then reassigned to a new list inside the function that has been
called:
#!/usr/bin/python3
def referenceFunction(ls1):
    ls1 = [11, 21, 31, 41]
    print("Values inside the function: ", ls1)
    return
ls1 = [51, 91, 81]
referenceFunction(ls1)
print("Values outside function: ", ls1)
The code gives this result:
Values inside the function:  [11, 21, 31, 41]
Values outside function:  [51, 91, 81]
Note that the “ls1” parameter is a name local to the function
“referenceFunction”. Assigning a new list to it inside the function
only rebinds that local name, so the “ls1” list outside the function
is not affected in any way. As the output above shows, the
function changes nothing for the caller.
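The difference between changing an object and reassigning a name can
be put side by side. The three function names below are invented for
this sketch. The third one shows a common way to avoid touching the
caller's list, which is to work on a copy.

```python
def change_in_place(values):
    values[0] = 800             # mutates the list the caller passed in

def rebind_only(values):
    values = [11, 21, 31, 41]   # points the local name at a new list
    return values

def change_a_copy(values):
    copy = list(values)         # shallow copy; caller's list is untouched
    copy[0] = 800
    return copy

original = [940, 1209, 6734]

rebind_only(original)
print(original)                 # unchanged

safe = change_a_copy(original)
print(original, safe)           # original unchanged, copy modified

change_in_place(original)
print(original)                 # first element is now 800
```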

Function Parameter Defaults


Function parameters can have default values, which the function
creator sets in the definition. The caller then has the choice of
relying on the default or passing a different value. Parameters with
defaults must come after the parameters without defaults in the
parameter list. Example:
#!/usr/bin/python3
def myFunction(n1, n2=6):
    pass
In the above example, the parameter n2 has been given a default
value, unlike parameter n1. The parameter n2 is written last in the
parameter list. A function with a default parameter can be called as
follows:
#!/usr/bin/python3
def windowFunction(width, height, font='TNR'):
    # printing everything
    print(width, height, font)
windowFunction(245, 278)
The code outputs the following:
245 278 TNR
The parameter font had been given a default value, that is, TNR. In
the last line of the above code, we have passed only two arguments
to the function, that is, the values for the width and height
parameters. However, when called, the function printed the values of
all three parameters. This means that for a parameter with a default,
we don’t need to specify its value or even mention it when calling
the function.
However, it’s still possible for you to specify the value for the
parameter during the function call. If you pass a value different
from the default, the parameter takes the new value. Example:
#!/usr/bin/python3
def windowFunction(width, height, font='TNR'):
    # printing everything
    print(width, height, font)
windowFunction(245, 278, 'GEO')
The program outputs this:
245 278 GEO
Above, the font parameter was given the default value “TNR”.
When calling the function in the last line of the code, we specified a
different value for this parameter, which is “GEO”. The code printed
the value as “GEO”. The default value was overridden.
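A parameter with a default can also be set by name, which is handy
when a function has several defaults. The sketch below assumes a
variant of windowFunction that returns its text instead of printing
it, so that the result can be reused.

```python
def windowFunction(width, height, font='TNR'):
    # Variant for illustration: return the text instead of printing it.
    return f"{width} {height} {font}"

print(windowFunction(245, 278))               # default font is used
print(windowFunction(245, 278, font='GEO'))   # default overridden by name
print(windowFunction(height=278, width=245))  # keywords may come in any order
```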

Chapter 16: The Basics of Working with Python
Before we start working with machine learning algorithms, you should
first understand the basics of working with Python. However, if you
are already familiar with Python or you have experience programming
in other languages such as C++ or C#, you can probably skip this
chapter or simply use it to refresh your memory.
In this chapter we are going to discuss the basic concepts of working
with Python briefly. Machine learning and Python go hand in hand
due to the simple fact that Python is a simple but powerful and
versatile language. Furthermore, there are many modules,
packages, and tools designed to expand Python's functionality to
specifically work with machine learning algorithms, as well as data
science.
Keep in mind that this is a brief introduction to Python, and therefore
we will not be using any IDEs or fancy tools. All you need is the
Python shell in order to test and experiment with your code as you
learn. You don’t even need to install anything on your computer
because you can simply head to Python’s official website and use
their online shell. You can find it here: https://www.python.org/shell/.

Data Types
Knowing the basic data types and how they work is a must. Python
has several data types, and in this section, we will go through a brief
description of each one and then see them in practice. Don't forget
to also practice on your own, especially if you know nothing or very
little about Python.
With that in mind, let's explore strings, numbers, dictionaries, lists,
and more!
Numbers
In Python, just like in math in general, you have several categories of
numbers to work with, and the way you write a number in code
determines which one you get. For instance, there are integers,
floats, and complex numbers (the separate long type from Python 2 no
longer exists in Python 3). The most commonly used ones are integers
and floats.
Integers, written int for short, are whole numbers that can either be
positive or negative. Floats are decimal or fractional numbers, so
make sure that when you mean an integer, you don't type a decimal
point by mistake.
Now let's discuss the mathematical operators. Just like in elementary
school, you will often work with basic mathematical operations such
as addition, subtraction, multiplication, and so on. Keep in mind that
these are different from the comparison operators, such as greater
than, less than, or equal to. Now let's see some examples in code:
x = 99
y = 26
print (x + y)
This basic operation simply prints the sum of x and y. You can use
this syntax for all the other mathematical operators, no matter how
complex your calculation is. Now let’s type a command using a
comparison operator instead:
x = 99
y = 26
print (x > 100)
As you can see, the syntax is the same. However, we aren't
performing a calculation. Instead, we are verifying whether the value
of x is greater than 100. The result you will get is False because 99
is not greater than 100.
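To make the difference between integers, floats, and the two kinds of
operators concrete, here is a short sketch that reuses x and y from
above. The other variable names are chosen only for this
illustration.

```python
x = 99
y = 26

total = x + y          # addition
difference = x - y     # subtraction
product = x * y        # multiplication
quotient = x / y       # true division always gives a float
whole = x // y         # floor division keeps only the whole part
remainder = x % y      # modulo gives what is left over

print(total, difference, product, whole, remainder)
print(type(x), type(quotient))       # int versus float

# Comparison operators produce True or False instead of a number.
print(x > 100, x >= y, x == 99)
```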
Next, you will learn what strings are and how you can work with
them.
Strings
Strings have everything to do with text, whether it's a letter, number,
or punctuation mark. However, take note that numbers written as
strings are not the same as the numbers data type. Anything can be
defined as a string, but to do so you need to place quotation marks
before and after your declaration. Let's take a look at the syntax:
n = "20"
x = 10
Notice that our n variable is a string data type and not a number,
while x is defined as an integer because it lacks the quotation marks.
There are many operations you can do on strings. For instance, you
can verify how long a string is, or you can concatenate several
strings. Let's see how many characters there are in the word "hello"
by using the following function:
len("Hello")
The “len” function returns the number of characters, which in this
case is five. Here’s an example of string concatenation. You’ll notice
that it looks similar to a mathematical operation, but with text:
'42 ' + 'is ' + 'the ' + 'answer'
The result will be “42 is the answer”. Pay attention to the syntax,
because you will notice we left a space after each string, minus the
last one. Spaces are taken into consideration when writing strings. If
we didn’t add them, all of our strings would be concatenated into one
word.
Another popular operation is string iteration. Here’s an example:
bookTitle = "Lord of the Rings"
for c in bookTitle:
    print(c)
The loop prints every single character found in the string, one per
line. Python contains many more string operations. However, these
are the ones you will use most often.
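The string operations discussed in this section can be combined in
one short sketch. It also shows str.join, a built-in alternative to
chaining the + operator, and the conversions needed when mixing a
numeric string with an integer. The variable names words, sentence,
and letters are picked just for this example.

```python
words = ["42", "is", "the", "answer"]

# join() puts the separator between items, so no trailing spaces are needed.
sentence = " ".join(words)
print(sentence)
print(len(sentence))     # number of characters, spaces included

# A number must be converted before it can be joined to text.
n = "20"
x = 10
print(n + str(x))        # text concatenation
print(int(n) + x)        # numeric addition

# Iterating over a string visits one character at a time.
letters = [c for c in "Lord"]
print(letters)
```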
Now let’s progress to lists.
Lists
This is a data type that you will often be using. Lists are needed to
store data, and they can be manipulated as needed. Furthermore,
you can store objects of different types in them. Here's what a
Python list looks like:
n = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
The square brackets define the list, and every object separated by a
comma is a list element. Here's an example of a list containing
different data types:
myBook = ["title", "somePages", 1, 2.1, 5, 22, 42]
This is a list that holds string objects as well as integers and floats.
You can also perform a number of operations on lists, and most of
them follow the same syntax as for the strings. Try them out!
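As a starting point for trying things out, the sketch below runs a
few common list operations on the list n from above. The names first,
last, and middle are only illustrative.

```python
n = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

first = n[0]        # indexing starts at zero
last = n[-1]        # negative indexes count from the end
middle = n[3:6]     # slicing returns a new list: positions 3, 4, and 5

n.append(11)        # lists can grow...
n[0] = 100          # ...and their elements can be replaced

print(first, last, middle)
print(len(n), n)
print([1, 2] + [3, 4])   # concatenation works as it does for strings
```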
Dictionaries
Like a list, a dictionary stores a collection of objects. However, you
cannot access the elements by position. What you need is to know the
key, which is linked to a dictionary object. Take a look at the
following example:
myDict = {'weapon': 'sword', 'soldier': 'archer'}
myDict['weapon']
The first line contains the dictionary's definition, and as you can see,
the objects and their keys have to be stored between curly braces.
The second line looks up the key 'weapon' and returns 'sword'. You
can identify the keys as "weapon" and "soldier" because each is
followed by a colon and then its value. Keep in mind that while in
this example our keys are strings, they can be other data types as
well.
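A few more dictionary operations are sketched below, including keys
that are not strings. The dictionary names unit and grid and their
contents are invented for this illustration.

```python
unit = {'weapon': 'sword', 'soldier': 'archer'}

print(unit['weapon'])           # look up a value by its key
unit['armor'] = 'chainmail'     # add a new key/value pair
unit['weapon'] = 'bow'          # replace the value of an existing key

# get() returns a fallback instead of raising KeyError for a missing key.
print(unit.get('horse', 'none'))

# Keys do not have to be strings; integers and tuples also work.
grid = {(0, 0): 'start', (2, 3): 'goal', 7: 'lucky'}
print(grid[(2, 3)], grid[7])

# items() yields each key together with its value.
for key, value in unit.items():
    print(key, '->', value)
```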
Tuples
This data type is similar to a list, except its elements cannot be
changed once defined. Here’s an example of a tuple:
n = (1, 43, 'someText', 99, [1, 2, 3])
A tuple is defined between parentheses, and in this case, we have
three different data types, namely a few integers, a string, and a list.
You can perform a number of operations on a tuple, and most of
them are the same as for the lists and strings. They are similar data
types, except that once you declare the tuple, you cannot change it
later.
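The sketch below shows what "cannot be changed" means in practice,
using the tuple n from above. It also shows a detail that is easy to
miss: a mutable object stored inside a tuple, such as the list in the
last position, can still be modified. The flag name changed is only
illustrative.

```python
n = (1, 43, 'someText', 99, [1, 2, 3])

print(n[2])        # reading by index works just as it does for a list
print(len(n))

try:
    n[0] = 5       # assigning to an element is not allowed
    changed = True
except TypeError:
    changed = False
print(changed)

# The tuple itself is fixed, but the list stored inside it is not.
n[4].append(4)
print(n)
```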

Conditional Statements
Now that you know the basic data types, it’s time to take a crash
course on more complex operations that involve conditional
statements. A conditional statement gives an application a limited
ability to think for itself and make a decision based on its
assessment of the situation. In other words, it checks a condition,
usually involving a variable, and tells the program how to react
based on the outcome of that check.
Python conditional statements are simple to understand because they
are logical, and the syntax reflects human thinking. For instance, the
syntax written in English looks like this: "If I don't feel well, I
won't go anywhere. Else, I will have to go to work." In this example,
we instruct the program to check whether you feel unwell. If that
condition evaluates to false, it means you feel well, and therefore
the program moves on to the "else" branch. Both “if” and “if-else”
conditionals are frequently used when programming in general.
Here’s an example of the syntax:
x = 100
if x < 100:
    print("x is small")
This is the most basic form of the statement. It checks whether the
condition is true, and if it is, then something will happen, and if
it's not, then nothing will happen. Here's an example using the else
statement as well:
x = 100
if x < 100:
    print("x is small")
else:
    print("x is large")
print("Print this no matter what")
With the added “else” keyword, we instruct the application to perform
a different task if a false value is returned. Furthermore, we have a
separate declaration that lies outside of the conditional statement.
This will be executed no matter the outcome.
Another type of conditional involves the use of "elif" which allows the
application to analyze a number of statements before it makes a
decision. Here's an example:
if (condition1):
    add a statement here
elif (condition2):
    add another statement for this condition
elif (condition3):
    add another statement for this condition
else:
    if none of the conditions apply, do this
Take note that this time we did not use code. You already know
enough about Python syntax and conditionals to turn all of this into
code. What we have here is the pseudo-code, which is very handy,
whether you are writing simple Python exercises or working with
machine learning algorithms. Pseudocode allows you to place your
thoughts on "paper" by following the Python programming structure.
This makes it a lot easier for you to organize your ideas and your
application by writing the code after you've outlined it. With that
being said, here's the actual code:
x = 10
if x > 10:
    print("x is larger than ten")
elif x < 4:
    print("x is smaller")
else:
    print("x is pretty small")
Now you have everything you need to know about conditionals. Use
them in combination with what you learned about data types in order
to practice. Keep in mind that you always need to practice these
basic Python concepts in order to understand later how machine
learning algorithms work.
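One way to practice is to wrap a conditional in a function and try it
with several values. The sketch below does that. The function name
describe, the thresholds, and the labels are arbitrary choices made
for this illustration.

```python
def describe(x):
    """Return a label for x; the thresholds are arbitrary examples."""
    if x > 10:
        return "x is larger than ten"
    elif x < 4:
        return "x is smaller than four"
    else:
        # Reached only when x is between 4 and 10, inclusive.
        return "x is between four and ten"

for value in (25, 1, 10):
    print(value, "->", describe(value))
```

Only the first branch whose condition is true runs, so the order of
the checks matters.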

Loops
Code sometimes needs to be executed repeatedly until a specific
condition is met. This is what loops are for. There are two types, the
for loop and the while loop. Let’s begin with the first example:
for x in range(1, 10):
    print(x)
This code will be executed nine times, printing the value of x each
time, from 1 up to 9. The loop stops before x reaches ten.
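Because range stops just before its second argument, it helps to look
at the values it produces. The sketch below converts a few ranges to
lists and then uses a for loop to add up the numbers 1 through 10.
The variable name total is only illustrative.

```python
print(list(range(1, 10)))      # the stop value 10 is not included
print(list(range(1, 11)))      # use 11 to include 10
print(list(range(0, 10, 2)))   # an optional third argument sets the step

total = 0
for x in range(1, 11):
    total += x                 # running sum of 1 through 10
print(total)
```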
The while loop, on the other hand, repeats the execution of a code
block as long as the condition we set is still true. When the
condition is no longer met, the loop ends, and the application
continues with the next lines of code. Here's a while loop in action:
x = 1
while x < 10:
    print(x)
    x += 1
The x variable is declared as an integer, and then we instruct the
program that as long as x is less than ten, its value should be
printed. Take note that if the loop body ended at this point, x would
never change, and you would create an infinite loop, which is not
something you want. The final statement adds one to x on every pass,
so the application prints a new value with each execution. When the
variable stops being less than ten, the condition will no longer be
met, and the loop will end, allowing the application to continue
executing any code that follows.
Keep in mind that infinite loops can easily happen due to mistakes
and oversight. Luckily, Python has a solution, namely the "break"
statement, which exits the loop immediately when it is executed. It
is usually placed inside a condition in the loop body. Here's an
example:
while True:
    answer = input("Type command:")
    if answer == "Yes":
        break
Now the loop can be broken by typing Yes.
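The same pattern can be tried without typing anything by feeding the
loop a prepared list of answers. This is only a sketch: the function
name count_until_yes and the sample answers are made up for the
illustration.

```python
def count_until_yes(answers):
    """Return how many answers were read, stopping after the first "Yes"."""
    position = 0
    while True:
        if position >= len(answers):   # nothing left to read
            break
        answer = answers[position]
        position += 1
        if answer == "Yes":
            break                      # leave the loop immediately
    return position

print(count_until_yes(["No", "Maybe", "Yes", "No"]))
```

Here two break statements guard the loop: one for running out of
input and one for the answer we are waiting for.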

Functions
For a beginner in machine learning, functions are the final Python
component you need to understand before learning the cool stuff.
Functions make your programs better organized and easier to work
with. They can significantly reduce the amount of code you have to
type, because a block of code is written once and then reused.
Here's an example of the most basic function to get an idea about
the syntax:
def myFunction():
    print("Hello, I am now a function!")
Functions are first declared by using the “def” statement, followed by
the function's name. Whenever we want to run this block of code, we
simply call the function instead of writing the whole code again. For
instance, you simply type:
myFunction()
The parentheses after the function name are where you can list a
number of parameters. They can alter the behavior of the function
like this:
def myName(firstname):
    print(firstname + " Smith")
myName("Andrew")
myName("Peter")
myName("Sam")
Here we have a first name parameter, and whenever we call the
function, it prints the value of the parameter together with the
word "Smith". Take note that this is a really basic example just so
you get a feel for the syntax. More complex functions are written the
same way, however.
Here’s another example where we have a default parameter, whose
value is used only when the caller does not pass an argument.
def myHobby(hobby="leatherworking"):
    print("My hobby is " + hobby)
myHobby("archery")
myHobby("gaming")
myHobby()
myHobby("fishing")
Calling the function four times prints:
My hobby is archery
My hobby is gaming
My hobby is leatherworking
My hobby is fishing
Here you can see that the call without an argument uses the
default value we set.
In addition, you can also have functions that return something. So
far, we only wrote functions that perform an action, but they don't
return any values or results. Functions that return a value are far
more useful because the result can then be placed into a variable
that will later be used in another operation. Here's how the syntax
looks in this case:
def square(x):
    return x * x
print(square(5))
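To show a returned value being reused, the sketch below stores the
result of square in a variable and also calls square from inside a
second function. The function hypotenuse and the variable area are
additions made for this illustration.

```python
def square(x):
    return x * x

def hypotenuse(a, b):
    """Length of the long side of a right triangle with short sides a and b."""
    return (square(a) + square(b)) ** 0.5

area = square(5)            # store the result in a variable...
print(area + 1)             # ...and use it in another operation
print(hypotenuse(3, 4))     # a function can build on another function
```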
Now that you've gone through a brief Python crash course and you
understand the basics, it's time to learn how to use the right tools
and how to set up your machine learning environment. Don't forget
that Python is only one component of machine learning. However,
it's an important one because it's the foundation, and without it,
everything falls apart.
Chapter 17: Data Structures and the A* Algorithm
In this chapter, you will learn how to create abstract data structures
using the same Python data types you already know. Abstract data
structures allow your programs to process data in intuitive ways and
rely on the Don't Repeat Yourself (DRY) principle. That is, using less
code and not typing out the same operations repeatedly for each
case. As you study the examples given, you will begin to notice a
pattern emerging: the use of classes that complement each other
with one acting as a node and another as a container of nodes. In
computer science, a data structure that uses nodes is generally
referred to as a tree. There are many different types of trees, each
with specialized use cases. You may have already heard of binary
trees if you are interested in programming or computer science at all.
One possible type of tree is called an n-ary tree, or n-dimensional
tree. Unlike the binary tree, the n-ary tree contains nodes that have
an arbitrary number of children. A child is simply another instance of
a node that is linked to another node, sometimes called a parent.
The parent must have some mechanism for linking up to child nodes.
The easiest way to do this is with a list of objects.
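A minimal sketch of such a node is shown below. The class layout, the
method names add_child and count, and the sample names are
assumptions made for this illustration, not the chapter's own
listing.

```python
class Node:
    """A tree node that may have any number of children."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []      # list of Node objects

    def add_child(self, child):
        child.parent = self     # link the child back to its parent
        self.children.append(child)
        return child

    def count(self):
        """Number of nodes in the subtree rooted here (recursive)."""
        return 1 + sum(child.count() for child in self.children)

root = Node("root")
docs = root.add_child(Node("docs"))
root.add_child(Node("music"))
root.add_child(Node("photos"))      # a third child: not possible in a binary tree
docs.add_child(Node("notes.txt"))

print(len(root.children), root.count())
```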
Example Coding #1: A Mock File-System
A natural application of the n-ary tree is a traditional Windows or
UNIX file system. Nodes can be either folders (directories) or
individual files. To keep things simple, the following program
assumes a single directory as the tree's root.
# ch1a.py
The FileSystem acts as the tree, and the Node class does most of
the work, which is common with tree data structures. Notice also that
FileSystem keeps track of an individual ID for each node. The IDs can
be used to count the nodes in the file system or to provide lookup
functionality.
When it comes to trees, the most onerous task is usually
programming a solution for traversal. The usual way a tree is
structured is with a single node as root, and from that single node,
the rest of the tree can be accessed. Here the function
look_up_parent uses a loop to traverse the mock directory structure,
but it can easily be adapted to a recursive solution as well.
General usage of the program is as follows: instantiate the FileSystem
class, declare Node objects using the directory syntax, and then call
the add method on them. Be careful with backslashes in paths: Python
treats a backslash as the start of an escape sequence, so it must be
doubled or written in a raw string.
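The listing ch1a.py is not reproduced in this text, so what follows is only a minimal sketch of how such a program might look. The names FileSystem, Node, add, and look_up_parent come from the description above. Everything else is an assumption made for illustration: the forward-slash paths (chosen to avoid escape-character issues), the attribute names, and the find_child helper.

```python
class Node:
    """A file or directory in the mock file system."""

    def __init__(self, path):
        # Assumption: paths look like "root/docs/notes.txt".
        self.path = path.strip("/")
        self.name = self.path.split("/")[-1]
        self.node_id = None        # assigned by FileSystem.add
        self.parent = None
        self.children = []         # the list that links to child nodes

    def find_child(self, name):
        """Return the direct child with this name, or None."""
        for child in self.children:
            if child.name == name:
                return child
        return None


class FileSystem:
    """The tree: holds the root node and hands out IDs."""

    def __init__(self, root_name="root"):
        self.root = Node(root_name)
        self.root.node_id = 0
        self.next_id = 1

    def look_up_parent(self, node):
        """Walk down from the root with a loop, one path part at a time."""
        parts = node.path.split("/")
        if parts[0] != self.root.name:
            raise ValueError("path must start at " + self.root.name)
        current = self.root
        for name in parts[1:-1]:   # skip the root and the node itself
            current = current.find_child(name)
            if current is None:
                raise ValueError("missing directory: " + name)
        return current

    def add(self, node):
        """Attach node under its parent and give it the next free ID."""
        parent = self.look_up_parent(node)
        node.parent = parent
        node.node_id = self.next_id
        self.next_id += 1
        parent.children.append(node)
        return node


fs = FileSystem("root")
fs.add(Node("root/docs"))
fs.add(Node("root/docs/notes.txt"))
fs.add(Node("root/music"))
print(fs.next_id)                             # nodes so far, root included
print([child.name for child in fs.root.children])
```

Because IDs are handed out in order, next_id doubles as a node count in this sketch; a real implementation would probably keep a dictionary from ID to node if lookup by ID were needed.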
Example Coding # 2: Binary Search Tree (BST)
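The binary search tree gets the “binary” in its name from the fact
that a node can have at most two children. The “search” part comes
from an ordering rule: smaller values go to the left of a node and
larger values to the right. While the two-child limit may sound like a
restriction, it is actually a good one because the tree becomes
intuitive to traverse. An n-ary tree, in contrast, can be messy.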
# ch1b.py
As before, the Node class does most of the heavy lifting. This
program uses a BST primarily to sort a list of numbers but can be
generalized to sorting any data type. There are also a number of
auxiliary methods for finding out the size of the tree and which nodes
are childless (leaves).
This implementation of a tree better illustrates the role that recursion
plays when traversing a tree: each node calls a method (for example,
insert) on one of its children, creating a chain of calls until a base
case is reached.
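The listing ch1b.py is not reproduced here either. The sketch below shows one way a binary search tree with the features described above could be written. The class name BinarySearchTree and the method names in_order, size, leaves, and sorted_values are assumptions for illustration; only Node and insert are named in the text.

```python
class Node:
    """One value in the tree, with at most two children."""

    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

    def insert(self, value):
        # Smaller values go left, everything else goes right.
        if value < self.value:
            if self.left is None:
                self.left = Node(value)       # base case: empty slot found
            else:
                self.left.insert(value)       # recursive step
        else:
            if self.right is None:
                self.right = Node(value)      # base case: empty slot found
            else:
                self.right.insert(value)      # recursive step

    def in_order(self):
        # Left subtree, then this node, then right subtree: ascending order.
        left = self.left.in_order() if self.left else []
        right = self.right.in_order() if self.right else []
        return left + [self.value] + right

    def size(self):
        left = self.left.size() if self.left else 0
        right = self.right.size() if self.right else 0
        return 1 + left + right

    def leaves(self):
        if self.left is None and self.right is None:
            return [self.value]               # a childless node is a leaf
        left = self.left.leaves() if self.left else []
        right = self.right.leaves() if self.right else []
        return left + right


class BinarySearchTree:
    """Container that holds the root node."""

    def __init__(self, values=()):
        self.root = None
        for value in values:
            self.insert(value)

    def insert(self, value):
        if self.root is None:
            self.root = Node(value)
        else:
            self.root.insert(value)

    def sorted_values(self):
        return self.root.in_order() if self.root else []


tree = BinarySearchTree([8, 3, 10, 1, 6, 14, 4, 7, 13])
print(tree.sorted_values())
print(tree.root.size(), tree.root.leaves())
```

Sorting falls out of the structure: an in-order walk visits the values in ascending order, so any data type that supports the < comparison can be sorted the same way.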
Example Coding # 3: A* Algorithm
The A* (A-star) search algorithm is often described as Dijkstra's
algorithm with brains. Whereas Dijkstra searches almost
exhaustively until the path is found, A* uses what is called a heuristic,
which is a fancy way of saying “educated guess.” A* is fast because
it is able to point an arrow at the target (using the heuristic) and
favor steps along that path.
First, here's a brief explanation of the algorithm. To simplify things,
we will be using a square grid with orthogonal movement only (no
diagonals). The object of A* is to find the shortest path between point
A and point B, where the position of point B is known in advance. A
will be the start node and B the end node. In order to get from A to B,
the algorithm estimates the distance to B for the nodes in between,
so that each node it expands either brings the path closer to B or is
set aside. An easy way to program this is to use a heap (priority
queue) ordered by some measure of distance.
After the first node is added to the heap, each of its neighbors is
evaluated for distance and pushed onto the heap, and the node with
the lowest estimate is taken off next. The process repeats until the
node taken off the heap is B.
#ch1c.py
In this case, the heuristic is the Manhattan distance, which is just
the sum of the absolute differences between the x and y coordinates
of the current node and the target. The heapq library is used to
create a priority queue with f as the priority. Note that the backtrace
function is simply traversing a tree of nodes that each have a single
parent.
You can think of the g variable as the cost of moving from the starting
point to somewhere along the path. Since we are using a grid with
no variation in movement, every step adds the same constant cost to
g. The h variable is the estimated distance between the current node
and the target. Adding these two together gives you the f variable,
which is what controls the order in which nodes are explored.
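Since ch1c.py is not reproduced in this text, here is a compact sketch of A* on a grid that follows the description above: heapq as the priority queue, Manhattan distance as h, a constant step cost for g, and a backtrace over parent links. The grid layout (0 for a free cell, 1 for a wall), the function names a_star and manhattan, and the dictionaries used for bookkeeping are assumptions made for illustration.

```python
import heapq


def manhattan(a, b):
    """Heuristic h: sum of absolute row and column differences."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def backtrace(parents, node):
    """Follow parent links from the goal back to the start."""
    path = [node]
    while node in parents:
        node = parents[node]
        path.append(node)
    path.reverse()
    return path


def a_star(grid, start, goal):
    """Return the shortest path as a list of (row, col), or None."""
    rows, cols = len(grid), len(grid[0])
    g = {start: 0}                      # cost from start to each node
    parents = {}                        # each node has a single parent
    heap = [(manhattan(start, goal), start)]   # entries are (f, node)
    while heap:
        f, current = heapq.heappop(heap)       # lowest f comes off first
        if current == goal:
            return backtrace(parents, current)
        r, c = current
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue                # outside the grid
            if grid[nr][nc] == 1:
                continue                # wall
            new_g = g[current] + 1      # every step costs the same
            if new_g < g.get((nr, nc), float("inf")):
                g[(nr, nc)] = new_g
                parents[(nr, nc)] = current
                new_f = new_g + manhattan((nr, nc), goal)
                heapq.heappush(heap, (new_f, (nr, nc)))
    return None                         # heap ran dry: no path exists


grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
path = a_star(grid, (0, 0), (3, 3))
print(path)
```

This sketch may leave outdated entries for a node on the heap after a cheaper route to it is found. That is harmless here because a revisit can never improve g, but a larger implementation might track a closed set to skip them.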

Chapter 18: Reading data in your script


Reading data from file
Let's make our data file using Microsoft Excel, LibreOffice Calc, or
some other spreadsheet application and save it as a tab-delimited file
named ingredients.txt:

food              carb   fat   protein   calories   serving size
pasta             39     1     7         210        56
parmesan grated   0      1.5   2         20         5
Sour cream        1      5     1         60         30
Chicken breast    0      3     22        120        112
Potato            28     0     3         110        148
Fire up your IPython notebook server. Using the New drop-down
menu in the top right corner, create a new Python3 notebook and
type the following Python program into a code cell:
#open file ingredients.txt
with open('ingredients.txt', 'rt') as f:
    for line in f:       #read lines until the end of file
        print(line)      #print each line
Remember that indentation is important in Python programs and
designates nested statements. Run the program using the menu
option Cell/Run, the right arrow button, or the Shift-Enter keyboard
shortcut. You can have many code cells in your IPython notebooks,
but only the currently selected cell is run. Variables generated by
previously run cells are accessible, but if you just downloaded a
notebook, you need to run all the cells that initialize variables used in
the current cell. You can run all the code cells in the notebook by
using the menu option Cell/Run All or Cell/Run All Above.
This program will open a file called ingredients.txt and print it line by
line. The with statement sets up a context manager: it opens the file
and makes it known to the nested statements as f. Here, it is used as
an idiom to ensure that the file is closed automatically after we are
done reading it. The indentation before for is required: it shows that
for is nested in with and has access to the variable f designating the
file. The print function is nested inside for, which means it will be
executed for every line read from the file until the end of the file is
reached and the for loop ends. It takes just three lines of Python
code to iterate over a file of any length.
Now, let's extract fields from every line. To do this, we will need to
use the string method split(), which splits a line and returns a list of
substrings. By default, it splits the line at every white space
character, but our data is delimited by the tab character, so we will
use tab to split the fields. The tab character is written as \t in Python.
with open('ingredients.txt', 'rt') as f:
    for line in f:
        fields=line.split('\t')    #split line in separate fields
        print(fields)              #print the fields
The output of this code is:
['food', 'carb', 'fat', 'protein', 'calories', 'serving size\n']
['pasta', '39', '1', '7', '210', '56\n']
['parmesan grated', '0', '1.5', '2', '20', '5\n']
['Sour cream', '1', '5', '1', '60', '30\n']
['Chicken breast', '0', '3', '22', '120', '112\n']
['Potato', '28', '0', '3', '110', '148\n']
Now, each string is split conveniently into lists of fields. The last field
contains a pesky \n character designating the end of line. We will
remove it using the strip() method that strips white space characters
from both ends of a string.
After splitting the string into a list of fields, we can access each field
using an indexing operation. For example, fields[0] will give us the
first field in which a food’s name is found. In Python, the first element
of a list or an array has an index 0.
This data is not directly usable yet. All the fields, including those
containing numbers, are represented by strings of characters. This is
indicated by single quotes surrounding the numbers. We want food
names to be strings, but the amounts of nutrients, calories, and
serving sizes must be numbers so we could sort them and do
calculations with them. Another problem is that the first line holds
column names. We need to treat it differently.
One way to do it is to use the file object's method readline() to read
the first line before entering the for loop. Another method is to use
the function enumerate(), which will return not only a line but also its
number, starting with zero:
with open('ingredients.txt', 'rt') as f:
    #get line number and a line itself
    #in i and line respectively
    for i,line in enumerate(f):
        fields=line.strip().split('\t')    #split line into fields
        print(i,fields)    #print line number and the fields
This program produces the following output:
0 ['food', 'carb', 'fat', 'protein', 'calories', 'serving size']
1 ['pasta', '39', '1', '7', '210', '56']
2 ['parmesan grated', '0', '1.5', '2', '20', '5']
3 ['Sour cream', '1', '5', '1', '60', '30']
4 ['Chicken breast', '0', '3', '22', '120', '112']
5 ['Potato', '28', '0', '3', '110', '148']
Now we know the number of the current line and can treat the first
line differently from all the others. Let’s use this knowledge to convert
our data from strings to numbers. To do this, Python has the function
float(). We have to convert more than one field, so we will use a
powerful Python feature called list comprehension.
with open('ingredients.txt', 'rt') as f:
    for i,line in enumerate(f):
        fields=line.strip().split('\t')
        if i==0:                # if it is the first line
            print(i,fields)     # treat it as a header
            continue            # go to the next line
        food=fields[0]          # keep food name in food
        #convert numeric fields to numbers
        numbers=[float(n) for n in fields[1:]]
        #print line numbers, food name, and nutritional values
        print(i,food,numbers)
The if statement tests whether a condition is true. To check for
equality, you need to use ==. The index is 0 only for the first line, and
that line is treated differently. We split it into fields, print them, and
skip the rest of the loop body using the continue statement.
Lines describing foods are handled by the rest of the loop. After
splitting the line into fields, fields[0] holds the food's name. We keep
it in the variable food. All other fields contain numbers and must be
converted.
In Python, we can easily get a subset of a list by using a slicing
mechanism. For instance, list1[x:y] means a list of every element in
list1 starting with index x and ending with index y-1. (You can also
include a stride; see help.) If x is omitted, the slice contains the
elements from the beginning of the list up to element y-1. If y is
omitted, the slice goes from element x to the end of the list. The
expression fields[1:] means every field except the first, fields[0].
numbers=[float(n) for n in fields[1:]]
means we create a new list, numbers, by iterating over fields from the
second element onward and converting each one to a floating-point
number.
Finally, we want to reassemble the food's name with its nutritional
values already converted to numbers. To do this, we can create a list
containing a single element - food's name - and add a list containing
nutrition data. In Python, adding lists concatenates them.
[food]+ numbers
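To make slicing, list comprehension, and list concatenation concrete, the short sketch below applies each of them to one row of the sample data. The variable name record is an assumption introduced here for illustration.

```python
fields = ['pasta', '39', '1', '7', '210', '56']

print(fields[1:3])     # elements at index 1 and 2
print(fields[:2])      # from the beginning up to index 1
print(fields[1:])      # everything except the first element
print(fields[1::2])    # stride of 2: indexes 1, 3, 5

food = fields[0]
numbers = [float(n) for n in fields[1:]]   # list comprehension
record = [food] + numbers                  # adding lists concatenates them
print(record)
```

Note that slicing never modifies the original list: fields still holds six strings after all of these operations.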

Dealing with corrupt data


Sometimes, just one line in a huge file is formatted incorrectly. For
instance, it might contain a string that cannot be converted to a
number. Unless handled properly, such situations will cause the
program to crash. To handle them, we use Python's exception
handling. Parts of a program that might fail should be wrapped in a
try ... except block. In our program, one such error-prone part is the
conversion of strings into numbers.
numbers=[float(n) for n in fields[1:]]
Let's insulate this line:
with open('ingredients.txt', 'rt') as f:
    for i,line in enumerate(f):
        fields=line.strip().split('\t')
        if i==0:
            print(i,fields)
            continue
        food=fields[0]
        try:                    # Watch out for errors!
            numbers=[float(n) for n in fields[1:]]
        except ValueError:      # a field could not be converted
            print(i,line)       # print offending line and its number
            print(i,fields)     # print how it was split
            continue            # go to the next line without crashing
        print(i,food,numbers)
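The readline() approach mentioned earlier can be combined with exception handling in a small reusable function. The following is only a sketch: the function name read_foods, the in-memory sample built with io.StringIO, and the deliberately broken "mystery food" row are all invented for illustration and are not part of ingredients.txt.

```python
import io

# Made-up sample standing in for a file; the third line is corrupt on purpose.
sample = io.StringIO(
    "food\tcarb\tfat\tprotein\tcalories\tserving size\n"
    "pasta\t39\t1\t7\t210\t56\n"
    "mystery food\t1\tn/a\t2\t20\t5\n"
    "Potato\t28\t0\t3\t110\t148\n"
)


def read_foods(f):
    """Return (header, records, bad_lines) from a tab-delimited file object."""
    header = f.readline().strip().split('\t')   # consume the header first
    records, bad_lines = [], []
    for i, line in enumerate(f, start=1):       # data lines start at 1
        fields = line.strip().split('\t')
        try:
            numbers = [float(n) for n in fields[1:]]
        except ValueError:                      # a field is not a number
            bad_lines.append(i)                 # remember where it happened
            continue
        records.append([fields[0]] + numbers)
    return header, records, bad_lines


header, records, bad_lines = read_foods(sample)
print(header)
print(records)
print(bad_lines)
```

Collecting the numbers of the bad lines instead of only printing them makes it easier to report or repair the corrupt rows after the whole file has been read.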

Chapter 19: Manipulating data


Sorting data
In order to do something meaningful with the data, we need a
container to hold it. Let’s store information for each food in a list, and
create a list of these lists to represent all the foods. Having all the
data conveniently in one list allows us to sort it easily.
data=[]    # create an empty list to hold data
with open('ingredients.txt', 'rt') as f:
    for i,line in enumerate(f):
        fields=line.strip().split('\t')
        if i==0:
            header=fields    #remember a header
            continue
        food=fields[0].lower()    #convert to lower case
        try:
            numbers=[float(n) for n in fields[1:]]
        except ValueError:
            print(i,line)
            print(i,fields)
            continue
        #append food info to data list
        data.append([food]+numbers)
# Sort list in place by food name
data.sort(key=lambda a:a[0])
for food in data:    #iterate over the sorted list of foods
    print(food)      #print info for each food
The output is:
['chicken breast', 0.0, 3.0, 22.0, 120.0, 112.0]
['parmesan grated', 0.0, 1.5, 2.0, 20.0, 5.0]
['pasta', 39.0, 1.0, 7.0, 210.0, 56.0]
['potato', 28.0, 0.0, 3.0, 110.0, 148.0]
['sour cream', 1.0, 5.0, 1.0, 60.0, 30.0]
data=[] creates an empty list, and the append() method appends new
elements to the list. The sort() method sorts lists in place. If the list
contains simple values (such as numbers or strings), they are sorted
from small to large or alphabetically by default. We have a list of
complex data, and it is not obvious how to sort it. So, we pass
a key parameter to the sort() method. This parameter is a function
that takes an element of the list and returns a simple value that is
used to order the elements in the list. In our case, we used a simple
nameless lambda function that took the record for each food and
returned the first element, which is the food's name. So we ended up
with the list sorted alphabetically.
We could also sort the list by the second value, which represents the
amount of carbohydrates per serving. All we have to do is change
the lambda function that calculates the key:
data.sort(key=lambda a:a[1])
This will return the foods in a different order:
['parmesan grated', 0.0, 1.5, 2.0, 20.0, 5.0]
['chicken breast', 0.0, 3.0, 22.0, 120.0, 112.0]
['sour cream', 1.0, 5.0, 1.0, 60.0, 30.0]
['potato', 28.0, 0.0, 3.0, 110.0, 148.0]
['pasta', 39.0, 1.0, 7.0, 210.0, 56.0]
Of course, sorting by amount of carbohydrates per serving doesn't
make much sense because serving sizes might be as different as 5
grams for parmesan and 148 grams for potatoes. Ordering foods by
the amount of protein per calorie might make more sense, since that
value reflects the "healthiness" of the food. Once again, all we need
to do is change the key function:
data.sort(key=lambda a:a[3]/a[4])
The output is:
['sour cream', 1.0, 5.0, 1.0, 60.0, 30.0]
['potato', 28.0, 0.0, 3.0, 110.0, 148.0]
['pasta', 39.0, 1.0, 7.0, 210.0, 56.0]
['parmesan grated', 0.0, 1.5, 2.0, 20.0, 5.0]
['chicken breast', 0.0, 3.0, 22.0, 120.0, 112.0]
We have the "unhealthiest" food on top. Perhaps we want to start
with the healthiest one. To do this, we need to provide another
parameter for the sort() method, reverse:
data.sort(key=lambda a:a[3]/a[4], reverse=True)
This sorts the list in descending order:
['chicken breast', 0.0, 3.0, 22.0, 120.0, 112.0]
['parmesan grated', 0.0, 1.5, 2.0, 20.0, 5.0]
['pasta', 39.0, 1.0, 7.0, 210.0, 56.0]
['potato', 28.0, 0.0, 3.0, 110.0, 148.0]
['sour cream', 1.0, 5.0, 1.0, 60.0, 30.0]
Although it is easy to sort by one or several columns in traditional
spreadsheet applications, it is much harder to sort by complex
expressions that require calculations on values from several
columns. Python allows you to easily do it.
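As a complement to sort(), the built-in sorted() function returns a new list and leaves the original untouched, and a key function may return a tuple to sort by several criteria at once. The sketch below uses the five foods from this chapter; the helper name protein_per_calorie and the variable names ranked and by_carb_then_name are assumptions for illustration.

```python
data = [
    ['pasta', 39.0, 1.0, 7.0, 210.0, 56.0],
    ['parmesan grated', 0.0, 1.5, 2.0, 20.0, 5.0],
    ['sour cream', 1.0, 5.0, 1.0, 60.0, 30.0],
    ['chicken breast', 0.0, 3.0, 22.0, 120.0, 112.0],
    ['potato', 28.0, 0.0, 3.0, 110.0, 148.0],
]


def protein_per_calorie(record):
    """Key function: protein (index 3) divided by calories (index 4)."""
    return record[3] / record[4]


# sorted() builds a new list; data keeps its original order.
ranked = sorted(data, key=protein_per_calorie, reverse=True)

# A tuple key sorts by carbohydrates first and breaks ties by name.
by_carb_then_name = sorted(data, key=lambda a: (a[1], a[0]))

for food in ranked:
    print(food[0], round(protein_per_calorie(food), 3))
```

A named key function is a little longer than a lambda, but it can be reused, for example to print the computed value next to each food as done here.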
Filtering data
Having our data in a list allows us to filter it with one line of code
using list comprehension, but, this time, we will use a new option for
list comprehension: an if clause that allows us to exclude some
elements from the new list:
data_filtered=[a for a in data if a[3]/a[4]>0.09]
for food in data_filtered:
    print(food)
The filtered list is:
['chicken breast', 0.0, 3.0, 22.0, 120.0, 112.0]
['parmesan grated', 0.0, 1.5, 2.0, 20.0, 5.0]
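One thing to watch for when filtering on a computed ratio is division by zero. The sketch below adds a guard to the if clause. The 'water' row with zero calories is a hypothetical record invented here to trigger that case, and the variable name names is likewise an assumption.

```python
data = [
    ['pasta', 39.0, 1.0, 7.0, 210.0, 56.0],
    ['parmesan grated', 0.0, 1.5, 2.0, 20.0, 5.0],
    ['chicken breast', 0.0, 3.0, 22.0, 120.0, 112.0],
    ['water', 0.0, 0.0, 0.0, 0.0, 240.0],    # hypothetical zero-calorie row
]

# "and" short-circuits: the division only runs when calories are positive.
data_filtered = [a for a in data if a[4] > 0 and a[3] / a[4] > 0.09]

names = [a[0] for a in data_filtered]
print(names)
```

Without the a[4] > 0 check, the hypothetical water row would raise a ZeroDivisionError and stop the program.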

Chapter 20: Probability – Fundamental – Statistics – Data Types
Things are quite straightforward in Knowledge Representation and
Reasoning (KR&R) as long as there is no doubt: formulating and
representing propositions is easy. Problems begin to arise when
uncertainty makes itself known. Consider an expert system designed
to replace a doctor. When diagnosing patients, a doctor has no
formal knowledge and no official rules that map symptoms directly to
a treatment. In this situation, the expert system uses probability to
determine which condition the patient most likely has and which cure
has the highest chance of working.

Real-Life Probability Examples


As a mathematical term, probability has to do with the possibility that
an event may occur, like taking a green piece out of a bag of assorted
colors or drawing an ace from a deck of cards. You use probability in
daily decision-making, often without realizing it. You may not work
through actual probability problems, but you make judgment calls
using subjective probability to determine the best course of action.
Organize around the weather
You use probability almost every day when you make plans with the
weather in mind. Meteorologists cannot predict the weather exactly,
so they use instruments and tools to establish the likelihood of snow,
hail, or rain. For example, a 60 percent chance of rain means that it
has rained on 60 out of 100 days with the same weather conditions.
Intuitively, you may decide to wear closed-toed shoes rather than
sandals, or to take an umbrella to work. Meteorologists also examine
historical databases to estimate high and low temperatures and the
probable weather patterns for the day or week.
Strategies in sports
Coaches and athletes use probability to decide on the best strategies
for games and competitions. When putting a player in the lineup, a
baseball coach evaluates the player's batting average. For example,
a player with a .200 batting average gets a base hit in two out of
every ten at-bats. A player with a .400 batting average is even more
likely to get a hit: four out of every ten at-bats. As another example,
if a high-school football kicker makes nine out of 15 field goal
attempts from over 40 yards in a season, he has about a 60 percent
chance of scoring on his next attempt from the same distance. The
equation is:
9/15 = 0.60 or 60 percent
Insurance option
Probability plays a vital role in analyzing insurance policies to decide
which plans are best for you and your family and which deductible
amounts you need. For example, when you choose a car insurance
policy, you use probability to judge how likely it is that you will need to
file a claim. If 12 out of every 100 drivers in your community, or 12
percent, have hit a deer over the past year, you will likely want to
consider comprehensive insurance on your car, not just liability. Also,
if car repairs following a deer-related incident run $2,800 and you
could not afford to cover that expense, you might consider a lower
deductible.
Recreational and games activities
You use probability when you play board, card, or video games that
involve chance or luck. You must weigh the chances of getting the
cards you need in poker or the secret weapon you need in a video
game. The likelihood of getting those cards or tokens determines how
much risk you are willing to take. For example, according to Wolfram
MathWorld, the odds of getting three of a kind in a poker hand are
46.3-to-1, about a 2 percent chance. However, the odds of getting
one pair are about 1.4-to-1, or about 42 percent. Probability helps
you assess what is at stake and settle on how you want to play the
game.
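The figures in these examples can be checked with a few lines of Python. The sketch below assumes that "46.3-to-1" means odds against the event, so the probability is 1 / (46.3 + 1). The function names relative_frequency and probability_from_odds_against are made up for illustration.

```python
def relative_frequency(successes, attempts):
    """Estimate a probability from past outcomes."""
    return successes / attempts


def probability_from_odds_against(odds):
    """Convert 'odds-to-1 against' into a probability."""
    return 1 / (odds + 1)


kicker = relative_frequency(9, 15)                  # the field goal example
three_of_a_kind = probability_from_odds_against(46.3)
one_pair = probability_from_odds_against(1.4)

print(f"kicker: {kicker:.0%}")
print(f"three of a kind: {three_of_a_kind:.1%}")
print(f"one pair: {one_pair:.1%}")
```

Under that reading of the odds, the results agree with the rounded percentages quoted in the text: about 60 percent, about 2 percent, and about 42 percent.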

Statistics
Modern science is often based on statements of probability and
statistical significance. For example, studies have shown that
cigarette smokers are almost 20 times more likely to develop lung
cancer than nonsmokers. Other research suggests a significant
likelihood of a catastrophic meteorite impact on Earth at some point
in the next 200,000 years. And first-born male children score 2.82
points higher on IQ tests than second-born male children. But why do
scientists talk in such seemingly vague terms? If cigarette smoking
causes lung cancer, why not simply say so? And if a colony needs to
be established on the moon to escape an extraterrestrial disaster,
why not tell people?
The reason is that none of these blunter statements accurately
reflects the data. Scientific data rarely lead to absolute conclusions.
Some smokers reduce their risk of lung cancer by quitting, some die
prematurely from cardiovascular or other diseases rather than lung
cancer, and some never contract the disease at all. All data exhibit
variability, and it is the role of statistics to quantify this variability so
that scientists can make more accurate statements about their data.
A common misconception is that statistics offer proof that something
is true. However, statistics do no such thing. Instead, they provide a
measure of the probability of observing a specific result. Statistical
techniques let scientists put numbers to this probability, moving from
a statement that someone who smokes cigarettes is more likely to
develop lung cancer to a statement that the probability of developing
lung cancer is nearly 20 times greater in cigarette smokers than in
nonsmokers. The quantification of probability that statistics offers is a
powerful tool that is used widely throughout science, yet it is
frequently misunderstood.
Statistics in data analysis
A large number of statistical procedures have been developed for
data analysis. They fall into two groups: descriptive and inferential.

Descriptive statistics:
Descriptive statistics allow scientists to quickly sum up the major
attributes of a dataset using measures such as the mean, median,
and standard deviation. These measures offer a general sense of
the group being studied and let scientists place the research within a
broader context. For example, Cancer Prevention Study I (CPS-I)
was a prospective mortality study initiated in 1959. Among other
variables, investigators reported the ages and demographics of the
participants to allow a comparison between the study group and the
broader population of the United States at the time. The volunteers
ranged in age from 30 to 108, with a median age of 52 years. The
subjects were 57 percent female, 97 percent white, and 2 percent
black. By comparison, the total US population in 1960 was about 51
percent female, 89 percent white, and 11 percent black. These
descriptive statistics made one recognized shortcoming of CPS-I
easy to identify: with 97 percent of participants being white, the study
did not adequately assess disease profiles in US minority groups.
Inferential statistics:
Scientists use inferential statistics to model patterns in data, make
judgments about data, identify relationships between variables in
datasets, and make inferences about larger populations based on
smaller samples of data. From a statistical perspective, the term
“population” does not have to mean a group of people, as it does in
ordinary language. A statistical population is the larger group that a
dataset is used to make inferences about. It can be a group of
people, corn plants, meteor impacts, oil field locations, or any other
set of measurements, as the case may be.
Transferring results from small samples to larger populations is
essential in scientific studies. For example, although Cancer
Prevention Studies I and II enrolled about 1 million and 1.2 million
individuals, respectively, they represented only a small fraction of the
roughly 179 and 226 million people living in the United States in 1960
and 1980.
Correlation, regression, and point estimation/testing are some of the
standard inferential techniques. For example, in 2007 Peter
Kristensen and Tor Bjerkedal analyzed the IQ test scores of 250,000
male Norwegian military personnel. According to their analysis,
first-born male children scored 2.82 ± 0.07 points higher on an IQ
test than second-born male children, a statistically significant
difference at the 95 percent confidence level.
The phrase “statistically significant” is a key concept in data
analysis, and it is often misunderstood. By analogy with the everyday
use of the word significant, many people assume that calling a result
significant means it is important or momentous. However, this is not
the case. Instead, statistical significance is an estimate of the
probability that the observed association or difference is due to
chance rather than to any real relationship. In other words, tests of
statistical significance describe how likely it is that the observed
association or difference would be seen even if no real association
or difference existed. The measure of significance is often
expressed in terms of confidence, which has the same meaning in
statistics as in everyday language, except that it can be quantified.

Data Types
To do Exploratory Data Analysis (EDA), you need a clear grasp of
the different data types, also called measurement scales, because
certain statistical measurements can only be used with specific data
types. You also need to know which data type you are handling in
order to select the right visualization method. Think of data types as
a way to categorize different types of variables. Now, let’s take an
in-depth look at the main types of variables and an example of each.
We will sometimes refer to them as measurement scales.
Categorical data
Categorical data represents characteristics. It can therefore stand for
things such as a person's language, gender, and so on. Categorical
data can also take on numerical values, such as 0 for female and 1
for male. Be aware that those numbers have no mathematical
meaning.
Nominal data
Nominal values represent discrete units and are used to label
variables that have no quantitative value. They are nothing but
“labels.” It is important to note that nominal data has no order. Hence,
the meaning would not change even if you changed the order of its
values. For example, when a question asks for your gender and you
choose between female and male, it makes no difference which
option is listed first. The order has no value.
Ordinal data
Ordinal values represent discrete and ordered units. Ordinal data is
therefore almost the same as nominal data, except that its ordering
matters. For example, a question may ask about your educational
background and offer the options elementary, high school,
undergraduate, and graduate, in that order. Note that the difference
between elementary and high school is not the same as the
difference between high school and college. This is the major
limitation of ordinal data: the differences between the values are not
really known. Because of this limitation, ordinal scales are usually
used to measure non-numerical features such as customer
satisfaction, happiness, etc.
Numerical Data
Discrete data
We speak of discrete data when its values are separate and distinct.
In other words, discrete data can only take on certain values. It is
possible to count this type of data, but we cannot measure it. It
represents information that can be sorted into categories. A perfect
instance is the number of heads in 100 coin flips. To know whether
you are dealing with discrete data, ask the following two questions:
can you count it, and can you divide it into smaller and smaller parts?
If you can count it but cannot meaningfully subdivide it, the data is
discrete.

Continuous data
Continuous data represents measurements. Its values cannot be
counted, but they can be measured. For example, you can describe
someone’s height by using intervals on the real number line.

Interval data
Interval values represent ordered units that have the same
difference. We therefore speak of interval data when a variable
contains ordered numeric values and we know the exact differences
between the values. For example, a feature that holds the
temperature of a given place may take the values -10, -5, 0, +5, +10,
and +15. The drawback of interval values is that they have no “true
zero.” In this example, it means there is no such thing as “no
temperature.” With interval data you can add and subtract, but you
cannot multiply, divide, or calculate ratios. Because there is no true
zero, many descriptive and inferential statistics cannot be applied.
Ratio data
Ratio values are also ordered units that have the same difference.
Ratio values are the same as interval values, except that they do
have an absolute zero. Examples are weight, length, height, and so
on.
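The practical difference between interval and ratio scales shows up as soon as you divide. The sketch below assumes the temperatures are in degrees Celsius (the text does not state a unit) and compares them with the same readings in kelvins, a scale that does have an absolute zero. The two readings and all variable names are illustrative.

```python
def celsius_to_kelvin(celsius):
    """Convert an interval-scale reading to the ratio-scale kelvin."""
    return celsius + 273.15


morning_c, noon_c = 10.0, 20.0     # assumed to be degrees Celsius

difference = noon_c - morning_c    # subtraction is meaningful: 10 degrees
naive_ratio = noon_c / morning_c   # 2.0, but noon is not "twice as hot"

# On a scale with a true zero, the ratio has a physical meaning.
kelvin_ratio = celsius_to_kelvin(noon_c) / celsius_to_kelvin(morning_c)

print(difference, naive_ratio, round(kelvin_ratio, 3))
```

The Celsius ratio depends entirely on where the scale happens to put its zero, which is why ratios are reserved for data such as weight, length, or height.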

The Importance of Data Types


Data types are an essential concept because statistical techniques
can only be used with specific data types. You have to analyze
continuous data differently than categorical data; otherwise, the
analysis will be wrong. Knowing the types of data you are dealing
with therefore enables you to choose the correct method of analysis.
We will now go over every data type once more, this time with regard
to which statistical techniques can be applied. To follow the
discussion, you need to understand the basics of descriptive
statistics. Note: you can read all about descriptive statistics later in
this chapter.

Statistical Methods
Nominal data
When you are dealing with nominal data, you collect information
through:

Frequencies:
The frequency is the rate at which something occurs over a period of
time or within a dataset.

Proportion:
You can easily calculate the proportion by dividing the frequency by
the total number of events. For example, how often an event
occurred divided by how often it could have occurred.

Percentage:
The percentage is the proportion multiplied by 100.

Visualization techniques:
To visualize nominal data, all you need is a bar chart or a pie chart.
In data science, you can use one-hot encoding to transform nominal
data into a numeric feature.
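The summaries above take only a few lines of standard-library Python. The sketch below uses a made-up list of survey answers; the variable names and the hand-rolled one-hot encoding are assumptions for illustration (data science libraries provide their own encoders).

```python
from collections import Counter

# Made-up nominal data: the language chosen by six respondents.
languages = ['English', 'Spanish', 'English', 'German', 'English', 'Spanish']

frequencies = Counter(languages)             # how often each label occurs
total = len(languages)
proportions = {label: count / total for label, count in frequencies.items()}
percentages = {label: 100 * p for label, p in proportions.items()}

# One-hot encoding: one column per category, a single 1 per row.
categories = sorted(frequencies)
one_hot = [[1 if value == c else 0 for c in categories] for value in languages]

print(frequencies)
print(percentages)
print(categories)
print(one_hot[0])
```

The column order in the one-hot encoding is arbitrary (alphabetical here), which reflects the fact that nominal categories have no inherent order.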
Ordinal data
The same techniques you use with nominal data can be applied to
ordinal data, but some additional tools are available. You can
summarize ordinal data with frequencies, proportions, and
percentages, and you can visualize it with bar charts and pie charts.
In addition, you can use percentiles, the median, the mode, and the
interquartile range to summarize your data.
Continuous data
When you are dealing with continuous data, you can use the widest
range of techniques to describe it. To summarize your data, you can
use percentiles, the median, the interquartile range, the mean, the
standard deviation, and the range.

Visualization techniques:
To visualize continuous data, you can use a histogram or a box-plot.
With a histogram, you can check the central tendency, variability,
modality, and kurtosis of a distribution. Be aware that a histogram
may not reveal whether you have any outliers. That is the reason for
also using box-plots.
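The summary measures listed above are all available in Python's standard statistics module (quantiles requires Python 3.8 or later). The heights below are made-up values in centimeters, and the 1.5 × IQR rule used to flag outliers is the common box-plot convention rather than something prescribed by the text.

```python
import statistics

# Made-up continuous data: ten heights in cm, one suspiciously large.
heights = [158.0, 162.5, 165.0, 167.5, 170.0,
           171.5, 174.0, 178.5, 181.0, 210.0]

mean = statistics.mean(heights)
median = statistics.median(heights)
stdev = statistics.stdev(heights)            # sample standard deviation
value_range = max(heights) - min(heights)

q1, q2, q3 = statistics.quantiles(heights, n=4)   # quartiles
iqr = q3 - q1                                     # interquartile range

# Box-plot convention: points beyond 1.5 * IQR from the box are outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [h for h in heights if h < lower or h > upper]

print(mean, median, round(stdev, 2), value_range)
print(q1, q3, iqr)
print(outliers)
```

Notice how the single extreme value pulls the mean above the median while leaving the quartiles untouched, which is why the interquartile range is a convenient basis for spotting outliers.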

Descriptive Statistics
Descriptive statistical analysis helps you understand your data and is
an essential part of machine learning, since machine learning is all
about making predictions. Statistics, on the other hand, is about
drawing conclusions from data, which is a necessary initial step. Your
dataset needs to go through descriptive statistical analysis. Many
people skip this part, lose a considerable amount of valuable insight
into their data, and often reach wrong conclusions as a result. Take
your time, run your descriptive statistics carefully, and make sure
your data meets all the prerequisites for further analysis.
Normal Distribution
The normal distribution is one of the most important concepts in
statistics, since many statistical tests assume normally distributed
data. It describes the pattern that large samples of data often
form when they are plotted. It is sometimes referred to as the
“Gaussian curve” or the “bell curve.”
Inferential statistics and the calculation of probabilities often
require a normal distribution. The implication of this is that you
must be careful about which statistical test you apply to your data
if it is not normally distributed, since the test could lead to
wrong conclusions.
A normal distribution is symmetrical, unimodal, centered around its
mean, and bell-shaped. Each side is an exact mirror of the other
in a perfectly normal distribution.
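The symmetry of the bell curve can be checked numerically with `statistics.NormalDist` from the standard library. This is only a small illustration on the standard normal distribution (mean 0, standard deviation 1), not a test of whether a given dataset is normal.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Symmetry: the area below -1 equals the area above +1.
left_tail = standard_normal.cdf(-1.0)
right_tail = 1.0 - standard_normal.cdf(1.0)

# Share of values within one and two standard deviations of the mean.
within_one = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)
within_two = standard_normal.cdf(2.0) - standard_normal.cdf(-2.0)

print(left_tail, right_tail)
print(round(within_one, 4), round(within_two, 4))
```

Roughly 68 percent of the values fall within one standard deviation of the mean and roughly 95 percent within two.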
Central tendency
The mean, the median, and the mode are what we need to tackle here.
Together, these three are referred to as measures of “central
tendency.” They are the most popular kinds of “average,” and each
one is distinct. Central tendency describes how your data values
cluster around the mean, median, or mode.
The mean is the arithmetic average. It is considered the most
consistent measure of central tendency for drawing conclusions
about a population from a sample. The mean is computed by dividing
the sum of all values by the number of values.
The mode is the category or value that occurs most frequently in the
data. When no value or category is repeated, the dataset has no
mode. It is also possible for a dataset to have more than one
mode. For categorical variables, the mode is the only measure of
central tendency, since you cannot compute the average of a
variable such as “gender.” For categorical variables, you can only
report counts and percentages.
Also known as the “50th percentile,” the median is the midpoint or
“middle” value in your data. The median is much less affected by
skewed data and outliers than the mean. For example, consider a
housing price dataset in which most homes cost between $100,000
and $300,000, but a few are worth more than $3 million. Because
the mean is the sum of all values divided by the number of values,
the expensive homes will heavily affect it. The median, as the
“middle” value of all data points, is hardly affected by these
outliers. Consequently, the median is a much better statistic for
describing such data.
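The housing example can be reproduced with a few made-up prices. The numbers below are assumptions chosen only to show how one expensive home pulls the mean upward while the median stays put; `statistics.multimode` is used for the mode because it also covers datasets with more than one mode.

```python
import statistics

# Hypothetical house prices in dollars: most are modest, one is very expensive.
prices = [100_000, 150_000, 200_000, 250_000, 300_000, 3_000_000]

mean_price = statistics.mean(prices)      # sum of values / number of values
median_price = statistics.median(prices)  # middle value of the sorted data

# multimode returns every most-frequent value, so it also handles
# datasets with more than one mode.
grades = ["B", "A", "C", "B", "A"]
modes = statistics.multimode(grades)

print(mean_price, median_price, modes)
```

Here the mean lands well above every price except the most expensive one, while the median stays among the typical homes.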
Chapter 21: Distributed Systems & Big Data
Distributed System
A distributed system is a collection of autonomous computers that are
interconnected by either a local network or a global network.
Distributed systems enable different machines to run different
processes. Examples of distributed systems include banking
systems, airline reservation systems, and so on.
A distributed system has several goals. Some of them are
given below.
Scalability - To expand and manage the servers without degrading
any services.
Heterogeneity - To handle many different types of nodes.
Transparency - To hide the internal workings so that the user
does not have to deal with the complexity.
Accessibility - To make the resources accessible so that users
can access and share them effectively.
Openness - To offer services according to standard
rules.
A distributed system has several advantages. Some
of them are given below:
Complexity is hidden in a distributed system.
A distributed system ensures scalability.
A distributed system provides consistency.
A distributed system is more efficient than other systems.
The drawbacks of a distributed system are given below:
Cost - It is more expensive because developing a
distributed system is difficult.
Security - More vulnerable to hacking because resources are
exposed through the network.
Complexity - The implementation is more complex to understand.
Network dependence - Problems in the underlying network may cause
issues.
How do I get hands-on with distributed systems?
Learn DS (distributed systems) concepts by:
1. Building a simple chat application:
Step 1: Start small and implement a simple chat application.
If that works, modify it to support multi-user chat sessions.
You should see a few issues here with message ordering.
Step 2: After reading the DS theory on total, causal, and other
ordering techniques, implement each of them, one by one, in
your system.
2. Building a storage simulator:
Step 1: Write an Android application (no fancy UI, just a
few buttons) that can insert into and query the underlying Content
Provider. This application should be able to communicate with
other devices that run your application.
Step 2: After reading the theory of the Chord protocol and DHTs,
simulate these protocols in your distributed setup.
For example, assume I run your application in three emulators.
These three instances of your application should form a Chord ring
and serve insert/query requests in a distributed fashion, according
to the Chord protocol.
If an emulator goes down, you should be able to reassign keys,
based on your hashing algorithm, to the instances that are still
running.
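The key assignment behind such a ring can be sketched in a few lines. This is a simplified illustration of the successor rule only, assuming a small 16-bit identifier space and invented node names; a real Chord implementation also needs finger tables, joins, and network communication.

```python
import bisect
import hashlib

RING_BITS = 16  # hypothetical small identifier space, for readability


def ring_id(name: str) -> int:
    """Hash a node name or key onto the identifier ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** RING_BITS)


def successor(key: str, nodes: list[str]) -> str:
    """Return the node responsible for key: the first node clockwise from it."""
    ring = sorted((ring_id(node), node) for node in nodes)
    ids = [node_id for node_id, _ in ring]
    # Wrap around to the first node when the key hashes past the last node.
    index = bisect.bisect_left(ids, ring_id(key)) % len(ring)
    return ring[index][1]


# Hypothetical instance names and keys.
nodes = ["emulator-5554", "emulator-5556", "emulator-5558"]
keys = ["alice", "bob", "carol", "dave"]

before = {key: successor(key, nodes) for key in keys}

# Simulate one emulator going down: its keys move to the next node on the ring.
failed = "emulator-5556"
alive = [node for node in nodes if node != failed]
after = {key: successor(key, alive) for key in keys}

print(before)
print(after)
```

Only the keys that belonged to the failed instance change owner; all other keys stay where they were, which is the main attraction of this kind of hashing.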
WHAT ARE THE APPLICATIONS OF DISTRIBUTED SYSTEMS?
A distributed system is a group of computers working together that
appears as a single computer to the end-user.
Whenever server traffic grows, one has to upgrade the hardware
and software configuration of the server to handle it, which is
known as vertical scaling. Vertical scaling works well, but only
up to a certain point. Even the best hardware and software
cannot keep up with enormous traffic.
The following are some applications of distributed
systems.
Global Positioning System
World Wide Web
Air traffic control system
Automated banking system
In the World Wide Web, the data or application is distributed
across a number of heterogeneous computer systems, yet to the
end-user or the browser it appears to be a single system from
which the user gets the data.
Multiple computers work simultaneously and share resources in
the World Wide Web.
All of these systems are fault tolerant: if any one system
fails, the application does not fail. The failed computer's
task can be taken over by another computer in the system, and
all of this happens without the end-user or browser noticing.
The elements of the World Wide Web are:
Multiple computers
Common state
Interconnection of the multiple computers
There are three kinds of distributed systems:
Corporate systems
These use separate servers for databases, business intelligence,
transaction processing, and web services. They are usually
located at one site, but could have multiple servers at several
locations if continuous service is important.
Very large websites: Google, Facebook, Quora, perhaps Wikipedia
These resemble corporate systems, but are so huge that they
have a character of their own. They are forced to be
distributed because of their scale.
Systems serving distributed organizations that cannot rely on
network connectivity or that need local IT resources
The military requires some unit-level command and control
capability. Ideally, every unit (soldier, vehicle, and so on)
can act as a node so that there is no central location whose
destruction would bring everything down.
Mining operations often have significant industrial capacity in
the remotest places and are best served by local IT for inventory
control, payroll and personnel systems, and specialized accounting
and planning systems.
Construction companies often have large projects without
good communications, so they are similar to the mining
operations above. In the worst case, they may rely on a driver
jumping in his truck with a memory stick and connecting to the
internet in some nearby town.

Data Visualization
What is Data Visualization?
Data Visualization is Interactive
Have you ever booked a flight online and noticed that you can now
not only view seat availability but also pick your seat?
Perhaps you have noticed that when you look up information online
about another country, you may find a site where all you need to do
to get political, economic, geographic, and other information is
drag your mouse over the region of the country in which you are
interested.
Maybe you have put together a business presentation consisting of
several levels of complicated marketing and budget information in
a simple display, which lets you review all parts of your report
by just clicking on one area of a map, chart, or graph. You may
have even made predictions by changing some information and
watching the graph change before your eyes.
Warehouses are tracking inventory. Businesses are tracking sales.
People are making visual displays of information that meet their
needs. The traveler, the student, the ordinary worker, the
marketing executive, the warehouse manager, and the CEO are
now able to interact with the information they are looking
for with data visualization tools.
Data Visualization is Imaginative
If you can visualize it in your mind, you can visualize it on a
computer screen. The avid skier might be interested in comparing
the average snowfall at Soldier Mountain, ID. Doctors and students
may want to compare the average cancer death rate of men and women
in Montana or Hawaii. The examples are endless.
Data visualization tools can help the entrepreneur present products
on their site creatively and informatively. Data visualization has
been picked up by state and national government agencies to provide
useful information to the public. Airlines take advantage of data
visualization to be more accommodating. Businesses use data
visualization for tracking and reporting. Children use data
visualization tools on the home computer to complete research
assignments or to satisfy their curiosity about unusual places in
the world.
Wherever you go, data visualization will be there. Whatever you
need, data visualization can present answers in a helpful
way.
Data Visualization is Comprehensive
Every one of us has looked up information online and found
unhelpful presentation formats that either present simple details
in a complicated way or show complex information in an even more
complex way. Every one of us has at some time wished that a site
had a more user-friendly way of presenting the information.
Information is the language of the 21st century, which means
everybody is sending it, and everybody is searching for it. Data
visualization can make both the senders and the searchers happy
by creating a simple mechanism for presenting often complex
information.
Data Visualization Basics
Data visualization is the process of displaying data or information
in graphical charts, bars, and figures.
It is used as a means to deliver visual reporting to users on the
performance, operations, or general statistics of an application,
network, hardware, or virtually any IT asset. Data visualization
is typically achieved by extracting data from the underlying IT
system. This data is generally in the form of numbers, statistics,
and overall activity. The data is processed with data
visualization software and displayed on the system's dashboard.
It is done to help IT administrators get quick, visual, and
easy-to-understand insight into the performance of the underlying
system. Most IT performance monitoring applications use data
visualization techniques to provide insight into the
performance of the monitored system.
Software Visualization
Software visualization is the practice of creating visual tools to
map software elements or otherwise display aspects of source code.
This can be done with all kinds of programming languages in
different ways, with different criteria and tools.
The main idea behind software visualization is that by creating
visual interfaces, creators can help developers and others to
understand code or to reverse-engineer applications. Much of the
power of software visualization has to do with understanding
relationships between pieces of code, where specific visual tools,
such as windows, will openly present this information. Other
features may include various kinds of diagrams or templates that
developers can use to compare existing code with a certain standard.
Big Data Visualization
Big data visualization refers to the use of more contemporary
visualization techniques to show the relationships within data.
Visualization tactics include applications that can display
real-time changes and more illustrative graphics, thus going
beyond pie, bar, and other charts. These illustrations veer away
from the use of hundreds of rows, columns, and attributes toward a
more artistic visual representation of the data.
Normally, when businesses need to present relationships among
data, they use graphs, bars, and charts to do it. They can also
make use of a variety of colors, terms, and symbols. The main
problem with this setup, however, is that it doesn't do a good job
of presenting very large data or data that includes huge numbers.
Data visualization uses more interactive, graphical illustrations -
including personalization and animation - to display figures and
establish connections among pieces of information.

The Many Faces of Data Visualization


Data visualization has become one of the main "buzz" phrases
swirling around the Web these days. With all of the promises of
Big Data and the IoT (Internet of Things), more organizations are
trying to get more value from the voluminous data they generate.
This frequently involves complex analysis - both real-time and
historical - combined with automation.
A key factor in translating this data into meaningful information,
and thus into informed action, is the means by which this data is
visualized. Will it be seen in real time? And by whom? Will it be
displayed in colorful bubble charts and trend graphs? Or will it
be embedded in high-detail 3D graphics? What is the goal of the
visualization? Is it to share information? Enable collaboration?
Empower decision making? Data visualization may be a popular
concept, yet we don't all have the same idea about what it means.
For many organizations, effective data visualization is an
important part of doing business. It can even be a matter of life
and death (think healthcare and military applications). Data
visualization (or information visualization) is an integral part
of some scientific research. From particle physics to sociology,
creating concise yet powerful visualizations of research data can
help researchers quickly identify patterns or anomalies, and can
sometimes inspire that warm and fuzzy feeling we get when we feel
like we've finally wrapped our head around something.
Today's Visual Culture
We live in a world today that seems to be generating new
information at a pace that can be overwhelming. With TV, the Web,
roadside billboards, and more all competing for our increasingly
fragmented attention, the media and corporate America are forced
to find new ways of getting their messages through the noise and
into our perception. More often than not - when possible - the
medium chosen to share the message is visual. Whether it's through
an image, a video, a fancy infographic, or a simple icon, we have
all become very skilled at processing information visually.
It's a busy world with many things about which we want to be
informed. While we all receive information in many ways over any
given day, only certain pieces of that information will have any
real effect on the way we think and act as we go about our daily
lives. The power of effective data visualization is that it can
distill those actionable details from large sets of data simply by
putting them in the proper context.
Well-planned data visualization executed in a visually appealing
way can lead to faster, more confident decisions. It can shed
light on past failures and reveal new opportunities. It can
provide a tool for collaboration, planning, and training. It is
becoming a necessity for many organizations that want to compete
in the marketplace, and those who do it well will distinguish
themselves.
Chapter 22: Python in the Real World
Now that you know the basics behind Python programming, you
might be wondering where exactly you could apply your knowledge.
Keep in mind that you only started your journey, so right now, you
should focus on practicing all the concepts and techniques you
learned. However, having a specific goal in mind can be extremely
helpful and motivating.
As mentioned earlier in this book, Python is a powerful and versatile
language with many practical applications. It is used in many fields,
from robotics to game development and web-based application
design. In this chapter, you are going to explore some of these fields
to give you an idea about what you can do with your newly acquired
skills.

What is Python Used For?


You're on your way to work listening to your favorite Spotify playlist
and scrolling through your Instagram feed. Once you arrive at the
office, you head over to the coffee machine, and while waiting for
your daily boost, you check your Facebook notifications. Finally, you
head to your desk, take a sip of coffee, and you think, "Hey, I should
Google to learn what Python is used for." At this point, you realize
that every technology you just used has a little bit of Python in it.
Python is used in nearly everything, whether we are talking about a
simple app created by a startup company or a giant corporation like
Google. Let’s go through a brief list of all the ways you can use
Python.
Robotics

You’ve probably heard about tiny devices like the Raspberry Pi
computer or the Arduino board. They are tiny, inexpensive devices
that can be used in a variety of projects. Some people create cool
little weather stations or drones that can scan the area, while others
build killer robots because why not. Once the hardware problems are
solved, they all need to take care of the software component.
Python is a popular solution, and it is used by hobbyists and
professionals alike. These tiny devices don't have much power, so
they need a language that lets you get a lot done while the program
itself stays small and simple. After all, resources also consume
power, and tiny robots can only pack so much juice. Everything you
have learned so far can be used in robotics because Python works
with a wide range of hardware components. Furthermore, there are
many Python extensions and libraries specifically designed for the
field of robotics.
In addition, Google uses some Python magic in their AI-based self-
driving car. If Python is good for Google and for creating killer robots,
what more can you want?
Machine Learning
You’ve probably heard about machine learning because it is the new
popular kid on the block that every tech company relies on for
something. Machine learning is all about teaching computer
programs to learn from experience based on data you already have.
Thanks to this concept, computers can learn how to predict various
actions and results.
Some of the most popular machine learning examples can be found
in:
1. Google Maps: Machine learning is used here to
determine the speed of the traffic and to predict for you
the most optimal route to your destination based on
several other factors as well.
2. Gmail: SPAM used to be a problem, but thanks to
Google’s machine learning algorithms, SPAM can now
be easily detected and contained.
3. Spotify or Netflix: Noticed how any of these streaming
platforms have a habit of knowing what new things to
recommend to you? That's all because of machine
learning. Some algorithms can predict what you will like
based on what you have watched or listened to so far.
Machine learning involves programming, as well as a great deal of
mathematics. Python's simplicity makes it attractive for both
programmers and mathematicians. Furthermore, unlike other
programming languages, Python has a number of add-ons and
libraries created explicitly for machine learning and data science,
such as TensorFlow, NumPy, Pandas, and Scikit-learn.
Cybersecurity

Data security is one of the biggest concerns of our century. By
integrating our lives and business into the digital world, we make it
vulnerable to unauthorized access. You probably read every month
about some governmental institution or company getting hacked or
taken offline. Most of these situations involve terrible security due to
outdated systems and working with antiquated programming
languages.
Python's own popularity is something that works in favor of its
security. How so? When something is popular, it is backed by a
large community of experts and testers. For this reason, Python is
patched regularly, and security issues tend to be fixed quickly.
This makes it a popular language in the field of cybersecurity.
Web Development
As mentioned several times before, Python is simple yet powerful.
Many companies throughout the world, no matter the size, rely on
Python to build their applications, websites, and other tools. Even
giants like Google and Facebook rely on Python for many of their
solutions.
We discussed the main advantages of working with Python earlier in
the book, so we won't explore them again. However, it is worth
mentioning that Python is often used as a glue language,
especially in web development. Creating web tools always involves
several different programming languages, database management
languages, and so on. Python can act as the integration language by
calling C++ data types and combining them with other elements, for
example. C++ is mentioned because in many tech areas, the critical
performance components are written in C++, which offers
unmatched performance. However, Python is used for high-level
customization.
Chapter 23: Linear Regression
The easiest and most basic machine learning algorithm is linear
regression. It will be the first one that we are going to look at, and it
is a supervised learning algorithm. That means that we need both –
inputs and outputs – to train the model.

Mathematical Explanation
Before we get into the coding, let us talk about the mathematics
behind this algorithm.
In the figure above, you see a lot of different points, which all have
an x-value and a y-value. The x-value is called the feature, whereas
the y-value is our label. The label is the result of our feature. Our
linear regression model is represented by the blue line that goes
straight through our data. It is placed so that it is as close as possible
to all points at the same time. So we “trained” the line to fit the
existing points or the existing data.
The idea is now to take a new x-value without knowing the
corresponding y-value. We then look at the line and find the resulting
y-value there, which the model predicts for us. However, since this
line is quite generalized, we will get a relatively inaccurate result.
One must also mention that linear models only really
develop their effectiveness when we are dealing with numerous
features (i.e., higher dimensions).
If we apply this model to school data and try to find a
relation between absences, study time, and the resulting
grade, we will probably get a less accurate result than by including
30 parameters. Logically, however, we then no longer have a straight
line or flat surface but a hyperplane. This is the equivalent of a
straight line in higher dimensions.
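For a single feature, fitting the line can be written out by hand. The sketch below uses ordinary least squares, the standard way to place a line as close as possible to all points, on made-up values; the names `x_values`, `y_values`, and `predict` are chosen for illustration only.

```python
# Hypothetical training data: hours studied (feature) and grade (label).
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [2.0, 4.1, 6.0, 8.1, 9.8]

n = len(x_values)
x_mean = sum(x_values) / n
y_mean = sum(y_values) / n

# Ordinary least squares for one feature:
#   slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
#   intercept = y_mean - slope * x_mean
slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(x_values, y_values))
         / sum((x - x_mean) ** 2 for x in x_values))
intercept = y_mean - slope * x_mean


def predict(x: float) -> float:
    """Read the predicted label for a new feature value off the fitted line."""
    return slope * x + intercept


print(slope, intercept)
print(predict(6.0))
```

With more than one feature, the same idea applies, but the slope becomes one coefficient per feature and the line becomes a hyperplane.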

Preparing Data
Our data is now fully loaded and selected. However, in order to use it
as training and testing data for our model, we have to reformat it.
The sklearn models work on NumPy arrays. They also accept Pandas
data frames, but here we convert explicitly: we turn our features
into an X array and our label into a Y array.
import numpy as np
X = np.array(data.drop(columns=[prediction]))  # all feature columns
Y = np.array(data[prediction])  # the label column
The function np.array converts the selected columns into an array.
The drop method returns the data frame without the specified
column. Our X array now contains all of our columns, except for the
final grade. The final grade is in the Y array.
In order to train and test our model, we have to split our available
data. The first part is used to get the hyperplane to fit our data as
well as possible. The second part then checks the accuracy of the
prediction, with previously unknown data.
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1)
With the function train_test_split, we divide our X and Y arrays into
four arrays. The order must be exactly as shown here. The test_size
parameter specifies what fraction of the records to use for testing. In
this case, it is 10%, which is a common choice. We hold these records
back to test how accurate the model is with data that it has never
seen before.
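To make the idea of the split concrete, here is a simplified sketch written with the standard library. It is not how scikit-learn implements train_test_split; the function name `split_train_test`, the rounding of the test size, and the sample data are assumptions made for illustration.

```python
import random


def split_train_test(features, labels, test_size=0.1, seed=None):
    """Shuffle the row indices and split features and labels the same way."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(indices) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [features[i] for i in train_idx]
    x_test = [features[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


# Hypothetical data: 20 rows with one feature each; the label is twice the feature.
features = [[i] for i in range(20)]
labels = [2 * i for i in range(20)]

x_train, x_test, y_train, y_test = split_train_test(
    features, labels, test_size=0.1, seed=42)

print(len(x_train), len(x_test))
```

The important point is that features and labels are shuffled together, so every row keeps its own label, and that the test rows never appear in the training set.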

Training and Testing


Now we can start training and testing our model. For that, we first
define our model.
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, Y_train)
By using the constructor of the LinearRegression class, we create
our model. We then use the fit function and pass our training data.
Now our model is already trained. It has now adjusted its hyperplane
so that it fits our training values as closely as possible.
In order to test how well our model performs, we can use the score
method and pass our testing data.
accuracy = model.score(X_test, Y_test)
print(accuracy)
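Since the splitting of training and test data is always random, we will
have slightly different results on each run. An average result could
look like this:
0.9130676521162756
For regression models, the score method returns the coefficient of
determination (R²) rather than a share of correct predictions. A
value of about 0.91 is quite high: the model explains roughly 91
percent of the variation in the final grade. Now that we know that
our model is somewhat reliable, we can enter new data and predict
the final grade.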
X_new = np.array([[18, 1, 3, 40, 15, 16]])
Y_new = model.predict(X_new)
print(Y_new)
Here we define a new NumPy array with values for our features in
the right order. Then we use the predict method to calculate the likely
final grade for our inputs.
[17.12142363]
In this case, the final grade would probably be 17.
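The value returned by the score method of a scikit-learn regression model is the coefficient of determination, R². As a rough sketch of what that number measures, the following example computes it by hand; the `actual` and `predicted` grades are invented and do not come from the model above.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical final grades and the predictions made for them.
actual = [10, 12, 15, 17, 8]
predicted = [11, 12, 14, 17, 9]

print(r_squared(actual, predicted))
```

A value of 1 means the predictions match the actual values exactly, and a value near 0 means the model does no better than always predicting the mean.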

Visualizing Correlations
Since we are dealing with high dimensions here, we can’t draw a
graph of our model. This is only possible in two or three dimensions.
However, what we can visualize are relationships between individual
features.
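import matplotlib.pyplot as plt
# 'studytime' is the column name assumed here; adjust it to your data
plt.scatter(data['studytime'], data['G3'])
plt.title("Correlation")
plt.xlabel("Study Time")
plt.ylabel("Final Grade")
plt.show()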
Here we draw a scatter plot with the function scatter, which shows
the relationship between the learning time and the final grade.
In this case, we see that the relationship is not really strong. The
data is very diverse and you cannot see a clear pattern.
plt.scatter(data['G2'], data['G3'])
plt.title("Correlation")
plt.xlabel("Second Grade")
plt.ylabel("Final Grade")
plt.show()
However, if we look at the correlation between the second grade and
the final grade, we see a much stronger correlation.
Here we can clearly see that the students with good second grades
are very likely to end up with a good final grade as well. You can play
around with the different columns of this data set if you want to.
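What the scatter plots show visually can also be expressed as a number, the Pearson correlation coefficient. The sketch below computes it by hand on a few invented values (not taken from the data set used above), so the variable names and numbers are assumptions for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    x_spread = math.sqrt(sum((x - x_mean) ** 2 for x in xs))
    y_spread = math.sqrt(sum((y - y_mean) ** 2 for y in ys))
    return covariance / (x_spread * y_spread)


# Hypothetical values, not taken from the data set used above.
second_grade = [8, 10, 12, 14, 16, 18]
final_grade = [9, 10, 13, 14, 15, 19]
study_time = [3, 1, 2, 4, 1, 2]

print(pearson(second_grade, final_grade))  # close to 1: strong relationship
print(pearson(study_time, final_grade))    # close to 0: weak relationship
```

A coefficient near 1 or -1 corresponds to points that lie close to a line in the scatter plot, while a coefficient near 0 corresponds to a cloud with no clear pattern.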
Conclusion
In conclusion, Python offers some of the strongest computational
capabilities available for big data analysis. If this is your first
time programming with data, Python will be a much easier language
to learn than most others and is far more user-friendly.
And so, we've come to the end of this book, which was meant to give
you a taste of data analysis techniques and visualization beyond the
basics using Python. Python is a wonderful tool to use for data
purposes, and I hope this guide stands you in good stead as you go
about using it for your purposes.
I have tried to go more in-depth in this book, give you more
information on the fundamentals of data science, along with lots of
useful, practical examples for you to try out.
Please read this guide as often as you need to and don’t move on
from a chapter until you fully understand it. And do try out the
examples included – you will learn far more if you actually do it
rather than just reading the theory.
This was just an overview to recap on what you learned in the first
book, covering the datatypes in pandas and how they are used. We
also looked at cleaning the data and manipulating it to handle
missing values and do some string operations.
