A Python Data Analyst’s Toolkit
Learn Python and Python-based Libraries with Applications in Data Analysis and Statistics
Gayathri Rajagopalan
Table of Contents
Introduction
Indexing
Type of an index object
Creating a custom index and using columns as indexes
Indexes and speed of data retrieval
Immutability of an index
Alignment of indexes
Set operations on indexes
Data types in Pandas
Obtaining information about data types
Indexers and selection of subsets of data
Understanding loc and iloc indexers
Other (less commonly used) indexers for data access
Boolean indexing for selecting subsets of data
Using the query method to retrieve data
Operators in Pandas
Representing dates and times in Pandas
Converting strings into Pandas Timestamp objects
Extracting the components of a Timestamp object
Grouping and aggregation
Examining the properties of the groupby object
Filtering groups
Transform method and groupby
Apply method and groupby
How to combine objects in Pandas
Append method for adding rows
Concat function (adding rows or columns from other objects)
Join method – index to index
Merge method – SQL type join based on common columns
lmplot
Strip plot
Swarm plot
Catplot
Pair plot
Joint plot
Summary
Review Exercises
Index
About the Author
Gayathri Rajagopalan works for a leading Indian
multinational organization, with ten years of experience
in the software and information technology industry.
She has degrees in computer engineering and business
administration, and is a certified Project Management
Professional (PMP). Some of her key focus areas include
Python, data analytics, machine learning, statistics, and
deep learning. She is proficient in Python, Java, and C/C++
programming. Her hobbies include reading, music, and
teaching programming and data science to beginners.
About the Technical Reviewer
Manohar Swamynathan is a data science practitioner
and an avid programmer, with over 14 years of experience
in various data science related areas that include data
warehousing, Business Intelligence (BI), analytical tool
development, ad hoc analysis, predictive modeling, data
science product development, consulting, formulating
strategy, and executing analytics programs. He’s had a
career covering the life cycle of data across different
domains such as US mortgage banking, retail/ecommerce,
insurance, and industrial IoT. He has a bachelor’s degree
with a specialization in physics, mathematics, and
computers, and a master’s degree in project management. He’s currently living in
Bengaluru, the Silicon Valley of India.
Acknowledgments
This book is a culmination of a year-long effort and would not have been possible
without my family’s support. I am indebted to them for their patience, kindness, and
encouragement.
I would also like to thank my readers for investing their time and money in this book. It is
my sincere hope that this book adds value to your learning experience.
Introduction
I had two main reasons for writing this book. When I first started learning data science,
I could not find a centralized overview of all the important topics on this subject.
A practitioner of data science needs to be proficient in at least one programming
language, learn the various aspects of data preparation and visualization, and also
be conversant with various aspects of statistics. The goal of this book is to provide
a consolidated resource that ties these interconnected disciplines together and
introduces these topics to the learner in a graded manner. Secondly, I wanted to provide
material to help readers appreciate the practical aspects of the seemingly abstract
concepts in data science, and also help them to be able to retain what they have learned.
There is a section on case studies to demonstrate how data analysis skills can be applied
to make informed decisions to solve real-world challenges. One of the highlights of
this book is the inclusion of practice questions and multiple-choice questions to help
readers practice and apply what they have learned. Readers often forget much of a book
soon after reading it, and these exercises are meant to help them avoid this pitfall.
The book helps readers learn three important topics from scratch – the Python
programming language, data analysis, and statistics. It is a self-contained introduction
for anybody looking to start their journey with data analysis using Python, as it focuses
not just on theory and concepts but on practical applications and retention of concepts.
This book is meant for anybody interested in learning Python and Python-based libraries
like Pandas, NumPy, SciPy, and Matplotlib for descriptive data analysis, visualization,
and statistics. The broad categories of skills that readers learn from this book include
programming skills, analytical skills, and problem-solving skills.
The book is broadly divided into three parts – programming with Python, data analysis
and visualization, and statistics. The first part of the book comprises three chapters. It
starts with an introduction to Python – the syntax, functions, conditional statements,
data types, and different types of containers. Subsequently, we deal with advanced
concepts like regular expressions, handling of files, and solving mathematical problems
with Python. Python is covered in detail before moving on to data analysis to ensure that
the readers are comfortable with the programming language before they learn how to
use it for purposes of data analysis.
The second part of the book, comprising five chapters, covers the various aspects of
descriptive data analysis, data wrangling and visualization, and the respective Python
libraries used for each of these. There is an introductory chapter covering basic concepts
and terminology in data analysis, and one chapter each on NumPy (the scientific
computation library), Pandas (the data wrangling library), and the visualization
libraries (Matplotlib and Seaborn). A separate chapter is devoted to case studies to
help readers understand some real-world applications of data analysis. Among these
case studies is one on air pollution, using data drawn from an air quality monitoring
station in New Delhi, which has seen alarming levels of pollution in recent years. This
case study examines the trends and patterns of major air pollutants like sulfur dioxide,
nitrogen dioxide, and particulate matter for five years, and comes up with insights and
recommendations that would help with designing mitigation strategies.
The third part of this book focuses on statistics, elucidating important principles in
statistics that are relevant to data science. The topics covered include probability, Bayes’
theorem, permutations and combinations, hypothesis testing (ANOVA, chi-squared
test, z-test, and t-test), and the use of various functions in the SciPy library to
simplify the tedious calculations involved in statistics.
By the end of this book, the reader will be able to confidently write code in Python, use
various Python libraries and functions for analyzing any dataset, and understand basic
statistical concepts and tests. The code is presented in the form of Jupyter notebooks
that can further be adapted and extended. Readers get the opportunity to test their
understanding with a combination of multiple-choice and coding questions. They
also get an idea about how to use the skills and knowledge they have learned to make
evidence-based decisions for solving real-world problems with the help of case studies.
CHAPTER 1
Getting Familiar with Python
Python is an open source programming language created by a Dutch programmer
named Guido van Rossum. Named after the British comedy group Monty Python,
Python is a high-level, interpreted language and one of the most sought-after and
fastest-growing programming languages in the world today. It is also a language of
choice for data science and machine learning.
In this chapter, we first introduce the Jupyter notebook, a web application for running
Python code. We then cover the basic concepts in Python, including data types,
operators, containers, functions, classes, file handling, exception handling, and
standards for writing code and modules.
The code examples for this book have been written using Python version 3.7.3 and
Anaconda version 4.7.10.
Technical requirements
Anaconda is an open source platform used widely by Python programmers and data
scientists. Installing this platform installs Python, the Jupyter notebook application, and
hundreds of libraries. The following are the steps you need to follow for installing the
Anaconda distribution.
2. Click the installer for your operating system, as shown in Figure 1-1.
The installer is downloaded to your system.
3. Open the installer (the file downloaded in the previous step) and run it.
© Gayathri Rajagopalan 2021
G. Rajagopalan, A Python Data Analyst’s Toolkit, https://doi.org/10.1007/978-1-4842-6399-0_1
Follow these steps to download all the data files used in this book:
Now that we have installed and launched Jupyter, let us understand how to use this
application in the next section.
JupyterLab is a web-based interactive development environment (IDE) for Jupyter
notebooks. Jupyter notebooks run in a web application on the user’s local machine. They
can be used for loading, cleaning, analyzing, and modeling data. You can add code,
equations, images, and Markdown text to a Jupyter notebook. Jupyter notebooks serve
the dual purpose of running your code and serving as a platform for presenting and
sharing your work with others. Let us look at the various features of this application.
Type “jupyter notebook” in the search bar next to the start menu.
This will open the Jupyter dashboard. The dashboard can be used
to create new notebooks or open an existing one.
Click inside the first cell in your notebook and type a simple line
of code, as shown in Figure 1-4. Execute the code by selecting Run
Cells from the “Cell” menu, or use the shortcut keys Ctrl+Enter.
5. Renaming a notebook
Click the default name of the notebook and type a new name, as
shown in Figure 1-6.
Table 1-1 gives some of the familiar icons found in Jupyter notebooks, the corresponding
menu functions, and the keyboard shortcuts.
Function | Keyboard shortcut | Menu function
Adding a new cell to a Jupyter notebook | Esc+b (add a cell below the current cell) or Esc+a (add a cell above the current cell) | Insert ➤ Insert Cell Above, or Insert ➤ Insert Cell Below
Running a given cell | Ctrl+Enter (run the selected cell); Shift+Enter (run the selected cell and move to the next cell, inserting one if needed) | Cell ➤ Run Selected Cells
If you are not sure which keyboard shortcut to use, go to Help ➤ Keyboard
Shortcuts, as shown in Figure 1-8.
• Shift+Enter to run the code in the current cell and move to the next
cell.
Tab Completion
This is a feature that can be used in Jupyter notebooks to help you complete the code
being written. Tab completion speeds up your workflow by quickly completing the names
of modules, functions, and variables, which reduces typos and saves you from having to
remember the exact names of all the modules and functions.
For example, if you want to import the Matplotlib library but don’t remember the
spelling, you could type the first three letters, mat, and press Tab. You would see a drop-
down list, as shown in Figure 1-9. The correct name of the library is the second name in
the drop-down list.
One commonly used magic command, shown in the following, is used to display
Matplotlib graphs inside the notebook. Adding this magic command avoids the need
to call the plt.show function separately for showing graphs (the Matplotlib library is
discussed in detail in Chapter 7).
CODE:
%matplotlib inline
Magic commands like %%timeit can also be used to time the execution of a cell, as shown
in the following.
CODE:
%%timeit
for i in range(100000):
    i*i
Output:
16.1 ms ± 283 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
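Magic commands work only inside IPython and Jupyter. In a plain Python script, the standard-library timeit module can take a similar measurement. The sketch below is an illustration, not the book's code: the function name square_loop is hypothetical, the choice of 10 calls per measurement and 3 repetitions is arbitrary, and the timings will vary by machine.

```python
import timeit

def square_loop():
    # Same work as the %%timeit cell above: compute i*i for 100,000 values of i
    for i in range(100000):
        i * i

# Time 10 calls per measurement and take 3 measurements
timings = timeit.repeat(square_loop, number=10, repeat=3)

# The minimum is the least noisy estimate; divide by 10 for the per-call time
best_ms = min(timings) / 10 * 1000
print("best of 3: {:.2f} ms per loop".format(best_ms))
```

Unlike %%timeit, which reports a mean and standard deviation, this sketch reports the best of the repeated runs.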
Now that you understand the basics of using Jupyter notebooks, let us get started with
Python and understand the core aspects of this language.
Python Basics
In this section, we get familiar with the syntax of Python, commenting, conditional
statements, loops, and functions.
Comments
A comment explains what a line of code does, and is used by programmers to help others
understand the code they have written. In Python, a comment starts with the # symbol.
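A minimal illustration of full-line and inline comments follows; the variable names are made up for this example.

```python
# A full-line comment: Python ignores everything after the # symbol
price = 250      # an inline comment can follow code on the same line
quantity = 4

# Compute the total cost of the order
total = price * quantity
print(total)     # 1000
```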
Proper spacing and indentation are critical in Python. While languages like Java and
C++ use curly braces to enclose blocks of code, Python uses indentation to define code
blocks; by convention (PEP 8), each level is indented by four spaces. Take care with
indentation to avoid errors. Applications like Jupyter generally handle indentation for
you, automatically adding four spaces at the beginning of an indented block.
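The following sketch shows how indentation marks the body of a function, an if/else statement, and a for loop. The function name classify is hypothetical. The last few lines use the built-in compile function to show that an unexpected indent is rejected with an IndentationError rather than silently accepted.

```python
def classify(number):
    # The function body is indented one level (four spaces)
    if number % 2 == 0:
        # Indented a second level because it belongs to the if block
        return "even"
    else:
        return "odd"

for n in range(1, 4):
    # The loop body is indented; the loop ends where the indentation ends
    print(n, classify(n))
print("done")  # not indented, so it runs once, after the loop

# An unexpected extra indent is a syntax error, detected before any code runs
bad_source = "if True:\n    x = 1\n      y = 2\n"
caught = False
try:
    compile(bad_source, "<example>", "exec")
except IndentationError as err:
    caught = True
    print("IndentationError:", err.msg)
```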
Printing
The print function prints content to the screen or any other output device.
CODE:
print("Hello!")
To print a string that spans multiple lines, we use triple quotes at the beginning and end
of the string.
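For example, a sketch with a made-up message:

```python
# Triple quotes (""" or ''') let one string span several lines;
# the line breaks inside the quotes become part of the string
message = """Hello!
Welcome to Python.
This string spans three lines."""
print(message)
```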
Note that we do not use semicolons in Python to end statements, unlike some other
languages.
The string format method can be used with the print function to embed variables within
a string. It uses curly braces as placeholders for the values that are passed as arguments
to the method.
Let us look at a simple example where we print variables using the format method.
CODE:
weight=4.5
name="Simi"
print("The weight of {} is {}".format(name,weight))
Output:
The weight of Simi is 4.5
The preceding statement can also be rewritten as follows without the format method:
CODE:
print("The weight of", name, "is", weight)
Output:
The weight of Simi is 4.5
Note that only the string portions of the print arguments are enclosed within quotes;
variable names are not. Similarly, constants in your print arguments, such as a Boolean
constant (True) or an integer constant (1), are not enclosed within quotes, even when they
are combined with strings in a print statement.
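A small sketch mixing strings, a Boolean constant, an integer constant, and variables in one print call; the variable names status and count are made up for illustration. By default, print separates its arguments with a single space, so no extra spaces are needed inside the quotes.

```python
status = True   # a variable holding a Boolean
count = 1       # a variable holding an integer

# Only the text is quoted; the constants True and 1 are not
print("Logged in:", True, "| attempts:", 1)

# Variable names are not quoted either; print displays their values
print("Logged in:", status, "| attempts:", count)
```

Both calls print the same line: Logged in: True | attempts: 1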
The format fields can specify precision for floating-point numbers. Floating-point
numbers are numbers with decimal points, and the number of digits after the decimal
point can be specified using format fields as follows.
CODE:
x=91.234566
print("The value of x up to 3 decimal places is {:.3f}".format(x))
Output:
The value of x up to 3 decimal places is 91.235
We can specify the position of the variables passed to the method. In this example, we
use position “1” to refer to the second object in the argument list, and position “0” to
specify the first object in the argument list.
CODE:
y='Jack'
x='Jill'
print("{1} and {0} went up the hill to fetch a pail of water".format(x,y))
Output:
Jack and Jill went up the hill to fetch a pail of water
Input
The input function accepts input from the user and returns it as a string (type str).
If you want to do any mathematical calculations with numeric input, you first need to
convert the input to int or float.
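The sketch below must run without anyone typing at the keyboard, so string literals stand in for the values that input would return. The prompt text and variable names are hypothetical.

```python
# input() always returns a str, even when the user types digits.
# String literals stand in here for what input() would return.
length_text = "12"     # e.g., input("Enter the length: ")
width_text = "2.5"     # e.g., input("Enter the width: ")

print(type(length_text))   # <class 'str'>
print(length_text * 2)     # 1212 (string repetition, not arithmetic)

# Convert before calculating: int() for whole numbers, float() for decimals
length = int(length_text)
width = float(width_text)
area = length * width
print("Area:", area)       # Area: 30.0
```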
Variables and Constants
A constant (or literal) is a value that does not change, while a variable holds a value
that can be changed. Unlike languages such as Java and C/C++, Python does not require
you to declare a variable, that is, to specify its data type. We define a variable by giving it
a name and assigning it a value; its data type is then determined automatically from that
value. Values are stored in variables using the assignment operator (=). The rules for
naming a variable in Python are as follows:
• a variable name cannot have spaces
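The sketch below shows that Python infers a variable's type from the value assigned to it, and that the same name can later refer to a value of a different type. It also checks a few candidate names with the built-in str.isidentifier method and the standard-library keyword.iskeyword function. Beyond the no-spaces rule listed above, these checks also reject names that start with a digit and reserved words such as for. The candidate names are made up for illustration.

```python
import keyword

# No type declaration: the type comes from the assigned value
count = 10          # int
price = 99.5        # float
name = "Simi"       # str
print(type(count), type(price), type(name))

# The same name can later refer to a value of a different type
count = "ten"
print(type(count))  # <class 'str'>

# isidentifier() checks the syntax of a name; iskeyword() flags reserved words,
# which cannot be used as variable names either
candidates = ["total_sales", "total sales", "2nd_value", "for"]
for candidate in candidates:
    valid = candidate.isidentifier() and not keyword.iskeyword(candidate)
    print(repr(candidate), "valid" if valid else "invalid")
```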
Operators
The following are some commonly used operators in Python.
Arithmetic operators: These take two integer or float values, perform an operation, and
return a value.
• ** (exponent)
• % (modulo or remainder)
• // (quotient, or floor division)
• * (multiplication)
• / (division)
• - (subtraction)
• + (addition)
CODE:
(1+9)/2-3
Output:
2.0
In the preceding expression, the operation inside the parentheses is performed first,
giving 10, followed by division, giving 5.0, and then subtraction, giving the final output
2.0. The result is a float because the / operator always returns a float.
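The following sketch applies each arithmetic operator to two sample integers (17 and 5 are arbitrary) and then repeats the expression discussed above. Note that // rounds toward negative infinity, so for negative numbers it differs from simply dropping the fractional part.

```python
a, b = 17, 5   # arbitrary sample values

print(a ** 2)   # 289  exponent
print(a % b)    # 2    modulo (remainder)
print(a // b)   # 3    quotient (floor division)
print(a * b)    # 85   multiplication
print(a / b)    # 3.4  division always returns a float
print(a - b)    # 12   subtraction
print(a + b)    # 22   addition

# Floor division rounds toward negative infinity, not toward zero
print(-17 // 5) # -4

# Parentheses are evaluated first, then /, then -
result = (1 + 9) / 2 - 3
print(result)   # 2.0
```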
Comparison operators: These operators compare two values and evaluate to True or
False. The following comparison operators are supported in Python:
• > (greater than)
• < (less than)
• <= (less than or equal to)
• >= (greater than or equal to)
• == (equal to). Please note that this is different from the assignment
operator (=)
• != (not equal to)
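A short sketch of each comparison operator, using arbitrary sample values. It also contrasts = (assignment) with == (comparison), and shows that Python lets comparisons be chained.

```python
x, y = 7, 10   # arbitrary sample values

print(x > y)     # False
print(x < y)     # True
print(x <= 7)    # True
print(y >= 11)   # False
print(x == 7)    # True: == compares values
print(x != y)    # True

z = x            # = assigns: z now refers to the same value as x
print(z == x)    # True

# Comparisons can be chained; this means (5 < x) and (x < y)
print(5 < x < y) # True
```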
Logical (or Boolean) operators: These are similar to comparison operators in that they
also evaluate to True or False. They operate on Boolean variables or expressions. The
following logical operators are supported in Python:
• and: True only if both operands are True
• or: True if at least one operand is True
• not: inverts the value of its operand
CODE:
(2>1) or (1>3)
Output:
True
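The sketch below evaluates all three logical operators on the same comparisons. It then shows short-circuit evaluation: and skips its right-hand operand when the left side is False, and or skips it when the left side is True. The helper record_call is hypothetical and exists only to reveal whether it gets evaluated.

```python
print((2 > 1) and (1 > 3))  # False: and requires both operands to be True
print((2 > 1) or (1 > 3))   # True: or requires at least one True operand
print(not (2 > 1))          # False: not inverts its operand

calls = []

def record_call():
    # Hypothetical helper: records each time it is actually evaluated
    calls.append("called")
    return True

# Short-circuit evaluation: the right-hand operand is skipped when the
# left-hand operand already decides the result
print(False and record_call())  # False
print(True or record_call())    # True
print(len(calls))               # 0
```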