Lesson 03 3.01 Python Libraries For Data Science
A Python library is a group of interconnected modules. It contains code bundles that can be reused
in different programs and apps.
Because library code is reusable, it makes Python programming easier and more convenient for programmers.
Python Libraries
Libraries covered:
• NumPy
• Pandas
• Matplotlib
• SciPy
• Scikit-learn
Advantages:
• Easy to learn
• Open source
• Big open-source community
In Python, a file containing Python code is referred to as a module. The import keyword is used to make a module available in a program.
Whenever we need to use a module, we import it from its library.
Importing math:
import math
Example: Import Module in Python
In this code, the math library is imported. One of its functions, sqrt (square root), is used without
writing the actual code to calculate the square root of a number.
Example:
import math
A = 16
print(math.sqrt(A))
Output:
4.0
Example: Import Module in Python
In the previous code, the complete library is imported in order to use one of its functions. However,
importing only "sqrt" from the math library would have worked as well.
With a selective import, only the named functions, for example "sqrt" and "sin", are imported from the math library.
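A minimal sketch of such a selective import follows. The value 16 is reused from the earlier example; the call to sin(0) is an illustrative assumption, not taken from the original slide.

```python
from math import sqrt, sin  # import only the two names that are needed

A = 16                 # same value as in the earlier example
root = sqrt(A)         # called directly, without the "math." prefix
sine_of_zero = sin(0)  # illustrative call; the angle is in radians

print(root)          # 4.0
print(sine_of_zero)  # 0.0
```

Importing only the names in use keeps the namespace small, at the cost of losing the "math." prefix that shows where a function comes from.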
NumPy
NumPy (Numerical Python) is a library that consists of multidimensional array objects and a
collection of functions for manipulating them.
After importing NumPy, commonly used functions include:
NumPy arithmetic functions:
• numpy.add()
• numpy.subtract()
• numpy.mod() and numpy.power()
NumPy statistical functions:
• numpy.median()
• numpy.mean()
• numpy.average()
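The functions listed above can be tried on small arrays. The sketch below is illustrative only: the array values and the weights passed to numpy.average() are assumptions chosen to keep the arithmetic easy to follow.

```python
import numpy as np

a = np.array([26, 43, 52])  # illustrative values
b = np.array([2, 5, 10])    # illustrative values

# Arithmetic functions work element by element
added = np.add(a, b)          # 26+2, 43+5, 52+10
subtracted = np.subtract(a, b)
remainders = np.mod(a, b)     # 26%2, 43%5, 52%10
squares = np.power(b, 2)      # each element of b squared

# Statistical functions summarize an array
middle = np.median(a)                        # middle value of the sorted data
mean = np.mean(a)                            # arithmetic mean
weighted = np.average(a, weights=[1, 1, 2])  # weighted mean (weights are assumed)

print(added, subtracted, remainders, squares)
print(middle, mean, weighted)
```

Without the weights argument, numpy.average() returns the same result as numpy.mean().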
NumPy Function: Example 1
To access NumPy and its functions, import it in the Python code as shown below:
Example:
import numpy as np
a = np.array([1, 2, 3])
print(a.shape)
Output:
(3,)
In this example, the NumPy module is imported and the shape attribute of the array is used. It reports
the size of the array along each dimension.
NumPy Function: Example 2
The reshape() method gives an array a new shape without changing its data. Here, a 12-element array is
reshaped into 4 rows and 3 columns:
Example:
import numpy as np
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newa = a.reshape(4, 3)
print(newa)
Output:
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
Pandas
Pandas is a Python package that allows you to work with large datasets.
Why Pandas
The various features of pandas make it an efficient library for data scientists.
• Intrinsic data alignment
• Powerful data structure
• Fast and efficient data wrangling
• High performance merging and joining of data sets
• Intelligent and automated data alignment
• Easy data aggregation and transformation
Series is a one-dimensional array-like object containing data and labels (or index).
Data:           4   11   21   36
Label (index):  0    1    2    3
Data alignment is intrinsic: each value stays attached to its label, and this link is not broken unless the program changes it explicitly.
Series
Data Input
Data types:
• Integer
• String
• Python Object
• Floating Point
Data structures:
• ndarray
• dict
• scalar
• list
Series (example):
Data:           2   3   8   4
Label (index):  0   1   2   3
Basic Method
S = pd.Series(data, index = [index])
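The basic method can be sketched as follows. The data values come from the Series illustration earlier in this lesson; the string labels and the second Series are hypothetical, added only to show label-based alignment.

```python
import pandas as pd

data = [4, 11, 21, 36]
labels = ["a", "b", "c", "d"]  # hypothetical labels

s = pd.Series(data, index=labels)
print(s)
value_at_c = s["c"]  # look up a value by its label

# Alignment: arithmetic matches values by label, not by position
t = pd.Series([1, 2, 3, 4], index=["d", "c", "b", "a"])  # hypothetical second Series
total = s + t
print(total)
```

Although t lists its labels in reverse order, each sum pairs the values that share a label, which is what intrinsic data alignment means in practice.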
Creating Series from a List
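A Series is created from a list in two steps: import the libraries, then pass the list as an argument.
The output shows the index, the data values, and the data type.
No index is passed for the data, but notice that a default index is created and data alignment is done automatically.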
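A minimal sketch of creating a Series from a list, assuming the values 2, 3, 8, 4 from the Series illustration earlier in this lesson (the original code for this slide is not available):

```python
import pandas as pd

values = [2, 3, 8, 4]   # assumed sample data
s = pd.Series(values)   # pass the list as an argument; no index is given

print(s)                # shows index, data values, and data type
default_index = list(s.index)  # pandas supplies the labels 0, 1, 2, 3
print(default_index)
print(s.dtype)          # an integer dtype, inferred from the data
```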
Data Frame
Data Frame is a two-dimensional labeled data structure with columns of potentially different types.
Data Input
Data types:
• Integer
• String
• Python Object
• Floating Point
Data input:
• ndarray
• dict
• List
• Series
• Data Frame
Data Frame (example):
Label (index):  0   1   2   3
                2   3   8   4
                5   8   10  1
Creating Data Frame from Lists
This example shows how to create a Data Frame from a dict. The entire dict is passed as an argument.
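The original code for this slide is not available, so the following is only a sketch. It assumes a dict whose values are lists; the column names "first" and "second" are hypothetical, and the numbers are taken from the Data Frame illustration earlier in this lesson.

```python
import pandas as pd

# Each key becomes a column name; each list becomes that column's values
data = {
    "first": [2, 3, 8, 4],    # hypothetical column name
    "second": [5, 8, 10, 1],  # hypothetical column name
}

df = pd.DataFrame(data)  # the entire dict is passed as one argument
print(df)
print(df.shape)          # (rows, columns)
```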
Viewing Data Frame
A Data Frame can be viewed by referring to the column names or using the describe function.
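Both ways of viewing a Data Frame can be sketched on a small example. The Data Frame and its column names below are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"first": [2, 3, 8, 4], "second": [5, 8, 10, 1]})  # hypothetical data

# View by column name: one column gives a Series, a list of names gives a Data Frame
one_column = df["first"]
two_columns = df[["first", "second"]]

# describe() summarizes each numeric column (count, mean, std, min, quartiles, max)
summary = df.describe()

print(one_column)
print(summary)
```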
Pandas Functions: Example 1
The example uses two functions: pd.read_csv() to import a dataset, and df.head() to output the first
five rows of the dataset.
Example:
import pandas as pd
import numpy as np
df = pd.read_csv('driver-data.csv')
df.head()
Pandas Functions: Example 2
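The example uses the df.shape attribute to output the shape of the dataset as a (rows, columns) tuple.
Because shape is an attribute and not a function, it is written without parentheses.
Example: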
import pandas as pd
import numpy as np
df = pd.read_csv('driver-data.csv')
df.shape
Pandas Functions: Example 3
The example uses the df.info() function to output a summary of the dataset: its index, columns, data types, and non-null counts.
Example:
import pandas as pd
import numpy as np
df = pd.read_csv('driver-data.csv')
df.info()
Matplotlib
Matplotlib is a low-level graph plotting library for Python that serves as a visualization tool.
Using Python's matplotlib makes it easier to visualize large and complex data.
There are several advantages of using matplotlib to visualize data. They are as follows:
• Is a multi-platform data visualization tool; therefore, it is fast and efficient
• Can work well with many operating systems and graphics backends
• Has high-quality graphics and plots to print and view a range of graphs
• With Jupyter notebook integration, the developers are free to spend their time implementing features
• Has large community support and cross-platform support as it is an open-source tool
• Has full control over graphs or plot styles
The Plot
A plot is a graphical representation of data, which shows the relationship between two
variables or the distribution of data.
[Figure: a line plot titled "First Plot" with its parts labeled: title, legend, grid, Y-axis ("Numbers"), and X-axis ("Range")]
Steps to Create a Plot
[Figure: the plot "First Plot", with "Numbers" on the Y-axis and "Range" on the X-axis]
Follow the steps to obtain this plot.
Steps to Create Plot: Example
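The original steps and code for this slide are not available. As a hedged sketch, a plot with the parts named earlier (title, axis labels, grid, legend) could be built as follows. The data points and the output file name are assumptions, and the Agg backend is selected only so that the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Step 1: prepare the data (hypothetical values)
x = list(range(8))
y = [0.3, 0.5, 0.4, 0.8, 0.6, 1.0, 0.7, 1.1]

# Step 2: create a figure and draw the line
fig, ax = plt.subplots()
ax.plot(x, y, label="Numbers")

# Step 3: add the title, axis labels, grid, and legend
ax.set_title("First Plot")
ax.set_xlabel("Range")
ax.set_ylabel("Numbers")
ax.grid(True)
ax.legend()

# Step 4: save the result (in a Jupyter notebook, plt.show() displays it instead)
fig.savefig("first_plot.png")  # hypothetical file name
```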
SciPy
SciPy is a free and open-source Python library used for scientific and technical computing.
SciPy has built-in packages that cover the following scientific domains:
• Mathematics integration
• Statistics (Normal distribution)
• Linear algebra
• Multidimensional image processing
• Mathematics constants
• Language integration
SciPy and Its Characteristics
• Simplifies scientific application development
• Efficient and fast data processing
SciPy groups its functionality into the following packages:
Package       Description
cluster       Clustering algorithms
constants     Physical and mathematical constants
fftpack       Fast Fourier Transform routines
integrate     Integration and ordinary differential equation solvers
interpolate   Interpolation and smoothing splines
io            Input and Output
linalg        Linear algebra
ndimage       N-dimensional image processing
odr           Orthogonal distance regression
optimize      Optimization and root-finding routines
signal        Signal processing
sparse        Sparse matrices and associated routines
spatial       Spatial data structures and algorithms
special       Special functions
stats         Statistical distributions and functions
weave         C/C++ integration (removed from recent SciPy releases)
SciPy Packages
IO
Optimize
Integration
Statistics
SciPy Packages: Example 1
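This example computes the determinant of a square two-dimensional array, two_d_array, with the linalg package.
Example:
from scipy import linalg
linalg.det(two_d_array)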
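The slide does not show how two_d_array is defined, so the following complete sketch uses a hypothetical 2 x 2 array chosen so that the determinant is easy to check by hand.

```python
import numpy as np
from scipy import linalg

# Hypothetical input; the original array is not shown in the source
two_d_array = np.array([[4, 5],
                        [3, 2]])

# For a 2 x 2 matrix the determinant is ad - bc: 4*2 - 5*3 = -7
det = linalg.det(two_d_array)
print(det)  # a floating-point value close to -7
```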
SciPy Packages: Example 2
In this example, the function returns two values: the first is the value of the integral, and the
second is the estimated error in the integral.
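The code for this example is missing from the source. A function that matches the description is scipy.integrate.quad, so the sketch below assumes that function; the integrand and the limits are hypothetical.

```python
from scipy import integrate


def f(x):
    """Hypothetical integrand: x squared."""
    return x ** 2


# quad returns a pair: (value of the integral, estimated absolute error)
value, error = integrate.quad(f, 0, 1)

print(value)  # close to 1/3, the exact integral of x**2 from 0 to 1
print(error)  # a very small number
```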
Scikit-Learn
Scikit-learn is a powerful and modern machine learning library for Python. It is used for fully and
semi-automated data analysis and information extraction.
Scikit-learn helps data scientists and machine learning engineers to solve problems
using the problem-solution approach.
Points to be considered while working with a scikit-learn dataset or loading the data to
scikit-learn:
• Verify that the features and responses are in the form of NumPy ndarrays
• Check that the features and responses have matching shapes: the number of rows (samples) in the features must equal the number of responses
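The two checks above can be sketched with NumPy alone. The feature matrix X and the response vector y below are hypothetical; the names follow a common convention and are not taken from the source.

```python
import numpy as np

# Hypothetical data: 3 samples with 2 features each, and one response per sample
X = np.array([[5.1, 3.5],
              [4.9, 3.0],
              [6.2, 3.4]])
y = np.array([0, 0, 1])

# Check 1: both are NumPy ndarrays
both_arrays = isinstance(X, np.ndarray) and isinstance(y, np.ndarray)

# Check 2: one response for every row of features
same_samples = X.shape[0] == y.shape[0]

print(X.shape)  # (3, 2)
print(y.shape)  # (3,)
print(both_arrays, same_samples)
```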
Scikit-Learn: Installation
Libraries: NumPy, Pandas, SciPy, Matplotlib
Web Scraping
Web scraping is the process of building a program that automatically extracts, downloads, parses,
and organizes useful information from the web.
Web Scraping: Process
The four steps followed in web scraping are requesting a web page, retrieving its content, parsing
the content, and storing the data in the desired format.
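The parsing and storing steps can be sketched without network access. In the sketch below, the request and retrieval steps are replaced by a hard-coded HTML string, and the tag names, class name, and quotes are all hypothetical. It assumes the bs4 package is installed and uses Python's built-in "html.parser".

```python
import csv
import io

from bs4 import BeautifulSoup

# Steps 1 and 2 (request and retrieve) are simulated with a fixed page
html = """
<html><body>
  <p class="quote">Stay curious.</p>
  <p class="quote">Keep learning.</p>
</body></html>
"""

# Step 3: parse the page and extract the text of each matching tag
soup = BeautifulSoup(html, "html.parser")
quotes = [p.get_text(strip=True) for p in soup.find_all("p", class_="quote")]

# Step 4: store the data in the desired format (CSV, written to memory here)
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["quote"])  # header row
for quote in quotes:
    writer.writerow([quote])

print(buffer.getvalue())
```

For a live page, the fixed string would be replaced by the content returned from an HTTP request.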
Web Scraping: Tools
BeautifulSoup is an easy, intuitive, and robust Python library designed for web scraping.
Tools: BeautifulSoup, SymPy, Requests, SQLAlchemy, Twisted
Data is stored in many file formats on the web and can be processed using web scraping.
[Diagram: Parser, with commands as input and parser methods as output]
Example:
import requests
from bs4 import BeautifulSoup

URL = "http://www.values.com/inspirational-quotes"
r = requests.get(URL)
# If the next line causes an error, run 'pip install html5lib' to install html5lib
soup = BeautifulSoup(r.content, 'html5lib')
print(soup.prettify())
Key Takeaways
SciPy is a free and open-source Python library used for scientific and
technical computing.
Knowledge Check
A. scipy.cluster
B. scipy.source
C. scipy.interpolate
D. scipy.signal
A. Math
B. Random
C. Pandas
A. 1D
B. 2D
C. 3D