Lesson 03 Python Libraries For Data Science
A Python library is a group of interconnected modules. It contains code bundles that can be reused
in different programs and apps.
Because library code is reusable, it makes Python programming easier and more convenient for
programmers.
Python Libraries
NumPy, Pandas, Matplotlib, SciPy, Scikit-learn
Easy to learn
Open source
Big open-source community
Scikit-learn
The library contains many efficient tools for machine learning and statistical modeling, including
classification, regression, clustering, and dimensionality reduction.
Import Library into Python Program
Import Module in Python
In Python, a file is referred to as a module. The import keyword is used to utilize it.
Whenever we need to use a module, we import it from its library.
Importing math:
import math
Example: Import Module in Python
In this code, the math library is imported. One of its functions, sqrt (square root), is used without
writing the actual code to calculate the square root of a number.
Output:
Example:
import math
A = 16
print(math.sqrt(A))
Example: Import Module in Python
In the previous code, the complete library was imported to use one of its functions. However,
importing only “sqrt” from the math library would have worked as well.
Output:
Example:
In the above code, only “sqrt” and “sin” methods from the math library are imported.
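The selective import described here can be sketched as follows. This is a minimal illustration; the variable names A and angle are assumptions chosen for the example, not taken from the original slide.

```python
# Import only the names that are needed from the math library.
from math import sqrt, sin

A = 16
angle = 0.0

root = sqrt(A)     # no "math." prefix is needed
sine = sin(angle)

print(root)
print(sine)
```

Only sqrt and sin are available after this import; other functions such as math.cos would need their own import.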
NumPy
Introduction to NumPy
The installation of NumPy is easy if Python and PIP are already installed on the system. The following
command is used to install NumPy:
pip install numpy
Example:
import numpy as np
arr = np.array([1,2,3,4,5])
print(arr)
The import numpy portion of the code tells Python to bring the NumPy library into the current
environment.
NumPy: Array Object
Output:
Example:
import numpy as np
arr = np.array ([10,20,30,40,50])
print (arr)
print (type(arr))
The built-in Python function type() returns the type of the object passed to it.
Output:
Example:
import numpy as np
arr = np.array(60)
print (arr)
Dimensions in Arrays: Example
A 1-D array is the basic array. It has 0-D arrays as its elements.
Output:
Example:
import numpy as np
arr = np.array([10,20,30,40])
print (arr)
Dimensions in Arrays: Example
Output:
Example:
import numpy as np
arr = np.array([[10,20,30,40], [50,60,70,80]])
print (arr)
Dimensions in Arrays: Example
A 3-D array represents a 3rd-order tensor. It has 2-D arrays as its elements.
Output:
Example:
import numpy as np
arr = np.array([[[10,20,30,40],[50,60,70,80]],[[12,13,14,15],[16,17,18,19]]])
print (arr)
Number of Dimensions
Output:
Example:
import numpy as np
p = np.array(50)
q = np.array([10,20,30,40,50])
r = np.array([[10,20,30,40], [50,60,70,80]])
s = np.array([[[10,20,30,40],[50,60,70,80]],[[12,13,14,15],[16,17,18,19]]])
print (p.ndim)
print (q.ndim)
print (r.ndim)
print (s.ndim)
Broadcasting
Broadcasting refers to NumPy's ability to handle arrays of different shapes during arithmetic
operations.
Example:
import numpy as np
a = np.array([[11, 22, 33], [10, 20, 30]])
print(a)
b = 4
print(b)
c = a + b
print(c)
The smaller array is broadcast across the larger array so that the shapes are compatible.
Broadcasting
Broadcasting follows a strict set of rules that determine how two arrays interact:
Rule 01: A shape with fewer dimensions is padded with ones on its leading (left) side if the two
arrays differ in the number of dimensions.
Rule 02: If the shape of the two arrays does not match in a dimension, the array with a shape equal
to 1 in that dimension is stretched to match the other shape.
Rule 03: An error occurs if in any dimension the sizes do not match and neither is equal to 1.
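The three rules can be seen at work in a short sketch. The arrays a, b, and d are assumptions made up for this illustration.

```python
import numpy as np

a = np.ones((2, 3))   # shape (2, 3)
b = np.arange(3)      # shape (3,)

# Rule 01: b's shape (3,) is padded on the left to (1, 3).
# Rule 02: the size-1 dimension is stretched to match, giving (2, 3).
c = a + b
print(c.shape)

# Rule 03: shapes (2, 3) and (2,) disagree in the last dimension
# (3 versus 2) and neither is 1, so NumPy raises a ValueError.
d = np.arange(2)
try:
    a + d
    failed = False
except ValueError as err:
    failed = True
    print("Broadcast error:", err)
```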
NumPy Overview
3 NumPy arithmetic functions: numpy.add(), numpy.subtract(), numpy.mod(), and numpy.power()
4 NumPy statistical functions: numpy.median(), numpy.mean(), and numpy.average()
NumPy Array Functions
NumPy Array Function: Example 1
To access NumPy and its functions, import it in the Python code as shown below:
Output:
Example:
import numpy as np
arr = np.array([[10,20,30,40], [50,60,70,80]])
print (arr.shape)
In this example, the NumPy module is imported and the shape attribute is used.
NumPy Array Function: Example 2
To access NumPy and its functions, import it in the Python code as shown below:
Output:
Example:
import numpy as np
arr = np.array([1,2,3,4,5,6,7,8,9,10,11,12])
newarr = arr.reshape(4,3)
print (newarr)
In this example, the NumPy module is imported and the reshape function is used.
NumPy Array Function: Example 3
To access NumPy and its functions, import it in the Python code as shown below:
Output:
Example:
import numpy as np
arr1 = np.array([10,20,30])
arr2 = np.array([40,50,60])
arr = np.concatenate ((arr1, arr2))
print(arr)
Combines two or more
arrays into a single array
In this example, the NumPy module is imported and the concatenate function is used.
NumPy String Functions
NumPy String Function: Example 1
To access NumPy and its functions, import it in the Python code as shown below:
Output:
Example:
import numpy as np
a = np.array(['Hello','World'])
b = np.array(['Welcome', 'Learners'])
result = np.char.add(a,b)
print(result)
Returns element-wise
string concatenation for
two arrays of string or
unicode
In this example, the NumPy module is imported and the add function is used.
NumPy String Function: Example 2
To access NumPy and its functions, import it in the Python code as shown below:
Output:
Example:
import numpy as np
text = "Hello How Are You"
print(text)
a = np.char.replace(text, 'Hello', 'Hi')
print(a)
Replaces the old substring
with the new substring
In this example, the NumPy module is imported and the replace function is used.
NumPy String Function: Example 3
To access NumPy and its functions, import it in the Python code as shown below:
Output:
Example:
import numpy as np
a = "hello how are you"
print(a)
x = np.char.upper(a)
print(x)
b = "GREETINGS OF THE DAY"
print(b)
y = np.char.lower(b)
print(y)
upper converts all lowercase characters in a string to uppercase.
lower converts all uppercase characters in a string to lowercase.
In this example, the NumPy module is imported and the upper and lower functions are used.
NumPy Arithmetic Functions
NumPy Arithmetic Function: Example 1
To access NumPy and its functions, import it in the Python code as shown below:
Output:
Example:
import numpy as np
a = np.array([30,20,10])
b = np.array([10,20,30])
result = np.add(a,b)
print(result)
It computes the addition of two arrays.
In this example, the NumPy module is imported and the add function is used.
NumPy Arithmetic Function: Example 2
To access NumPy and its functions, import it in the Python code as shown below:
Output:
Example:
import numpy as np
a = np.array([[30,40,60], [50,70,90]])
b = np.array([[10,20,30], [40,30,80]])
result = np.subtract (a,b)
print(result)
It is used to compute the
difference between two
arrays.
In this example, the NumPy module is imported and the subtract function is used.
NumPy Arithmetic Function: Example 3
To access NumPy and its functions, import it in the Python code as shown below:
Output:
Example:
import numpy as np
a = np.array([20,40,70])
b = np.array([10,30,40])
result = np.mod(a,b)
print(result)
It returns the element-wise remainder of the division between two arrays.
In this example, the NumPy module is imported and the mod function is used.
NumPy Arithmetic Function: Example 4
To access NumPy and its functions, import it in the Python code as shown below:
Output:
Example:
import numpy as np
a = [2,2,2,2,2]
b = [2,3,4,5,6]
c = np.power(a,b)
print(c)
Each element of the first array is raised to the power of the corresponding element in the
second array.
In this example, the NumPy module is imported and the power function is used.
NumPy Statistical Functions
NumPy Statistical Function: Example 1
To access NumPy and its functions, import it in the Python code as shown below:
Output:
Example:
import numpy as np
a = [[1,17,19,33,49],[14,6,87,8,19],[34,2,54,4,7]]
print(np.median(a))
print(np.median(a, axis = 0))
print(np.median(a, axis = 1))
It is used to compute the median along any specified axis.
In this example, the NumPy module is imported and the median function is used.
NumPy Statistical Function: Example 2
To access NumPy and its functions, import it in the Python code as shown below:
Output:
Example:
import numpy as np
a = [20,2,7,1,34]
print(a)
b = np.mean(a)
print(b)
It computes the arithmetic mean of the given array of elements.
In this example, the NumPy module is imported and the mean function is used.
NumPy Statistical Function: Example 3
To access NumPy and its functions, import it in the Python code as shown below:
The average function computes the weighted average along the specified axis. When no weights are
given, the result is the same as the arithmetic mean.
Output:
Example:
import numpy as np
a = np.array([[2,3,4],
              [3,6,7],
              [5,7,8]])
b = np.average(a, axis = 0)
print(b)
With axis = 0, it calculates the average of each column of the NumPy array.
In this example, the NumPy module is imported and the average function is used.
NumPy Array Indexing
An array element can be accessed using its index number, in the same way as an element of a
Python list.
Indexes for NumPy arrays begin at 0. The first element has index 0, the second has 1, and so on.
NumPy Array Indexing: Examples
Example
numpy as np
Output:
Computers
Example 2: Print the sum of the elements at indexes 0 and 1
Example
import numpy as np
index = np.array([121, 235, 353, 254])
print(index[1] + index[0])
Output:
356
Two-Dimensional Array
Two-Dimensional Array: Examples
Example 1: In this example, the fourth element of the first row of a two-dimensional array is
accessed.
Example
import numpy as np
Y = np.array([[10,20,30,40,50], [60,70,80,90,100]])
print('4th element on 1st row: ', Y[0, 3])
Output:
Example 2: In this example, the third element of the second row of a 2-D array is retrieved.
Example
import numpy as np
X1 = np.array([[14,25,37,46,59,45], [63,74,86,98,12,76]])
print('3rd element on 2nd row: ', X1[1, 2])
Output:
[Figure: a 1-D array, array([1, 2, 3]), shown next to 2-D and 3-D arrays built from rows of [1, 2, 3]]
Three-Dimensional Array: Examples
Example 1: In this example, an element of the second array is printed.
Example
import numpy as np
Z = np.array([[[11, 22, 33], [44, 55, 66]], [[77, 88, 99], [100, 111, 122]]])
print(Z[1, 1, 0])
Output:
100
Example 2: In this example, two numbers are subtracted from the same index, and the output is
displayed using a 3D array.
Example
import numpy as np
Y = np.array([[[5,6,36], [44,65,67]], [[47,78,59], [10,21,42]]])
Output:
2
Negative Indexing
Example 1: Printing the last element of an array using negative indexing
Example
import numpy as np
Neg_index = np.array([[5,3,2,6,8], [2,4,16,4,12]])
print('Last element from 1st dim: ', Neg_index[0, -1])
Output:
Example 2: Printing the second vehicle from the end in the first dimension
Example
import numpy as np
Vehicles = np.array([['car','bus','Rowboat','Bicycle'], ['train','flight','Truck','Ship']])
Example 1: Illustrates the use of slicing to retrieve employee ratings for a team of seven
employees in the first quarter from an array.
Example
import numpy as np
print(Employee_rating[1:7])
Output:
[4 3 5 6 8 9]
Slicing: Examples
Example 2: Printing the list of three subjects from index 5 to the end
Example
import numpy as np
Books = np.array(['Physics','DataScience','Maths','Python','Hadoop','OOPs','Java','Cloud'])
print(Books[5:])
Output:
Example 3: Displaying the results of five students who received certificates in Python
Example
import numpy as np
Marks = np.array([60, 78, 45, 80, 97, 96, 77])
print(Marks[:5])
Example 1
Example
import numpy as np
Y = np.array([18, 26, 34, 48, 54, 67, 76])
print(Y[::5])
Output:
[18 67]
Example 2
Example
import numpy as np
X = np.array([8, 7, 6, 5, 4, 3, 2, 1])
print(X[1:6:3])
Output:
[7 4]
Slicing: Two-Dimensional Array
The following example illustrates the concept of slicing to retrieve the elements:
Example
import numpy as np
Z = np.array([[11, 22, 33, 44, 55], [66, 77, 88, 99, 110]])
print(Z[0, 2:3])
Output:
[33]
Negative Slicing
Negative slicing works like negative indexing: negative values are interpreted as counting from the
end of an array. Basic slicing follows the standard rules of sequence slicing on a per-dimension
basis (including using a step index).
For example, in the array [1, 2, 3, 4] (array size 4, indices 0 to 3), the slice -3:-1 returns [2 3].
Negative Slicing: Example
The following example illustrates the concept of negative slicing to retrieve the
elements:
Example
import numpy as np
Neg_slice = np.array([13, 34, 58, 69, 44, 56, 37, 24])
print(Neg_slice[:-1])
Output:
Example
import numpy as np
Neg_slice = np.array([15, 26, 37, 48, 55, 64, 34])
print(Neg_slice[-4:-1])
Output:
arange Function in Python
It returns an array with evenly spaced elements within a given interval. Values are generated within
the half-open interval [start, stop), which includes start but excludes stop. Its syntax is:
numpy.arange([start, ]stop, [step, ]dtype=None)
Parameters:
start: [optional] start of the interval range. By default, start equals 0.
stop: end of the interval range.
step: [optional] step size of the interval. By default, step size equals 1.
dtype: type of the output array.
Example:
import numpy as np
print("Numbers:",type(np.arange(2,10)))
linspace Function
Return:
ndarray
step: [float, optional], returned only if retstep equals True
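A minimal sketch of linspace follows, since the original example did not survive. The interval 0 to 1 and the count of five samples are assumptions chosen for illustration.

```python
import numpy as np

# Five evenly spaced values from 0 to 1, with both endpoints included.
values = np.linspace(0, 1, num=5)
print(values)

# With retstep=True, linspace also returns the spacing between samples.
samples, step = np.linspace(0, 1, num=5, retstep=True)
print(step)
```

Unlike arange, which takes a step size and excludes the stop value, linspace takes the number of samples and includes the stop value by default.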
The random module in Python defines a series of functions that are used to generate or
manipulate random numbers. The random function generates a random float number
between 0.0 and 1.0.
Example:
import random
n = random.random()
print(n)
randn Function
The randn() function generates an array with the given shape and fills it with random values
that follow the standard normal distribution.
Example:
import numpy as np
print("Numbers from Normal distribution with zero mean and standard deviation 1 i.e. standard normal")
print(np.random.randn(5,3))
randint Function
The randint function is used to generate a random integer within the range [start, end].
Example:
Note: It works with integers. If float values are provided, a value error will be returned.
If string data is provided, a type error will be returned.
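A minimal sketch of randint, using the standard random module. The bounds 1 and 10 are assumptions chosen for illustration.

```python
import random

# randint(a, b) returns an integer N such that a <= N <= b;
# both endpoints are included in the range.
n = random.randint(1, 10)
print(n)
```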
Random Module: Seed Function
Example:
import random
# Before adding seed function
for i in range(5):
    print(random.randint(1,50))
# After adding seed function
for i in range(5):
    random.seed(13)
    print(random.randint(1,50))
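The effect of a seed is easiest to see when it is set once before a whole sequence is drawn. The sketch below assumes the same seed value 13; the list names first and second are illustrative.

```python
import random

# Seeding once makes the following sequence of draws reproducible.
random.seed(13)
first = [random.randint(1, 50) for _ in range(5)]

# Resetting the same seed replays exactly the same sequence.
random.seed(13)
second = [random.randint(1, 50) for _ in range(5)]

print(first)
print(second)
```

Calling seed inside the loop, as in the example above, resets the generator before every draw, so the same number is printed each time.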
Reshape Function
The numpy.reshape() function shapes an array without changing the data of the array.
Example:
import numpy as np
x=np.arange(12)
y=np.reshape(x, (4,3))
print(x)
print(y)
Ravel Function
numpy.ravel() returns a contiguous flattened array (a 1D array containing all elements
of the input array).
Example:
import numpy as np
x = np.array([[1, 3, 5], [11, 35, 56]])
y = np.ravel(x, order='F')
z = np.ravel(x, order='C')
p = np.ravel(x, order='A')
q = np.ravel(x, order='K')
print(y)
print(z)
print(p)
print(q)
Pandas
Pandas is a Python package that allows you to work with large datasets.
The Pandas library is built on top of NumPy, which means NumPy is required for
operating Pandas. NumPy is great for mathematical computing.
Purpose of Pandas
Intrinsic data alignment
01 Data representation: DataFrame and Series represent the data in a way that is appropriate for
data analysis.
02 Clear code: The simple API found in Pandas helps to focus on the essential part of the code,
making it clear and concise.
Features of Pandas
Powerful data structure
Fast and efficient data wrangling
High performance merging and joining of datasets
Intelligent and automated data alignment
Easy data aggregation and transformation
Data: 4 11 21 36
Label (Index): 0 1 2 3
Data alignment is intrinsic and cannot be broken until changed explicitly by a program.
Series
Data Input: Integer, String, Python Object, Floating Point
Data Structures: ndarray, dict, scalar, list
Data: 2 3 8 4
Label (Index): 0 1 2 3
Basic Method
S = pd.Series(data, index = [index])
Creating Series from a List
[Figure callouts: Import libraries, Data value, Index, Data type]
The index was not specified for the data, but notice that data alignment is done automatically.
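Creating a Series from a list can be sketched as follows. The values and the labels a to d are assumptions chosen for illustration.

```python
import pandas as pd

data = [4, 11, 21, 36]   # sample values

# No index is given, so pandas assigns the labels 0, 1, 2, 3.
s_default = pd.Series(data)
print(s_default)

# The same data with explicit (hypothetical) labels.
s_labeled = pd.Series(data, index=["a", "b", "c", "d"])
print(s_labeled["c"])
```

Printing a Series shows the index on the left, the values on the right, and the dtype on the last line.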
Creating Series of Values
A DataFrame is a type of data structure that arranges data into a 2-dimensional table of rows
and columns, much like a spreadsheet.
Data Input: Integer, String, Python Object, Floating Point
Data Types: ndarray, dict, List, Series, DataFrame
DataFrame:
2 3 8 4
5 8 10 1
Label (Index): 0 1 2 3
Creating DataFrame from Lists
Viewing DataFrame
A DataFrame can be viewed by referring to the column names or using the describe function.
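A minimal sketch of creating a DataFrame from lists and viewing it. The names, ages, and column labels are hypothetical sample data.

```python
import pandas as pd

# Hypothetical data: each inner list is one row.
rows = [["Asha", 28], ["Ben", 34], ["Chen", 45]]
df = pd.DataFrame(rows, columns=["Name", "Age"])

print(df)              # the whole table
print(df["Name"])      # a single column, referred to by its name
print(df.describe())   # summary statistics of the numeric columns
```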
Series Functions in Pandas
1 empty
2 ndim
3 size
4 dtype
5 values
6 head()
7 tail()
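The Series attributes and methods named here can be tried on a small series. The series s, holding the values 1 to 6, is an assumption made for this sketch.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(1, 7))   # hypothetical series with values 1 to 6

print(s.empty)     # True only if the series has no elements
print(s.ndim)      # number of dimensions; always 1 for a Series
print(s.size)      # number of elements
print(s.dtype)     # data type of the elements
print(s.values)    # the underlying data as an array
print(s.head(2))   # first two rows
print(s.tail(2))   # last two rows
```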
Empty Function
Output:
Example:
import pandas as pd
import numpy as np
Output:
Example:
import pandas as pd
import numpy as np
Output:
Example:
import pandas as pd
import numpy as np
It returns the dtype of the object. This example shows how to create a size series.
Output:
Example:
import pandas as pd
import numpy as np
It returns the actual data in the series as an array. This example shows how to create size
series.
Output:
Example:
import pandas as pd
import numpy as np
It returns the first n rows. This example shows how to create a head and tail series.
Output:
Example:
import pandas as pd
import numpy as np
It returns the last n rows. This example shows how to create a head and tail series.
Output:
Example:
import pandas as pd
import numpy as np
It returns the DataFrame's transposed value. The rows and columns will switch places.
Example: Output:
import pandas as pd
import numpy as np
# Create a DataFrame
df = pd.DataFrame(d)
print ("The transpose of the data series is:")
print (df.T)
dtypes Function
Example: Output:
import pandas as pd
import numpy as np
# Create a DataFrame
df = pd.DataFrame(d)
print ("The data types of each column are:")
print (df.dtypes)
Empty Function
Example: Output:
import pandas as pd
import numpy as np
# Create a DataFrame
df = pd.DataFrame(d)
print ("Is the object empty?")
print (df.empty)
ndim Function
Output:
Example:
import pandas as pd
import numpy as np
# Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print ("The dimension of the object is:")
print (df.ndim)
Shape Function
It returns a tuple that represents the DataFrame's dimensionality. The number of rows and
columns is represented by the tuple (a,b).
Output:
Example:
import pandas as pd
import numpy as np
# Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print ("The shape of the object is:")
print (df.shape)
Size Function
Output:
Example:
import pandas as pd
import numpy as np
# Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print ("The total number of elements in our object is:")
print (df.size)
Values Function
Output:
Example:
import pandas as pd
import numpy as np
# Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print ("The actual data in our data frame is:")
print (df.values)
Head Function
The head() function is used to access the first n rows of a DataFrame or series.
Output:
Example:
import pandas as pd
import numpy as np
# Create a DataFrame
df = pd.DataFrame(d)
print ("Our data frame is:")
print (df)
print ("The first two rows of the data frame is:")
print (df.head(2))
Tail Function
The tail() function returns the last n rows. This can be seen in the index values
of the example shown below.
Output:
Example:
import pandas as pd
import numpy as np
# Create a DataFrame
df = pd.DataFrame(d)
print ("Our data frame is:")
print (df)
print ("The last two rows of the data frame is:")
print (df.tail(2))
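The DataFrame examples above use a dictionary d whose definition is not shown. The sketch below supplies a hypothetical d (the names and ages are made up) so that the attributes can be run end to end.

```python
import pandas as pd

# Hypothetical dictionary of series standing in for the undefined variable d.
d = {
    "Name": pd.Series(["Tom", "Ria", "Sam", "Lee"]),
    "Age": pd.Series([25, 31, 29, 40]),
}
df = pd.DataFrame(d)

print(df.T)         # rows and columns switch places
print(df.dtypes)    # data type of each column
print(df.empty)     # True only if the DataFrame has no elements
print(df.ndim)      # 2 for a DataFrame
print(df.shape)     # (rows, columns)
print(df.size)      # rows * columns
print(df.values)    # the data as a NumPy array
print(df.head(2))   # first two rows
print(df.tail(2))   # last two rows
```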
datetime Module
The datetime module enables us to create custom date objects and perform various
operations on dates.
Classes in the datetime module:
1 Date
2 Time
3 Datetime
4 Timedelta
5 Tzinfo
6 Timezone
datetime Module: Example
In the example given below, the datetime module is used to find the current year,
current month, and current day:
Example:
In the example given below, the datetime module is used to get the current date:
Example:
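Both datetime examples can be sketched with the standard library alone. The variable names now and today are illustrative.

```python
from datetime import date, datetime

# Current year, month, and day from a datetime object.
now = datetime.now()
print("Current year:", now.year)
print("Current month:", now.month)
print("Current day:", now.day)

# Current date as a date object.
today = date.today()
print("Current date:", today)
```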
The example returns the first five rows of a dataset using the df.head() function.
Output:
Example:
import pandas as pd
import numpy as np
df = pd.read_csv('driver-data.csv')
df.head()
Pandas Functions: Example 2
The example returns the dataset's shape using the df.shape attribute.
Output:
Example:
import pandas as pd
import numpy as np
df = pd.read_csv('driver-data.csv')
df.shape
Pandas Functions: Example 3
The example uses df.info() function to return the information of the dataset.
Output:
Example:
import pandas as pd
import numpy as np
df = pd.read_csv('driver-data.csv')
df.info()
Matplotlib
Python’s matplotlib library is a comprehensive tool for building static, animated, and interactive
visualizations.
Example
import matplotlib
matplotlib.__version__
Output:
'3.5.1'
Matplotlib: Advantages
It can work well with many operating systems and graphics at the backend.
It has high-quality graphics and plots to print and view a range of graphs.
There are many contexts in which Matplotlib can be used, such as Jupyter Notebooks,
Python scripts, and the Python and iPython shells.
02 Cartopy 05 Qt interface
Pyplot: Markers, Line, Labels, Grid, Subplot, Scatter, Bars, Histograms, Seaborn.countplot()
Pyplot
Pyplot is a collection of functions that enable matplotlib to perform tasks like MATLAB.
Example: Draw a pyplot to show the increase in the chocolate rate according to its weight.
Example Output
import matplotlib.pyplot as plt
import numpy as np
plt.xlabel("Chocolate rate")
plt.ylabel("Chocolate gram")
plt.plot(xpoints, ypoints)
plt.show()
Plotting
The plot() function draws a line from one point to another by default.
Plot a graph to know the pay raise of employees over the years from 2010 to 2022.
Example Output
import matplotlib.pyplot as plt
import numpy as np
A1 = np.array([20000, 80000])
A2 = np.array([2010, 2022])
plt.xlabel("Employee salary")
plt.ylabel("Year")
plt.plot(A1, A2)
plt.show()
Marker Plot
Each point can be emphasized with a specific marker by using the keyword argument marker:
Example: Mark each point with a square to show the number of sick leaves applied for by an
employee in the span of five days.
Example Output
import numpy as np
plt.xlabel("No of days")
plt.ylabel("Difference of 5 days")
plt.show()
Line Plot
To change the style of the plotted line, use the keyword argument linestyle, or the shorter ls.
Example: Draw a line in a diagram to change the style (Use a dotted line).
Output
Example
import numpy as np
plt.show()
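The marker and linestyle arguments can be sketched together in one plot. The data arrays days and leaves are hypothetical, and the Agg backend and output file name are assumptions made so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")   # headless backend; omit this in an interactive session
import matplotlib.pyplot as plt
import numpy as np

days = np.array([1, 2, 3, 4, 5])     # hypothetical data
leaves = np.array([0, 1, 1, 2, 3])

# marker="s" draws a square at each point; ls=":" makes the line dotted.
(line,) = plt.plot(days, leaves, marker="s", ls=":")
plt.xlabel("No of days")
plt.ylabel("Sick leaves")
plt.savefig("marker_line.png")   # use plt.show() in an interactive session
```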
Label Plot
The xlabel() and ylabel() functions in pyplot can be used to label the x- and y-axis, respectively.
Example: Create a diet chart including labels like protein intake and calories burned.
Example Output
import numpy as np
plt.plot(B1, B2)
plt.title("Diet chart")
plt.xlabel("Proteins intake")
plt.ylabel("Calorie Burnage")
plt.show()
Grid Plot
The grid() function in pyplot can be used to add grid lines to the plot.
Example: Create a graph on fuel rates and add grid lines to it.
Example
import numpy as np Output
import matplotlib.pyplot as plt
plt.title("Fuel rate")
plt.xlabel("Litre")
plt.ylabel("Price")
plt.plot(Y1, Y2)
plt.grid()
plt.show()
Subplot
With the subplot() function, multiple plots can be drawn in a single diagram.
Example
import matplotlib.pyplot as plt Output
import numpy as np
plt.subplot(1, 2, 1)
plt.plot(x1,y1)
plt.subplot(1, 2, 2)
plt.plot(x2,y2)
plt.show()
Scatter Plot
For each observation, the scatter() function plots a single dot. It requires two identical-length
arrays, one for the values on the x-axis and the other for the values on the y-axis.
Output
Example
import matplotlib.pyplot as plt
import numpy as np
A = np.array([2,3,4,11,12,17,22,39,14,21,23,9,6])
B = np.array([59,26,67,78,121,23,20,69,93,45,24,15,66])
plt.scatter(A, B)
plt.show()
Bar Plot
Example Output
import numpy as np
plt.bar(x,y)
plt.show()
Histogram Plot
A graph displaying frequency distributions is called a histogram. It is a graph that displays how
many observations were made during each interval.
Example: Create a histogram chart in pyplot to observe the height of 250 people.
Example Output
import numpy as np
plt.hist(A)
plt.show()
Pie Plot
Example: Create a simple pie chart in pyplot using the pie() function.
Example Output
import numpy as np
plt.pie(y)
plt.show()
Count Plot
The seaborn.countplot() method uses bars to display the counts of observations in each
categorical bin.
Example Output
df = sns.load_dataset('List')
plt.show()
SciPy
SciPy is a free and open-source Python library used for scientific and technical computing.
SciPy has built-in packages that help in handling the scientific domains.
Mathematics integration
Statistics (normal distribution)
Linear algebra
Multidimensional image processing
Mathematics constants
Language integration
SciPy and Its Characteristics
3 Efficient and fast data processing
6 Simplifies scientific application development
cluster: Clustering algorithms
constants: Physical and mathematical constants
fftpack: Fast Fourier Transform routines
integrate: Integration and ordinary differential equation solvers
spatial: Spatial data structures and algorithms
interpolate: Interpolation and smoothing splines
io: Input and Output
linalg: Linear algebra
ndimage: N-dimensional image processing
odr: Orthogonal distance regression
optimize: Optimization and root-finding routines
signal: Signal processing
sparse: Sparse matrices and associated routines
weave: C/C++ integration
stats: Statistical distributions and functions
special: Special functions
SciPy Packages
IO
Optimize
Integration
Statistics
SciPy Packages: Example 1
Output:
Example:
linalg.det( two_d_array )
Output:
Example:
In this example, the function returns two values in which the first value is integration, and the
second value is the estimated error in integral.
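Both SciPy examples can be sketched as follows, since most of the original code did not survive. The matrix values and the integrand x**2 on [0, 1] are assumptions chosen so that the results are easy to check by hand.

```python
import numpy as np
from scipy import integrate, linalg

# Determinant of a hypothetical 2 x 2 matrix: 4*2 - 5*3 = -7.
two_d_array = np.array([[4, 5], [3, 2]])
det = linalg.det(two_d_array)
print(det)

# quad returns two values: the integral and the estimated absolute error.
# Here x**2 is integrated from 0 to 1; the exact answer is 1/3.
value, error = integrate.quad(lambda x: x**2, 0, 1)
print(value, error)
```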
Scikit-Learn
Scikit-learn is a powerful and modern machine learning Python library. It is used for fully- and
semi-automated data analysis and information extraction.
Scikit-learn helps data scientists and machine learning engineers to solve problems
using the problem-solution approach.
Points to be considered while working with a scikit-learn dataset or loading the data to
scikit-learn:
Verify that the features and responses are in the form of a NumPy ndarray
Check that the features and responses have consistent shapes and sizes, that is, one response
value for each row of features
Scikit-Learn: Installation
Libraries: NumPy, SciPy, Pandas, Matplotlib
1 Clustering
5 Feature selection
Scikit-learn provides toy datasets that can be used for clustering, regression, and classification
problems. These datasets are quite helpful while learning new libraries.
Datasets:
1 Boston house prices
2 Iris plants
3 Wine recognition
4 Breast cancer
5 Digits
6 Diabetes
To import the toy dataset, it is required to use the sklearn library with the import
keyword as shown below:
from sklearn import datasets
A load function is used to load each dataset and its syntax is shown below:
load_dataset()
Here, dataset is replaced by the name of the dataset, for example load_iris().
Import Datasets Using Scikit-Learn: Example
The below example illustrates how to load the breast cancer dataset from the sklearn library
and store it into a variable called data.
data = datasets.load_breast_cancer()
Here, the load function will not return data in the tabular format. It will return a
dictionary with the key and value.
Import Datasets Using Scikit-Learn: Example
The below example shows that the dataset is present in a key-value pair.
Example:
import pandas as pd
import numpy as np
from sklearn import datasets
data = datasets.load_breast_cancer()
data
Import Datasets Using Scikit-Learn: Example
Example:
print(data.keys())
data
Suppose a user needs to know the dataset column names or features present in the
dataset. Then the below syntax can be used:
Example:
print(data.feature_names)
The target_names is the name of the target variable, in other words, the
name of the target column.
Example:
print(data.target_names)
Here, malignant and benign denote the values present in the target column.
Import Datasets Using Scikit-Learn: Example
The target holds the actual labels in a NumPy array. Here, the target data is one column
that classifies the tumor as either 0, indicating malignant, or 1, indicating benign.
Example:
data.target
Import Datasets Using Scikit-Learn: Example
DESCR represents the description of the dataset, and the filename is the path to the actual
file of the data in CSV format.
Example:
print(data.DESCR)
print(data.filename)
Working with the Dataset
To work with the dataset in tabular form, import the Pandas library and build a DataFrame as
shown below:
Example:
# Import pandas
import pandas as pd
# Read the DataFrame, first using the feature data
df = pd.DataFrame(data.data, columns=data.feature_names)
# Add a target column, and fill it with the target data
df['target'] = data.target
# Show the first five rows
df.head()
Note: The dataset has been loaded into the Pandas DataFrame.
Preprocessing Data in Scikit-Learn
Standardization, or mean removal and variance scaling
Normalization
Encoding categorical features
Imputation of missing values
Standardization
Standardization is a scaling technique that shifts and rescales the data values. It makes the
dataset's mean equal to 0 and its standard deviation equal to 1.
[Figure: histograms of cnt before standardization (m = 10.0, S = 30.0) and after preprocessing with
standardization (m = 0.0, S = 1.0)]
The preprocessing module provides the StandardScaler utility class to perform the following
operation on the dataset.
Example:
# Import libraries
import numpy as np
import pandas as pd
# Generating normally distributed data; df is a DataFrame
df = pd.DataFrame({
    'x': np.random.normal(0,3,10000),
    'y': np.random.normal(6,4,10000),
    'z': np.random.normal(-6,6,10000)
})
Next, it is required to see the plot to know whether the data is on a different or
the same scale.
Example:
# Plotting data
df.plot.kde()
Here, x,y, and z are on different scales. So, it is required to keep all data on
the same scale to improve any algorithm's performance.
Standardization
Next, to scale the values of x,y, and z to the same scale, a standard scaler is used. The x, y, and
z values are displayed on the same scale in the graph below:
Example:
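A minimal sketch of the scaling step with StandardScaler, which for each column computes (value - column mean) / column standard deviation. The seeded generator rng and the name scaled_df are assumptions introduced so that the sketch is self-contained and repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)   # seeded so the sketch is repeatable
df = pd.DataFrame({
    "x": rng.normal(0, 3, 10000),
    "y": rng.normal(6, 4, 10000),
    "z": rng.normal(-6, 6, 10000),
})

# fit learns each column's mean and standard deviation;
# transform then applies (value - mean) / standard deviation.
scaled = StandardScaler().fit_transform(df)
scaled_df = pd.DataFrame(scaled, columns=df.columns)

print(scaled_df.mean().round(3))
print(scaled_df.std(ddof=0).round(3))
```

Calling scaled_df.plot.kde() afterwards would draw the three columns on the same scale.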
MinMaxScaler transforms each feature to a given range using scaling. This estimator scales
and translates each feature individually such that it is in the given range on the training set,
for example, between zero and one.
The preprocessing module provides the MinMaxScaler utility class to perform the following
operation on the dataset.
Example:
df = pd.DataFrame({
# positive skew
'x': np.random.chisquare(8,1000),
# negative skew
'y': np.random.beta(8,2,1000) * 40,
# no skew
'z': np.random.normal(50,3,1000)
})
MinMaxScaler: Example
Next, it is required to see the plot to know whether the data is normalized.
Example:
df.plot.kde()
MinMaxScaler: Example
Example:
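A minimal sketch of MinMaxScaler applied to data like the DataFrame above. With the default range of zero to one, each column becomes (value - min) / (max - min). The seeded generator rng and the name scaled_df are assumptions introduced for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)   # seeded so the sketch is repeatable
df = pd.DataFrame({
    "x": rng.chisquare(8, 1000),       # positive skew
    "y": rng.beta(8, 2, 1000) * 40,    # negative skew
    "z": rng.normal(50, 3, 1000),      # no skew
})

# The default feature_range is (0, 1).
scaled = MinMaxScaler().fit_transform(df)
scaled_df = pd.DataFrame(scaled, columns=df.columns)

print(scaled_df.min())
print(scaled_df.max())
```

The shape of each distribution, including its skew, is unchanged; only the range of the values is rescaled.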
Algorithms cannot process missing values. Imputers infer the value of missing data
from existing data.
import numpy as np
from sklearn.impute import SimpleImputer
imp_values = SimpleImputer(missing_values=np.nan, strategy='mean')
imp_values.fit([[3,5],[np.nan,7],[1,3]])
X = [[np.nan, 2],[6, np.nan],[7,6]]
print(imp_values.transform(X))
A categorical variable is a variable that can take a limited and fixed number of possible
values, assigning each individual or other unit of observation to a particular group on the
basis of some qualitative property.
To deal with categorical variables encoding schemes are used, such as:
Example:
import pandas as pd
data = pd.DataFrame({
    'Age':[12,34,56,22,24,35],
    'Income':['Low','Low','High','Medium','Medium','High']
})
data
data.Income.map({'Low':1,'Medium':2,'High':3})
It adds extra columns to the original data that indicate whether each possible value is
present or not.
Color   Red  Yellow  Green
Red     1    0       0
Red     1    0       0
Yellow  0    1       0
Green   0    0       1
Yellow  0    1       0
One-Hot Encoding: Example
Example:
Example:
# Print one hot encoded categories to know the
# column labels using the .categories_ attribute of
# the encoder
print(ohe.categories_)
data[ohe.categories_[0]] = transform.toarray()
data
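The fragment above uses ohe and transform without showing how they are created. One way to fill that in is sketched below, assuming the Age and Income DataFrame from the earlier encoding example and scikit-learn's OneHotEncoder with default settings.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

data = pd.DataFrame({
    "Age": [12, 34, 56, 22, 24, 35],
    "Income": ["Low", "Low", "High", "Medium", "Medium", "High"],
})

ohe = OneHotEncoder()
# The encoder expects 2-D input, hence the double brackets.
transform = ohe.fit_transform(data[["Income"]])

# Categories are stored in sorted order: High, Low, Medium.
print(ohe.categories_)

# Add one indicator column per category to the original data.
data[list(ohe.categories_[0])] = transform.toarray()
print(data)
```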
Key Takeaways
SciPy is a free and open-source Python library used for scientific and
technical computing.
Knowledge Check
A. scipy.cluster
B. scipy.source
C. scipy.interpolate
D. scipy.signal
A. Math
B. Random
C. Pandas
A. 1D
B. 2D
C. 3D