Lesson 03 Python Libraries For Data Science

Uploaded by

Ashish Munda
Copyright
© All Rights Reserved

Applied Data Science with Python

Python Libraries for Data Science


Learning Objectives

By the end of this lesson, you will be able to:

Explain the use of a Python library

List various Python libraries

Identify the SciPy sub-packages


Python Libraries for Data Science
What Is a Python Library?

A Python library is a collection of related modules. It contains bundles of code that can be reused in different programs and apps.

This reusability makes Python programming easier and more convenient for programmers.
Python Libraries

Various Python libraries make programming easier. Some commonly used libraries are:

• NumPy
• SciPy
• Pandas
• Matplotlib
• Scikit-learn


Benefits of Python Libraries

• Easy to learn
• Open source
• Efficient and multi-platform support
• Huge collection of libraries, functions, and modules
• Big open-source community
• Integrates well with enterprise apps and systems
• Great vendor and product support


Python Libraries

NumPy: Numerical Python is a numerical computing library that can handle large matrices and multi-dimensional data.

Pandas: Pandas consists of a variety of analysis tools and configurable high-level data structures.

SciPy: Scientific Python is an open-source high-level scientific computation package. This library is built as an extension of NumPy.
Python Libraries

Matplotlib: It is also an open-source library that plots high-definition figures such as pie charts, histograms, etc.

Scikit-learn: The library contains a lot of efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction.
Import Library into Python Program
Import Module in Python

In Python, a file containing code is referred to as a module. The import keyword is used to bring it into a program.

Whenever we need to use a module, we import it from its library.

Example: importing the math library

import math
Example: Import Module in Python

In this code, the math library is imported. One of its functions, sqrt (square root), is used without writing the actual code to calculate the square root of a number.

Example:

import math

A = 16
print(math.sqrt(A))

Output:

4.0
Example: Import Module in Python

In the previous code, the complete library is imported to use just one of its functions. Importing only “sqrt” from the math library would also have worked.

Example:

from math import sqrt, sin

A = 16
B = 3.14
print(sqrt(A))
print(sin(B))

In the above code, only the “sqrt” and “sin” functions from the math library are imported.
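As a small sketch of the two import styles described above, the snippet below uses the standard math library. The alias m and the variable names are illustrative choices, not part of the lesson.

```python
import math as m          # import the whole library under a short alias
from math import sqrt     # import a single function by name

A = 16
root_via_alias = m.sqrt(A)   # function reached through the alias
root_direct = sqrt(A)        # function called directly, no prefix needed

print(root_via_alias, root_direct)
```

Both calls run the same function, so the choice between the styles is mostly about readability and how many names a program needs.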
NumPy
Introduction to NumPy

NumPy stands for Numerical Python.

• It is a Python library used for working with arrays.
• It consists of a multidimensional array object and a collection of functions for manipulating it.
• It performs mathematical and logical operations on arrays.

The array object in NumPy is called ndarray.


Advantages of NumPy

The following are the advantages of NumPy:

• It provides an array object that is faster than traditional Python lists.
• It provides supporting functions.
• Arrays are frequently used in data science.
• NumPy arrays are stored in one contiguous block of memory, unlike lists.
NumPy: Installation

The installation of NumPy is easy if Python and pip are already installed on the system. The following command is used to install NumPy:

C:\Users\Your Name>pip install numpy

Once installed, NumPy can be used in an application by adding the import keyword.
Import NumPy: Example

NumPy is usually imported under the name np. The import numpy portion of the code tells Python to bring the NumPy library into the current environment.

Example:

import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)

Output:

[1 2 3 4 5]
NumPy: Array Object

A NumPy ndarray object can be created by using the array() function.

Consider the following example:

Example:

import numpy as np
arr = np.array([10, 20, 30, 40, 50])
print(arr)
print(type(arr))

The built-in Python function type() returns the type of the object passed to it. Here it shows that arr is a numpy.ndarray.

Dimensions in Arrays: Example

0-D arrays, or scalars, are the individual elements of an array. Each value in an array is a 0-D array.

Example:

import numpy as np
arr = np.array(60)
print(arr)

Output:

60
Dimensions in Arrays: Example

A 1-D array is the basic array. It has 0-D arrays as its elements.

Example:

import numpy as np
arr = np.array([10, 20, 30, 40])
print(arr)

Output:

[10 20 30 40]
Dimensions in Arrays: Example

A 2-D array represents a matrix. It has 1-D arrays as its elements.

Example:

import numpy as np
arr = np.array([[10, 20, 30, 40], [50, 60, 70, 80]])
print(arr)

Output:

[[10 20 30 40]
 [50 60 70 80]]
Dimensions in Arrays: Example

A 3-D array represents a 3rd-order tensor. It has 2-D arrays as its elements.

Example:

import numpy as np
arr = np.array([[[10, 20, 30, 40], [50, 60, 70, 80]],
                [[12, 13, 14, 15], [16, 17, 18, 19]]])
print(arr)
Number of Dimensions

The ndim attribute returns the number of array dimensions.

Example:

import numpy as np
p = np.array(50)
q = np.array([10, 20, 30, 40, 50])
r = np.array([[10, 20, 30, 40], [50, 60, 70, 80]])
s = np.array([[[10, 20, 30, 40], [50, 60, 70, 80]],
              [[12, 13, 14, 15], [16, 17, 18, 19]]])
print(p.ndim)
print(q.ndim)
print(r.ndim)
print(s.ndim)

Output:

0
1
2
3
Broadcasting

Broadcasting refers to NumPy's ability to handle arrays of different shapes during arithmetic
operations.

Example:

import numpy as np
a = np.array([[11, 22, 33], [10, 20, 30]])
print(a)

b = 4
print(b)

c = a + b
print(c)

The smaller array is broadcast across the larger array so that the shapes are compatible.
Broadcasting

Broadcasting follows a strict set of rules that determine how two arrays interact:

Rule 01: A shape with fewer dimensions is padded with ones on its leading (left) side if the two arrays differ in the number of dimensions.

Rule 02: If the shape of the two arrays does not match in a dimension, the array with a shape equal to 1 in that dimension is stretched to match the other shape.

Rule 03: An error occurs if in any dimension the sizes do not match and neither is equal to 1.
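The three broadcasting rules can be sketched with a pair of small arrays. The arrays a, b, and d and the flag rule_three_failed are hypothetical names chosen only to exercise each rule.

```python
import numpy as np

a = np.ones((2, 3))          # shape (2, 3)
b = np.array([10, 20, 30])   # shape (3,)

# Rule 01: b's shape (3,) is padded on the left to (1, 3).
# Rule 02: the size-1 dimension is stretched to 2, giving (2, 3).
c = a + b
print(c.shape)

# Rule 03: trailing sizes 3 and 2 differ and neither is 1, so NumPy raises an error.
d = np.array([1, 2])         # shape (2,)
try:
    a + d
    rule_three_failed = False
except ValueError as err:
    rule_three_failed = True
    print("Broadcast error:", err)
```

No data is copied when a dimension is stretched; NumPy only behaves as if the smaller array were repeated.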
Why NumPy

Numerical Python (NumPy) supports multidimensional arrays over which mathematical operations can be easily applied.
NumPy Overview

NumPy is the foundational package for mathematical computing in Python. It has the following properties:

• Supports fast and efficient multidimensional arrays (ndarray)
• Executes element-wise computations and mathematical calculations
• Performs linear algebraic operations, Fourier transforms, and random number generation
• Provides tools for reading/writing array-based datasets to disk
• Offers an efficient way of storing and manipulating data
• Provides tools for integrating code from other languages (C, C++)
Functions of NumPy Module

S.No   NumPy Module                          Functions

1      NumPy array manipulation functions    numpy.reshape(), numpy.concatenate(), numpy.shape()
2      NumPy string functions                numpy.char.add(), numpy.char.replace(), numpy.char.upper() and numpy.char.lower()
3      NumPy arithmetic functions            numpy.add(), numpy.subtract(), numpy.mod() and numpy.power()
4      NumPy statistical functions           numpy.median(), numpy.mean(), numpy.average()
NumPy Array Functions
NumPy Array Function: Example 1

To access NumPy and its functions, import it in the Python code as shown below:

Example:

import numpy as np
arr = np.array([[10, 20, 30, 40], [50, 60, 70, 80]])
print(arr.shape)

Output:

(2, 4)

The shape of an array is defined by the number of elements in each dimension.

In this example, the NumPy module is imported and the shape attribute is used.
NumPy Array Function: Example 2

To access NumPy and its functions, import it in the Python code as shown below:

Example:

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(4, 3)
print(newarr)

The reshape function changes the shape of an array.

In this example, the NumPy module is imported and the reshape function is used.
NumPy Array Function: Example 3

To access NumPy and its functions, import it in the Python code as shown below:

Example:

import numpy as np
arr1 = np.array([10, 20, 30])
arr2 = np.array([40, 50, 60])
arr = np.concatenate((arr1, arr2))
print(arr)

Output:

[10 20 30 40 50 60]

The concatenate function combines two or more arrays into a single array.

In this example, the NumPy module is imported and the concatenate function is used.
NumPy String Functions
NumPy String Function: Example 1

To access NumPy and its functions, import it in the Python code as shown below:

Example:

import numpy as np
a = np.array(['Hello', 'World'])
b = np.array(['Welcome', 'Learners'])
result = np.char.add(a, b)
print(result)

Output:

['HelloWelcome' 'WorldLearners']

The add function returns element-wise string concatenation for two arrays of string or unicode.

In this example, the NumPy module is imported and the add function is used.
NumPy String Function: Example 2

To access NumPy and its functions, import it in the Python code as shown below:

Example:

import numpy as np
# Named text rather than str so the built-in str type is not shadowed
text = "Hello How Are You"
print(text)
a = np.char.replace(text, 'Hello', 'Hi')
print(a)

Output:

Hello How Are You
Hi How Are You

The replace function replaces the old substring with the new substring.

In this example, the NumPy module is imported and the replace function is used.
NumPy String Function: Example 3

To access NumPy and its functions, import it in the Python code as shown below:

Example:

import numpy as np
a = "hello how are you"
print(a)
x = np.char.upper(a)
print(x)
b = "GREETINGS OF THE DAY"
print(b)
y = np.char.lower(b)
print(y)

Output:

hello how are you
HELLO HOW ARE YOU
GREETINGS OF THE DAY
greetings of the day

The upper function converts all lowercase characters in a string to uppercase. The lower function converts all uppercase characters in a string to lowercase.

In this example, the NumPy module is imported and the upper and lower functions are used.
NumPy Arithmetic Functions
NumPy Arithmetic Function: Example 1

To access NumPy and its functions, import it in the Python code as shown below:

Example:

import numpy as np
a = np.array([30, 20, 10])
b = np.array([10, 20, 30])
result = np.add(a, b)
print(result)

Output:

[40 40 40]

The add function computes the element-wise addition of two arrays.

In this example, the NumPy module is imported and the add function is used.
NumPy Arithmetic Function: Example 2

To access NumPy and its functions, import it in the Python code as shown below:

Example:

import numpy as np
a = np.array([[30, 40, 60], [50, 70, 90]])
b = np.array([[10, 20, 30], [40, 30, 80]])
result = np.subtract(a, b)
print(result)

Output:

[[20 20 30]
 [10 40 10]]

The subtract function is used to compute the element-wise difference between two arrays.

In this example, the NumPy module is imported and the subtract function is used.
NumPy Arithmetic Function: Example 3

To access NumPy and its functions, import it in the Python code as shown below:

Example:

import numpy as np
a = np.array([20, 40, 70])
b = np.array([10, 30, 40])
result = np.mod(a, b)
print(result)

Output:

[ 0 10 30]

The mod function returns the element-wise remainder of the division between two arrays.

In this example, the NumPy module is imported and the mod function is used.
NumPy Arithmetic Function: Example 4

To access NumPy and its functions, import it in the Python code as shown below:

Example:

import numpy as np
a = [2, 2, 2, 2, 2]
b = [2, 3, 4, 5, 6]
c = np.power(a, b)
print(c)

Output:

[ 4  8 16 32 64]

Each element of the first array is raised to the power of the corresponding element in the second array.

In this example, the NumPy module is imported and the power function is used.
NumPy Statistical Functions
NumPy Statistical Function: Example 1

To access NumPy and its functions, import it in the Python code as shown below:

Median calculates the median value from an unsorted data list. It can also be used to compute the median along any specified axis.

Example:

import numpy as np
a = [[1, 17, 19, 33, 49], [14, 6, 87, 8, 19], [34, 2, 54, 4, 7]]
print(np.median(a))
print(np.median(a, axis = 0))
print(np.median(a, axis = 1))

Output:

17.0
[14.  6. 54.  8. 19.]
[19. 14.  7.]

In this example, the NumPy module is imported and the median function is used.
NumPy Statistical Function: Example 2

To access NumPy and its functions, import it in the Python code as shown below:

The mean function computes the arithmetic mean, or average, of the given array of elements.

Example:

import numpy as np
a = [20, 2, 7, 1, 34]
print(a)
b = np.mean(a)
print(b)

Output:

[20, 2, 7, 1, 34]
12.8

In this example, the NumPy module is imported and the mean function is used.
NumPy Statistical Function: Example 3

To access NumPy and its functions, import it in the Python code as shown below:

The average function computes the weighted average along the specified axis. When no weights are passed, the result is the same as the arithmetic mean.

Example:

import numpy as np
a = np.array([[2, 3, 4],
              [3, 6, 7],
              [5, 7, 8]])
b = np.average(a, axis = 0)
print(b)

With axis = 0, it calculates the average of each column of the array.

In this example, the NumPy module is imported and the average function is used.
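Since the average function is described as a weighted average, a short sketch with explicit weights may help. The scores and weights values below are made up for illustration; weights is a real keyword argument of np.average.

```python
import numpy as np

scores = np.array([2, 3, 4])
weights = np.array([1, 1, 2])   # the last score counts twice as much

plain = np.average(scores)                       # same as np.mean(scores)
weighted = np.average(scores, weights=weights)   # (2*1 + 3*1 + 4*2) / (1 + 1 + 2)

print(plain)
print(weighted)
```

Without weights the result matches np.mean, which is why the two functions often look interchangeable.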
NumPy Array Indexing
NumPy Array Indexing

An array element can be accessed using its index number. It works the same way as indexing a Python list.

            j=0   j=1   j=2
i=0          1     2     3
i=1          4     5     6

Indexes for NumPy arrays begin at 0. The first element has index 0, the second has 1, and so on.
NumPy Array Indexing: Examples

Example 1: Print the value at index 3

Example

import numpy as np

X = np.array(['Maths', 'Science', 'Chemistry', 'Computers'])
print(X[3])

Output:

Computers

Example 2: Print the sum of the elements at indexes 0 and 1

Example

import numpy as np

index = np.array([121, 235, 353, 254])
print(index[1] + index[0])

Output:

356
Two-Dimensional Array

Think of a 2D array as a table: the first index selects the row and the second index selects the column.

                   Column Index
                   0       1       2
Row Index    0   (0,0)   (0,1)   (0,2)
             1   (1,0)   (1,1)   (1,2)
             2   (2,0)   (2,1)   (2,2)
Two-Dimensional Array: Examples

Example 1: In this example, the fourth element of the first row of a two-dimensional array is accessed.

Example

import numpy as np
Y = np.array([[10, 20, 30, 40, 50], [60, 70, 80, 90, 100]])
print('4th element on 1st row: ', Y[0, 3])

Output:

4th element on 1st row: 40

Example 2: In this example, the concept of the 2-D array is used to retrieve the third element from the array’s second row.

Example

import numpy as np
X1 = np.array([[14, 25, 37, 46, 59, 45], [63, 74, 86, 98, 12, 76]])
print('3rd element on 2nd row: ', X1[1, 2])

Output:

3rd element on 2nd row: 86


Three-Dimensional Array

NumPy includes functions that allow us to access and manipulate the data in an array.

Three-dimensional means that three nested levels of an array are used: a 3D array has 2D arrays as its elements.

1D Array

array([1, 2, 3])

2D Array

array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])

3D Array

array([[[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]],

       [[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]],

       [[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]]])
Three-Dimensional Array: Examples

Example 1: In this example, the first element of the second row of the second 2-D array is printed.

Example

import numpy as np

Z = np.array([[[11, 22, 33], [44, 55, 66]],
              [[77, 88, 99], [100, 111, 122]]])

print(Z[1, 1, 0])

Output:

100

Example 2: In this example, two elements from the same row of a 3D array are subtracted, and the result is displayed.

Example

import numpy as np

Y = np.array([[[5, 6, 36], [44, 65, 67]],
              [[47, 78, 59], [10, 21, 42]]])

print(Y[0, 1, 2] - Y[0, 1, 1])

Output:

2
Negative Indexing

• Negative indices are counted from the end of an array.

• In a negative indexing system, the last element has an index of -1, the second last element has an index of -2, and so on.
Negative Indexing: Examples

Example 1: Printing the last element of the first row of an array using negative indexing

Example

import numpy as np
Neg_index = np.array([[5, 3, 2, 6, 8], [2, 4, 16, 4, 12]])
print('Last element from 1st dim: ', Neg_index[0, -1])

Output:

Last element from 1st dim: 8

Example 2: Printing the second vehicle from the end in the first dimension

Example

import numpy as np
Vehicles = np.array([['car', 'bus', 'Rowboat', 'Bicycle'],
                     ['train', 'flight', 'Truck', 'Ship']])
print('Access second vehicle from 1st dim: ', Vehicles[0, -2])

Output:

Access second vehicle from 1st dim: Rowboat


Slicing

In Python, slicing refers to taking elements from one index to another.

Instead of using an index, the slice is passed as [start:end].

Another way to pass the slice is to add a step as [start:end:step].

In slicing, if the start is not passed, it is considered as 0. If the step is not passed, it is considered as 1, and if the end is not passed, it is considered as the length of the array in that dimension.
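The default values for start, end, and step can be checked with a short sketch. The array values and the variable names are illustrative only.

```python
import numpy as np

values = np.array([10, 20, 30, 40, 50, 60])

head = values[:3]          # start omitted, treated as 0
rest = values[3:]          # end omitted, treated as the array length
every_other = values[::2]  # start and end omitted, step of 2
window = values[1:5:2]     # all three given: indexes 1 and 3

print(head, rest, every_other, window)
```

The end index is always exclusive, so values[1:5:2] stops before index 5.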
Slicing: Examples

Example 1: Illustrates the use of slicing to retrieve six employee ratings (indexes 1 through 6) for the first quarter from an array.

Example

import numpy as np

Employee_rating = np.array([1, 4, 3, 5, 6, 8, 9, 10, 12])

print(Employee_rating[1:7])

Output:

[4 3 5 6 8 9]
Slicing: Examples

Example 2: Printing the list of three subjects from index 5 to the end

Example

import numpy as np

Books = np.array(['Physics', 'DataScience', 'Maths', 'Python', 'Hadoop',
                  'OPPs', 'Java', 'Cloud'])
print(Books[5:])

Output:

['OPPs' 'Java' 'Cloud']

Example 3: Displaying the results of the first five students who received certificates in Python

Example

import numpy as np
Marks = np.array([60, 78, 45, 80, 97, 96, 77])
print(Marks[:5])

Output:

[60 78 45 80 97]


Slicing Using Step Value: Example

The idea of step value slicing is demonstrated in the examples below.

Example 1

import numpy as np
Y = np.array([18, 26, 34, 48, 54, 67, 76])
print(Y[::5])

Output:

[18 67]

Example 2

import numpy as np
X = np.array([8, 7, 6, 5, 4, 3, 2, 1])
print(X[1:6:3])

Output:

[7 4]
Slicing: Two-Dimensional Array

The following example illustrates the concept of slicing to retrieve the elements:

Example

import numpy as np

Z = np.array([[11, 22, 33, 44, 55], [66, 77, 88, 99, 110]])

print(Z[0, 2:3])

Output:

[33]
Negative Slicing

Negative slicing works like negative indexing: negative positions are counted from the end of an array. Basic slicing follows the standard rules of sequence slicing on a per-dimension basis (including using a step index).

Array Size = 4

Values               1    2    3    4
Indices              0    1    2    3
Negative indices    -4   -3   -2   -1

The negative slice [-3:-1] selects the elements 2 and 3.
Negative Slicing: Example

The following examples illustrate the concept of negative slicing to retrieve the elements:

Example 1

import numpy as np

Neg_slice = np.array([13, 34, 58, 69, 44, 56, 37, 24])
print(Neg_slice[:-1])

Output:

[13 34 58 69 44 56 37]

Example 2

import numpy as np

Neg_slice = np.array([15, 26, 37, 48, 55, 64, 34])
print(Neg_slice[-4:-1])

Output:

[48 55 64]


arange Function in Python

It returns an array with evenly spaced elements within a given interval. Values are generated within the half-open interval [start, stop), which includes start but excludes stop. Its syntax is:

numpy.arange([start, ]stop, [step, ]dtype=None, *, like=None)

Parameters:
start: [optional] Start of the interval. By default, start equals 0.
stop: End of the interval (not included).
step: [optional] Step size between values. By default, step equals 1.
dtype: Type of the output array.
arange Function in Python

The following example illustrates the use of the arange function:

Example:

import numpy as np

# arange returns a NumPy ndarray
print("Numbers:", type(np.arange(2, 10)))

# A series of numbers from low to high with a step of 1.2
print(np.arange(2, 10, 1.2))
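To make the half-open interval concrete, here is a minimal sketch of arange with its defaults and with an explicit step. The variable names are illustrative.

```python
import numpy as np

first_five = np.arange(5)       # start defaults to 0, step defaults to 1
evens = np.arange(2, 10, 2)     # the stop value 10 is excluded
halves = np.arange(0, 2, 0.5)   # a float step is also accepted

print(first_five)
print(evens)
print(halves)
```

With float steps the number of elements can be affected by rounding, which is one reason linspace is often preferred when an exact sample count matters.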
linspace Function

It returns an evenly spaced sequence in a specified interval. It is similar to the arange function, but instead of a step, it uses the number of samples. Its syntax is:

numpy.linspace(start, stop, num = 50, endpoint = True, retstep = False, dtype = None)

Parameters:

start: Start of the interval.
stop: End of the interval. It is included unless endpoint is False.
num: [int, optional] Number of samples to generate. By default, num equals 50.
endpoint: [optional] If True, stop is the last sample. By default, endpoint equals True.
retstep: [optional] If True, return (samples, step). By default, retstep equals False.
dtype: Type of the output array.

Return:

ndarray: The evenly spaced samples.
step: [float, optional] The spacing between samples. Returned only if retstep equals True.
linspace Function

The following example illustrates the use of the linspace function:

Example:

import numpy as np

print("Linearly spaced numbers between 1 and 6\n")
print(np.linspace(1, 6, 50))
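The retstep and endpoint parameters are easier to see with a small sample count. The sketch below uses six and five samples; the variable names are illustrative.

```python
import numpy as np

# Six samples from 1 to 6 inclusive; retstep=True also returns the spacing.
samples, step = np.linspace(1, 6, 6, retstep=True)

# Five samples starting at 0; endpoint=False leaves the stop value 1 out.
open_ended = np.linspace(0, 1, 5, endpoint=False)

print(samples, step)
print(open_ended)
```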
Random Number Generation

The random module in Python defines a series of functions that are used to generate or manipulate random numbers. The random function generates a random float number in the range [0.0, 1.0), which includes 0.0 but excludes 1.0.

Example:

import random
n = random.random()
print(n)
randn Function

The randn() function from numpy.random generates an array with the given shape and fills it with random values that follow the standard normal distribution.

Example:

import numpy as np

print("Numbers from Normal distribution with zero mean and standard deviation 1 i.e. standard normal")
print(np.random.randn(5, 3))
randint Function

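The randint function is used to generate a random integer within a given range. Python's random.randint(start, end) includes both start and end, while np.random.randint(low, high) excludes high.

Example:

import random
import numpy as np

# Generates a random number in a given positive range
random1 = random.randint(1, 10)
print("\nRandom number between 1 and 10 is %s" % (random1))

# randint to print a 2x2 matrix of integers from 1 to 49
print(np.random.randint(1, 50, (2, 2)))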

Note: It works with integers. If float values are provided, a value error will be returned.
If string data is provided, a type error will be returned.
Random Module: Seed Function

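The seed() method is used to initialize the random number generator. Using the same seed reproduces the same sequence of random numbers.

Example:

import random

# Before adding seed function: the values differ from run to run
for i in range(5):
    print(random.randint(1, 50))

# After adding seed function: the generator is reset on every iteration,
# so the same value is printed five times
for i in range(5):
    random.seed(13)
    print(random.randint(1, 50))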
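A common use of seed() is to make a whole sequence reproducible rather than a single value. The sketch below seeds once before each run; the seed value 13 is taken from the example above and the list names are illustrative.

```python
import random

random.seed(13)
first_run = [random.randint(1, 50) for _ in range(5)]

random.seed(13)   # reseeding restarts the same sequence
second_run = [random.randint(1, 50) for _ in range(5)]

print(first_run)
print(second_run)
print(first_run == second_run)
```

Seeding once at the start of a program is usually enough; reseeding inside a loop repeats the same value on every iteration.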
Reshape Function

The numpy.reshape() function gives a new shape to an array without changing the data of the array.

Example:

import numpy as np

x=np.arange(12)

y=np.reshape(x, (4,3))

print(x)

print(y)
Ravel Function

numpy.ravel() returns a contiguous flattened array (a 1D array containing all elements of the input array).

The ravel function takes two parameters:

x: array_like
order: {'C', 'F', 'A', 'K'} (optional)
Ravel Function: Example

An example of the ravel function is given below.

Example:

import numpy as np
x = np.array([[1, 3, 5], [11, 35, 56]])
y = np.ravel(x, order='F')
z = np.ravel(x, order='C')
p = np.ravel(x, order='A')
q = np.ravel(x, order='K')
print(y)
print(z)
print(p)
print(q)
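The difference between the order options is clearest for 'C' and 'F'. This sketch reuses the same 2 x 3 array; the names row_major and col_major are illustrative.

```python
import numpy as np

x = np.array([[1, 3, 5], [11, 35, 56]])

row_major = np.ravel(x, order='C')   # read row by row (C-style)
col_major = np.ravel(x, order='F')   # read column by column (Fortran-style)

print(row_major)
print(col_major)
```

For an array created this way, 'A' and 'K' give the same result as 'C', because the array is stored row by row in memory.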
Pandas
Pandas

Pandas is a Python package that allows you to work with large datasets.

It offers tools for data analysis, cleansing, exploration, and manipulation.

The Pandas library is built on top of NumPy, which means NumPy is required to use Pandas. NumPy is great for mathematical computing, and Pandas adds several functionalities on top of it.
Purpose of Pandas

Pandas is basically used for:

• Data structures
• Intrinsic data alignment
• Data operation
• Data handling
• Data standardization


Benefits of Pandas

Some benefits of Pandas are listed below:

01 Data representation
DataFrame and Series represent the data in a way that is appropriate for data analysis.

02 Clear code
The simple API found in Pandas helps to focus on the essential part of the code, making it clear and concise.
Features of Pandas

It is a useful library for data scientists because of its numerous features.

• Powerful data structure
• Fast and efficient data wrangling
• High performance merging and joining of datasets
• Intelligent and automated data alignment
• Easy data aggregation and transformation
• Tools for reading and writing data
Data Structures

The two main data structures in Pandas are:

Series
• One-dimensional labeled array
• Supports multiple data types

DataFrame
• Two-dimensional labeled array
• Supports multiple data types
• Input can be a series
• Input can be another DataFrame
Understanding Series

A Series is a one-dimensional array-like object containing data and labels, or an index.

Data             4   11   21   36
Label (Index)    0    1    2    3

Data alignment is intrinsic and cannot be broken until changed explicitly by a program.
Series

Series can be created with different data inputs:

Data Types
• Integer
• String
• Python Object
• Floating Point

Data Input
• ndarray
• dict
• scalar
• list

Series

Data             2   3   8   4
Label (Index)    0   1   2   3


Series Creation

Key points to note while creating a series are:

• Import Pandas as it is the main library (import pandas as pd)
• Import NumPy while working with ndarrays (import numpy as np)
• Apply the syntax and pass the data elements as arguments

Basic Method

S = pd.Series(data, index = [index])

Series

4   11   21   36
Creating Series from a List

A sample that shows how to create a series from a list:

• Import libraries
• Pass list as an argument

The printed series shows the index, the data values, and the data type.

No index is passed for the data, but notice that a default index is created and data alignment is done automatically.
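The original sample for this slide is not available, so the following is only a sketch of creating a series from a list. The list values and the labels 'a' to 'd' are assumptions chosen for illustration.

```python
import pandas as pd

data = [4, 11, 21, 36]

# No index is passed, so Pandas assigns the default labels 0, 1, 2, 3.
s = pd.Series(data)
print(s)

# The same data with explicit labels passed through the index argument.
labeled = pd.Series(data, index=['a', 'b', 'c', 'd'])
print(labeled)
```

Printing a series shows the index on the left, the values on the right, and the dtype on the last line.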
Creating Series of Values

A sample showing how to create a series of values:

• .values provides the underlying data of the series as an array
• Indexing prints the value at the chosen index

Total Series Calculation

• Performs calculations across the entire series
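As a hedged sketch of these operations, the snippet below reads the values, looks up one label, and applies a calculation to the whole series. The data, the labels, and the doubling operation are assumptions, since the original sample is not shown.

```python
import pandas as pd

s = pd.Series([4, 11, 21, 36], index=['a', 'b', 'c', 'd'])

print(s.values)    # the underlying data as a NumPy array
print(s['b'])      # the value stored at the chosen index label

doubled = s * 2    # arithmetic is applied to every element
total = s.sum()    # a single result computed across the entire series

print(doubled)
print(total)
```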
DataFrame

A DataFrame is a type of data structure that arranges data into a 2-dimensional table of rows and columns, much like a spreadsheet.

Data Types
• Integer
• String
• Python Object
• Floating Point

Data Input
• ndarray
• dict
• List
• Series
• DataFrame

DataFrame

Data             2   3   8    4
                 5   8   10   1
Label (Index)    0   1   2    3
Creating DataFrame from Lists

A sample showing how to create DataFrames from Lists:

Pass the list to the DataFrame
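A minimal sketch of passing a list to the DataFrame constructor is given below. The rows and the column names Name and Age are assumptions for illustration.

```python
import pandas as pd

# Each inner list becomes one row of the table.
rows = [['Tom', 25], ['James', 26], ['Ricky', 25]]

df = pd.DataFrame(rows, columns=['Name', 'Age'])
print(df)
```

If the columns argument is left out, Pandas labels the columns 0, 1, and so on.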


Creating DataFrame from Dictionary

This example shows how to create a DataFrame from a dictionary.

dict one dict two

Entire dict
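The sample for this slide is not available, so the sketch below assumes a dictionary of series in which each key becomes a column. The keys one and two and the index labels are hypothetical.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

# Rows are aligned on the index labels; a missing entry becomes NaN.
df = pd.DataFrame(d)
print(df)
```

Column one has no value for label d, so that cell is filled with NaN, which shows the automatic data alignment at work.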
Viewing DataFrame

A DataFrame can be viewed by referring to the column names or using the describe function.
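Both ways of viewing a DataFrame can be sketched as follows. The small Name and Age table is an assumption made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'James', 'Ricky'],
                   'Age': [25, 26, 27]})

names = df['Name']        # view one column by referring to its name
summary = df.describe()   # summary statistics for the numeric columns

print(names)
print(summary)
```

By default, describe() reports only numeric columns, so Name does not appear in the summary.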
Series Functions in Pandas

The commonly used Pandas series functions are listed below:

1 empty
2 ndim
3 size
4 dtype
5 values
6 head()
7 tail()
Empty Function

It returns True if a series is empty as shown below:

Example:

import pandas as pd
import numpy as np

# Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print("Is the Object empty?")
print(s.empty)

Output:

Is the Object empty?
False
ndim Function

It returns the number of dimensions of the object. A Series is a 1D object by definition, as shown in the example below.

Output:

Example:

import pandas as pd
import numpy as np

#create a series with 4 random numbers


s = pd.Series(np.random.randn(4))
print (s)
print ("The dimensions of the object:")
print (s.ndim)
Size Function

It provides the count of the underlying data elements. This example shows how to get the size of a series.

Output:

Example:

import pandas as pd
import numpy as np

# Create a series with 2 random numbers
s = pd.Series(np.random.randn(2))
print(s)
print("The size of the object:")
print(s.size)
dtype Function

It returns the dtype of the object. This example shows how to get the dtype of a series.

Output:

Example:

import pandas as pd
import numpy as np

# Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print(s)
print("The dtype of the object:")
print(s.dtype)
Values Function

It returns the actual data in the series as an array. This example shows how to get the values of a series.

Output:

Example:

import pandas as pd
import numpy as np

#create a series with 4 random numbers


s = pd.Series(np.random.randn(4))
print(s)
print ("The actual data series is:")
print(s.values)
Head Function

It returns the first n rows. This example shows how to get the head of a series.

Output:

Example:

import pandas as pd
import numpy as np

#create a series with 4 random numbers


s = pd.Series(np.random.randn(4))
print ("The original series is:")
print (s)
print ("The first two rows of the data series:")
print (s.head(2))
Tail Function

It returns the last n rows. This example shows how to get the tail of a series.

Output:

Example:

import pandas as pd
import numpy as np

#create a series with 4 random numbers


s = pd.Series(np.random.randn(4))
print ("The original series is:")
print (s)
print ("The last two rows of the data series:")
print (s.tail(2))
DataFrame Functions in Pandas

1 T (Transposes rows and columns)
2 dtypes (Returns the dtypes of the DataFrame object)
3 empty (True if NDFrame has no content)
4 ndim (Number of array dimensions)
5 shape (Returns a tuple that represents the dimensionality of the DataFrame)
6 size (Number of elements in the NDFrame)
7 values (NumPy representation of the NDFrame)
8 head() and tail() (Return the first and last n rows)
T Function

It returns the DataFrame's transposed value. The rows and columns will switch places.

Example: Output:

import pandas as pd
import numpy as np

# Create a Dictionary of series


d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print ("The transpose of the data series is:")
print (df.T)
dtypes Function

It returns the data type of each column.

Example: Output:

import pandas as pd
import numpy as np

# Create a Dictionary of series


d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print ("The data types of each column are:")
print (df.dtypes)
Empty Function

It returns a Boolean value indicating whether the object is empty or not;


the value True denotes the existence of an empty object.

Example: Output:

import pandas as pd
import numpy as np

# Create a Dictionary of series


d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print ("Is the object empty?")
print (df.empty)
ndim Function

It returns the number of the object's dimensions. DataFrame is a 2D object by definition.

Output:
Example:

import pandas as pd
import numpy as np

# Create a Dictionary of series


d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print ("The dimension of the object is:")
print (df.ndim)
Shape Function

It returns a tuple that represents the DataFrame's dimensionality. The number of rows and
columns is represented by the tuple (a,b).

Output:
Example:

import pandas as pd
import numpy as np

# Create a Dictionary of series


d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print ("The shape of the object is:")
print (df.shape)
Size Function

It returns the number of elements in the DataFrame.

Output:

Example:

import pandas as pd
import numpy as np

# Create a Dictionary of series


d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print ("The total number of elements in our object is:")
print (df.size)
Values Function

It returns a NumPy ndarray containing the actual data from the DataFrame.

Output:

Example:

import pandas as pd
import numpy as np

# Create a Dictionary of series


d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print ("The actual data in our data frame is:")
print (df.values)
Head Function

The head() function is used to access the first n rows of a DataFrame or Series.

Output:

Example:

import pandas as pd
import numpy as np

# Create a Dictionary of series


d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print ("Our data frame is:")
print (df)
print ("The first two rows of the data frame are:")
print (df.head(2))
Tail Function

The tail() function returns the last n rows. This can be seen in the index values
of the example shown below.

Output:

Example:

import pandas as pd
import numpy as np

# Create a Dictionary of series


d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print ("Our data frame is:")
print (df)
print ("The last two rows of the data frame are:")
print (df.tail(2))
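
The attributes and functions covered above can be checked side by side. The following is a minimal sketch that rebuilds the same seven-row data from plain lists (an assumption made for brevity; the slides use a dictionary of Series) and collects the results in a dictionary named summary, which is a hypothetical name used only for this illustration.

```python
import pandas as pd

# Same seven-row, three-column data used in the examples above
df = pd.DataFrame({
    "Name": ["Tom", "James", "Ricky", "Vin", "Steve", "Smith", "Jack"],
    "Age": [25, 26, 25, 23, 30, 29, 23],
    "Rating": [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8],
})

summary = {
    "empty": df.empty,                      # False: the frame holds data
    "ndim": df.ndim,                        # 2: rows and columns
    "shape": df.shape,                      # (7, 3): seven rows, three columns
    "size": df.size,                        # 21: rows multiplied by columns
    "head_index": list(df.head(2).index),   # [0, 1]: first two row labels
    "tail_index": list(df.tail(2).index),   # [5, 6]: last two row labels
}
print(summary)
```

Note that size equals the product of the two values in shape, which is a quick way to sanity-check a loaded dataset.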
datetime Module

The datetime module enables us to create custom date objects and perform various
operations on dates.

1 Date
2 Time
3 Datetime
4 Timedelta
5 Tzinfo
6 Timezone
datetime Module: Example

In the example given below, the datetime module is used to find the current year,
current month, and current day:

Example:

from datetime import date

# Date object of today's date


today = date.today()

print("Current year:", today.year)


print("Current month:", today.month)
print("Current day:", today.day)
datetime Module: Example

In the example given below, the datetime module is used to get the current date:

Example:

from datetime import date

# Call the today() function of the date class
today = date.today()

print("Today's date is", today)
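
The datetime module also supports custom date objects and arithmetic on dates, as mentioned earlier. The sketch below assumes an arbitrary start date chosen only for illustration and uses the timedelta class to shift it.

```python
from datetime import date, datetime, timedelta

# A custom date object (the values are arbitrary illustrations)
start = date(2022, 1, 15)

# Adding a timedelta shifts the date forward
deadline = start + timedelta(days=30)

# Subtracting two dates yields a timedelta
gap = deadline - start

# A datetime object carries both a date and a time
meeting = datetime(2022, 1, 15, 9, 30)

print(deadline)             # 2022-02-14
print(gap.days)             # 30
print(meeting.isoformat())  # 2022-01-15T09:30:00
```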


Pandas Functions: Example 1

The example returns the first five rows of a dataset using the df.head() function.

Output:

Example:

import pandas as pd
import numpy as np
df = pd.read_csv('driver-data.csv')
df.head()
Pandas Functions: Example 2

The example returns the dataset's shape using the df.shape attribute.

Output:
Example:

import pandas as pd
import numpy as np
df = pd.read_csv('driver-data.csv')
df.shape
Pandas Functions: Example 3

The example uses the df.info() function to return information about the dataset.

Output:
Example:

import pandas as pd
import numpy as np
df = pd.read_csv('driver-data.csv')
df.info()
Matplotlib

Python’s matplotlib library is a comprehensive tool for building static, animated, and interactive
visualizations.

Matplotlib is an open-source library and can be used freely.


Installation of Matplotlib

• Install Python and pip
• Install matplotlib using the command: C:\Users\userName>pip install matplotlib
• After installing matplotlib, include the import statement shown below in the code
• Note: The __version__ string of matplotlib uses two underscore characters on each side

Example
import matplotlib

matplotlib.__version__

Output:

'3.5.1'
Matplotlib: Advantages

It is a multi-platform data visualization tool that is fast and efficient.

It works well with many operating systems and graphics backends.

It produces high-quality graphics and plots for printing and viewing a range of graphs.
Matplotlib: Advantages

There are many contexts in which Matplotlib can be used, such as Jupyter Notebooks,
Python scripts, and the Python and iPython shells.

It has a huge community and cross-platform support, as it is an open-source tool.

It gives full control over graph and plot styles.


Matplotlib: Toolkits

There are various toolkits that enhance matplotlib's functionality.

01 Basemap
02 Cartopy
03 Excel tools
04 GTK tools
05 Qt interface
06 Seaborn


Matplotlib: Examples

• Pyplot
• Plotting
• Markers
• Line
• Labels
• Grid
• Subplot
• Scatter
• Bars
• Histograms
• Pie charts
• Seaborn.countplot()
Pyplot

Pyplot is a collection of functions that enable matplotlib to perform tasks like MATLAB.

Example: Draw a pyplot to show the increase in the chocolate rate according to its weight.

Example Output
import matplotlib.pyplot as plt

import numpy as np

xpoints = np.array([100, 250 ])

ypoints = np.array([200, 400])

plt.xlabel("Chocolate rate")

plt.ylabel("Chocolate gram")

plt.plot(xpoints, ypoints)

plt.show()
Plotting

A plot() function is used to draw points in the diagram.

The plot() function draws a line from one point to another by default.

The function accepts parameters for specifying points.

The first parameter is an array of x-axis points.

The second parameter is an array of y-axis points.


Plotting: Example

Plot a graph to know the pay raise of employees over the years from 2010 to 2022.

Example Output
import matplotlib.pyplot as plt

import numpy as np

A1 = np.array([20000, 80000])

A2 = np.array([2010, 2022])

plt.xlabel("Employee salary")

plt.ylabel("Year")

plt.plot(A1, A2)

plt.show()
Marker Plot

Each point can be emphasized with a specific marker by using the keyword argument marker:

Example: Mark each point with a square to show the number of sick leaves applied for by an
employee in the span of five days.

Example Output

import matplotlib.pyplot as plt

import numpy as np

Sick_leave_applied = np.array([4, 12, 2, 25])

plt.xlabel("No of days")

plt.ylabel("Difference of 5 days")

plt.plot(Sick_leave_applied, marker = 's')

plt.show()
Line Plot

To change the style of the plotted line, use the keyword argument linestyle, or the shorter ls.

Example: Draw a line in a diagram to change the style (Use a dotted line).

Output
Example

import matplotlib.pyplot as plt

import numpy as np

Average_marks = np.array([2, 10, 3, 15])

plt.plot(Average_marks, linestyle = 'dotted')

plt.show()
Label Plot

The xlabel() and ylabel() functions in pyplot can be used to label the x- and y-axis, respectively.

Example: Create a diet chart including labels like protein intake and calories burned.

Example Output
import numpy as np

import matplotlib.pyplot as plt

B1 = np.array([80, 85, 90, 95, 100, 105, 110,


115, 120, 125])

B2 = np.array([240, 250, 260, 270, 280, 290,


300, 310, 320, 330])

plt.plot(B1, B2)

plt.title("Diet chart")

plt.xlabel("Proteins intake")

plt.ylabel("Calorie Burnage")

plt.show()
Grid Plot

The grid() function in pyplot can be used to add grid lines to the plot.

Example: Create a graph on fuel rates and add grid lines to it.

Example Output
import numpy as np
import matplotlib.pyplot as plt

Y1 = np.array([80, 85, 90, 95, 100, 105, 110,


115, 120, 125])

Y2 = np.array([240, 250, 260, 270, 280, 290,


300, 310, 320, 330])

plt.title("Fuel rate")

plt.xlabel("Litre")

plt.ylabel("Price")

plt.plot(Y1, Y2)

plt.grid()

plt.show()
Subplot

With the subplot() function, multiple plots can be drawn in a single diagram.

Example: Create two subplots in a single diagram.

Example Output
import matplotlib.pyplot as plt
import numpy as np

x1 = np.array([2000, 2010, 2020, 2030])

y1 = np.array([6, 3, 12, 10])

plt.subplot(1, 2, 1)

plt.plot(x1,y1)

x2 = np.array([2000, 2010, 2020, 2030])

y2 = np.array([10, 20, 30, 40])

plt.subplot(1, 2, 2)

plt.plot(x2,y2)

plt.show()
Scatter Plot

For each observation, the scatter() function plots a single dot. It requires two identical-length
arrays, one for the values on the x-axis and the other for the values on the y-axis.

Example: Create a simple graph to show a scatter plot.

Output
Example
import matplotlib.pyplot as plt

import numpy as np

A = np.array([2,3,4,11,12,17,22,39,14,21,23,9,6])

B = np.array([59,26,67,78,121,23,20,69,93,45,24,15,66])

plt.scatter(A, B)

plt.show()
Bar Plot

The bar() function in pyplot can be used to create bar graphs.

Example: Create a bar graph using the bar() function in pyplot.

Example Output

import matplotlib.pyplot as plt

import numpy as np

x = np.array(["S", "I", "M", "L"])

y = np.array([4, 8, 12, 16])

plt.bar(x,y)

plt.show()
Histogram Plot

A graph displaying frequency distributions is called a histogram. It is a graph that displays how
many observations were made during each interval.

Example: Create a histogram chart in pyplot to observe the height of 450 people.

Example Output

import matplotlib.pyplot as plt

import numpy as np

A = np.random.normal(134, 20, 450)

plt.hist(A)

plt.show()
Pie Plot

The pie() function in pyplot can be used to create pie charts.

Example: Create a simple pie chart in pyplot using the pie() function.

Example Output

import matplotlib.pyplot as plt

import numpy as np

plt.title("Population rate in 2010")

y = np.array([45, 25, 35, 15])

plt.pie(y)

plt.show()
Count Plot

The seaborn.countplot() method uses bars to display the count of observations in each
categorical bin.

Example: For a single categorical variable, display value counts.

Example Output

import seaborn as sns

import matplotlib.pyplot as plt

# load the tips dataset from the seaborn library

df = sns.load_dataset('tips')

# count plot on single categorical variable

sns.countplot(x ='time', data = df)

# Show the plot

plt.show()
SciPy

SciPy is a free and open-source Python library used for scientific and technical computing.

It provides functions for optimization, statistics, and signal processing.


SciPy

SciPy has built-in packages that help in handling the scientific domains.

• Mathematics integration
• Statistics (normal distribution)
• Linear algebra
• Multidimensional image processing
• Mathematics constants
• Language integration
SciPy and Its Characteristics

1 Built-in mathematical libraries and functions
2 High-level commands for data manipulation and visualization
3 Efficient and fast data processing
4 Integrates well with multiple systems and environments
5 Large collection of sub-packages for different scientific domains
6 Simplifies scientific application development
SciPy Sub-Package

SciPy has multiple sub-packages which handle different scientific domains.

cluster: Clustering algorithms
constants: Physical and mathematical constants
fftpack: Fast Fourier Transform routines
integrate: Integration and ordinary differential equation solvers
interpolate: Interpolation and smoothing splines
io: Input and output
linalg: Linear algebra
ndimage: N-dimensional image processing
odr: Orthogonal distance regression
optimize: Optimization and root-finding routines
signal: Signal processing
sparse: Sparse matrices and associated routines
spatial: Spatial data structures and algorithms
special: Special functions
stats: Statistical distributions and functions
weave: C/C++ integration
SciPy Packages

Some widely used packages are:

• IO
• Optimize
• Integration
• Linear algebra
• Weave packages
• Statistics
SciPy Packages: Example 1

Let's look at SciPy with scipy.linalg as an example.

Output:

Example:

from scipy import linalg


import numpy as np

two_d_array = np.array([ [4,5], [3,2] ])

linalg.det( two_d_array )

The example above calculates the determinant of a two-dimensional matrix.
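
For a 2×2 matrix, the determinant can be checked by hand with the formula ad − bc. The sketch below compares that hand calculation with linalg.det() and, as an additional illustration not taken from the slides, verifies the inverse returned by linalg.inv().

```python
import numpy as np
from scipy import linalg

a = np.array([[4, 5], [3, 2]])

# Hand calculation for a 2x2 matrix: a*d - b*c
det_manual = 4 * 2 - 5 * 3

# SciPy returns the same value as a float
det_scipy = linalg.det(a)

# A non-zero determinant means the matrix has an inverse
inv = linalg.inv(a)
identity = a @ inv

print(det_manual, det_scipy)
print(np.round(identity, 10))
```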


SciPy Packages: Example 2

Let's look at SciPy with scipy.integrate as an example.

Output:

Example:

from scipy import integrate


f = lambda x : x**2
integration = integrate.quad(f, 0 , 1)
print(integration)

In this example, the function returns two values: the first is the value of the integral, and the
second is an estimate of the absolute error in the integral.
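
The two returned values can be unpacked and compared with the exact answer, since the integral of x² from 0 to 1 is 1/3. This is a small sketch; the variable names value, abs_error, and exact are chosen only for illustration.

```python
from scipy import integrate

# quad returns a tuple: (integral value, estimated absolute error)
value, abs_error = integrate.quad(lambda x: x**2, 0, 1)

# Exact result from calculus: [x**3 / 3] evaluated from 0 to 1
exact = 1 / 3

print("Numerical value:", value)
print("Estimated error:", abs_error)
print("Difference from exact:", abs(value - exact))
```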
Scikit-Learn

Scikit-learn is a powerful and modern machine learning Python library. It is used for fully- and
semi-automated data analysis and information extraction.

• Offers many tools to identify, organize, and solve real-life problems
• Provides a collection of free downloadable datasets
• Consists of many libraries to learn and predict
• Provides model support for every problem type
• Maintains model persistence
• Provides open-source community and vendor support
Scikit-Learn

• It is also known as sklearn.


• It is used to build machine learning models for tasks such as classification,
regression, and clustering.
• It includes algorithms such as k-means, k-nearest neighbors, support vector machine
(SVM), and decision tree.
Scikit-Learn: Problem-Solution Approach

Scikit-learn helps data scientists and machine learning engineers to solve problems
using the problem-solution approach.

Model selection → Estimator object → Model training → Predictions → Model tuning → Accuracy
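
The stages of this approach can be sketched in a few lines. The example below is one possible illustration, not a prescribed workflow: it assumes the bundled breast cancer dataset, a decision tree as the selected model, and a 75/25 train-test split, all of which are choices made here for demonstration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load features (X) and responses (y) as NumPy arrays
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Model selection and estimator object (max_depth is an illustrative setting)
model = DecisionTreeClassifier(max_depth=3, random_state=0)

# Model training
model.fit(X_train, y_train)

# Predictions on unseen data
y_pred = model.predict(X_test)

# Accuracy: fraction of test samples predicted correctly
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```

Model tuning would then repeat the training step with different settings, such as other values of max_depth, and keep the setting with the best accuracy.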
Scikit-Learn: Problem-Solution Considerations

Points to be considered while working with a scikit-learn dataset or loading the data to
scikit-learn:

Create separate objects for features and responses

Ensure features and responses only have numeric values

Verify that the features and responses are in the form of a NumPy ndarray

Check that the features and responses have matching shapes, with one response value for each row of features

Ensure features are always mapped as x, and responses as y
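
These checks can be written as a short script. The sketch below assumes the bundled iris dataset as the input and stores the results in a dictionary named checks, a hypothetical name used only for this illustration.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Separate objects for features and responses
X = iris.data      # features: one row per sample, one column per feature
y = iris.target    # responses: one label per sample

checks = {
    "X_is_ndarray": isinstance(X, np.ndarray),
    "y_is_ndarray": isinstance(y, np.ndarray),
    "X_is_numeric": np.issubdtype(X.dtype, np.number),
    "y_is_numeric": np.issubdtype(y.dtype, np.number),
    "same_number_of_samples": X.shape[0] == y.shape[0],
}
print(X.shape, y.shape)
print(checks)
```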


Scikit-Learn: Prerequisite for Installation

The libraries that must be installed before installing Scikit-learn are:

• NumPy
• SciPy
• Pandas
• Matplotlib
Scikit-Learn: Installation

To install scikit-learn in Jupyter notebook via pip, enter the code:


!pip install scikit-learn

To install scikit-learn via command prompt, enter the code:


conda install scikit-learn
Scikit-Learn: Models

Some popular groups of models provided by scikit-learn are:

1 Clustering
5 Feature selection

2 Cross-validation 6 Parameter tuning

3 Ensemble methods 7 Supervised learning algorithms

4 Feature extraction 8 Unsupervised learning algorithms


Scikit-Learn: Models

Some popular groups of models provided by scikit-learn are:

Clustering: It is used for grouping unlabeled data.

Cross-validation: It is a technique to check the accuracy of supervised models on unseen data.

Ensemble methods: Scikit-learn uses ensemble methods to combine the outcomes of various
supervised models for better predictions.

Feature extraction: It defines the attributes in image and text data by extracting features
from the data.
Scikit-Learn: Models

Some popular groups of models provided by scikit-learn are:

Feature selection: It identifies useful attributes to create supervised models.

Parameter tuning: It refers to the process of finding hyper-parameters that produce the
best outcome.

Supervised learning algorithms: It includes multiple supervised learning techniques, including
linear regression, support vector machine, decision tree, and others.

Unsupervised learning algorithms: It includes the main unsupervised learning algorithms, such
as clustering, factor analysis, PCA, and unsupervised neural networks.
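
Two of these groups, cross-validation and parameter tuning, can be illustrated together. The sketch below assumes the iris dataset and a k-nearest neighbors classifier; the candidate values for n_neighbors are arbitrary choices made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Cross-validation: accuracy on five different held-out folds
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())

# Parameter tuning: try each candidate value with cross-validation
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7]},
    cv=5,
)
search.fit(X, y)
print("Best hyper-parameters:", search.best_params_)
```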
Scikit-Learn: Datasets

Scikit-learn provides toy datasets that can be used for clustering, regression, and classification
problems. These datasets are quite helpful while learning new libraries.
1 Boston house prices
2 Iris plants
3 Wine recognition
4 Breast cancer
5 Digits
6 Diabetes

The datasets can be found in sklearn.datasets package.


Import Datasets Using Scikit-Learn

To import the toy dataset, it is required to use the sklearn library with the import
keyword as shown below:

from sklearn import datasets

A load function is used to load each dataset, and its syntax is shown below:

load_<dataset>()
Here, <dataset> refers to the name of the dataset, for example, load_iris() or load_wine().
Import Datasets Using Scikit-Learn: Example

The below example illustrates how to load the breast cancer dataset from the sklearn library
and store it into a variable called data.

data = datasets.load_breast_cancer()

Here, the load function will not return data in the tabular format. It will return a
dictionary-like object made of key-value pairs.
Import Datasets Using Scikit-Learn: Example

The below example shows that the dataset is present in a key-value pair.

Example:

import pandas as pd
import numpy as np
from sklearn import datasets
data = datasets.load_breast_cancer()
data
Import Datasets Using Scikit-Learn: Example

The keys of a dataset can be printed as shown below:

Example:

print(data.keys())

Here, the data key denotes all the feature data in a NumPy array.


Import Datasets Using Scikit-Learn: Example

Suppose a user needs to know the dataset column names or features present in the
dataset. Then the below syntax can be used:

Example:

print(data.feature_names)

Here, feature_names denotes the names of the feature variables, in other words, the names
of the columns in the dataset.
Import Datasets Using Scikit-Learn: Example

The target_names key holds the names of the target classes, in other words, the
labels that can appear in the target column.

Example:

print(data.target_names)

Here, malignant and benign denote the values present in the target column.
Import Datasets Using Scikit-Learn: Example

The target indicates the actual labels in a NumPy array. Here, the target data is one column
that classifies the tumor as either 0, indicating malignant, or 1, indicating benign.

Example:

data.target
Import Datasets Using Scikit-Learn: Example

DESCR represents the description of the dataset, and the filename is the path to the actual
file of the data in CSV format.

Example:

print(data.DESCR)
print(data.filename)
Working with the Dataset

To work with a Scikit-learn dataset as a table, it can be read into a DataFrame. It is required
to import the Pandas library as shown below:

Example:
# Import pandas
import pandas as pd
# Read the DataFrame, first using the feature data
df = pd.DataFrame(data.data,
columns=data.feature_names)
# Add a target column, and fill it with the target data
df['target'] = data.target
# Show the first five rows
df.head()

Note: The dataset has been loaded into the Pandas DataFrame.
Preprocessing Data in Scikit-Learn

The sklearn.preprocessing package provides a series of common utility functions and
transformer classes to transform raw feature vectors into a representation that is best suited
for the downstream estimators. These are:

• Standardization, or mean removal and variance scaling
• Normalization
• Encoding categorical features
• Imputation of missing values
Standardization

It is a scaling technique that centers the data values on the mean and scales them to unit
variance. As a result, standardization makes the dataset's mean equal to 0 and its standard
deviation equal to 1.

[Figure: histograms of counts (cnt) before and after preprocessing with standardization. Before: m = 10.0, S = 30.0. After: m = 0.0, S = 1.0.]


Standardization

The preprocessing module provides the StandardScaler utility class to perform standardization
on the dataset.

In the example, a random function generates normally distributed data in three columns: x, y, and z.

Example:
import numpy as np
import pandas as pd    # Import libraries
# Generating normally distributed data
# np.random.normal(mean, standard deviation, number of samples)
# df is a DataFrame
df = pd.DataFrame({
    'x': np.random.normal(0,3,10000),
    'y': np.random.normal(6,4,10000),
    'z': np.random.normal(-6,6,10000)
})
Standardization

Next, it is required to see the plot to know whether the data is on a different or
the same scale.

Example:

# Plotting data

df.plot.kde()

Here, x,y, and z are on different scales. So, it is required to keep all data on
the same scale to improve any algorithm's performance.
Standardization

Next, to scale the values of x,y, and z to the same scale, a standard scaler is used. The x, y, and
z values are displayed on the same scale in the graph below:

Example:

from sklearn.preprocessing import StandardScaler


standardscaler = StandardScaler()
data_tf = standardscaler.fit_transform(df)
df = pd.DataFrame(data_tf,columns=['x','y','z'])
df.plot.kde()
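
The effect of the scaler can also be confirmed numerically. Standardization computes z = (x − mean) / standard deviation for each column. The sketch below uses a single hypothetical column of random data and compares the StandardScaler output with that formula, assuming the population standard deviation (NumPy's default).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# One column of normally distributed data (mean 6, standard deviation 4)
rng = np.random.default_rng(0)
data = rng.normal(loc=6, scale=4, size=(1000, 1))

# Scaling with scikit-learn
scaled = StandardScaler().fit_transform(data)

# The same calculation by hand: subtract the mean, divide by the std
manual = (data - data.mean(axis=0)) / data.std(axis=0)

print("Mean after scaling:", scaled.mean())
print("Std after scaling:", scaled.std())
print("Matches manual formula:", np.allclose(scaled, manual))
```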
Normalization

Normalization is a technique in Scikit-learn that involves rescaling each observation to
a length of 1, which is called a unit norm in linear algebra. The Normalizer class can be
used for normalizing data in Python.
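
A minimal sketch of the Normalizer class is shown below, using three hypothetical observations. Each row is divided by its own Euclidean length, so the row [3, 4] with length 5 becomes [0.6, 0.8].

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Each row is one observation
X = np.array([[3.0, 4.0], [1.0, 0.0], [5.0, 12.0]])

# Rescale every row to unit L2 norm
X_unit = Normalizer(norm="l2").fit_transform(X)

# The length of every row is now 1
row_lengths = np.linalg.norm(X_unit, axis=1)

print(X_unit)
print(row_lengths)
```

Unlike standardization, which works column by column, this rescaling works row by row.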
Normalization

To implement normalization, the following methods are used:

fit(data): It computes the statistics needed for scaling, such as the mean and standard
deviation of each feature.

transform(data): It generates a transformed dataset using the statistics calculated by
the .fit() method.

fit_transform(data): It is a combination of the fit and transform methods, applied in a
single and often more efficient call.
Normalization Using MinMaxScaler

MinMaxScaler transforms each feature to a given range using scaling. This estimator scales
and translates each feature individually such that it is in the given range on the training set,
for example, between zero and one.

Note: This technique is sensitive to outliers.


MinMaxScaler: Example

The preprocessing module provides the MinMaxScaler utility class to perform this scaling
on the dataset.

In the example, random functions generate the data in three columns: x, y, and z.

Example:

df = pd.DataFrame({
# positive skew
'x': np.random.chisquare(8,1000),
# negative skew
'y': np.random.beta(8,2,1000) * 40,
# no skew
'z': np.random.normal(50,3,1000)
})
MinMaxScaler: Example

Next, it is required to see the plot to know whether the data is normalized.

Example:

df.plot.kde()
MinMaxScaler: Example

Next, the MinMaxScaler function normalizes the values of x,y, and z.

Example:

from sklearn.preprocessing import MinMaxScaler


minmax = MinMaxScaler()
data_tf = minmax.fit_transform(df)
df= pd.DataFrame(data_tf,columns = ['x1','x2','x3'])
df.plot.kde()
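
With the default range of zero to one, MinMaxScaler computes (x − min) / (max − min) for each column. The sketch below checks this on a small hypothetical column where the result is easy to verify by hand.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# One feature with minimum 1 and maximum 9
X = np.array([[1.0], [5.0], [9.0]])

scaled = MinMaxScaler().fit_transform(X)

# The same calculation by hand
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print(scaled.ravel())   # [0.  0.5 1. ]
```

Because the minimum and maximum define the range, a single extreme value compresses all other values, which is why the technique is sensitive to outliers.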
Imputation of Missing Values

Many algorithms cannot process missing values. Imputers infer the value of missing data
from existing data.

Example (the SimpleImputer class is imported from scikit-learn):

import numpy as np
from sklearn.impute import SimpleImputer
imp_values = SimpleImputer(missing_values=np.nan, strategy='mean')
imp_values.fit([[3,5],[np.nan,7],[1,3]])
X = [[np.nan, 2],[6, np.nan],[7,6]]
print(imp_values.transform(X))

The SimpleImputer class replaces the NaN values with the column means learned during fit.
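
The numbers in the example above can be traced by hand. The fitted data has column means of (3 + 1) / 2 = 2 and (5 + 7 + 3) / 3 = 5, ignoring the missing value. The sketch below reads those means from the statistics_ attribute and shows where they are filled in.

```python
import numpy as np
from sklearn.impute import SimpleImputer

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# fit() learns one mean per column, skipping NaN entries
imputer.fit([[3, 5], [np.nan, 7], [1, 3]])
learned_means = imputer.statistics_

# transform() replaces each NaN with the mean of its column
filled = imputer.transform([[np.nan, 2], [6, np.nan], [7, 6]])

print(learned_means)   # [2. 5.]
print(filled)          # rows: [2, 2], [6, 5], [7, 6]
```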
Categorical Variables

A categorical variable is a variable that can take a limited and fixed number of possible
values, assigning each individual or other unit of observation to a particular group on the
basis of some qualitative property.

Examples:

• Roll of a six-sided die: possible outcomes are 1, 2, 3, 4, 5, or 6
• Demographic information of a population: gender, disease status
Encoding Categorical Variables

To deal with categorical variables encoding schemes are used, such as:

Ordinal encoding One-hot encoding


Ordinal Encoding

It assigns each unique value to a different integer.

Example:

data = pd.DataFrame({
'Age':[12,34,56,22,24,35],
'Income':['Low','Low','High','Medium','Medium','High']
})
data

data.Income.map({'Low':1,'Medium':2,'High':3})

This strategy assumes that the categories are ordered: "Low" (1) < "Medium" (2) < "High" (3)
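
Scikit-learn offers the OrdinalEncoder class as an alternative to a manual mapping. The sketch below is an illustration using the same Income values; it passes the category order explicitly, and note that this encoder numbers the categories from 0 rather than from 1.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

data = pd.DataFrame({
    "Income": ["Low", "Low", "High", "Medium", "Medium", "High"]
})

# State the order explicitly; otherwise categories are sorted alphabetically
encoder = OrdinalEncoder(categories=[["Low", "Medium", "High"]])

# The encoder expects a 2D input, hence the double brackets
codes = encoder.fit_transform(data[["Income"]]).ravel()

print(codes)   # [0. 0. 2. 1. 1. 2.]
```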
One-Hot Encoding

It adds extra columns to the original data that indicate whether each possible value is
present or not.

Color    Red  Yellow  Green

Red      1    0       0
Red      1    0       0
Yellow   0    1       0
Green    0    0       1
Yellow   0    1       0
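
The same table can be produced with the get_dummies() function in Pandas. This is a minimal sketch using the five color values from the table; the columns come out in alphabetical order, and each row contains exactly one 1.

```python
import pandas as pd

colors = pd.DataFrame({
    "Color": ["Red", "Red", "Yellow", "Green", "Yellow"]
})

# One new column per unique value; astype(int) shows 1/0 instead of True/False
one_hot = pd.get_dummies(colors["Color"]).astype(int)

print(one_hot)
```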
One-Hot Encoding: Example

The following example explains the concept of one-hot encoding:

Example:

from sklearn import datasets


from sklearn.preprocessing import OneHotEncoder
from seaborn import load_dataset
# Dataset loaded into a Pandas DataFrame data
data = load_dataset('penguins')
# Instantiated a OneHotEncoder object and assigned it to ohe
ohe = OneHotEncoder()
#Fitting and transform data using the fit_transform() method
transform = ohe.fit_transform(data[['island']])
# It will return the array version of the transform data using the
# .toarray() method
print(transform.toarray())
# Three columns are present in the array in binary form because
# there are three unique values in the island column
One-Hot Encoding: Example

The following example explains the concept of one-hot encoding:

Example:
# Print one hot encoded categories to know the
# column labels using the .categories_ attribute of
# the encoder

print(ohe.categories_)

# Add these columns as separate columns in the DataFrame

data[ohe.categories_[0]] = transform.toarray()
data
Key Takeaways

SciPy is a free and open-source Python library used for scientific and
technical computing.

NumPy is a library that consists of multidimensional array objects
and a collection of functions for manipulating them.

Matplotlib is a visualization tool that uses a low-level graph plotting
library written in Python.

Scikit-learn is a powerful and modern machine learning Python library.
It is used for fully- and semi-automated data analysis and
information extraction.
Knowledge Check
Knowledge
Check

1 Which of the following SciPy sub-packages is incorrect?

A. scipy.cluster

B. scipy.source

C. scipy.interpolate

D. scipy.signal

The correct answer is B


scipy.source is not a sub-package of SciPy.
Knowledge
Check

2 __________ is an important library used for analyzing data.

A. Math

B. Random

C. Pandas

D. None of the above

The correct answer is C


Pandas is an important library used for analyzing data.
Knowledge
Check

3 Matplotlib is a ___________plotting library.

A. 1D

B. 2D

C. 3D

D. All of the above

The correct answer is B

Matplotlib is a 2D plotting library.
