
LT2 - 07 - Numpy Matplotlib Pandas

The document discusses NumPy, Matplotlib, and Pandas libraries. It provides an overview of NumPy including its uses, how to install and import it, how to create NumPy arrays, and common array operations like indexing, joining, searching, sorting, and mathematical functions. NumPy arrays can be manipulated much faster than regular Python lists and are commonly used for data science tasks.

Uploaded by

Le Thi Minh Thi

NUMPY, MATPLOTLIB, PANDAS

Lê Ngọc Hiếu
hieu.ln@ou.edu.vn
Objectives
• Understand the concepts and operations of the NUMPY, MATPLOTLIB, and PANDAS libraries.
• Master the operations of the NUMPY, MATPLOTLIB, and PANDAS libraries.
• Apply the NUMPY, MATPLOTLIB, and PANDAS libraries to data processing and data analysis problems.

Contents
1. Numpy Package
2. Matplotlib Package
3. Pandas Package
4. Exercises

Numpy Package

What is Numpy?
• NumPy is a Python library used for working with arrays.
• It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.
• NumPy was created in 2005 by Travis Oliphant. It is an open source
project and you can use it freely.
• NumPy stands for Numerical Python.
Why use Numpy?
• In Python we have lists that serve the purpose of arrays, but they are
slow to process.
• NumPy aims to provide an array object that is up to 50x faster than
traditional Python lists.
• The array object in NumPy is called ndarray; it comes with many supporting functions that make working with it easy.
• Arrays are very frequently used in data science, where speed and
resources are very important.

Install and Start to Use Numpy
• Already included in Anaconda.
• If you wish to install Numpy, open Command Prompt Window (CMD)
and type: pip install numpy
• To use numpy, import numpy package before using its functions.

Numpy Array
• A numpy array is a grid of values, all of the same type, and is indexed by
a tuple of nonnegative integers.
• The number of dimensions is the rank of the array.
• Syntax to get the rank of a numpy array: <array_name>.ndim
• The shape of an array is a tuple of integers giving the size of the array
along each dimension.
• Syntax to get the shape of a numpy array: <array_name>.shape
• We can initialize numpy arrays from nested Python lists.
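The properties above can be tried out with a minimal sketch like the following. The array values are made up for illustration, and the np alias is a common convention rather than something the slides prescribe.

```python
import numpy as np

# A rank-1 array built from a flat Python list
a = np.array([1, 2, 3])
# A rank-2 array built from a nested Python list (2 rows, 3 columns)
b = np.array([[1, 2, 3], [4, 5, 6]])

print(a.ndim, a.shape)  # 1 (3,)
print(b.ndim, b.shape)  # 2 (2, 3)
print(b[0, 2])          # element at row 0, column 2 -> 3
```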

Create Numpy Array
• There are several ways to create Numpy arrays.
• Consider three ways:
• Convert from a Python list or tuple using the array function.
• Create an array with initial values using the ones or zeros function.
• Create a sequence of numbers using the arange or linspace function.

Create Numpy Array using array function
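A short sketch of converting Python sequences with np.array. The variable names and values are hypothetical.

```python
import numpy as np

from_list = np.array([1, 2, 3, 4])          # from a Python list
from_tuple = np.array((1.5, 2.5, 3.5))      # from a Python tuple
from_nested = np.array([[1, 2], [3, 4]])    # nested lists give a 2-D array

print(type(from_list))     # <class 'numpy.ndarray'>
print(from_tuple.dtype)    # float64
print(from_nested.shape)   # (2, 2)
```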

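Create Numpy Array using ones, or zeros function
Syntax:
• <var_name> = np.zeros(shape)
• <var_name> = np.ones(shape)
• shape is an integer or a tuple giving the size along each dimension, e.g. (nrows, ncolumns) for a 2-D array or (nblocks, nrows, ncolumns) for a 3-D array.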
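As a rough illustration of how the shape argument works, the sketch below builds arrays of zeros and ones. The sizes are arbitrary.

```python
import numpy as np

z = np.zeros((2, 3))       # 2 rows, 3 columns, filled with 0.0
o = np.ones((2, 3, 4))     # 3-D array: 2 blocks of 3 rows and 4 columns
v = np.zeros(5)            # a single integer gives a 1-D array

print(z.shape, o.shape, v.shape)  # (2, 3) (2, 3, 4) (5,)
print(z.dtype)                    # float64 (the default)
```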

Create Numpy array using arange or linspace function
Syntax:
• <var_name> = np.arange(start, stop, step)
• The stop value is not included in the result.
• <var_name> = np.linspace(start, stop, number_of_elements)
• The stop value is included by default.
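A small sketch contrasting the two sequence functions, with arbitrary start and stop values: np.arange takes a step size and excludes the stop value, while np.linspace takes a number of elements and includes it by default.

```python
import numpy as np

r = np.arange(0, 10, 2)     # start=0, stop=10 (excluded), step=2
l = np.linspace(0, 1, 5)    # 5 evenly spaced values from 0 to 1 (both included)

print(r)  # [0 2 4 6 8]
print(l)  # [0.   0.25 0.5  0.75 1.  ]
```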

Array Indexing - Slicing
• Similar to Python lists, numpy arrays can be sliced.
• Since arrays may be multidimensional, you must specify a slice for each dimension of the array.
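Slicing a two-dimensional array can be sketched as follows. The array contents are made up, and the comments note how mixing an integer index with a slice lowers the rank.

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

sub = a[:2, 1:3]    # rows 0-1, columns 1-2
print(sub)          # [[2 3]
                    #  [6 7]]

row = a[1, :]       # integer index + slice -> rank-1 array
print(row.shape)    # (4,)

col = a[:, 1:2]     # slices only -> rank stays 2
print(col.shape)    # (3, 1)
```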

Array Indexing - Integer array indexing
• Integer array indexing allows you to construct arbitrary arrays using the data from another array.
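A minimal sketch of integer array indexing, using a hypothetical 3x2 array: the index lists pick out individual elements or whole rows in any order.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Pick elements (0, 0), (1, 1) and (2, 0) in one step
picked = a[[0, 1, 2], [0, 1, 0]]
print(picked)   # [1 4 5]

# Pick whole rows, here rows 2 and 0 in that order
rows = a[[2, 0]]
print(rows)     # [[5 6]
                #  [1 2]]
```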

NumPy Data Types
• Basic Data Types in Python:
• strings - used to represent text data; the text is written inside quotation marks, e.g. "ABCD"
• integer - used to represent integer numbers. e.g. -1, -2, -3
• float - used to represent real numbers. e.g. 1.2, 42.42
• boolean - used to represent True or False.
• complex - used to represent complex numbers. e.g. 1.0 + 2.0j, 1.5 + 2.5j

NumPy Data Types
• NumPy has some extra data types and refers to each data type with a one-character code:
• i - integer
• b - boolean
• u - unsigned integer
• f - float
• c - complex float
• m - timedelta
• M - datetime
• O - object
• S - string
• U - unicode string
NumPy Data Types
• Checking the Data Type of an Array:
• The NumPy array object has a property called dtype that returns the data type of the array.
• Creating Arrays With a Defined Data Type:
• The array() function can take an optional argument dtype that lets us define the expected data type of the array elements.
• Converting Data Type on Existing Arrays:
• The astype() method creates a copy of the array and lets you specify the data type as a parameter.
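The three operations can be sketched as below, with made-up values. The exact integer dtype printed for the first array depends on the platform, so no output is shown for it.

```python
import numpy as np

arr = np.array([1, 2, 3])
print(arr.dtype)                      # a platform-dependent integer type

f = np.array([1, 2, 3], dtype='f')    # 'f' requests a float type
print(f.dtype)                        # float32

floats = np.array([1.9, 2.1, 3.7])
ints = floats.astype('i')             # astype returns a converted copy
print(ints)                           # [1 2 3] (fractional parts are dropped)
print(floats)                         # the original array is unchanged
```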
NumPy Array Copy vs View
• The main difference between a copy and a view of an array is that the
copy is a new array, and the view is just a view of the original array.

• The copy owns the data and any changes made to the copy will not
affect original array, and any changes made to the original array will not
affect the copy.

• The view does not own the data and any changes made to the view will
affect the original array, and any changes made to the original array will
affect the view.
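The difference can be observed with a short sketch using hypothetical values. The base attribute is None for an array that owns its data and refers to the original array for a view.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
c = arr.copy()   # independent copy
v = arr.view()   # view sharing the same data

arr[0] = 42
print(c[0])      # 1  (the copy is not affected)
print(v[0])      # 42 (the view sees the change)

v[1] = 7
print(arr[1])    # 7  (changing the view changes the original)

print(c.base is None)  # True: the copy owns its data
print(v.base is arr)   # True: the view refers to the original
```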
NumPy Array Reshaping
• Reshaping means changing the shape of an array.

• The shape of an array is the number of elements in each dimension.

• By reshaping we can add or remove dimensions or change the number of elements in each dimension.
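A minimal reshaping sketch, assuming a 12-element array chosen for illustration. The new shape must contain the same total number of elements as the original.

```python
import numpy as np

arr = np.arange(1, 13)        # 12 elements: 1..12

m = arr.reshape(4, 3)         # 1-D -> 2-D (4 rows, 3 columns)
t = arr.reshape(2, 3, 2)      # 1-D -> 3-D
flat = t.reshape(-1)          # -1 lets NumPy infer the size: back to 1-D

print(m.shape, t.shape, flat.shape)  # (4, 3) (2, 3, 2) (12,)
```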

NumPy Array Iterating – Use for loop
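A sketch of iterating with for loops over a hypothetical 2-D array: looping over the array yields its rows, and a nested loop reaches the individual elements. np.nditer is shown as one alternative that visits every element directly.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Iterating over a 2-D array yields one row at a time
for row in arr:
    print(row)

# A nested loop reaches each scalar element
values = [int(value) for row in arr for value in row]
print(values)        # [1, 2, 3, 4, 5, 6]

# np.nditer visits every element regardless of the number of dimensions
flat_values = [int(value) for value in np.nditer(arr)]
print(flat_values)   # [1, 2, 3, 4, 5, 6]
```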

NumPy Joining Array - concatenate
• Concatenation refers to joining. This function joins two or more arrays along an existing axis; the arrays must have the same shape except in the dimension of that axis.
• Syntax: numpy.concatenate((array1, array2, ...), axis)
• If axis is not explicitly passed, it is taken as 0.
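The effect of the axis argument can be sketched with two small, made-up arrays.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # shape (2, 2)
b = np.array([[5, 6]])           # shape (1, 2)

rows = np.concatenate((a, b))             # axis=0 by default: appends a row
print(rows.shape)                         # (3, 2)

cols = np.concatenate((a, b.T), axis=1)   # b.T has shape (2, 1): appends a column
print(cols)                               # [[1 2 5]
                                          #  [3 4 6]]
```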

NumPy Joining Array - stack
• This function joins the sequence of arrays along a new axis.
• Syntax: numpy.stack(arrays, axis)
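A minimal sketch of numpy.stack with two hypothetical 1-D arrays. Unlike concatenation, the result has one more dimension than the inputs.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

s0 = np.stack((a, b), axis=0)   # new first axis: shape (2, 3)
s1 = np.stack((a, b), axis=1)   # new second axis: shape (3, 2)

print(s0)   # [[1 2 3]
            #  [4 5 6]]
print(s1)   # [[1 4]
            #  [2 5]
            #  [3 6]]
```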

NumPy Joining Array - hstack
• A stacking function related to numpy.stack that joins arrays horizontally (column-wise) into a single array.
• Syntax: numpy.hstack((array1, array2, …, arrayn))

NumPy Joining Array - vstack
• A stacking function related to numpy.stack that joins arrays vertically (row-wise) into a single array.
• Syntax: numpy.vstack((array1, array2, …, arrayn))
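The horizontal and vertical variants can be compared with a short sketch. The input values are arbitrary, and both functions take the arrays as a single sequence argument.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

h = np.hstack((a, b))   # side by side
v = np.vstack((a, b))   # one on top of the other

print(h)   # [1 2 3 4 5 6]
print(v)   # [[1 2 3]
           #  [4 5 6]]
```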

NumPy Splitting Array - numpy.split
• Syntax: numpy.split(array, indices_or_sections, axis)
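A sketch of the two forms of indices_or_sections, using a hypothetical 9-element array: an integer asks for that many equal parts, while a list gives the positions at which to cut.

```python
import numpy as np

arr = np.arange(9)               # [0 1 2 3 4 5 6 7 8]

parts = np.split(arr, 3)         # three equal sections
print([p.tolist() for p in parts])   # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

cut = np.split(arr, [2, 5])      # arr[:2], arr[2:5], arr[5:]
print([c.tolist() for c in cut])     # [[0, 1], [2, 3, 4], [5, 6, 7, 8]]
```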

NumPy Searching Arrays - where() method
• The where() method searches an array for a certain value and returns the indexes that match.
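A minimal sketch of searching with np.where on made-up data. Called with only a condition, it returns a tuple of index arrays, one per dimension.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 4, 4])

idx = np.where(arr == 4)           # tuple holding one index array
print(idx[0])                      # [3 5 6]

even = np.where(arr % 2 == 0)[0]   # indexes of the even values
print(even)                        # [1 3 5 6]
```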

NumPy Searching Arrays - searchsorted() method
• The searchsorted() method performs a binary search in a sorted array and returns the index where the specified value would be inserted to maintain the sort order.
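The behaviour can be sketched on a small sorted array with hypothetical values. The side parameter selects whether the leftmost or rightmost suitable index is returned.

```python
import numpy as np

arr = np.array([6, 7, 8, 9])   # must already be sorted

print(np.searchsorted(arr, 7))                 # 1 (leftmost position)
print(np.searchsorted(arr, 7, side='right'))   # 2 (rightmost position)
print(np.searchsorted(arr, [2, 8.5, 10]))      # [0 3 4]
```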

NumPy Sorting Arrays
• Syntax: numpy.sort(a, axis=-1, kind=None, order=None)
• a: Array to be sorted.
• axis: int or None, optional. Axis along which to sort. If None, the array is flattened before sorting. The default is -1, which sorts along the last axis.
• kind: {'quicksort', 'mergesort', 'heapsort', 'stable'}, optional. Sorting algorithm.
• order: str or list of str, optional. When a is an array with fields defined, this argument specifies which fields to compare first, second, etc.
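A short sketch of the axis parameter on a made-up 2-D array. numpy.sort returns a sorted copy and leaves the input unchanged.

```python
import numpy as np

arr = np.array([[3, 8, 2],
                [1, 7, 9]])

by_row = np.sort(arr)              # default axis=-1: sort within each row
by_col = np.sort(arr, axis=0)      # sort within each column
flat = np.sort(arr, axis=None)     # flatten, then sort

print(by_row)   # [[2 3 8]
                #  [1 7 9]]
print(by_col)   # [[1 7 2]
                #  [3 8 9]]
print(flat)     # [1 2 3 7 8 9]
```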

Numpy Array Math
Basic Elementwise Operators
• Elementwise sum: + or numpy.add
• Elementwise difference: - or numpy.subtract
• Elementwise product: * or numpy.multiply
• Elementwise division: / or numpy.divide
• Elementwise square root: numpy.sqrt
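The operators and their function forms can be compared in a sketch like this, with arbitrary 2x2 inputs. Note that * is an elementwise product, not a matrix product.

```python
import numpy as np

x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[5.0, 6.0], [7.0, 8.0]])

print(x + y)        # same as np.add(x, y)
print(x - y)        # same as np.subtract(x, y)
print(x * y)        # same as np.multiply(x, y): [[ 5. 12.] [21. 32.]]
print(x / y)        # same as np.divide(x, y)
print(np.sqrt(x))   # square root of every element
```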

Numpy Array Math
• The dot function is used to compute:
• Inner products of vectors
• Products of a vector and a matrix
• Products of matrices
• Syntax: Given matrices/vectors a and b
• numpy.dot(a, b)
• a.dot(b)
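The three uses of dot can be sketched with small, made-up vectors and matrices.

```python
import numpy as np

v = np.array([9, 10])
w = np.array([11, 12])
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

print(np.dot(v, w))   # inner product of two vectors: 219
print(x.dot(v))       # matrix times vector: [29 67]
print(x.dot(y))       # matrix times matrix: [[19 22]
                      #                       [43 50]]
```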

Full list of Numpy mathematical functions
• Link: https://numpy.org/doc/stable/reference/routines.math.html
• Categories:
• Trigonometric functions
• Hyperbolic functions
• Rounding
• Sums, products, differences
• Exponents and logarithms
• Other special functions
• Floating point routines
• Rational routines
• Arithmetic operations
• Handling complex numbers
• Extrema Finding
• Miscellaneous

Numpy - Broadcasting
• The term broadcasting refers to the ability of NumPy to handle arrays of different shapes during arithmetic operations.
• Elementwise operations normally require the two arrays to have the same shape.
• However, operations on arrays of different shapes are still possible in NumPy because of broadcasting, provided the shapes are compatible: compared from the last dimension backwards, each pair of sizes must be equal or one of them must be 1.
• The smaller array is broadcast to the size of the larger array so that they have compatible shapes.
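A minimal broadcasting sketch with made-up shapes: a (4, 3) array combines with a (3,) row and a (4, 1) column, while a (4,) array is incompatible and raises an error.

```python
import numpy as np

m = np.array([[0, 0, 0],
              [10, 10, 10],
              [20, 20, 20],
              [30, 30, 30]])                   # shape (4, 3)
row = np.array([1, 2, 3])                      # shape (3,)
col = np.array([[100], [200], [300], [400]])   # shape (4, 1)

print(m + row)   # row is added to every row of m
print(m + col)   # col is added to every column of m

try:
    m + np.array([1, 2, 3, 4])   # shape (4,) does not match the last axis (3)
    compatible = True
except ValueError:
    compatible = False
print(compatible)   # False
```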

Numpy - Broadcasting

Figure from:
https://www.tutorialspoint.com/numpy/numpy_broadcasting.htm

Matplotlib Package

Reference: https://www.w3schools.com/python/matplotlib_intro.asp
What is Matplotlib?
• Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility.

• Matplotlib was created by John D. Hunter.

• Matplotlib is open source, and we can use it freely.

• Matplotlib is mostly written in Python; a few segments are written in C, Objective-C, and JavaScript for platform compatibility.
Install Matplotlib
• Already included in Anaconda.
• If you wish to install Matplotlib, open Command Prompt Window
(CMD) and type: pip install matplotlib
• To use Matplotlib, import Matplotlib package before using its functions.

Matplotlib Pyplot
• Most of the Matplotlib utilities lie under the pyplot submodule, and are usually imported under the plt alias:
• Syntax: import matplotlib.pyplot as plt

Basic Plot Type
• Line plot
• Scatter plot

Line plot
• The plot() function is used to draw points (markers) in a diagram.
• By default, the plot() function draws a line from point to point.
• Basic syntax: plt.plot(xpoints, ypoints)
• xpoints is an array containing the points on the x-axis.
• ypoints is an array containing the points on the y-axis.
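A minimal line-plot sketch with made-up points. The Agg backend line is an assumption so that the code runs without a display; in a notebook or desktop session it can be dropped and plt.show() called instead.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([1, 2, 6, 8])
ypoints = np.array([3, 8, 1, 10])

lines = plt.plot(xpoints, ypoints)               # points joined by a line
markers_only = plt.plot(xpoints, ypoints, 'o')   # 'o' draws markers without a line
default_x = plt.plot(ypoints)                    # x omitted: 0, 1, 2, 3 is used

print(lines[0].get_xdata())       # the x values that were plotted
print(default_x[0].get_xdata())   # the default x values
```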

plot() function with default X-Points
• If the points in the x-axis are not specified, they will get the default
values 0, 1, 2, 3, …

Plotting Options
• All options:
https://matplotlib.org/2.1.2/api/_as_gen/matplotlib.pyplot.plot.html

Plot Label and Title
• To set a label for the x- and y-axis:
• xlabel()
• ylabel()
• To set a title for the plot:
• title()
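The labelling functions can be sketched as follows. The data and label texts are hypothetical, and the Agg backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.array([80, 85, 90, 95, 100])
y = np.array([240, 250, 260, 270, 280])

plt.figure()
plt.plot(x, y)
plt.xlabel("Average Pulse")       # label for the x-axis
plt.ylabel("Calorie Burnage")     # label for the y-axis
plt.title("Sports Watch Data")    # title above the plot

ax = plt.gca()                    # current axes, used here to read the labels back
print(ax.get_xlabel(), "|", ax.get_ylabel(), "|", ax.get_title())
```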

Legends

Legend Position
Location String    Location Code
'best'             0
'upper right'      1
'upper left'       2
'lower left'       3
'lower right'      4
'right'            5
'center left'      6
'center right'     7
'lower center'     8
'upper center'     9
'center'           10
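A sketch of placing a legend with one of the location strings from the table. The curves and label texts are made up, and the Agg backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 5)

plt.figure()
plt.plot(x, x ** 2, label="x squared")   # label is the text shown in the legend
plt.plot(x, x ** 3, label="x cubed")
leg = plt.legend(loc='upper left')       # the location code 2 would do the same

print([t.get_text() for t in leg.get_texts()])  # ['x squared', 'x cubed']
```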
Legend Position - bbox_to_anchor

Scatter Plots

Customizing Markers in Scatter Plots
• Four main features of the markers used in a scatter plot that can be
customized:
• Size
• Color
• Shape (https://matplotlib.org/stable/api/markers_api.html#module-matplotlib.markers)
• Transparency
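The four marker features map to the s, c, marker, and alpha arguments of plt.scatter, as sketched below. The data values are invented, and the Agg backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.array([5, 7, 8, 7, 2, 17, 2, 9, 4, 11])
y = np.array([99, 86, 87, 88, 111, 86, 103, 87, 94, 78])
sizes = np.array([20, 50, 100, 200, 500, 1000, 60, 90, 10, 300])
colors = np.arange(10)   # one number per point, mapped to a color by the colormap

plt.figure()
pts = plt.scatter(x, y,
                  s=sizes,         # size of each marker
                  c=colors,        # color of each marker
                  cmap='viridis',  # colormap used to turn numbers into colors
                  marker='^',      # shape: triangles
                  alpha=0.5)       # transparency: 0 is invisible, 1 is opaque

print(len(pts.get_offsets()))  # 10 points were drawn
```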

ColorMap
Available ColorMaps: https://www.w3schools.com/python/matplotlib_scatter.asp

plt.scatter(x, y, c=colors, cmap='Accent')

plt.scatter(x, y, c=colors, cmap='Blues')

Bar Plot

Matplotlib Multiple Bar Chart

Create Multiple Bar Chart
• Syntax: plt.bar(x, height, width=0.8, bottom=None, *, align='center', data=None, **kwargs)
• The parameters are defined below:
• x: the x-coordinates of the bars.
• height: the heights of the bars.
• width: the width of the bars.
• bottom: the y-coordinates of the bases of the bars.
• align: alignment of the bars relative to their x-coordinates ('center' or 'edge').
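One common way to draw a multiple bar chart is to call plt.bar once per group and shift the x-coordinates by half a bar width. The sketch below assumes two hypothetical groups; the Agg backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

labels = ["A", "B", "C"]     # hypothetical categories
group1 = [3, 8, 1]
group2 = [5, 2, 6]

x = np.arange(len(labels))   # one base position per category
width = 0.35                 # width of each bar

plt.figure()
bars1 = plt.bar(x - width / 2, group1, width=width, label="Group 1")
bars2 = plt.bar(x + width / 2, group2, width=width, label="Group 2")
plt.xticks(x, labels)        # put the category names under each pair of bars
plt.legend()

print([b.get_height() for b in bars2])  # heights of the second group
```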

Matplotlib Histograms

A histogram is a graph showing frequency distributions.

Syntax to create a histogram plot:
matplotlib.pyplot.hist(x, bins=None, range=None, density=False, weights=None,
cumulative=False, bottom=None, histtype='bar', align='mid', orientation='vertical',
rwidth=None, log=False, color=None, label=None, stacked=False, *, data=None,
**kwargs)
Matplotlib Histograms - Options
• bins:
• If bins is an integer, it defines the number of equal-width bins in the
range.
• If bins is a sequence, it defines the bin edges, including the left edge of the first bin and the right edge of the last bin; in this case, bins may be unequally spaced. All bins except the last (right-most) one are half-open.
• Example: if bins is [1, 2, 3, 4] then the first bin is [1, 2), and the second [2, 3). The last bin is [3, 4].
• rwidth (default: None)
• The relative width of the bars as a fraction of the bin width. If None,
automatically compute the width.
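The bins and rwidth options can be sketched as follows. The commutes data is randomly generated for illustration (an assumption, not the data set used on the slides), and the Agg backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
commutes = rng.normal(30, 10, 200)   # hypothetical commute times

plt.figure()
# Integer bins: 10 equal-width bins; rwidth=0.9 leaves a small gap between bars
counts, edges, patches = plt.hist(commutes, bins=10, edgecolor='black', rwidth=0.9)
print(len(edges))    # 11 edges delimit 10 bins

plt.figure()
# Sequence bins: explicit edges [1, 2), [2, 3), [3, 4]
counts2, edges2, _ = plt.hist([1, 1.5, 2, 3, 4], bins=[1, 2, 3, 4])
print(counts2)       # [2. 1. 2.] (the value 4 falls in the closed last bin)
```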
plt.hist(commutes, bins=10, edgecolor='black')
plt.hist(commutes, bins=20, edgecolor='black')
Matplotlib Pie Charts

Pandas Package

Introduction

Pandas is a Python library.
Pandas is used to analyze data.
What is Pandas?
• Pandas is a Python library used for working with data sets.

• It has functions for analyzing, cleaning, exploring, and manipulating data.

• The name "Pandas" refers to both "Panel Data" and "Python Data Analysis"; the library was created by Wes McKinney in 2008.

Why Use Pandas?
• Pandas allows us to analyze big data and make conclusions based on
statistical theories.

• Pandas can clean messy data sets, and make them readable and
relevant.

• Relevant data is very important in data science.

What Can Pandas Do?
• Pandas gives you answers about the data.

• For example:
• Is there a correlation between two or more columns?
• What is average value?
• Max value?
• Min value?
• Pandas can also delete rows that are not relevant or that contain wrong values, such as empty or NULL values. This is called cleaning the data.
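These questions can be answered with a few DataFrame methods, as in the sketch below. The column names and values are invented for illustration, with one row left incomplete on purpose.

```python
import numpy as np
import pandas as pd

# Hypothetical data set; the last row has missing values
df = pd.DataFrame({
    "Duration": [60, 60, 45, 45, np.nan],
    "Pulse": [110, 117, 103, 109, 100],
    "Calories": [409.1, 479.0, 340.0, 282.4, np.nan],
})

print(df.corr())             # correlation between every pair of columns
print(df["Pulse"].mean())    # average value
print(df["Pulse"].max())     # max value
print(df["Pulse"].min())     # min value

clean = df.dropna()          # cleaning: drop rows with empty values
print(len(df), len(clean))   # 5 4
```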

Pandas Getting Started
• Install Pandas: pip install pandas
• Import Pandas:
• import pandas
• import pandas as pd

Pandas Series
• A Pandas Series is like a column in a table.
• It is a one-dimensional array holding data of any type.

Pandas Series - Labels
• If nothing else is specified, the values are labeled with their index
number.
• First value has index 0, second value has index 1 etc.
• This label can be used to access a specified value.

Output: 7

Pandas Series - Create Labels
• With the index argument, you can name your own labels.

Output: 7
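The code behind the output above is not preserved in this text. A minimal sketch that would produce it, assuming a hypothetical list of values and label names, is:

```python
import pandas as pd

a = [1, 7, 2]   # hypothetical values

default = pd.Series(a)                       # labels default to 0, 1, 2
print(default[1])                            # 7

labeled = pd.Series(a, index=["x", "y", "z"])   # custom labels
print(labeled["y"])                          # 7
```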

Key/Value Objects as Series

Pandas DataFrames
• A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array or a table with rows and columns.
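A DataFrame can be built from a dictionary of equal-length lists, as in this sketch. The column names and values are hypothetical.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data)   # each key becomes a column, each list position a row

print(df)
print(df.shape)           # (3, 2): 3 rows, 2 columns
print(list(df.columns))   # ['calories', 'duration']
```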

Access to DataFrame Elements
• Syntax: <dataframe_name>.loc[row_index][column_index], or <dataframe_name>.loc[row_index, column_index]
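A short sketch of element access with loc, using a hypothetical DataFrame. loc is an attribute of a DataFrame object rather than of the pandas module.

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

print(df.loc[0])               # the first row, returned as a Series
print(df.loc[0, "calories"])   # a single element: 420
print(df.loc[0]["duration"])   # chained form: 50
print(df.loc[[0, 1]])          # a list of labels returns a DataFrame
```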

Named Indexes
• With the index argument, you can name your own indexes.
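Named indexes can be sketched as follows. The index names and data are made up; once set, the names are used with loc in place of the default row numbers.

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

print(df.loc["day2"])               # the row labelled 'day2'
print(df.loc["day2", "calories"])   # 380
```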

Pandas Read CSV
• What are CSV (comma-separated values) files?
• A simple way to store big data sets.
• CSV files contain plain text and are a well-known format that can be read by many tools, including Pandas.

Load the CSV into a DataFrame
• Use read_csv() function.
• Syntax: pandas.read_csv(csv_filename)

Link to download ‘data.csv’: https://www.w3schools.com/python/pandas/data.csv
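A self-contained sketch of read_csv. To avoid depending on a downloaded file, it reads hypothetical CSV text from an in-memory buffer; passing a filename such as 'data.csv' works the same way.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for a file on disk
csv_text = (
    "Duration,Pulse,Calories\n"
    "60,110,409.1\n"
    "60,117,479.0\n"
    "45,109,282.4\n"
)

df = pd.read_csv(io.StringIO(csv_text))   # the first line becomes the column names

print(df.head())   # the first rows of the DataFrame
print(df.shape)    # (3, 3)
```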
Pandas Read JSON
• Use read_json() function.
• Syntax: pandas.read_json(json_filename)

Link to download ‘data.js’: https://www.w3schools.com/python/pandas/data.js