Numpy
Unit-03
Capturing, Preparing and Working with data
• When we open a file using the with statement, we do not need to close it explicitly; the file is closed automatically when the with block ends.
Example : Write file in Python
• The write() method writes the specified data to the file.
readdemo.py
with open('college.txt','a') as f:
    f.write('Hello world')
• If we open a file in 'w' mode, it overwrites the data in the existing file, or creates a new file if the file does not exist.
• If we open a file in 'a' mode, it appends the data at the end of the existing file, or creates a new file if the file does not exist.
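• The difference between the two modes can be sketched as follows. This is a minimal illustration, not part of the original slides: the scratch file is created in a temporary directory (an assumption made so the example does not touch real files), and the written strings are arbitrary.

```python
import os
import tempfile

# Hypothetical scratch file inside a temporary directory (illustration only).
path = os.path.join(tempfile.mkdtemp(), "college.txt")

with open(path, "w") as f:   # 'w' creates the file, or empties it if it exists
    f.write("first\n")

with open(path, "w") as f:   # opening with 'w' again discards "first"
    f.write("second\n")

with open(path, "a") as f:   # 'a' keeps the existing content and adds to the end
    f.write("third\n")

with open(path) as f:        # default mode is 'r' (read text)
    content = f.read()

print(content)               # only "second" and "third" remain
print(f.closed)              # the with block has already closed the file
```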
Reading CSV files without any library functions
• A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values.
• Each line of the file is a data record. Each record consists of one or more fields, separated by commas.
Book1.csv readlines.py
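• A minimal sketch of reading a CSV file with plain Python is shown below. The contents of Book1.csv and readlines.py are not reproduced in these notes, so the sample data (names, cities, marks) is hypothetical and written to a temporary file first. The simple split(',') approach assumes that no field contains a comma or quotes.

```python
import os
import tempfile

# Hypothetical sample data standing in for Book1.csv (not the original file).
path = os.path.join(tempfile.mkdtemp(), "Book1.csv")
with open(path, "w") as f:
    f.write("name,city,marks\nasha,gandhinagar,81\nravi,ahmedabad,74\n")

records = []
with open(path) as f:
    for line in f.readlines():              # one string per line of the file
        fields = line.strip().split(",")    # drop the newline, split on commas
        records.append(fields)

header, rows = records[0], records[1:]      # first record holds the column names
print(header)
print(rows)
```

Every field is read as a string, so numeric columns such as marks need an explicit conversion, for example int(row[2]).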
Unit-03.01
Let's Learn NumPy
NumPy
• NumPy (Numerical Python) is a Python library for creating and manipulating arrays.
• Many scientific and data libraries in Python (for example pandas, SciPy and scikit-learn) use NumPy as one of their main building blocks.
• NumPy provides functions for domains such as linear algebra, Fourier transforms, etc.
• NumPy is fast because its core is implemented in C, so array operations run in compiled code rather than in Python loops.
Install :
• conda install numpy
• OR
• pip install numpy
NumPy Array
• The most important object defined in NumPy is an N-dimensional array type called ndarray.
• It describes a collection of items of the same type. Items in the collection can be accessed using a zero-based index.
• An instance of the ndarray class can be constructed in many different ways; the basic ndarray can be created as shown below.
syntax
import numpy as np
a = np.array(list | tuple) # any sequence; a set or dict is not converted element by element
numpyarray.py
import numpy as np
a = np.array(['swaminarayan','Institute','gandhinagar'])
print(type(a))
print(a)
Output
<class 'numpy.ndarray'>
['swaminarayan' 'Institute' 'gandhinagar']
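• The "same type" rule can be observed by building arrays from different inputs. The sketch below is an illustration with made-up variable names; the exact integer and string dtypes printed depend on the platform, so they are not listed here.

```python
import numpy as np

from_list = np.array([1, 2, 3])      # list of ints -> 1-D integer array
from_tuple = np.array((1.5, 2, 3))   # ints and floats mixed -> all become floats
from_mixed = np.array([1, 'a'])      # numbers and text mixed -> all become strings
from_set = np.array({1, 2, 3})       # a set is NOT unpacked: 0-d array holding the set

print(from_list.dtype, from_list.shape)
print(from_tuple.dtype, from_tuple)
print(from_mixed.dtype, from_mixed)
print(from_set.dtype, from_set.shape)   # object dtype, empty shape ()
```

This is why a list or tuple is the usual input to np.array; a set or dict should first be converted, for example with list(my_set).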
NumPy Array (Cont.)
• arange(start,end,step) function creates a NumPy array starting from start up to end (not included) with the specified step.
numpyarange.py
import numpy as np
b = np.arange(0,10,1)
print(b)
Output
[0 1 2 3 4 5 6 7 8 9]
• zeros(shape) function returns a NumPy array of the given shape, filled with zeros.
numpyzeros.py
import numpy as np
c = np.zeros(3)
print(c)
c1 = np.zeros((3,3)) # a multi-dimensional shape has to be given as a tuple
print(c1)
Output
[0. 0. 0.]
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
• ones(shape) function returns a NumPy array of the given shape, filled with ones.
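• The slides give no listing for ones, so here is a short sketch that mirrors the zeros example. The variable names and the optional dtype argument are illustrative choices.

```python
import numpy as np

d = np.ones(3)                    # 1-D array of three ones (floats by default)
d1 = np.ones((2, 3))              # 2-D shape is passed as a tuple, as with zeros
d2 = np.ones((2, 3), dtype=int)   # optional dtype gives integer ones

print(d)
print(d1)
print(d2)
```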
NumPy Array (Cont.)
• eye(n) function creates a 2-D NumPy array (n x n) with ones on the diagonal and zeros elsewhere.
numpyeye.py
import numpy as np
b = np.eye(3)
print(b)
Output
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
• linspace(start,stop,num) function returns num evenly spaced numbers over the interval from start to stop (stop is included by default).
numpylinspace.py
import numpy as np
c = np.linspace(0,1,11)
print(c)
Output
[0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]
• Note: in the arange function we give start, stop and step, whereas in the linspace function we give start, stop and the number of elements we want.
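• The note above can be checked side by side. In this small sketch (values chosen only for illustration) both calls cover the range 0 to 1 in steps of 0.25, but only linspace includes the stop value.

```python
import numpy as np

a = np.arange(0, 1, 0.25)    # start, stop, STEP  -> stop (1) is excluded
b = np.linspace(0, 1, 5)     # start, stop, COUNT -> stop (1) is included

print(a, len(a))             # 4 values: 0, 0.25, 0.5, 0.75
print(b, len(b))             # 5 values: 0, 0.25, 0.5, 0.75, 1
```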
Array Shape in NumPy
• We can get the shape of an ndarray using its shape attribute.
numpyarange.py
import numpy as np
b = np.zeros((3,3))
print(b.shape)
Output
(3, 3)
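NumPy Random
• rand(p1,p2,…,pn) function will create an n-dimensional array with random data using a uniform distribution over [0, 1); if we do not specify any parameter, it will return a single random float.
numpyrand.py
import numpy as np
r1 = np.random.rand()
print(r1)
r2 = np.random.rand(3,2) # dimensions are passed as separate arguments, not as a tuple
print(r2)
Output
0.23937253208490505
[[0.58924723 0.09677878]
 [0.97945337 0.76537675]
 [0.73097381 0.51277276]]
• randint(low,high,num) function creates a one-dimensional array with num random integers from low (included) up to high (not included).
numpyrandint.py
import numpy as np
r3 = np.random.randint(1,100,10)
print(r3)
Output
[78 78 17 98 19 26 81 67 23 24]
• We can reshape the array into any compatible shape (one with the same total number of elements) using the reshape method.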
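• A short sketch of reshape applied to the randint result. The target shapes are arbitrary examples; the values themselves are random and change on every run, so only the shapes are shown.

```python
import numpy as np

r3 = np.random.randint(1, 100, 10)   # 10 random integers, values vary per run
grid = r3.reshape(2, 5)              # 2 rows x 5 columns, same 10 elements
tall = r3.reshape(-1, 2)             # -1 asks NumPy to infer that dimension (5 here)
print(grid.shape, tall.shape)

try:
    r3.reshape(3, 4)                 # 12 slots for 10 elements is not allowed
except ValueError as err:
    print('reshape failed:', err)
```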
NumPy Random (Cont.)
• randn(p1,p2,…,pn) function will create an n-dimensional array with random data drawn from the standard normal distribution (mean 0, standard deviation 1); if we do not specify any parameter, it will return a single random float.
numpyrandn.py
import numpy as np
r1 = np.random.randn()
print(r1)
r2 = np.random.randn(3,2) # dimensions are passed as separate arguments, not as a tuple
print(r2)
Output
-0.15359861758111037
[[ 0.40967905 -0.21974532]
 [-0.90341482 -0.69779498]
 [ 0.99444948 -1.45308348]]
• Note: the rand function generates random numbers using a uniform distribution, whereas the randn function generates random numbers using the standard normal distribution.
• We are going to see the difference using a visualization technique (as data scientists, we have to use visualization techniques to convince the audience).
Visualizing the difference between rand & randn
• We are going to use the matplotlib library to visualize the difference.
matplotdemo.py
import numpy as np
from matplotlib import pyplot as plt
# %matplotlib inline  (Jupyter notebook magic; uncomment only inside a notebook)
samplesize = 100000
uniform = np.random.rand(samplesize)
normal = np.random.randn(samplesize)
plt.hist(uniform,bins=100)
plt.title('rand: uniform')
plt.show()
plt.hist(normal,bins=100)
plt.title('randn: normal')
plt.show()
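• Alongside the histograms, the same difference can be checked numerically. This sketch is an addition to the slides; the fixed seed is an assumption used only to make repeated runs comparable, and the comments give approximate expectations rather than exact printed values.

```python
import numpy as np

np.random.seed(0)                      # assumption: fixed seed for repeatability
samplesize = 100000
uniform = np.random.rand(samplesize)
normal = np.random.randn(samplesize)

print(uniform.min(), uniform.max())    # always inside [0, 1)
print(uniform.mean(), uniform.std())   # close to 0.5 and about 0.29
print(normal.min(), normal.max())      # unbounded; typically a few units either side of 0
print(normal.mean(), normal.std())     # close to 0 and 1
```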
Aggregations
• min() function returns the minimum value of the ndarray. There are two ways in which we can use the min function; an example of both ways is given below.
numpymin.py
import numpy as np
l = [1,5,3,8,2,3,6,7,5,2,9,11,2,5,3,4,8,9,3,1,9,3]
a = np.array(l)
print('Min way1 = ',a.min())
print('Min way2 = ',np.min(a))
Output
Min way1 = 1
Min way2 = 1
• max() function returns the maximum value of the ndarray. There are two ways in which we can use the max function; an example of both ways is given below.
numpymax.py
import numpy as np
l = [1,5,3,8,2,3,6,7,5,2,9,11,2,5,3,4,8,9,3,1,9,3]
a = np.array(l)
print('Max way1 = ',a.max())
print('Max way2 = ',np.max(a))
Output
Max way1 = 11
Max way2 = 11
Aggregations (Cont.)
• NumPy supports many aggregation functions such as min, max, argmin, argmax, sum, mean, std, etc.
• argmin and argmax return the index of the first minimum and first maximum value rather than the value itself.
numpymin.py
import numpy as np
l = [7,5,3,1,8,2,3,6,11,5,2,9,10,2,5,3,7,8,9,3,1,9,3]
a = np.array(l)
print('Min = ',a.min())
print('ArgMin = ',a.argmin())
print('Max = ',a.max())
print('ArgMax = ',a.argmax())
print('Sum = ',a.sum())
print('Mean = ',a.mean())
print('Std = ',a.std())
Output
Min = 1
ArgMin = 3
Max = 11
ArgMax = 8
Sum = 122
Mean = 5.304347826086956
Std = 3.042235771223635
Using axis argument with aggregate functions
• When we apply an aggregate function to a multi-dimensional ndarray without an axis argument, it is applied over all the elements of the array (all axes) and returns a single value.
numpyaxis.py
import numpy as np
array2d = np.array([[1,2,3],[4,5,6],[7,8,9]])
print('sum = ',array2d.sum())
Output
sum = 45
• If we want the sum of each row or each column, we can use the axis argument with the aggregate functions.
numpyaxis.py
import numpy as np
array2d = np.array([[1,2,3],[4,5,6],[7,8,9]])
print('sum (cols)= ',array2d.sum(axis=0)) # vertical: one sum per column
print('sum (rows)= ',array2d.sum(axis=1)) # horizontal: one sum per row
Output
sum (cols) = [12 15 18]
sum (rows) = [6 15 24]
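• The axis argument works the same way with the other aggregate functions. A brief sketch on the same 3 x 3 array follows; the variable names are illustrative, and keepdims is an optional argument shown only to illustrate how the result shape can be preserved.

```python
import numpy as np

array2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

col_max = array2d.max(axis=0)                    # largest value in each column
row_mean = array2d.mean(axis=1)                  # average of each row
row_argmax = array2d.argmax(axis=1)              # column index of the max in each row
row_sum_2d = array2d.sum(axis=1, keepdims=True)  # keeps a (3, 1) column shape

print(col_max)       # [7 8 9]
print(row_mean)      # [2. 5. 8.]
print(row_argmax)    # [2 2 2]
print(row_sum_2d.shape)
```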
Single vs. Double Bracket Notation
• There are two ways in which you can access an element of a multi-dimensional array; an example of both methods is given below.
numpybrackets.py
import numpy as np
arr = np.array([['a','b','c'],['d','e','f'],['g','h','i']])
print('double = ',arr[2][1]) # double bracket notation
print('single = ',arr[2,1]) # single bracket notation
Output
double = h
single = h
• Both methods are valid and give exactly the same answer, but single bracket notation is recommended: double bracket notation first creates a temporary sub-array for the third row and then fetches the second element from it.
• Single bracket notation is also easier to read and write while programming.
Slicing ndarray
• Slicing in Python means taking elements from one given index to another given index.
• Similar to a Python list, we can use the same syntax array[start:end:step] to slice an ndarray.
• Default start is 0
• Default end is length of the array
• Default step is 1
• With a negative step the defaults are reversed: the slice runs from the last element back to the first.
numpyslice1d.py
import numpy as np
arr = np.array(['a','b','c','d','e','f','g','h'])
print(arr[2:5])
print(arr[:5])
print(arr[5:])
print(arr[2:7:2])
print(arr[::-1])
Output
['c' 'd' 'e']
['a' 'b' 'c' 'd' 'e']
['f' 'g' 'h']
['c' 'e' 'g']
['h' 'g' 'f' 'e' 'd' 'c' 'b' 'a']
Slicing multi-dimensional array
• Slicing a multi-dimensional array works the same way as slicing a single-dimensional array, using the single bracket notation we learned earlier: we give one slice per axis, separated by commas. Let's see an example.
numpyslice1d.py
import numpy as np
arr = np.array([['a','b','c'],['d','e','f'],['g','h','i']])
print(arr[0:2 , 0:2]) #first two rows and cols
print(arr[::-1]) #reversed rows
print(arr[: , ::-1]) #reversed cols
print(arr[::-1,::-1]) #complete reverse
Output
[['a' 'b']
 ['d' 'e']]
[['g' 'h' 'i']
 ['d' 'e' 'f']
 ['a' 'b' 'c']]
[['c' 'b' 'a']
 ['f' 'e' 'd']
 ['i' 'h' 'g']]
[['i' 'h' 'g']
 ['f' 'e' 'd']
 ['c' 'b' 'a']]
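• When slicing, single and double bracket notation no longer give the same answer, which is one more reason to prefer the single bracket form. The sketch below (variable names are illustrative) uses the same 3 x 3 array.

```python
import numpy as np

arr = np.array([['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']])

single = arr[0:2, 0:2]   # rows 0-1 AND columns 0-1
double = arr[0:2][0:2]   # rows 0-1, then rows 0-1 of that result again

print(single.shape)      # (2, 2)
print(double.shape)      # (2, 3): the second slice never reached the columns
```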
Warning : Array slices are views, not copies !
• When we slice an array and modify the slice, the original array also changes, because slicing returns a view of the same data and does not create a copy of the array.
• Example,
numpyslice1d.py
import numpy as np
arr = np.array([1,2,3,4,5])
arrsliced = arr[0:3]

arrsliced[:] = 2 # Broadcasting: the scalar 2 is assigned to every element of the slice

print('Original Array = ', arr)
print('Sliced Array = ',arrsliced)
Output
Original Array = [2 2 2 4 5]
Sliced Array = [2 2 2]
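• If the original array must stay unchanged, the usual approach is to take an explicit copy of the slice. A minimal sketch follows; the names view and safe are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
view = arr[0:3]              # plain slice: shares memory with arr
safe = arr[0:3].copy()       # explicit copy: independent data

safe[:] = 9                  # does not touch arr
view[0] = 7                  # changes arr[0] as well

print(arr)                           # [7 2 3 4 5]
print(safe)                          # [9 9 9]
print(np.shares_memory(arr, view))   # True
print(np.shares_memory(arr, safe))   # False
```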
NumPy Arithmetic Operations
numpyop.py
import numpy as np
arr1 = np.array([[1,2,3],[1,2,3],[1,2,3]])
arr2 = np.array([[4,5,6],[4,5,6],[4,5,6]])

arradd1 = arr1 + 2 # addition of matrix with scalar
arradd2 = arr1 + arr2 # addition of two matrices
print('Addition Scalar = ', arradd1)
print('Addition Matrix = ', arradd2)

arrsub1 = arr1 - 2 # subtraction of scalar from matrix
arrsub2 = arr1 - arr2 # subtraction of two matrices
print('Subtraction Scalar = ', arrsub1)
print('Subtraction Matrix = ', arrsub2)
arrdiv1 = arr1 / 2 # division of matrix by scalar
arrdiv2 = arr1 / arr2 # element-wise division of two matrices
print('Division Scalar = ', arrdiv1)
print('Division Matrix = ', arrdiv2)
Output
Addition Scalar = [[3 4 5]
 [3 4 5]
 [3 4 5]]
Addition Matrix = [[5 7 9]
 [5 7 9]
 [5 7 9]]
Subtraction Scalar = [[-1 0 1]
 [-1 0 1]
 [-1 0 1]]
Subtraction Matrix = [[-3 -3 -3]
 [-3 -3 -3]
 [-3 -3 -3]]
Division Scalar = [[0.5 1. 1.5]
 [0.5 1. 1.5]
 [0.5 1. 1.5]]
Division Matrix = [[0.25 0.4 0.5 ]
 [0.25 0.4 0.5 ]
 [0.25 0.4 0.5 ]]
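• The scalar cases above work through broadcasting: the smaller operand is stretched to match the shape of the larger one. The same rule applies to a single row or a single column, as in this sketch (row and col are illustrative names, not from the slides).

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
row = np.array([10, 20, 30])             # shape (3,)
col = np.array([[100], [200], [300]])    # shape (3, 1)

plus_row = arr1 + row    # row is added to every row of arr1
plus_col = arr1 + col    # col is added to every column of arr1

print(plus_row)
print(plus_col)
```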
NumPy Arithmetic Operations (Cont.)
numpyop.py
import numpy as np
arr1 = np.array([[1,2,3],[1,2,3],[1,2,3]])
arr2 = np.array([[4,5,6],[4,5,6],[4,5,6]])
arrmul1 = arr1 * 2 # multiply matrix with scalar
arrmul2 = arr1 * arr2 # element-wise multiplication of two matrices
print('Multiply Scalar = ', arrmul1)
# Note : this is not matrix multiplication
print('Multiply Matrix = ', arrmul2)
# In order to do matrix multiplication
arrmatmul = np.matmul(arr1,arr2)
print('Matrix Multiplication = ',arrmatmul)
# OR
arrdot = arr1.dot(arr2)
print('Dot = ',arrdot)
# OR
arrpy3dot5plus = arr1 @ arr2
print('Python 3.5+ support = ',arrpy3dot5plus)
Output
Multiply Scalar = [[2 4 6]
 [2 4 6]
 [2 4 6]]
Multiply Matrix = [[ 4 10 18]
 [ 4 10 18]
 [ 4 10 18]]
Matrix Multiplication = [[24 30 36]
 [24 30 36]
 [24 30 36]]
Dot = [[24 30 36]
 [24 30 36]
 [24 30 36]]
Python 3.5+ support = [[24 30 36]
 [24 30 36]
 [24 30 36]]
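• With square matrices both kinds of multiplication run without error, so the difference is easy to miss. Non-square shapes make it visible, as in this sketch (the arrays a and b are made-up examples).

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])       # shape (2, 3)
b = np.array([[1, 0], [0, 1], [1, 1]])     # shape (3, 2)

product = a @ b                            # matrix product: (2, 3) @ (3, 2) -> (2, 2)
print(product)                             # [[ 4  5] [10 11]]

try:
    a * b                                  # element-wise: shapes (2, 3) and (3, 2) do not match
except ValueError as err:
    print('element-wise multiply failed:', err)
```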
Sorting Array
• The np.sort() function returns a sorted copy of the input array, whereas the arr.sort() method sorts the array in place and returns None.
syntax
import numpy as np
# arr = our ndarray
np.sort(arr,axis,kind,order)
# OR arr.sort()
Parameters
arr = array to sort
axis = axis along which to sort (default=-1, the last axis)
kind = kind of algorithm to use ('quicksort' <- default, 'mergesort', 'heapsort')
order = field(s) on which we want to sort (for arrays with named fields)
• Example :
numpysort.py
import numpy as np
arr = np.array(['Swaminarayan','Institute','of','Technology'])
print("Before Sorting = ", arr)
arr.sort() # sorts in place; np.sort(arr) would return a sorted copy instead
print("After Sorting = ",arr)
Output
Before Sorting = ['Swaminarayan' 'Institute' 'of' 'Technology']
After Sorting = ['Institute' 'Swaminarayan' 'Technology' 'of']
• Strings are compared character by character using their code points, so uppercase letters sort before lowercase ones; this is why 'of' comes last.
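• The difference between np.sort and arr.sort, and the effect of the axis parameter, can be sketched on a small 2-D array (the numbers are arbitrary).

```python
import numpy as np

arr = np.array([[3, 7, 2], [9, 1, 8]])

by_row = np.sort(arr)             # default axis=-1: each row sorted separately
by_col = np.sort(arr, axis=0)     # each column sorted separately
flat = np.sort(arr, axis=None)    # array flattened first, then sorted
print(by_row)                     # [[2 3 7] [1 8 9]]
print(by_col)                     # [[3 1 2] [9 7 8]]
print(flat)                       # [1 2 3 7 8 9]
print(arr)                        # unchanged: np.sort worked on copies

result = arr.sort()               # in-place sort along the last axis
print(result)                     # None
print(arr)                        # now equal to by_row
```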
Sort Array Example
numpysort2.py
import numpy as np
dt=np.dtype([('name', 'S10'),('age', int)])
arr2=np.array([('PQR',200),('ABC',300),('XYZ',100)],dtype=dt)
arr2.sort(order='name')
print(arr2)
Output
[(b'ABC', 300) (b'PQR', 200) (b'XYZ', 100)]
Conditional Selection
• Similar to arithmetic operations, when we apply a comparison operator to a NumPy array, it is applied to each element of the array and a new boolean NumPy array is created with the values True or False.
numpycond1.py
import numpy as np
arr = np.random.randint(1,100,10)
print(arr)
boolArr = arr > 50
print(boolArr)
Output (random values, will differ on each run)
[25 17 24 15 17 97 42 10 67 22]
[False False False False False True False False True False]
• Using the boolean array as an index selects only the elements for which the condition is True.
numpycond2.py
import numpy as np
arr = np.random.randint(1,100,10)
print("All = ",arr)
boolArr = arr > 50
print("Filtered = ", arr[boolArr])
Output (random values, will differ on each run)
All = [31 94 25 70 23 9 11 77 48 11]
Filtered = [94 70 77]
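• Conditions can also be combined. The sketch below uses a fixed array instead of random data (an assumption, so that the results are repeatable) and the element-wise operators & (and) and | (or). Each condition needs its own parentheses, and the Python keywords and / or do not work on arrays.

```python
import numpy as np

arr = np.array([25, 17, 94, 15, 70, 97, 42, 10, 67, 22])

middle = arr[(arr > 20) & (arr < 80)]   # both conditions must hold
edges = arr[(arr < 20) | (arr > 90)]    # either condition may hold
count = (arr > 50).sum()                # True counts as 1, so this counts matches

print(middle)   # [25 70 42 67 22]
print(edges)    # [17 94 15 97 10]
print(count)    # 4
```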