Durgasoft - Python For Data Science Running Notes
------------------------------------------------------------
Python for Devops--->Regular Core Python Knowledge
Concise code
Rich Libraries --->70 to 90% of the work is covered by libraries; only about 10% of the code we have to write ourselves
Data Science
Devops vs DataScience
Programming background--->DataScience
Admin related background, non-programming background--->devops
Numpy
Pandas
Matplotlib
seaborn
scikit learn
scipy
etc
a = 10
b= 20
a+b
a-b
a*b
a/b
math.sqrt(10)
list of 100
identity matrix???
3. Data Analysis
2 crore samples are analyzed..
100 points--->
new patient-->
History of Numpy:
-----------------
Origin of Numpy --->Numeric Library
Numeric Library--->Jim Hugunin
Numpy--->Travis Oliphant and other contributors 2005
Open Source Library and Freeware
1 dimensional arrays--->Vector
2 dimensional arrays--->Matrix
..
n dimensional arrays
Application areas of Numpy?
Arithmetic operators
Broadcasting
Array Manipulation functions
reshape()
resize()
flatten()
ravel()
transpose()
etc...
Matrix class
etc
Performance Test:
----------------
Numpy vs Normal Python :
-----------------------
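import numpy as np
from datetime import datetime

a = np.array([10,20,30])
b = np.array([1,2,3])

# dot_product() is not defined in these notes; a plain-Python version is assumed here
def dot_product(a,b):
    return sum(x*y for x,y in zip(a,b))

before = datetime.now()
for i in range(1000000):
    dot_product(a,b)
after = datetime.now()
print('Time taken by plain Python:',after-before)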
Array:
-----
An indexed collection of homogeneous data elements.
C/C++/Java
2 ways
D:\durgaclasses>py test.py
<class 'array.array'>
array('i', [10, 20, 30])
Elements one by one:
10
20
30
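The output above can be reproduced with a short script along these lines. This is only a sketch: the code of the original test.py is not part of these notes, and the type code 'i' (signed int) is taken from the printed output.

```python
import array

# 'i' is the type code for signed integers, as seen in the printed output
a = array.array('i', [10, 20, 30])
print(type(a))   # <class 'array.array'>
print(a)         # array('i', [10, 20, 30])

print('Elements one by one:')
for x in a:
    print(x)
```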
Note: The array module is not recommended because it has very little library support.
2. numpy module:
-----------------
import numpy
a = numpy.array([10,20,30])
print(type(a)) #<class 'numpy.ndarray'>
print(a)
print('Elements one by one:')
for x in a:
    print(x)
D:\durgaclasses>py test.py
<class 'numpy.ndarray'>
[10 20 30]
Elements one by one:
10
20
30
1. Similarities:
----------------
1. Both can be used to store data
2. The order will be preserved in both. Hence indexing and slicing concepts are
applicable.
3. Both are mutable, ie we can change the content.
2. Differences:
---------------
1. list is python's inbuilt type. we have to install and import numpy explicitly.
4. Arrays consume less memory than list.
import numpy as np
import sys
l = [10,20,30,40,50,60,70,80,90,100,10,20,30,40,50,60,70,80,90,100,
     10,20,30,40,50,60,70,80,90,100]
a = np.array(l)
print('The Size of list:',sys.getsizeof(l))
print('The Size of ndarray:',sys.getsizeof(a))
1-D array:
-----------
>>> l = [10,20,30]
>>> type(l)
<class 'list'>
>>> a = np.array(l)
>>> type(a)
<class 'numpy.ndarray'>
>>> a
array([10, 20, 30])
Note:
a.ndim--->To know dimension of ndarray
a.dtype--->To know data type of elements
>>> a = np.array([[10,20,30],[40,50,60],[70,80,90]])
>>> type(a)
<class 'numpy.ndarray'>
>>> a.ndim
2
>>> a
array([[10, 20, 30],
[40, 50, 60],
[70, 80, 90]])
>>> a.ndim
2
>>> a.shape
(3, 3)
>>> a.size
9
eg-3: 1-D array from the tuple:
-------------------------------
>>> a = np.array(('durga','ravi','shiva'))
>>> a
array(['durga', 'ravi', 'shiva'], dtype='<U5')
>>> type(a)
<class 'numpy.ndarray'>
>>> a.ndim
1
>>> a.shape
(3,)
>>> a.size
3
>>> a = np.array([10,20,10.5])
>>> a
array([10. , 20. , 10.5])
>>> a.dtype
dtype('float64')
>>> a=np.array([10,20,'a'])
>>> a
array(['10', '20', 'a'], dtype='<U11')
>>> a = np.array([10,20,30.5])
>>> a
array([10. , 20. , 30.5])
>>> a = np.array([10,20,30.5],dtype=int)
>>> a
array([10, 20, 30])
>>> a = np.array([10,20,30.5],dtype=float)
>>> a
array([10. , 20. , 30.5])
>>> a = np.array([10,20,30.5],dtype=bool)
>>> a
array([ True, True, True])
>>> a = np.array([10,20,30.5,0],dtype=bool)
>>> a
array([ True, True, True, False])
>>> a = np.array([10,20,30.5,0],dtype=complex)
>>> a
array([10. +0.j, 20. +0.j, 30.5+0.j, 0. +0.j])
>>> a = np.array([10,20,30.5],dtype=str)
>>> a
array(['10', '20', '30.5'], dtype='<U4')
>>> a = np.array([10,'durga'],dtype=int)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'durga'
object
int,float,str,bool,complex
>>> a = np.array([10,'durga',10.5,True,10+20j],dtype=object)
>>> a
array([10, 'durga', 10.5, True, (10+20j)], dtype=object)
>>> a = np.array([10,'durga',10.5,True,10+20j])
>>> a
array(['10', 'durga', '10.5', 'True', '(10+20j)'], dtype='<U64')
>>>
3. range(begin,end,step)
range(1,11,1)--->1,2,3,4,5,6,7,8,9,10
range(1,11,2)--->1,3,5,7,9
range(1,11,3)--->1,4,7,10
eg-1:
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a.ndim
1
>>> a.shape
(10,)
>>> a.dtype
dtype('int32')
eg-2:
>>> a = np.arange(1,11)
>>> a
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
eg-3:
>>> a = np.arange(1,11,2)
>>> a
array([1, 3, 5, 7, 9])
>>> a = np.arange(1,11,3,dtype=float)
>>> a
array([ 1., 4., 7., 10.])
Note: arange() always returns a 1-D array. To get a 2-D array, apply reshape() on the result.
linspace():
----------
Returns linearly (evenly) spaced values in the specified interval.
np.linspace(0,1)--->50 values from 0 to 1 (both inclusive), because the default number of values is 50
arange() vs linspace()
-----------------------
arange()--->elements will be considered in the given range based on step value.
linspace() --->The specified number of values will be considered in the given
range.
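The difference between the two functions can be seen on the same interval. A small sketch, where the interval 0 to 1 and the step 0.25 are chosen only for illustration:

```python
import numpy as np

# arange(): we choose the step; the end value 1 is excluded
a = np.arange(0, 1, 0.25)
print(a)   # [0.   0.25 0.5  0.75]

# linspace(): we choose how many values; the end value 1 is included by default
b = np.linspace(0, 1, 5)
print(b)   # [0.   0.25 0.5  0.75 1.  ]

# With no count given, linspace() generates 50 values
print(np.linspace(0, 1).size)   # 50
```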
zeros():
--------
(2,3,4)--->3-D array
3-D array contains a collection of 2-D arrays
(10,)
(5,2)
(4,3,2)
(2,3,4)
array([[[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]],
[[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]]])
1 2-D array
3 rows and 2 columns
(2,2,3,4)
4-D array means a group of 3-D arrays
2 3-D arrays are there
[
[
[[0,0,0,0],
[0,0,0,0],
[0,0,0,0]],
[[0,0,0,0],
[0,0,0,0],
[0,0,0,0]]
]
[
[[0,0,0,0],
[0,0,0,0],
[0,0,0,0]],
[[0,0,0,0],
[0,0,0,0],
[0,0,0,0]]
]
]
>>> np.zeros((2,2,3,4),dtype=int)
array([[[[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]],
[[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]]],
[[[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]],
[[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]]]])
1. ones():
----------
Exactly the same as zeros(), except that the array is filled with the value 1 instead of 0.
fill_value is 1
>>> np.ones(10)
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
>>> np.ones((5,2),dtype=int)
array([[1, 1],
[1, 1],
[1, 1],
[1, 1],
[1, 1]])
>>> np.ones((2,3,4),dtype=int)
array([[[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]],
[[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]]])
1. array()
2. arange()
3. linspace()
4. zeros()
5. ones()
6. full()
>>> np.full(10,2)
array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
>>> np.full(10,4)
array([4, 4, 4, 4, 4, 4, 4, 4, 4, 4])
>>> np.full((5,4),7)
array([[7, 7, 7, 7],
[7, 7, 7, 7],
[7, 7, 7, 7],
[7, 7, 7, 7],
[7, 7, 7, 7]])
>>> np.full((2,3,4),9)
array([[[9, 9, 9, 9],
[9, 9, 9, 9],
[9, 9, 9, 9]],
[[9, 9, 9, 9],
[9, 9, 9, 9],
[9, 9, 9, 9]]])
eye()
identity()
np.full(shape=(2,3,4),fill_value=7)
np.full((2,3,4),fill_value=7)
np.full((2,3,4),7)
zeros()
ones()
full()
eye():
------
To generate identity matrix
f(a,b)
f(a,/,b)--->For the parameters before /, we should pass values as positional arguments only
f(10,20)--->valid
f(10,b=20)--->valid
f(a=10,b=20)--->invalid
f(a,*,b)
f(*,a,b)
*--->For the parameters after *, we should provide values as keyword arguments only
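The / and * markers in a function signature can be tried with two small functions. This is a sketch only: f and g are hypothetical functions written for illustration, and the / marker needs Python 3.8 or later.

```python
# f: parameter a is positional-only (it comes before /)
def f(a, /, b):
    return a + b

# g: parameter b is keyword-only (it comes after *)
def g(a, *, b):
    return a + b

print(f(10, 20))      # valid
print(f(10, b=20))    # valid
print(g(10, b=20))    # valid
print(g(a=10, b=20))  # valid

try:
    f(a=10, b=20)     # invalid: a must be passed by position
except TypeError as e:
    print('TypeError:', e)

try:
    g(10, 20)         # invalid: b must be passed by keyword
except TypeError as e:
    print('TypeError:', e)
```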
N--->Number of rows
M--->Number of columns
>>> np.eye(2,3)
array([[1., 0., 0.],
[0., 1., 0.]])
>>> np.eye(3,2)
array([[1., 0.],
[0., 1.],
[0., 0.]])
>>> np.eye(3)
array([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]])
>>> np.eye(3,dtype=int)
array([[1, 0, 0],
[0, 1, 0],
[0, 0, 1]])
>>> np.eye(5)
array([[1., 0., 0., 0., 0.],
[0., 1., 0., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 0., 1., 0.],
[0., 0., 0., 0., 1.]])
>>> np.eye(5,k=1)
array([[0., 1., 0., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 0., 1., 0.],
[0., 0., 0., 0., 1.],
[0., 0., 0., 0., 0.]])
>>> np.eye(5,k=-3)
array([[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[1., 0., 0., 0., 0.],
[0., 1., 0., 0., 0.]])
Ans: D
A. It can be any dimensional array.
B. The number of rows and number of columns must be same.
C. only main diagonal contains 1s.
D. None of these.
Ans: D
identity() function:
--------------------
It is exactly same as eye() function except that
1. It is always square matrix(The number of rows and number of columns
always same)
2. only main diagonal contains 1s
identity(N,
Q: Can we get any value other than 1 in eye() and identity()?
Ans: No
C. By default the main diagonal contains 1s, but we can customize which diagonal
has to contain 1s.
D. All of these.
Ans: A
Ans: B,C
array()
arange()
linspace()
zeros()
ones()
full()
eye()
identity()
empty():
--------
empty(shape, dtype=float, order='C', *, like=None)
Return a new array of given shape and type, without initializing entries.
np.empty((3,3))
zeros() vs empty():
-------------------
If we require an array filled only with zeros, then we should go for zeros().
If we do not care about the initial data and just need an array to fill in later,
then we should go for empty().
empty() does not initialize the entries, so the array holds whatever values happen
to be in memory, and creating it takes much less time than creating a zeros array.
i.e. performance-wise empty() is recommended over zeros() if we are not worried
about the initial data.
eg:
import numpy as np
from datetime import datetime
import sys
begin = datetime.now()
a = np.zeros((25000,300,400))
after = datetime.now()
print('Time taken by zeros:',after-begin)
a= None
begin = datetime.now()
a = np.empty((25000,300,400))
after = datetime.now()
print('Time taken by empty:',after-begin)
D:\durgaclasses>py test.py
Time taken by zeros: 0:00:00.430188
Time taken by empty: 0:00:00.056541
1. array()
2. arange()
3. linspace()
4. zeros()
5. ones()
6. full()
7. eye()
8. identity()
Numpy: ndarray
Scipy
Pandas: Series and DataFrame
Matplotlib--->Seaborn-->plotly
1. randint()
2. rand()
3. uniform()
4. randn()
5. normal()
6. shuffle()
etc
1. randint():
-------------
To generate random int values in the given range
Return random integers from `low` (inclusive) to `high` (exclusive).
[low,high)
eg-1:
np.random.randint(10,20)
it will generate a single random int value in the range 10 to 19.
np.random.randint(1,9,size=10)
>>> np.random.randint(1,9,size=10)
array([6, 7, 6, 3, 5, 3, 3, 5, 4, 2])
>>> np.random.randint(1,9,size=10)
array([5, 3, 2, 3, 4, 7, 7, 1, 1, 6])
>>> np.random.randint(1,9,size=10)
array([8, 1, 4, 6, 6, 5, 4, 1, 4, 8])
>>> np.random.randint(100,size=(2,3,4))
array([[[ 1, 50, 90, 47],
[83, 97, 44, 85],
[60, 16, 15, 35]],
int8
int16
int32
int64
np.random.randint(1,11,size=(20,30))
Diagram
2. rand():
----------
It generates random float values in the range [0,1) from a uniform
distribution.
>>> np.random.rand()
0.8120549440326994
>>> np.random.rand()
0.38273484920797385
>>> np.random.rand()
0.6674923918787643
>>> np.random.rand()
0.6895876701908819
>>> np.random.rand()
0.6462006096485643
>>> np.random.rand()
0.9098768943342903
>>> np.random.rand()
0.9362490328984621
>>> np.random.rand()
0.10852373644592084
1-D array:
----------
np.random.rand(10)
2-D array:
>>> np.random.rand(3,5)
array([[0.33078624, 0.3070355 , 0.19368932, 0.22608363, 0.13782822],
[0.78162618, 0.4927585 , 0.39567571, 0.15164908, 0.49992492],
[0.54410989, 0.67688525, 0.06385654, 0.87085947, 0.00411324]])
Q: With rand() we always get float values in the range [0,1). Can't we pass our own range?
uniform():
---------
rand()--->range is always [0,1)
uniform() --->customize range
uniform(low=0.0,high=1.0,size=None)
np.random.uniform()
>>> np.random.uniform(10,20)
12.132400676377454
[[16.2757348 , 14.17440408],
[17.14606001, 13.50384673],
[10.01452353, 15.15903525]]])
randint()
rand()
uniform(low,high)
s = np.random.uniform(20,30,size=1000000)
import matplotlib.pyplot as plt
count, bins, ignored = plt.hist(s, 15, density=True)
# expected density of uniform(20,30) is 1/(30-20) = 0.1
plt.plot(bins, np.full_like(bins, 1/(30-20)), linewidth=2, color='r')
plt.show()
4. randn():
-----------
values from normal distribution with mean 0 and variance 1
>>> np.random.randn(10)
array([-2.01508925, -0.28026307, -0.1646846 , -0.48833416, -0.93808559,
1.14070496, 1.29201805, -1.35400766, 0.81779975, -0.13334964])
>>> np.random.randn(2,3)
array([[-1.26242305, -1.41742932, -0.76201615],
[ 0.29272704, 1.14245971, 0.79194175]])
>>> np.random.randn(2,3,4)
array([[[-0.13889006, 0.35716343, -1.39591255, 0.39167841],
[ 0.88693158, 1.03613745, 1.06677121, 0.57198922],
[-0.28978554, -1.08459609, 1.67696806, -0.70562164]],
randint()
normal() function:
------------------
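We can customize the mean and the standard deviation.
normal(loc=0.0,scale=1.0,size=None)--->loc is the mean, scale is the standard deviation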
>>> np.random.normal(10,4,size=10)
array([10.84790681, 9.61363893, 8.84843827, 9.49880292, 5.75684037,
10.35347207, 10.55850404, 13.75850698, 5.78664002, 10.21131944])
>>> np.random.normal(10,4,size=(2,3,4))
array([[[11.91237354, 9.28093298, 9.13238368, 16.32420395],
[ 9.92394143, 11.60553826, 8.93651744, 12.34608286],
[ 9.73972687, 9.90505171, 13.78076301, 12.88354459]],
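import numpy as np
import matplotlib.pyplot as plt
mu, sigma = 10, 4
s = np.random.normal(mu,sigma,1000000)
count, bins, ignored = plt.hist(s, 15, density=True)
# expected normal density curve for the given mean and standard deviation
plt.plot(bins, 1/(sigma*np.sqrt(2*np.pi)) * np.exp(-(bins-mu)**2/(2*sigma**2)), linewidth=2, color='r')
plt.show()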
randint()
rand()
uniform()
randn()
normal()
An array with random float values in the range [0,1) from uniform distribution--->np.random.rand()
An array with random float values in the specified range from uniform distribution--->np.random.uniform()
An array with random float values with mean 0 and standard deviation 1 from normal distribution--->np.random.randn()
An array with random float values with specified mean and standard deviation from normal distribution--->np.random.normal()
Uniform distribution vs Normal distribution:
--------------------------------------------
In a Normal Distribution the probability of x is highest at the centre and lowest
at the ends, whereas in a Uniform Distribution the probability of x is constant
over the whole range.
Diagram
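The difference between the two distributions also shows up in simple statistics of the samples. A minimal sketch, assuming a fixed seed and a sample size of 100000 chosen only for illustration:

```python
import numpy as np

np.random.seed(0)   # fixed seed only to make the run repeatable

u = np.random.uniform(20, 30, size=100000)   # every value in [20,30) is equally likely
n = np.random.normal(10, 4, size=100000)     # values cluster around the mean 10

# uniform samples never leave the given range
print(u.min() >= 20, u.max() < 30)   # True True

# normal samples: mean close to 10, standard deviation close to 4
print(n.mean(), n.std())
```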
randint()
rand()
uniform()
randn()
normal()
shuffle():
---------
shuffle(x)
a = np.arange(9)
a = np.random.randint(1,101,size=(6,5))
>>> a = np.random.randint(1,101,size=(6,5))
>>> a
array([[20, 87, 85, 18, 64],
[77, 31, 23, 80, 9],
[42, 86, 17, 46, 7],
[65, 89, 99, 26, 27],
[94, 55, 61, 78, 7],
[82, 26, 20, 16, 95]])
>>> np.random.shuffle(a)
>>> a
array([[77, 31, 23, 80, 9],
[65, 89, 99, 26, 27],
[82, 26, 20, 16, 95],
[94, 55, 61, 78, 7],
[42, 86, 17, 46, 7],
[20, 87, 85, 18, 64]])
shuffle() works along axis-0 only: the order of the rows changes, but the content of each row does not.
Diagram
If we apply shuffle for 3-D array, then the order of 2-D arrays will be changed
but not its internal content.
>>> a = np.arange(48).reshape(4,3,4)
>>> a
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
Array Attributes:
------------------
1. a.ndim--->returns the number of dimensions of the array
2. a.shape--->returns the shape of the array, eg: (10,) or (2,3,4)
3. a.size--->returns the total number of elements
4. a.dtype--->returns the data type of array elements
5. a.itemsize--->returns the size of each element in bytes, eg: 4 bytes for int32
>>> a = np.array([10,20,30,40])
>>> a.ndim
1
>>> a.shape
(4,)
>>> a.dtype
dtype('int32')
>>> a.size
4
>>> a.itemsize
4
>>> a = np.array([[10,20,30],[40,50,60],[70,80,90]],dtype='float')
>>> a
array([[10., 20., 30.],
[40., 50., 60.],
[70., 80., 90.]])
>>> a.ndim
2
>>> a.shape
(3, 3)
>>> a.size
9
>>> a.dtype
dtype('float64')
>>> a.itemsize
8
i--->integer(int8,int16,int32,int64)
b--->boolean
u--->unsigned integer(uint8,uint16,uint32,uint64)
f--->float(float16,float32,float64)
c--->complex(complex64,complex128)
S--->Byte String
U--->Unicode String
M-->datetime
etc
int8:
-----
The value will be represented by 8 bits.
MSB is reserved for sign.
The range: -128 to 127
int16:
-----
The value will be represented by 16 bits.
MSB is reserved for sign.
The range: -32768 to 32767
int32:
------
The value will be represented by 32 bits.
MSB is reserved for sign.
The range: -2147483648 to 2147483647
int64:
-----
The value will be represented by 64 bits.
MSB is reserved for sign.
The range: -9223372036854775808 to 9223372036854775807
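These ranges need not be memorized; they can be read from numpy itself with np.iinfo(). A short sketch:

```python
import numpy as np

# np.iinfo() reports the number of bits and the limits of an integer type
for t in (np.int8, np.int16, np.int32, np.int64):
    info = np.iinfo(t)
    print(t.__name__, info.bits, info.min, info.max)
```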
>>> a = np.array([10,20,30,40])
>>> a
array([10, 20, 30, 40])
>>> import sys
>>> sys.getsizeof(a)
120
>>> a = np.array([10,20,30,40],dtype=int8)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'int8' is not defined
>>> a = np.array([10,20,30,40],dtype='int8')
>>> sys.getsizeof(a)
108
arrayobj.dtype
>>> a
array([10, 20, 30, 40], dtype=int8)
>>> a.dtype
dtype('int8')
a = np.array([10,20,30,40],dtype='int8')
a = np.array([10,20,30,40],dtype='int16')
a = np.array([10,20,30,40],dtype='float32')
a = np.array(['a',10,10.5],dtype=int)
>>> a = np.array(['a',10,10.5],dtype=int)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'a'
>>> a = np.array([10,20,30,40])
>>> a
array([10, 20, 30, 40])
>>> b = a.astype('float64')
>>> b
array([10., 20., 30., 40.])
>>> a.dtype
dtype('int32')
>>> b.dtype
dtype('float64')
2nd way:
-------
float64() function
int()
float()
str()
bool()
>>> a = np.array([10,20,30,40])
>>> a
array([10, 20, 30, 40])
>>> a
array([10, 20, 30, 40])
>>> a.dtype
dtype('int32')
>>> f = np.float64(a)
>>> f
array([10., 20., 30., 40.])
>>> f.dtype
dtype('float64')
>>> a = np.array([10,0,20,0,30])
>>> a
array([10, 0, 20, 0, 30])
>>> a.dtype
dtype('int32')
>>> x = np.bool(a)
<stdin>:1: DeprecationWarning: `np.bool` is a deprecated alias for the builtin
`bool`. To silence this warning, use `bool` by itself. Doing this will not modify
any behavior and is safe. If you specifically wanted the numpy scalar type, use
`np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is
ambiguous. Use a.any() or a.all()
>>> x = np.bool_(a)
>>> x
array([ True, False, True, False, True])
Numpy Introduction
Creation of Numpy Arrays
Array Attributes
Data Types
How to get/access elements of Numpy Array:
------------------------------------------
1. Indexing--->only one element
2. Slicing--->group of elements
3. Advanced Indexing
1. Indexing:
------------
By using index, we can get/access single element of the array.
Zero Based indexing. ie the index of first element is 0
supports both +ve and -ve indexing.
a = np.array([10,20,30,40,50])
>>> a = np.array([10,20,30,40,50])
>>> a
array([10, 20, 30, 40, 50])
>>> a[0]
10
>>> a[1]
20
>>> a[-1]
50
>>> a[10]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index 10 is out of bounds for axis 0 with size 5
a = np.array([[10,20,30],[40,50,60]])
>>> a = np.array([[10,20,30],[40,50,60]])
>>> a
array([[10, 20, 30],
[40, 50, 60]])
To access 50:
-------------
a[1][1]
a[-1][-2]
a[1][-2]
a[-1][1]
To access 30:
-------------
a[0][2]
a[-2][-1]
a[0][-1]
a[-2][2]
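The index combinations listed above can be checked directly. A small sketch; the a[i,j] form is the same access written in a single indexing step:

```python
import numpy as np

a = np.array([[10,20,30],[40,50,60]])

# a[i][j] first selects row i, then picks element j from that row
print(a[1][1])    # 50
print(a[-1][-2])  # 50
print(a[0][2])    # 30
print(a[-2][-1])  # 30

# a[i,j] gives the same element in a single indexing step
print(a[1,1])     # 50
print(a[0,-1])    # 30
```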
(2,3,4)
a[i][j][k]
i--->represents which 2-D array(index of 2-D array) | can be either +ve or -ve
j--->represents row index in that 2-D array | can be either +ve or -ve
k--->represents column index in that 2-D array | can be either +ve or -ve
a[0][1][2]
l = [[[1,2,3],[4,5,6],[7,8,9]],[[10,11,12],[13,14,15],[16,17,18]]]
a = np.array(l)
Shape: (2,3,3)
Accessing elements of 4-D array:
--------------------------------
(2,3,4,5)
(i,j,k,l)
(2,3,4,5)
a[i][j][k][l]
a = np.arange(1,121).reshape(2,3,4,5)
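A few sample accesses on this 4-D array, as a sketch. The particular positions are chosen only for illustration:

```python
import numpy as np

a = np.arange(1,121).reshape(2,3,4,5)
print(a.shape)         # (2, 3, 4, 5)

# a[i][j][k][l]: i selects the 3-D array, j the 2-D array, k the row, l the column
print(a[0][0][0][0])   # 1 (first element)
print(a[0][1][2][3])   # 34
print(a[1][2][3][4])   # 120 (last element)
print(a[-1,-1,-1,-1])  # 120
```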
Accessing elements of ndarray By using slice operator:
------------------------------------------------------
Python's slice operator:
------------------------
Slice -->small piece|part
l = [10,20,30,40,50,60,70,80,90,100]
syntax-1:
l[begin:end]
It returns elements from begin index to end-1 index
l[:]--->complete list
>>> l
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
>>> l[2:10000]
[30, 40, 50, 60, 70, 80, 90, 100]
>>> l[10000:20000]
[]
Syntax-2:
l[begin:end:step]
Default value for step is: 1
>>> l = [10,20,30,40,50,60,70,80,90,100]
>>> l[2:8]
[30, 40, 50, 60, 70, 80]
>>> l[2:8:1]
[30, 40, 50, 60, 70, 80]
>>> l[2:8:2]
[30, 50, 70]
>>> l[2:8:3]
[30, 60]
>>> l[::3]
[10, 40, 70, 100]
Note:
1. For begin,end and step we can take both positive and negative values.
But for step we cannot take zero
>>> l[2:7:0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: slice step cannot be zero
If step value is +ve then we have to consider elements from begin to end-1 in
forward direction.
If step value is -ve then we have to consider elements from begin to end+1 in
backward direction.
>>> l[::-1]
[100, 90, 80, 70, 60, 50, 40, 30, 20, 10]
>>> l[::-2]
[100, 80, 60, 40, 20]
>>> a = np.arange(10,101,10)
>>> a
array([ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
>>> a[2:5]
array([30, 40, 50])
>>> a[::1]
array([ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
>>> a[::-1]
array([100, 90, 80, 70, 60, 50, 40, 30, 20, 10])
>>> a[::-2]
array([100, 80, 60, 40, 20])
a = np.array([[10,20],[30,40],[50,60]])
>>> a[0:1,:]
array([[10, 20]])
a[::2,:]
a[0:2,1:2]
a[:2,1:]
>>> a = np.array([[10,20],[30,40],[50,60]])
>>> a
array([[10, 20],
[30, 40],
[50, 60]])
>>> a[0:1,:]
array([[10, 20]])
>>> a[0,:]
array([10, 20])
>>> a[::2,:]
array([[10, 20],
[50, 60]])
>>> a[0:2,1:2]
array([[20],
[40]])
>>> a[:2,1:]
array([[20],
[40]])
eg-2:
a = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
>>> a = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
>>> a
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12],
[13, 14, 15, 16]])
>>> a[0:2,:]
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
>>> a[:2,:]
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
>>> a[::3,:]
array([[ 1, 2, 3, 4],
[13, 14, 15, 16]])
>>> a[:,0:2]
array([[ 1, 2],
[ 5, 6],
[ 9, 10],
[13, 14]])
>>> a[:,::2]
array([[ 1, 3],
[ 5, 7],
[ 9, 11],
[13, 15]])
>>> a[1:3,1:3]
array([[ 6, 7],
[10, 11]])
>>> a[::3,::3]
array([[ 1, 4],
[13, 16]])
2 --->number of 2D arrays
3--->The number of rows
4--->The number of columns
l = [[[1,2,3,4],[5,6,7,8],[9,10,11,12]],[[13,14,15,16],[17,18,19,20],[21,22,23,24]]]
a = np.array(l)
a[i,j,k]
a[begin:end:step,begin:end:step,begin:end:step]
a[:,:,0:1]
a[:,:,:1]
a[:,0:1,:]
a[:,:1,:]
a[:,::2,:]
begin:end:step
::2
a[:,0:2,1:3]
a[:,::2,::3]
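The results of these 3-D slices are not shown in the notes. A sketch that evaluates some of them on the same array (tolist() is used only to keep the printed output on one line):

```python
import numpy as np

l = [[[1,2,3,4],[5,6,7,8],[9,10,11,12]],[[13,14,15,16],[17,18,19,20],[21,22,23,24]]]
a = np.array(l)

# first column of every 2-D array; the result is still 3-D
print(a[:,:,0:1].shape)        # (2, 3, 1)

# first row of every 2-D array
print(a[:,0:1,:].tolist())     # [[[1, 2, 3, 4]], [[13, 14, 15, 16]]]

# rows 0 and 1, columns 1 and 2 of every 2-D array
print(a[:,0:2,1:3].tolist())   # [[[2, 3], [6, 7]], [[14, 15], [18, 19]]]

# alternate rows, every 3rd column
print(a[:,::2,::3].tolist())   # [[[1, 4], [9, 12]], [[13, 16], [21, 24]]]
```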
Advanced Indexing:
------------------
By using index, we can access only one element at a time.
a[i], a[i][j], a[i][j][k]
By using slice operator we can access multiple elements at a time, but all
elements should be in order.
a[begin:end:step]
a[begin:end:step,begin:end:step]
a[begin:end:step,begin:end:step,begin:end:step]
a = np.arange(10,101,10)
[2,4,5,8]
1st way:
-------
create ndarray with required indices.
indices = np.array([2,4,5,8])
>>> a
array([ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
>>> indices
array([2, 4, 5, 8])
>>> a[indices]
array([30, 50, 60, 90])
2nd way:
--------
l = [2,4,5,8]
a[l]
>>> l = [2,4,5,8]
>>> a[l]
array([30, 50, 60, 90])
>>> a[[0,4,6,9]]
array([ 10, 50, 70, 100])
a[[0,9,4,6]]
Syntax:
-------
a[[row_indices],[column_indices]]
a[[0,1,2,3],[0,1,2,3]]
It selects elements from (0,0),(1,1),(2,2) and (3,3)
L-shape
>>> a[[0,1,2,3,3,3,3],[0,0,0,0,1,2,3]]
array([ 1, 5, 9, 13, 14, 15, 16])
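Both selections can be reproduced as follows. This sketch assumes a is the 4x4 array holding 1 to 16 used in the slicing examples earlier:

```python
import numpy as np

a = np.arange(1,17).reshape(4,4)

# main diagonal: positions (0,0),(1,1),(2,2),(3,3)
print(a[[0,1,2,3],[0,1,2,3]])              # [ 1  6 11 16]

# L-shape: the first column, then the remaining elements of the last row
print(a[[0,1,2,3,3,3,3],[0,0,0,0,1,2,3]])  # [ 1  5  9 13 14 15 16]
```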
Observations:
-------------
1. a[[0,1,2],[0,1]]
2. a[[0,1],[0,1,2]]
3. a[[0,1,2],[0]]
>>> a[[0,1,2],[0]]
array([1, 5, 9])
4. a[[0],[0,1,2]]
>>> a[[0],[0,1,2]] --->(0,0),(0,1),(0,2)
array([1, 2, 3])
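The four observations can be run together. In this sketch (again assuming the 4x4 array holding 1 to 16), the first two cases raise an error because the row and column lists have different lengths and neither has length 1:

```python
import numpy as np

a = np.arange(1,17).reshape(4,4)

# 3 row indices with 2 column indices: the lists cannot be paired up
try:
    a[[0,1,2],[0,1]]
except IndexError as e:
    print('IndexError:', e)

# 2 row indices with 3 column indices: same problem
try:
    a[[0,1],[0,1,2]]
except IndexError as e:
    print('IndexError:', e)

# a single index is paired with every index of the other list
print(a[[0,1,2],[0]])   # [1 5 9]
print(a[[0],[0,1,2]])   # [1 2 3]
```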
l = [[[1,2,3,4],[5,6,7,8],[9,10,11,12]],[[13,14,15,16],[17,18,19,20],[21,22,23,24]]]
a = np.array(l)
a[i][j][k]
i represents the index of 2-D array
j represents row index
k represents column index
Syntax:
-------
a[[indices of 2d array],[row indices],[column indices ]]
a[[0,1],[1,1],[2,1]]
The selected elements will be present at: (0,1,2) and (1,1,1)
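Evaluating this selection on the 3-D array defined above gives the two elements at those positions. A short sketch:

```python
import numpy as np

l = [[[1,2,3,4],[5,6,7,8],[9,10,11,12]],[[13,14,15,16],[17,18,19,20],[21,22,23,24]]]
a = np.array(l)

# positions (0,1,2) and (1,1,1)
print(a[[0,1],[1,1],[2,1]])   # [ 7 18]
print(a[0][1][2], a[1][1][1]) # 7 18
```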
Syntax: array[boolean_array]
In the boolean array, wherever True is present, the corresponding value will be
selected.
a = np.array([10,20,30,40])
>>> a = np.array([10,20,30,40])
>>> a
array([10, 20, 30, 40])
>>> boolean_array=np.array([True,False,False,True])
>>> boolean_array
array([ True, False, False, True])
>>> a[boolean_array]
array([10, 40])
a = np.array([10,20,30,40])
Select elements which are greater than 25
[False,False,True,True]
>>> a>25
array([False, False, True, True])
>>> b_a = a>25
>>> a[b_a]
array([30, 40])
>>> a[a>25]
array([30, 40])
a = np.array([10,-5,20,40,-3,-1,75])
>>> a
array([10, -5, 20, 40, -3, -1, 75])
>>> a[a<0]
array([-5, -3, -1])
>>> a[a>0]
array([10, 20, 40, 75])
>>> a[a%2==0]
array([10, 20, 40])
>>> a[a%2!=0]
array([-5, -3, -1, 75])
>>> a[a%5==0]
array([10, -5, 20, 40, 75])
>>> a = np.arange(1,26).reshape(5,5)
>>> a
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20],
[21, 22, 23, 24, 25]])
>>> a[a%2==0]
array([ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24])
>>> a[a%10==0]
array([10, 20])
>>> l = [10,20,30,40]
>>> l2=l[::]
>>> l2
[10, 20, 30, 40]
>>> l
[10, 20, 30, 40]
>>> l2
[10, 20, 30, 40]
>>> l2[1]=7777
>>> l2
[10, 7777, 30, 40]
>>> l
[10, 20, 30, 40]
>>> l[0]=8888
>>> l
[8888, 20, 30, 40]
>>> l2
[10, 7777, 30, 40]
a = np.arange(10,101,10)
>>> a = np.arange(10,101,10)
>>> a
array([ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
>>> b = a[0:4]
>>> b
array([10, 20, 30, 40])
>>> a[0]=7777
>>> a
array([7777, 20, 30, 40, 50, 60, 70, 80, 90, 100])
>>> b
array([7777, 20, 30, 40])
>>> b[1]=8888
>>> b
array([7777, 8888, 30, 40])
>>> a
array([7777, 8888, 30, 40, 50, 60, 70, 80, 90, 100])
Case-3: Advanced Indexing and Condition Based Selection:
--------------------------------------------------------
It will select required elements based on provided index or condition and with
those elements a new 1-D array object will be created.
>>> a = np.arange(10,101,10)
>>> a
array([ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
>>> b = a[[0,2,5]]
>>> b
array([10, 30, 60])
>>> a
array([ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
>>> a[0]=7777
>>> a
array([7777, 20, 30, 40, 50, 60, 70, 80, 90, 100])
>>> b
array([10, 30, 60])
>>> b[1]=9999
>>> b
array([ 10, 9999, 60])
>>> a
array([7777, 20, 30, 40, 50, 60, 70, 80, 90, 100])
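Whether a result shares data with the original array can be checked with np.shares_memory(). A sketch comparing slicing, advanced indexing and an explicit copy() of a slice (the copy() call is an addition here, for the case where an independent slice is wanted):

```python
import numpy as np

a = np.arange(10,101,10)

s = a[0:4]          # slicing: shares data with a
f = a[[0,2,5]]      # advanced indexing: new array object
c = a[0:4].copy()   # explicit copy of a slice: new array object

print(np.shares_memory(a, s))   # True
print(np.shares_memory(a, f))   # False
print(np.shares_memory(a, c))   # False

# a change in a is visible only through the slice
a[0] = 7777
print(s[0], f[0], c[0])   # 7777 10 10
```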
----------------------------
Summary of Syntaxes:
--------------------
1. Basic Indexing:
------------------
a[i],a[i][j],a[i][j][k]
a[i],a[i,j],a[i,j,k]
2. Slicing:
-----------
a[begin:end:step] for 1-D
a[begin:end:step,begin:end:step] for 2-D
a[begin:end:step,begin:end:step,begin:end:step] for 3-D
3. Advanced Indexing:
---------------------
1.a[x] for 1-D array---> x can be ndarray or list which contains indices.
eg: a[[0,1,2]]
2. a[[row indices],[column indices]] for 2-D
3. a[[indices of 2D array],[row indices],[column indices]] for 3-D
3 ways
1. By using Python's loops concept
2. By using nditer() function
3. By using ndenumerate() function
2-D array:
---------
import numpy as np
a = np.array([[10,20,30],[40,50,60],[70,80,90]])
for x in np.nditer(a):
    print(x)
3-D array:
----------
import numpy as np
a = np.array([[[10,20],[30,40]],[[50,60],[70,80]]])
for x in np.nditer(a):
    print(x)
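import numpy as np
a = np.array([[10,20,30],[40,50,60],[70,80,90]])
# buffered mode with op_dtypes lets nditer() yield the elements as floats
for x in np.nditer(a,flags=['buffered'],op_dtypes=['float']):
    print(x)
print(a)  # the original array is unchanged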
D:\durgaclasses>py test.py
10 element present at index/position:(0, 0)
20 element present at index/position:(0, 1)
30 element present at index/position:(0, 2)
40 element present at index/position:(1, 0)
50 element present at index/position:(1, 1)
60 element present at index/position:(1, 2)
70 element present at index/position:(2, 0)
80 element present at index/position:(2, 1)
90 element present at index/position:(2, 2)
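The script that produced the output above is not part of these notes. A sketch with ndenumerate() that prints the same lines, assuming the same 3x3 array as in the nditer() examples:

```python
import numpy as np

a = np.array([[10,20,30],[40,50,60],[70,80,90]])

# ndenumerate() yields (position, element) pairs
for pos, element in np.ndenumerate(a):
    print(f'{element} element present at index/position:{pos}')
```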
Basic Indexing
Slicing
Advanced Indexing
Condition Based Selection
Arithmetic Operators:
---------------------
+,-,*,/,//,%,**
10+20 --->30
20-10--->10
10*20--->200
10%2-->0
10**2 --->100
10/2 --->5.0
10/3-->3.333333
10//2-->5
10//3-->3
10.0//2-->5.0
10.0//3--->3.0
15/4 --->3.75
15.0/4-->3.75
15//4--->3
15.0//4--->3.0
>>> 10/0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero
>>> 0/0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero
>>> a = np.arange(6)
>>> a
array([0, 1, 2, 3, 4, 5])
>>> a+2
array([2, 3, 4, 5, 6, 7])
>>> a-2
array([-2, -1, 0, 1, 2, 3])
>>> a/0
<stdin>:1: RuntimeWarning: divide by zero encountered in true_divide
<stdin>:1: RuntimeWarning: invalid value encountered in true_divide
array([nan, inf, inf, inf, inf, inf])
l = [10,20,30,40,50,60]
l/2--->TypeError: unsupported operand type(s) for /: 'list' and 'int'
>>> a
array([1, 2, 3, 4])
>>> b
array([10, 20, 30, 40])
>>> a+b
array([11, 22, 33, 44])
>>> a-b
array([ -9, -18, -27, -36])
>>> a*b
array([ 10, 40, 90, 160])
>>> b/a
array([10., 10., 10., 10.])
>>> b//a
array([10, 10, 10, 10], dtype=int32)
eg-2:
>>> a = np.array([[1,2],[3,4]])
>>> b = np.array([[5,6],[7,8]])
>>> a
array([[1, 2],
[3, 4]])
>>> b
array([[5, 6],
[7, 8]])
>>> a+b
array([[ 6, 8],
[10, 12]])
>>> a-b
array([[-4, -4],
[-4, -4]])
>>> a*b
array([[ 5, 12],
[21, 32]])
>>> b/a
array([[5. , 3. ],
[2.33333333, 2. ]])
eg-3:
a = np.array([10,20,30,40])
b = np.array([10,20,30,40,50])
>>> a+b
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: operands could not be broadcast together with shapes (4,) (5,)
a+b ===>np.add(a,b)
a-b ===>np.subtract(a,b)
a*b ===>np.multiply(a,b)
a/b ===>np.divide(a,b)
a//b ===>np.floor_divide(a,b)
a%b ===>np.mod(a,b)
a**b ===>np.power(a,b)
>>> a
array([10, 20, 30, 40])
>>> b = np.array([10,20,30,40])
>>> a
array([10, 20, 30, 40])
>>> b
array([10, 20, 30, 40])
>>> np.add(a,b)
array([20, 40, 60, 80])
>>> np.subtract(a,b)
array([0, 0, 0, 0])
>>> np.multiply(a,b)
array([ 100, 400, 900, 1600])
>>> np.divide(a,b)
array([1., 1., 1., 1.])
>>> np.floor_divide(a,b)
array([1, 1, 1, 1], dtype=int32)
>>> np.mod(a,b)
array([0, 0, 0, 0], dtype=int32)
Note: The functions which operate element by element on the whole array are
called universal functions (ufuncs). Hence all the above functions are ufuncs.
Broadcasting:
-------------
Even though dimensions, shapes and sizes are different, some arithmetic
operations are still allowed. This is because of broadcasting.
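Rule-1:
-------
If the two arrays have a different number of dimensions, the shape of the array
with fewer dimensions is padded with 1s on its left side until both have the
same number of dimensions.
eg-1: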
Before:
(4,3)--->2D
(3,)---->1D
After:
(4,3)--->2D
(1,3)---->2D
eg-2:
Before:
(3,2,2)--->3D
(3,)---->1D
After:
-----
(3,2,2)--->3D
(1,1,3)---->3D
Rule-2:
-------
If the sizes of the 2 arrays do not match in any dimension, the array with size
equal to 1 in that dimension is expanded to match the size of the other array in
that dimension.
If the sizes do not match in any dimension and neither of them is equal to 1,
then we will get an error.
eg-1:
Before
(4,3)--->2D
(1,3)---->2D
After
(4,3)--->2D
(4,3)---->2D
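eg-2:
Before:
(3,2,2)--->3D
(1,1,3)---->3D
The first two dimensions can be expanded (1--->3 and 1--->2), but in the last
dimension the sizes are 2 and 3. They do not match and neither is 1, so
broadcasting fails and we will get an error.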
Note: The data will be reused from the same input array.
If the rows are required then reuse existing rows.
If columns are required then reuse existing columns.
inputs: 3-D,1-D
output: 3-D
eg-1:
a = np.array([10,20,30,40])
b = np.array([1,2,3])
a+b
(4,)
(3,)
>>> a = np.array([10,20,30,40])
>>> b = np.array([1,2,3])
>>> a+b
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: operands could not be broadcast together with shapes (4,) (3,)
eg-2:
----
a = np.array([10,20,30])
b = np.array([40])
Rule-1: satisfied
Rule-2:
(3,)
(3,)
>>> a = np.array([10,20,30])
>>> b = np.array([40])
>>> a+b
array([50, 60, 70])
eg-3:
----
a = np.array([[10,20],[30,40],[50,60]])--->2-D shape:(3,2)
b = np.array([10,20])--->1-D shape:(2,)
Rule-1:
Before:
(3,2)
(2,)
After:
(3,2)
(1,2)
Rule-2:
Before:
(3,2)
(1,2)
After:
(3,2)
(3,2)
>>> a = np.array([[10,20],[30,40],[50,60]])
>>> b = np.array([10,20])
>>> a
array([[10, 20],
[30, 40],
[50, 60]])
>>> b
array([10, 20])
>>> a+b
array([[20, 40],
[40, 60],
[60, 80]])
>>> a-b
array([[ 0, 0],
[20, 20],
[40, 40]])
>>> a*b
array([[ 100, 400],
[ 300, 800],
[ 500, 1200]])
>>> a/b
array([[1., 1.],
[3., 2.],
[5., 3.]])
>>> a//b
array([[1, 1],
[3, 2],
[5, 3]], dtype=int32)
>>> a%b
array([[0, 0],
[0, 0],
[0, 0]], dtype=int32)
eg-4:
-----
a = np.array([[10],[20],[30]]) ---->2-D, shape:(3,1)
b = np.array([10,20,30]) ---->1-D, shape:(3,)
Rule-1:
Before:
(3,1)
(3,)
After:
(3,1)
(1,3)
Rule-2:
Before:
(3,1)
(1,3)
After:
(3,3)
(3,3)
>>> a = np.array([[10],[20],[30]])
>>> b = np.array([10,20,30])
>>> a
array([[10],
[20],
[30]])
>>> b
array([10, 20, 30])
>>> a+b
array([[20, 30, 40],
[30, 40, 50],
[40, 50, 60]])
1. reshape():
-------------
np.reshape(array,shape)
array.reshape(shape)
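a = np.arange(24) # size: 24
b = np.reshape(a,(2,3,-1)) # -1 is taken as 4 --->shape (2,3,4)
b = np.reshape(a,(2,-1,4)) # -1 is taken as 3 --->shape (2,3,4)
b = np.reshape(a,(-1,3,4)) # -1 is taken as 2 --->shape (2,3,4)
b = np.reshape(a,(5,-1)) # ValueError, because 24 is not divisible by 5
b = np.reshape(a,(-1,-1)) # ValueError, because -1 can be used only once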
eg-1:
>>> a = np.arange(12).reshape(3,4)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> np.reshape(a,(12,))
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
>>> np.reshape(a,(12,),'C')
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
>>> np.reshape(a,(12,),'F')
array([ 0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11])
eg-2:
>>> a = np.arange(24)
>>> a
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23])
>>> np.reshape(a,(6,4))
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]])
>>> np.reshape(a,(6,4),'C')
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]])
>>> np.reshape(a,(6,4),'F')
array([[ 0, 6, 12, 18],
[ 1, 7, 13, 19],
[ 2, 8, 14, 20],
[ 3, 9, 15, 21],
[ 4, 10, 16, 22],
[ 5, 11, 17, 23]])
Conclusions:
------------
1. To reshape array without changing data.
2. The sizes must be matched.
3. We can use either numpy library function or ndarray class method.
np.reshape()
a.reshape()
4. It won't copy the data wherever possible, we will just get a view (a copy is
made only when a view cannot be created).
5. We can use -1 in unknown dimension, but only once.
6. order: 'C','F'
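The conclusions above can be checked with a short sketch (the array and variable names are only for illustration). It shows -1 being worked out automatically, the view behaviour for a contiguous array, and the two error cases.

```python
import numpy as np

a = np.arange(24)

# -1 asks NumPy to work out that dimension from the total size.
b = a.reshape(2, 3, -1)
print(b.shape)                  # (2, 3, 4)

# For a contiguous array like this one, reshape gives a view on the same data.
print(np.shares_memory(a, b))   # True
b[0, 0, 0] = 999
print(a[0])                     # 999

# Sizes must match: 24 elements cannot fill 5 equal rows.
try:
    a.reshape(5, -1)
except ValueError as err:
    print("ValueError:", err)

# -1 may be used only once.
try:
    a.reshape(-1, -1)
except ValueError as err:
    print("ValueError:", err)
```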
resize() function:
------------------
output array: can be any dimension,any shape,any size
a = np.arange(10)
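A minimal sketch of the two forms of resize, under the assumption that the array owns its data (as an array from np.arange does). The refcheck=False argument is used only so that the in-place call also runs when other references to the array exist, for example in an interactive shell.

```python
import numpy as np

a = np.arange(10)

# np.resize() is a library function: it returns a new array and
# repeats the input data when the new size is larger.
b = np.resize(a, (3, 5))
print(b)

# Smaller target: only the first elements are used.
c = np.resize(a, (2, 3))
print(c)

# ndarray.resize() is a method: it changes the array in place,
# returns None, and pads the extra places with zeros.
d = np.arange(4)
d.resize((2, 3), refcheck=False)
print(d)
```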
1. reshape()
2. resize()
3. flatten():
-------------
1-D,2-D,3-D,n-D
1. convert any n-D array to 1-D array.
2. It is a method present in the ndarray class, but not a numpy library function.
a.flatten()--->valid
np.flatten()-->invalid
3. a.flatten(order='C')
C-style====>row major order
F-style===>column major order
4. It will create a new array and returns it (ie copy but not view)
5. The output of flatten method is always 1D array
>>> a = np.arange(6).reshape(3,2)
>>> a
array([[0, 1],
[2, 3],
[4, 5]])
>>> a.flatten()
array([0, 1, 2, 3, 4, 5])
>>> a
array([[0, 1],
[2, 3],
[4, 5]])
>>> a.flatten('F')
array([0, 2, 4, 1, 3, 5])
>>> a
array([[0, 1],
[2, 3],
[4, 5]])
>>> b = a.flatten()
>>> b
array([0, 1, 2, 3, 4, 5])
>>> a[0][0]=7777
>>> a
array([[7777, 1],
[ 2, 3],
[ 4, 5]])
>>> b
array([0, 1, 2, 3, 4, 5])
>>> a = np.arange(1,19).reshape(3,3,2)
>>> a.ndim
3
>>> a
array([[[ 1, 2],
[ 3, 4],
[ 5, 6]],
[[ 7, 8],
[ 9, 10],
[11, 12]],
[[13, 14],
[15, 16],
[17, 18]]])
>>> b = a.flatten()
>>> b
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18])
1. reshape()
2. resize()
3. flatten()
4. flat variable
5. ravel() function
-------------------
It is the same as the flatten function except that it returns a view wherever
possible, but not a copy.
>>> a = np.arange(24).reshape(2,3,4)
>>> a
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
>>> a = np.arange(18).reshape(6,3)
>>> a
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]])
>>> b = np.ravel(a)
>>> b
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17])
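The difference between flatten() and ravel() can be seen in a short sketch (variable names are only for illustration). np.shares_memory is used to check whether two arrays use the same data.

```python
import numpy as np

a = np.arange(6).reshape(3, 2)

f = a.flatten()   # always a copy
r = a.ravel()     # a view here, because 'a' is contiguous

print(np.shares_memory(a, f))   # False
print(np.shares_memory(a, r))   # True

a[0, 0] = 7777
print(f[0])   # 0     (the copy is unaffected)
print(r[0])   # 7777  (the view sees the change)

# When a view is not possible, ravel() falls back to a copy.
t = a.T.ravel()
print(np.shares_memory(a, t))   # False
```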
6. transpose()
7. swapaxes()
6. transpose():
---------------
To find the transpose of the given ndarray.
np.transpose(a, axes=None)
Reverse or permute the axes of an array; returns the modified array.
>>> a = np.arange(1,5).reshape(2,2)
>>> a
array([[1, 2],
[3, 4]])
>>> np.transpose(a)
array([[1, 3],
[2, 4]])
***Note: No change in data and hence it returns only view but not copy.
>>> a = np.arange(6)
>>> a
array([0, 1, 2, 3, 4, 5])
>>> np.transpose(a)
array([0, 1, 2, 3, 4, 5])
4-D array:
---------
a --->(2,3,4,5)
np.transpose(a) ---->(5,4,3,2)
axes parameter:
---------------
If we are not using axes parameter, then dimensions will be reversed.
axes parameter describes in which order we have to take axes.
It is very helpful for 3-D and 4-D arrays.
np.transpose(a) --->(4,3,2)
axis-0---->number of rows
axis-1---->number of columns
np.transpose(a,axes=(0,1))
np.transpose(a,axes=(1,0))
Note: If we repeat same axis multiple times then we will get error.
for 3-D array:(2,3,4)
np.transpose(a,axes=(0,2,2))
ValueError: repeated axis in transpose
a.transpose(*axes)
eg-1:
a = np.arange(24).reshape(2,3,4)
b = a.transpose()
>>> b = a.transpose((2,0,1))
>>> b.shape
(4, 2, 3)
a.T also works (it is the same as a.transpose() without axes)
Note:
1. For 1-D array, there is no effect of transpose() function.
2. If we are not using axes argument, then dimensions will be reversed.
3. If we provide axes argument, then we can specify our own order of axes.
4. Repeated axis in transpose is not allowed.
5. axes argument is more helpful from 3-D array onwards but not for 2-D
arrays.
6. Various possible syntaxes:
1. numpy.transpose(a)
2. numpy.transpose(a,axes=(2,0,1))
3. ndarrayobject.transpose()
4. ndarrayobject.transpose(*axes)
5. ndarrayobject.T
Here lines 1, 3 and 5 are equal with respect to functionality.
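The various syntaxes can be compared in a short sketch on a 3-D array of shape (2,3,4). The variable names are only for illustration.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

print(np.transpose(a).shape)                   # (4, 3, 2)
print(np.transpose(a, axes=(2, 0, 1)).shape)   # (4, 2, 3)
print(a.transpose(1, 0, 2).shape)              # (3, 2, 4)
print(a.T.shape)                               # (4, 3, 2)

# With axes=(2,0,1), element [i, j, k] of 'a' appears at [k, i, j] of 'b'.
b = a.transpose(2, 0, 1)
print(b[3, 1, 2] == a[1, 2, 3])                # True

# The result is a view on the same data, not a copy.
print(np.shares_memory(a, b))                  # True
```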
swapaxes()
----------
input: (2,3,4)
output: (3,2,4),(4,3,2) or (2,4,3), depending on which 2 axes are swapped
a: (2,3,4)
np.swapaxes(a,0,2)-->(4,3,2)
np.swapaxes(a,1,2)-->(2,4,3)
-----------------------------
>>> a = np.arange(6).reshape(3,2)
>>> a
array([[0, 1],
[2, 3],
[4, 5]])
>>> np.swapaxes(a,0,1)
array([[0, 2, 4],
[1, 3, 5]])
>>> np.swapaxes(a,1,0)
array([[0, 2, 4],
[1, 3, 5]])
swapaxes(...)
a.swapaxes(axis1, axis2)
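A short sketch of swapaxes() on a 3-D array of shape (2,3,4), assuming nothing beyond the calls shown in these notes. It also shows that swapping two axes is a special case of transpose().

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

print(np.swapaxes(a, 0, 1).shape)   # (3, 2, 4)
print(np.swapaxes(a, 0, 2).shape)   # (4, 3, 2)
print(a.swapaxes(1, 2).shape)       # (2, 4, 3)

# Swapping two axes is a transpose with just those two positions exchanged.
same = np.array_equal(np.swapaxes(a, 0, 2), np.transpose(a, axes=(2, 1, 0)))
print(same)                         # True
```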
1. reshape()
2. resize()
3. flatten()
4. flat variable
5. ravel()
6. transpose()
7. swapaxes()
Joining of multiple ndarrays into a single array:
-------------------------------------------------
It is something like join queries in Oracle.
1. concatenate()
2. stack()
3. vstack()
4. hstack()
5. dstack()
Rules:
------
1. We can join any number of arrays, but all arrays should be of same
dimension.
2. The sizes of all axes, except concatenation axes must be matched.
3. The result of concatenation and out must have same shape.
>>> np.concatenate((a,b))
array([0, 1, 2, 3, 0, 1, 2, 3, 4])
>>> a
array([0, 1, 2, 3])
>>> b
array([0, 1, 2, 3, 4])
>>> np.concatenate((a,b),dtype='float')
array([0., 1., 2., 3., 0., 1., 2., 3., 4.])
>>> np.concatenate((a,b),dtype='str')
array(['0', '1', '2', '3', '0', '1', '2', '3', '4'], dtype='<U11')
***Note: We cannot use dtype and out simultaneously, because out array has
its own dtype.
>>> a = np.arange(4)
>>> b = np.arange(5)
>>> c = np.empty(9)
>>> np.concatenate((a,b),out=c,dtype='int')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<__array_function__ internals>", line 5, in concatenate
TypeError: concatenate() only takes `out` or `dtype` as an argument, but both
were provided.
>>> a = np.arange(5)
>>> b = np.arange(12).reshape(3,4)
>>> np.concatenate(a,b)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<__array_function__ internals>", line 5, in concatenate
TypeError: only integer scalar arrays can be converted to a scalar index
Joining of 2-D arrays:
----------------------
For 2-D array the existing axes are: axis-0 (rows) and axis-1 (columns)
eg-1:
a = np.array([[10,20],[30,40],[50,60]])
b = np.array([[70,80],[90,100]])
>>> a
array([[10, 20],
[30, 40],
[50, 60]])
>>> b
array([[ 70, 80],
[ 90, 100]])
-------------------
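For the two arrays of eg-1, a sketch of what each axis choice does (the names rows_joined and flat are only for illustration):

```python
import numpy as np

a = np.array([[10, 20], [30, 40], [50, 60]])   # shape (3, 2)
b = np.array([[70, 80], [90, 100]])            # shape (2, 2)

# axis=0: the rows are joined, so only the column counts (2 and 2) must match.
rows_joined = np.concatenate((a, b), axis=0)
print(rows_joined.shape)   # (5, 2)

# axis=1: the columns are joined, so the row counts (3 and 2) must match.
try:
    np.concatenate((a, b), axis=1)
except ValueError as err:
    print("ValueError:", err)

# axis=None: both arrays are flattened first, so any shapes are accepted.
flat = np.concatenate((a, b), axis=None)
print(flat.shape)          # (10,)
```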
eg-2:
a = np.arange(6).reshape(3,2)
b = np.arange(9).reshape(3,3)
>>> a = np.arange(6).reshape(3,2)
>>> b = np.arange(9).reshape(3,3)
>>> a
array([[0, 1],
[2, 3],
[4, 5]])
>>> b
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> np.concatenate((a,b),axis=0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<__array_function__ internals>", line 5, in concatenate
ValueError: all the input array dimensions for the concatenation axis must
match exactly, but along dimension 1, the array at index 0 has size 2 and the
array at index 1 has size 3
>>> np.concatenate((a,b),axis=1)
array([[0, 1, 0, 1, 2],
[2, 3, 3, 4, 5],
[4, 5, 6, 7, 8]])
a = np.arange(4).reshape(2,2)
b = np.arange(4).reshape(2,2)
>>> a = np.arange(4).reshape(2,2)
>>> b = np.arange(4).reshape(2,2)
>>> a
array([[0, 1],
[2, 3]])
>>> b
array([[0, 1],
[2, 3]])
>>> np.concatenate((a,b),axis=0)
array([[0, 1],
[2, 3],
[0, 1],
[2, 3]])
>>> np.concatenate((a,b),axis=1)
array([[0, 1, 0, 1],
[2, 3, 2, 3]])
>>> np.concatenate((a,b),axis=None)
array([0, 1, 2, 3, 0, 1, 2, 3])
eg-1:
>>> a = np.arange(8).reshape(2,2,2)
>>> b = np.arange(8).reshape(2,2,2)
>>> a
array([[[0, 1],
[2, 3]],
[[4, 5],
[6, 7]]])
>>> b
array([[[0, 1],
[2, 3]],
[[4, 5],
[6, 7]]])
>>> np.concatenate((a,b),axis=0)
array([[[0, 1],
[2, 3]],
[[4, 5],
[6, 7]],
[[0, 1],
[2, 3]],
[[4, 5],
[6, 7]]])
>>> np.concatenate((a,b),axis=1)
array([[[0, 1],
[2, 3],
[0, 1],
[2, 3]],
[[4, 5],
[6, 7],
[4, 5],
[6, 7]]])
>>> np.concatenate((a,b),axis=2)
array([[[0, 1, 0, 1],
[2, 3, 2, 3]],
[[4, 5, 4, 5],
[6, 7, 6, 7]]])
>>> np.concatenate((a,b),axis=None)
array([0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7])
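a: shape (2,3,2)
b: shape (2,3,3)
axis-0--->no (sizes of axis-2 are different)
axis-1--->no (sizes of axis-2 are different)
axis-2---->yes (sizes of axis-0 and axis-1 are matched)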
>>> a = np.arange(12).reshape(2,3,2)
>>> a
array([[[ 0, 1],
[ 2, 3],
[ 4, 5]],
[[ 6, 7],
[ 8, 9],
[10, 11]]])
>>> b = np.arange(18).reshape(2,3,3)
>>> b
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]]])
>>> np.concatenate((a,b),axis=0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<__array_function__ internals>", line 5, in concatenate
ValueError: all the input array dimensions for the concatenation axis must
match exactly, but along dimension 2, the array at index 0 has size 2 and the
array at index 1 has size 3
>>> np.concatenate((a,b),axis=1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<__array_function__ internals>", line 5, in concatenate
ValueError: all the input array dimensions for the concatenation axis must
match exactly, but along dimension 2, the array at index 0 has size 2 and the
array at index 1 has size 3
>>> np.concatenate((a,b),axis=2)
array([[[ 0, 1, 0, 1, 2],
[ 2, 3, 3, 4, 5],
[ 4, 5, 6, 7, 8]],
[[ 6, 7, 9, 10, 11],
[ 8, 9, 12, 13, 14],
[10, 11, 15, 16, 17]]])
>>> np.concatenate((a,b),axis=None)
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4,
5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17])
stack():
--------
stack() joins the input arrays along a new axis.
1-D + 1-D----->2-D
2-D + 2-D---->3-D
Rules:
1. The input arrays must have same shape.
2. The resultant stacked array has one more dimension than the input arrays.
3. Joining will happen along the new axis of the newly created array.
>>> a = np.array([10,20,30])
>>> b = np.array([40,50,60])
>>> a
array([10, 20, 30])
>>> b
array([40, 50, 60])
>>> np.stack((a,b))
array([[10, 20, 30],
[40, 50, 60]])
>>> np.stack((a,b),axis=1)
array([[10, 40],
[20, 50],
[30, 60]])
a = np.array([[1,2,3],[4,5,6]])
b = np.array([[7,8,9],[10,11,12]])
np.stack((a,b),axis=0)
np.stack((a,b))
np.stack((a,b),axis=1)
>>> np.stack((a,b),axis=1)
array([[[ 1, 2, 3],
[ 7, 8, 9]],
[[ 4, 5, 6],
[10, 11, 12]]])
a = np.arange(1,7).reshape(3,2)
b = np.arange(7,13).reshape(3,2)
c = np.arange(13,19).reshape(3,2)
>>> a
array([[1, 2],
[3, 4],
[5, 6]])
>>> b
array([[ 7, 8],
[ 9, 10],
[11, 12]])
>>> c
array([[13, 14],
[15, 16],
[17, 18]])
Based on axis-0:
----------------
In 3-D array axis-0 means the number of 2-d arrays
np.stack((a,b,c),axis=0)
np.stack((a,b,c))
>>> np.stack((a,b,c),axis=0)
array([[[ 1, 2],
[ 3, 4],
[ 5, 6]],
[[ 7, 8],
[ 9, 10],
[11, 12]],
[[13, 14],
[15, 16],
[17, 18]]])
Based on axis-1:
----------------
In 3-D array, axis-1 means the number of rows.
Stacking row wise
np.stack((a,b,c),axis=1)
>>> np.stack((a,b,c),axis=1)
array([[[ 1, 2],
[ 7, 8],
[13, 14]],
[[ 3, 4],
[ 9, 10],
[15, 16]],
[[ 5, 6],
[11, 12],
[17, 18]]])
Based on axis-2:
----------------
in 3-D array axis-2 means the number of columns in every 2-D array.
stacking column wise
np.stack((a,b,c),axis=2)
>>> np.stack((a,b,c),axis=2)
array([[[ 1, 7, 13],
[ 2, 8, 14]],
[[ 3, 9, 15],
[ 4, 10, 16]],
[[ 5, 11, 17],
[ 6, 12, 18]]])
Stacking of three 1-D arrays:
-----------------------------
a = np.arange(4)
b = np.arange(4,8)
c = np.arange(8,12)
>>> a
array([0, 1, 2, 3])
>>> b
array([4, 5, 6, 7])
>>> c
array([ 8, 9, 10, 11])
Based on axis-0:
----------------
axis-0 in 2-D array means the number of rows
np.stack((a,b,c),axis=0)
>>> np.stack((a,b,c),axis=0)
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
Based on axis-1:
---------------
axis-1 in 2-D array means the number of columns
np.stack((a,b,c),axis=1)
>>> np.stack((a,b,c),axis=1)
array([[ 0, 4, 8],
[ 1, 5, 9],
[ 2, 6, 10],
[ 3, 7, 11]])
3. To perform concatenation, all input arrays must have the same number of
dimensions. The sizes of all dimensions except the concatenation axis must be
the same.
3. To perform the stack operation, all input arrays must have the same shape,
i.e. the number of dimensions and the sizes must also be the same.
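The difference between concatenate() and stack() is easiest to see from the result shapes. A minimal sketch, with arrays chosen only for illustration:

```python
import numpy as np

a = np.arange(1, 7).reshape(3, 2)
b = np.arange(7, 13).reshape(3, 2)

# concatenate joins along an existing axis: the number of dimensions stays 2.
print(np.concatenate((a, b), axis=0).shape)   # (6, 2)
print(np.concatenate((a, b), axis=1).shape)   # (3, 4)

# stack joins along a new axis: the result has one more dimension.
print(np.stack((a, b), axis=0).shape)         # (2, 3, 2)
print(np.stack((a, b), axis=1).shape)         # (3, 2, 2)
print(np.stack((a, b), axis=2).shape)         # (3, 2, 2)

# stack needs identical shapes; concatenate only needs the other axes to match.
c = np.arange(4).reshape(2, 2)
print(np.concatenate((a, c), axis=0).shape)   # (5, 2)
try:
    np.stack((a, c))
except ValueError as err:
    print("ValueError:", err)
```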
eg-1:
a = np.array([10,20,30,40])
b = np.array([50,60,70,80])
np.vstack((a,b))
eg-2:
a = np.arange(1,10).reshape(3,3)
b = np.arange(10,16).reshape(2,3)
np.vstack((a,b))
eg-3:
a = np.arange(1,10).reshape(3,3)
b = np.arange(10,16).reshape(3,2)
np.vstack((a,b))
ValueError: all the input array dimensions for the concatenation axis must
match exactly, but along dimension 1, the array at index 0 has size 3 and the
array at index 1 has size 2
a = np.arange(1,25).reshape(2,3,4)
b = np.arange(25,49).reshape(2,3,4)
>>> np.vstack((a,b))
array([[[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]],
array([ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
>>> a = np.arange(1,7).reshape(3,2)
>>> b = np.arange(7,16).reshape(3,3)
>>> a
array([[1, 2],
[3, 4],
[5, 6]])
>>> b
array([[ 7, 8, 9],
[10, 11, 12],
[13, 14, 15]])
>>> np.hstack((a,b))
array([[ 1, 2, 7, 8, 9],
[ 3, 4, 10, 11, 12],
[ 5, 6, 13, 14, 15]])
eg-2:
a = np.arange(1,7).reshape(2,3)
b = np.arange(7,16).reshape(3,3)
>>> a = np.array((1,2,3))
>>> b = np.array((2,3,4))
>>> np.dstack((a,b))
array([[[1, 2],
[2, 3],
[3, 4]]])
>>> a = np.array([[1],[2],[3]])
>>> b = np.array([[2],[3],[4]])
>>> np.dstack((a,b))
array([[[1, 2]],
[[2, 3]],
[[3, 4]]])
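A short sketch comparing vstack(), hstack() and dstack() by result shape. The arrays p and q are only for illustration; the equalities with concatenate() and stack() shown here are for 2-D inputs.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# 1-D inputs of shape (3,)
print(np.vstack((a, b)).shape)   # (2, 3)
print(np.hstack((a, b)).shape)   # (6,)
print(np.dstack((a, b)).shape)   # (1, 3, 2)

# 2-D inputs of shape (3, 2)
p = np.arange(6).reshape(3, 2)
q = np.arange(6, 12).reshape(3, 2)

# vstack joins along axis-0, hstack along axis-1, dstack along a third axis.
print(np.array_equal(np.vstack((p, q)), np.concatenate((p, q), axis=0)))   # True
print(np.array_equal(np.hstack((p, q)), np.concatenate((p, q), axis=1)))   # True
print(np.array_equal(np.dstack((p, q)), np.stack((p, q), axis=2)))         # True
```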
Splitting of ndarrays:
----------------------
We can perform split operation by using the following functions
1. split()
2. vsplit()
3. hsplit()
4. dsplit()
5. array_split()
1. split():
-----------
np.split(array, indices_or_sections, axis=0)
sections:
1. The array is split into sub-arrays of equal size.
2. It returns a list of sub-arrays.
a = np.arange(1,10)
sub_arrays = np.split(a,3) # 3 sub-arrays of 3 elements each
sub_arrays = np.split(a,4) # ValueError: 9 elements cannot be divided into 4 equal parts
We can also split based on axis-1. column wise split (horizontal split)
a = np.arange(1,25).reshape(6,4)
>>> a = np.arange(1,25).reshape(6,4)
>>> a
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12],
[13, 14, 15, 16],
[17, 18, 19, 20],
[21, 22, 23, 24]])
>>> np.split(a,2)
[array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]]), array([[13, 14, 15, 16],
[17, 18, 19, 20],
[21, 22, 23, 24]])]
>>> np.split(a,3,axis=0)
[array([[1, 2, 3, 4],
[5, 6, 7, 8]]), array([[ 9, 10, 11, 12],
[13, 14, 15, 16]]), array([[17, 18, 19, 20],
[21, 22, 23, 24]])]
>>> np.split(a,6)
[array([[1, 2, 3, 4]]), array([[5, 6, 7, 8]]), array([[ 9, 10, 11, 12]]), array([[13, 14,
15, 16]]), array([[17, 18, 19, 20]]), array([[21, 22, 23, 24]])]
>>> np.split(a,4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<__array_function__ internals>", line 5, in split
File "C:\Python38\lib\site-packages\numpy\lib\shape_base.py", line 872, in
split
raise ValueError(
ValueError: array split does not result in an equal division
>>> a
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12],
[13, 14, 15, 16],
[17, 18, 19, 20],
[21, 22, 23, 24]])
>>> np.split(a,2,axis=1)
[array([[ 1, 2],
[ 5, 6],
[ 9, 10],
[13, 14],
[17, 18],
[21, 22]]), array([[ 3, 4],
[ 7, 8],
[11, 12],
[15, 16],
[19, 20],
[23, 24]])]
>>> np.split(a,4,axis=1)
[array([[ 1],
[ 5],
[ 9],
[13],
[17],
[21]]), array([[ 2],
[ 6],
[10],
[14],
[18],
[22]]), array([[ 3],
[ 7],
[11],
[15],
[19],
[23]]), array([[ 4],
[ 8],
[12],
[16],
[20],
[24]])]
>>> np.split(a,3,axis=1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<__array_function__ internals>", line 5, in split
File "C:\Python38\lib\site-packages\numpy\lib\shape_base.py", line 872, in
split
raise ValueError(
ValueError: array split does not result in an equal division
a = np.arange(1,19).reshape(3,6)
np.split(a,[1,3,5],axis=1)
np.split(a,[2,4,4],axis=1)
np.split(a,[0,2,6],axis=1)
np.split(a,[1,5,3],axis=1)
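When a list of indices is passed, the split happens before each index. The sketch below counts the columns of every sub-array for the four calls above; the helper name `column_counts` is hypothetical and used only here. Repeated or out-of-order indices give empty sub-arrays.

```python
import numpy as np

a = np.arange(1, 19).reshape(3, 6)

def column_counts(indices):
    """Return how many columns each sub-array gets for the given split indices."""
    return [part.shape[1] for part in np.split(a, indices, axis=1)]

print(column_counts([1, 3, 5]))   # [1, 2, 2, 1]
print(column_counts([2, 4, 4]))   # [2, 2, 0, 2]
print(column_counts([0, 2, 6]))   # [0, 2, 4, 0]
print(column_counts([1, 5, 3]))   # [1, 4, 0, 3]
```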
Splitting by vsplit():
---------------------
vsplit means vertical split, i.e. row wise split
split is based on axis-0
vsplit(array, sections_or_indices)
a = np.arange(10)
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.vsplit(a,2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<__array_function__ internals>", line 5, in vsplit
File "C:\Python38\lib\site-packages\numpy\lib\shape_base.py", line 990, in
vsplit
raise ValueError('vsplit only works on arrays of 2 or more dimensions')
ValueError: vsplit only works on arrays of 2 or more dimensions
splitting by hsplit():
-----------------------
split horizontally (column wise)
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.hsplit(a,2)
[array([0, 1, 2, 3, 4]), array([5, 6, 7, 8, 9])]
>>> np.hsplit(a,10)
[array([0]), array([1]), array([2]), array([3]), array([4]), array([5]), array([6]),
array([7]), array([8]), array([9])]
>>> np.hsplit(a,3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<__array_function__ internals>", line 5, in hsplit
File "C:\Python38\lib\site-packages\numpy\lib\shape_base.py", line 942, in
hsplit
return split(ary, indices_or_sections, 0)
File "<__array_function__ internals>", line 5, in split
File "C:\Python38\lib\site-packages\numpy\lib\shape_base.py", line 872, in
split
raise ValueError(
ValueError: array split does not result in an equal division
>>> a = np.arange(24).reshape(4,6)
>>> a
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23]])
>>> np.hsplit(a,[2,4])
[array([[ 0, 1],
[ 6, 7],
[12, 13],
[18, 19]]), array([[ 2, 3],
[ 8, 9],
[14, 15],
[20, 21]]), array([[ 4, 5],
[10, 11],
[16, 17],
[22, 23]])]
>>> np.hsplit(a,[1,4])
[array([[ 0],
[ 6],
[12],
[18]]), array([[ 1, 2, 3],
[ 7, 8, 9],
[13, 14, 15],
[19, 20, 21]]), array([[ 4, 5],
[10, 11],
[16, 17],
[22, 23]])]
In 3-D array:
axis-0--->number of 2-D arrays
axis-1--->number of rows
axis-2-->number of columns
>>> a = np.arange(24).reshape(2,3,4)
>>> a
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
>>> np.dsplit(a,2)
[array([[[ 0, 1],
[ 4, 5],
[ 8, 9]],
[[12, 13],
[16, 17],
[20, 21]]]), array([[[ 2, 3],
[ 6, 7],
[10, 11]],
[[14, 15],
[18, 19],
[22, 23]]])]
>>> a
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
>>> #[1,3]
>>> np.dsplit(a,[1,3])
[array([[[ 0],
[ 4],
[ 8]],
[[12],
[16],
[20]]]), array([[[ 1, 2],
[ 5, 6],
[ 9, 10]],
[[13, 14],
[17, 18],
[21, 22]]]), array([[[ 3],
[ 7],
[11]],
[[15],
[19],
[23]]])]
array_split() with 10 elements and 3 sections:
1 sub-array of size: 4 and 2 sub-arrays of size: 3
4,3,3
>>> a = np.arange(10,101,10)
>>> a
array([ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
>>> np.split(a,3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<__array_function__ internals>", line 5, in split
File "C:\Python38\lib\site-packages\numpy\lib\shape_base.py", line 872, in
split
raise ValueError(
ValueError: array split does not result in an equal division
>>> np.array_split(a,3)
[array([10, 20, 30, 40]), array([50, 60, 70]), array([ 80, 90, 100])]
eg-2:
x = number of elements, n = number of sections
x=11, n=3 --->sizes (4,4,3)
x=12, n=3 --->sizes (4,4,4)
x=13, n=3 --->sizes (5,4,4)
split(),vsplit(),hsplit(),dsplit(),array_split()
x=6, n=4 --->sizes (2,2,1,1)
>>> a = np.arange(24).reshape(6,4)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]])
>>> np.array_split(a,4)
[array([[0, 1, 2, 3],
[4, 5, 6, 7]]), array([[ 8, 9, 10, 11],
[12, 13, 14, 15]]), array([[16, 17, 18, 19]]), array([[20, 21, 22, 23]])]
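The sizes produced by array_split() follow a simple pattern: x % n sub-arrays of size x//n + 1, and the rest of size x//n. A small sketch, where the helper name `array_split_sizes` is hypothetical and written only to check the pattern against NumPy:

```python
import numpy as np

def array_split_sizes(x, n):
    """Sizes produced when x elements are divided into n sections."""
    base, extra = divmod(x, n)
    # The first 'extra' sections get one additional element.
    return [base + 1] * extra + [base] * (n - extra)

for x, n in [(10, 3), (11, 3), (12, 3), (13, 3), (6, 4)]:
    actual = [len(part) for part in np.array_split(np.arange(x), n)]
    print(x, n, array_split_sizes(x, n), actual)
```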
Summary of split methods:
-------------------------
split()-->Split an array into multiple sub-arrays of equal size.
vsplit()-->Split array into multiple sub-arrays vertically (row wise).
hsplit()-->Split array into multiple sub-arrays horizontally (column-wise).
dsplit()--> Split array into multiple sub-arrays along the 3rd axis (depth).
array_split()-->Split an array into multiple sub-arrays of equal or near-equal
size. Does not raise an exception if an equal division cannot be made.
joining
splitting
Sorting of ndarrays:
--------------------
np.sort(a)
The default sorting algorithm is quicksort. Other algorithms such as merge sort
and heap sort can be chosen with the kind parameter.
>>> a = np.array([70,20,60,10,50,40,30])
>>> a
array([70, 20, 60, 10, 50, 40, 30])
>>> np.sort(a)
array([10, 20, 30, 40, 50, 60, 70])
To sort in descending order:
----------------------------
1st way:
-------
np.sort(a)[::-1]
>>> np.sort(a)[::-1]
array([70, 60, 50, 40, 30, 20, 10])
2nd way:
--------
>>> -np.sort(-a)
array([70, 60, 50, 40, 30, 20, 10])
>>> a = np.array(['cat','rat','bat','vat','dog'])
>>> a
array(['cat', 'rat', 'bat', 'vat', 'dog'], dtype='<U3')
>>> np.sort(a)
array(['bat', 'cat', 'dog', 'rat', 'vat'], dtype='<U3')
a= np.array([[40,20,70],[30,20,60],[70,90,80]])
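For a 2-D array like the one above, sorting happens along an axis. A minimal sketch (the copy b is only for illustration):

```python
import numpy as np

a = np.array([[40, 20, 70], [30, 20, 60], [70, 90, 80]])

print(np.sort(a))              # default axis=-1: each row is sorted
print(np.sort(a, axis=0))      # each column is sorted
print(np.sort(a, axis=None))   # flattened first, then sorted

# np.sort() returns a sorted copy; a.sort() sorts the array in place.
b = a.copy()
b.sort(axis=1)
print(np.array_equal(b, np.sort(a)))   # True
```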
where(...)
where(condition, [x, y])
a = np.array([3,5,7,6,9,4,6,10,15])
>>> a = np.array([3,5,7,6,9,4,6,10,15])
>>> a
array([ 3, 5, 7, 6, 9, 4, 6, 10, 15])
>>> np.where(a==7)
(array([2], dtype=int64),)
eg-2: Find indices where odd numbers present in the given 1-D array?
np.where(a%2 != 0)
>>> np.where(a%2 != 0)
(array([0, 1, 2, 4, 8], dtype=int64),)
If the condition is satisfied, that element will be taken from x, and if the
condition fails, that element will be taken from y.
eg: Replace every even number with 8888 and every odd number with 9999?
b = np.where(a%2==0,8888,9999)
>>> a
array([ 3, 5, 7, 6, 9, 4, 6, 10, 15])
>>> b = np.where(a%2==0,8888,9999)
>>> b
array([9999, 9999, 9999, 8888, 9999, 8888, 8888, 8888, 9999])
b = np.where(a%2 != 0,9999,a)
>>> a
array([ 3, 5, 7, 6, 9, 4, 6, 10, 15])
>>> b = np.where(a%2 != 0,9999)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<__array_function__ internals>", line 5, in where
ValueError: either both or neither of x and y should be given
>>> b = np.where(a%2 != 0,9999,a)
>>> b
array([9999, 9999, 9999, 6, 9999, 4, 6, 10, 9999])
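For a 2-D array, np.where(condition) returns a tuple of 2 ndarrays. The first
ndarray represents row indices and the second ndarray represents column
indices. eg: for a = np.arange(12).reshape(4,3) and the condition a%5==0, the
required elements are present at (0,0),(1,2) and (3,1) index places.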
>>> np.where(a%5==0,9999,a)
array([[9999, 1, 2],
[ 3, 4, 9999],
[ 6, 7, 8],
[ 9, 9999, 11]])
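Both forms of where() on a 2-D array, as a short sketch. The array is taken as np.arange(12).reshape(4,3), which matches the output shown above; the names rows, cols and b are only for illustration.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

# One-argument form: a tuple with one index array per axis.
rows, cols = np.where(a % 5 == 0)
print(rows)   # [0 1 3]
print(cols)   # [0 2 1]
print(list(zip(rows.tolist(), cols.tolist())))   # [(0, 0), (1, 2), (3, 1)]

# Three-argument form: take from x where the condition holds, else from y.
b = np.where(a % 5 == 0, 9999, a)
print(b)
```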
sort()
where()
searchsorted() function:
-----------------------
Internally this function uses the binary search algorithm. Hence we can call
this function only for sorted arrays.
If the array is not sorted then we will get abnormal results.
Note: By default it searches from the left hand side to identify the insertion
point. If we want to search from the right hand side we should use side='right'.
>>> a = np.array([3,5,7,6,7,9,4,10,15,6])
>>> a
array([ 3, 5, 7, 6, 7, 9, 4, 10, 15, 6])
>>> a = np.sort(a)
>>> a
array([ 3, 4, 5, 6, 6, 7, 7, 9, 10, 15])
>>> np.searchsorted(a,6)
3
>>> np.searchsorted(a,6,side='right')
5
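A short sketch of how the insertion point can be used (the names left, right and b are only for illustration). Inserting at the returned index keeps the array sorted.

```python
import numpy as np

a = np.sort(np.array([3, 5, 7, 6, 7, 9, 4, 10, 15, 6]))
# a is now [ 3  4  5  6  6  7  7  9 10 15]

left = np.searchsorted(a, 6)                  # before the existing 6s
right = np.searchsorted(a, 6, side='right')   # after the existing 6s
print(left, right)   # 3 5

# Inserting at the insertion point keeps the array sorted.
b = np.insert(a, left, 6)
print(np.array_equal(b, np.sort(b)))          # True

# Several values can be looked up in one call.
print(np.searchsorted(a, [1, 8, 20]))         # [ 0  7 10]
```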
Summary:
1. sort()--->To sort given array
2. where() --->To perform search and replace operation
3. searchsorted() --->To identify insertion point in the given sorted array.
1. insert()
2. append()
1. insert():
------------
insert(arr, obj, values, axis=None)
Insert values along the given axis before the given indices.
obj--->object that defines index or indices before which the value will be
inserted.
a = np.arange(10)
b = np.insert(a,2,7777)
>>> b = np.insert(a,2,7777)
>>> b
array([ 0, 1, 7777, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b = np.insert(a,[2,5],7777)
>>> b
array([ 0, 1, 7777, 2, 3, 4, 7777, 5, 6, 7, 8,
9])
b = np.insert(a,[2,5],[7777,8888])
eg-4: observations
b = np.insert(a,[2,5],[7777,8888,9999])
ValueError: shape mismatch: value array of shape (3,) could not be broadcast to
indexing result of shape (2,)
b = np.insert(a,[2,5,7],[7777,8888])
ValueError: shape mismatch: value array of shape (2,) could not be broadcast to
indexing result of shape (3,)
b = np.insert(a,[2,5,5],[777,888,999])
>>> b = np.insert(a,[2,5,5],[777,888,999])
>>> b
array([ 0, 1, 777, 2, 3, 4, 888, 999, 5, 6, 7, 8, 9])
b = np.insert(a,25,7777)
IndexError: index 25 is out of bounds for axis 0 with size 10
****Note:
An array should contain only homogeneous elements. If we try to insert an
element of any other type, that element is converted to the array's type
automatically before insertion. If the conversion is not possible then we will
get an error.
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.insert(a,2,123.456)
array([ 0, 1, 123, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.insert(a,2,True)
array([0, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.insert(a,2,'durga')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<__array_function__ internals>", line 5, in insert
File "C:\Python38\lib\site-packages\numpy\lib\function_base.py", line 4640,
in insert
values = array(values, copy=False, ndmin=arr.ndim, dtype=arr.dtype)
ValueError: invalid literal for int() with base 10: 'durga'
>>> np.insert(a,2,10+20j)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<__array_function__ internals>", line 5, in insert
File "C:\Python38\lib\site-packages\numpy\lib\function_base.py", line 4640,
in insert
values = array(values, copy=False, ndmin=arr.ndim, dtype=arr.dtype)
TypeError: can't convert complex to int
Summary:
--------
While inserting elements into 1-D array we have to take care of the following:
eg:
a = np.array([[10,20],[30,40]])
np.insert(a,1,100)
>>> a = np.array([[10,20],[30,40]])
>>> np.insert(a,1,100)
array([ 10, 100, 20, 30, 40])
eg-2:
np.insert(a,1,100,axis=0)
>>> np.insert(a,1,100,axis=0)
array([[ 10, 20],
[100, 100],
[ 30, 40]])
eg-3:
np.insert(a,1,[100,200],axis=0)
>>> np.insert(a,1,[100,200],axis=0)
array([[ 10, 20],
[100, 200],
[ 30, 40]])
>>> np.insert(a,1,[100,200,300],axis=0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<__array_function__ internals>", line 5, in insert
File "C:\Python38\lib\site-packages\numpy\lib\function_base.py", line 4652,
in insert
new[tuple(slobj)] = values
ValueError: could not broadcast input array from shape (1,3) into shape (1,2)
eg:
np.insert(a,1,100,axis=1)
To insert a new column
>>> np.insert(a,1,100,axis=1)
array([[ 10, 100, 20],
[ 30, 100, 40]])
>>> np.insert(a,1,[100,200],axis=1)
array([[ 10, 100, 20],
[ 30, 200, 40]])
np.insert(a,0,[100,200],axis=-1)
>>> np.insert(a,0,[100,200],axis=-1)
array([[100, 10, 20],
[200, 30, 40]])
Syntax:
insert(array,object,values,axis)
append(array,values,axis)
a = np.arange(10)
np.append(a,9999)
>>> np.append(a,9999)
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9999])
np.append(a,[10,20,30])
>>> np.append(a,[10,20,30])
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30])
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
np.append(a,10.5)
>>> np.append(a,10.5)
array([ 0. , 1. , 2. , 3. , 4. , 5. , 6. , 7. , 8. , 9. , 10.5])
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.append(a,'durga')
array(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'durga'],
dtype='<U11')
>>> np.append(a,True)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1])
>>> np.append(a,10+20j)
array([ 0. +0.j, 1. +0.j, 2. +0.j, 3. +0.j, 4. +0.j, 5. +0.j,
6. +0.j, 7. +0.j, 8. +0.j, 9. +0.j, 10.+20.j])
***2. If we are providing axis, then all input arrays must have the same number
of dimensions, and the sizes of all axes other than the provided axis must match.
a = np.array([[10,20],[30,40]])
>>> a
array([[10, 20],
[30, 40]])
new row: [[70,80]]
np.append(a,70)
>>> np.append(a,70)
array([10, 20, 30, 40, 70])
eg-2:
np.append(a,70,axis=0)
ValueError: all the input arrays must have same number of dimensions, but the
array
at index 0 has 2 dimension(s) and the array at index 1 has 0 dimension(s)
eg-3:
np.append(a,[70,80],axis=0)
ValueError: all the input arrays must have same number of dimensions, but the
array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)
eg-4:
np.append(a,[[70,80]],axis=0)
>>> np.append(a,[[70,80]],axis=0)
array([[10, 20],
[30, 40],
[70, 80]])
>>> np.append(a,[[70,80],[90,100]],axis=0)
array([[ 10, 20],
[ 30, 40],
[ 70, 80],
[ 90, 100]])
eg-5:
np.append(a,[[70,80]],axis=1)
ValueError: all the input array dimensions for the concatenation axis must
match exactly, but along dimension 0, the array at index 0 has size 2 and the
array at index 1 has size 1
np.append(a,[[70],[80]],axis=1)
>>> np.append(a,[[70],[80]],axis=1)
array([[10, 20, 70],
[30, 40, 80]])
np.append(a,[[70,80],[90,100]],axis=1)
>>> np.append(a,[[70,80],[90,100]],axis=1)
array([[ 10, 20, 70, 80],
[ 30, 40, 90, 100]])
>>> a = np.arange(10,101,10)
>>> a
array([ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
>>> np.delete(a,3)
array([ 10, 20, 30, 50, 60, 70, 80, 90, 100])
>>> np.delete(a,[0,4,6])
array([ 20, 30, 40, 60, 80, 90, 100])
>>> np.delete(a,np.s_[2:6])
array([ 10, 20, 70, 80, 90, 100])
np.delete(a,range(2,6))
a = np.arange(1,13).reshape(3,4)
>>> a = np.arange(1,13).reshape(3,4)
>>> a
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
>>> np.delete(a,1)
array([ 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
>>> np.delete(a,0,axis=0)
array([[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
>>> np.delete(a,[0,2],axis=0)
array([[5, 6, 7, 8]])
>>> np.delete(a,np.s_[:2],axis=0)
array([[ 9, 10, 11, 12]])
>>> a
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
np.delete(a,0,axis=1)
np.delete(a,[0,2],axis=1)
np.delete(a,np.s_[::3],axis=1)
np.delete(a,np.s_[1:],axis=1)
>>> np.delete(a,3)
array([ 0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23])
np.delete(a,1,axis=0)
np.delete(a,1,axis=1)
To delete 1st indexed row in every 2-D array
np.delete(a,2,axis=2)
np.delete(a,[0,2],axis=2)
np.delete(a,np.s_[1:],axis=2)
>>> a = np.arange(24).reshape(2,3,4)
>>> a
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
>>> np.delete(a,[0,2],axis=2)
array([[[ 1, 3],
[ 5, 7],
[ 9, 11]],
[[13, 15],
[17, 19],
[21, 23]]])
>>> np.delete(a,np.s_[1:],axis=2)
array([[[ 0],
[ 4],
[ 8]],
[[12],
[16],
[20]]])
Case Study:
-----------
Q. Consider the following array:
>>> a = np.arange(12).reshape(4,3)
>>> a
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
Summary:
---------
insert()-->Insert elements into an array at specified index.
append()-->Append elements at the end of an array.
delete()--->Delete elements from an array.
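The three functions in this summary all return a new array and leave the original untouched. A minimal sketch of that behaviour, using a small illustrative array that is not taken from the class examples:

```python
import numpy as np

a = np.array([10, 20, 30])

inserted = np.insert(a, 1, 15)   # new array with 15 placed at index 1
appended = np.append(a, 40)      # new array with 40 added at the end
deleted = np.delete(a, 0)        # new array without the element at index 0

print(inserted)  # [10 15 20 30]
print(appended)  # [10 20 30 40]
print(deleted)   # [20 30]
print(a)         # [10 20 30]  (the original array is unchanged)
```

To keep a result, assign it back, for example a = np.append(a, 40).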
np.dot(a,b)
a.dot(b)
class matrix(ndarray)
| matrix(data, dtype=None, copy=True)
a = np.matrix('10,20;30,40')
eg-2: Creating matrix object from nested list
a = np.matrix([[10,20],[30,40]])
>>> a = np.matrix([[10,20],[30,40]])
>>> a
matrix([[10, 20],
[30, 40]])
conclusions:
------------
1. matrix is a child class of the ndarray class. Hence all methods and attributes of
the ndarray class are applicable to matrix also.
2. We can use +,*,T,** for matrix objects also.
3. In the case of ndarray, the * operator performs element-level multiplication.
But in the case of matrix, the * operator performs matrix multiplication.
>>> a = np.array([[1,2],[3,4]])
>>> m = np.matrix([[1,2],[3,4]])
>>> a
array([[1, 2],
[3, 4]])
>>> m
matrix([[1, 2],
[3, 4]])
>>> a+a
array([[2, 4],
[6, 8]])
>>> m+m
matrix([[2, 4],
[6, 8]])
>>> a*a
array([[ 1, 4],
[ 9, 16]])
>>> m*m
matrix([[ 7, 10],
[15, 22]])
>>> a
array([[1, 2],
[3, 4]])
>>> a**2
array([[ 1, 4],
[ 9, 16]], dtype=int32)
>>> m**2
matrix([[ 7, 10],
[15, 22]])
>>> a
array([[1, 2],
[3, 4]])
>>> a.T
array([[1, 3],
[2, 4]])
>>> m
matrix([[1, 2],
[3, 4]])
>>> m.T
matrix([[1, 3],
[2, 4]])
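For plain ndarray objects, matrix multiplication is also available through the @ operator. This operator is not used in the notes above; the sketch below only shows that it gives the same result as np.dot() for the same 2x2 array:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

elementwise = a * a        # element-level multiplication
matrix_product = a @ a     # matrix multiplication on ndarray objects

print(elementwise)
print(matrix_product)
print(np.array_equal(matrix_product, np.dot(a, a)))
```

With @ available, ndarray can be used for matrix arithmetic without converting to a matrix object.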
>>> a = np.array([[1,2],[3,4]])
>>> a
array([[1, 2],
[3, 4]])
>>> ainv = np.linalg.inv(a)
>>> ainv
array([[-2. , 1. ],
[ 1.5, -0.5]])
How to check:
-------------
np.dot(a,ainv) = I
>>> a = np.array([[1,2],[3,4]])
>>> a
array([[1, 2],
[3, 4]])
>>> ainv = np.linalg.inv(a)
>>> ainv
array([[-2. , 1. ],
[ 1.5, -0.5]])
>>> i = np.eye(2)
>>> i
array([[1., 0.],
[0., 1.]])
>>> np.dot(a,ainv)
array([[1.0000000e+00, 0.0000000e+00],
[8.8817842e-16, 1.0000000e+00]])
>>> np.allclose(np.dot(a,ainv),i)
True
Note:
allclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False)
Returns True if two arrays are element-wise equal within a tolerance.
***Note: We can find inverse only for square matrices, otherwise we will get
error.
>>> a = np.arange(10).reshape(5,2)
>>> a
array([[0, 1],
[2, 3],
[4, 5],
[6, 7],
[8, 9]])
>>> np.linalg.inv(a)
numpy.linalg.LinAlgError: Last 2 dimensions of the array must be square
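The inverse check and the error case can be combined in one small helper. This is only a sketch; safe_inverse is a hypothetical name chosen for illustration and is not part of numpy:

```python
import numpy as np

def safe_inverse(m):
    """Return the inverse of m, or None if numpy cannot invert it."""
    try:
        return np.linalg.inv(m)
    except np.linalg.LinAlgError:
        # Raised for non-square input and also for singular matrices
        return None

square = np.array([[1, 2], [3, 4]])
inv = safe_inverse(square)
print(np.allclose(square @ inv, np.eye(2)))   # True

non_square = np.arange(10).reshape(5, 2)
print(safe_inverse(non_square))               # None
```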
>>> a = np.arange(8).reshape(2,2,2)
>>> a
array([[[0, 1],
[2, 3]],
[[4, 5],
[6, 7]]])
>>> np.linalg.inv(a)
array([[[-1.5, 0.5],
[ 1. , 0. ]],
[[-3.5, 2.5],
[ 3. , -2. ]]])
matrix_power(a, n)
Raise a square matrix to the (integer) power `n`.
a = np.array([[1,2],[3,4]])
>>> a
array([[1, 2],
[3, 4]])
>>> a = np.array([[1,2],[3,4]])
>>> a
array([[1, 2],
[3, 4]])
>>> np.linalg.matrix_power(a,0)
array([[1, 0],
[0, 1]])
>>> np.linalg.matrix_power(a,2)
array([[ 7, 10],
[15, 22]])
>>> np.linalg.matrix_power(a,-2)
array([[ 5.5 , -2.5 ],
[-3.75, 1.75]])
>>> np.dot(np.linalg.inv(a),np.linalg.inv(a))
array([[ 5.5 , -2.5 ],
[-3.75, 1.75]])
>>> np.linalg.matrix_power(np.linalg.inv(a),2)
array([[ 5.5 , -2.5 ],
[-3.75, 1.75]])
Note: We can find matrix_power only for a square matrix,otherwise we will get
error.
>>> a = np.arange(10).reshape(5,2)
>>> a
array([[0, 1],
[2, 3],
[4, 5],
[6, 7],
[8, 9]])
>>> np.linalg.matrix_power(a,2)
numpy.linalg.LinAlgError: Last 2 dimensions of the array must be square
>>> a = np.array([[1,2],[3,4]])
>>> a
array([[1, 2],
[3, 4]])
>>> np.linalg.det(a)
-2.0000000000000004
Note: We can find determinant only for square matrices, otherwise we will get
error.
>>> a = np.arange(10).reshape(5,2)
>>> np.linalg.det(a)
numpy.linalg.LinAlgError: Last 2 dimensions of the array must be square
Parameters
----------
a : (..., M, M) array_like
Coefficient matrix.
b : {(..., M,), (..., M, K)}, array_like
Ordinate or "dependent variable" values.
Case Study:
-----------
Problem:
Boys and girls are attending Durga sir's datascience class.
The fee is $3 for boys and $8 for girls. For a certain batch, 2200 people
attended and $10100 in fees was collected. How many boys and girls attended that
batch?
Let x be the number of boys and y the number of girls:
x+y = 2200
3x+8y = 10100
Substituting x = 2200-y into the second equation:
3(2200-y)+8y = 10100
6600-3y+8y = 10100
5y = 10100-6600
5y = 3500
y = 700
x = 2200-700 = 1500
a = np.array([[1,1],[3,8]])
b = np.array([2200,10100])
>>> a = np.array([[1,1],[3,8]])
>>> b = np.array([2200,10100])
>>> a
array([[1, 1],
[3, 8]])
>>> b
array([ 2200, 10100])
>>> np.linalg.solve(a,b)
array([1500., 700.])
eg-2:
-4x+7y-2z = 2
x-2y+z = 3
2x-3y+z = -4
a = np.array([[-4,7,-2],[1,-2,1],[2,-3,1]])
b = np.array([2,3,-4])
np.linalg.solve(a,b)
array([-13., -6., 4.])
x=-13,
y=-6
z=4
x-2y+z = 3
-13+12+4=3
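Instead of substituting the values into each equation by hand, the whole solution can be checked at once by multiplying the coefficient matrix by the result. A minimal sketch for eg-2:

```python
import numpy as np

a = np.array([[-4, 7, -2], [1, -2, 1], [2, -3, 1]])
b = np.array([2, 3, -4])

solution = np.linalg.solve(a, b)

# a @ solution should reproduce b (within floating point tolerance)
check = np.allclose(a @ solution, b)

print(solution)
print(check)
```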
Note:
1. The data will be stored in binary form
2. File extension should be .npy, otherwise save() function itself will add that
extension.
3. By using save() function we can write only one object to the file. If we want to
write multiple objects to a file then we should go for savez() function.
D:\durgaclasses>py test.py
['arr_0', 'arr_1']
[[10 20 30]
[40 50 60]]
[[ 70 80]
[ 90 100]]
Note:
np.save() --->Save an array to a binary file in .npy format
np.savez()---->Save several arrays into a single file in .npz format but in
uncompressed form.
np.savez_compressed()---->Save several arrays into a single file in .npz format
but in compressed form.
np.load()--->To load/read arrays from .npy or .npz files.
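The functions listed in this note can be tried together as follows. This is a sketch only: it writes into a temporary folder (an assumption made so that no files are left behind) instead of D:\durgaclasses, and the file names are illustrative:

```python
import os
import tempfile
import numpy as np

a = np.array([[10, 20, 30], [40, 50, 60]])
b = np.array([[70, 80], [90, 100]])

with tempfile.TemporaryDirectory() as folder:
    npy_path = os.path.join(folder, "out.npy")
    npz_path = os.path.join(folder, "out.npz")

    np.save(npy_path, a)        # one array, .npy format
    np.savez(npz_path, a, b)    # several arrays, stored as arr_0 and arr_1

    loaded_a = np.load(npy_path)
    with np.load(npz_path) as archive:
        names = list(archive.files)
        loaded_b = archive["arr_1"]

print(names)
print(loaded_a)
print(loaded_b)
```

np.savez_compressed() takes the same arguments as np.savez() and is loaded in the same way.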
compressed form:
---------------
import numpy as np
a = np.array([[10,20,30],[40,50,60]]) #2-D array with shape:(2,3)
b = np.array([[70,80],[90,100]]) #2-D array with shape:(2,2)
Analysis:
---------
D:\durgaclasses>dir out.npz out_compressed.npz
Volume in drive D has no label.
Volume Serial Number is E2E9-F953
Directory of D:\durgaclasses
Directory of D:\durgaclasses
Note:
if we are using save() function the file extension: npy
if we are using savez() or savez_compressed() functions the file extension: npz
import numpy as np
a = np.array([[10,20,30],[40,50,60]]) #2-D array with shape:(2,3)
TypeError: Mismatch between array dtype ('<U11') and format specifier ('%.18e %.18e')
eg-2:
import numpy as np
a1 = np.array([['Sunny',1000],['Bunny',2000],['Chinny',3000],['Pinny',4000]])
D:\durgaclasses>py test.py
[['Sunny' '1000']
['Bunny' '2000']
['Chinny' '3000']
['Pinny' '4000']]
out.txt:
-------
Sunny 1000
Bunny 2000
Chinny 3000
Pinny 4000
Zinny 5000
Vinny 6000
Minny 7000
Tinny 8000
import numpy as np
a1 = np.array([[10,20,30],[40,50,60]])
Summary:
--------
1. Save one ndarray object to the binary file(save() and load())
2. Save multiple ndarray objects to the binary file in uncompressed form(savez()
and load())
3. Save multiple ndarray objects to the binary file in compressed
form(savez_compressed() and load())
4. Save ndarray object to the text file (savetxt() and loadtxt())
5. Save ndarray object to the csv file (savetxt() and loadtxt() with delimiter=',')
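Point 5 of the summary (csv file) can be sketched as below. The temporary folder, the file name and the fmt="%d" format are assumptions chosen for this illustration:

```python
import os
import tempfile
import numpy as np

a1 = np.array([[10, 20, 30], [40, 50, 60]])

with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, "out.csv")

    # fmt="%d" writes plain integers instead of the default scientific notation
    np.savetxt(csv_path, a1, delimiter=",", fmt="%d")

    # dtype=int reads the values back as integers instead of floats
    restored = np.loadtxt(csv_path, delimiter=",", dtype=int)

print(restored)
```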
Cricket Batsman
Data
1. Minimum value
2. Maximum value
3. Average Value
4. Sum of all values
5. Mean value
6. Median value
7. Variance
8. Standard deviation etc
10th+Intermediate+degree--->20 members
500 Rs List of questions
100 villages
20 samples
10 Lakhs from every mla candidate
running a shop
1. Minimum value:
-----------------
np.min(a)
np.amin(a)
a.min()
eg-1: for 1-D array
>>> a = np.array([10,5,20,3,25])
>>> a
array([10, 5, 20, 3, 25])
>>> np.min(a)
3
>>> np.amin(a)
3
>>> a.min()
3
>>> a.amin()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute 'amin'
>>> np.min(a)
4
>>> a
array([[100, 20, 30],
[ 10, 50, 60],
[ 25, 15, 18],
[ 4, 5, 19]])
>>> np.min(a,axis=0) #returns the minimum of each column (3 elements)
array([ 4, 5, 18])
>>> np.min(a,axis=1) #returns the minimum of each row (4 elements)
array([20, 10, 15, 4])
>>> a = np.arange(24).reshape(6,4)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]])
>>> np.min(a)
0
>>> np.min(a,axis=0)
array([0, 1, 2, 3])
>>> np.min(a,axis=1)
array([ 0, 4, 8, 12, 16, 20])
eg-4:
>>> a = np.arange(24)
>>> a
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23])
>>> np.random.shuffle(a)
>>> a
array([13, 19, 12, 15, 2, 6, 5, 14, 8, 21, 10, 11, 22, 0, 18, 4, 20,
17, 7, 9, 23, 1, 3, 16])
>>> a = a.reshape(6,4)
>>> a
array([[13, 19, 12, 15],
[ 2, 6, 5, 14],
[ 8, 21, 10, 11],
[22, 0, 18, 4],
[20, 17, 7, 9],
[23, 1, 3, 16]])
>>> np.min(a)
0
>>> np.min(a,axis=0)
array([2, 0, 3, 4])
>>> np.min(a,axis=1)
array([12, 2, 8, 0, 7, 1])
>>> a = np.arange(4)
>>> a
array([0, 1, 2, 3])
>>> np.sum(a)
6
>>> a.sum()
6
>>> a = np.arange(9).reshape(3,3)
>>> a
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> np.sum(a)
36
>>> a.sum()
36
>>> np.sum(a,axis=0)
array([ 9, 12, 15])
>>> np.sum(a,axis=1)
array([ 3, 12, 21])
np.mean(a)
a.mean()
Returns the average of the array elements. The average is taken over
the flattened array by default, otherwise over the specified axis.
`float64` intermediate and return values are used for integer inputs.
>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> np.mean(a)
2.0
>>> a.mean()
2.0
np.median(a)
>>> a = np.array([10,20,30])
>>> np.median(a)
20.0
Note:
Mean is the average, whereas median is the middle element of the sorted data.
Variance is the average of squared deviations from the mean.
NumPy contains the var() function to find variance.
mean(a) = 3.0
deviations from the mean: [-2.0,-1.0,0.0,1.0,2.0]
squares of deviations from the mean: [4.0,1.0,0.0,1.0,4.0]
Average of squares of deviations from the mean: 2.0===>VARIANCE
>>> a = np.array([1,2,3,4,5])
>>> np.var(a)
2.0
Summary:
--------
1. np.min(a)/np.amin(a)/a.min()--->Returns the minimum value of the array
2. np.max(a)/np.amax(a)/a.max()--->Returns the maximum value of the array
3. np.sum(a)/a.sum()--->Returns the Sum of values of the array
4. np.mean(a)/a.mean()--->Returns the arithmetic mean of the array.
5. np.median(a) --->Returns median value of the array
6. np.var(a)/a.var() --->Returns variance of the values in the array
7. np.std(a)/a.std() --->Returns Standard deviation of the values in the array
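The relation between mean, variance and standard deviation can be verified step by step. The sketch below repeats the manual variance calculation for the array [1,2,3,4,5] and compares it with np.var() and np.std():

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

mean = np.mean(a)                    # 3.0
deviations = a - mean                # deviations from the mean
variance = np.mean(deviations ** 2)  # average of squared deviations
std = np.sqrt(variance)              # standard deviation = square root of variance

print(variance, np.var(a))
print(std, np.std(a))
```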
To perform mathematical operations numpy library contains several universal
functions(ufunc).
>>> a = np.array([[1,2],[3,4]])
>>> np.exp(a)
array([[ 2.71828183, 7.3890561 ],
[20.08553692, 54.59815003]])
>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> np.exp(a)
array([ 1. , 2.71828183, 7.3890561 , 20.08553692, 54.59815003])
>>> np.sqrt(a)
array([0. , 1. , 1.41421356, 1.73205081, 2. ])
>>> np.log(a)
<stdin>:1: RuntimeWarning: divide by zero encountered in log
array([ -inf, 0. , 0.69314718, 1.09861229, 1.38629436])
>>> np.sin(a)
array([ 0. , 0.84147098, 0.90929743, 0.14112001, -0.7568025 ])
>>> np.cos(a)
array([ 1. , 0.54030231, -0.41614684, -0.9899925 , -0.65364362])
>>> np.tan(a)
array([ 0. , 1.55740772, -2.18503986, -0.14254654, 1.15782128])
How to find unique items and count:
-----------------------------------
unique() function
test.py:
--------
import numpy as np
a = np.array(['a','a','b','c','a','a','b','c','a','b','d'])
items,indices,counts = np.unique(a,return_index=True,return_counts=True)
for item,index,count in zip(np.nditer(items),np.nditer(indices),np.nditer(counts)):
    print(f"Element '{item}' occurred {count} times and its first occurrence index:{index}")
D:\durgaclasses>py test.py
Element 'a' occurred 5 times and its first occurrence index:0
Element 'b' occurred 3 times and its first occurrence index:2
Element 'c' occurred 2 times and its first occurrence index:3
Element 'd' occurred 1 times and its first occurrence index:10
>>> a = np.array([10,20,30,40])
>>> np.diag(a,k=0)
array([[10, 0, 0, 0],
[ 0, 20, 0, 0],
[ 0, 0, 30, 0],
[ 0, 0, 0, 40]])
>>> np.diag(a,k=1)
array([[ 0, 10, 0, 0, 0],
[ 0, 0, 20, 0, 0],
[ 0, 0, 0, 30, 0],
[ 0, 0, 0, 0, 40],
[ 0, 0, 0, 0, 0]])
>>> np.diag(a,k=-1)
array([[ 0, 0, 0, 0, 0],
[10, 0, 0, 0, 0],
[ 0, 20, 0, 0, 0],
[ 0, 0, 30, 0, 0],
[ 0, 0, 0, 40, 0]])
----------------------------
9. Creation of diagonal array by using diag() function:
----------------------------------------------------
Syntax:
diag(v, k=0)
-->Extract a diagonal or construct a diagonal array.
-->If `v` is a 2-D array, return a copy of its `k`-th diagonal.
If `v` is a 1-D array, return a 2-D array with `v` on the `k`-th
diagonal.
>>> a = np.array([10,20,30,40])
>>> b = a.view()
>>> b
array([10, 20, 30, 40])
>>> a
array([10, 20, 30, 40])
>>> a[0]=7777
>>> a
array([7777, 20, 30, 40])
>>> b
array([7777, 20, 30, 40])
Copy:
-----
Copy means separate object.
If we perform any changes to the original array, those changes won't be
reflected in the copy, and vice versa.
By using copy() method of ndarray class, we can create copy of existing ndarray.
>>> a = np.array([10,20,30,40])
>>> b = a.copy()
>>> a
array([10, 20, 30, 40])
>>> b
array([10, 20, 30, 40])
>>> a[0]=7777
>>> a
array([7777, 20, 30, 40])
>>> b
array([10, 20, 30, 40])
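Whether an array is a view or a copy can also be checked programmatically. The base attribute and np.shares_memory() are not covered in the notes above; this is a minimal sketch of how they could be used for that check:

```python
import numpy as np

a = np.array([10, 20, 30, 40])
view = a.view()
copy = a.copy()

print(view.base is a)               # True: the view borrows the data of a
print(copy.base is None)            # True: the copy owns its own data
print(np.shares_memory(a, view))    # True
print(np.shares_memory(a, copy))    # False
```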
============================================
Chapter-23: Numpy Practice Questions Set-1
===========================================
Q1. Create an array of 7 zeros?
>>> np.zeros(7)
array([0., 0., 0., 0., 0., 0., 0.])
>>> np.zeros(7,dtype=int)
array([0, 0, 0, 0, 0, 0, 0])
>>> np.full(7,0)
array([0, 0, 0, 0, 0, 0, 0])
>>> np.arange(10,41)
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40])
>>> np.arange(10,41,2)
array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40])
>>> np.arange(11,41,2)
array([11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39])
1st way:
-------
>>> np.arange(14,41,7)
array([14, 21, 28, 35])
2nd way:
--------
>>> a = np.arange(10,41)
>>> a
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40])
>>> a[a%7==0]
array([14, 21, 28, 35])
Q8. Create a numpy array of 10 even numbers starting from 24?
1st way:
-------
>>> np.arange(24,43,2)
array([24, 26, 28, 30, 32, 34, 36, 38, 40, 42])
2nd way:
-------
>>> a = np.arange(24,50)
>>> a
array([24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49])
>>> a = a[a%2==0]
>>> a
array([24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48])
>>> np.resize(a,10)
array([24, 26, 28, 30, 32, 34, 36, 38, 40, 42])
>>> np.eye(4)
array([[1., 0., 0., 0.],
[0., 1., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 0., 1.]])
>>> np.identity(4)
array([[1., 0., 0., 0.],
[0., 1., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 0., 1.]])
Q11. By using numpy module, generate ndarray with random numbers from 1
to 100 with the shape:(2,3,4)?
>>> np.random.randint(1,101,size=(2,3,4))
array([[[74, 40, 32, 67],
[93, 34, 31, 86],
[ 4, 3, 54, 90]],
>>> np.random.rand()
0.6563606029950465
>>> np.random.rand()
0.33598671990249174
>>> np.random.rand()
0.15195445013829945
>>> np.random.rand()
0.5626155619658889
>>> np.random.rand()
0.6960796589387932
>>> np.random.rand()
0.6192875505685667
>>> np.random.rand()
0.16912615729913438
np.random.rand(10)
>>> np.random.rand(10)
array([0.23175153, 0.23516775, 0.16853863, 0.4361167 , 0.43694742,
0.24545343, 0.974236 , 0.64757367, 0.0890843 , 0.3444159 ])
np.random.uniform(low=0.0,high=1.0,size=None)
>>> np.random.uniform(10,20,10)
array([19.80499139, 11.35811947, 11.31370507, 14.94415343, 17.25710869,
12.40993842, 16.25033344, 17.10067103, 10.18653984, 19.31369384])
np.random.randn(10)
a = np.random.normal(15,4,10)
>>> np.linspace(1,100,10)
array([ 1., 12., 23., 34., 45., 56., 67., 78., 89., 100.])
>>> np.linspace(0,1,15)
array([0. , 0.07142857, 0.14285714, 0.21428571, 0.28571429,
0.35714286, 0.42857143, 0.5 , 0.57142857, 0.64285714,
0.71428571, 0.78571429, 0.85714286, 0.92857143, 1. ])
Diagram-27: diagram_27
>>> a[2][3]
16
>>> a[2,3]
16
array([[2, 3, 4, 5]])
>>> a[0:1,1:5]
array([[2, 3, 4, 5]])
Q3. To get the following array:
array([2, 3, 4, 5])
>>> a[0,1:5]
array([2, 3, 4, 5])
>>> a[::5,:]
array([[ 1, 2, 3, 4, 5, 6],
[31, 32, 33, 34, 35, 36]])
>>> a[1:4,2:4]
array([[ 9, 10],
[15, 16],
[21, 22]])
>>> a[a%2==0]
array([ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34,
36])
>>> a[a%5==0]
array([ 5, 10, 15, 20, 25, 30, 35])
Q8. Create 1-D array with elements 8,17,26 and 35. We have to use elements of
a?
>>> a[[1,2,4,5],[1,4,1,4]]
array([ 8, 17, 26, 35])
>>> np.amin(a)
1
>>> np.min(a)
1
>>> a.min()
1
np.amax(a)
np.max(a)
a.max()
>>> np.amax(a)
36
>>> np.max(a)
36
>>> a.max()
36
>>> np.sum(a,axis=0)
array([ 96, 102, 108, 114, 120, 126])
>>> np.median(a)
18.5
Q15. Find variance of this array?
>>> np.var(a)
107.91666666666667
import numpy as np
a = np.array([])
print(a.shape)
A. (0,)
B. (1,)
C. (1,1)
D. 0
Ans: A
import numpy as np
a = np.arange(10,20,-1)
b = a.reshape(5,2)
print(b.flatten())
Ans: D
import numpy as np
a = np.arange(1,6)
a = a[::-2]
print(a)
A. [1 2 3 4 5]
B. [6 4 2]
C. [5 3 1]
D. [4 2]
Ans: C
a = np.array([3,3])
b = np.array([3,3.5])
c = np.array([3,'3'])
d = np.array([3,True])
A. int,float,str,int
B. int,int,str,bool
C. float,float,int,int
D. int,float,str,bool
E. ValueError while creating b,c,d
Ans: A
A. [3 2 3]
B. [2 3 2]
C. [3 2 3 4]
D. IndexError
Ans: A
A. [3 3 5]
B. [3 2 3 4 5]
C. [3 2 3 4]
D. IndexError
Ans: A
Q7. Consider the code:
a = np.array([1,2,3,2,3,4,5,2,3,4,1,2,3,6,7])
print(a[:3:3])
A. [1]
B. [1 2 3]
C. [1 3]
D. IndexError
Ans: A
A. [2 4 2]
B. [2 5 4 3]
C. []
D. IndexError
Ans: C
Q9. Consider the code:
a = np.array([1,2,3,2,3,4,5,2,3,4,1,2,3,6,7])
print(a[[1,2,4]])
A. [1 2 4]
B. [2 3 3]
C. [True True True]
D. IndexError
Ans: B
Q10. Consider the ndarray:
a = np.arange(20).reshape(5,4)
A. a[3][3]
B. a[-2][-1]
C. a[3][-1]
D. a[3,3]
E. All of these
Ans: E
A. True
B. False
Ans: B
Ans: B and C
A. 0
B. 1
C. -1
D. -2
E. None of these
Ans: E
A. np.array([10,20,30,40])
B. np.createArray([10,20,30,40])
C. np.makeArray([10,20,30,40])
D. np([10,20,30,40])
Ans: A
Q15. Which of the following are valid ways of finding the number of dimensions
of input array a?
A. np.ndim(a)
B. np.dim(a)
C. a.ndim()
D. a.ndim
Ans: A and D
A. print(a[0])
B. print(a[1])
C. print(a.0)
D. print(a.1)
Ans: A
A. np.dtype(a)
B. a.dtype()
C. a.dtype
D. All of these
Ans: C
A. a = np.array([10,20,30,40],dtype='f')
B. a = np.array([10,20,30,40],dtype='float')
C. a = np.array([10,20,30,40],dtype=float)
D. a = np.array([10,20,30,40])
Ans: A,B,C
A. If we perform any changes to the original array, then those changes will be
reflected to the VIEW.
B. If we perform any changes to the original array, then those changes won't be
reflected to the VIEW.
C. If we perform any changes to the original array, then those changes will be
reflected to the COPY.
D. If we perform any changes to the original array, then those changes won't be
reflected to the COPY.
Ans: A,D
Ans: A
(2,3,4)
a.size
A. find()
B. search()
C. where()
D. All of these
Ans: C
A. np.find(a == 10)
B. np.where(a == 10)
C. np.search(a == 10)
D. None of these
Ans: B
Q23. Which of the following code collects samples from uniform distribution of
1000 values in the interval [10,100)?
A. np.uniform(10,100,size=1000)
B. np.random.uniform(10,100,size=1000)
C. np.random.uniform(low=10,high=100,size=1000)
D. np.random.uniform(from=10,to=100,size=1000)
Ans: B,C
Q24. Which of the following code collects samples from normal distribution of
1000 values with the mean 10 and standard deviation 0.3?
A. np.normal(10,0.3,1000)
B. np.random.normal(10,0.3,1000)
C. np.random.normal(mean=10,std=0.3,size=1000)
D. np.random.normal(loc=10,scale=0.3,size=1000)
Ans: B and D
normal(loc=0.0, scale=1.0, size=None)
A. np.add(a,b)
B. np.sum(a,b)
C. np.append(a,b)
D. a+b
Ans: A,D
A. a-b
B. np.minus(a,b)
C. np.min(a,b)
D. np.subtract(a,b)
Ans: A,D
A. np.trunc(a)
B. np.fix(a)
C. np.around(a)
D. All of these
Ans: D
np.trunc(a):
-----------
Remove the digits after decimal point
>>> np.trunc(1.23456)
1.0
>>> np.trunc(1.99999)
1.0
np.fix(a):
---------
Round to nearest integer towards zero
>>> np.fix(1.234546)
1.0
>>> np.fix(1.99999999)
1.0
np.around(a):
-------------
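It will perform round operation (to the nearest integer by default).
If the fractional part is more than 0.5, the magnitude is rounded up to the next integer.
If the fractional part is less than 0.5, the fractional part is simply dropped.
If the fractional part is exactly 0.5, numpy rounds to the nearest even value,
so np.around(2.5) is 2.0 and np.around(3.5) is 4.0.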
>>> np.around(1.23456)
1.0
>>> np.around(1.99999)
2.0
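The difference between the three functions shows up more clearly with negative values and with values that end in exactly .5. The input values in this sketch are chosen only for illustration:

```python
import numpy as np

values = np.array([-1.7, -0.5, 0.5, 1.5, 2.5, 1.99999])

truncated = np.trunc(values)   # drops the digits after the decimal point
fixed = np.fix(values)         # rounds towards zero, same result as trunc
rounded = np.around(values)    # nearest integer, exact halves go to the even value

print(truncated)
print(fixed)
print(rounded)
```

Here -1.7 becomes -1.0 with trunc() and fix() but -2.0 with around(), and both 1.5 and 2.5 become 2.0 with around().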
A. np.trunc(a)
B. np.fix(a)
C. np.around(a)
D. All of these
Ans: C
Q29. Which of the following are valid ways of creating a 2-D array?
A. np.array([[10,20,30],[40,50,60]])
B. np.array([10,20,30,40,50,60])
C. np.array([10,20,30,40,50,60],ndim=2)
D. np.array([10,20,30,40,50,60],ndmin=2)
Ans: A,D
a = np.array([10,20,30,40])
print(np.cumsum(a))
A. [10 20 30 40]
B. [10 30 60 100]
C. [100 100 100 100]
D. None of these
Ans: B
A. a[1:4]
B. a[1:5]
C. a[2:5]
D. a[2:4]
Ans: A
A. a[1:4]
B. a[1:5]
C. a[2:5]
D. a[2:4]
Ans: D
A. a[1:6:2]
B. a[1:5:2]
C. a[2:6:2]
D. a[2:7:2]
Ans: A
A. a[:7:2]
B. a[0::2]
C. a[0:7:2]
D. a[::2]
E. All of these
Ans: E
Which of the following is the valid way to convert array into int data type?
A. newarray = a.int()
B. newarray = a.asInt()
C. newarray = a.astype(int)
D. newarray = a.astype('int')
E. newarray = np.int32(a)
Ans: C,D,E
Q36. Consider the following array:
a = np.array([[10,20,30],[40,50,60]])
Which of the following is valid way to get element 50?
A. a[1][1]
B. a[-1][-2]
C. a[1][-2]
D. a[-1][1]
E. All the above
Ans: E
Ans: G
Ans: A,B,D
https://www.w3schools.com/python/numpy/numpy_quiz.asp
-------------------------------------------------------
Numpy is Python based library which defines several functions to create and
manage arrays and to perform complex mathematical operations in Datascience
domain.
1. For Creation and manipulation of multi dimensional arrays, which is the most
commonly used data structure in the datascience domain.
2. For Mathematical operations includes trigonometric operations, statistical
operations and algebraic computations.
3. Solving Differential equations
Q3. Why is NumPy preferred to other programming tools such as Idl, Matlab,
Octave, Or Yorick?
3. Numpy acts as the backbone for Data Science libraries like pandas, scikit-learn
etc.
Pandas internally uses ndarray, the numpy data structure, to store data.
Scikit-learn also uses numpy's ndarray internally.
4. Numpy has vectorization operations which can be performed at element
level.
5. It defines several easy to use functions for mathematical operations like
trigonometric operations, statistical operations and algebraic computations.
2 ways
1st way:
---------
By using Anaconda Distribution
Anaconda is python flavour for Data Science,ML etc.
Anaconda distribution has inbuilt numpy library and hence we are not
required to install.
2nd way:
-------
If Python is already installed in our system, then we can install numpy library as
follows:
D:\durgaclasses>pip install numpy
Q6. What are various similarities between NumPy Arrays and Python lists?
Q7. What are various differences between NumPy Arrays and Python lists?
1. List is inbuilt data type but numpy array is not inbuilt. To use numpy arrays,
we have to install and import numpy library explicitly.
3. On arrays we can perform vector operations(the operations which can be
operated on every element of the array). But we cannot perform vector
operations on list.
D:\durgaclasses>py test.py
The Size of Numpy Array: 168
The Size of List: 184
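The test.py that produced this output is not shown in the notes. A possible version, assuming it used sys.getsizeof(), could look like the sketch below. The exact numbers depend on the Python version, the numpy version and the platform, so they may differ from the output above:

```python
import sys
import numpy as np

values = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
a = np.array(values)

# getsizeof() on a list counts only the list object, not the int objects it refers to
print("The Size of Numpy Array:", sys.getsizeof(a))
print("The Size of List:", sys.getsizeof(values))

# nbytes counts only the raw element data of the array
print("Bytes used by the array elements:", a.nbytes)
```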
Q8. What are the advantages NumPy Arrays over Python lists?
1. Performance wise Arrays are Super Fast when compared with list.
2. Arrays consume less memory when compared with list.
3. On arrays we can perform vector operations(the operations which can be
operated on every element of the array). But we cannot perform vector
operations on list.
Q9. What are the advantages Python Lists over NumPy Arrays?
1. List is inbuilt data type but numpy array is not inbuilt. To use numpy arrays,
we have to install and import numpy library explicitly.
2. List can hold heterogeneous (Different types) elements.
eg: l = [10,10.5,True,'durga']
But array can hold only homogeneous elements.
eg: a = numpy.array([10,20,30])
Q10. How to create 1-D array and 2-D array from python lists?
array_1d = np.array([10,20,30,40])
array_2d = np.array([[10,20],[30,40]])
Q11. How to create a 3D array from Python Lists?
array_3d = np.array([[[10,20],[30,40]],[[50,60],[70,80]]])
a = np.array([[[10,20],[30,40]],[[50,60],[70,80]]])
print(a.shape) #(2,2,2)
Q14. How to count the number of times a given value appears in an array of
integers?
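The answer to Q14 is not recorded in these notes. One possible approach, shown here only as a sketch, is to compare the array with the value and count the True results, or to use unique() with return_counts=True to count every value at once:

```python
import numpy as np

a = np.array([10, 20, 10, 30, 10, 20])

# Count how many elements are equal to 10
count_of_10 = np.count_nonzero(a == 10)

# Count the occurrences of every distinct value
values, counts = np.unique(a, return_counts=True)
all_counts = dict(zip(values.tolist(), counts.tolist()))

print(count_of_10)   # 3
print(all_counts)    # {10: 3, 20: 2, 30: 1}
```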
Q15. How to check whether array is empty or not i.e array contains zero number
of elements or not?
>>> a = np.array([10,20,30])
>>> a.size
3
>>> a = np.array([])
>>> a.size
0
Q16. How to find the indices of an array in NumPy where some condition is
true?
By using where() function.
>>> a = np.array([20,25,30,35,40])
>>> a
array([20, 25, 30, 35, 40])
>>> np.where(a%10==0)
(array([0, 2, 4], dtype=int64),)
1st way:
--------
a = np.arange(16).reshape(4,4)
a.flatten()[::-1].reshape(4,4)
2nd way:
--------
Numpy Library contains flip() function
a = np.arange(16).reshape(4,4)
>>> np.flip(a)
array([[15, 14, 13, 12],
[11, 10, 9, 8],
[ 7, 6, 5, 4],
[ 3, 2, 1, 0]])
Q18. Create a 10x10 array with random values and find the minimum and
maximum values
a = np.random.rand(10,10)
#a = np.random.random((10,10))
amin, amax = a.min(), a.max()
print(amin, amax)
Q19. Create a random vector of size 30 and find the mean value
a = np.random.random(30)
#a = np.random.rand(30)
b = a.mean()
print(b)
Q20. Create a 2d array of shape (10,10) with 1 on the border and 0 inside
a = np.ones((10,10),dtype=int)
#a[1:-1,1:-1] = 0
a[1:9,1:9] = 0
print(a)
o/p:
[[1 1 1 1 1 1 1 1 1 1]
[1 0 0 0 0 0 0 0 0 1]
[1 0 0 0 0 0 0 0 0 1]
[1 0 0 0 0 0 0 0 0 1]
[1 0 0 0 0 0 0 0 0 1]
[1 0 0 0 0 0 0 0 0 1]
[1 0 0 0 0 0 0 0 0 1]
[1 0 0 0 0 0 0 0 0 1]
[1 0 0 0 0 0 0 0 0 1]
[1 1 1 1 1 1 1 1 1 1]]
[[1 1 1 1 1 1 1 1 1 1]
[1 1 0 0 0 0 0 0 0 1]
[1 0 1 0 0 0 0 0 0 1]
[1 0 0 1 0 0 0 0 0 1]
[1 0 0 0 1 0 0 0 0 1]
[1 0 0 0 0 1 0 0 0 1]
[1 0 0 0 0 0 1 0 0 1]
[1 0 0 0 0 0 0 1 0 1]
[1 0 0 0 0 0 0 0 1 1]
[1 1 1 1 1 1 1 1 1 1]]
>>> a = np.ones((10,10),dtype=int)
>>> a[1:9,1:9] = 0
>>> a[[1,2,3,4,5,6,7,8],[1,2,3,4,5,6,7,8]]=1
>>> print(a)
a = np.ones((10,10),dtype=int)
a[1:9,1:9]=2
a[2:8,2:8]=3
a[3:7,3:7]=4
a[4:6,4:6]=5
o/p:
[0 7 9]
Write equivalent code without using ndenumerate() function?
Sol:
for index in np.ndindex(a.shape):
    print(index, a[index])
o/p:
(0, 0) 0
(0, 1) 1
(0, 2) 2
(1, 0) 3
(1, 1) 4
(1, 2) 5
(2, 0) 6
(2, 1) 7
(2, 2) 8
Q26. How to access multiple elements of array which are not in order?
Q27. Consider the array:
a = np.array([10,20,30,40,50,60,70,80,90])
How to access the elements of 10,30,80?
1st way:
>>> indexes = np.array([0,2,7])
>>> a[indexes]
array([10, 30, 80])
2nd way:
>>> l = [0,2,7]
>>> a[l]
array([10, 30, 80])
>>> a[[0,2,7]]
array([10, 30, 80])
Q28. What are various differences between Slicing and Advanced Indexing?
Slicing:
1. The elements should be ordered and we cannot select arbitrary elements.
2. Condition based selection is not possible.
3. We won't get a new object, we just get a view of the original object.
If we perform any changes to the original array, those changes will be
reflected in the slice.
Advanced Indexing:
1. The elements need not be ordered and we can select arbitrary elements.
2. Condition based selection is possible.
3. A new separate copy will be created. If we perform any changes in one copy,
then those changes won't be reflected in the other.
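The view versus copy difference can be observed directly. In this sketch (the array values are illustrative) the same three elements are selected once by slicing and once by advanced indexing, and then the original array is modified:

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

sliced = a[1:4]          # slicing: a view of a
picked = a[[1, 2, 3]]    # advanced indexing: a separate copy

a[1] = 7777

print(sliced)     # [7777   30   40]  (the change is visible in the view)
print(picked)     # [20 30 40]        (the copy is unaffected)
print(a[a > 35])  # condition based selection is also advanced indexing
```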
import numpy as np
a = np.array([[[10,20],[30,40]],[[40,50],[60,70]]])
print(a)
print('Elements one by one:')
for x in a: # x is 2-D array
    for y in x: #y is 1-D array
        for z in y: #z is scalar
            print(z)
Q33. How to get required data type elements while iterating by using nditer()
function?
While iterating elements of nd array, we can specify our required type. For
this, we have to use op_dtypes argument.
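The example for Q33 is not included in these notes. A minimal sketch of how op_dtypes could be used is given below. The 'buffered' flag is added because nditer needs a buffer to hold the converted values:

```python
import numpy as np

a = np.array([10, 20, 30])   # integer array

as_floats = []
# Ask nditer to hand out every element as float64 instead of int
for x in np.nditer(a, flags=["buffered"], op_dtypes=["float64"]):
    as_floats.append(x.item())

print(as_floats)   # [10.0, 20.0, 30.0]
```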
Q34. Explain differences between normal for loop and nditer() function?
Syntax:
numpy.resize(a, new_shape)-->For extra elements, repeated copies of a will be
reused.
a.resize(new_shape)-->Fills extra elements with zeros instead of repeated copies of 'a'.
reshape():
1. It won't create a new array object; we just get a view of the existing array.
If we perform any changes in the reshaped array, automatically those
changes will be reflected in the original array.
numpy.resize():
1. It will create a new array object with the required new shape.
If we perform any changes in the resized array, those changes won't be
reflected in the original array.
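The three behaviours can be compared in one short sketch. The arrays are illustrative, and refcheck=False is an assumption added so that the in-place resize does not fail when other references to the array exist:

```python
import numpy as np

a = np.arange(1, 7)
reshaped = a.reshape(2, 3)
reshaped[0, 0] = 100
print(a)          # [100   2   3   4   5   6]  (reshape gave a view)

c = np.array([1, 2, 3])
repeated = np.resize(c, 5)   # new array, extra elements repeat c
print(repeated)   # [1 2 3 1 2]

b = np.array([1, 2, 3])
b.resize(5, refcheck=False)  # in place, extra elements are zeros
print(b)          # [1 2 3 0 0]
```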
Syntax:
numpy.swapaxes(a, axis1, axis2)
Q44. which function we can use for searching the element of ndarray?
We can search elements of ndarray by using where() function.
In Linear algebra, multiplication can be represented by using dot(.). Hence
the name dot function.
eg: A.B
Q52. By using which functions we can perform splitting ndarray?
1. split()
2. vsplit()
3. hsplit()
4. dsplit()
5. array_split()
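A short sketch of how some of these functions behave is given below. The arrays are illustrative. The main difference is that split() insists on equal sized parts, while array_split() allows unequal parts:

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

top, bottom = np.vsplit(a, 2)          # split along rows: two (2,3) arrays
left, middle, right = np.hsplit(a, 3)  # split along columns: three (4,1) arrays

b = np.arange(10)
try:
    np.split(b, 3)                     # 10 elements cannot be divided into 3 equal parts
    equal_split_worked = True
except ValueError:
    equal_split_worked = False

parts = np.array_split(b, 3)           # unequal parts are allowed: sizes 4, 3, 3

print(top.shape, left.shape)
print(equal_split_worked)
print([p.tolist() for p in parts])
```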
Numpy
Matplotlib
Pandas
Seaborn
plotly
....
Matplotlib:
------------
Numpy --->Data Analysis Library
Pandas--->Data Analysis Library/Visualization library
Matplotlib/Seaborn/Plotly --->Data Visualization Libraries
6. It has very large community support. Almost every data scientist has used this
library at least once.
****Examples tab
Installing Matplotlib:
---------------------
There are 2 ways
D:\durgaclasses>pip list
D:\durgaclasses>pip freeze
Types of Plots:
---------------
Multiple plot types are available to represent our data in graphical form.
The important are:
1. Line Plots
2. Bar charts
3. Pie charts
4. Histogram
5. Scatter plots
etc
Based on input data and requirement, we can choose the corresponding plot.
Note:
1. Matplotlib --->package/library
2. pyplot --->module name
3. pyplot module defines several functions to create plots
plot()
bar()
pie()
hist()
scatter()
etc
4. We can create plots in 2 approaches
1. Functional oriented approach (For small data sets)
2. Object oriented approach (For larger data sets)
11-07-2021
Line Plots:
-----------
We can mark data points from the input data and we can connect these data
points with lines. Such type of plots are called line plots.
We can use line plots to determine the relationship between two data sets.
Data set is a collection of values like ndarray,python's list etc
wickets = [1,2,3,4,5,6,7,8,9,10]
overs = [1,4,5,,,..20]
The values from each data set will be plotted along an axis.(x-axis,y-axis)
matplotlib.pyplot.plot()
plot(*args, scalex=True, scaley=True, data=None, **kwargs)
Plot y versus x as lines and/or markers.
Call signatures::
>>> plot(x, y) # plot x and y using default line style and color
>>> plot(x, y, 'bo') # plot x and y using blue circle markers
>>> plot(y) # plot y using x as index array 0..N-1
>>> plot(y, 'r+') # ditto, but with red plusses
plt.plot(x,y)
The data points will be considered from x and y values.
x=[10,20,30]
y=[1,2,3]
What is figure?
---------------
Figure is an individual window on the screen, in which matplotlib displays the
graphs. ie it is the container for the graphical output.
plt.xlabel('N value')
plt.ylabel('Square of N')
plt.xlabel('Overs')
plt.ylabel('Wickets')
plt.title('Fall of wickets')
plt.xlabel('Year')
plt.ylabel('Sales')
plt.title('Nokia Mobile Sales Report')
plt.plot(x,y)
plt.title()
plt.xlabel()
plt.ylabel()
plt.show()
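The four calls above can be put together as in the following sketch. It uses the 'N value' / 'Square of N' labels from the notes; the title text and the range 1 to 10 are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

n = np.arange(1, 11)          # N values 1..10 (assumed range)

plt.figure()
plt.plot(n, n**2)
plt.title('Square Function')  # assumed title text
plt.xlabel('N value')
plt.ylabel('Square of N')

# The current Axes is fetched only to read the texts back
ax = plt.gca()
print(ax.get_title(), '|', ax.get_xlabel(), '|', ax.get_ylabel())

# plt.show()  # uncomment to display the figure window
```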
Line properties:
-----------------
A line drawn on the graph has several properties like color,style,width of the
line,transparency etc. We can customize these based on our requirement.
1. Marker property:
-------------------
We can use marker property to highlight data points on the line plot.
We have to use marker keyword argument.
plt.plot(a,b,marker='o')
=============   ===============================
character       description
=============   ===============================
``'.'``         point marker
``','``         pixel marker
``'o'``         circle marker
``'v'``         triangle_down marker
``'^'``         triangle_up marker
``'<'``         triangle_left marker
``'>'``         triangle_right marker
``'1'``         tri_down marker
``'2'``         tri_up marker
``'3'``         tri_left marker
``'4'``         tri_right marker
``'8'``         octagon marker
``'s'``         square marker
``'p'``         pentagon marker
``'P'``         plus (filled) marker
``'*'``         star marker
``'h'``         hexagon1 marker
``'H'``         hexagon2 marker
``'+'``         plus marker
``'x'``         x marker
``'X'``         x (filled) marker
``'D'``         diamond marker
``'d'``         thin_diamond marker
``'|'``         vline marker
``'_'``         hline marker
=============   ===============================
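A small sketch that tries a few of the marker codes from the table. The data and the choice of five markers are assumptions for illustration; each line is shifted up by one unit so the markers do not overlap.

```python
import matplotlib.pyplot as plt
import numpy as np

a = np.arange(1, 6)

plt.figure()
# One line per marker code; offset keeps the lines apart
for offset, m in enumerate(['o', '^', 's', '*', 'D']):
    plt.plot(a, a + offset, marker=m, label=f"marker={m!r}")
plt.legend()

# Read the marker back from every line on the current Axes
markers_used = [line.get_marker() for line in plt.gca().get_lines()]
print(markers_used)

# plt.show()  # uncomment to display the figure window
```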
2. Linestyle property:
---------------------
Linestyle specifies whether the line is solid or dashed or dotted etc
We can specify linestyle by using linestyle keyword argument.
- ----->solid line
-- ----->dashed line
: ----->dotted line
-. ----->dash-dotted line
=============   ===============================
character       description
=============   ===============================
``'-'``         solid line style
``'--'``        dashed line style
``'-.'``        dash-dot line style
``':'``         dotted line style
=============   ===============================
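The four linestyle codes can be compared side by side as in this sketch. The data is assumed for illustration; only the linestyle keyword argument comes from the notes.

```python
import matplotlib.pyplot as plt
import numpy as np

a = np.arange(1, 6)

plt.figure()
# solid, dashed, dash-dot and dotted lines, each shifted up by one unit
for offset, ls in enumerate(['-', '--', '-.', ':']):
    plt.plot(a, a + offset, linestyle=ls, label=f"linestyle={ls!r}")
plt.legend()

styles_used = [line.get_linestyle() for line in plt.gca().get_lines()]
print(styles_used)

# plt.show()  # uncomment to display the figure window
```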
3. color property:
------------------
We can specify our required color for the line plot.
We have to use color keyword argument.
We can use any color, including hex codes.
Matplotlib defines some short codes for commonly used colors. We can use
short codes also.
=============   ===============================
character       color
=============   ===============================
``'b'``         blue
``'g'``         green
``'r'``         red
``'c'``         cyan
``'m'``         magenta
``'y'``         yellow
``'k'``         black
``'w'``         white
=============   ===============================
default color:
--------------
If we do not specify a color, then the default color will be selected from the style cycle.
We can check default colors as follows:
>>> plt.rcParams['axes.prop_cycle'].by_key()
{'color': ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b',
'#e377c2', '#7f7f7f', '#bcbd22', '#17becf']}
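A sketch of the different ways to give a color: short code, full color name, hex code, and no color at all. The data is assumed; the hex code '#1c203d' is the one used elsewhere in these notes. When no color is given, the line gets one of the colors from the style cycle shown above.

```python
import matplotlib.pyplot as plt
import numpy as np

a = np.arange(1, 6)

plt.figure()
line_short, = plt.plot(a, a, color='r')            # short code
line_name,  = plt.plot(a, a + 1, color='orange')   # full color name
line_hex,   = plt.plot(a, a + 2, color='#1c203d')  # hex code
line_auto,  = plt.plot(a, a + 3)                   # no color: taken from the style cycle

cycle_colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
print(line_auto.get_color() in cycle_colors)

# plt.show()  # uncomment to display the figure window
```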
shortcut way to specify all 3 marker, linestyle,color properties:
-----------------------------------------------------------------
plt.plot(a,b,'o-r')--->valid
plt.plot(a,b,'o-red')--->invalid
plt.plot(a,b,'o-#1c203d')--->invalid
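The valid and invalid cases can be checked with the sketch below. The format string accepts only the one-letter color short codes, so a full color name inside it raises a ValueError. The data and the workaround in the last call (format string for marker and linestyle, color keyword for the hex code) are assumptions added for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

a = np.arange(5)
b = a**2

plt.figure()
# Valid: marker 'o', linestyle '-', color short code 'r'
line, = plt.plot(a, b, 'o-r')

# Invalid: a full color name cannot be mixed into the format string
try:
    plt.plot(a, b, 'o-red')
    fmt_error = None
except ValueError as exc:
    fmt_error = exc
print(type(fmt_error).__name__)

# Full names and hex codes can be passed through the color keyword instead
line2, = plt.plot(a, b + 5, 'o-', color='#1c203d')

# plt.show()  # uncomment to display the figure window
```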
By default this figure will be saved in the current working directory. But we can
provide any location based on our requirement.
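The notes do not show the call that saves the figure. Assuming it refers to plt.savefig(), a sketch could look like this. The file names and the temporary folder are hypothetical and chosen only for illustration.

```python
import os
import tempfile
import matplotlib.pyplot as plt

plt.figure()
plt.plot([10, 20, 30], [1, 2, 3], 'o-r')

# A bare file name is resolved against the current working directory:
# plt.savefig('line_plot.png')

# Any other location can be given as a path; a temporary folder is used here
target = os.path.join(tempfile.mkdtemp(), 'line_plot.png')
plt.savefig(target)
print(os.path.exists(target))
```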
a = np.array([10,20,30,40,50])
plt.plot(a) # 0 to 4 will be considered for x-axis.
Now the data points are: (0,10),(1,20),(2,30),(3,40),(4,50)
shortcut way:
-------------
We can also use single plot() function for all 3 lines.
Note:
plt.plot(x,i,'o-r',x,s,'o-b',x,c,'o-g',lw=10)
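A runnable sketch of the single plot() call above. The notes do not define x, i, s and c at this point, so they are assumed here to be an index range and its identity, square and cubic values. Keyword arguments such as lw apply to all three lines.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)
i = x        # identity (assumed meaning of i)
s = x**2     # square   (assumed meaning of s)
c = x**3     # cubic    (assumed meaning of c)

plt.figure()
# Three (x, y, format) groups in one call; lw=10 is applied to every line
lines = plt.plot(x, i, 'o-r', x, s, 'o-b', x, c, 'o-g', lw=10)

print(len(lines), [ln.get_color() for ln in lines])

# plt.show()  # uncomment to display the figure window
```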
How to customize title properties:
-----------------------------------
title(label, fontdict=None, loc=None, pad=None, *, y=None, **kwargs)
Set a title for the Axes.
label : str
Text to use for the title
fontdict : dict
A dictionary controlling the appearance of the title text
https://matplotlib.org/stable/tutorials/text/text_props.html
Note: fontdict properties are same for title,xlabel and ylabel. These values can
be passed as keyword arguments also. In the case of conflict, keyword
arguments will get more priority.
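The priority rule in the note can be checked with a small sketch. The dictionary values and label texts are assumptions for illustration; the same fontdict is reused for the title and the x label, and the keyword arguments on the title override it.

```python
import matplotlib.pyplot as plt

font = {'color': 'r', 'fontsize': 10}   # assumed example values

plt.figure()
plt.plot([1, 2, 3], [1, 4, 9])

# fontdict says red/10, keyword arguments say blue/20: keyword arguments win
title = plt.title('Square Function', fontdict=font, color='b', fontsize=20)

# Without conflicting keyword arguments the fontdict values are used
x_label = plt.xlabel('N', fontdict=font)

print(title.get_color(), title.get_fontsize())
print(x_label.get_color(), x_label.get_fontsize())

# plt.show()  # uncomment to display the figure window
```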
case-1:
plt.grid()
In this case grid will be visible.
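case-2:
plt.grid()
plt.grid()
In this case grid will not be visible, because a call without arguments
toggles the visibility.
case-3:
plt.grid()
plt.grid(color='g')
In this case grid will be visible in green color, because passing any line
property switches the grid on.
case-4:
plt.grid(b=True) # grid will be visible
plt.grid(b=False) # grid will not be visible
Note: In recent Matplotlib versions this parameter is named visible (b is the
older name). Passing True/False as the first positional argument works in both.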
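The grid cases can be verified with the following sketch. The helper grid_visible() is a hypothetical function written only for this check; it looks at the first major grid line of the x-axis. True/False is passed positionally so that the code does not depend on the parameter name.

```python
import matplotlib.pyplot as plt

def grid_visible(ax):
    """Return True if the major x grid lines of ax are currently shown."""
    return bool(ax.xaxis.get_gridlines()[0].get_visible())

plt.figure()
plt.plot([1, 2, 3], [1, 4, 9])
ax = plt.gca()

plt.grid()                 # one bare call: grid switched on
after_one_call = grid_visible(ax)

plt.grid()                 # second bare call: toggled off again
after_two_calls = grid_visible(ax)

plt.grid(color='g')        # any line property: grid switched on
after_color = grid_visible(ax)

plt.grid(False)            # explicit on/off
after_false = grid_visible(ax)

print(after_one_call, after_two_calls, after_color, after_false)
```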
which property:
---------------
major grid lines and minor grid lines
It decides which grid lines have to be displayed: major, minor or both.
The allowed values: {'major', 'minor', 'both'}, default value: major
>>> help(plt.minorticks_on)
Help on function minorticks_on in module matplotlib.pyplot:
minorticks_on()
Display minor ticks on the axes.
Displaying minor ticks may reduce performance; you may turn them off
using `minorticks_off()` if drawing speed is a problem.
axis property:
--------------
It decides along which axis the grid lines have to be displayed.
axis : {'both', 'x', 'y'},
default value: both
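A sketch that combines the which and axis properties. Minor grid lines are drawn only where minor ticks exist, so minorticks_on() is called first. The data, colors and linestyle are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

a = np.arange(1, 11)

plt.figure()
plt.plot(a, a**2, 'o-r')
plt.minorticks_on()    # minor ticks are needed before minor grid lines can appear

# Major grid lines along both axes
plt.grid(which='major', axis='both', color='k')
# Minor grid lines along the x-axis only
plt.grid(which='minor', axis='x', color='g', linestyle=':')

ax = plt.gca()
major_x_on = bool(ax.xaxis.get_gridlines()[0].get_visible())
minor_x_on = bool(ax.xaxis.get_minor_ticks()[0].gridline.get_visible())
print(major_x_on, minor_x_on)

# plt.show()  # uncomment to display the figure window
```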
Adding Legend:
---------------
If multiple lines are present, then it is difficult to identify which line represents
which dataset/function. A legend solves this problem by labelling each line.
Syntax:
legend(*args, **kwargs)
Call signatures::
legend()
legend(labels)
legend(handles, labels)
1. legend():
------------
Entries will be added to the legend in the order in which the plots were created.
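A sketch of the legend() call without arguments. In this form the text of each entry comes from the label keyword argument of the corresponding plot() call. The identity/square/cubic data mirrors the demo used elsewhere in these notes.

```python
import matplotlib.pyplot as plt
import numpy as np

a = np.arange(10)

plt.figure()
plt.plot(a, a, marker='o', label='identity')
plt.plot(a, a**2, marker='o', label='square')
plt.plot(a, a**3, marker='o', label='cubic')

# No arguments: entries follow the order in which the plots were created
legend = plt.legend()
entries = [t.get_text() for t in legend.get_texts()]
print(entries)

# plt.show()  # uncomment to display the figure window
```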
2. legend(labels)
------------------
The argument is a list of strings.
Each string is considered as a label for the plots, in the order they were created.
plt.legend(['label-1','label-2','label-3'])
This approach is best suited for adding a legend to already existing plots.
import matplotlib.pyplot as plt
import numpy as np
a = np.arange(10)
plt.plot(a,a,marker='o')
plt.plot(a,a**2,marker='o')
plt.plot(a,a**3,marker='o')
plt.legend(['identity','square','cubic'])
plt.show()
Note: This approach is not recommended because we must be aware of the
order in which the plots were created.
legend(handles, labels):
-----------------------
We can explicitly define the lines and labels in the legend() function itself.
It is the recommended approach as we have complete control.
plt.legend([line1,line2,line3],['label-1','label-2','label-3'])
observation:
l = [10]
a=l
print(a) #[10]
plt.plot(x,i,'o-r',x,s,'o-b',x,c,'o-g',lw=10)
For the first line: x,i,'o-r'
For the second line: x,s,'o-b'
For the third line: x,c,'o-g'
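A minimal sketch of the legend(handles, labels) approach. plot() returns a list of lines, so a single line is unpacked with a trailing comma (line1, = ...). The identity/square/cubic data is assumed; the legend order is deliberately different from the plotting order to show that we control it.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)

plt.figure()
# Each plot() call returns a list with one Line2D, unpacked with a trailing comma
line1, = plt.plot(x, x, 'o-r')
line2, = plt.plot(x, x**2, 'o-b')
line3, = plt.plot(x, x**3, 'o-g')

# Handles and labels are paired explicitly, in any order we like
legend = plt.legend([line3, line1, line2], ['cubic', 'identity', 'square'])
entries = [t.get_text() for t in legend.get_texts()]
print(entries)

# plt.show()  # uncomment to display the figure window
```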
===============   =============
Location String   Location Code
===============   =============
'best'            0
'upper right'     1
'upper left'      2
'lower left'      3
'lower right'     4
'right'           5
'center left'     6
'center right'    7
'lower center'    8
'upper center'    9
'center'          10
===============   =============
Adding title to the legend:
---------------------------
We can set a title for the legend explicitly. For this we have to use the title
keyword argument.
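A sketch that sets both the location and the title of the legend. The loc keyword accepts either the location string or the location code from the table above, so loc='upper left' and loc=2 mean the same. The data and the title text 'Functions' are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

a = np.arange(10)

plt.figure()
plt.plot(a, a, 'o-r', label='identity')
plt.plot(a, a**2, 'o-b', label='square')

# loc='upper left' is equivalent to loc=2
legend = plt.legend(loc='upper left', title='Functions')
legend_title = legend.get_title().get_text()
print(legend_title)

# plt.show()  # uncomment to display the figure window
```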
Diagram: legend_title
Diagram: legend_loc1
Diagram: legend_loc2
Note:
1. Without providing tick values we cannot provide labels, otherwise we will get
error.
2. If we pass empty list to ticks then tick values will become invisible.
plt.yticks([])
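Both points of the note can be tried out with this sketch. The tick values and label texts are assumed for illustration. Passing labels without tick values raises a TypeError, and an empty list removes the ticks.

```python
import matplotlib.pyplot as plt
import numpy as np

a = np.arange(1, 6)

plt.figure()
plt.plot(a, a**2, 'o-r')

# Tick values first, labels second
locs, labels = plt.xticks([1, 2, 3, 4, 5], ['one', 'two', 'three', 'four', 'five'])
label_texts = [t.get_text() for t in labels]

# Labels without tick values are rejected
try:
    plt.xticks(labels=['a', 'b', 'c'])
    labels_only_error = None
except TypeError as exc:
    labels_only_error = exc

# Empty list: no ticks on the y-axis
plt.yticks([])
remaining_yticks = len(plt.yticks()[0])
print(label_texts, remaining_yticks)

# plt.show()  # uncomment to display the figure window
```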
for x-axis:
left
right
For y-axis:
bottom
top
xlim(*args, **kwargs)
Get or set the x limits of the current axes.
Call signatures::
If you do not specify args, you can pass *left* or *right* as kwargs,
i.e.::
ylim() function:
----------------
import matplotlib.pyplot as plt
import numpy as np
a = np.arange(1,101)
b = a**2
plt.plot(a,b,'o-r')
plt.grid()
plt.ylim(bottom=100)
print(plt.ylim())
plt.show()
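The same pattern works for the x-axis with the left and right keyword arguments. A sketch with assumed data and limits; calling xlim() without arguments returns the current limits.

```python
import matplotlib.pyplot as plt
import numpy as np

a = np.arange(1, 11)

plt.figure()
plt.plot(a, a**2, 'o-r')
plt.grid()

plt.xlim(left=2, right=8)   # show only the part of the plot from x=2 to x=8
limits = plt.xlim()         # no arguments: get the current limits
print(limits)

# plt.show()  # uncomment to display the figure window
```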
1. Linear scaling
2. Logarithmic Scaling
1. Linear scaling:
------------------
The difference between any two consecutive points on the given axis is always
fixed. This type of scaling is called linear scaling.
Default scaling in matplotlib is linear scaling.
If the data set values are spread over a small range, then linear scaling is the
best choice.
2. Logarithmic Scaling:
-----------------------
The difference between any two consecutive points on the given axis is not
fixed; each point is a fixed multiple (by default 10 times) of the previous one.
This type of scaling is called logarithmic scaling.
If the data set values are spread over a big range, then logarithmic scaling is the
best choice.
plt.xticks()
plt.yticks()
plt.xlim()
plt.ylim()
plt.xscale()
plt.yscale()
xscale(value, **kwargs)
Set the x-axis scale.
value : {"linear", "log", "symlog", "logit", ...}
yscale(value, **kwargs)
Set the y-axis scale.
value : {"linear", "log", "symlog", "logit", ...}
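A sketch with data spread over a big range, where logarithmic scaling helps. The values (powers of 10) are assumed for illustration. On a linear y-axis the first few points would be squeezed near zero; with yscale('log') the points are evenly spaced.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 7)
y = 10.0 ** x              # 10, 100, ..., 1000000

plt.figure()
plt.plot(x, y, 'o-r')
plt.grid()
plt.yscale('log')          # y-axis ticks now go 10, 100, 1000, ...

ax = plt.gca()
print(ax.get_xscale(), ax.get_yscale())

# plt.show()  # uncomment to display the figure window
```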
import matplotlib.pyplot as plt
import numpy as np
a = np.arange(10000)
b = np.arange(10000)
plt.plot(a,b)
plt.grid()
plt.xscale('log',base=2) #logarithmic scaling
plt.yscale('log',base=9) #logarithmic scaling
plt.show()
plotting styles:
----------------
We can customize the look and feel of the plot by using the style library.
Multiple predefined styles are available:
>>> plt.style.available
['Solarize_Light2', '_classic_test_patch', 'bmh', 'classic', 'dark_background',
'fast', 'fivethirtyeight', 'ggplot', 'grayscale', 'seaborn', 'seaborn-bright',
'seaborn-colorblind', 'seaborn-dark', 'seaborn-dark-palette', 'seaborn-darkgrid',
'seaborn-deep', 'seaborn-muted', 'seaborn-notebook', 'seaborn-paper',
'seaborn-pastel', 'seaborn-poster', 'seaborn-talk', 'seaborn-ticks',
'seaborn-white', 'seaborn-whitegrid', 'tableau-colorblind10']
Note:
1. ggplot--->To emulate the most powerful ggplot style of R language.
2. seaborn-->To emulate seaborn style
3. fivethirtyeight--->The most commonly used style in real time.
etc
We can apply any of these predefined styles to the plot as follows:
plt.style.use('ggplot')
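plt.style.use() changes the style for the rest of the program. As a sketch of an alternative that is not covered in these notes, plt.style.context() applies a style only inside a with block. The list of style names depends on the installed Matplotlib version, so it is safer to check plt.style.available first.

```python
import matplotlib.pyplot as plt
import numpy as np

a = np.arange(1, 11)
style_name = 'ggplot'

if style_name in plt.style.available:
    # The style is active only inside the with block
    with plt.style.context(style_name):
        plt.figure()
        plt.plot(a, a**2, 'o-')
        plt.title('Square Function')
        grid_in_style = bool(plt.rcParams['axes.grid'])  # ggplot switches the grid on
    print(grid_in_style)

# plt.show()  # uncomment to display the figure window
```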
1. Procedural/Functional oriented
2. OOP
procedural:
----------
def f1():
    print('f1 function')
def f2():
    print('f2 function')
def f3():
    print('f3 function')
def f4():
    print('f4 function')
f1()
f2()
f3()
f4()
OOP approach:
-------------
class Test:
    def m1(self):
        print('m1 method')
    def m2(self):
        print('m2 method')
    def m3(self):
        print('m3 method')
    def m4(self):
        print('m4 method')
t = Test()
t.m1()
t.m2()
t.m3()
t.m4()
1. Procedural/Functional Approach:
-----------------------------------
We can create plots with the help of multiple functions from the pyplot module.
#Creation of line plot to represent square functionality from 1 to 10.
import matplotlib.pyplot as plt
import numpy as np
a = np.arange(1,11)
b = a**2
plt.plot(a,b)
plt.xlabel('N')
plt.ylabel('Square Value of N')
plt.title('Square Function')
plt.show()
plot()
xlabel()
ylabel()
title()
show()
fig = plt.figure()
2. Creation of Axes object:
---------------------------
Once figure object is ready, then we have to add axes to that object. For this we
have to use add_axes() method of Figure class. This method returns Axes object.
Call signatures::
axes.plot(a,b)
axes.set_xlabel('xlabel')
axes.set_ylabel('ylabel')
axes.set_title('title')
plt.show()
#Creation of line plot to represent square functionality from 1 to 10.
import matplotlib.pyplot as plt
import numpy as np
a = np.arange(1,11)
b = a**2
fig = plt.figure()
axes = fig.add_axes([0.2,0.3,0.6,0.4]) #[left,bottom,width,height] lbwh
axes.plot(a,b)
axes.set_xlabel('N')
axes.set_ylabel('Square of N')
axes.set_title('Square Function')
axes.grid()
plt.show()
Note: We can use single set() method to set all axes properties like
title,xlabel,ylabel,xlim,ylim etc
axes.set(xlabel='N',ylabel='Square of N', title='Square Function',xlim=(1,5),ylim=(1,25))
axes.grid()
plt.show()
Summary:
--------
1. Creation of Figure object
2. Creation of Axes object
3. plot the graph
4. set the properties of the axis.
plt.bar()
Syntax:
-------
bar(x, height, width=0.8, bottom=None, *, align='center', data=None, **kwargs)
Make a bar plot.
To align the bars on the right edge pass a negative *width* and
``align='edge'``.
eg-1: Represent the number of movies of each hero by using bar chart
--------------------------------------------------------------------
import matplotlib.pyplot as plt
plt.xlabel('Hero Name',color='b',fontsize=15)
plt.ylabel('Number of Movies',color='b',fontsize=15)
plt.title('Hero wise number of movies',color='r',fontsize=15)
plt.show()
Observations:
-------------
1. plt.bar(heroes,movies,color='r')
Now all bars with RED color
7. alignment: center
for left alignment:
plt.bar(heroes,movies,align='edge')
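The heroes and movies lists are not shown in these notes, so the sketch below uses placeholder names and counts that are purely hypothetical. It compares the default center alignment with align='edge', where the left edge of each bar sits on the tick position.

```python
import matplotlib.pyplot as plt

heroes = ['Hero A', 'Hero B', 'Hero C', 'Hero D']   # placeholder names (hypothetical)
movies = [30, 45, 12, 25]                            # placeholder counts (hypothetical)

plt.figure()
bars_center = plt.bar(heroes, movies, color='r')     # default: align='center'
plt.xlabel('Hero Name', color='b', fontsize=15)
plt.ylabel('Number of Movies', color='b', fontsize=15)
plt.title('Hero wise number of movies', color='r', fontsize=15)

plt.figure()
bars_edge = plt.bar(heroes, movies, color='r', align='edge')

# The first category sits at position 0 and the default width is 0.8
print(bars_center[0].get_x(), bars_edge[0].get_x())

# plt.show()  # uncomment to display the figure windows
```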
years = [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
sales = [10000, 25000, 45000, 30000, 10000, 5000,70000,60000,65000,50000]
c = ['r','k','y','g','orange','m','c','b','lime','violet']
plt.bar(years,sales,color=c)
plt.xlabel('Year',color='b',fontsize=15)
plt.ylabel('Number of Sales',color='b',fontsize=15)
plt.title('Nokia Mobile Sales in the last Decade',color='r',fontsize=15)
plt.xticks(years,rotation=30)
plt.tight_layout()
plt.grid(axis='y')
plt.show()
1. pyplot.text()
2. pyplot.annotate()
1. pyplot.text():
----------------
Syntax:
text(x, y, s, fontdict=None, **kwargs)
Add text to the Axes.
Add the text *s* to the Axes at location *x*, *y* in data coordinates.
import matplotlib.pyplot as plt
import numpy as np
a = np.arange(10)
plt.plot(a,a,'o-r')
for i in range(a.size): # 0 to 9
    plt.text(a[i]+0.4,a[i]-0.2,f'({a[i]},{a[i]})',color='b')
plt.show()
2. pyplot.annotate():
---------------------
annotate(text, xy, *args, **kwargs)
Annotate the point *xy* with text *text*.
xy : (float, float)
The point *(x, y)* to annotate.
import matplotlib.pyplot as plt
import numpy as np
a = np.arange(10)
plt.plot(a,a,'o-r')
for i in range(a.size): # 0 to 9
    #plt.text(a[i]+0.4,a[i]-0.2,f'({a[i]},{a[i]})',color='b')
    plt.annotate(f'({a[i]},{a[i]})',(a[i]+0.4,a[i]-0.2),color='g')
plt.show()
years = [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
sales = [10000, 25000, 45000, 30000, 10000, 5000,70000,60000,65000,50000]
plt.bar(years,sales,color='r')
plt.xlabel('Year',color='b',fontsize=15)
plt.ylabel('Number of Sales',color='b',fontsize=15)
plt.title('Nokia Mobile Sales in the last Decade',color='r',fontsize=15)
plt.xticks(years,rotation=30)
plt.tight_layout()
for i in range(len(years)): # 0 to 9
    plt.text(years[i],sales[i]+500,sales[i],ha='center',color='b')
plt.show()
years = [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
sales = [10000, 25000, 45000, 30000, 10000, 5000,70000,60000,65000,50000]
plt.bar(years,sales,color='r')
plt.xlabel('Year',color='b',fontsize=15)
plt.ylabel('Number of Sales',color='b',fontsize=15)
plt.title('Nokia Mobile Sales in the last Decade',color='r',fontsize=15)
plt.xticks(years,rotation=30)
plt.tight_layout()
for i in range(len(years)): # 0 to 9
    plt.text(years[i],sales[i]+500,str(sales[i]//1000)+'k',ha='center',color='b')
    #plt.annotate(str(sales[i]//1000)+'k',(years[i],sales[i]+500),ha='center',color='g',backgroundcolor='yellow')
plt.show()
color = ['#f54287','#f542ec','#bc42f5','#427ef5','#42d7f5','#4287f5','#f56f42','#f2f542','#5df542','#42f5b6']
--------------------------------------------------------
Plotting bar chart with data from csv file:
-------------------------------------------
Assume that data is available in students.csv file, which is present in current
working directory.
import matplotlib.pyplot as plt
import numpy as np
import csv
names = np.array([],dtype='str')
marks = np.array([],dtype='int')
f = open('students.csv','r')
r = csv.reader(f) # Returns csvreader object
h = next(r) #to read header and ignore
for row in r:
    names = np.append(names,row[0])
    marks = np.append(marks,int(row[1]))
f.close() # all rows are read, so the file can be closed
plt.bar(names,marks,color='r')
plt.show()
Note:
If the labels are too long or there are too many values to represent, then we
should go for a horizontal bar chart instead of a vertical bar chart.
barh() --->To create horizontal bar chart.
Syntax:
barh(y, width, height=0.8, left=None, *, align='center', **kwargs)
Make a horizontal bar plot.
vertical vs horizontal
------------------------
height ----->width
width ---> height
bottom -->left
bar() -->barh()
students.csv:
------------
Name of Student,Marks
Sunny,100
Bunny,200
Chinny,300
Vinny,200
Pinny,400
Zinny,300
Kinny,500
Minny,600
Dinny,400
Ginny,700
Sachin,300
Dravid,900
Kohli,1000
Rahul,800
Ameer,600
Sharukh,500
Salman,700
Ranveer,600
Katrtina,300
Kareena,400
demo program:
--------------
import matplotlib.pyplot as plt
import numpy as np
import csv
names = np.array([],dtype='str')
marks = np.array([],dtype='int')
f = open('students.csv','r')
r = csv.reader(f) # Returns csvreader object
h = next(r) #to read header and ignore
for row in r:
    names = np.append(names,row[0])
    marks = np.append(marks,int(row[1]))
plt.barh(names,marks,color='r')
plt.xlabel('Marks',fontsize=15,color='b')
plt.ylabel('Name of Student',fontsize=15,color='b')
plt.title('Students Marks Report',fontsize=15,color='r')
plt.tight_layout()
plt.show()
Vertical bar chart
Horizontal bar chart
names = ['Sunny','Bunny','Chinny','Vinny','Tinny']
english_marks = [90,80,85,25,50]
maths_marks = [25,23,45,32,50]
plt.bar(names,english_marks,color='r')
plt.bar(names, maths_marks, bottom=english_marks, color='green')
plt.show()
total_marks = [e+m for e,m in zip(english_marks,maths_marks)] # total marks per student
plt.bar(names,english_marks,color='#09695c',label='English')
plt.bar(names,maths_marks,bottom=english_marks,color='#9c0c8b',label="Maths")
for i in range(len(names)):
    plt.text(names[i],(english_marks[i]/2),str(english_marks[i]),ha='center',color='white',weight=1000)
    plt.text(names[i],(english_marks[i]+maths_marks[i]/2),str(maths_marks[i]),ha='center',color='white',weight=1000)
    plt.text(names[i],(total_marks[i]+2),str(total_marks[i]),ha='center',color='#008080',weight=1000)
plt.legend()
plt.show()
Diagram: stacked_bar_with_text_labels
bar()--->barh()
bottom--->left
xlabel and ylabels are interchanged.
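The three changes listed above can be sketched as follows for a stacked horizontal bar chart. The names and marks reuse the stacked bar example from these notes; the colors and labels are assumptions for illustration. With barh() the second set of bars starts where the first ends, which is given through left instead of bottom.

```python
import matplotlib.pyplot as plt
import numpy as np

names = ['Sunny', 'Bunny', 'Chinny', 'Vinny', 'Tinny']
english_marks = np.array([90, 80, 85, 25, 50])
maths_marks = np.array([25, 23, 45, 32, 50])

plt.figure()
first = plt.barh(names, english_marks, color='r', label='English')
# left=english_marks: each Maths bar starts at the end of the English bar
second = plt.barh(names, maths_marks, left=english_marks, color='green', label='Maths')

# The axis labels are interchanged compared with the vertical chart
plt.xlabel('Marks')
plt.ylabel('Name of Student')
plt.legend()

starts = [float(r.get_x()) for r in second]
print(starts)

# plt.show()  # uncomment to display the figure window
```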
eg-2: We have to represent country wise medals. But within the total number of
medals, we have to represent gold, silver and bronze medals separately, side by
side.
We can create clustered bar chart by using either bar() or barh() functions.
Demo program:
-------------
import matplotlib.pyplot as plt
import numpy as np
names = ['Sunny','Bunny','Chinny','Vinny','Tinny']
english_marks = np.array([90,80,85,25,50])
math_marks = np.array([25,23,45,32,25])
xpos = np.arange(len(names)) #[0,1,2,3,4]
w = 0.3
plt.bar(xpos,english_marks,color='r',width=w)
plt.bar(xpos+w,math_marks,color='g',width=w)
#plt.xticks(xpos+0.15,names)
plt.xticks(xpos+w/2,names)
plt.legend(['eng','math'])
plt.show()
Demo program-2:
---------------
import matplotlib.pyplot as plt
import numpy as np
country_name = ['India','China','US','UK']
gold_medals = np.array([60,40,50,20])
silver_medals = np.array([50,30,25,43])
bronze_medals = np.array([55,24,45,6])
xpos = np.arange(len(country_name)) #[0,1,2,3]
w = 0.2
plt.bar(xpos,gold_medals,color='#FFD700',width=w)
plt.bar(xpos+w,silver_medals,color='#C0C0C0',width=w)
plt.bar(xpos+2*w,bronze_medals,color='#CD7F32',width=w)
plt.xticks(xpos+w,country_name)
plt.xlabel('Country Name',color='b',fontsize=15)
plt.ylabel('Number of Medals',color='b',fontsize=15)
plt.title('Country Wise Medals Report',color='r',fontsize=15)
plt.legend(['gold','silver','bronze'])
for i in range(len(country_name)):
    plt.text(xpos[i],gold_medals[i]+1,gold_medals[i],ha='center',color='r',weight=1000)
    plt.text(xpos[i]+w,silver_medals[i]+1,silver_medals[i],ha='center',color='r',weight=1000)
    plt.text(xpos[i]+2*w,bronze_medals[i]+1,bronze_medals[i],ha='center',color='r',weight=1000)
plt.show()
eg-2A: Represent the over wise scores of India and Australia in a 20-20 match by
using a clustered bar chart.