Value Added Course: Programming in Python and Machine Learning UNIT-2
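Ex (reshape() returns a reshaped array and leaves a unchanged):
import numpy as np
a = np.array([[1,2,3],[4,5,6]])
b = a.reshape(3,2)
print(b)
Output:
[[1 2]
 [3 4]
 [5 6]]
Ex (assigning to a.shape reshapes a in place):
import numpy as np
a = np.array([[1,2,3],[4,5,6]])
a.shape = (3,2)
print(a)
Output:
[[1 2]
 [3 4]
 [5 6]]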
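This is a small sketch of how reshape() treats its arguments, reusing the array above. Passing -1 for one dimension lets NumPy infer that dimension from the total number of elements. A target shape whose element count does not match raises ValueError.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# -1 asks NumPy to infer that dimension from the total size (6 elements -> 3 rows)
b = a.reshape(-1, 2)
print(b.shape)        # (3, 2)

# reshape() does not modify the original array
print(a.shape)        # (2, 3)

# The new shape must hold exactly a.size elements, otherwise ValueError is raised
try:
    a.reshape(4, 2)
except ValueError as err:
    print("cannot reshape:", err)
```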
ndim: returns the number of array dimensions
Ex:
import numpy as np
a = np.array([2,3,4,5,5,6,7,8,1,12,23,34,56,
              67,45,33,55,66,77,15,17,19,20,0])
print(a)
print(a.ndim)
b = a.reshape(2,4,3)
print(b)
print(b.ndim)
Output:
[ 2  3  4  5  5  6  7  8  1 12 23 34 56 67 45 33 55 66 77 15 17 19 20  0]
1
[[[ 2  3  4]
  [ 5  5  6]
  [ 7  8  1]
  [12 23 34]]

 [[56 67 45]
  [33 55 66]
  [77 15 17]
  [19 20  0]]]
3
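The ndim attribute is usually read alongside shape (the length of each axis) and size (the total element count). This is a minimal sketch that uses np.arange(24) as a stand-in for the 24-element array above.

```python
import numpy as np

a = np.arange(24)            # 24 elements: 0, 1, ..., 23
b = a.reshape(2, 4, 3)

# ndim = number of axes, shape = length of each axis, size = total element count
print(a.ndim, a.shape, a.size)   # 1 (24,) 24
print(b.ndim, b.shape, b.size)   # 3 (2, 4, 3) 24
```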
NUMPY − ARRAY CREATION
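• Empty: Creates an uninitialized array of the specified shape and dtype.
– Syntax: numpy.empty(shape, dtype=float, order='C')
• Zeros: Returns a new array of the specified shape, filled with zeros.
– Syntax: numpy.zeros(shape, dtype=float, order='C')
• Ones: Returns a new array of the specified shape and type, filled with ones.
– Syntax: numpy.ones(shape, dtype=None, order='C')
• Full: Returns a new array of the specified shape, filled with a constant value.
– Syntax: numpy.full(shape, fill_value, dtype=None, order='C')
• Eye: Returns a 2-D identity matrix (ones on the diagonal, zeros elsewhere).
– Syntax: numpy.eye(N, M=None, k=0, dtype=float)
• Random (or) rand: Generates random numbers uniformly distributed over [0, 1).
– Syntax: numpy.random.rand(d0, d1, ..., dn)
• Arange: Returns evenly spaced values within a given interval.
– Syntax: numpy.arange(start, stop, step, dtype) (by default start is 0 and step is 1; stop itself is excluded)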
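Ex:
import numpy as np
a = np.empty([3,2], dtype=int)   # uninitialized: contents are arbitrary
print(a)
b = np.zeros((5,), dtype=int)    # np.int was removed in NumPy 1.24; use int
print(b)
c = np.ones([2,2], dtype=int)
print(c)
Output:
[[ 2  3]
 [ 4 19]
 [20  0]]
[0 0 0 0 0]
[[1 1]
 [1 1]]
(The values printed for the empty array are arbitrary and differ from run to run.)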
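Full, Eye, rand and arange appear in the list above without an example. This sketch calls each one once. The shapes and the fill value 7 are arbitrary illustrative choices, and the random values differ on every run.

```python
import numpy as np

f = np.full((2, 3), 7)          # 2x3 array where every element is 7
i = np.eye(3)                   # 3x3 identity matrix (float64)
r = np.random.rand(2, 2)        # 2x2 array of uniform random floats in [0, 1)
g = np.arange(0, 10, 2)         # start=0, stop=10 (excluded), step=2

print(f)
print(i)
print(r)
print(g)                        # [0 2 4 6 8]
```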
Ex:
import numpy as np
x = np.array([[ 0, 1, 2],[ 3, 4, 5],[ 6, 7, 8],[ 9, 10, 11]])
print('Our array is:')
print(x)
print('\n')
# print the items greater than 5
print('The items greater than 5 are:')
print(x[x > 5])
Output:
Our array is:
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]


The items greater than 5 are:
[ 6  7  8  9 10 11]
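Boolean indexing also works with combined conditions and for assignment. This is a short sketch on the same array x. Use & and | rather than Python's and/or, and wrap each comparison in its own parentheses.

```python
import numpy as np

x = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]])

# A comparison produces a boolean array with the same shape as x
mask = x > 5
print(mask.shape)                 # (4, 3)

# Combine conditions with & (and) / | (or); each condition needs parentheses
print(x[(x > 2) & (x < 9)])       # [3 4 5 6 7 8]

# A boolean mask can also be used for assignment
y = x.copy()
y[y % 2 == 1] = 0                 # replace odd values with 0
print(y)
```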
Broadcasting
• The term broadcasting refers to the ability of NumPy to treat arrays of
different shapes during arithmetic operations.
• Arithmetic operations on arrays are usually performed element-wise, i.e. on
corresponding elements.
• If two arrays have exactly the same shape, these operations are applied
directly, element by element.
• Frequently we have a smaller array and a larger array, and we want to use
the smaller array multiple times to perform some operation on the larger
array.
Ex:
import numpy as np
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = x + v # Add v to each row of x using broadcasting
print(y)
Output:
[[ 2 2 4]
[ 5 5 7]
[ 8 8 10]
[11 11 13]]
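Under NumPy's broadcasting rule, shapes are compared starting from the trailing dimension. Two dimensions are compatible when they are equal or when one of them is 1. The sketch below reuses x from the example above and tries three cases: a scalar, an arbitrary (4, 1) column vector, and an incompatible (4,) array.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])   # shape (4, 3)

# A scalar is broadcast to every element
print(x * 2)

# A column vector of shape (4, 1) is stretched across the 3 columns
col = np.array([[10], [20], [30], [40]])
print(x + col)

# Shapes (4, 3) and (4,): trailing sizes 3 and 4 differ and neither is 1 -> error
try:
    x + np.array([1, 2, 3, 4])
except ValueError as err:
    print("broadcast error:", err)
```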
Python-Pandas
Pandas-Introduction
Pandas is an open-source Python library that provides high-performance data
manipulation and analysis tools built on its powerful data structures. The
name Pandas is derived from the term "Panel Data".
Key Features of Pandas
• Fast and efficient DataFrame object with default and customized
indexing.
• Tools for loading data into in-memory data objects from different
file formats.
• Data alignment and integrated handling of missing data.
• Reshaping and pivoting of data sets.
• Label-based slicing, indexing and subsetting of large data sets.
• Columns from a data structure can be deleted or inserted.
Pandas deals with the following three data structures −
• Series
• DataFrame
• Panel
Pandas-Series
Series is a one-dimensional array-like structure with
homogeneous data.
• A pandas Series can be created using the following constructor
Syntax: pandas.Series( data, index, dtype, copy)
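The parameters of the constructor are as follows −
Sr.No Parameter & Description
1 data: data takes various forms like ndarray, list, dict and constants.
2 index: Row labels; must be hashable and the same length as data. Default np.arange(n) if no index is passed.
3 dtype: Data type of the values. If None, the data type is inferred.
4 copy: Whether to copy the input data.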
Example-1
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data)
print(s)
Output:-
0    a
1    b
2    c
3    d
dtype: object
Example-2
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data, index=[100,101,102,103])
print(s)
Output:-
100    a
101    b
102    c
103    d
dtype: object
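A Series can also be built from a scalar, which is repeated once for every index label, and the dtype parameter overrides type inference. This is a minimal sketch with arbitrary values.

```python
import pandas as pd

# A scalar value is repeated once for every label in the index
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)

# dtype forces the element type instead of letting pandas infer it
t = pd.Series([1, 2, 3], dtype=float)
print(t.dtype)      # float64
```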
Create a Series from dict
Example-1
import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
print(s)
Output:-
a    0.0
b    1.0
c    2.0
dtype: float64
Example-2
import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data, index=['b','c','d','a'])
print(s)
Output:-
b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64
(NaN = Not a Number: the label 'd' is not a key of the dict, so its value is missing.)
Accessing Data from Series with Position
Example-1
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
# retrieve the first three elements
print(s[:3])
Output:-
a 1
b 2
c 3
dtype: int64
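Elements can also be retrieved by label instead of by position. This sketch uses the same Series s. Note that a label slice with .loc includes its end label, which a positional slice does not.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

print(s['a'])               # single label -> scalar value 1
print(s[['a', 'c', 'd']])   # list of labels -> a new Series
print(s.iloc[-3:])          # last three elements by position
print(s.loc['b':'d'])       # label slice: the end label 'd' IS included
```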
Pandas - DataFrame
• A Data frame is a two-dimensional data structure, i.e., data is aligned in
a tabular fashion in rows and columns.
Features of DataFrame
• Columns can be of different types
• Size – Mutable
• Labeled axes (rows and columns)
• Can Perform Arithmetic operations on rows and columns
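The features listed above can be seen on a tiny frame. This sketch shows per-column dtypes, column arithmetic, and inserting and deleting a column. The column names (Name, Age, Height and the derived Age_next_year) and the values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jack'], 'Age': [28, 34], 'Height': [1.75, 1.80]})

# Each column keeps its own dtype
print(df.dtypes)

# Arithmetic works on whole columns; assigning a new name inserts a column
df['Age_next_year'] = df['Age'] + 1
print(df)

# Columns can also be deleted (the frame is size-mutable)
del df['Height']
print(df.columns.tolist())            # ['Name', 'Age', 'Age_next_year']
```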
A pandas DataFrame can be created using the following constructor −
Syntax: pandas.DataFrame( data, index, columns, dtype, copy)
Sr.No Parameter & Description
1 data: data takes various forms like ndarray, series, map, lists, dict,
constants and also another DataFrame.
2 index: Optional row labels for the resulting frame. Default np.arange(n)
if no index is passed.
3 columns: Optional column labels. Default np.arange(n) if no column labels
are passed.
4 dtype: Data type of each column.
5 copy: Whether to copy data from the inputs.
DataFrame-Creation
A pandas DataFrame can be created using various inputs like −
1) Lists 2) Dict 3) Series 4) NumPy ndarrays
Create a DataFrame from Lists
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data, columns=['Name','Age'])
print(df)
Output:-
     Name  Age
0    Alex   10
1     Bob   12
2  Clarke   13
Create a DataFrame from Dict
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'], 'Age':[28,34,29,42]}
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
print(df)
Output :-
        Name  Age
rank1    Tom   28
rank2   Jack   34
rank3  Steve   29
rank4  Ricky   42
• Create a DataFrame from Series
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df)
Output :-
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4
• Create a DataFrame from NumPy ndarray
import numpy as np
import pandas as pd
a = np.array([[1,2,3],[4,5,6]])
df = pd.DataFrame(a)
print(df)
Output :-
   0  1  2
0  1  2  3
1  4  5  6
Row Selection, Addition, and Deletion
Row Selection:
Selection by Label: Rows can be selected by passing the
row label to the loc indexer
Example:-
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 3, 2, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.loc['b'])
Output:-
one 2.0
two 3.0
Name: b, dtype: float64
Selection by integer Location: Rows can be selected by passing an integer
location to the iloc indexer.
Example:-
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.iloc[2])
Output:-
one    3.0
two    3.0
Name: c, dtype: float64
Slice Rows: Multiple rows can be selected using the ‘ : ’ operator.
Example:-
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df[2:4])
Output:-
   one  two
c  3.0    3
d  NaN    4
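loc and iloc also accept lists and slices. This is a sketch on the same frame df. A loc label slice includes its end label, while an iloc position slice excludes its end position.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

print(df.loc[['a', 'c']])     # several rows by label
print(df.loc['b':'c'])        # label slice: includes the end label 'c'
print(df.iloc[1:3])           # position slice: rows 1 and 2, end excluded
print(df.loc['b', 'two'])     # a single cell by row and column label -> 2
```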
Addition of Rows: Add new rows to a DataFrame by concatenating another
DataFrame; the new rows are placed at the end. Older pandas versions used
df.append(df2) for this, but DataFrame.append was deprecated in pandas 1.4
and removed in 2.0, so pd.concat is used below.
Example:-
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])
df = pd.concat([df, df2])   # replaces df = df.append(df2)
print(df)
Output:-
   a  b
0  1  2
1  3  4
0  5  6
1  7  8
Deletion of Rows: Use the index label to delete or drop rows from a
DataFrame. If the label is duplicated, then multiple rows will be dropped.
Example:-
import pandas as pd
df1 = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df = df1.drop(0)
print(df)
Output:-
   a  b
1  3  4
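The remark about duplicated labels can be checked directly. After concatenation the index labels 0 and 1 each occur twice, so drop(0) removes two rows. Passing ignore_index=True to pd.concat renumbers the rows instead. This is a minimal sketch.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

combined = pd.concat([df1, df2])      # index labels: 0, 1, 0, 1
print(combined.drop(0))               # label 0 appears twice, so two rows are dropped

# ignore_index=True builds a fresh 0..n-1 index instead of keeping the old labels
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.drop(0))             # now only one row is dropped
```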
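Python Pandas - Panel
A panel is a 3D container of data. The term Panel data is
derived from econometrics and is partially responsible for the
name pandas − pan(el)-da(ta)-s.
Note: Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so
the Panel examples in this section run only on older pandas versions.
Current pandas represents 3-D data with a MultiIndex DataFrame (or with
the xarray package).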
The names for the 3 axes are intended to give some semantic
meaning to describing operations involving panel data. They
are −
• items − axis 0, each item corresponds to a DataFrame
contained inside.
• major_axis − axis 1, it is the index (rows) of each of the
DataFrames.
• minor_axis − axis 2, it is the columns of each of the
DataFrames.
A Panel can be created using the following constructor −
Syntax:
pandas.Panel(data, items, major_axis, minor_axis, dtype, copy)
The parameters of the constructor are as follows −
Parameter      Description
data           Data takes various forms like ndarray, series, map, lists,
               dict, constants and also another DataFrame
items          axis=0
major_axis     axis=1
minor_axis     axis=2
Create Panel
Example:-
import pandas as pd   # requires pandas < 0.25 (Panel was removed in 0.25)
import numpy as np
data = np.random.rand(2,4,5)
p = pd.Panel(data)
print(p)
Output:-
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 4 (major_axis) x 5 (minor_axis)
Items axis: 0 to 1
Major_axis axis: 0 to 3
Minor_axis axis: 0 to 4
Selecting the Data from Panel
Select the data from the panel using − 1) Items 2) Major_axis 3) Minor_axis
Using Items:-
Example:-
import pandas as pd
import numpy as np
data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)),
'Item2' : pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)
print(p['Item1'])
Output:-
          0         1         2
0  0.488224 -0.128637  0.930817
1  0.417497  0.896681  0.576657
2 -2.775266  0.571668  0.290082
3 -0.400538 -0.144234  1.110535
Using major_axis
Example:-
import pandas as pd
import numpy as np
data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)),
        'Item2' : pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)
print(p.major_xs(1))
Output:-
      Item1     Item2
0  0.417497  0.748412
1  0.896681 -0.557322
2  0.576657       NaN
Using minor_axis
Example:-
import pandas as pd
import numpy as np
data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)),
        'Item2' : pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)
print(p.minor_xs(1))
Output:-
      Item1     Item2
0 -1.194259 -0.171606
1  0.949656  0.585843
2  1.074569  1.115871
3  0.821483  1.831133
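Panel no longer exists in current pandas. This sketch shows one common replacement, assuming the same two random items as above. Calling pd.concat on a dict of DataFrames produces a single DataFrame with a two-level row index (item, row). The Panel selections then map roughly onto .loc, .xs and unstack. This is an illustration, not an exact drop-in equivalent.

```python
import numpy as np
import pandas as pd

# Same shapes as the Panel examples: two items, each a 4-row DataFrame
data = {'Item1': pd.DataFrame(np.random.randn(4, 3)),
        'Item2': pd.DataFrame(np.random.randn(4, 2))}

# The dict keys become the outer level of the row index
df = pd.concat(data)
print(df.shape)                    # (8, 3); Item2 has no column 2, so those cells are NaN

print(df.loc['Item1'])             # like p['Item1']: one item's DataFrame
print(df.xs(1, level=1))           # like p.major_xs(1), but transposed: row 1 of each item
print(df[1].unstack(level=0))      # like p.minor_xs(1): column 1 of each item
```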
Data Analysis using Pandas
•We might have our data in .csv files, SQL tables, Excel files or .tsv files.
•We want to analyze that data using pandas.
•The first step is to read it into a data structure that is compatible
with pandas.
How to create a csv file:-
import pandas as pd
my_dict = { 'name' : ["a", "b", "c", "d", "e", "f", "g"],
            'age' : [20, 27, 35, 55, 18, 21, 35],
            'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]}
df = pd.DataFrame(my_dict)
df.to_csv('csv_example', index=False)   # index=False: do not write the row index
Now we have a CSV file named csv_example that contains the data from the
DataFrame above.
To load the data into a DataFrame using read_csv, the syntax is:-
pandas.read_csv(filepath_or_buffer, sep=',', names=None, index_col=None, skipinitialspace=False)
• filepath_or_buffer: Path or URL of the data
• sep=',': The delimiter to use
• names=None: Names for the columns. If the dataset has ten columns, you need
to pass ten names
• index_col=None: Column(s) to use as the row index. With None, no column is
used and pandas creates a default 0..n-1 index
• skipinitialspace=False: If True, skip spaces after the delimiter.
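To try several of these parameters without creating a file, the sketch below reads a small made-up CSV from an in-memory io.StringIO buffer (read_csv accepts any file-like object). Because names is given, the first line is treated as data rather than as a header row.

```python
import io
import pandas as pd

# A small in-memory CSV with no header row and a space after each comma
raw = io.StringIO("a, 20, VP\nb, 27, CEO\nc, 35, CFO\n")

df = pd.read_csv(raw,
                 sep=',',                                # delimiter
                 names=['name', 'age', 'designation'],   # one name per column
                 index_col=0,                            # use column 0 ('name') as row labels
                 skipinitialspace=True)                  # drop the space after each comma
print(df)
print(df.loc['b', 'designation'])                        # CEO
```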
Example:-
# Load the CSV file and create a new DataFrame
df_csv = pd.read_csv('csv_example')
print(df_csv)
Output:-
  name  age designation
0    a   20          VP
1    b   27         CEO
2    c   35         CFO
3    d   55          VP
4    e   18          VP
5    f   21         CEO
6    g   35          MD
• To print the top 5 lines:-
print(df_csv.head())
Output:-
  name  age designation
0    a   20          VP
1    b   27         CEO
2    c   35         CFO
3    d   55          VP
4    e   18          VP
• To print the bottom 5 lines:-
print(df_csv.tail())
Output:-
  name  age designation
2    c   35         CFO
3    d   55          VP
4    e   18          VP
5    f   21         CEO
6    g   35          MD
• To print a few random lines:-
print(df_csv.sample(3))
Output:- (the rows are chosen at random, so they differ on every run)
  name  age designation
3    d   55          VP
0    a   20          VP
1    b   27         CEO
• To print specific columns:-
print(df_csv[['name','age']])
Output:-
  name  age
0    a   20
1    b   27
2    c   35
3    d   55
4    e   18
5    f   21
6    g   35
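head() and tail() take an optional row count (default 5), and sample() accepts random_state to make its random selection repeatable. This sketch rebuilds df_csv from the same my_dict so that it runs without the CSV file.

```python
import pandas as pd

my_dict = {'name': ["a", "b", "c", "d", "e", "f", "g"],
           'age': [20, 27, 35, 55, 18, 21, 35],
           'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]}
df_csv = pd.DataFrame(my_dict)

print(df_csv.head(2))                       # first 2 rows instead of the default 5
print(df_csv.tail(2))                       # last 2 rows
# random_state fixes the random seed, so the same 3 rows come back on every run
print(df_csv.sample(3, random_state=1))
print(df_csv.sample(3, random_state=1).equals(df_csv.sample(3, random_state=1)))  # True
```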