Python Pandas
Neelmatha Lucknow
What is Pandas?
Pandas is a powerful Python library designed for working with "relational" or
"labeled" data. It aims to support practical, real-world data analysis in
Python, and its flexibility and functionality make it indispensable for a wide
range of data-related tasks. The package works well for data manipulation,
dataset exploration, data analysis, and machine learning-related tasks.
Pandas represents data with two structures, Series and DataFrame. A Series is
a one-dimensional labeled array that can hold data of any type, such as
integers, strings, and objects, while a DataFrame is a two-dimensional data
structure that holds data in tabular form (rows and columns).
Why Pandas?
The beauty of Pandas is that it simplifies many of the time-consuming,
repetitive tasks involved in working with tabular data.
Applications of Pandas
The most common applications of Pandas are as follows:
The Pandas library provides two primary data structures for handling and
analyzing data:
• Series
• DataFrame
Series
A Series is a one-dimensional labeled array that can hold any data type. It
can store integers, strings, floating-point numbers, etc. Each value in a
Series is associated with a label (index), which can be an integer or a string.
Name Steve
Age 35
Gender Male
Rating 3.5
Example
import pandas as pd
data = ['Steve', '35', 'Male', '3.5']
series = pd.Series(data, index=['Name', 'Age', 'Gender', 'Rating'])
print(series)
On executing the above program, you will get the following output:
Name Steve
Age 35
Gender Male
Rating 3.5
dtype: object
Key Points
• Homogeneous data
• Size Immutable
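Because every value in a Series is tied to a label, a value can be read either by its label or by its position. The sketch below reuses the values from the Series example; the variable names are illustrative only.

```python
import pandas as pd

# Same values and labels as the Series example
series = pd.Series(['Steve', '35', 'Male', '3.5'],
                   index=['Name', 'Age', 'Gender', 'Rating'])

name = series['Name']                  # label-based lookup
first = series.iloc[0]                 # position-based lookup
subset = series[['Name', 'Rating']]    # several labels at once

print(name, first)
print(subset)
```

Selecting with a list of labels returns a new, smaller Series rather than changing the original one.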
DataFrame
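Example
import pandas as pd
# One-row sample reusing the values from the Series example
data = {'Name': ['Steve'], 'Age': [35],
        'Gender': ['Male'], 'Rating': [3.5]}
df = pd.DataFrame(data)
print(df)
Output
On executing the above code, the DataFrame is printed as a table with one row
(index 0) and the columns Name, Age, Gender, and Rating.
Key Points
• Heterogeneous data
• Size Mutable
• Data Mutable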
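The three DataFrame properties listed above can be checked on a small table. This is a minimal sketch; the sample rows and column values are hypothetical and not taken from the original material.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({
    "Name": ["Steve", "Lia"],
    "Age": [35, 28],
    "Rating": [3.5, 4.6],
})

print(df.dtypes)                      # heterogeneous: each column has its own dtype

df["Gender"] = ["Male", "Female"]     # size mutable: add a column
df.loc[0, "Age"] = 36                 # data mutable: change a single value
df = df.drop(columns="Rating")        # size mutable: remove a column
print(df)
```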
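Commonly used DataFrame methods
• head(): returns the first rows (5 by default)
• tail(): returns the last rows (5 by default)
• info(): prints column names, non-null counts, dtypes, and memory usage
• describe(): returns summary statistics for the numeric columns
• isnull(): returns a Boolean mask that is True where a value is missing
• isnull().sum(): counts the missing values in each column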
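The inspection methods above can be tried without an Excel file. The sketch below builds a small in-memory table as a hypothetical stand-in for salary.xlsx; the column names follow the later examples, and the values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for salary.xlsx
data = pd.DataFrame({
    "Emp_ID": ["E01", "E02", "E03", "E04", "E05", "E06"],
    "Salary": [30000, 45000, np.nan, 52000, 41000, np.nan],
})

print(data.head(2))        # first two rows
print(data.tail(2))        # last two rows
data.info()                # prints its summary directly
print(data.describe())     # statistics for the numeric Salary column

missing = data.isnull().sum()   # missing values per column
print(missing)
```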
data.duplicated()
To flag rows that are exact duplicates of an earlier row
import pandas as pd
data=pd.read_excel("salary.xlsx")
print(data.duplicated())
data["Emp_ID"].duplicated()
To flag repeated values in the Emp_ID column
import pandas as pd
data=pd.read_excel("salary.xlsx")
print(data["Emp_ID"].duplicated())
data["Emp_ID"].duplicated().sum()
To count repeated values in the Emp_ID column
import pandas as pd
data=pd.read_excel("salary.xlsx")
print(data["Emp_ID"].duplicated().sum())
data.drop_duplicates("Emp_ID")
To drop rows with a repeated Emp_ID, keeping the first occurrence
import pandas as pd
data=pd.read_excel("salary.xlsx")
print(data.drop_duplicates("Emp_ID"))
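Which copy of a duplicate survives depends on the `keep` argument of `drop_duplicates`. The sketch below uses a small hypothetical table in place of salary.xlsx to show the default (`keep="first"`) next to `keep="last"`.

```python
import pandas as pd

# Hypothetical data: Emp_ID "E02" appears twice with different salaries
data = pd.DataFrame({
    "Emp_ID": ["E01", "E02", "E02", "E03"],
    "Salary": [30000, 45000, 47000, 52000],
})

dup_mask = data["Emp_ID"].duplicated()                      # True only for the repeat
first_kept = data.drop_duplicates(subset="Emp_ID")          # keeps the 45000 row
last_kept = data.drop_duplicates(subset="Emp_ID", keep="last")  # keeps the 47000 row

print(dup_mask)
print(first_kept)
print(last_kept)
```

Neither call modifies `data`; each returns a new DataFrame.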
data.isnull()
To flag null values (True where a value is missing)
import pandas as pd
data=pd.read_excel("salary.xlsx")
print(data.isnull())
data.isnull().sum()
To count null values in each column
import pandas as pd
data=pd.read_excel("salary.xlsx")
print(data.isnull().sum())
data.dropna()
To delete rows that contain null values
import pandas as pd
data=pd.read_excel("salary.xlsx")
print(data)
print("\n\n\n")
print(data.dropna())
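By default `dropna()` removes a row if any column in it is missing. The `subset` and `how` arguments narrow that rule. The sketch below uses hypothetical data; the Bonus column is an assumption added only to have a second column with gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in two columns
data = pd.DataFrame({
    "Emp_ID": ["E01", "E02", "E03", "E04"],
    "Salary": [30000, np.nan, 52000, np.nan],
    "Bonus": [np.nan, np.nan, 5000, 2000],
})

any_missing = data.dropna()                      # drop rows with any NaN
salary_only = data.dropna(subset=["Salary"])     # only look at Salary
both_missing = data.dropna(how="all", subset=["Salary", "Bonus"])  # drop only if both are NaN

print(any_missing)
print(salary_only)
print(both_missing)
```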
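data.replace(np.nan, "hii")
To replace NaN with a placeholder value
import numpy as np
import pandas as pd
data = pd.read_excel("salary.xlsx")
print(data)
# replace() returns a new DataFrame, so print (or assign) the result
print(data.replace(np.nan, "hii"))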
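data["Salary"] = data["Salary"].replace(np.nan, 30000)
To replace NaN (or any other placeholder value) in a single column
import pandas as pd
import numpy as np
data = pd.read_excel("salary.xlsx")
# Fill missing salaries with 30000 and write the column back
data["Salary"] = data["Salary"].replace(np.nan, 30000)
print(data)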
data["Salary"].mean()
import pandas as pd
import numpy as np
data=pd.read_excel("salary.xlsx")
print(data["Salary"].mean())
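A common use of the column mean is to fill the gaps in that same column. The sketch below is one way to do this with `fillna`, using hypothetical salary values; `mean()` skips NaN by default, so the mean is computed from the three known salaries.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing salary
data = pd.DataFrame({
    "Emp_ID": ["E01", "E02", "E03", "E04"],
    "Salary": [30000, np.nan, 50000, 40000],
})

mean_salary = data["Salary"].mean()                   # NaN is ignored
data["Salary"] = data["Salary"].fillna(mean_salary)   # fill the gap with the mean

print(mean_salary)
print(data)
```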
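data.bfill()
To fill each null value with the next valid value below it (backward fill).
Older code writes this as data.fillna(method="bfill"), which recent pandas
releases deprecate.
import pandas as pd
data = pd.read_excel("salary.xlsx")
print(data)
print("\n\n\n")
print(data.bfill())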
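data.ffill()
To fill each null value with the last valid value above it (forward fill).
Older code writes this as data.fillna(method="ffill"), which recent pandas
releases deprecate.
import pandas as pd
data = pd.read_excel("salary.xlsx")
print(data)
print("\n\n\n")
print(data.ffill())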
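The difference between forward and backward fill is easiest to see on a short Series with gaps at both ends. The values below are hypothetical.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 10.0, np.nan, np.nan, 40.0, np.nan])

forward = s.ffill()    # copy the last valid value downward
backward = s.bfill()   # copy the next valid value upward

print(forward)
print(backward)
```

A leading gap survives a forward fill and a trailing gap survives a backward fill, because there is no valid value on that side to copy from.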
Column transformation in Pandas
To add a Bonus column equal to 20% of Salary
import pandas as pd
data = pd.read_excel("salary.xlsx")
print(data, "\n\n")
# Column arithmetic is applied to every row at once
data["Bonus"] = (data["Salary"] / 100) * 20
print(data)
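Derived columns can also be added without modifying the original table by using `assign`, which returns a new DataFrame. This is a sketch on hypothetical data; the Total column is an assumption added to show that a later column can build on an earlier one in the same call.

```python
import pandas as pd

# Hypothetical stand-in for salary.xlsx
data = pd.DataFrame({
    "Emp_ID": ["E01", "E02"],
    "Salary": [30000, 45000],
})

result = data.assign(
    Bonus=data["Salary"] * 20 / 100,             # 20% of Salary
    Total=lambda d: d["Salary"] + d["Bonus"],    # uses the Bonus column just created
)

print(result)
```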
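To derive a column of three-letter month abbreviations with map()
import pandas as pd
data = {"Month": ["January", "February", "March", "April"]}
a = pd.DataFrame(data)
print(a)
def extract(value):
    # Keep the first three characters of the month name
    return value[0:3]
a["Short_Months"] = a["Month"].map(extract)
print(a)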
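For simple string slicing, the `.str` accessor can give the same result as `map()` with a helper function. A minimal sketch of that alternative:

```python
import pandas as pd

a = pd.DataFrame({"Month": ["January", "February", "March", "April"]})

# .str[:3] slices every string in the column at once
a["Short_Months"] = a["Month"].str[:3]

print(a)
```

`map()` remains the more general tool when the per-value logic does not fit a built-in string method.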
GroupBy In Pandas
Count Gender by Department
import pandas as pd
data=pd.read_excel("Salary.xlsx")
print(data)
gp=data.groupby("Department").agg({"Gender":"count"})
print(gp)
Count Emp_ID by Job Title
import pandas as pd
data=pd.read_excel("Salary.xlsx")
print(data)
gp=data.groupby("Job Title").agg({"Emp_ID":"count"})
print(gp)
Count Emp_ID by Department and Gender
import pandas as pd
data=pd.read_excel("Salary.xlsx")
print(data)
gp=data.groupby(["Department","Gender"]).agg({"Emp_ID":"count"})
print(gp)
Maximum Age by Country
import pandas as pd
data=pd.read_excel("Salary.xlsx")
print(data)
print("\n\n\n")
a=data.groupby("Countries").agg({"Age":"max"})
print(a)
Maximum Age by Country and Gender
import pandas as pd
data=pd.read_excel("Salary.xlsx")
print(data)
print("\n\n\n")
a=data.groupby(["Countries","Gender"]).agg({"Age":"max"})
print(a)
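Several aggregations can be computed in one `groupby` call by naming each output column. The sketch below uses hypothetical in-memory data with the same column names as the examples above; the output column names (headcount, oldest, mean_age) are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for Salary.xlsx
data = pd.DataFrame({
    "Emp_ID": ["E01", "E02", "E03", "E04", "E05"],
    "Department": ["IT", "IT", "HR", "HR", "IT"],
    "Gender": ["Male", "Female", "Female", "Female", "Male"],
    "Age": [30, 41, 28, 35, 52],
})

# Named aggregation: output_name=(source_column, function)
summary = data.groupby("Department").agg(
    headcount=("Emp_ID", "count"),
    oldest=("Age", "max"),
    mean_age=("Age", "mean"),
)
print(summary)

# Grouping by two columns, flattened back into ordinary columns
by_dept_gender = (
    data.groupby(["Department", "Gender"])["Emp_ID"]
    .count()
    .reset_index(name="headcount")
)
print(by_dept_gender)
```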
Concatenate
import pandas as pd
data1 = {"EEID": ["A01", "A02", "A03", "A04", "A05", "A06"],
         "Name": ["Amit", "Priya", "Neha", "Lovely", "Karan", "Mohit"]}
data2 = {"EEID": ["A07", "A08", "A09", "A10", "A11", "A12"],
         "Name": ["Atin", "Pankaj", "Alia", "Suman", "Sanjay", "Karan"]}
df1=pd.DataFrame(data1)
df2=pd.DataFrame(data2)
print(df1)
print(df2)
print()
ndf=pd.concat([df1,df2])
print(ndf)
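`pd.concat` keeps each input's original row labels, so the stacked result repeats the index values. Passing `ignore_index=True` renumbers the rows. A short sketch with two-row tables (a shortened, hypothetical version of the data above):

```python
import pandas as pd

df1 = pd.DataFrame({"EEID": ["A01", "A02"], "Name": ["Amit", "Priya"]})
df2 = pd.DataFrame({"EEID": ["A07", "A08"], "Name": ["Atin", "Pankaj"]})

stacked = pd.concat([df1, df2])                         # index: 0, 1, 0, 1
renumbered = pd.concat([df1, df2], ignore_index=True)   # index: 0, 1, 2, 3

print(stacked)
print(renumbered)
```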
Join
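import pandas as pd
data1 = {"EEID": ["A01", "A02", "A03", "A04", "A05", "A06"],
         "Name": ["Amit", "Priya", "Neha", "Lovely", "Karan", "Mohit"]}
print(data1)
data2 = {"EEID": ["A09", "A02", "A03", "A10", "A05", "A06"],
         "Salary": [45000, 47000, 30000, 14200, 42300, 456600]}
print(data2)
print("\n\n\n")
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
print(df1)
print()
print(df2)
print()
# join() matches on the index by default, which would pair rows by position.
# Index df2 by EEID and join on df1's EEID column to match employees by ID.
print(df1.join(df2.set_index("EEID"), on="EEID"))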
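When two tables share a key column, `pd.merge` is another way to combine them, and its `how` argument controls which keys survive. The sketch below uses a shortened, hypothetical version of the employee data with EEID as the shared key.

```python
import pandas as pd

df1 = pd.DataFrame({"EEID": ["A01", "A02", "A03"],
                    "Name": ["Amit", "Priya", "Neha"]})
df2 = pd.DataFrame({"EEID": ["A09", "A02", "A03"],
                    "Salary": [45000, 47000, 30000]})

inner = pd.merge(df1, df2, on="EEID", how="inner")   # keys present in both tables
left = pd.merge(df1, df2, on="EEID", how="left")     # every row of df1, NaN where no salary
outer = pd.merge(df1, df2, on="EEID", how="outer")   # every key from either table

print(inner)
print(left)
print(outer)
```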