
Pandas, NumPy, Matplotlib

Pandas is an open-source library in Python for data manipulation and analysis, featuring data structures like Series and DataFrame. It provides functionalities for creating DataFrames, selecting rows and columns, handling missing data, and reading CSV files. Additionally, the document covers basic usage of NumPy for array processing and Matplotlib for data visualization through various plot types.

Python Pandas

Pandas is an open-source library built on top of the Python programming
language, primarily designed for data manipulation and analysis. It offers data
structures like Series and DataFrame that allow users to work with structured
data seamlessly. Here's a more detailed look at its core components and
functionalities:
● A Series is a one-dimensional labeled array that can store data of various
types, such as integers, floats, strings, or even Python objects. Each item in
a Series corresponds to a labeled index, allowing for both positional and
label-based indexing.
● A DataFrame is a two-dimensional data structure in which data is aligned in a
tabular fashion in rows and columns. A Pandas DataFrame consists of three
principal components: the data, the rows, and the columns.
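The two structures described above can be sketched as follows. The labels and values are made up for illustration; the sketch only shows positional versus label-based access on a Series and the three components of a DataFrame.

```python
import pandas as pd

# A Series with an explicit label index (labels are illustrative)
scores = pd.Series([85, 92, 88], index=['john', 'alice', 'bob'])

# Label-based access uses .loc, positional access uses .iloc
by_label = scores.loc['alice']
by_position = scores.iloc[1]
print(by_label, by_position)

# A DataFrame exposes its three components as attributes
df = pd.DataFrame({'Name': ['John', 'Alice'], 'Score': [85, 92]})
print(df.values.tolist())  # the data
print(list(df.index))      # the row labels
print(list(df.columns))    # the column labels
```

Both lookups return the same element here because 'alice' is the label of the item at position 1.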
Creating a Pandas DataFrame

# import pandas as pd
import pandas as pd
# list of strings
list1 = ['hey', 'rithu', 'how', 'people','happy','for']
# Calling DataFrame constructor on list
df = pd.DataFrame(list1)
print(df)

Dealing with Rows and Columns

⇒ Column Selection: To select a column in a Pandas DataFrame, access it by
its column name. Passing a list of names selects several columns at once.

# Import pandas package
import pandas as pd
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32],
'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
# select two columns
print(df[['Name', 'Qualification']])

⇒ Row Selection: Pandas provides two indexers to retrieve rows from a
DataFrame. DataFrame.loc[] selects rows by label or by a boolean condition,
while DataFrame.iloc[] selects rows by integer position.
Example 1:
#row selection using loc function
import pandas as pd
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32],
'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
# select row using loc
print(df.loc[df['Name'] == 'Jai'])

Example 2:
#row selection using iloc function
import pandas as pd
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32],
'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
# select row using iloc
print(df.iloc[0])
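
The difference between loc and iloc is easier to see when the row labels are not the default integers. The sketch below is one way to illustrate it; using the Name column as the index is an assumption made only for this example.

```python
import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32]}
# Use the Name column as the row labels (an assumption for this sketch)
df = pd.DataFrame(data).set_index('Name')

# loc selects by label, iloc selects by integer position
age_by_label = df.loc['Gaurav', 'Age']
age_by_position = df.iloc[2, 0]
print(age_by_label, age_by_position)

# Label slices include both endpoints; positional slices exclude the stop
label_slice = df.loc['Jai':'Gaurav']
position_slice = df.iloc[0:2]
print(len(label_slice), len(position_slice))
```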
⇒ CSV file of Students named “students.csv”

Name,Age,Grade,City,Subject,Score
John,16,10,New York,Math,85
Alice,15,9,Los Angeles,English,92
Bob,17,11,Chicago,Science,88
Charlie,16,10,Houston,History,90
David,15,9,Phoenix,Math,75
Eva,17,11,Philadelphia,English,95
Frank,16,10,San Antonio,Science,80
Grace,15,9,San Diego,History,85
Henry,17,11,Dallas,Math,78
Isabella,16,10,San Jose,English,89

⇒ Indexing and Selecting Data

● To display the dataframe

import pandas as pd
# Load the CSV file into a DataFrame
df = pd.read_csv('students.csv')
# Display the DataFrame
print("DataFrame:")
print(df)

● Selecting a single column

import pandas as pd
df = pd.read_csv('students.csv')
name_column = df['Name']
print("\nName Column:")
print(name_column)
● Selecting a single row

import pandas as pd
df = pd.read_csv('students.csv')
john_row = df.loc[df['Name'] == 'John']
print("\nRow where Name is 'John':")
print(john_row)
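
The same loc pattern extends to several conditions and to picking columns at the same time. The following is a minimal sketch; it embeds a few rows of students.csv as a string (via io.StringIO) only so that it runs without the file on disk.

```python
import io
import pandas as pd

# A few rows copied from students.csv, embedded so the sketch is self-contained
csv_text = """Name,Age,Grade,City,Subject,Score
John,16,10,New York,Math,85
Alice,15,9,Los Angeles,English,92
Bob,17,11,Chicago,Science,88
David,15,9,Phoenix,Math,75
"""
df = pd.read_csv(io.StringIO(csv_text))

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
math_high = df.loc[(df['Subject'] == 'Math') & (df['Score'] >= 80)]

# Rows by condition and a column by name in one step
names_of_grade_9 = df.loc[df['Grade'] == 9, 'Name']

print(math_high)
print(names_of_grade_9.tolist())
```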

Working with Missing Data

Missing data occurs when no information is provided for one or more items or
for a whole unit. It is a common problem in real-life datasets. In pandas,
missing values are also referred to as NA (Not Available) values and are
typically represented by NaN.

Checking for missing values using isnull() and notnull() :

# importing pandas as pd
import pandas as pd
# importing numpy as np
import numpy as np
# dictionary of lists
data = {'First Score':[100, 90, np.nan, 95],
'Second Score': [30, 45, 56, np.nan],
'Third Score':[np.nan, 40, 80, 98]}
# creating a dataframe from dictionary
df = pd.DataFrame(data)
# using isnull() function: True marks a missing value
print(df.isnull())
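
The heading also mentions notnull(), which returns the inverse mask. A short sketch of it, together with two common uses of these masks (counting gaps and filtering rows), might look like this; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, 45, 56, np.nan],
          'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(scores)

# notnull() is the inverse of isnull(): True where a value is present
present = df.notnull()

# Summing the boolean mask counts the missing values in each column
missing_per_column = df.isnull().sum()

# Keep only the rows whose 'First Score' is present
rows_with_first = df[df['First Score'].notnull()]

print(present)
print(missing_per_column)
print(rows_with_first)
```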
Filling missing values using fillna(), replace() :

→ fillna()
# importing pandas as pd
import pandas as pd
# importing numpy as np
import numpy as np
# dictionary of lists
data = {'First Score':[100, 90, np.nan, 95],
'Second Score': [30, 45, 56, np.nan],
'Third Score':[np.nan, 40, 80, 98]}
# creating a dataframe from dictionary
df = pd.DataFrame(data)
# filling missing value using fillna(); the result is a new DataFrame
print(df.fillna(0))
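
Filling with a constant is not the only option. As a hedged sketch, the same data can be filled with each column's mean or by carrying the previous value forward; which strategy is appropriate depends on the data, and nothing in the original text prescribes one.

```python
import numpy as np
import pandas as pd

scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, 45, 56, np.nan],
          'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(scores)

# Fill each column's gaps with that column's mean (NaN is skipped when averaging)
filled_mean = df.fillna(df.mean())

# Forward fill: copy the last valid value downward within each column
filled_forward = df.ffill()

print(filled_mean)
print(filled_forward)
```

Forward fill leaves a gap in place when there is no earlier value to copy, as in the first row of 'Third Score'.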

→ replace()
# Importing pandas as pd
import pandas as pd

# Importing numpy as np
import numpy as np
# Dictionary of lists
data = {
'First Score': [100, 90, np.nan, 95],
'Second Score': [30, 45, 56, np.nan],
'Third Score': [np.nan, 40, 80, 98]
}
# Creating a DataFrame from the dictionary
df = pd.DataFrame(data)
# replacing missing values (NaN) with 0 using replace()
print(df.replace(np.nan, 0))
Dropping missing values using dropna() :

# importing pandas as pd
import pandas as pd
# importing numpy as np
import numpy as np
# dictionary of lists
data = {'First Score':[100, 90, np.nan, 95],
'Second Score': [30, np.nan, 45, 56],
'Third Score':[52, 40, 80, 98],
'Fourth Score':[np.nan, np.nan, np.nan, 65]}
# creating a dataframe from dictionary
df = pd.DataFrame(data)
# using dropna() function to drop rows with at least one missing value
print(df.dropna())
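
By default dropna() removes every row that contains a missing value. The sketch below shows two other documented parameters, axis and thresh, on the same data; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, 40, 80, 98],
          'Fourth Score': [np.nan, np.nan, np.nan, 65]}
df = pd.DataFrame(scores)

rows_complete = df.dropna()        # drop rows with any missing value
cols_complete = df.dropna(axis=1)  # drop columns with any missing value
rows_mostly = df.dropna(thresh=3)  # keep rows with at least 3 present values

print(rows_complete)
print(cols_complete)
print(rows_mostly)
```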

Python NumPy

NumPy is a general-purpose array-processing package. It provides a
high-performance multidimensional array object and tools for working with these
arrays. It is the fundamental package for scientific computing with Python.

An array in NumPy is a table of elements (usually numbers), all of the same type,
indexed by a tuple of non-negative integers. The number of dimensions of an
array is called its rank (available as the ndim attribute). A tuple of integers
giving the size of the array along each dimension is known as the shape of the
array. The array class in NumPy is called ndarray.
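
The terms in the paragraph above map directly onto attributes of an array. A minimal sketch, using a small 2D array chosen for illustration:

```python
import numpy as np

array_2d = np.array([[1, 2, 3], [4, 5, 6]])

print(type(array_2d))  # the ndarray class
print(array_2d.ndim)   # rank: number of dimensions
print(array_2d.shape)  # size along each dimension
print(array_2d.dtype)  # the single element type shared by all items

# Indexing with a tuple of integers: row 1, column 2
element = array_2d[1, 2]
print(element)
```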

→ Creating a NumPy Array

import numpy as np

# Creating a 1D array
array_1d = np.array([1, 2, 3, 4, 5])
print("1D Array:", array_1d)
# Creating a 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("2D Array:\n", array_2d)

→ Accessing the array Index

import numpy as np
# Create a 1D array
array_1d = np.array([10, 20, 30, 40, 50])
# Accessing elements by index
print("Element at index 0:", array_1d[0])
print("Element at index 2:", array_1d[2])

→ Slicing
Example 1:
import numpy as np
# Create a 1D array
array_1d = np.array([10, 20, 30, 40, 50])
# Slicing a 1D array
slice_1d = array_1d[1:4] # Elements from index 1 to 3
print("Sliced 1D Array:", slice_1d)

Example 2:
import numpy as np
# Creating a 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("2D Array:\n", array_2d)
# Slicing a 2D array
slice_2d = array_2d[0:2, 1:3] # Subarray with rows 0 to 1 and columns 1 to 2
print("Sliced 2D Array:\n", slice_2d)
Matplotlib
Matplotlib is a visualization library in Python for creating 2D plots from
arrays and lists. It can be installed with pip:

python -m pip install -U matplotlib

Importing matplotlib: each example in this section imports the pyplot module as plt.

Basic plots in Matplotlib

1. Line plot using Matplotlib

# importing matplotlib module
from matplotlib import pyplot as plt
# x-axis values
x = [5, 2, 9, 4, 7]
# Y-axis values
y = [10, 5, 8, 4, 2]
# Function to plot
plt.plot(x,y)
# function to show the plot
plt.show()
2. Bar plot using Matplotlib

# importing matplotlib module
from matplotlib import pyplot as plt
# x-axis values
x = [5, 2, 9, 4, 7]
# Y-axis values
y = [10, 5, 8, 4, 2]
# Function to plot the bar
plt.bar(x,y)
# function to show the plot
plt.show()

3. Histogram using Matplotlib

# importing matplotlib module
from matplotlib import pyplot as plt
# data values to be grouped into bins
y = [10, 5, 8, 4, 2]
# Function to plot histogram
plt.hist(y)
# Function to show the plot
plt.show()
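
A histogram counts how many values fall into each bin, so the input list holds the data itself, not y-axis positions. The sketch below makes this visible by printing what hist() returns. The bins=4 setting and the non-interactive "Agg" backend are assumptions chosen so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
from matplotlib import pyplot as plt

values = [10, 5, 8, 4, 2]

# hist() returns the per-bin counts, the bin edges, and the drawn bar objects
counts, edges, patches = plt.hist(values, bins=4)

print(counts)  # number of values that fell into each bin
print(edges)   # boundaries of the bins
```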
4. Scatter Plot using Matplotlib

# importing matplotlib module


from matplotlib import pyplot as plt
# x-axis values
x = [5, 2, 9, 4, 7]
# Y-axis values
y = [10, 5, 8, 4, 2]
# Function to plot scatter
plt.scatter(x, y)
# function to show the plot
plt.show()
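
The plots above use default styling. As a hedged sketch, the same scatter data can be given axis labels, a title, and a legend, and written to a file instead of a window. The label texts, the file name, and the "Agg" backend are placeholders chosen for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
from matplotlib import pyplot as plt

x = [5, 2, 9, 4, 7]
y = [10, 5, 8, 4, 2]

# subplots() returns a figure and an axes object to draw on
fig, ax = plt.subplots()
ax.scatter(x, y, label="samples")
ax.set_xlabel("x value")
ax.set_ylabel("y value")
ax.set_title("Scatter plot with labels")
ax.legend()

# Save to a file instead of opening a window; the file name is a placeholder
fig.savefig("scatter_example.png")
```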
