BTech 5 CSE Data Analytics Using Python Unit 4 Notes
UNIT – 4
THE pandas LIBRARY
Pandas provides several methods for loading different types of data files,
including CSV, Excel, and SQL. Below are the commonly used methods for each:
1. Loading CSV Files
We can load a CSV file using the read_csv() function.
import pandas as pd
# Loading a CSV file
df = pd.read_csv('filename.csv')
We can also specify various parameters such as delimiter, encoding, and handling
missing values:
df = pd.read_csv('filename.csv', delimiter=',', encoding='utf-8',
na_values=['N/A', 'NA'])
In Pandas, when reading files like CSVs, you often use encoding='utf-8' to specify the character encoding of the
file you're reading. UTF-8 is one of the most widely used encodings for representing text in computers and is
compatible with most characters from different languages, symbols, and emojis.
Reasons for Using encoding='utf-8' in Pandas:
1. Handle Non-ASCII Characters:
o Many text files contain characters outside the standard ASCII range (e.g., accented letters,
symbols, non-Latin alphabets). UTF-8 can handle a wide variety of characters from different
languages.
2. Avoid Encoding Errors:
SRGI, BHILAI
o If the file contains non-UTF-8 characters and you don't specify the encoding, you might get
errors like UnicodeDecodeError. Specifying encoding='utf-8' ensures that Pandas knows how
to properly interpret the characters in the file.
3. Standard and Universal:
o UTF-8 is the default encoding for many web pages, applications, and databases, so it's a
common practice to use it when importing files. This makes your code more portable and
compatible across different systems and applications.
4. Prevent Misinterpretation of Data:
o If the wrong encoding is used (e.g., ASCII or ISO-8859-1), some characters may not be
interpreted correctly, leading to data corruption or unexpected symbols. By explicitly using
utf-8, you ensure proper interpretation of characters.
Example: Treating custom markers as missing values
# filename.csv contains:
# Name,Age,City
# Ram,25,N/A
# Shyam,NA,London
# Ajay,30,N/A
df = pd.read_csv('filename.csv', na_values=['N/A', 'NA'])
print(df)
# Output:
#     Name   Age    City
# 0    Ram  25.0     NaN
# 1  Shyam   NaN  London
# 2   Ajay  30.0     NaN
Benefits:
• Flexibility: Handle multiple formats of missing data in different datasets.
• Data Cleaning: Convert all recognized missing values to a standard format (NaN).
• Ease of Processing: Allows seamless use of Pandas' built-in functionality for dealing with missing data
(like .fillna() or .dropna()).
Thus, using na_values helps ensure that data is correctly interpreted and missing values are properly handled
during import.
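The .fillna() and .dropna() calls mentioned above can be sketched on a small hand-made DataFrame (the column names and values here are illustrative, not from a real file):

```python
import pandas as pd
import numpy as np

# A small DataFrame with missing values, like one produced by na_values on import
df = pd.DataFrame({'Name': ['Ram', 'Shyam', 'Ajay'],
                   'Age': [25.0, np.nan, 30.0],
                   'City': ['Delhi', 'London', np.nan]})

# Drop any row that contains a missing value (only Ram's row is complete)
dropped = df.dropna()

# Or replace missing values column by column instead of dropping rows
filled = df.fillna({'Age': df['Age'].mean(), 'City': 'Unknown'})

print(dropped)
print(filled)
```

Note that dropna() removes whole rows, so it can discard a lot of data; fillna() keeps every row but requires choosing a sensible replacement value per column.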
CSV (Comma-Separated Values) and Excel files (typically .xls or .xlsx formats) are both popular formats for
storing tabular data, but they have significant differences in structure, functionality, and usability. Here’s a detailed
comparison:
1. File Format
• CSV:
o A plain text file format that uses commas to separate values (though other delimiters like tabs
can also be used).
o Each line represents a row, and each value in the row is separated by a comma.
o File extension: .csv.
• Excel:
o A binary or XML-based file format used by Microsoft Excel to store data.
o Supports multiple sheets within a single file, allowing for more complex data organization.
o File extensions: .xls (older format) and .xlsx (newer format).
2. Data Structure
• CSV:
o Flat structure with a single table of data (no support for multiple sheets).
o Everything is stored as plain text; there is no built-in support for data types, formulas, or
formatting.
• Excel:
o Can contain multiple worksheets (tabs) within a single file.
o Supports various data types, including text, numbers, dates, and more complex objects.
o Allows for rich formatting (font styles, colors, borders, etc.), charts, graphs, and formulas.
3. Usability
• CSV:
o Simple and lightweight, making it easy to read and edit with any text editor.
o Ideal for simple data storage and exchange between applications.
o Limited in terms of features for data manipulation and visualization.
• Excel:
o User-friendly interface with advanced features for data analysis and visualization.
o Ideal for more complex datasets requiring calculations, charts, and formatting.
o Supports functionalities like pivot tables, data validation, and filtering.
4. Size and Performance
• CSV:
o Generally smaller in size compared to Excel files, as it contains only raw data without additional
formatting.
o Faster to load and process due to its simplicity.
• Excel:
o Typically larger in size because of additional features, formatting, and potential embedded
objects.
o Performance can degrade with very large datasets or complex calculations.
5. Interoperability
• CSV:
o Universally compatible with virtually any data processing software, programming languages,
and databases.
o Ideal for data exchange between different platforms and applications.
• Excel:
o Primarily designed for use with Microsoft Excel but can be opened by other spreadsheet
applications (e.g., Google Sheets, LibreOffice Calc).
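The unit's opening list also promises Excel and SQL loading. Excel files are read analogously with pd.read_excel('filename.xlsx'), which needs an engine such as openpyxl installed. For SQL, a minimal self-contained sketch using Python's built-in sqlite3 module (the table name 'students' and its rows are made up for illustration):

```python
import sqlite3
import pandas as pd

# Build a throwaway in-memory SQLite database to query against
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE students (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [('Ram', 23), ('Shyam', 25), ('Ajay', 22)])
conn.commit()

# Loading the result of a SQL query into a DataFrame
df = pd.read_sql("SELECT * FROM students", conn)
print(df)
conn.close()
```

The same pd.read_sql() call works with other database connections (e.g., SQLAlchemy engines); only the connection object changes.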
1. Pandas Series
A Series is a one-dimensional labeled array that can hold data of any type.
import pandas as pd
# Creating a Series from a list
s = pd.Series([10, 20, 30, 40, 50])
print(s)
# Output
# 0    10
# 1    20
# 2    30
# 3    40
# 4    50
# dtype: int64
Custom Indexing in Series
You can assign custom indices to the Series for better readability.
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s)
# Output
# a    10
# b    20
# c    30
# dtype: int64
Series from a Dictionary
If you create a Series from a dictionary, the keys become the indices, and the
values become the data.
s = pd.Series({'a': 100, 'b': 200, 'c': 300})
print(s)
# Output
# a    100
# b    200
# c    300
# dtype: int64
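When a dictionary is combined with an explicit index, only the matching keys are kept and missing labels become NaN — a quick sketch (the label 'd' is deliberately absent from the dictionary):

```python
import pandas as pd
import numpy as np

# 'a' and 'b' match dictionary keys; 'd' has no entry, so it becomes NaN
s = pd.Series({'a': 100, 'b': 200, 'c': 300}, index=['a', 'b', 'd'])
print(s)
# Output
# a    100.0
# b    200.0
# d      NaN
# dtype: float64
```

Note the dtype becomes float64 because NaN is a floating-point value.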
2. Pandas DataFrame
A DataFrame is a two-dimensional labeled data structure that can hold data of
different types (integer, float, string, etc.) in columns. Think of it as a table (like
an Excel sheet or SQL table) where rows and columns are both labeled.
Creating a DataFrame
You can create a DataFrame from various data structures like lists of lists,
dictionaries, or NumPy arrays.
data = {'Name': ['Ram', 'Shyam', 'Ajay'],
        'Age': [23, 25, 22],
        'City': ['New York', 'Paris', 'London']}
df = pd.DataFrame(data)
print(df)
# Output
#     Name  Age      City
# 0    Ram   23  New York
# 1  Shyam   25     Paris
# 2   Ajay   22    London
Custom Index in DataFrame
You can assign custom row and column labels to make the DataFrame more
descriptive.
df = pd.DataFrame({'Name': ['Ram', 'Shyam', 'Ajay'],
                   'Age': [23, 25, 22],
                   'City': ['New York', 'Paris', 'London']},
                  index=['Row1', 'Row2', 'Row3'])
print(df)
# Output
#        Name  Age      City
# Row1    Ram   23  New York
# Row2  Shyam   25     Paris
# Row3   Ajay   22    London
DataFrame from a List of Lists
data = [[23, 'Ram', 'New York'], [25, 'Shyam', 'Paris'], [22, 'Ajay', 'London']]
df = pd.DataFrame(data, columns=['Age', 'Name', 'City'])
print(df)
# Output
# Age Name City
# 0 23 Ram New York
# 1 25 Shyam Paris
# 2 22 Ajay London
DataFrame from a Dictionary of Series
You can also create a DataFrame using a dictionary of Pandas Series.
age = pd.Series([23, 25, 22], index=['Ram', 'Shyam', 'Ajay'])
city = pd.Series(['New York', 'Paris', 'London'], index=['Ram', 'Shyam', 'Ajay'])
df = pd.DataFrame({'Age': age, 'City': city})
print(df)
# Output
#        Age      City
# Ram     23  New York
# Shyam   25     Paris
# Ajay    22    London
Key Features of DataFrame:
• Heterogeneous data: Different columns can have different data types (int,
float, string, etc.).
• Size-mutable: You can add or remove rows and columns dynamically.
• Supports indexing by rows and columns (using labels or integers).
• Can handle missing data (NaN).
Key Differences between Series and DataFrame:
• Series is one-dimensional, while DataFrame is two-dimensional.
• Series can be thought of as a column, whereas a DataFrame is a collection
of multiple columns (with potentially different data types) organized into
rows and columns.
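The "a DataFrame is a collection of columns" point can be checked directly: each column pulled out of a DataFrame is itself a Series (column names below are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ram', 'Shyam'], 'Age': [23, 25]})

# Selecting one column yields a Series, not a smaller DataFrame
col = df['Age']
print(type(df))   # <class 'pandas.core.frame.DataFrame'>
print(type(col))  # <class 'pandas.core.series.Series'>
```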
Example of DataFrame Manipulation:
You can access, filter, and manipulate the data in a DataFrame like so:
# Accessing a column
print(df['Age'])
# Accessing rows by index
print(df.loc['Ram'])
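The filtering mentioned above can be sketched with boolean indexing; the DataFrame is rebuilt here so the snippet runs on its own (values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Age': [23, 25, 22],
                   'City': ['New York', 'Paris', 'London']},
                  index=['Ram', 'Shyam', 'Ajay'])

# Keep only the rows where the condition holds
over_22 = df[df['Age'] > 22]
print(over_22)
```

The expression df['Age'] > 22 produces a boolean Series, and passing it back into df[...] selects only the rows marked True.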
Index Objects in Pandas
Every Series and DataFrame carries an Index object that labels its rows.
1. Default Index
By default, Pandas assigns an integer index starting from 0.
import pandas as pd
s = pd.Series([100, 200, 300])
print(s)
# Output
# 0    100
# 1    200
# 2    300
# dtype: int64
2. Custom Index
You can define custom index labels when creating a Series or DataFrame.
s = pd.Series([100, 200, 300], index=['a', 'b', 'c'])
print(s)
# Output
# a    100
# b    200
# c    300
# dtype: int64
3. Accessing the Index Object
You can access the index of a Series or DataFrame using the .index attribute.
s = pd.Series([100, 200, 300], index=['a', 'b', 'c'])
print(s.index)
# Output
# Index(['a', 'b', 'c'], dtype='object')
4. Indexing in DataFrame
The index in a DataFrame refers to the row labels, while the column labels are
referred to as columns.
df = pd.DataFrame({'Age': [23, 25, 22],
                   'City': ['New York', 'Paris', 'London']},
                  index=['Ram', 'Shyam', 'Ajay'])
print(df)
# Output
#        Age      City
# Ram     23  New York
# Shyam   25     Paris
# Ajay    22    London
You can access both row indices and column names:
# Accessing row index
print(df.index)
# Accessing column names
print(df.columns)
5. MultiIndex (Hierarchical Index)
A MultiIndex allows for multi-level indexing, which is helpful when you have
more complex data.
arrays = [
['A', 'A', 'B', 'B'],
['one', 'two', 'one', 'two']
]
multi_index = pd.MultiIndex.from_arrays(arrays, names=('Upper', 'Lower'))
# Creating a DataFrame with MultiIndex
df = pd.DataFrame({'Value': [10, 20, 30, 40]}, index=multi_index)
print(df)
# Output
#              Value
# Upper Lower
# A     one       10
#       two       20
# B     one       30
#       two       40
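Rows of a MultiIndexed frame can then be selected level by level with .loc — a short sketch that rebuilds the same frame so it runs on its own:

```python
import pandas as pd

arrays = [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']]
multi_index = pd.MultiIndex.from_arrays(arrays, names=('Upper', 'Lower'))
df = pd.DataFrame({'Value': [10, 20, 30, 40]}, index=multi_index)

# All rows under the outer label 'A' (result is indexed by 'Lower')
print(df.loc['A'])

# A single cell addressed by both levels
print(df.loc[('B', 'two'), 'Value'])
```

Selecting with only the outer label drops that level from the result; a full tuple pinpoints a single row.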
Setting a Column as the Index
An existing column can be promoted to the row index with set_index().
df = pd.DataFrame({'Name': ['Shyam', 'Ajay', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['Paris', 'London', 'New York']})
df = df.set_index('Name')
print(df)
# Output
#          Age      City
# Name
# Shyam     25     Paris
# Ajay      30    London
# Charlie   35  New York
Common Operations with the Index
1. Unsorted Index: A Series keeps its index in the order it was given, which may be unsorted.
s = pd.Series([3, 2, 1, 0], index=['c', 'b', 'a', 'd'])
print(s)
# Output
# c    3
# b    2
# a    1
# d    0
# dtype: int64
2. Sorting: You can sort a DataFrame or Series by its index using
sort_index().
df_sorted = df.sort_index()
print(df_sorted)
3. Checking for Uniqueness: You can check whether an index is unique
using .is_unique.
print(df.index.is_unique)
4. Checking for Duplicates: Duplicate labels can sometimes occur in an index, and
you can use duplicated() to flag them.
print(df.index.duplicated())
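A closely related operation is reindex(), which rearranges a Series (or DataFrame) to a new set of labels, inserting NaN for labels that did not exist before — a self-contained sketch:

```python
import pandas as pd
import numpy as np

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# Rearrange to a new label order; 'd' is new, so it gets NaN
s2 = s.reindex(['c', 'a', 'd'])
print(s2)
# Output
# c    3.0
# a    1.0
# d    NaN
# dtype: float64
```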
Conclusion
The Index object is fundamental in Pandas for efficient data manipulation,
alignment, and access. It acts as a label or identifier for the data, allowing easy
selection, reindexing, and slicing of data. Whether you're working with simple or
hierarchical data, Pandas' Index provides a flexible and powerful tool to manage
data effectively.
a, b = 10, 3
print(a - b) # Output: 7
print(a * b) # Output: 30
print(a / b) # Output: 3.333...
1. Row-wise Operations
By default, arithmetic between a DataFrame and a Series aligns the Series' index with the
DataFrame's columns, so the operation is applied across each row.
# Creating a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})
# Series with index matching the DataFrame's columns
series = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
result = df - series
print(result)
Output:
A B C
0 0 2 4
1 1 3 5
2 2 4 6
2. Column-wise Operations
To perform operations column-wise, the index of the Series must align with the
DataFrame's index (its rows). Since the plain + operator always aligns on columns, use the
arithmetic methods such as add() with axis=0 to request alignment on the row index instead.
Example: Adding a Series to each column of a DataFrame
# Series with index matching the DataFrame's index (row-wise alignment)
series_row = pd.Series([1, 2, 3])
result = df.add(series_row, axis=0)
print(result)
Output:
A B C
0 2 5 8
1 4 7 10
2 6 9 12
3. Operations with Non-Matching Indices/Columns
If the Series and DataFrame have non-matching indices or columns, pandas will
align based on the labels and fill in NaN where data is missing.
Example: Adding a Series with non-matching columns
# Series with labels not all present in the DataFrame's columns
series_diff = pd.Series([10, 20], index=['A', 'D']) # 'D' is not in DataFrame
result = df + series_diff
print(result)
Output:
A B C D
0 11 NaN NaN NaN
1 12 NaN NaN NaN
2 13 NaN NaN NaN
4. Other Operations (Multiplication, Division, etc.)
You can perform other element-wise operations between a DataFrame and a
Series, such as multiplication, division, modulo, etc.
Example: Multiplying a DataFrame by a Series
# Multiplying a Series (aligned on the columns) into each row of the DataFrame
series = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
result = df * series
print(result)
Output:
A B C
0 1 8 21
1 2 10 24
2 3 12 27
5. Using Functions like apply
You can also use the apply() function to perform more complex operations by
applying a function to each row or column.
Example: Applying a function row-wise
# Applying a lambda function row-wise
result = df.apply(lambda row: row + series, axis=1)
print(result)
Summary of Key Operations:
• Arithmetic: +, -, *, /, etc. will broadcast and align based on indices or
columns.
• Alignment: Missing labels will result in NaN.
• Control Axis: by default a Series is aligned against the DataFrame's columns; pass
axis=0 to the arithmetic methods (add(), sub(), etc.) to align against the row index instead.
Statistics Functions:
In data analytics, statistical functions are essential to describe, summarize,
and interpret data. They allow analysts to identify patterns, trends, and
relationships in datasets. Here’s a comprehensive list of the most commonly
used statistical functions in Python, along with examples.
1. Measures of Central Tendency
Mean (Average)
The mean is the average of all values in the dataset.
import numpy as np
data = [10, 15, 20, 25, 30]
mean = np.mean(data)
print("Mean:", mean)
Output:
Mean: 20.0
Median
The median is the middle value in an ordered dataset. It’s useful for skewed data,
as it isn’t affected by extreme values.
median = np.median(data)
print("Median:", median)
Output:
Median: 20.0
Mode
The mode is the most frequently occurring value. It’s especially useful for
categorical data.
from scipy import stats
data = [10, 15, 15, 20, 25, 25, 25, 30]
# keepdims=True keeps the result as an array across SciPy versions
mode = stats.mode(data, keepdims=True)
print("Mode:", mode.mode[0])
Output:
Mode: 25
2. Measures of Dispersion
Variance
Variance measures the spread of data points around the mean. A high variance
means the data is more spread out.
variance = np.var(data)
print("Variance:", variance)
Output (for the 8-value dataset defined in the mode example):
Variance: 40.234375
Standard Deviation
The standard deviation is the square root of the variance, showing the average
distance from the mean.
std_dev = np.std(data)
print("Standard Deviation:", std_dev)
Output:
Standard Deviation: 6.34
Range
The range is the difference between the maximum and minimum values.
range_value = np.max(data) - np.min(data)
print("Range:", range_value)
Output:
Range: 20
3. Measures of Position
Percentiles
Percentiles divide the data into 100 equal parts. For instance, the 90th percentile
is the value below which 90% of the data lies.
percentile_90 = np.percentile(data, 90)
print("90th Percentile:", percentile_90)
Output:
90th Percentile: 26.5
Quartiles
Quartiles divide the data into four equal parts. The 25th percentile is the first
quartile (Q1), the 50th percentile is the median (Q2), and the 75th percentile is
the third quartile (Q3). (Note: data here is still the 8-value list from the mode
example, so this median is 22.5, not the 20.0 computed earlier for the 5-value list.)
q1 = np.percentile(data, 25)
q2 = np.percentile(data, 50) # This is the median
q3 = np.percentile(data, 75)
print("Q1:", q1, "Q2 (Median):", q2, "Q3:", q3)
Output:
Q1: 15.0 Q2 (Median): 22.5 Q3: 25.0
Interquartile Range (IQR)
The IQR is the range between the first and third quartiles. It helps detect outliers.
iqr = q3 - q1
print("Interquartile Range (IQR):", iqr)
Output:
Interquartile Range (IQR): 10.0
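The usual fence for outlier detection is 1.5×IQR beyond the quartiles. A minimal sketch on a small made-up dataset with one obvious outlier:

```python
import numpy as np

data = [10, 15, 15, 20, 25, 25, 25, 90]  # 90 is an artificial outlier
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Anything outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] is flagged as an outlier
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < lower or x > upper]

print("Bounds:", lower, upper)
print("Outliers:", outliers)
```

Because quartiles ignore extreme values, this rule is robust: the single large value 90 does not move the fences much, so it is cleanly detected.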
4. Measures of Shape
Skewness
Skewness measures the asymmetry of a distribution around its mean. A positive
skew indicates a longer right tail; a negative skew indicates a longer left tail.
from scipy.stats import skew
skewness = skew(data)
print("Skewness:", skewness)
6. Ranking
Ranking assigns a rank to each value based on its order. Ties can be handled by
averaging ranks or assigning ranks based on the order they appear.
import pandas as pd
rank_ascending = pd.Series(data).rank() # Ascending rank
rank_descending = pd.Series(data).rank(ascending=False) # Descending rank
print("Ascending Rank:\n", rank_ascending)
print("Descending Rank:\n", rank_descending)
7. Z-Scores
Z-scores indicate how many standard deviations a data point is from the mean.
It’s used to identify outliers.
z_scores = stats.zscore(data)
print("Z-Scores:", z_scores)
8. Probability Distributions
Normal Distribution
The normal distribution is a symmetrical, bell-shaped distribution. You can
generate it in Python with:
import matplotlib.pyplot as plt
data = np.random.normal(0, 1, 1000)
plt.hist(data, bins=30, alpha=0.5)
plt.show()
Binomial Distribution
The binomial distribution represents the number of successes in a fixed number
of trials.
from scipy.stats import binom
n, p = 10, 0.5 # 10 trials, 50% success probability
binom_data = binom.rvs(n, p, size=1000)
plt.hist(binom_data, bins=30, alpha=0.5)
plt.show()
Summary
This comprehensive set of statistical functions allows data analysts to gain
insights, make comparisons, detect outliers, and understand relationships within
data. These functions are essential for exploring data and forming conclusions
based on patterns and trends.
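Many of the measures above come in a single call with pandas' describe(), which reports the count, mean, standard deviation, min, quartiles, and max of a numeric Series or of each numeric DataFrame column — a quick sketch using the 8-value dataset from earlier:

```python
import pandas as pd

s = pd.Series([10, 15, 15, 20, 25, 25, 25, 30])

# One call summarizing count, mean, std, min, quartiles, and max
summary = s.describe()
print(summary)
```

This is a convenient first step in exploratory analysis before computing individual statistics by hand.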