Advanced Python Lab

Introduction to NumPy and Pandas

Introduction
NumPy, short for Numerical Python, is the cornerstone of numerical computing in Python. It
provides an efficient interface to store and operate on dense data buffers. Pandas, on the other
hand, is built on NumPy and provides high-level data structures and functions designed to
make data analysis fast and easy in Python.
Objective
The aim of this experiment is to gain a fundamental understanding of NumPy and Pandas
libraries. By the end of this session, you will be familiar with:
• Creating and manipulating NumPy arrays.
• Basic operations like indexing and slicing on NumPy arrays.
• Introduction to Pandas and its primary data structure, the DataFrame.
• Basic operations in Pandas like DataFrame creation, indexing, and manipulation.
Theory
• NumPy: This library provides support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. NumPy arrays are faster and more compact than traditional Python lists. They provide vectorized arithmetic operations, which are the backbone of data analysis in Python.
• Pandas: Built on top of NumPy, Pandas is all about data manipulation and analysis. It introduces two new data structures to Python – Series and DataFrame, both of which are built on top of NumPy arrays. The DataFrame is particularly important as it allows for storing and manipulating tabular data in rows of observations and columns of variables.
Procedure
1. NumPy Arrays, Operations, and Indexing
• Import the NumPy library.
• Create a NumPy array from a Python list.
• Explore basic operations like addition, subtraction, and multiplication.
• Perform indexing and slicing operations on NumPy arrays.
2. Basic Pandas Operations
• Import the Pandas library.
• Create a DataFrame from a dictionary of Python lists or a NumPy array.
• Explore basic DataFrame operations like indexing, adding new columns, and deleting columns.
• Perform basic data manipulations like sorting and filtering.
Pseudocode
# NumPy operations (values are illustrative)
import numpy as np
array = np.array([1, 2, 3, 4])   # create an array from a Python list
doubled = array * 2              # vectorized arithmetic
first_two = array[:2]            # slicing

# Pandas operations
import pandas as pd
dataframe = pd.DataFrame({'name': ['a', 'b', 'c'], 'score': [90, 85, 72]})
dataframe['passed'] = dataframe['score'] > 60    # add a new column
sorted_df = dataframe.sort_values('score')       # sort rows by a column
Conclusion
In this session, you have been introduced to the fundamentals of NumPy and Pandas, two key
libraries in Python used for data analysis. Understanding these libraries is crucial for
handling, manipulating, and analyzing data efficiently in Python.
Further Implementation
To deepen your understanding and skills, try the following:
• Explore more complex array operations in NumPy, like reshaping, stacking, and splitting.
• Experiment with more advanced features of Pandas like merging, joining data sets, and working with time series data.
• Apply these skills to a real-world dataset to perform data cleaning, transformation, and analysis.
Assignment
• Task: Create a DataFrame using Pandas and perform basic data manipulations like sorting, filtering, and adding new columns.
• Data: Utilize any sample dataset or create your own.
• Objective: Demonstrate your understanding of Pandas operations and data manipulation techniques.
Advanced NumPy and Pandas
Introduction
Building upon the basic concepts of NumPy and Pandas, this session is designed to explore
more advanced functionalities of these powerful libraries. Advanced operations in NumPy
and complex data manipulations in Pandas form the crux of data analysis and scientific
computing in Python.
Objective
The primary goal is to delve deeper into the functionalities of NumPy and Pandas. By the end
of this session, you should be able to:
• Handle multidimensional arrays and understand broadcasting in NumPy.
• Perform complex operations in Pandas like merging, joining, and concatenating DataFrames.
Theory
• Advanced NumPy Operations: NumPy's power lies in its ability to perform vectorized operations, which include broadcasting – a method that allows NumPy to work with arrays of different shapes when performing arithmetic operations (see the sketch after this list).
• Complex Pandas Operations: Pandas offers extensive capabilities for data manipulation. Understanding how to merge, join, and concatenate DataFrames is essential for combining multiple sources of data into a single, coherent dataset.
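A minimal sketch of broadcasting, using small arrays with assumed example values:

import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3])           # shape (3,)
print(col + row)                    # shapes broadcast to (3, 3)
# [[ 1  2  3]
#  [11 12 13]
#  [21 22 23]]

Dimensions are compared from the right: sizes must match, or one of them must be 1, in which case that array is stretched to fit.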
Procedure
1. Advanced NumPy Operations
• Explore multidimensional arrays and their operations.
• Understand and implement broadcasting to perform operations on arrays of different sizes.
2. Complex Pandas Operations
• Learn the differences between merging, joining, and concatenating DataFrames.
• Use these operations to combine multiple datasets into a single DataFrame.
Pseudocode
# Advanced NumPy operations: broadcasting (values are illustrative)
import numpy as np
matrix = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
offsets = np.array([10, 20, 30])            # shape (3,)
result = matrix + offsets                   # offsets are broadcast across each row

# Complex Pandas operations: merge, join, and concatenate
import pandas as pd
df1 = pd.DataFrame({'key': ['a', 'b'], 'x': [1, 2]})
df2 = pd.DataFrame({'key': ['a', 'b'], 'y': [3, 4]})
merged_df = df1.merge(df2, on='key')        # combine on a shared key column
concat_df = pd.concat([df1, df2])           # stack DataFrames vertically
Conclusion
This session has provided an in-depth look into more sophisticated aspects of NumPy and
Pandas. Mastering these advanced operations is crucial for handling complex data analysis
tasks and working efficiently with large datasets in Python.
Further Implementation
For extended learning and application:
• Explore advanced NumPy functions like np.linalg for linear algebra operations and np.fft for Fourier transforms.
• Dive into time series analysis and handling of missing data in Pandas.
• Apply these techniques to larger and more complex datasets, possibly integrating with other libraries for data visualization and statistical analysis.
Assignment
• Task: Merge at least two DataFrames using Pandas, and then perform complex manipulations like grouping, sorting, and aggregating data.
• Data: Choose datasets that allow for meaningful merging and analysis.
• Objective: Demonstrate proficiency in complex Pandas operations and your ability to derive insights from merged datasets.
Data Wrangling
Introduction
Data wrangling, often a preliminary step in the data analysis process, involves cleaning and
unifying messy and complex data sets for easy access and analysis. A key component of data
science, it ensures the data is in a usable and insightful form. The quality of data analysis is
directly dependent on the effectiveness of data wrangling.
Objective
The aim of this experiment is to familiarize students with essential data wrangling techniques,
focusing on:
• Handling missing data and applying data imputation methods.
• Transforming data through normalization and standardization.
Theory
• Handling Missing Data: Data often comes with missing values, which can lead to inaccurate analyses if not handled properly. Techniques like imputation (filling missing values) or removal are essential.
• Data Transformation: This includes normalization (scaling data to a range) and standardization (shifting the distribution of each attribute to have a mean of zero and a standard deviation of one). These techniques are crucial for modeling as they ensure that the scales of different features are comparable (a small worked sketch follows below).
Procedure
1. Handling Missing Data
• Identify missing data in a dataset.
• Apply imputation techniques like mean or median imputation, or more complex methods like using machine learning models to predict missing values.
• Alternatively, explore data removal strategies where appropriate.
2. Data Transformation
• Implement normalization and standardization techniques.
• Use libraries like Pandas and Scikit-learn to perform these transformations.
Pseudocode
# Handling missing data (the file name is an assumed example)
import pandas as pd
dataframe = pd.read_csv('dataset.csv')
dataframe = dataframe.fillna(dataframe.mean(numeric_only=True))  # mean imputation
# ...or drop rows with missing values instead: dataframe = dataframe.dropna()

# Data transformation (assumes the remaining columns are numeric)
from sklearn.preprocessing import StandardScaler, MinMaxScaler
scaler = StandardScaler()  # or MinMaxScaler() for [0, 1] normalization
transformed_data = scaler.fit_transform(dataframe)
Conclusion
Through this exercise, students will gain hands-on experience in preparing data for analysis.
Data wrangling is a critical skill in data science, as clean and well-prepared data leads to
more reliable and meaningful analysis results.
Further Implementation
Expand your skills in data wrangling by:
• Exploring more sophisticated imputation techniques, such as k-Nearest Neighbors or deep learning-based methods.
• Applying these techniques to larger and more complex datasets, possibly with varying types of data (text, numerical, categorical).
• Integrating these methods into a larger data analysis or machine learning workflow.
Assignment
• Task: Clean a provided dataset using data wrangling techniques. This includes handling missing values and performing normalization or standardization.
• Data: A dataset will be provided, which will contain several challenges typical in real-world data.
• Objective: Successfully prepare the dataset for further analysis, demonstrating your understanding of data wrangling techniques.
Data Aggregation and Group Operations
Introduction
Data aggregation and group operations are pivotal in data analysis, allowing for the
consolidation of data into meaningful summaries. This process is fundamental in statistical
analysis, enabling the extraction of patterns and insights from large and complex datasets.
Objective
This experiment aims to empower students with the skills to effectively group and aggregate
data. Key focuses include:
• Understanding and implementing data grouping.
• Utilizing aggregation functions.
• Creating pivot tables and cross-tabulations for advanced data summarization.
Theory
• Grouping Data: Involves organizing data into groups based on some criteria. This is particularly useful in segmenting data into subsets for further analysis.
• Aggregation Functions: These are applied to groups of data, providing a summary statistic (like sum, mean, median, etc.) of each group.
• Pivot Tables and Cross-tabulations: Pivot tables are used to summarize and reorganize data in a dataset, while cross-tabulation is a method to quantitatively analyze the relationship between multiple variables.
Procedure
1. Grouping Data
• Use groupby operations to segment data into subsets.
• Apply various functions to each group independently.
2. Aggregation Functions
• Implement aggregation functions such as sum, mean, count, etc., on grouped data.
• Explore custom aggregation functions for specific analysis needs (see the sketch after the pseudocode).
3. Pivot Tables and Cross-tabulations
• Create pivot tables for multi-dimensional data summarization.
• Utilize cross-tabulation for analyzing the relationship between two or more variables.
Pseudocode
import pandas as pd

# Grouping data (column names are placeholders for your dataset)
dataframe = pd.read_csv('data.csv')
grouped_data = dataframe.groupby('grouping_column')

# Aggregation functions
aggregated_data = grouped_data.agg(['sum', 'mean', 'count'])

# Pivot tables and cross-tabulations
pivot_table = pd.pivot_table(dataframe, values='value_column',
                             index='row_column', columns='column_column',
                             aggfunc='mean')
cross_tab = pd.crosstab(dataframe['column1'], dataframe['column2'])
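For the custom aggregations mentioned in the procedure, a minimal sketch using named aggregation (column names are assumed placeholders):

summary = dataframe.groupby('grouping_column').agg(
    total=('value_column', 'sum'),
    spread=('value_column', lambda s: s.max() - s.min()),  # custom per-group function
)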
Conclusion
Mastering data aggregation and group operations is a cornerstone in data analysis, providing
the ability to distill large datasets into actionable insights. These techniques are invaluable in
making informed decisions based on data.
Further Implementation
To further develop these skills, students can:
• Experiment with different types of aggregations and see how they affect the interpretation of data.
• Apply these methods to more complex, real-world datasets to understand the nuances of practical data analysis.
• Integrate visualization techniques to represent aggregated data more effectively.
Assignment
• Task: Aggregate a provided dataset using different grouping and aggregation techniques.
• Data: A dataset will be provided, suitable for applying various aggregation methods.
• Objective: Draw meaningful insights from the aggregated data, showcasing your ability to interpret and summarize large datasets.
Time Series Analysis
Introduction
Time series analysis is a statistical technique for analyzing data points collected at successive points in time, often to identify trends. It is used in fields like economics, weather forecasting, and capacity planning to predict future events based on previously observed values.
Objective
The objective of this experiment is to provide an understanding of the fundamentals of time
series analysis. Students will learn:
• Manipulating time series data using indexing and slicing.
• Techniques such as resampling, frequency conversion, rolling, and expanding windows.
Theory
• Time Series Data Manipulation: This involves handling data indexed in time order. Techniques like indexing and slicing are crucial for segmenting the data for analysis.
• Resampling and Frequency Conversion: This refers to converting the frequency of your time series data (e.g., converting from minutes to hours) and resampling it (e.g., aggregating data).
• Rolling and Expanding Windows: Rolling windows are used to perform calculations over a moving window of observations, while expanding windows allow for calculations over an expanding window from the start of the time series.
Procedure
1. Time Series Data Manipulation
• Load a time series dataset.
• Perform indexing and slicing to segment the data for specific time periods.
2. Resampling and Frequency Conversion
• Apply resampling methods to change the frequency of the data points.
• Utilize frequency conversion to aggregate or disaggregate data based on time intervals.
3. Rolling and Expanding Windows
• Implement rolling window operations for moving average or moving sum calculations.
• Use expanding window calculations for cumulative measures.
Pseudocode
import pandas as pd

# Load dataset (file and column names are assumed examples)
time_series_data = pd.read_csv('time_series_data.csv', index_col='Time', parse_dates=True)

# Indexing and slicing by date range (example dates)
selected_data = time_series_data['2023-01-01':'2023-06-30']

# Resampling and frequency conversion (e.g., daily data to monthly means)
resampled_data = time_series_data.resample('M').mean()

# Rolling and expanding windows
rolling_data = time_series_data.rolling(window=7).mean()            # 7-observation moving average
expanding_data = time_series_data.expanding(min_periods=1).mean()   # cumulative mean
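A brief sketch of the difference between pure frequency conversion and resampling (the frequencies are example choices):

as_daily = time_series_data.asfreq('D')              # selects values at daily frequency, inserting NaN where none exist
daily_totals = time_series_data.resample('D').sum()  # aggregates all values within each day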
Conclusion
Time series analysis is an essential tool in many fields for making predictions based on
historical data. Understanding these concepts and techniques is crucial for analyzing temporal
data effectively.
Further Implementation
To deepen your understanding, you could:
• Explore more advanced time series analysis techniques like time series decomposition and forecasting models.
• Apply these techniques to different types of time series data, such as financial, environmental, or social media trend data.
• Integrate time series analysis into larger data projects, using it to inform decision-making processes or predictive models.
Assignment
• Task: Conduct a time series analysis on a provided dataset.
• Data: A real-world time series dataset will be provided, encompassing various complexities.
• Objective: Use time series manipulation, resampling, and rolling/expanding window techniques to analyze the dataset and extract meaningful insights.
Advanced Time Series Analysis
Introduction
Advanced time series analysis involves more sophisticated techniques that extend beyond
basic trend analysis. This includes dealing with time zones, converting periods and
frequencies, and most importantly, forecasting future values based on historical data. These
techniques are crucial in fields like finance, meteorology, and economics, where predicting
future trends is essential.
Objective
This experiment is aimed at exploring advanced time series analysis techniques, with a focus
on:
• Handling time zones in time series data.
• Converting between different periods and frequencies.
• Understanding and applying time series forecasting methods.
Theory
• Time Zone Handling: This is crucial for datasets that span multiple time zones. It involves converting timestamps from one time zone to another and normalizing time series data across different time zones.
• Period and Frequency Conversion: This refers to changing the granularity of time series data, such as converting from a monthly to a quarterly time series.
• Time Series Forecasting: This involves using historical data to predict future values. Methods range from simple moving averages to complex models like ARIMA (AutoRegressive Integrated Moving Average).
Procedure
1. Time Zone Handling
• Convert time series data to a specific time zone.
• Normalize time series data across various time zones.
2. Period and Frequency Conversion
• Change the frequency of time series data using resampling or asfreq methods.
• Convert time series data into different time periods for analysis.
3. Time Series Forecasting
• Select and apply appropriate forecasting models.
• Evaluate the accuracy of the forecasts (see the evaluation sketch after the pseudocode).
Pseudocode
import pandas as pd

# Load dataset (file and column names are assumed examples)
time_series_data = pd.read_csv('time_series_data.csv', index_col='Time', parse_dates=True)

# Time zone handling: localize naive timestamps, then convert
time_series_data_tz = time_series_data.tz_localize('UTC').tz_convert('America/New_York')

# Period and frequency conversion
resampled_data = time_series_data.resample('Q').mean()  # quarterly resampling
period_data = time_series_data.to_period('M')           # monthly periods

# Time series forecasting (the (1, 1, 1) order is an arbitrary example; choose p, d, q for your data)
from statsmodels.tsa.arima.model import ARIMA
model = ARIMA(time_series_data, order=(1, 1, 1))
forecast = model.fit().forecast(steps=10)
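For the evaluation step in the procedure, a common approach is a hold-out comparison; a minimal sketch, assuming a single-column series and the example order above:

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

train, test = time_series_data[:-10], time_series_data[-10:]        # hold out the last 10 points
fitted = ARIMA(train, order=(1, 1, 1)).fit()
predictions = fitted.forecast(steps=10)
mae = np.mean(np.abs(predictions.values - test.squeeze().values))   # mean absolute error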
Conclusion
Advanced time series analysis offers a deeper understanding of temporal data patterns and
enhances the ability to forecast future trends. Mastering these techniques is critical in many
domains where predictions based on time series data are essential.
Further Implementation
For extended learning, consider:
• Exploring different time series forecasting models like SARIMA, Holt-Winters, and machine learning-based methods.
• Applying these techniques to a range of time series data types, such as stock prices, weather data, or sales figures.
• Integrating forecast results into decision-making processes in business or policy formulation.
Assignment
• Task: Use advanced time series techniques to forecast future values of a provided dataset.
• Data: A dataset with complex time series characteristics will be provided.
• Objective: Apply time zone handling, period conversion, and forecasting methods to predict future trends in the dataset.
Data Visualization
Introduction
Data visualization is the graphical representation of information and data. By using visual
elements like charts, graphs, and maps, data visualization tools provide an accessible way to
see and understand trends, outliers, and patterns in data. In this session, we focus on two
popular Python libraries for data visualization: Matplotlib and Seaborn.
Objective
The main objective of this experiment is to learn how to create and customize visual
representations of data using Matplotlib and Seaborn. The activities include:
• Understanding the basic usage of Matplotlib and Seaborn.
• Creating a variety of plots (like line plots, scatter plots, histograms, and bar plots).
• Customizing plots for better readability and visual appeal.
Theory
• Matplotlib: A plotting library for the Python programming language and its numerical mathematics extension, NumPy. It provides an object-oriented API for embedding plots into applications.
• Seaborn: Built on top of Matplotlib, Seaborn is a higher-level interface for drawing attractive and informative statistical graphics.
Procedure
1. Introduction to Matplotlib and Seaborn
• Install and import the Matplotlib and Seaborn libraries.
• Understand the basic components of a plot in Matplotlib.
2. Creating Various Types of Plots
• Use Matplotlib to create basic plots like line plots, scatter plots, and histograms.
• Explore Seaborn for creating more complex plots like box plots and violin plots (see the sketch after the pseudocode).
3. Customizing Visualizations
• Learn how to customize plots with titles, labels, legends, and color schemes.
• Understand the importance of plot aesthetics for clear data representation.
Pseudocode
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
# Load dataset (file and column names are assumed examples)
data = pd.read_csv('dataset.csv')

# Creating a line plot using Matplotlib
plt.figure(figsize=(10, 6))
plt.plot(data['x_column'], data['y_column'], label='Line 1')
plt.title('Line Plot Title')
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.legend()
plt.show()

# Creating a histogram using Seaborn (kde=True overlays a density estimate)
sns.histplot(data['column_name'], kde=True)
plt.show()
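For the Seaborn box plot and customization steps in the procedure, a minimal sketch (column names are assumed placeholders):

sns.set_theme(style='whitegrid')  # apply a built-in style
sns.boxplot(data=data, x='category_column', y='value_column')
plt.title('Distribution by Category')
plt.show()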
Conclusion
Data visualization is a powerful tool for communicating information clearly and effectively.
By the end of this experiment, students will have a foundational understanding of creating
and customizing data visualizations using Matplotlib and Seaborn.
Further Implementation
Students are encouraged to:
• Explore more advanced plotting techniques and types of visualizations.
• Apply these visualization techniques to larger and more complex datasets.
• Integrate data visualization into data analysis projects to enhance data storytelling.
Assignment
• Task: Visualize a provided dataset using different types of plots. At least one plot should be created using Matplotlib, and one using Seaborn.
• Data: A dataset will be provided that includes a variety of data types suitable for different plots.
• Objective: Demonstrate the ability to choose appropriate plot types for different data and customize the visualizations for clarity and appeal.
Web Scraping
Introduction
Web scraping is a technique for extracting large amounts of data from websites and saving it to a local file or database for further analysis. In this session, we will focus on using Python with libraries like BeautifulSoup to perform web scraping.
Objective
The main objective of this experiment is to introduce students to the basics of extracting data
from the web. Key activities include:
• Learning how to use Python and BeautifulSoup for web scraping.
• Understanding how to extract data from web pages and handle different data formats.
Theory
• Web Scraping: This involves programmatically retrieving data from the Internet. The technique is useful for gathering data from websites that do not offer an API for data access.
• BeautifulSoup: A Python library for pulling data out of HTML and XML files. It creates parse trees that make it easy to extract data.
Procedure
1. Introduction to Web Scraping Using BeautifulSoup
• Install and import the necessary Python libraries (like BeautifulSoup and requests).
• Understand the structure of a webpage by inspecting HTML elements.
2. Extracting Data from Web Pages
• Use BeautifulSoup to parse HTML content.
• Extract specific data (like text, links, and other attributes) from HTML elements.
3. Handling Different Data Formats
• Store the scraped data in a desired format (like CSV, JSON, or a database).
Pseudocode
from bs4 import BeautifulSoup
import csv
import requests

# Fetch the web page (the URL, tag name, and attribute below are placeholders)
url = 'https://example.com'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Extract data: collect one value per matching element
rows = []
for element in soup.find_all('tag_name'):
    rows.append([element.get('attribute')])

# Store the collected data as CSV
with open('output.csv', 'w', newline='') as file:
    csv.writer(file).writerows(rows)
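Since the procedure also mentions JSON as a storage format, the same step with the standard json module (a sketch reusing the rows list above):

import json

with open('output.json', 'w') as file:
    json.dump(rows, file, indent=2)  # write the scraped rows as JSON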
Conclusion
Web scraping is a valuable skill for data scientists, allowing them to gather and utilize data
from the web efficiently. This experiment gives a foundational understanding of how to
extract and handle web data using Python.
Further Implementation
After mastering the basics, students can:
• Explore more advanced scraping techniques, handling dynamic websites using libraries like Selenium.
• Implement error handling and respect the legality and ethical aspects of web scraping.
• Integrate web scraping into larger data analysis projects.
Assignment
• Task: Scrape data from a specified webpage and organize it into a structured format like a CSV file.
• Data: A webpage URL will be provided. Students must identify relevant data to be scraped.
• Objective: Demonstrate the ability to efficiently extract web data and process it into a usable format.
