Python CA2
Declaration:
I declare that this Assignment is my individual work. I have not copied it from any other student’s
work or from any other source except where due acknowledgement is made explicitly in the text,
nor has any part been written for me by any other person.
1 PART 1: PANDAS
1.1 Q1. LIST AT LEAST THREE REAL-WORLD SCENARIOS WHERE PANDAS CAN BE USED FOR DATA ANALYSIS.
Some real-world scenarios where Pandas can be used for data analysis:
1. Sports: Analyzing player performance statistics, tracking team trends, identifying factors that
contribute to wins and losses, and optimizing training and strategies.
Pandas use:
• Loading and cleaning data from various sources (game scores, player statistics, sensor
readings, etc.).
• Calculating key performance metrics (averages, shooting percentages, assists, rebounds,
etc.).
• Visualizing trends and patterns (player performance over time, team comparisons, win-loss
distributions).
• Building predictive models to forecast player performance, game outcomes, or injury risks.
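A minimal sketch of this kind of workflow, assuming a hypothetical player_stats.csv file and invented column names:

import pandas as pd

# Hypothetical file and columns, used purely for illustration
df = pd.read_csv("player_stats.csv")          # load raw game data
df = df.dropna(subset=["points", "minutes"])  # clean: drop incomplete rows

# Key performance metric: average points per game for each player
ppg = df.groupby("player")["points"].mean().sort_values(ascending=False)
print(ppg.head())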
2. Social Media: Understanding user behaviour, identifying popular topics and trends, analyzing
sentiment and engagement, optimizing marketing campaigns.
Pandas use:
• Collecting and preparing social media data (tweets, posts, comments, likes, shares).
• Cleaning and preprocessing text data (removing noise, handling
emojis, stemming/lemmatizing words).
• Conducting sentiment analysis (classifying positive, negative, or neutral sentiment in text).
• Identifying trending topics and influencers.
• Visualizing social network structures and interactions.
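A small sketch of the cleaning and engagement steps, using a few invented posts in place of real social media data:

import pandas as pd

# Invented posts standing in for data pulled from an API or export
posts = pd.DataFrame({
    "text":  ["Love this product!!!", "worst update ever :(", "Pretty good overall"],
    "likes": [120, 45, 60],
})

# Basic preprocessing with vectorized string methods: lowercase, strip punctuation
posts["clean"] = (posts["text"]
                  .str.lower()
                  .str.replace(r"[^a-z\s]", "", regex=True)
                  .str.strip())

# Simple engagement summary, sorted by likes
print(posts.sort_values("likes", ascending=False)[["clean", "likes"]])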
1.2 Q2. DESCRIBE THE PRIMARY DATA STRUCTURES IN PANDAS, NAMELY SERIES AND DATAFRAME. EXPLAIN
THE DIFFERENCES AND USE CASES FOR EACH.
Here's a description of Series and DataFrame, the primary data structures in Pandas, along with their
differences and use cases:
Series:
• One-dimensional labelled array: A single sequence of values with an associated index.
o Holds any data type: Numbers, strings, dates, booleans, or even custom objects.
o Index: A label for each value, often used for selection and alignment.
Use cases:
• Representing a single column or variable, such as a time series of daily prices or temperature readings.
• Label-based lookups and element-wise calculations on one variable.
DataFrame:
• Two-dimensional labelled table: Data organised into rows and columns.
o Collection of Series objects: Each column is a Series, and each row represents an observation.
o Index: Labels for both rows and columns, enabling flexible access and manipulation.
Use cases:
• Representing tabular data, such as datasets imported from CSV, Excel, or databases.
• Filtering, grouping, joining, and aggregating data across multiple related columns.
The key difference is dimensionality: a Series holds a single labelled sequence of values, while a DataFrame is a table of multiple aligned Series that share a row index.
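A short illustration of both structures, with invented values:

import pandas as pd

# Series: a one-dimensional labelled array
prices = pd.Series([10.5, 12.0, 9.8], index=["mon", "tue", "wed"])
print(prices["tue"])          # access by label -> 12.0

# DataFrame: a two-dimensional table whose columns are Series
sales = pd.DataFrame({
    "product": ["A", "B", "C"],
    "units":   [30, 45, 12],
})
print(sales["units"].sum())   # column-wise operation -> 87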
2 PART 2: NUMPY
2.1 Q1. WRITE A BRIEF DESCRIPTION OF WHAT NUMPY IS AND WHY IT IS IMPORTANT FOR SCIENTIFIC
COMPUTING AND DATA ANALYSIS IN PYTHON
NumPy, short for Numerical Python, is a fundamental library for numerical computing in Python. It
provides support for large, multi-dimensional arrays and matrices, along with a collection of high-
level mathematical functions to operate on these arrays.
Key features and reasons why NumPy is important for scientific computing and data analysis in
Python include:
Efficient Array Operations: NumPy provides a powerful N-dimensional array object (ndarray), which
allows for efficient storage and manipulation of large datasets. The ndarray supports a variety of
data types and enables vectorized operations, which significantly enhances the performance of
numerical computations.
Broadcasting: NumPy's broadcasting capability enables operations on arrays of different shapes and
sizes, making it easier to perform element-wise operations without the need for explicit loops. This
enhances code readability and reduces the need for unnecessary duplication of data.
Memory Management: NumPy efficiently manages memory and provides tools for creating views on
arrays without copying data, saving both time and resources. This is particularly beneficial when
working with large datasets, as it minimizes memory overhead.
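A brief sketch of these features, using nothing beyond NumPy itself:

import numpy as np

a = np.arange(6, dtype=np.float64).reshape(2, 3)  # efficient N-dimensional array

# Vectorized, element-wise operation (no explicit Python loop)
squared = a ** 2

# Broadcasting: the 1-D row is applied across both rows of `a`
shifted = a + np.array([10.0, 20.0, 30.0])

# A view shares memory with `a` instead of copying the data
first_row = a[0]
first_row[0] = 99.0
print(a[0, 0])  # 99.0 -> the change is visible in the original array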
2.2 Q2. EXPLAIN THE SIGNIFICANCE OF NUMPY IN TERMS OF PERFORMANCE AND EFFICIENCY WHEN WORKING
WITH LARGE DATASETS AND NUMERICAL COMPUTATIONS.
When it comes to handling large datasets and complex numerical computations in Python, NumPy
offers substantial advantages in performance and efficiency. Here's why:
1. Memory Efficiency:
▪ Contiguous Memory Layout: NumPy stores data in contiguous blocks of memory, unlike
Python lists which can be scattered. This allows for faster access and manipulation of
elements as data doesn't need to be searched across memory fragments.
▪ Optimized Data Types: NumPy offers specialized data types like float64 or int32 designed for
numerical operations. These are more compact and efficient than generic Python types like
"float" or "int", reducing memory footprint and boosting processing speed.
2. Vectorized Operations:
▪ Single Instruction, Multiple Data (SIMD): NumPy leverages vectorized operations, utilizing
SIMD instructions on modern CPUs. This allows performing the same operation on multiple
data elements simultaneously, leading to significant speedups compared to looping over
elements one by one.
▪ Broadcasting: NumPy automatically broadcasts operations between arrays of different sizes,
eliminating the need for manual loop-based iteration and further enhancing performance.
3. C-optimized Backend:
NumPy relies heavily on optimized C code under the hood, making it significantly faster than pure
Python implementations. This C code takes advantage of hardware capabilities and low-level
memory access, further pushing the boundaries of performance.
4. Concise Syntax:
NumPy provides vectorized functions and operators that eliminate the need for long and intricate
loops, simplifying code and making it more readable. This not only improves efficiency but also
reduces the risk of errors.
In practice, these advantages translate into:
• Faster execution times: Analyzing large datasets and performing complex calculations
become significantly faster with NumPy compared to pure Python or other less optimized
libraries.
• Reduced CPU and memory usage: Smaller memory footprint and efficient computations
translate to lower resource consumption, enabling smooth processing of even massive
datasets on smaller machines.
• Simplified code and easier maintenance: Concise and readable code thanks to vectorization
improves maintainability and reduces debugging time.
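A small comparison illustrating the speed difference; exact timings depend on the machine, but the vectorized version is typically orders of magnitude faster:

import time
import numpy as np

data = np.random.rand(1_000_000)

# Pure-Python loop over one million elements
start = time.perf_counter()
total = 0.0
for x in data:
    total += x * x
loop_time = time.perf_counter() - start

# Vectorized NumPy equivalent of the same computation
start = time.perf_counter()
total_np = np.dot(data, data)
numpy_time = time.perf_counter() - start

print(f"loop: {loop_time:.4f}s  numpy: {numpy_time:.4f}s")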
UNIT 5
3 DATA VISUALIZATION
3.1 Q1. CREATE A MATPLOTLIB BAR PLOT SHOWING THE SALES OF PRODUCTS IN A STORE FOR A GIVEN
MONTH. LABEL THE AXES, ADD A TITLE, AND CUSTOMIZE THE APPEARANCE (E.G., COLOUR, WIDTH)
Code:
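One possible implementation, using invented product names and sales figures for the month:

import matplotlib.pyplot as plt

# Hypothetical products and unit sales for one month
products = ["Laptops", "Phones", "Tablets", "Headphones"]
sales = [120, 250, 90, 180]

plt.bar(products, sales, color="steelblue", width=0.6, edgecolor="black")
plt.xlabel("Product")
plt.ylabel("Units sold")
plt.title("Product Sales - January")
plt.tight_layout()
plt.show()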
3.2 Q2. PROVIDE AT LEAST THREE EXAMPLES OF DATA VISUALIZATION SCENARIOS WHERE SEABORN IS THE
PREFERRED LIBRARY OVER MATPLOTLIB. DESCRIBE THE TYPE OF PLOTS OR CHARTS INVOLVED AND WHY
SEABORN IS A BETTER CHOICE.
Here are three examples where Seaborn often excels over Matplotlib for specific visualization tasks:
1. Visualizing Statistical Relationships:
• Plot types: Pair plots, joint plots, distributions, heatmaps, violin plots
Example: Visualizing correlations between multiple variables in a dataset using a pair plot, revealing
patterns and potential interactions.
2. Exploring Categorical Data:
• Plot types: Bar plots, box plots, violin plots, strip plots, point plots
Example: Comparing distributions of customer satisfaction scores across different product categories
using box plots to identify potential issues.
3. Handling Data with Facets:
• Plot types: Faceted grids of line plots, scatter plots, or histograms (e.g., via relplot, catplot, or FacetGrid)
Example: Comparing sales trends across regions and product categories using a faceted line plot to
identify regional differences and potential market opportunities.
In summary, Seaborn shines when:
• Aesthetic appeal and concise code are desired for effective communication.
• Statistical relationships, distributions, or categorical comparisons need to be shown with minimal boilerplate.
• The data already lives in a Pandas DataFrame, which Seaborn's plotting functions accept directly (see the sketch below).
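A brief sketch of the second scenario, using Seaborn's built-in tips dataset as a stand-in for customer satisfaction data:

import seaborn as sns
import matplotlib.pyplot as plt

# Built-in example dataset used in place of real categorical survey data
tips = sns.load_dataset("tips")

# One call produces grouped box plots with sensible styling by default
sns.boxplot(data=tips, x="day", y="total_bill", hue="sex")
plt.title("Distribution of bills by day")
plt.show()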
UNIT 6
4 PLOTLY
4.1 Q1. DESCRIBE THE THREE KEY STRUCTURES IN PLOTLY: FIGURE, DATA, AND LAYOUT. EXPLAIN THE
PURPOSE OF EACH STRUCTURE IN CREATING VISUALIZATIONS.
1. Figure:
• The overall container: It acts as the canvas or window that holds all the elements of your
visualization.
• Foundation for visual elements: It provides the space where you'll create and arrange
plots, axes, titles, legends, annotations, and other visual components.
• Management and customization: It allows you to manage the overall size, aspect
ratio, background color, and other stylistic properties of the entire visualization.
2. Data:
• The heart of the visualization: It consists of the numerical values, categorical information, or
text that you want to visualize.
• Source and format: It can come from various sources like arrays, dataframes, or external
files, and it's typically structured in a format that visualization libraries can understand.
• Mapping to visual elements: It's used to create the visual representations within the
figure, such as bars in a bar chart, lines in a line plot, or points in a scatter plot.
3. Layout:
• Organization and arrangement: It determines the spatial arrangement of visual elements
within the figure, ensuring clarity and readability.
• Customization and control: It allows you to adjust spacing, margins, alignment, and the
overall visual hierarchy of elements to effectively guide the viewer's attention.
How they work together:
1. Create a figure: You typically start by creating a figure object to establish the overall
container for your visualization.
2. Load and prepare data: You then load your data, ensuring it's in a suitable format for the
visualization library you're using.
3. Map data to visual elements: You create visual elements like plots, axes, and
markers, mapping the data to their properties (e.g., x-axis values, y-axis values, colors, sizes).
4. Arrange elements within layout: You position and organize these visual elements within the
figure using layout tools, ensuring a clear and informative presentation.
5. Customize appearance: You can apply stylistic choices to both the figure and individual
elements to enhance readability and visual appeal.
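A minimal sketch of how the three structures fit together in Plotly's graph_objects interface, with invented values:

import plotly.graph_objects as go

# Data: one or more "trace" objects holding the values to plot
trace = go.Bar(x=["A", "B", "C"], y=[10, 15, 7])

# Figure: the container that combines the traces and the layout
fig = go.Figure(data=[trace])

# Layout: title, axis labels, and other presentation choices
fig.update_layout(title="Example bar chart",
                  xaxis_title="Category",
                  yaxis_title="Value")
fig.show()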
4.2 Q2. LOAD A SALES DATASET WITH COLUMNS 'SALES,' CREATE A PLOTLY LINE CHART TO VISUALIZE THE
TOTAL SALES TREND. INCLUDE AXIS LABELS, A TITLE, AND CUSTOMIZE THE APPEARANCE.
Code:
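The chart can be produced along these lines, assuming a hypothetical monthly sales table in place of the real dataset (in practice the data would be loaded with pd.read_csv):

import pandas as pd
import plotly.express as px

# Invented monthly figures standing in for the real sales dataset
df = pd.DataFrame({
    "Month": ["Jan", "Feb", "Mar", "Apr", "May"],
    "Sales": [200, 240, 310, 280, 350],
})

fig = px.line(df, x="Month", y="Sales",
              title="Total Sales Trend",
              labels={"Month": "Month", "Sales": "Total Sales"},
              markers=True)
fig.update_traces(line_color="firebrick", line_width=3)
fig.show()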