DA Assignment 4 Based On Format - Solution

GITA AUTONOMOUS COLLEGE, BHUBANESWAR
(Affiliated to BPUT, Odisha)
Assignment 4
Subject Name: Data Analytics Dept: CSE
Name of Faculty: REDNAM S S JYOTHI
Branch: CSE1,2 Semester: VII
Part I
01 Short Answer Type Questions (2 Marks) Module BL CO
1 What is data visualization? What are the basic principles of data visualization? 4 1,2 1
Data visualization is the graphical representation of information
and data. By using visual elements like charts, graphs, and maps,
data visualization tools provide an accessible way to see and
understand trends, outliers, and patterns in data.
Basic Principles of Data Visualization:
1. Clarity and Simplicity:
- Aim for clear and straightforward visuals that are easy to understand.
- Avoid clutter and unnecessary elements that can distract from the main message.
2. Accuracy:
- Ensure that the visual representation of the data accurately reflects the underlying information.
- Avoid distortions that can mislead the viewer.
3. Consistency:
- Use consistent colors, fonts, and shapes to create a cohesive and understandable visualization.
- Maintain uniform scales and units across multiple charts for easy comparison.
4. Focus and Emphasis:
- Highlight the most important data points or trends to draw the viewer's attention.
- Use techniques like color contrast, size variation, and annotations to emphasize key insights.
5. Context and Relevance:
- Provide context to help the viewer understand the data, such as labels, legends, and titles.
- Ensure the visualization is relevant to the audience and the purpose of the analysis.
6. Interactivity:
- Implement interactive elements that allow users to explore the data in more depth.
- Features like zooming, filtering, and tooltips can enhance the user experience.
7. Storytelling:
- Use visuals to tell a compelling story that guides the viewer through the data.
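As a small illustration of the consistency principle (uniform scales across charts), the sketch below draws two bar charts that share one y-axis, so bar heights can be compared directly between panels. The figures and the helper name `plot_side_by_side` are made up for illustration only.

```python
import matplotlib.pyplot as plt

def plot_side_by_side(regions, sales_a, sales_b):
    """Draw two bar charts on one shared y-axis scale (hypothetical helper)."""
    # sharey=True keeps both panels on the same y-axis limits
    fig, axes = plt.subplots(1, 2, sharey=True, figsize=(8, 3))
    for ax, values, label in zip(axes, (sales_a, sales_b), ("Q1", "Q2")):
        ax.bar(regions, values, color="tab:blue")  # one consistent color
        ax.set_title(f"Sales by Region ({label})")
        ax.set_xlabel("Region")
    axes[0].set_ylabel("Sales")
    return fig, axes

# Illustrative numbers only
regions = ["North", "South", "East", "West"]
fig, axes = plot_side_by_side(regions, [15, 20, 25, 10], [30, 22, 18, 40])
```

Calling `plt.show()` afterwards displays the figure. Without `sharey=True`, each panel would pick its own scale and the smaller Q1 values could look as large as the Q2 values.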
2 What are some common pitfalls in data visualization? 4 1,3 1

Data visualization is a powerful tool for conveying complex
information, but it can go awry if not done carefully. Here are
some common pitfalls to watch out for:
1. Overloading with Information: Too much data crammed into one visualization can overwhelm and confuse the audience. Stick to the most relevant information.
2. Misleading Scales: Using inconsistent scales or not starting at zero can distort the interpretation. Keep scales uniform and clear.
3. Ignoring Context: A chart without context (like labels, legends, or titles) leaves viewers guessing. Provide necessary explanations to make the data understandable.
4. Poor Color Choices: Colors that don't contrast well or are overly vibrant can make the visualization hard to read. Opt for a balanced color palette.
5. Overcomplicating Design: Fancy, complex designs can detract from the data itself. Simplicity often communicates more effectively.
6. Improper Chart Types: Using the wrong type of chart (e.g., pie chart instead of bar chart) can misrepresent the data. Match the chart type to the data you're presenting.
7. Lack of Interactivity: Static visuals can limit the depth of insight. Incorporating interactive elements allows users to explore the data further.
8. Ignoring Accessibility: Failing to consider color blindness or other visual impairments can exclude part of your audience. Use accessible design principles.
Avoid these pitfalls to ensure your data visualizations are
clear, accurate, and effective at communicating insights.
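The effect of a truncated axis (pitfall 2) can be checked with simple arithmetic. The sketch below compares the true ratio of two values with the ratio of the bar heights a viewer sees when the axis starts above zero. The function name `visual_ratio` and the numbers are illustrative assumptions.

```python
def visual_ratio(a, b, axis_start=0.0):
    """Ratio of drawn bar heights for values a and b when the axis begins at axis_start."""
    if axis_start >= min(a, b):
        raise ValueError("axis_start must be below both values")
    return (a - axis_start) / (b - axis_start)

# Illustrative values: two sales figures that differ by 5%
sales_a, sales_b = 105.0, 100.0

true_ratio = visual_ratio(sales_a, sales_b)                   # axis starts at zero
distorted = visual_ratio(sales_a, sales_b, axis_start=95.0)   # truncated axis

print(f"True ratio: {true_ratio:.2f}")
print(f"Apparent ratio with axis starting at 95: {distorted:.2f}")
```

With these numbers, a 5% difference is drawn as if one bar were twice as tall as the other.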
3 What are some popular data visualization tools and their 4 1,2 1
strengths/weaknesses?
Some popular data visualization tools, with their strengths and
weaknesses:
1. Tableau:
Strengths: User-friendly, fast, and seamless data
connection.
Weaknesses: Limited advanced analytics capabilities.
2. Power BI:
Strengths: Integrates well with Microsoft products, robust
analytics capabilities.
Weaknesses: Steeper learning curve for advanced features.
3. D3.js:
Strengths: Highly customizable, web-based interactive
visualizations.
Weaknesses: Requires programming expertise (JavaScript).
4. Matplotlib:
Strengths: Popular Python library, highly customizable.
Weaknesses: Steeper learning curve; polished plots often
need verbose, low-level code.
5. Seaborn:
Strengths: Built on top of Matplotlib, easy to create
informative visualizations.
Weaknesses: Limited interactivity.
6. Plotly:
Strengths: Interactive visualizations, easy to use.
Weaknesses: Limited advanced analytics capabilities.
Part II
Focused – Short answer type Questions (4 Marks) Module BL CO
1 How would you visualize to communicate insights to a 4 1,2 1
non-technical audience?
1. Keep it simple: Use clear and concise language,
avoiding technical jargon.
2. Use intuitive charts: Select charts that are easy to
understand, such as bar charts, line charts, or scatter plots.
3. Focus on key insights: Highlight the most important
findings and trends.
4. Use visual hierarchy: Organize visuals to guide the
audience's attention.
5. Use colors effectively: Choose colors that are accessible
and easy to distinguish.
6. Add context: Provide context to help the audience
understand the data.
7. Use storytelling techniques: Create a narrative around
the insights.
8. Use interactive visualizations: Allow the audience to
explore the data themselves.
9. Provide recommendations: Offer actionable advice
based on the insights.
10. Practice your presentation: Ensure you can effectively
communicate the insights and explain the visualizations.
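Several of these points (an intuitive chart, focus on the key insight, restrained color, added context) can be combined in a single figure. The following is a minimal sketch using illustrative regional sales figures; the helper name `highlight_top_bar` and the color choices are hypothetical.

```python
import matplotlib.pyplot as plt

def highlight_top_bar(labels, values):
    """Bar chart with the largest value emphasized and annotated (hypothetical helper)."""
    top = values.index(max(values))
    # Grey for context, one strong color for the bar the audience should notice
    colors = ["lightgray"] * len(values)
    colors[top] = "tab:orange"

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(labels, values, color=colors)
    # Plain-language annotation so the reader does not have to read the axis
    ax.annotate(f"Highest: {values[top]:,}",
                xy=(top, values[top]), xytext=(0, 5),
                textcoords="offset points", ha="center")
    # The title states the takeaway, not just the variable names
    ax.set_title(f"{labels[top]} region leads in sales")
    ax.set_xlabel("Region")
    ax.set_ylabel("Sales")
    return fig, ax, top

fig, ax, top = highlight_top_bar(["North", "South", "East", "West"],
                                 [15000, 20000, 25000, 10000])
```

Calling `plt.show()` afterwards displays the chart. Putting the conclusion in the title lets a non-technical reader take in the message before looking at any numbers.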
2 Create your own visualization data? 4 2,3
Step 1: Install Dependencies
pip install pandas matplotlib
Step 2: Create Sample Data
import pandas as pd
# Creating sample data
data = {
    'Region': ['North', 'South', 'East', 'West'],
    'Sales': [15000, 20000, 25000, 10000]
}
df = pd.DataFrame(data)
print(df)
Step 3: Visualize the Data
import matplotlib.pyplot as plt
# Plotting the data
plt.figure(figsize=(10, 5))
plt.bar(df['Region'], df['Sales'], color=['blue', 'green', 'red', 'purple'])
plt.xlabel('Region')
plt.ylabel('Sales')
plt.title('Sales Performance by Region')
plt.show()
This will generate a simple bar chart showing the sales
performance across different regions.
Part III
Long Answer type Questions (10 Marks) Module BL CO
1 a) What are the key considerations when visualizing high-dimensional data? 4 1,2 1
In the era of big data, the ability to visualize high-
dimensional data has become increasingly important.
High-dimensional data refers to datasets with a
large number of features or variables. Visualizing
such data can be challenging due to the complexity and
the curse of dimensionality. However, several techniques
have been developed to help data scientists and analysts
make sense of high-dimensional data.
Techniques for Visualizing High Dimensional Data
1. Principal Component Analysis (PCA)
2. t-Distributed Stochastic Neighbor Embedding (t-SNE)
3. Parallel Coordinates
4. Radial Basis Function Networks (RBFNs)
5. Uniform Manifold Approximation and Projection
(UMAP)
Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is a dimensionality
reduction technique that transforms high-dimensional data
into a lower-dimensional form while preserving as much
variance as possible. PCA achieves this by identifying the
principal components, which are the directions in which
the data varies the most. In Python, PCA is available in
packages such as scikit-learn.
How to Use PCA?
1. Standardize the Data: Ensure that each feature has
a mean of zero and a standard deviation of one.
2. Compute the Covariance Matrix: This matrix
captures the relationships between different
features.
3. Calculate Eigenvalues and Eigenvectors: These
help identify the principal components.
4. Transform the Data: Project the original data onto
the principal components.
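The four steps above can be sketched with NumPy alone. This is an illustrative implementation, not the one scikit-learn uses internally; the function name `pca_from_scratch` and the random input are assumptions, and the code assumes no feature is constant (otherwise standardization would divide by zero).

```python
import numpy as np

def pca_from_scratch(X, n_components=2):
    """Project X (samples x features) onto its top principal components."""
    # 1. Standardize: zero mean and unit standard deviation per feature
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the features (columns)
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigen-decomposition; eigh is meant for symmetric matrices
    #    and returns eigenvalues in ascending order, so re-sort descending
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # 4. Project the standardized data onto the leading eigenvectors
    components = eigenvectors[:, :n_components]
    return X_std @ components, eigenvalues

# Illustrative random data: 100 samples, 50 features
rng = np.random.default_rng(42)
X = rng.random((100, 50))
projected, eigenvalues = pca_from_scratch(X, n_components=2)
print(projected.shape)
```

Each eigenvalue equals the variance of the data along its principal component, which is why sorting by eigenvalue picks the directions of greatest variation.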
When to Use?
- Appropriate for linear dimensionality reduction.
- Effective when a significant amount of the
variation can be explained by the first few principal
components.
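One way to check the second condition is to look at the cumulative explained variance. The sketch below uses scikit-learn's `explained_variance_ratio_` attribute on synthetic data that is built, as an assumption for illustration, from three hidden factors; the 95% threshold is likewise an arbitrary choice.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic data: 3 hidden factors mixed into 20 correlated features, plus small noise
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 20))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 20))

# Fit PCA with all components and accumulate each component's share of variance
pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative share reaches 95%
n_needed = int(np.searchsorted(cumulative, 0.95) + 1)
print("Components needed for 95% of the variance:", n_needed)
```

If only a few components are needed, a 2D or 3D PCA plot is likely to be a faithful summary. If the variance is spread evenly across many components, the plot will hide most of the structure.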
Implementing Principal Component Analysis (PCA)
Consider a dataset with 100 samples and 50 features each.
By applying PCA, you might reduce it to 2 or 3 principal
components, which can then be plotted in a 2D or 3D
scatter plot.
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Generate a sample high-dimensional dataset:
# 100 samples with 50 features each
np.random.seed(42)
data = np.random.rand(100, 50)

# Apply PCA to reduce the dataset to 2 dimensions
pca = PCA(n_components=2)
transformed_data = pca.fit_transform(data)

# Plot the transformed data
plt.scatter(transformed_data[:, 0], transformed_data[:, 1])
plt.title('PCA of High-Dimensional Data')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.show()
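Of the other techniques listed earlier, parallel coordinates needs no dimensionality reduction: every feature gets its own vertical axis and every sample becomes one line. A minimal sketch using `pandas.plotting.parallel_coordinates` follows; the two synthetic groups, the feature names, and the colors are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

rng = np.random.default_rng(1)
features = ["f1", "f2", "f3", "f4", "f5"]
# Two synthetic groups that differ in their feature means (illustrative only)
group_a = pd.DataFrame(rng.normal(0.0, 1.0, size=(20, 5)), columns=features)
group_b = pd.DataFrame(rng.normal(2.0, 1.0, size=(20, 5)), columns=features)
group_a["group"] = "A"
group_b["group"] = "B"
df = pd.concat([group_a, group_b], ignore_index=True)

fig, ax = plt.subplots(figsize=(8, 4))
# Each row becomes one line; lines are colored by the 'group' column
parallel_coordinates(df, class_column="group",
                     color=["tab:blue", "tab:orange"], alpha=0.5, ax=ax)
ax.set_title("Parallel Coordinates of a 5-Feature Dataset")
```

Calling `plt.show()` afterwards displays the figure. The plot becomes hard to read beyond a few dozen features or a few hundred lines, so it is best suited to moderately sized datasets.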
b) How do you evaluate the effectiveness of a data 4 1,3 1
visualization?
To evaluate the effectiveness of a data visualization,
consider the following:
1. Clarity: Is the message clear and easy to understand?
2. Accuracy: Does the visualization accurately represent
the data?
3. Relevance: Does the visualization address the intended
question or problem?
4. Insight: Does the visualization provide new insights or
understanding?
5. Engagement: Does the visualization capture the
audience's attention?
6. Storytelling: Does the visualization tell a clear and
compelling story?
7. Design: Is the visualization aesthetically pleasing and
well-designed?
8. Interactivity: Does the visualization allow for
exploration and interaction?
9. Context: Does the visualization provide sufficient
context for understanding?
10. Feedback: Does the visualization elicit feedback or
discussion?
Additionally, consider:
- Audience: Does the visualization meet the needs and
understanding of the intended audience?
- Purpose: Does the visualization achieve its intended
purpose?
- Data quality: Is the data accurate, complete, and reliable?
- Visualization type: Is the visualization type appropriate
for the data and message?
NB: BL – Bloom's Level 1, 2, 3; CO – Course Outcome; PO – Program Outcome; PSO – Program Specific
Outcome
