DA Assignment 4 Based On Format - Solution
Assignment 4
Subject Name: Data Analytics Dept: CSE
Name of Faculty: REDNAM S S JYOTHI
Branch: CSE1,2 Semester: VII
Part I
01 Short Answer Type Questions (2 Marks) Module BL CO
1 What is data visualization? What are the basic principles of data visualization? (Module: 4, BL: 1,2, CO: 1)
Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.
Basic Principles of Data Visualization:
1. Clarity and Simplicity:
Aim for clear and straightforward visuals that are easy to understand.
Avoid clutter and unnecessary elements that can distract from the main message.
2. Accuracy:
Ensure that the visual representation of the data accurately reflects the underlying information.
Avoid distortions that can mislead the viewer.
3. Consistency:
Use consistent colors, fonts, and shapes to create a cohesive and understandable visualization.
Maintain uniform scales and units across multiple charts for easy comparison.
4. Focus and Emphasis:
Highlight the most important data points or trends to draw the viewer's attention.
Use techniques like color contrast, size variation, and annotations to emphasize key insights.
5. Context and Relevance:
Provide context to help the viewer understand the data, such as labels, legends, and titles.
Ensure the visualization is relevant to the audience and the purpose of the analysis.
6. Interactivity:
Implement interactive elements that allow users to explore the data in more depth.
Features like zooming, filtering, and tooltips can enhance the user experience.
7. Storytelling:
Use visuals to tell a compelling story that guides the viewer through the data.
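Several of these principles can be seen together in a small chart. The following is a minimal Matplotlib sketch, assuming hypothetical category names and values chosen only for illustration: one accent colour for emphasis, a message-style title and labelled axes for context, a zero baseline for accuracy, and fewer chart borders for clarity.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to display plots interactively
import matplotlib.pyplot as plt

# Hypothetical example data, for illustration only
categories = ["A", "B", "C", "D"]
values = [12, 30, 18, 9]
x = list(range(len(categories)))

# Focus and emphasis: grey for context, one accent colour for the largest value
max_idx = values.index(max(values))
colors = ["lightgray"] * len(values)
colors[max_idx] = "tab:blue"

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(x, values, color=colors)
ax.set_xticks(x)
ax.set_xticklabels(categories)

# Context: a title that states the message, plus labelled axes
ax.set_title(f"Category {categories[max_idx]} has the highest value")
ax.set_xlabel("Category")
ax.set_ylabel("Value")

# Accuracy: start the bars at zero so their lengths stay proportional
ax.set_ylim(bottom=0)

# Clarity: remove chart borders that carry no information
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

# Emphasis: annotate only the key bar with its value
ax.annotate(str(values[max_idx]), xy=(max_idx, values[max_idx]),
            ha="center", va="bottom")

fig.tight_layout()
# fig.savefig("principles_demo.png") or plt.show() to view the chart
```

Numeric x positions with tick labels are used instead of passing the category strings directly, so the annotation can be placed by bar index.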
3 What are some popular data visualization tools and their strengths/weaknesses? (Module: 4, BL: 1,2, CO: 1)
Some popular data visualization tools, with their strengths and weaknesses:
1. Tableau:
Strengths: User-friendly, fast, and seamless data
connection.
Weaknesses: Limited advanced analytics capabilities.
2. Power BI:
Strengths: Integrates well with Microsoft products, robust
analytics capabilities.
Weaknesses: Steeper learning curve.
3. D3.js:
Strengths: Highly customizable, web-based interactive
visualizations.
Weaknesses: Requires programming expertise.
4. Matplotlib:
Strengths: Popular Python library, highly customizable.
Weaknesses: Steeper learning curve.
5. Seaborn:
Strengths: Built on top of Matplotlib, easy to create
informative visualizations.
Weaknesses: Limited interactivity.
6. Plotly:
Strengths: Interactive visualizations, easy to use.
Weaknesses: Limited advanced analytics capabilities.
Part II
Focused – Short answer type Questions (4 Marks) Module BL CO
1 How would you use visualization to communicate insights to a non-technical audience? (Module: 4, BL: 1,2, CO: 1)
1. Keep it simple: Use clear and concise language,
avoiding technical jargon.
2. Use intuitive charts: Select charts that are easy to
understand, such as bar charts, line charts, or scatter plots.
3. Focus on key insights: Highlight the most important
findings and trends.
4. Use visual hierarchy: Organize visuals to guide the
audience's attention.
5. Use colors effectively: Choose colors that are accessible
and easy to distinguish.
6. Add context: Provide context to help the audience
understand the data.
7. Use storytelling techniques: Create a narrative around
the insights.
8. Use interactive visualizations: Allow the audience to
explore the data themselves.
9. Provide recommendations: Offer actionable advice
based on the insights.
10. Practice your presentation: Make sure you can clearly explain each visualization and the insight it supports.
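As an illustration of points 1 to 4 and 6, here is a minimal sketch using hypothetical monthly sales figures (invented for the example): the title states the takeaway in plain words, and a single annotation points at the one change the audience needs to notice.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to display plots interactively
import matplotlib.pyplot as plt

# Hypothetical monthly sales figures, for illustration only
months = ["January", "February", "March", "April", "May", "June"]
sales = [120, 125, 118, 160, 170, 175]
x = list(range(len(months)))

# Find the largest month-over-month increase: the one change worth pointing out
jumps = [sales[i] - sales[i - 1] for i in range(1, len(sales))]
peak = jumps.index(max(jumps)) + 1  # index into months/sales

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, sales, marker="o", color="tab:blue")
ax.set_xticks(x)
ax.set_xticklabels(months)

# A headline-style title states the takeaway in plain language
ax.set_title(f"Sales rose most in {months[peak]}")
ax.set_ylabel("Sales (units)")
ax.set_ylim(bottom=0)

# One annotation guides attention to the key change
ax.annotate(f"+{jumps[peak - 1]} vs {months[peak - 1]}",
            xy=(peak, sales[peak]), xytext=(-80, 15),
            textcoords="offset points", arrowprops=dict(arrowstyle="->"))

fig.tight_layout()
# fig.savefig("sales_trend.png") or plt.show() to view the chart
```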
2 Create your own data and visualize it. (Module: 4, BL: 2,3)
Step 1: Install Dependencies
pip install pandas matplotlib
Step 2: Create Sample Data
import pandas as pd
# Creating sample data: sales figures for four regions
data = {
    'Region': ['North', 'South', 'East', 'West'],
    'Sales': [15000, 20000, 25000, 10000]
}
df = pd.DataFrame(data)
print(df)
Step 3: Visualize the Data
import matplotlib.pyplot as plt
# Plotting the data: one bar per region, each in its own colour
plt.figure(figsize=(10, 5))
plt.bar(df['Region'], df['Sales'], color=['blue', 'green', 'red', 'purple'])
plt.xlabel('Region')
plt.ylabel('Sales')
plt.title('Sales Performance by Region')
plt.show()
This generates a simple bar chart showing sales performance across the four regions.
Part III
Long Answer type Questions (10 Marks) Module BL CO
1 a) What are the key considerations when visualizing high-dimensional data? (Module: 4, BL: 1,2, CO: 1)
In the era of big data, the ability to visualize high-dimensional data has become increasingly important. High-dimensional data refers to datasets with a large number of features or variables. Visualizing such data is challenging because a chart can show only two or three dimensions directly, and because of the curse of dimensionality: as the number of dimensions grows, the data become sparse and distances between points become less informative. However, several techniques have been developed to help data scientists and analysts make sense of high-dimensional data.
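To make the curse of dimensionality concrete, the sketch below measures the ratio between the farthest and the nearest distance from a query point to a set of points, for increasing numbers of dimensions. The data are random uniform points, an assumption made purely for illustration, and distance_contrast is a hypothetical helper name. As the ratio approaches 1, "near" and "far" become hard to tell apart, which is one reason high-dimensional data are usually projected to two or three dimensions before plotting.

```python
import numpy as np

def distance_contrast(n_points, n_dims, seed=0):
    """Ratio of the farthest to the nearest Euclidean distance from a random query point."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))  # points drawn uniformly from the unit hypercube
    query = rng.random(n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return dists.max() / dists.min()

# The ratio shrinks toward 1 as dimensionality grows
for d in (2, 10, 100, 1000):
    print(f"{d:>5} dims: max/min distance = {distance_contrast(500, d):.2f}")
```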
Techniques for Visualizing High-Dimensional Data
1. Principal Component Analysis (PCA)
2. t-Distributed Stochastic Neighbor Embedding (t-SNE)
3. Parallel Coordinates
4. Radial Basis Function Networks (RBFNs)
5. Uniform Manifold Approximation and Projection (UMAP)
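Parallel coordinates (item 3 above) can be tried with the pandas function pandas.plotting.parallel_coordinates. In this minimal sketch the Iris dataset bundled with scikit-learn stands in as an assumed example: each of its 150 flowers becomes one line crossing four vertical axes, one per feature, coloured by species. Features are standardized first, on the assumption that no single axis should dominate the shared vertical scale.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to display plots interactively
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates
from sklearn.datasets import load_iris

# Iris ships with scikit-learn: 150 samples, 4 numeric features, 3 classes
iris = load_iris(as_frame=True)
df = iris.frame.copy()
df["species"] = iris.target_names[iris.target.to_numpy()]  # readable class labels
df = df.drop(columns="target")

# Standardize each feature so all vertical axes share a comparable scale
features = iris.feature_names
df[features] = (df[features] - df[features].mean()) / df[features].std()

# One line per sample, coloured by the class column
fig, ax = plt.subplots(figsize=(8, 4))
parallel_coordinates(df, "species", ax=ax, alpha=0.5)
ax.set_title("Iris features in parallel coordinates (standardized)")
fig.tight_layout()
# fig.savefig("parallel_coordinates.png") or plt.show() to view the chart
```

Lines that travel together across the axes reveal clusters; crossing lines between two adjacent axes suggest a negative relationship between those features.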
Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is a dimensionality
reduction technique that transforms high-dimensional data
into a lower-dimensional form while preserving as much
variance as possible. PCA achieves this by identifying the
principal components, which are the directions in which
the data varies the most. In Python, it is commonly implemented
with libraries such as scikit-learn.
How to Use PCA?
1. Standardize the Data: Ensure that each feature has
a mean of zero and a standard deviation of one.
2. Compute the Covariance Matrix: This matrix
captures the relationships between different
features.
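3. Calculate Eigenvalues and Eigenvectors: The eigenvectors of the covariance matrix are the principal components, and each eigenvalue gives the variance along its component.
4. Transform the Data: Project the standardized data onto the top few principal components (those with the largest eigenvalues).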
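The four steps above can be sketched directly in NumPy. This is a minimal illustration of the textbook procedure, not scikit-learn's own implementation (which offers several solvers); the helper name pca_from_scratch and the random 100 x 50 input are assumptions for the example.

```python
import numpy as np

def pca_from_scratch(X, n_components=2):
    """Hypothetical helper: PCA via eigen-decomposition of the covariance matrix."""
    # Step 1: standardize each feature to mean 0 and standard deviation 1
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: covariance matrix of the features (columns are variables)
    cov = np.cov(X_std, rowvar=False)
    # Step 3: eigen-decomposition; eigh suits symmetric matrices but returns ascending eigenvalues
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # sort components by variance, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 4: project the standardized data onto the top components
    components = eigvecs[:, :n_components]
    X_proj = X_std @ components
    explained = eigvals[:n_components] / eigvals.sum()  # share of total variance per component
    return X_proj, explained

# Hypothetical random data: 100 samples with 50 features each
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
X_proj, explained = pca_from_scratch(X, n_components=2)
print(X_proj.shape)
print(explained)
```

The two projected columns are uncorrelated by construction, because they come from orthogonal eigenvectors of the covariance matrix.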
When to Use?
Appropriate when the relationships between features are roughly linear, since PCA can only find linear combinations of the original features.
Effective when the first few principal components explain a large share of the total variance.
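One way to check the second criterion is the cumulative explained variance ratio reported by scikit-learn's PCA (the explained_variance_ratio_ attribute). The data below are synthetic and hypothetical: 50 features generated from 2 hidden factors plus a little noise, so most of the variance should concentrate in the first two components. The 95% threshold is a common but arbitrary choice, used here only as an example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Hypothetical data: 100 samples, 50 features driven by 2 hidden factors plus small noise
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 50))
X = latent @ mixing + 0.05 * rng.normal(size=(100, 50))

# Standardize, then fit PCA keeping all components
X_std = StandardScaler().fit_transform(X)
pca = PCA().fit(X_std)

# Cumulative share of variance explained by the first k components
cumulative = np.cumsum(pca.explained_variance_ratio_)
print("First 5 cumulative ratios:", cumulative[:5].round(3))

# Smallest number of components that retains at least 95% of the variance
k = int(np.searchsorted(cumulative, 0.95)) + 1
print("Components needed for 95%:", k)
```

On real data, if many components are needed to reach such a threshold, a 2D PCA scatter plot will hide much of the structure, and a non-linear method such as t-SNE or UMAP may be worth trying.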
Implementing Principal Component Analysis (PCA)
Consider a dataset with 100 samples and 50 features each.
By applying PCA, you might reduce it to 2 or 3 principal
components, which can then be plotted in a 2D or 3D
scatter plot.
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
NB: BL – Blooms Level 1, 2, 3, CO – Course Outcome, PO – Program Outcome, PSO – Program Specific
Outcome