2b.data Visualization

The document discusses data visualization techniques. It provides guidelines for assignments involving data visualization, including submitting code and documentation. It also provides hints for following the CRISP-ML(Q) methodology and understanding dataset features. Sample problems are given involving calculating statistics, interpreting boxplots and histograms, and comparing different visualizations.

Uploaded by

SUMAN CHAUHAN

2a.

Data Visualization

Instructions:
Please share your answers filled in-line in the word document. Submit code
separately wherever applicable.

Please ensure you update all the details:


Name: CHARU CHAUHAN Batch ID: 19062023
Topic: Data Visualization

Guidelines:
1. An assignment submission is considered complete only when the correct and executable code(s) is
submitted along with the documentation explaining the method and results. Failing to submit either
of those will be considered an invalid submission and will not be considered a correct submission.

2. Ensure that you submit your assignments correctly. Resubmission is not allowed.

3. After submission, you can evaluate your work by referring to the keys provided (these will be
available only after submission).

Hints: Follow the CRISP-ML(Q) methodology steps wherever appropriate.


1. Data Understanding: work on each feature of the dataset to create a data
dictionary as displayed in the image below:

Make a table as shown above and provide information about the features such as its data
type and its relevance to the model building. And if not relevant, provide reasons and a
description of the feature.

© 360DigiTMG. All Rights Reserved.


Problem Statements:
Q1) Calculate Skewness, and Kurtosis using Python code & draw inferences on the following
data. Refer to the Datasets attachment for the data file.
Hint: [Insights drawn from the data such as data is normally distributed/not, outliers, measures
like mean, median, mode, variance, std. deviation]
a. Cars speed and distance



SKEWNESS = -0.11751
KURTOSIS = -0.50899

b. Top Speed (SP) and Weight (WT)

SKEWNESS = 1.61145
KURTOSIS = 2.977329
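
A minimal sketch of how skewness and kurtosis values like those above could be computed in Python. It assumes the bias-adjusted sample formulas (the adjusted Fisher-Pearson skewness and the matching excess kurtosis), which pandas' `Series.skew()` and `Series.kurt()` are also expected to use. The `speeds` list is hypothetical illustration data, not the assignment's dataset.

```python
import math


def sample_skewness(values):
    """Bias-adjusted sample skewness (adjusted Fisher-Pearson coefficient)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (a normal distribution scores 0)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    g2 = m4 / m2 ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


# Hypothetical values with one large observation on the right.
speeds = [4, 7, 8, 9, 10, 11, 12, 13, 14, 25]

print("skewness:", round(sample_skewness(speeds), 5))
print("kurtosis:", round(sample_excess_kurtosis(speeds), 5))
```

A skewness near 0 with a kurtosis near 0 is consistent with roughly normal data; a clearly positive skewness, as in the hypothetical list, points to a long right tail.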

Q2) Draw inferences about the following boxplot & histogram.
Hint: [Insights drawn from the plots about the data such as whether data is normally
distributed/not, outliers, measures like mean, median, mode, variance, std. deviation]

ANS: The data is right-skewed (positively skewed).



ANS: The inference from this boxplot is that the data is positively skewed.
Q3) Below are the scores obtained by a student on tests
34,36,36,38,38,39,39,40,40,41,41,41,41,42,42,45,49,56
1) Find the mean, median, variance, and standard deviation.
2) What can we say about the student marks? [Hint: Looking at the various measures
calculated above whether the data is normal/skewed or if outliers are present].
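ANS: 1) Mean = 41, median = 40.5, sample variance ≈ 25.53, sample standard deviation ≈ 5.05.
2) The scores are not uniformly distributed. Most lie between 38 and 42, the mean is slightly above
the median, and 49 and 56 exceed the upper fence Q3 + 1.5 × IQR = 42 + 6 = 48, so the data is
mildly right-skewed with two high outliers.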
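
The measures asked for in Q3 can be checked with Python's standard `statistics` module. This is a sketch under two assumptions that the question does not state: the variance is the sample variance (dividing by n - 1), and outliers are flagged with the common 1.5 × IQR rule.

```python
import statistics

scores = [34, 36, 36, 38, 38, 39, 39, 40, 40, 41, 41, 41, 41, 42, 42, 45, 49, 56]

mean = statistics.mean(scores)          # 41
median = statistics.median(scores)      # 40.5
mode = statistics.mode(scores)          # 41
variance = statistics.variance(scores)  # sample variance, about 25.53
std_dev = statistics.stdev(scores)      # sample standard deviation, about 5.05

# Quartiles and the 1.5 * IQR fences (assumed outlier rule).
q1, _, q3 = statistics.quantiles(scores, n=4)  # 38.0 and 42.0
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [s for s in scores if s < lower_fence or s > upper_fence]

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.2f}, std_dev={std_dev:.2f}")
print(f"outliers={outliers}")  # [49, 56]
```

The mean sits slightly above the median and both flagged values are on the high side, which is the pattern expected for mildly right-skewed data.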
Q5) What is the nature of skewness when the mean and median of data are equal?
When the mean and median of a dataset are equal, the data is typically symmetrically
distributed. The values are balanced around the center, and the skewness is close to zero.



Skewness measures the asymmetry of the probability distribution of a real-valued random
variable. When the mean and median are equal, the distribution is usually centered and balanced,
with data points spread evenly on both sides of the central point. The normal ("bell-shaped")
distribution is the standard example, although a symmetric distribution need not be bell-shaped.
Q6) What is the nature of skewness when mean > median?
When the mean of a dataset is greater than the median, it indicates positive skewness,
also known as a right-skewed distribution. In a right-skewed distribution, the tail of the
distribution is longer on the right side, which means that there are relatively few large values
that pull the mean to the right.

Q7) What is the nature of skewness when median > mean?


When the median of a dataset is greater than the mean, it indicates negative skewness,
also known as a left-skewed distribution. In a left-skewed distribution, the tail of the
distribution is longer on the left side, which means that there are relatively few small values
that pull the mean to the left.
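
The mean-versus-median comparisons in Q5 to Q7 can be sketched as a small helper. The function name `skew_direction` and the sample lists are hypothetical, and the comparison is only a rule of thumb: it usually agrees with the sign of the skewness but is not guaranteed to.

```python
import statistics


def skew_direction(values, tolerance=0.0):
    """Guess the skew direction by comparing the mean with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean - median > tolerance:
        return "right-skewed (positive)"
    if median - mean > tolerance:
        return "left-skewed (negative)"
    return "approximately symmetric"


print(skew_direction([1, 2, 3, 4, 5]))       # mean 3, median 3
print(skew_direction([1, 2, 3, 4, 20]))      # mean 6, median 3
print(skew_direction([1, 17, 18, 19, 20]))   # mean 15, median 18
```

The `tolerance` parameter lets small differences between mean and median count as symmetric, which is useful for real data where the two are rarely exactly equal.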

Q8) What does a positive kurtosis value indicate for data?


A positive kurtosis value (excess kurtosis, measured relative to the normal distribution) indicates
that the data has heavier tails than the normal distribution, known as a leptokurtic
distribution. In other words, such data produces more extreme values (outliers) than a normal
distribution would, and it often also looks more sharply peaked around the mean.



1. Box Plot Shape: The box has its lower edge at Q1 (12), its upper edge at Q3 (16), and a line
inside at the median (14). The median sits midway between the quartiles, which indicates that the
data is relatively symmetric around it.
2. Whiskers and Outliers: The whiskers extend from the edges of the box. There are no visible
outliers beyond the whiskers, suggesting that there are no extreme values outside the typical range.
3. IQR: The Interquartile Range (IQR) is approximately 4 (16 - 12 = 4). This range contains
the middle 50% of the data, indicating that the data is moderately spread out.
4. Median: The median (Q2) is at 14, which is also the center of the box. This suggests that
the data is symmetrically distributed around this central value.
5. Range: The box spans 12 to 16, which is the interquartile range, so the middle half of the
data falls within this range.
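
To make the "no outliers beyond the whiskers" observation concrete, the sketch below computes the outlier fences from the quartiles quoted above (Q1 = 12, Q3 = 16). It assumes the common 1.5 × IQR convention, which the document does not state, and the `readings` list is hypothetical.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences; points outside them count as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = tukey_fences(12, 16)  # (6.0, 22.0)

# Hypothetical readings: only values outside the fences are flagged.
readings = [5, 11, 13, 14, 15, 17, 23]
outliers = [x for x in readings if x < lower or x > upper]

print(lower, upper)  # 6.0 22.0
print(outliers)      # [5, 23]
```

With these quartiles, a boxplot shows no outlier markers as long as every value lies between 6 and 22.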

Q9) What does a negative kurtosis value indicate for data?


A negative kurtosis value (excess kurtosis, measured relative to the normal distribution) indicates
that the data has lighter tails than the normal distribution, known as a platykurtic
distribution. In other words, such data produces fewer extreme values (outliers) than a normal
distribution would, and it often also looks flatter around the mean.
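
The sign convention in Q8 and Q9 can be illustrated with a short sketch. It uses the simple population form of excess kurtosis (fourth central moment divided by the squared variance, minus 3) rather than a bias-adjusted estimator, and both sample lists are hypothetical.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (a normal distribution gives 0)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3


def classify_kurtosis(values):
    """Name the tail behaviour implied by the sign of the excess kurtosis."""
    k = excess_kurtosis(values)
    if k > 0:
        return "leptokurtic"  # heavier tails than normal
    if k < 0:
        return "platykurtic"  # lighter tails than normal
    return "mesokurtic"


heavy_tailed = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]  # mostly central, two extremes
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]            # evenly spread, no extremes

print(excess_kurtosis(heavy_tailed), classify_kurtosis(heavy_tailed))  # 2.0 leptokurtic
print(round(excess_kurtosis(flat), 4), classify_kurtosis(flat))        # -1.2242 platykurtic
```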

Q10) Answer the below questions using the below boxplot visualization.

What can we say about the distribution of the data?



2. Calculate Q1 as the median of the lower half of the data: Q1 = 12
3. Calculate Q3 as the median of the upper half of the data: Q3 = 16
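
The quartile steps above can be sketched as follows, assuming the missing first step is sorting the data and locating the median. The function name `quartiles_by_halves` and the `data` list are hypothetical; the list was chosen so that it reproduces Q1 = 12 and Q3 = 16.

```python
import statistics


def quartiles_by_halves(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2
    lower_half = data[:half]   # values below the median position
    upper_half = data[-half:]  # values above it (middle value skipped if n is odd)
    return (
        statistics.median(lower_half),
        statistics.median(data),
        statistics.median(upper_half),
    )


data = [10, 11, 12, 13, 14, 14, 15, 16, 17, 18]  # hypothetical
q1, median, q3 = quartiles_by_halves(data)
iqr = q3 - q1

print(q1, median, q3, iqr)  # 12 14.0 16 4
```

Other quartile conventions (for example, interpolation-based ones used by statistics libraries) can give slightly different values on the same data.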

What is the nature of the skewness of the data?


Negative skewness indicates that the tail of the distribution is longer on the left side, and the
majority of the data points are concentrated on the right side. In a negatively skewed distribution,
the mean is typically less than the median.
What will be the IQR of the data (approximately)?
IQR=Q3−Q1=16−12=4
Q11) Comment on the below Boxplot visualizations.

Draw an Inference from the distribution of data for Boxplot 1 with respect to Boxplot 2.
Hint: [On comparing both the plots and check if the data is normally distributed/not, outliers
present, skewness, etc.]
ANS: Boxplot 1 is drawn with range = 3.
Boxplot 2 is drawn with range = 1.5.



Q12)

Answer the following three questions based on the boxplot above.


(i) What is inter-quartile range of this dataset? [Hint: IQR = Q3 – Q1]
In one line, explain what this value implies. (Hint: Based on IQR definition)
ANS: IQR = Q3 - Q1 = 12 - 5 = 7. The middle 50% of the values span 7 units.
(ii) What can we say about the skewness of this dataset?
ANS: The data is positively skewed.
(iii) If it were found that the data point with the value 25 is 2.5, how would the new
boxplot be affected?
(Hint: On changing the data point from 25 to 2.5 in the data, how is it different from the
current one.)
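ANS: With Q1 = 5 and Q3 = 12, the upper fence is 12 + 1.5 × 7 = 22.5, so 25 is drawn as an
outlier. At 2.5 the point lies inside the fences (lower fence = 5 - 10.5 = -5.5), so the outlier
marker disappears, the lower whisker extends to 2.5 if that is now the minimum, and the plot
looks less positively skewed.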
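
A sketch of the change described in (iii), using a hypothetical dataset built to have Q1 = 5 and Q3 = 12, since the data behind the boxplot is not given. It assumes the 1.5 × IQR outlier rule and the default ("exclusive") method of `statistics.quantiles`.

```python
import statistics


def box_summary(values, k=1.5):
    """Quartiles, IQR, and the points a boxplot would mark as outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = sorted(x for x in values if x < lower or x > upper)
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}


original = [3, 5, 5, 6, 7, 8, 10, 12, 12, 14, 25]      # hypothetical data
corrected = [2.5 if x == 25 else x for x in original]  # 25 replaced by 2.5

before = box_summary(original)
after = box_summary(corrected)

print(before)  # outliers: [25]
print(after)   # outliers: []
```

In this example the box itself (Q1 and Q3) stays the same, the median drops from 8 to 7, and the single outlier above the upper whisker disappears.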



Q 13)

Answer the following three questions based on the histogram above.


(i) Where would the mode of this dataset lie? Hint: [In terms of values On the Y-
axis]
ANS: The mode lies at 7 on the x-axis.
(ii) Comment on the skewness of the dataset
ANS: The data is right-skewed.



(iii)Suppose that the above histogram and the boxplot in question 2 are plotted for
the same dataset. Explain how these graphs complement each other in providing
information about any dataset. Hint: [Visualizing both the plots, draw the
insights]
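
One way to approach (iii) is to compute both views for the same data and compare what each reveals. The sketch below is only an illustration on a hypothetical right-skewed dataset: the bin counts play the role of the histogram (shape and modal bin), and the five-number summary plays the role of the boxplot (median, quartiles, extremes).

```python
import statistics
from collections import Counter


def histogram_counts(values, bin_width):
    """Count values per bin; each bin is labelled by its lower edge."""
    return Counter((x // bin_width) * bin_width for x in values)


def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum: the numbers a boxplot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


data = [5, 6, 6, 7, 7, 7, 7, 8, 8, 9, 10, 12, 15, 25]  # hypothetical

counts = histogram_counts(data, bin_width=2)
modal_bin = max(counts, key=counts.get)  # bin with the tallest bar
summary = five_number_summary(data)

# Text version of the histogram: one '#' per observation in the bin.
for edge in sorted(counts):
    print(f"{edge:>2}-{edge + 2:<2} {'#' * counts[edge]}")
print("modal bin starts at:", modal_bin)
print("five-number summary:", summary)
```

The histogram side shows where values pile up and how the tail thins out, while the summary side gives the exact median and quartiles and makes the isolated maximum easy to spot.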
1. BUSINESS PROBLEM
OBJECTIVE
CONSTRAINTS
2. FOR EACH ASSIGNMENT THE SOLUTION SHOULD BE SUBMITTED IN THE BELOW FORMAT
3. RESEARCH AND PERFORM ALL POSSIBLE STEPS FOR OBTAINING THE SOLUTION
4. FOR BASIC STATISTICS, THE EXPLANATION OF THE SOLUTION SHOULD BE DOCUMENTED IN BLACK AND
WHITE ALONG WITH THE CODE
5. ALL THE CODE (EXECUTABLE PROGRAMS) SHOULD EXECUTE WITHOUT ERRORS
