BigDataAnalytics _ Unit2
● Measures of central tendency summarize a dataset by identifying a single central value that represents
the entire dataset.
a) Mean
b) Median
c) Mode
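As a small illustration of the three measures listed above, the sketch below uses Python's standard `statistics` module. The sample values are hypothetical and chosen only so the results are easy to check by hand.

```python
import statistics

# Hypothetical sample, not taken from the notes.
data = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(data)      # sum / count = 30 / 6 = 5
median_value = statistics.median(data)  # middle of sorted data = (3 + 5) / 2 = 4.0
mode_value = statistics.mode(data)      # most frequent value = 3

print(mean_value, median_value, mode_value)
```

The three values differ here (5, 4.0 and 3), which shows that each measure picks a different "central" value when the data are not symmetric.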
Range - The range is the difference between the largest and the smallest values in the distribution.
It can be written as
R = L – S
where
L is the largest value in the distribution
S is the smallest value in the distribution
● A higher value of range implies higher variation in the data set.
● One drawback of this measure is that it takes into account only the maximum and the minimum values.
These two values alone do not always indicate how the remaining values of the distribution are scattered.
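The drawback noted above can be seen in a short sketch. The helper name `value_range` and both datasets are hypothetical. The two datasets have the same range even though their values are scattered quite differently.

```python
def value_range(values):
    """Return R = L - S, the largest value minus the smallest value."""
    return max(values) - min(values)

# Hypothetical datasets with the same extremes but different scatter.
clustered = [0, 50, 50, 50, 100]   # most values sit at the centre
spread_out = [0, 25, 50, 75, 100]  # values are evenly spread

range_clustered = value_range(clustered)  # 100 - 0 = 100
range_spread = value_range(spread_out)    # 100 - 0 = 100

print(range_clustered, range_spread)
```

Both calls give 100, so the range alone cannot tell these two distributions apart.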
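Variance (σ²)
The average of the squared deviations of the values from their mean.
Interquartile Range (IQR)
The range between the first quartile (Q1) and the third quartile (Q3), that is, IQR = Q3 – Q1.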
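The range between Q1 and Q3 is commonly called the interquartile range. A minimal sketch of computing it with the standard library follows. The data are hypothetical, and the choice of `method="inclusive"` is an assumption: different quartile conventions can give slightly different Q1 and Q3 values on the same data.

```python
import statistics

# Hypothetical sample, not taken from the notes.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1  # spread of the middle half of the data

print(q1, q3, iqr)  # 3.0 7.0 4.0
```

Unlike the range, this measure ignores the extreme values, so a single very large or very small observation changes it little.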
Relative Measures of Dispersion
● Coefficient of Range: It is defined as the ratio of the difference between the highest and lowest value
in a data set to the sum of the highest and lowest value.
● Coefficient of Variation: It is defined as the ratio of the standard deviation to the mean of the data
set. It is usually expressed as a percentage.
● Coefficient of Mean Deviation: It is defined as the ratio of the mean deviation to the central value
(mean or median) of the data set about which the deviation is measured.
● Coefficient of Quartile Deviation: It is defined as the ratio of the difference between the third
quartile and the first quartile to the sum of the third and first quartiles.
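The four coefficients defined above can be sketched in a few lines of Python. The data are hypothetical. Three choices here are assumptions rather than something the notes specify: the population standard deviation (`pstdev`) is used for the coefficient of variation, the mean is used as the central point for the mean deviation, and quartiles use the "inclusive" convention.

```python
import statistics

# Hypothetical sample, not taken from the notes.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

largest, smallest = max(data), min(data)
mean = statistics.mean(data)

# Coefficient of range: (L - S) / (L + S)
coef_range = (largest - smallest) / (largest + smallest)

# Coefficient of variation: standard deviation / mean, as a percentage.
# Assumption: population standard deviation.
coef_variation = statistics.pstdev(data) / mean * 100

# Coefficient of mean deviation: mean absolute deviation / central value.
# Assumption: deviations are taken about the mean.
mean_deviation = sum(abs(x - mean) for x in data) / len(data)
coef_mean_deviation = mean_deviation / mean

# Coefficient of quartile deviation: (Q3 - Q1) / (Q3 + Q1)
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
coef_quartile_deviation = (q3 - q1) / (q3 + q1)

print(coef_range, round(coef_variation, 2),
      round(coef_mean_deviation, 3), coef_quartile_deviation)
```

Because each coefficient is a ratio, it carries no unit, which is what makes these measures usable for comparing the spread of datasets recorded in different units or on different scales.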
Quantile and Rank
a) Skewness