BigDataAnalytics _ Unit2

The document provides an overview of inferential and descriptive statistics, detailing measures of central tendency (mean, median, mode) and measures of dispersion (range, variance, standard deviation). It also covers concepts of skewness, kurtosis, normal distribution, and binomial distribution, including their characteristics and applications. Key features of normal distribution and conditions for binomial distribution are also highlighted.

Uploaded by

21ucs048


Unit - 2

Inferential and Descriptive Statistics

Statistics is broadly classified into two categories:

1. Descriptive Statistics: Summarizes and organizes data in a meaningful way.

2. Inferential Statistics: Makes inferences or predictions about a population based on sample data.

Measures of Central Tendency

● Measures of central tendency summarize a dataset by identifying a single central value that represents
the entire dataset.

a) Mean

The average value of a dataset:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

where x_i are the data points and n is the number of data points.

b) Median

The middle value in an ordered dataset.

● For odd n: Median = middle value.

● For even n: Median = average of two middle values.

c) Mode

The most frequently occurring value(s) in the dataset.

● Suitable for categorical data.
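
The three measures above can be sketched with Python's standard `statistics` module. The sample values are hypothetical and chosen only for illustration; they do not come from these notes.

```python
import statistics

# Hypothetical sample data (an assumption, not from the notes).
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)        # sum of the values divided by their count
median = statistics.median(data)    # even n: average of the two middle values
modes = statistics.multimode(data)  # list of all most frequent values

print("mean:", mean)
print("median:", median)
print("mode(s):", modes)
```

`multimode` returns a list because a dataset can have more than one mode.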

Measures of Dispersion
● Measures of dispersion describe how scattered the data are.
● They are numbers that quantify how widely the values of a dataset are spread out.

Types of Measures of Dispersion

Measures of dispersion can be classified into the following two types :
● Absolute Measure of Dispersion - A measure of dispersion that is
expressed in the same units as the data themselves, for example
meters, dollars, or kilograms.
● Relative Measure of Dispersion - A measure of dispersion that is
expressed as a unitless ratio or percentage, so that the scatter of
two datasets measured in different units can be compared.

These measures of dispersion can be further divided into various categories.

Absolute Measure of Dispersion

Range - The range is the difference between the largest and the smallest values in the distribution.
Thus, it can be written as
R=L–S

where,
L is the largest value in the Distribution
S is the smallest value in the Distribution
● A higher value of range implies higher variation in the data set.
● One drawback of this measure is that it only takes into account the maximum and the minimum value.
They might not always be the proper indicator of how the values of the distribution are scattered.

Variance (σ²)

The average of the squared differences from the mean:

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

where $\bar{x}$ is the mean of the n data points x_i.

Standard Deviation (σ)

The square root of the variance.

Interquartile Range (IQR)

The range between the first quartile (Q1) and the third quartile (Q3): IQR = Q3 – Q1.
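
A minimal sketch of the four absolute measures, using only the standard library. The data are hypothetical. The quartiles use the `inclusive` method of `statistics.quantiles`, which is one of several quartile conventions; a different convention can give a slightly different IQR.

```python
import statistics

# Hypothetical sample data (an assumption, not from the notes).
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)     # R = L - S
variance = statistics.pvariance(data)  # average squared difference from the mean
std_dev = statistics.pstdev(data)      # square root of the variance

# Quartile cut points; "inclusive" is an assumed convention.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print("range:", data_range)
print("variance:", variance)
print("standard deviation:", std_dev)
print("IQR:", iqr)
```

`pvariance` and `pstdev` divide by n, which matches the "average of the squared differences" definition; `statistics.variance` and `statistics.stdev` divide by n - 1 instead and are meant for samples.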
Relative Measure of Dispersion

● Coefficient of Range: It is defined as the ratio of the difference between the highest and lowest value
in a data set to the sum of the highest and lowest value.
● Coefficient of Variation: It is defined as the ratio of the standard deviation to the mean of the data
set. We use percentages to express the coefficient of variation.
● Coefficient of Mean Deviation: It is defined as the ratio of the mean deviation to the value of the
central point of the data set.
● Coefficient of Quartile Deviation: It is defined as the ratio of the difference between the third
quartile and the first quartile to the sum of the third and first quartiles.
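
The four coefficients can be sketched as follows. The data are hypothetical, the quartile convention is an assumption, and the mean deviation is taken about the mean (the definition above also allows the median as the central point).

```python
import statistics

# Hypothetical sample data (an assumption, not from the notes).
data = [2, 4, 4, 4, 5, 5, 7, 9]

largest, smallest = max(data), min(data)
mean = statistics.mean(data)
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")

# (L - S) / (L + S)
coeff_range = (largest - smallest) / (largest + smallest)

# Standard deviation relative to the mean, expressed as a percentage.
coeff_variation = statistics.pstdev(data) / mean * 100

# Mean absolute deviation about the mean, relative to the mean.
mean_deviation = sum(abs(x - mean) for x in data) / len(data)
coeff_mean_deviation = mean_deviation / mean

# (Q3 - Q1) / (Q3 + Q1)
coeff_quartile_deviation = (q3 - q1) / (q3 + q1)

print(coeff_range, coeff_variation, coeff_mean_deviation, coeff_quartile_deviation)
```

Because every coefficient is a ratio of quantities in the same units, the results are unitless and can be compared across datasets.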
Quantile and Rank

● Quantiles: Points that divide the data into equal parts.

○ Quartiles: Divide data into four equal parts (Q1,Q2,Q3).
○ Percentiles: Divide data into 100 equal parts.
● Rank: The relative position of a value within a dataset.
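
A short sketch of quartiles, percentiles, and rank. The data and the helper `rank_of` are hypothetical, and the `inclusive` interpolation method is an assumed convention.

```python
import statistics

# Hypothetical sample data (an assumption, not from the notes).
data = [15, 20, 35, 40, 50]

# Three cut points that split the data into four equal parts.
quartiles = statistics.quantiles(data, n=4, method="inclusive")

# Ninety-nine cut points that split the data into 100 equal parts.
percentiles = statistics.quantiles(data, n=100, method="inclusive")
median_as_percentile = percentiles[49]  # the 50th percentile


def rank_of(value, values):
    """Return the 1-based ascending rank: 1 + number of strictly smaller values."""
    return 1 + sum(v < value for v in values)


print("quartiles:", quartiles)
print("50th percentile:", median_as_percentile)
print("rank of 35:", rank_of(35, data))
```

With this ranking rule, tied values share the same (lowest) rank; other tie-breaking rules, such as average rank, are equally common.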

Skewness and Kurtosis

a) Skewness

Measures the asymmetry of the probability distribution.

● Positive Skew: Tail on the right side.

● Negative Skew: Tail on the left side.
Symmetric Skewness:
● A perfectly symmetric distribution is one in which the frequency distribution is the same on both sides of the
center point of the frequency curve.
● In this case, Mean = Median = Mode.
● There is no skewness in a perfectly symmetrical distribution.
Asymmetric Skewness:
● An asymmetrical or skewed distribution is one in which the spread of the frequencies differs on the two
sides of the center point, so the frequency curve is stretched more towards one side. The mean,
median, and mode fall at different points.
● The two types of asymmetric skewness are:

➔ Positive Skewness: The frequencies are concentrated
towards the lower values of the variable, and the right
tail is longer than the left tail.
➔ Negative Skewness: The frequencies are concentrated
towards the higher values of the variable, and the left
tail is longer than the right tail.
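
One common way to put a number on skewness is the moment coefficient, the third central moment divided by the 1.5 power of the second. The notes do not state which formula is intended, so this is a sketch under that assumption, with hypothetical data.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical datasets (assumptions, not from the notes).
right_tailed = [1, 2, 2, 3, 3, 3, 10]    # long right tail
left_tailed = [-x for x in right_tailed]  # mirror image: long left tail
symmetric = [1, 2, 3, 4, 5]

for name, values in [("right-tailed", right_tailed),
                     ("left-tailed", left_tailed),
                     ("symmetric", symmetric)]:
    print(name, round(skewness(values), 3))
```

A positive result indicates a longer right tail, a negative result a longer left tail, and a result of zero a symmetric dataset.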
b) Kurtosis
● Kurtosis is also a characteristic of a frequency distribution. It gives an idea about the shape of the
distribution.
● The measure of kurtosis is the extent to which a frequency distribution is peaked in
comparison with a normal curve, that is, the degree of peakedness of the distribution.
Types of Kurtosis
Kurtosis is classified into three types:
1. Leptokurtic: A curve with a higher peak than the
normal curve. There is a high concentration of items
near the central value.
2. Mesokurtic: A curve with the same peak as the
normal curve. Items are distributed around the
central value as in a normal distribution.
3. Platykurtic: A curve with a lower peak than the
normal curve. There is less concentration of items
around the central value.
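
The three types can be told apart numerically with excess kurtosis, the fourth central moment divided by the squared variance, minus 3 (the value for a normal curve). This is a sketch assuming that moment-based measure; the datasets, the helper names, and the tolerance used for "mesokurtic" are hypothetical.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


def classify(values, tolerance=0.1):
    """Label a dataset by the sign of its excess kurtosis (tolerance is an assumption)."""
    k = excess_kurtosis(values)
    if k > tolerance:
        return "leptokurtic"
    if k < -tolerance:
        return "platykurtic"
    return "mesokurtic"


# Hypothetical datasets (assumptions, not from the notes).
flat = [1, 2, 3, 4, 5]                       # evenly spread values
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -5, 5]     # most values at the center

print(classify(flat), excess_kurtosis(flat))
print(classify(peaked), excess_kurtosis(peaked))
```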
Normal Distribution
● A symmetric, bell-shaped distribution.
● Defined by the mean (μ) and standard deviation (σ).
● The area under the curve of the normal distribution
represents probabilities for the data.
● The area under the whole curve is equal to 1, or 100%

Probabilities between standard deviations (σ) of a normal distribution:

● Roughly 68.3% of the data is within 1 standard deviation of the average
(from μ-1σ to μ+1σ)
● Roughly 95.5% of the data is within 2 standard deviations of the average
(from μ-2σ to μ+2σ)
● Roughly 99.7% of the data is within 3 standard deviations of the average
(from μ-3σ to μ+3σ)
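
These percentages can be checked as areas under the curve with `statistics.NormalDist` from the standard library. The helper name `prob_within` is hypothetical, and a standard normal (μ = 0, σ = 1) is assumed; the result is the same for any μ and σ.

```python
from statistics import NormalDist

# Standard normal distribution (mu = 0, sigma = 1), chosen for illustration.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def prob_within(k, dist=standard_normal):
    """Area under the curve between mu - k*sigma and mu + k*sigma."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```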
Key Features of Normal Distribution
● Symmetry: The normal distribution is symmetric around its mean. This means the left side of the
distribution mirrors the right side.
● Mean, Median, and Mode: In a normal distribution, the mean, median, and mode are all equal and
located at the center of the distribution.
● Bell-shaped Curve: The curve is bell-shaped, indicating that most of the observations cluster
around the central peak, and the probabilities for values further away from the mean taper off equally
in both directions.
● Standard Deviation: The spread of the distribution is determined by the standard deviation. About
68% of the data falls within one standard deviation of the mean, 95% within two standard deviations,
and 99.7% within three standard deviations.

Normal Distribution Examples

The normal distribution is used to model many types of data, including:
● Distribution of Height of People.
● Distribution of Errors in any Measurement.
● Distribution of Blood Pressure of any Patient, etc.
Binomial Distribution
● Binomial Distribution is a probability distribution used to model the number of successes in a fixed
number of independent trials, where each trial has only two possible outcomes: success or failure.
● This distribution is useful for calculating the probability of a specific number of successes in scenarios like
flipping coins, quality control, or survey predictions.
● Binomial Distribution is based on Bernoulli trials, where each trial has an independent and identical
chance of success. The probability distribution for a Bernoulli trial is called the Bernoulli Distribution.

Conditions for Binomial Distribution

The Binomial distribution can be used in scenarios where the following conditions are satisfied:
1. Fixed Number of Trials: There are a set number of trials or experiments (denoted by n), such as
flipping a coin 10 times.
2. Two Possible Outcomes: Each trial has only two possible outcomes, often labeled as “success” and
“failure.” For example, getting heads or tails in a coin flip.
3. Independent Trials: The outcome of each trial is independent of the others, meaning the result of
one trial does not affect the result of another.
4. Constant Probability: The probability of success (denoted by p) remains the same for each trial. For
example, if you’re flipping a fair coin, the probability of getting heads is always 0.5.
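Binomial Distribution Calculation
The binomial distribution gives the probability of exactly r successes in n trials:

$$P(X = r) = \binom{n}{r} p^r q^{n-r}$$

where n is the number of trials, p is the probability of success in each trial, and q = 1 - p is the
probability of failure.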
To calculate the probability using binomial distribution we need to follow the following steps:
● Step 1: Find the number of trials and assign it as ‘n’
● Step 2: Find the probability of success in each trial and assign it as ‘p’
● Step 3: Find the probability of failure and assign it as q where q = 1-p
● Step 4: Find the random variable X = r for which we have to calculate the binomial
distribution
● Step 5: Calculate the probability of Binomial Distribution for X = r using the Binomial
Distribution Formula.
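
The steps above can be sketched in Python using the standard binomial formula P(X = r) = C(n, r) · p^r · q^(n-r). The function name `binomial_pmf` and the coin-flip numbers are hypothetical, chosen to match the coin example used earlier.

```python
from math import comb


def binomial_pmf(n, p, r):
    """Probability of exactly r successes in n independent trials."""
    q = 1 - p                                  # Step 3: probability of failure
    return comb(n, r) * p ** r * q ** (n - r)  # Step 5: binomial formula


# Steps 1, 2 and 4 with hypothetical values: 10 flips of a fair coin, exactly 5 heads.
n, p, r = 10, 0.5, 5
probability = binomial_pmf(n, p, r)
print(f"P(X = {r}) = {probability}")

# The probabilities over all possible r add up to 1.
total = sum(binomial_pmf(n, p, k) for k in range(n + 1))
print("sum over all r:", total)
```

`math.comb` computes the number of ways to choose r successes out of n trials, which is the C(n, r) term of the formula.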
