Probability MC3020
Mrs. S. Vijendiran
Faculty of Engineering,
University of Jaffna.
Lecture -01
Data is a collection of facts, such as values or
measurements.
It can be numbers, words, measurements, observations or even
just descriptions of things.
Qualitative Vs Quantitative
Data can be qualitative or quantitative.
Qualitative data is descriptive information
(it describes something)
Quantitative data is numerical information (numbers).
Nominal:
Example: Blood group, Nationality
Ordinal:
Example: Hotel rating, questionnaire responses (strongly disagree, disagree…)
Discrete:
Example: Family size, number of cars
Continuous:
Example: height, weight
Discrete data can only take certain values (like whole numbers)
Continuous data can take any value (within a range)
Put simply:
Discrete data is counted,
Continuous data is measured
Nominal data: a type of categorical data in which objects fall into unordered categories.
Example: Type of bicycle (mountain bike, road bike, BMX)
Example: Smoking status (smoker, non-smoker)
Ordinal data: a type of categorical data in which order is important.
Example: Class of degree (1st class, 2nd upper, 2nd lower, pass, fail)
Example: Opinion of students about stats classes (very unhappy, unhappy, neutral, happy)
Frequency distributions: organized representations of data that help to summarize and simplify.
Types of frequency distributions:
Simple frequency distributions
Grouped frequency distributions
Raw data:
2 5 8 7 2 2
6 8 5 2 5 7
4 5 6 2 8 6

Simple frequency distribution:
Score   f
8       3
7       2
6       3
5       4
4       1
2       5
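The tally behind a simple frequency distribution can be reproduced mechanically. The following is a minimal Python sketch, assuming the raw data are the 18 scores listed in the code; the variable names are illustrative only.

```python
from collections import Counter

# The 18 raw scores from the example.
raw_scores = [2, 5, 8, 7, 2, 2,
              6, 8, 5, 2, 5, 7,
              4, 5, 6, 2, 8, 6]

# Counter maps each distinct score to its frequency f.
frequency = Counter(raw_scores)

# Print the table from the highest score to the lowest.
print("Score  f")
for score in sorted(frequency, reverse=True):
    print(f"{score:>5}  {frequency[score]}")
```

The frequencies add up to the number of observations, which is a quick check that no score was missed.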
Class intervals
Often data has natural classes
Example: grades in a course
Intervals must not be too large or too small
Class interval   f
90-100           1
80-89            5
70-79            2
60-69            7
<60              7
Graphical presentation of data
Categorical data: bar chart, pie chart
Numerical data: histogram, frequency curve, stem and leaf plot, line chart, box plot
Measure of Location
• Mean
• Median
• Mode
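Mean
Definition:
If $x_1, x_2, x_3, \dots, x_n$ are the values of the variable $X$, then the arithmetic mean of the set of observations is defined by
$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}.$$
If the value $x_i$ occurs $f_i$ times ($i = 1, \dots, k$, with $k \le n$), then
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}.$$
Note: The total number of observations is $n = \sum_{i=1}^{k} f_i$.
For grouped data:
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i m_i}{\sum_{i=1}^{k} f_i},$$
where $m_i$ is the midpoint of the $i$-th class interval.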
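As an illustration of the frequency form of the mean, the sketch below applies it to the score/frequency pairs from the simple frequency distribution example. The function name `mean_from_frequencies` is hypothetical and chosen only for this example.

```python
# Score/frequency pairs (x_i, f_i) from the simple frequency distribution example.
freq_table = {8: 3, 7: 2, 6: 3, 5: 4, 4: 1, 2: 5}


def mean_from_frequencies(table):
    """Return sum(f_i * x_i) / sum(f_i) for a {value: frequency} mapping."""
    n = sum(table.values())                          # total number of observations
    weighted_sum = sum(x * f for x, f in table.items())
    return weighted_sum / n


mean_score = mean_from_frequencies(freq_table)
print(mean_score)  # 90 / 18 = 5.0
```

For grouped data the same function can be used if the keys are the class midpoints $m_i$ instead of the individual values.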
Median
• The middle value when a variable’s values are ranked in order; the
point that divides a distribution into two equal halves.
• Percentiles and quartiles: scores which divide a distribution into specific proportions
• Percentiles = hundredths
P1, P2, P3, … P97, P98, P99
• Median is the 50th percentile.
• Quartiles = quarters
• 1st quartile 𝑄1 is the 25th percentile
• 2nd quartile 𝑄2 is the 50th percentile (median)
• 3rd quartile 𝑄3 is the 75th percentile
• To locate the $p$th percentile we use
$$L_p = (n + 1)\frac{p}{100}$$
where $L_p$ is the location of the percentile, $n$ is the number of observations, and $p$ is the percentile.
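The location formula can be tried on a small data set. The sketch below uses the five values 5, 9, 16, 17, 18 that appear in the mean absolute deviation example. When $L_p$ is not a whole number, the code interpolates linearly between the two neighbouring ordered values; that interpolation rule is an assumption of this sketch, and both function names are hypothetical.

```python
def percentile_location(n, p):
    """Location L_p = (n + 1) * p / 100 in the ordered data (1-based)."""
    return (n + 1) * p / 100


def percentile(data, p):
    """Value at location L_p.

    Linear interpolation between neighbouring ordered values is an
    assumption of this sketch.
    """
    ordered = sorted(data)
    n = len(ordered)
    loc = percentile_location(n, p)
    loc = min(max(loc, 1), n)        # clamp the location to the range 1..n
    lower = int(loc)                 # 1-based position at or below L_p
    fraction = loc - lower
    if lower == n:
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


values = [5, 9, 16, 17, 18]
print(percentile_location(len(values), 50))  # 3.0, the third ordered value
print(percentile(values, 50))                # median
print(percentile(values, 25), percentile(values, 75))  # Q1 and Q3
```

With $n = 5$ the median location is $L_{50} = 6 \times 0.5 = 3$, so the median is the third ordered value, 16.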
For grouped data
Mode
• The value that occurs most frequently.
Note:
• Mode is often uninformative for quantitative data with many
values.
• Modal class is the most frequently occurring class in a data set.
• Appropriate measures of location for each type of data:
• Numerical: mean, median, mode
• Ordinal scale: median, mode
• Nominal scale: mode
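The mode is the one measure of location that works for every data type listed above. A minimal sketch using Python's standard `statistics.multimode` follows; the score list is the raw data from the frequency distribution example, while the smoking-status list is hypothetical and included only to show a nominal variable.

```python
from statistics import multimode

# Raw scores from the frequency distribution example.
raw_scores = [2, 5, 8, 7, 2, 2,
              6, 8, 5, 2, 5, 7,
              4, 5, 6, 2, 8, 6]

# Hypothetical nominal data, for illustration only.
smoking_status = ["smoker", "non-smoker", "non-smoker"]

# multimode returns a list, so ties (several modes) are reported too.
score_modes = multimode(raw_scores)
status_modes = multimode(smoking_status)
print(score_modes, status_modes)
```

Returning a list rather than a single value avoids hiding the case where two or more values share the highest frequency.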
Measures of Dispersion / Measures of Variability
a) Range: maximum-minimum
b) Variance and standard deviation (S.D.)
These are applicable to quantitative data and are the most commonly used measures of variability.
Simple data:
Variance and Standard deviation(S.D)
Grouped data
Population variance
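As a numerical illustration, the sketch below computes the variance and standard deviation with Python's standard `statistics` module. It assumes the usual conventions: the sample variance divides the sum of squared deviations from the mean by $n - 1$, and the population variance divides it by $n$. The data are the five values from the mean absolute deviation example.

```python
import statistics

values = [5, 9, 16, 17, 18]   # mean is 13, squared deviations sum to 130

# Sample versions divide by n - 1.
sample_variance = statistics.variance(values)
sample_sd = statistics.stdev(values)

# Population versions divide by n.
population_variance = statistics.pvariance(values)
population_sd = statistics.pstdev(values)

print(sample_variance, sample_sd)
print(population_variance, population_sd)
```

Here the sample variance is $130 / 4 = 32.5$ and the population variance is $130 / 5 = 26$; the standard deviation is the square root of the variance in each case.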
c) Coefficient of variation (CV)
• CV compares the variability of two data sets.
• It is also applicable to numerical data sets.
• It is a measure of relative (proportionate) variation.
• It is important to use when the data sets are of very different magnitudes.
$$\text{CV} = \frac{\text{standard deviation}}{\text{mean}} \times 100$$
Thus CV expresses the S.D. as a percentage of the mean.
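The point about magnitudes can be checked numerically. In the sketch below, the function name `coefficient_of_variation` is hypothetical, the use of the sample standard deviation is an assumption, and the second data set is a hypothetical copy of the first with every value multiplied by 1000.

```python
import statistics


def coefficient_of_variation(data):
    """CV = (standard deviation / mean) * 100, using the sample S.D. (an assumption)."""
    return statistics.stdev(data) / statistics.mean(data) * 100


small = [5, 9, 16, 17, 18]
large = [x * 1000 for x in small]   # hypothetical rescaled copy of the same data

cv_small = coefficient_of_variation(small)
cv_large = coefficient_of_variation(large)
print(round(cv_small, 2), round(cv_large, 2))
```

The standard deviations of the two sets differ by a factor of 1000, but the CV is the same for both, which is why it is suited to comparing data sets of very different magnitudes.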
d) IQR (interquartile range) and interquartile deviation (IQD)
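In symbols, with $Q_1$ and $Q_3$ the first and third quartiles, the interquartile range measures the spread of the middle half of the data. The second expression assumes that the interquartile deviation means half the interquartile range (the quantity often called the quartile deviation or semi-interquartile range).

```latex
\mathrm{IQR} = Q_3 - Q_1, \qquad \mathrm{IQD} = \frac{Q_3 - Q_1}{2}
```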
e) Mean Absolute Deviation
• Average of the absolute deviations from the mean
$x$      $x - \bar{x}$    $|x - \bar{x}|$
5        -8               +8
9        -4               +4
16       +3               +3
17       +4               +4
18       +5               +5
Total    0                24

$$\text{M.A.D.} = \frac{\sum |x - \bar{x}|}{n} = \frac{24}{5} = 4.8$$
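The worked example can be verified with a few lines of Python. This is a minimal sketch; the function name `mean_absolute_deviation` is hypothetical.

```python
def mean_absolute_deviation(data):
    """Average of the absolute deviations from the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum(abs(x - mean) for x in data) / n


values = [5, 9, 16, 17, 18]           # mean is 13
mad = mean_absolute_deviation(values)
print(mad)  # 24 / 5 = 4.8
```

The signed deviations always sum to zero, which is why the absolute values are taken before averaging.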
Measures of Skewness
• Skewness measures the lack of symmetry in a set of data about its
mean
Skewness