
STATISTICS


INTRODUCTION TO STATISTICS
What is Statistics?
▪ the study of the collection, analysis, interpretation, presentation, and organization of data
▪ a mathematical discipline for collecting and summarizing data
▪ a branch of applied mathematics
According to the Merriam-Webster dictionary,
➢ statistics is defined as “classified facts representing the conditions of a people in a state – especially the facts that can be stated in numbers or any other tabular or classified arrangement”.

According to statistician Sir Arthur Lyon Bowley,
➢ statistics is defined as “Numerical statements of facts in any department of inquiry placed in relation to each other”.
Statistics Examples
Some of the real-life examples of statistics are:

• Finding the mean of the marks obtained by the students in a class of 50. The average value here is a statistic of the marks obtained.

• Suppose you need to find how many people are employed in a city. Since the city has a population of 15 lakh (1.5 million), we survey a sample of 1000 people. The figure computed from that sample is the statistic.
Basics of Statistics
• The basics of statistics include measures of central tendency and measures of dispersion. The measures of central tendency are the mean, median and mode; the measures of dispersion include the variance and the standard deviation.

• The mean is the average of the observations. The median is the central value when the observations are arranged in order. The mode is the most frequent observation in a data set.

• Variance is a measure of how spread out a collection of data is. Standard deviation is a measure of the dispersion of the data from the mean. The square of the standard deviation is equal to the variance.
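As a rough illustration of the measures just described, the sketch below uses Python's standard `statistics` module on a small set of scores (the five homework scores that appear later in these notes). Treating the scores as a whole population, rather than a sample, is an assumption made here for simplicity.

```python
import statistics

# Five homework scores, treated here as a complete population (an assumption).
scores = [5, 7, 7, 6, 8]

mean = statistics.mean(scores)          # sum of values / number of values
median = statistics.median(scores)      # middle value of the ordered data
mode = statistics.mode(scores)          # most frequent value
variance = statistics.pvariance(scores) # population variance
std_dev = statistics.pstdev(scores)     # population standard deviation

print("mean:", mean, "median:", median, "mode:", mode)
print("variance:", round(variance, 3), "standard deviation:", round(std_dev, 3))
```

The last two values illustrate the relationship stated above: squaring the standard deviation gives the variance.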
Mathematical Statistics

Mathematical statistics is the application of mathematics to statistics, which was initially conceived as the science of the state: the collection and analysis of facts about a country, such as its economy, military, population, and so forth.

Mathematical techniques used for this purpose include mathematical analysis, linear algebra, stochastic analysis, differential equations and measure-theoretic probability theory.
Uses of Statistics
The main function of statistics is to enlarge our knowledge of complex phenomena. The following are some uses of statistics:

1. Presenting facts in a definite and precise form.
2. Data reduction.
3. Measuring the magnitude of variations in data.
4. Furnishing a technique of comparison.
5. Estimating unknown population characteristics.
6. Formulating and testing hypotheses.
7. Studying the relationship between two or more variables.
8. Forecasting future events.
Types of Statistics

Basically, there are two types of statistics.

▪ Descriptive Statistics
▪ Inferential Statistics

In descriptive statistics, the data or collection of data is described in summary form. Inferential statistics goes a step further and is used to interpret what the descriptive results mean. Both types are widely used.
Descriptive Statistics

In descriptive statistics the data are summarized and explained. The summary is drawn from a population or sample using measures such as the mean and standard deviation. Descriptive statistics is a way of organizing, representing, and explaining a set of data using charts, graphs, and summary measures. Histograms, pie charts, bar charts, and scatter plots are common ways to summarize data and present it in tables or graphs. Descriptive statistics are just that: descriptive. They do not need to be generalized beyond the data collected.
Inferential Statistics
With inferential statistics we attempt to interpret the meaning of descriptive statistics. We use inferential statistics to convey the meaning of the data after it has been collected, evaluated, and summarized. Probability theory is used in inferential statistics to determine whether patterns found in a study sample may be extrapolated to the wider population from which the sample was drawn. Inferential statistics are used to test hypotheses and study correlations between variables, and they can also be used to estimate population characteristics. In short, inferential statistics are used to derive conclusions and inferences from samples, i.e. to make sound generalizations.
Scope of Statistics

Statistics is used in many fields such as psychology, geology, sociology, weather forecasting, probability and much more. The goal of statistics is to gain understanding from data. Because it focuses on applications, it is considered a distinct mathematical science.
Stages in Statistical Investigation

❖ Collection of data
❖ Organization of data
❖ Presentation of the data
❖ Data summarization
❖ Statistical analysis
❖ Inference of data
Collection of data
- the process of measuring, gathering, and assembling the raw data upon which the statistical investigation is to be based.

❑ Data can be collected in a variety of ways; one of the most common methods is the survey. Surveys can also be carried out in different ways. Three of the most common are:
• Telephone survey
• Mailed questionnaire
• Personal interview

Organization of data
- summarization of data in some meaningful way, e.g. in table form.
Presentation of the data
- the process of re-organization, classification, compilation, and summarization of data to present it in a meaningful form.
Analysis of data
- the process of extracting relevant information from the summarized data, mainly through the use of elementary mathematical operations.
Inference of data
- the interpretation and further observation of the various statistical measures obtained from the analysis of the data, using methods by which conclusions are formed and inferences made. Statistical techniques based on probability theory are required.

Definitions of some terms
a. Statistical Population - it is the collection of all possible observations of a specified characteristic of interest (possessing a certain common property) that is under study.

b. Sample - it is a subset of the population, selected using some sampling technique in such a way that it represents the population.

c. Sampling - the process or method of selecting a sample from the population.

d. Sample size - the number of elements or observations to be included in the sample.

e. Census - complete enumeration or observation of the elements of the population, i.e. the collection of data from every element in a population.

f. Parameter - a characteristic or measure obtained from a population.

g. Statistic - a characteristic or measure obtained from a sample.

h. Variable - it is an item of interest that can take on many different values.
Types of Variables/Data
• Qualitative Variables - are nonnumeric variables and cannot be measured numerically.
Examples: gender, religious affiliation, and state of birth.

• Quantitative Variables - are numerical variables and can be measured.
Examples: balance in a checking account, number of children in a family.

Note that quantitative variables are either discrete (which can assume only certain values, usually with "gaps" between the values, such as the number of bedrooms in your house) or continuous (which can assume any value within a specific range, such as the air pressure in a tire).
Types of Quantitative variables/data

• Discrete data - takes particular, separate values. It can be counted.

• Continuous data - is not restricted to fixed values but can take any value within a range. It can be measured.
Scales of Measurement
Proper knowledge about the nature and type of data to be dealt with is essential in order to specify and apply the proper statistical method for its analysis and for drawing inferences.

A measurement scale is characterized by which of three properties the values assigned to the data possess: order, distance and fixed zero.
Order
- the property of order exists when an object that has more of the attribute than another object is given a bigger number by the rule system. This relationship must hold for all objects in the "real world".

Distance
- the property of distance is concerned with the relationship of differences between objects. If a measurement system possesses the property of distance, the unit of measurement means the same thing throughout the scale of numbers. That is, an inch is an inch, no matter where it falls: immediately ahead or a mile down the road.
Fixed Zero
A measurement system possesses a rational zero (fixed zero) if an object that has none of the attribute in question is assigned the number zero by the system of rules.
The object does not need to really exist in the "real world", as it is somewhat difficult to visualize a "man with no height".
The requirement for a rational zero is this: if objects with none of the attribute did exist, would they be given the value zero?
Defining $O_0$ as the object with none of the attribute in question, the definition of a rational zero becomes:

The property of FIXED ZERO exists if $M(O_0) = 0$.


SCALE TYPES
Measurement is the assignment of numbers to objects or events in a systematic fashion. Four levels of measurement scales are commonly distinguished: nominal, ordinal, interval, and ratio. Each possesses different properties of measurement systems.
Nominal Scales
- are measurement systems that possess none of the three properties stated above.
❑ Level of measurement which classifies data into mutually exclusive, all-inclusive categories in which no order or ranking can be imposed on the data.
❑ No arithmetic or relational (ordering) operations can be applied.

Examples:
• Gender (male or female)
• Marital status (married, single, widowed, divorced)
• Nationality
• Blood type
• Zip code
• Hair color
Ordinal Scales
- are measurement systems that possess the property of order, but not the property of distance. The property of fixed zero is not important if the property of distance is not satisfied.
▪ Level of measurement which classifies data into categories that can be ranked. Differences between the ranks are not meaningful.
▪ Arithmetic operations are not applicable, but relational operations are applicable.
▪ Ordering is the sole property of the ordinal scale.

Examples:
• Rank (1st place, 2nd place, 3rd place, etc.)
• Agreement level (always, oftentimes, sometimes, seldom and never)
• Educational level (primary, secondary and tertiary)
• Income (low, medium and high)
Interval Scales
- are measurement systems that possess the properties of Order and
distance, but not the property of fixed zero.
• Level of measurement which classifies data that can be ranked and
differences are meaningful. However, there is no meaningful zero, so
ratios are meaningless.
• All arithmetic operations except division and multiplication are
applicable.
• Relational operations are also possible.

Examples:
• IQ
• Temperature (ºC or ºF): 0ºC does not mean that there is no temperature; it is the freezing point of water.
Ratio Scales
- are measurement systems that possess all three properties: order, distance, and fixed zero. The added power of a fixed zero allows ratios of numbers to be meaningfully interpreted.
▪ Level of measurement which classifies data that can be ranked, differences are meaningful, and there is a true zero. True ratios exist between the different units of measure.
▪ All arithmetic and relational operations are applicable.
Examples:
• Time (minutes or hours)
• Age (years)
• Physical measures (height, weight, BP, etc.)
• Income
Methods of Data Collection and Presentation
There are two sources of data:

1. Primary Data
• Data measured or collected by the investigator or the user directly from the source.
Two activities are involved: planning and measuring.
➢ Planning:
• Identify the source and elements of the data.
• Decide whether to consider a sample or a census.
• If sampling is preferred, decide on the sample size, selection method, etc.
• Decide on the measurement procedure.
• Set up the necessary organizational structure.
➢ Measuring: there are different options for collecting primary data, including:
• Focus Group
• Telephone Interview
• Mail Questionnaires
• Door-to-Door Survey
• Mall Intercept
• New Product Registration
• Personal Interview
• Experiments
2. Secondary Data
▪ Data gathered or compiled from published and unpublished sources or files.
▪ When the source is secondary data, check:
• The type and objective of the situation.
• That the purpose for which the data were collected is compatible with the present problem.
• That the nature and classification of the data are appropriate to our problem.
• That there are no biases or misreporting in the published data.

Note: Data which are primary for one user may be secondary for another.
Sampling Techniques
In statistics, different sampling techniques are available for obtaining relevant results from a population. The two types of sampling methods are:

• Probability Sampling
• Non-probability Sampling
Probability Sampling Method

• utilizes some form of random selection.
• In this method, all eligible individuals in the population have a chance of being selected for the sample.
• This method is more time-consuming and expensive than the non-probability sampling method.
• The benefit of probability sampling is that it tends to produce a sample that is representative of the population.
Simple Random Sampling

In the simple random sampling technique, every item in the population has an equal chance of being selected for the sample. Since the selection of items depends entirely on chance, this method is known as the “Method of Chance Selection”. When the sample size is large and the items are chosen randomly, it is also known as “Representative Sampling”.

Systematic Sampling

In the systematic sampling method, the items are selected from the target population by choosing a random starting point and then selecting every further item after a fixed sampling interval. The interval is calculated by dividing the total population size by the desired sample size.
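The two selection schemes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `simple_random_sample` and `systematic_sample`, the numbered population of 100 units, and the `seed` parameter (added for reproducibility) are all hypothetical and do not come from the notes.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n items so that every item has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Pick a random start, then take every k-th item after it."""
    rng = random.Random(seed)
    k = len(population) // n      # sampling interval = population size / sample size
    start = rng.randrange(k)      # random starting point within the first interval
    return [population[start + i * k] for i in range(n)]

# Hypothetical population: units numbered 1 to 100.
population = list(range(1, 101))
print(simple_random_sample(population, 10, seed=1))
print(systematic_sample(population, 10, seed=1))
```

In the systematic sample, consecutive selected units are always exactly one interval (here 10) apart, whereas the simple random sample has no such pattern.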
Stratified Sampling

In the stratified sampling method, the total population is divided into smaller groups (strata) to complete the sampling process. The groups are formed based on a few characteristics of the population. After separating the population into these groups, the statistician randomly selects the sample from each group.

Clustered Sampling

In the clustered sampling method, clusters or groups of people are formed from the population. The groups have similar characteristics and an equal chance of being part of the sample. Simple random sampling is then used to select clusters from the population.
Non - Probability Sampling Method
- is a technique in which the researcher selects the sample based on
subjective judgment rather than the random selection. In this method, not
all the members of the population have a chance to participate in the
study.

Convenience Sampling

In the convenience sampling method, the samples are selected directly from the population because they are conveniently available to the researcher.
The samples are easy to select, but the researcher has not chosen a sample that represents the entire population.
Quota Sampling
In the quota sampling method, the researcher forms a sample of individuals chosen to represent the population based on specific traits or qualities. The researcher chooses sample subsets that yield a useful collection of data that can be generalized to the entire population.

Purposive or Judgmental Sampling


In purposive sampling, the samples are selected solely on the basis of the researcher’s knowledge. Because that knowledge is instrumental in creating the samples, there is a chance of obtaining highly accurate answers with a minimal margin of error. It is also known as judgmental sampling or authoritative sampling.

Snowball Sampling
Snowball sampling is also known as the chain-referral sampling technique. It is used when the sample units have traits that are difficult to find. Each identified member of the population is asked to find other sampling units, which also belong to the same target population.
Probability sampling vs Non-probability Sampling Methods
The below table shows a few differences between probability sampling
methods and non-probability sampling methods.
Representation of Data

There are different ways to represent data, such as through graphs, charts or tables. Common representations of statistical data are:

• Bar Graph
• Pie Chart
• Line Graph
• Pictograph
• Histogram
• Frequency Distribution
Bar Graph
represents grouped data with rectangular bars
with lengths proportional to the values that they
represent. The bars can be plotted vertically or
horizontally.
Pie Chart
A type of graph in which a circle is divided into Sectors. Each of
these sectors represents a proportion of the whole.
Line Chart
The line chart is represented by a series of data points
connected with a straight line.
The series of data points are called ‘markers.’
Pictograph
A pictograph shows data with the help of pictures: a pictorial symbol stands for a word or phrase. For example, apples, bananas and cherries can each be drawn a different number of times to represent the data.
Histogram
A diagram consisting of rectangles whose area is proportional to the frequency of a variable and whose width is equal to the class interval.
Frequency Distribution
The frequency of a data value is often represented by “f.” A
frequency table is constructed by arranging collected data
values in ascending order of magnitude with their
corresponding frequencies.
There are three basic types of frequency distributions
• Categorical frequency distribution
• Ungrouped frequency distribution
• Grouped frequency distribution

There are specific procedures for constructing each type.

1. Categorical Frequency Distribution
Used for data that can be placed in specific categories, such as nominal or ordinal data, e.g. marital status.
Example:
A social worker collected the following data on marital status for 25 persons (M = married, S = single, W = widowed, D = divorced).

M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
Solution:
Since the data are categorical, discrete classes can be used. There are four types of marital status: M, S, D, and W. These types will be used as the classes of the distribution. We follow the procedure below to construct the frequency distribution.

Step 1: Make a table as shown.

Column 1   Column 2   Column 3    Column 4
Class      Tally      Frequency   Percent
M
S
D
W
Step 2: Tally the data and place the result in column (2).

Step 3: Count the tally and place the result in column (3).

Step 4: Find the percentage of values in each class by using
$\% = \frac{f}{n} \cdot 100$,
where f = frequency of the class and n = total number of values.
Step 5: Find the totals for columns (3) and (4).

Column 1   Column 2    Column 3    Column 4
Class      Tally       Frequency   Percent
M          ///// /     6           24
S          ///// //    7           28
D          ///// //    7           28
W          /////       5           20
Total                  25          100
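The tallying and percentage steps above can also be checked mechanically. The following is a minimal sketch using `collections.Counter` on the 25 marital-status codes exactly as listed in the example; the variable names are illustrative only.

```python
from collections import Counter

# The 25 observations, row by row, as listed in the example.
rows = ["MSDWD", "SSMMM", "WDSMM", "WDDSS", "SWWDD"]
data = [status for row in rows for status in row]

n = len(data)            # total number of values
counts = Counter(data)   # frequency f of each class

# Percent for each class: f / n * 100
table = {cls: (counts[cls], 100 * counts[cls] / n) for cls in "MSDW"}

print("Class  Frequency  Percent")
for cls, (f, pct) in table.items():
    print(f"{cls:<6} {f:<10} {pct:.0f}")
print(f"Total  {n:<10} {sum(p for _, p in table.values()):.0f}")
```

Counting by program is a useful cross-check on a hand tally, since it is easy to miscount by one when the data are scanned by eye.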
2. Ungrouped frequency Distribution
- Is a table of all the potential raw score values that could possibly occur in the data, along with the number of times each actually occurred.
- Is often constructed for a small set of data on a discrete variable.

Constructing ungrouped frequency distribution:

• First find the smallest and largest raw score in the collected
data.
• Arrange the data in order of magnitude and count the
frequency.
• To facilitate counting one may include a column of tallies.
Example:

The following data represent the mark of 20 students.

80 76 90 85 80
70 60 62 70 85
65 60 63 74 75
76 70 70 80 85

Construct a frequency distribution, which is ungrouped.


Solution:

Step 1: Find the range, Range=Max-Min=90-60=30.


Step 2: Make a table as shown.
Step 3: Tally the data.
Step 4: Compute the frequency.

Mark   Tally   Frequency
60     //      2
62     /       1
63     /       1
65     /       1
70     ////    4
74     /       1
75     /       1
76     //      2
80     ///     3
85     ///     3
90     /       1

Each individual value is presented separately, which is why it is named an ungrouped frequency distribution.
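As a cross-check on the hand tally, the ungrouped distribution can be built directly from the 20 marks listed in the example. This is a small sketch using `collections.Counter`; the names `marks` and `freq` are illustrative.

```python
from collections import Counter

# Marks of the 20 students, as listed in the example.
marks = [80, 76, 90, 85, 80,
         70, 60, 62, 70, 85,
         65, 60, 63, 74, 75,
         76, 70, 70, 80, 85]

# Range = Max - Min
data_range = max(marks) - min(marks)

# Arrange the distinct values in order of magnitude with their frequencies.
freq = dict(sorted(Counter(marks).items()))

print("Range:", data_range)
print("Mark  Frequency")
for mark, f in freq.items():
    print(f"{mark:<5} {f}")
```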
3. Grouped frequency Distribution
- when the range of the data is large, the data must be grouped into classes that are more than one unit in width.

Definitions:

• Grouped Frequency Distribution - a frequency distribution in which several numbers are grouped in one class.
• Class limits - separate one class in a grouped frequency distribution from another. The limits can actually appear in the data, and there are gaps between the upper limit of one class and the lower limit of the next.
• Units of measurement (U) - the distance between two possible consecutive measures. It is usually taken as 1, 0.1, 0.01, 0.001, ...
• Class boundaries - separate one class in a grouped frequency distribution from another. The boundaries have one more decimal place than the raw data and therefore do not appear in the data. There is no gap between the upper boundary of one class and the lower boundary of the next class. The lower class boundary is found by subtracting U/2 from the corresponding lower class limit, and the upper class boundary is found by adding U/2 to the corresponding upper class limit.
• Class width - the difference between the upper and lower class
boundaries of any class. It is also the difference between the lower limits
of any two consecutive classes or the difference between any two
consecutive class marks.
• Class mark (Mid points) - it is the average of the lower and upper class
limits or the average of upper and lower class boundary.
• Cumulative frequency - is the number of observations less than or equal to (or more than or equal to) a specific value.
• Cumulative frequency above - it is the total frequency of all values greater than or equal to the lower class boundary of a given class.
• Cumulative frequency below - it is the total frequency of all values less than or equal to the upper class boundary of a given class.
• Cumulative Frequency Distribution (CFD) - it is the tabular
arrangement of class interval together with their
corresponding cumulative frequencies. It can be more than
or less than type, depending on the type of cumulative
frequency used.
• Relative frequency (rf) - it is the frequency divided by the
total frequency.
• Relative cumulative frequency (rcf) - it is the cumulative
frequency divided by the total frequency
Steps for constructing Grouped frequency Distribution
1. Find the largest and smallest values
2. Compute the Range(R) = Maximum – Minimum
3. Select the number of classes desired, usually between 5 and 20, or use Sturges' rule $k = 1 + 3.32 \log_{10} n$, where k is the number of classes desired and n is the total number of observations.
4. Find the class width by dividing the range by the number of classes and rounding up, not off: $w = \frac{R}{k}$.
5. Pick a suitable starting point less than or equal to the minimum value. The
starting point is called the lower limit of the first class. Continue to add the
class width to this lower limit to get the rest of the lower limits.
6. To find the upper limit of the first class, subtract U from the lower
limit of the second class. Then continue to add the class width to this
upper limit to find the rest of the upper limits.
7. Find the boundaries by subtracting U/2 units from the lower limits and adding U/2 units to the upper limits. The boundaries are also halfway between the upper limit of one class and the lower limit of the next class. Note: it may not be necessary to find the boundaries.
8. Tally the data.
9. Find the frequencies.
10. Find the cumulative frequencies. Depending on what you're trying
to accomplish, it may not be necessary to find the cumulative
frequencies.
11. If necessary, find the relative frequencies and/or relative
cumulative frequencies.
Example:
Construct a frequency distribution for the following data.
11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
Solutions:

Step 1: Find the highest and the lowest values: H = 39, L = 6.
Step 2: Find the range: R = H – L = 39 – 6 = 33.
Step 3: Select the number of classes desired using Sturges' formula:
$k = 1 + 3.32 \log_{10} n = 1 + 3.32 \log_{10}(20) \approx 5.32$, so k = 6 (rounding up).
Step 4: Find the class width:
$w = \frac{R}{k} = \frac{33}{6} = 5.5$, so w = 6 (rounding up).

Step 5: Select the starting point; let it be the minimum observation.
6, 12, 18, 24, 30, 36 are the lower class limits.

Step 6: Find the upper class limits; e.g. the first upper class limit = 12 – U = 12 – 1 = 11.
11, 17, 23, 29, 35, 41 are the upper class limits.

So combining step 5 and step 6, one can construct the following classes.
Class limits
6 – 11
12 – 17
18 – 23
24 – 29
30 – 35
36 – 41

Step 7: Find the class boundaries.
E.g. for class 1:
Lower class boundary = 6 – 0.5 = 5.5
Upper class boundary = 11 + 0.5 = 11.5
Then continue adding w to both boundaries to obtain the remaining boundaries. By doing so one obtains the following classes.
Class boundary
5.5 – 11.5
11.5 – 17.5
17.5 – 23.5
23.5 – 29.5
29.5 – 35.5
35.5 – 41.5

Step 8: tally the data.


Step 9: Write the numeric values for the tallies in the frequency column.
Step 10: Find cumulative frequency
Step 11: Find relative frequency or/and relative cumulative frequency.
The complete frequency distribution follows:
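The eleven steps can be traced in code. The sketch below follows them for the 20 observations of this example, assuming a unit of measurement U = 1 and base-10 logarithms in Sturges' rule; the tuple layout of `rows` is an illustrative choice, not part of the notes.

```python
import math

data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
U = 1                                     # unit of measurement (assumed)

n = len(data)
R = max(data) - min(data)                 # Step 2: range
k = math.ceil(1 + 3.32 * math.log10(n))   # Step 3: Sturges' rule, rounded up
w = math.ceil(R / k)                      # Step 4: class width, rounded up

rows = []
cum = 0
for j in range(k):
    lower = min(data) + j * w             # Step 5: lower class limits
    upper = lower + w - U                 # Step 6: upper class limits
    f = sum(lower <= x <= upper for x in data)   # Steps 8-9: frequency
    cum += f                              # Step 10: cumulative frequency (less than type)
    # Step 7 boundaries: limit -/+ U/2; Step 11: relative frequency f / n
    rows.append((lower, upper, lower - U / 2, upper + U / 2, f, cum, f / n))

print("Limits    Boundaries     f   cf  rf")
for lo, hi, lb, ub, f, cf, rf in rows:
    print(f"{lo:>2} - {hi:<2}   {lb:>4} - {ub:<4}   {f:<3} {cf:<3} {rf:.2f}")
```

Each row holds the class limits, class boundaries, frequency, cumulative frequency and relative frequency of one class, which are the columns of the complete distribution.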
MEASURES OF CENTRAL TENDENCY
Introduction
➢ When we want to make comparisons between groups of numbers, it is good to have a single value that is considered a good representative of each group. This single value is called the average of the group. Averages are also called measures of central tendency.
➢ An average which is representative is called a typical average, and an average which is not representative and has only a theoretical value is called a descriptive average. A typical average should possess the following properties:
• It should be rigidly defined.
• It should be based on all observations under investigation.
• It should be affected as little as possible by extreme observations.
• It should be capable of further algebraic treatment.
• It should be affected as little as possible by fluctuations of sampling.
• It should be easy to calculate and simple to understand.
The Summation Notation
• Let $x_1, x_2, x_3, \ldots, x_n$ be a number of measurements, where n is the total number of observations and $x_i$ is the i-th observation.
• Very often in statistics an algebraic expression of the form $x_1 + x_2 + x_3 + \cdots + x_n$ is used in a formula to compute a statistic. It is tedious to write an expression like this very often, so mathematicians have developed a shorthand notation to represent a sum of scores, called the summation notation.

The symbol $\sum_{i=1}^{n} x_i$ is a mathematical shorthand for $x_1 + x_2 + x_3 + \cdots + x_n$.

The expression is read, "the sum of x sub i from i equals 1 to n."
It means "add up all the numbers."
Example:
Suppose the following were scores made on the first homework assignment for five students in the class: 5, 7, 7, 6, and 8. In this example set of five numbers, where n = 5, the summation could be written:

$\sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5 = 5 + 7 + 7 + 6 + 8 = 33$
PROPERTIES OF SUMMATION
Example:
Considering the following data
determine
X   Y
5   6
7   7
7   8
6   7
8   8
Measures of central tendency help you find the middle, or the
average, of a dataset.
The 3 most common measures of central tendency are the mode,
median, and mean.

• Mean
• Median
• Mode

In addition to central tendency, the variability and distribution of your dataset are important to understand when performing descriptive statistics.
Mean
- is the sum of all values divided by the total number of values. It’s the
most commonly used measure of central tendency because all values
are used in the calculation.

Population versus sample mean

A dataset contains values from a sample or a population.


• A population is the entire group that you are interested in researching,
while a sample is only a subset of that population.
• While data from a sample can help you make estimates about a
population, only full population data can give you the complete
picture.

In statistics, the notation of a sample mean and a population mean and their
formulas are different. But the procedures for calculating the population and
sample means are the same.
Outlier effect on the mean
• Outliers can significantly increase or decrease the mean when they are
included in the calculation. Since all values are used to calculate the
mean, it can be affected by extreme outliers.
• An outlier is a value that differs significantly from the others in a dataset.
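A short numerical illustration of the outlier effect may help. The sketch below takes the five homework scores used earlier in these notes and appends one extreme value; the value 60 is hypothetical and chosen only to make the effect visible. The median is shown alongside for comparison.

```python
import statistics

scores = [5, 7, 7, 6, 8]          # homework scores from the summation example
with_outlier = scores + [60]      # 60 is a hypothetical extreme value

mean_before = statistics.mean(scores)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(scores)
median_after = statistics.median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

A single extreme value more than doubles the mean here, while the median does not move, because the mean uses every value in its calculation.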
Mean of Grouped Data Formula

• The mean is defined as the sum of the observations divided by the total number of observations.
• There are two different formulas: one for the mean of ungrouped data and one for the mean of grouped data.
• The formula for the mean of grouped data is:
$\bar{x} = \frac{\sum f_i x_i}{N}$

Where,
$\bar{x}$ = the mean value of the set of given data
$f_i$ = frequency of the i-th value or class
$x_i$ = the i-th value, or the class mark of the i-th class
N = sum of frequencies

Hence, the average of all the data points is termed the mean.


Direct Method
- is the simplest method to find the mean of grouped data.
- If the values of the observations are $x_1, x_2, x_3, \ldots, x_n$ and their corresponding frequencies are $f_1, f_2, f_3, \ldots, f_n$, then the mean of the data is given by

$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_n x_n}{f_1 + f_2 + f_3 + \cdots + f_n} = \frac{\sum x_i f_i}{\sum f_i}$

Example: Find the mean of the following data.


Class Interval (x)    1 - 10   11 - 20   21 - 30   31 - 40   41 - 50
Frequencies ($f_i$)   9        13        8         15        10
Solution:
• The first step is to create a table with the class marks (midpoints) and the product of each frequency and its class mark.
• The class mark of a class is the average of its lower and upper class limits.

Class Interval (x)   Frequencies ($f_i$)   Class Mark ($x_i$)   $x_i f_i$
1 - 10               9                     5.5                  49.5
11 - 20              13                    15.5                 201.5
21 - 30              8                     25.5                 204
31 - 40              15                    35.5                 532.5
41 - 50              10                    45.5                 455
                     $\sum f_i$ = 55                            $\sum x_i f_i$ = 1442.5

$\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{1442.5}{55} \approx 26.23$
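The direct method can be sketched in a few lines of Python. The code uses the class intervals and frequencies exactly as given in the problem statement (9, 13, 8, 15, 10); the variable names are illustrative.

```python
# Class limits and frequencies as given in the problem statement.
classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
freqs = [9, 13, 8, 15, 10]

# Class mark = (lower class limit + upper class limit) / 2
marks = [(lo + hi) / 2 for lo, hi in classes]

total_f = sum(freqs)                                  # sum of f_i
total_fx = sum(f * x for f, x in zip(freqs, marks))   # sum of x_i * f_i
mean = total_fx / total_f

print("sum f_i     =", total_f)
print("sum x_i f_i =", total_fx)
print("mean        =", round(mean, 2))
```

Recomputing the two column totals from the stated frequencies is a quick way to confirm that the table and the final quotient agree with each other.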
Assumed Mean Method
• A technique used to calculate the arithmetic mean for grouped data.
• In this method, an assumed mean (a value within the range of the data) is chosen, and the deviations of the data points from this assumed mean are determined.
• Using these deviations, the arithmetic mean is then computed, providing an estimate of the central tendency of the grouped data.

Assumed Mean Formula of the Arithmetic Mean:

$AM = a + \frac{\sum f_i d_i}{\sum f_i}$

Where,
a = assumed mean
$f_i$ = frequency of the i-th class
$d_i = x_i - a$ = deviation of the i-th class
$\sum f_i = n$ = total number of observations
$x_i$ = class mark = (upper class limit + lower class limit)/2
Example:
The following table gives information about the marks obtained by 110 students in an examination.

Class       1 - 10   11 - 20   21 - 30   31 - 40   41 - 50
Frequency   12       28        32        25        13

Find the mean marks of the students using the assumed mean method.
Solution:

Class (c)   Frequency ($f_i$)   Class Mark ($x_i$)   $d_i = x_i - a$   $f_i d_i$
1 - 10      12                  5.5                  -20               -240
11 - 20     28                  15.5                 -10               -280
21 - 30     32                  25.5 = a             0                 0
31 - 40     25                  35.5                 10                250
41 - 50     13                  45.5                 20                260
            $\sum f_i$ = 110                                           $\sum f_i d_i$ = -10

$\bar{x} = a + \frac{\sum f_i d_i}{\sum f_i} = 25.5 + \frac{-10}{110} = 25.5 + (-0.091) = 25.409$
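The same computation can be sketched in Python, with the direct method included as a cross-check. The class marks, frequencies and the assumed mean a = 25.5 are taken from the solution above; the variable names are illustrative.

```python
marks = [5.5, 15.5, 25.5, 35.5, 45.5]   # class marks x_i
freqs = [12, 28, 32, 25, 13]            # frequencies f_i
a = 25.5                                # assumed mean (class mark of the middle class)

deviations = [x - a for x in marks]                   # d_i = x_i - a
sum_fd = sum(f * d for f, d in zip(freqs, deviations))
n = sum(freqs)

assumed_mean_result = a + sum_fd / n

# Cross-check with the direct method: sum(f_i * x_i) / sum(f_i)
direct_result = sum(f * x for f, x in zip(freqs, marks)) / n

print("sum f_i d_i =", sum_fd)
print("assumed mean method:", round(assumed_mean_result, 3))
print("direct method:      ", round(direct_result, 3))
```

Both methods give the same mean; the assumed mean method simply keeps the intermediate numbers smaller, which matters mostly for hand calculation.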
Mean Deviation
• is a statistical measure of the average absolute deviation of the data values from the mean of the data set.
• The mean deviation of the data values can be calculated using the procedure below.

Step 1: Find the mean of the given data values.
Step 2: Subtract the mean from each of the data values and take the absolute value of each difference (that is, ignore the minus sign).
Step 3: Find the mean of the values obtained in Step 2.

$\text{Mean Deviation} = \frac{\sum |X - \mu|}{N}$

Where,
Σ represents the addition of values
X represents each value in the data set
µ represents the mean of the data set
N represents the number of data values
Example: Determine the mean deviation for the data values 5, 3, 7, 8, 4, 9.
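The three steps can be followed for these six values with a short Python sketch; the function name `mean_deviation` is illustrative.

```python
def mean_deviation(values):
    """Average absolute deviation of the values from their mean."""
    mu = sum(values) / len(values)                 # Step 1: mean
    abs_devs = [abs(x - mu) for x in values]       # Step 2: |X - mu|
    return sum(abs_devs) / len(values)             # Step 3: mean of the deviations

data = [5, 3, 7, 8, 4, 9]
mu = sum(data) / len(data)
md = mean_deviation(data)

print("mean =", mu)              # 36 / 6 = 6.0
print("mean deviation =", md)    # (1 + 3 + 1 + 2 + 2 + 3) / 6 = 2.0
```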
Median
- the median of a dataset is the value that’s exactly in the middle when it is
ordered from low to high.
Mode
• is the most frequently occurring value in the dataset. It’s possible to have no
mode, one mode, or more than one mode.
• To find the mode, sort your dataset numerically or categorically and select the
response that occurs most frequently.
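As a brief sketch, the median and mode can be found with Python's standard `statistics` module. The data reused here are the 20 student marks from the ungrouped frequency distribution example earlier in these notes; `multimode` is used because a dataset may have more than one mode.

```python
import statistics

# Marks of the 20 students from the ungrouped frequency distribution example.
marks = [80, 76, 90, 85, 80,
         70, 60, 62, 70, 85,
         65, 60, 63, 74, 75,
         76, 70, 70, 80, 85]

ordered = sorted(marks)                  # order from low to high
median_mark = statistics.median(marks)   # average of the two middle values when n is even
modes = statistics.multimode(marks)      # list of all most frequent values

print("ordered:", ordered)
print("median:", median_mark)
print("mode(s):", modes)
```

With an even number of observations there is no single middle value, so the median is taken as the average of the 10th and 11th ordered marks.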
Quantiles
• When a distribution is arranged in order of magnitude of its items, the median is the value of the middle term.
• Other measures that likewise depend on their position in the distribution, namely quartiles, deciles, and percentiles, are collectively called quantiles.
Quartiles
• are measures that divide the frequency distribution into four equal parts.
• The values of the variable corresponding to these divisions are denoted $Q_1$, $Q_2$, and $Q_3$, often called the first, the second and the third quartile respectively.
• $Q_1$ is a value which has 25% of the items less than or equal to it.
• Similarly, $Q_2$ has 50% of the items with values less than or equal to it, and $Q_3$ has 75% of the items with values less than or equal to it.
• To find $Q_i$ (i = 1, 2, 3) we count $\frac{iN}{4}$ of the observations, beginning from the lowest class.
• For grouped data we have the following formula:

$Q_i = L_{Q_i} + \left( \frac{\frac{iN}{4} - f_c}{f_Q} \right) \cdot w$

Where:
$L_{Q_i}$ = lower class boundary of the quartile class
$f_c$ = cumulative frequency (less than type) preceding the quartile class
$f_Q$ = frequency of the quartile class
$N$ = total number of observations
$w$ = class size/width

Remark:
The quartile class (the class containing $Q_i$) is the class with the smallest cumulative frequency (less than type) greater than or equal to $\frac{iN}{4}$.
Deciles
• are measures that divide the frequency distribution into ten equal parts.
• The values of the variable corresponding to these divisions are denoted $D_1, D_2, \ldots, D_9$, often called the first, the second, ..., the ninth decile respectively.
• To find $D_i$ (i = 1, 2, ..., 9) we count $\frac{iN}{10}$ of the observations, beginning from the lowest class.
• For grouped data we have the following formula:

$D_i = L_{D_i} + \left( \frac{\frac{iN}{10} - f_c}{f_D} \right) \cdot w$

Where:
$L_{D_i}$ = lower class boundary of the decile class
$f_c$ = cumulative frequency (less than type) preceding the decile class
$f_D$ = frequency of the decile class
$N$ = total number of observations
$w$ = class size/width
Percentiles
• are measures that divide the frequency distribution into one hundred equal parts.
• The values of the variable corresponding to these divisions are denoted $P_1, P_2, \ldots, P_{99}$, often called the first, the second, ..., the ninety-ninth percentile respectively.
• To find $P_i$ (i = 1, 2, ..., 99) we count $\frac{iN}{100}$ of the observations, beginning from the lowest class.
• For grouped data we have the following formula:

$P_i = L_{P_i} + \left( \frac{\frac{iN}{100} - f_c}{f_P} \right) \cdot w$

Where:
$L_{P_i}$ = lower class boundary of the percentile class
$f_c$ = cumulative frequency (less than type) preceding the percentile class
$f_P$ = frequency of the percentile class
$N$ = total number of observations
$w$ = class size/width
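Quartiles, deciles and percentiles for grouped data share one formula and differ only in the divisor (4, 10 or 100), so a single helper can cover all three. The sketch below is an illustration only: the function name `grouped_quantile` is hypothetical, and the data reused are the marks of 110 students from the assumed mean example, with class boundaries obtained by taking U = 1.

```python
def grouped_quantile(i, parts, boundaries, freqs):
    """i-th quantile when the distribution is split into `parts` equal parts.

    boundaries: list of (lower class boundary, upper class boundary)
    freqs:      list of class frequencies
    """
    N = sum(freqs)
    target = i * N / parts            # iN/4, iN/10 or iN/100
    cum = 0                           # cumulative frequency preceding the class
    for (lower, upper), f in zip(boundaries, freqs):
        # Quantile class: first class whose cumulative frequency reaches the target.
        if cum + f >= target:
            return lower + (target - cum) / f * (upper - lower)
        cum += f
    raise ValueError("i must satisfy 1 <= i <= parts")

# Classes 1-10, 11-20, ..., 41-50 expressed as class boundaries (U = 1 assumed).
boundaries = [(0.5, 10.5), (10.5, 20.5), (20.5, 30.5), (30.5, 40.5), (40.5, 50.5)]
freqs = [12, 28, 32, 25, 13]

q1 = grouped_quantile(1, 4, boundaries, freqs)
q2 = grouped_quantile(2, 4, boundaries, freqs)
q3 = grouped_quantile(3, 4, boundaries, freqs)
d5 = grouped_quantile(5, 10, boundaries, freqs)
p50 = grouped_quantile(50, 100, boundaries, freqs)

print("Q1 =", round(q1, 2), " Q2 =", round(q2, 2), " Q3 =", round(q3, 2))
print("D5 =", round(d5, 2), " P50 =", round(p50, 2))
```

For example, for $Q_1$ the target is 110/4 = 27.5, which falls in the class 11 - 20 (cumulative frequency 40), so $Q_1 = 10.5 + \frac{27.5 - 12}{28} \cdot 10 \approx 16.04$. The second quartile, fifth decile and fiftieth percentile all coincide, as expected.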
