
Lesson 4


Introduction to Statistics

Definition of terms:
Statistics is a branch of mathematics that deals with the collection, organization or presentation, analysis,
and interpretation of data. Its fundamental purpose is to describe and draw inferences about the numerical
properties of a population.

Historical/Biblical Note
The origin of descriptive statistics can be traced to data collection methods used in censuses taken by
the Babylonians and Egyptians between 4500 and 3000 BC.
In the Roman Empire, between 27 BC and 17 AD, surveys were conducted on the births and deaths of its
citizens, the number of livestock and the crops harvested yearly.

Luke 2:1-4
In those days a decree went out from Caesar Augustus that all the world should be registered. 2 This was
the first registration when Quirinius was governor of Syria. 3 And all went to be registered, each to his
own town. 4 And Joseph also went up from Galilee, from the town of Nazareth, to Judea, to the city of
David, which is called Bethlehem, because he was of the house and lineage of David…

Descriptive Statistics is a statistical procedure concerned with describing the characteristics and
properties of a group of persons, places or things; it is based on easily verifiable facts. It organizes
the presentation, description, and interpretation of data gathered. It includes the study of relationships
among variables.

If you have gathered data from a survey and have organized them in a systematic, easy-to-read manner,
then you have succeeded in applying the basic principles of descriptive statistics.

Among the measurements falling under descriptive statistics are the measures of central tendency,
measures of variability, skewness, kurtosis, minimum, maximum, summation and other items which help in
describing a data set.

Descriptive statistics answers questions such as:
1. How many students are interested in taking online classes?
2. Which months have the highest and the lowest number of COVID-19 positive cases?
3. What are the most likable Netflix series according to students?
4. Who performed better in the entrance examination?
5. What proportion of the ULS college students like online classes?

Inferential Statistics is a statistical procedure used to draw inferences for the population on the
basis of the information obtained from the sample. With inferential statistics, you are going to try to arrive
at conclusions extending beyond the data alone. You may use it to make judgments of the possibility that
an observed difference between groups/data is a dependable one or it just happened due to chance. It is a
matter of deciding between reality and coincidence.

Inferential statistics can answer questions such as:


1. Is there a significant difference in the academic performance of students enrolled in an online
and modular class?
2. Is there a significant difference between the proportions of students who are interested to take
statistics online and those who are not?

Population – refers to a large collection of objects, places or things.


Parameter – is any numerical value which describes a population.
Example: There are 7,592 students enrolled in a certain Marian Institution.
N = 7,592 Parameter (N)

Sample – is a small portion or part of a population; a representative of the population in a research study.
Statistic – is any numerical value which describes a sample.
Example: Out of the 7,592 students enrolled in a Marian Institution, 3,568 are female.
n = 3,568 Statistic (n)

Data – are facts, or a set of information gathered or under study.

Qualitative Data – are variables that can be placed into distinct categories, characteristics or
attributes.
Quantitative Data – are numerical and can be ordered or ranked.

Discrete Data – assume exact values and can be obtained through counting, e.g., number of students.
Continuous Data – assume infinite values within an interval and are obtained through measurement,
e.g., temperature.

[Diagram: Data is divided into Quantitative and Qualitative; Quantitative is divided into Discrete and
Continuous.]

Constant – is a characteristic or property of a population or sample which makes the members similar to
each other.
Variable – is a characteristic or property of a population or sample which makes the members different
from each other.
Dependent Variable – a variable that is affected by another variable.
Independent Variable – a variable which affects the dependent variable.

Note this:
Researchers are not interested in constants since they do not make the subjects of research different
from one another. They are specifically interested in variables.

Scales of Measurement
1. Nominal level of measurement classifies data into mutually exclusive categories in which no order
or ranking can be imposed on the data. Nominal numbers are just labels. e.g. SSS number
2. Ordinal level of measurement classifies data into categories that can be ranked; however, precise
differences between the ranks do not exist. e.g. size of t-shirt.
3. Interval level of measurement ranks data, and precise differences between units of measure do
exist; however, there is no meaningful zero. e.g. temperature.
4. Ratio level of measurement possesses all the characteristics of interval measurement, and there
exists a true zero. In addition, true ratios exist when the same variable is measured on two different
members of the population. e.g. height

Sampling Techniques
In doing research, if the population is too big, a scientifically determined sample size is acceptable.
One way of getting the sample size is by using the Raosoft survey tool. You can use the Raosoft calculator
online to compute the desired sample size: http://www.raosoft.com/samplesize.html

Note that “e” is called the margin of error. It is a value which quantifies possible sampling errors.
Usually the margin of error is 0.01 (1%), 0.05 (5%) or 0.10 (10%). Sampling error means that
the results in the sample differ from those of the target population because of the “luck of the draw”.
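
To make the role of the margin of error concrete, here is a minimal sketch of a common finite-population sample size formula for a proportion. It is an illustration only: the function name `sample_size`, the 95% confidence level (z = 1.96) and the 50% response distribution are assumptions, not settings stated in this lesson.

```python
import math

def sample_size(population, margin_of_error=0.05, proportion=0.5, z=1.96):
    """Finite-population sample size for estimating a proportion.

    z = 1.96 assumes a 95% confidence level; proportion = 0.5 is the most
    conservative (largest-sample) response distribution.
    """
    x = z ** 2 * proportion * (1 - proportion)
    n = population * x / ((population - 1) * margin_of_error ** 2 + x)
    return math.ceil(n)  # round up so the margin of error is not exceeded

print(sample_size(5000))  # 357
```

Under these assumptions a population of 5 000 with a 5% margin of error gives 357, which agrees with the sample size used in the stratified sampling example of this lesson. A smaller margin of error requires a larger sample.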

Note this:
Sampling is the process of selecting samples from a given population. There are two types of sampling
techniques:
(1) Probability Sampling: samples are chosen in such a way that each member of the population has an
equal chance of being selected in the samples.
(2) Non Probability Sampling: each member of the population does not have a known chance of being
included in the sample. Hence, personal judgment plays an important role in the selection.

Since you already know what to use to compute the appropriate sample size, the next step is how to select
the samples from the population. This is referred to as sampling.

We will only consider and discuss the probability sampling techniques. These are:

Simple random sampling. This is a procedure where a sample is selected in such a way that every element
is as likely to be selected as any other element in the population.

Example:
Lottery: this needs a complete list of the population. You write the names or codes of each member and
place them in a container, then randomly draw the desired number of samples. This is easy if the
population is small.

Systematic random sampling. This method is a sampling procedure with a random start. Samples are
randomly chosen using the rules set by the researchers. This involves choosing the $k^{th}$ member of
the population, with $k = \frac{N}{n}$, but there should be a random start.
Example: Choose a sample of size 10 from N = 500.
1. Choose a random start, say 10.
2. Determine the sampling interval $k$ by $k = \frac{500}{10} = 50$, so every 50th member will be chosen
starting from 10.
3. So the respondents will be member numbers 10, 60, 110, 160, 210, 260, 310, 360, 410, 460.
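
The systematic sampling steps can be sketched in a few lines of Python. The function name `systematic_sample` is hypothetical, and drawing the random start from 1 to k is an assumption about how the start would be chosen when none is given.

```python
import random

def systematic_sample(population_size, sample_size, start=None):
    """Return the member numbers chosen by systematic sampling."""
    k = population_size // sample_size   # sampling interval k = N / n
    if start is None:
        start = random.randint(1, k)     # assumed: random start within the first k members
    return [start + j * k for j in range(sample_size)]

print(systematic_sample(500, 10, start=10))
# [10, 60, 110, 160, 210, 260, 310, 360, 410, 460]
```

Passing `start=10` reproduces the example; leaving it out draws a new random start on each run.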

Stratified random sampling. This is used when the population can be naturally classified
into groups or strata.
Example: A survey to find out whether families living in a certain municipality are in favor of charter
change will be conducted. To ensure that all income groups are represented, respondents will
be divided into high-income (Class A), middle-income (Class B) and low-income (Class C) groups.
Below is the distribution of income groups.
Strata      Number of Families
Class A     1 000
Class B     2 500
Class C     1 500
N           5 000
1. Using the Raosoft calculator with a 5% margin of error and a 50% response distribution, the sample
size is n = 357.

2. Using proportional allocation, how many from each group should be taken as sample?

Strata      Number of Families   Percent                    Number of Samples (n)
Class A     1 000                1000/5000 = 0.2 = 20%      (0.2)(357) = 71.4 ≈ 71
Class B     2 500                2500/5000 = 0.5 = 50%      (0.5)(357) = 178.5 ≈ 179
Class C     1 500                1500/5000 = 0.3 = 30%      (0.3)(357) = 107.1 ≈ 107
N           5 000                                           n = 357

So, 71 families should be taken as respondents from Class A, 179 from Class B and 107 from
Class C, for a total of 357.
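
The proportional allocation above can be checked with a short sketch. The function name `proportional_allocation` is hypothetical, and halves are rounded up (178.5 becomes 179) to match the table; Python's built-in `round` would round 178.5 down to 178, so it is not used here.

```python
def proportional_allocation(strata, n):
    """Split a total sample size n across strata in proportion to their sizes."""
    total = sum(strata.values())
    # int(x + 0.5) rounds halves up, as done in the worked table
    return {name: int(size * n / total + 0.5) for name, size in strata.items()}

strata = {"Class A": 1000, "Class B": 2500, "Class C": 1500}
allocation = proportional_allocation(strata, 357)
print(allocation)  # {'Class A': 71, 'Class B': 179, 'Class C': 107}
```

With other data the rounded counts may not add up exactly to n, in which case one stratum has to be adjusted by hand.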
Collecting and Organizing Data in a Table
The study of statistics begins with the collection of data or measurements. Data collected should be
organized systematically for easier and faster interpretation. They may be presented in any of the
following forms:
The textual form can be used if the data to be presented are few.
The tabular and graphical forms are used when more detailed information about the data is to be
presented.
A table is used when you want to present data in a systematic and organized manner so that
reading and interpretation will be simpler and easier.

When a table is used, you must consider the following parts:


1. Table number
2. Table title
3. Column header
4. Row classifier
5. Body of the table
6. Source note

Table 3
Distribution of Students of Hogwarts School According to Year Level

Year Level     Number of Students
Freshman       350
Sophomore      300
Junior         250
Senior         200
Total          1 100
Example 1:
Table 1
Mahusay National High School Enrolment, SY 2005-2006

Year Level   Male   Female
First        216    267
Second       197    216
Third        187    227
Fourth       176    215
Total        776    925
You will observe that the table above shows clearly the enrolment data in Mahusay National High School
for the school year 2005-2006.
Another type of tabular presentation is the frequency table also known as a frequency distribution.
It is an arrangement of the data that shows the frequency of occurrence of different values of the
variables.

A frequency table is constructed by listing the measurements from highest to lowest, then making
tally marks to record how often each number occurs. After tallying, count the marks and record them in
the proper column.

Example 2: The following are the ages of 45 patients confined in a certain hospital suffering from mild
pneumonia:

17 20 15 18 19 16 11 10 15 16
12 12 13 14 11 10 14 13 12 11
13 15 14 10 15 16 17 17 18 20
20 18 19 19 18 17 16 15 12 12
13 14 15 19 20

Prepare a frequency table for the set of data.

Solution: To prepare a frequency table for the given set of scores, the scores are listed from
highest to lowest, tally marks are made and counted. The counted tally marks will then
be recorded under the column frequency. Notice that every 5th tally crosses the first four
tallies. This is done to make counting of marks easier especially if the number of cases
is rather big.

Score Tallies Frequency


20 //// 4
19 //// 4
18 //// 4
17 //// 4
16 //// 4
15 ////// 6
14 //// 4
13 //// 4
12 ///// 5
11 /// 3
10 /// 3
Total 45
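
For larger data sets the tallying can be done by a computer. The following is a minimal sketch using Python's standard `collections.Counter`; the variable names are illustrative only.

```python
from collections import Counter

ages = [17, 20, 15, 18, 19, 16, 11, 10, 15, 16,
        12, 12, 13, 14, 11, 10, 14, 13, 12, 11,
        13, 15, 14, 10, 15, 16, 17, 17, 18, 20,
        20, 18, 19, 19, 18, 17, 16, 15, 12, 12,
        13, 14, 15, 19, 20]

freq = Counter(ages)  # maps each score to how often it occurs

print("Score  Frequency")
for score in sorted(freq, reverse=True):  # highest to lowest, as in the table
    print(f"{score:>5}  {freq[score]:>9}")
print(f"Total  {sum(freq.values()):>9}")
```

The printed frequencies should match the tally table above, with a total of 45.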

Frequency Distribution Tables

If the number of measures in consideration is rather big, the presentation of data is further
simplified by grouping the measures into class intervals called a frequency distribution.
A frequency distribution is a distribution of the total number of measures or frequencies over
arbitrarily defined categories or classes. The number of measures falling under a class is called class
frequency.

Example 1.
The frequency distribution below shows the scores obtained by 300 students in a
Biostatistics test of 50 items.

Score    Number of Students
45-49 15
40-44 32
35-39 42
30-34 108
25-29 67
20-24 21
15-19 10
10-14 5
Total 300

In the example above, the symbol 45-49 and the other symbols which follow up to 10-14 are
called class intervals. The end numbers are called class limits. For instance, in the class interval 45-49,
45 is called the lower limit while 49 is called the upper limit.

Each class interval also has a lower boundary and an upper boundary. For the class interval 45-49,
the lower boundary is 44.5 while the upper boundary is 49.5. Hence, for the class interval 45-49,
44.5 – 49.5 are called the class boundaries.

The size of the class interval, also called class size, is the difference between the upper boundary
and the lower boundary. Hence, the class size in the given example is 49.5 − 44.5 = 5.
A class interval also has a midpoint or class mark. It is obtained by taking half the sum of the lower and
upper class limits. For instance, the midpoint of the class interval 45-49 is $\frac{45 + 49}{2}$ or 47.
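
As a quick check of these definitions, the sketch below computes the boundaries, class size and midpoint of a class interval. The function name `class_summary` and the `unit` parameter are hypothetical; `unit=1` assumes the scores are whole numbers, so each boundary lies half a unit beyond the limit.

```python
def class_summary(lower_limit, upper_limit, unit=1):
    """Boundaries, size and midpoint of one class interval."""
    half = unit / 2
    lower_boundary = lower_limit - half
    upper_boundary = upper_limit + half
    return {
        "boundaries": (lower_boundary, upper_boundary),
        "size": upper_boundary - lower_boundary,
        "midpoint": (lower_limit + upper_limit) / 2,
    }

print(class_summary(45, 49))
# {'boundaries': (44.5, 49.5), 'size': 5.0, 'midpoint': 47.0}
```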

Range (R) is the difference of the Highest score (H) and the lowest score (L) in the given data set.
The following are the suggested steps on how to make a class interval:
1. Determine the desired number of classes (n) (number of rows).
2. Solve for the class width (i):
\[ i = \frac{\text{Range}}{n} \]
3. Start the lowest class interval with the lowest value/score in the given data set (lowest score
plus i). Continue until the highest value in the distribution is reached.

Measures of Central Tendency for Ungrouped Data


Aside from tables and graphs, another way of describing a set of data is by stating a single
numerical value associated with it. This value is where all the other values in a distribution tend to
cluster. It is called the average or measure of central tendency. There are three kinds of average: the
mean, the median, and the mode.

The mean (also known as the arithmetic mean) is the most commonly used measure of central
position. It is the sum of measures divided by the number of measures in a variable. It is symbolized
as $\bar{x}$ (read as x bar).
The mean is used to describe a set of data where the measures cluster or concentrate at a point. As
the measures cluster around each other, a single value appears to represent distinctively the total
measures. It is, however, affected by extreme measures, that is, very high or very low measures can easily
change the value of the mean.

To find the mean of ungrouped data, use the formula

\[ \bar{x} = \frac{\sum x}{n} \]
where $\sum x$ = the summation of x (sum of the measures) and n = number of values of x

Example 1: The grades in Chemistry of 10 students are 87, 84, 85, 85, 86, 90, 79,
82, 78, 76. What is the average grade of the 10 students?
Solution:
\[ \bar{x} = \frac{87 + 84 + 85 + 85 + 86 + 90 + 79 + 82 + 78 + 76}{10} = \frac{832}{10} = 83.2 \]

Note: Mean
\[ \bar{x} = \frac{\sum fx}{n} \]
Where: $\bar{x}$ = mean
$f$ = frequency
$x$ = score
$n$ = number of observations

Example 2: Find the mean salary for a small company that pays monthly salaries to its employees as
shown in the frequency distribution.

Salary (x)       Number of Employees (f)   fx
Php7 000.00      8                         56 000
Php8 000.00      11                        88 000
Php9 250.00      14                        129 500
Php10 500.00     9                         94 500
Php17 000.00     2                         34 000
Php25 000.00     1                         25 000
Total            45                        427 000

\[ \bar{x} = \frac{\sum fx}{n} = \frac{427\,000}{45} = 9\,488.89 \]
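
Both mean examples can be verified with a short sketch. The helper name `weighted_mean` is hypothetical; it simply applies the formula with frequencies, dividing the sum of fx by the sum of f.

```python
def weighted_mean(values, frequencies):
    """Mean of data given as values x with frequencies f: sum(f*x) / sum(f)."""
    total_fx = sum(f * x for x, f in zip(values, frequencies))
    return total_fx / sum(frequencies)

# Example 1: ungrouped grades
grades = [87, 84, 85, 85, 86, 90, 79, 82, 78, 76]
mean_grade = sum(grades) / len(grades)
print(mean_grade)  # 83.2

# Example 2: salaries with the number of employees as frequencies
salaries = [7000, 8000, 9250, 10500, 17000, 25000]
employees = [8, 11, 14, 9, 2, 1]
mean_salary = weighted_mean(salaries, employees)
print(round(mean_salary, 2))  # 9488.89
```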

The median is the middle entry or term in a set of data arranged in either increasing or
decreasing order.
The median is a positional measure. Thus the values of the individual measures in a set of
data do not affect it. It is affected by the number of measures and not by the size of the extreme
values.

To find the median of a given set of data, take note of the following:

Arrange the data in either increasing or decreasing order.


Locate the middle value. If the number of cases is odd, the middle value is the median. If the
number of cases is even, take the arithmetic mean of the two middle measures.

Example 1: The number of books borrowed in the library from Monday to Friday last
week were 58, 60, 54, 35, and 97 respectively. Find the median.
Solution: Arrange the number of books borrowed in increasing order.
35, 54, 58, 60, 97
The median is 58.
Example 2: Cora’s quizzes for the second quarter are 8, 7, 6, 10, 9, 5, 9, 6, 10, and 7.
Find the median.
Solution: Arrange the scores in increasing order.
5, 6, 6, 7, 7, 8, 9, 9, 10, 10
Since the number of measures is even, then the median is the average of the two
middle scores:
\[ \tilde{x} = \frac{7 + 8}{2} = 7.5 \]
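
The two median examples can be checked with Python's standard `statistics.median`, which sorts the data and averages the two middle values when the number of cases is even. The variable names are illustrative.

```python
import statistics

books = [58, 60, 54, 35, 97]                 # odd number of cases
quizzes = [8, 7, 6, 10, 9, 5, 9, 6, 10, 7]   # even number of cases

print(statistics.median(books))    # 58
print(statistics.median(quizzes))  # 7.5
```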
The mode is another measure of position. The mode is the measure or value which occurs
most frequently in a set of data. It is the value with the greatest frequency. To find the mode for a
set of data:
- select the measure that appears most often in the set;
- if two or more measures appear the same number of times, and the frequency they
appear is greater than any other measures, then each of these values is a mode;
- if every measure appears the same number of times, then the set of data has no mode.

Example 1: The shoe sizes of 10 randomly selected students in a class are 6, 5, 4, 6, 4½, 5,
6, 7, 7 and 6. What is the mode?

Answer: The mode is 6 since it is the shoe size that occurred the most number of times.

Example 2: The sizes of 9 classes in a certain school are 50, 52, 55, 50, 51, 54, 55, 53 and
54.

Answer: The modes are 50, 54 and 55 since each of these measures occurred twice, more often than
any other measure. The distribution is multimodal (it has three modes).
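
A sketch with Python's standard `statistics.multimode` (available from Python 3.8) lists every value that shares the greatest frequency. Counting the class sizes this way shows that 50 also occurs twice, alongside 54 and 55.

```python
import statistics

shoe_sizes = [6, 5, 4, 6, 4.5, 5, 6, 7, 7, 6]
class_sizes = [50, 52, 55, 50, 51, 54, 55, 53, 54]

print(statistics.multimode(shoe_sizes))           # [6]
print(sorted(statistics.multimode(class_sizes)))  # [50, 54, 55]
```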

The Mean of Grouped Data Using the Class Marks

When the number of items in a set of data is too big, items are grouped for convenience. The
manner of computing for the mean of grouped data is given by the formula:

\[ \bar{x} = \frac{\sum (f x_m)}{N} \]
where: $\bar{x}$ is the mean
$f$ is the frequency of each class
$x_m$ is the class mark
$\sum (f x_m)$ is the summation of the product of the frequency and the class mark
$N$ is the sum of all the frequencies
Examples:
Compute the mean of the scores of the students in a Mathematics test.
Class Frequency
46 – 50 1
41 – 45 5
36 – 40 11
31 – 35 12
26 – 30 11
21 – 25 5
16 – 20 2
11 – 15 1

The frequency distribution for the data is given below. The columns $x_m$ and $f x_m$ are added.
Class      f     x_m    f x_m
46 – 50    1     48     48
41 – 45    5     43     215
36 – 40    11    38     418
31 – 35    12    33     396
26 – 30    11    28     308
21 – 25    5     23     115
16 – 20    2     18     36
11 – 15    1     13     13
Total      48           1 549

\[ \bar{x} = \frac{\sum (f x_m)}{N} = \frac{1\,549}{48} = 32.27 \]
The mean score is 32.27, or about 32.
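
The grouped mean can be reproduced with a short sketch. The function name `grouped_mean` and the representation of each class as a `(lower limit, upper limit, frequency)` tuple are assumptions made for illustration.

```python
classes = [(46, 50, 1), (41, 45, 5), (36, 40, 11), (31, 35, 12),
           (26, 30, 11), (21, 25, 5), (16, 20, 2), (11, 15, 1)]

def grouped_mean(classes):
    """Mean of grouped data: sum(f * class mark) / sum(f)."""
    total_f = sum(f for _, _, f in classes)
    total_fxm = sum(f * (lo + hi) / 2 for lo, hi, f in classes)  # class mark = (lo + hi) / 2
    return total_fxm / total_f

print(round(grouped_mean(classes), 2))  # 32.27
```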
The Median of Grouped Data
The median is the middle value in a set of quantities. It separates an ordered set of
data into two equal parts. Half of the quantities is found above the median and the other half
is found below it.
In computing for the median of grouped data, the following formula is used:
\[ \tilde{x} = x_{lb} + \left[ \frac{\frac{N}{2} - cf_b}{f_m} \right] i \]

where: $\tilde{x}$ = median
$x_{lb}$ = the lower boundary of the median class
$N$ = total frequency
$cf_b$ = the cumulative frequency of the class just below the median class
$f_m$ = frequency of the median class
$i$ = size of the class interval

The median class is the class that contains the $\left(\frac{N}{2}\right)^{th}$ score. This can be
located under the column “<” cf of the cumulative frequency distribution.

Examples:
1. Compute the median of the scores of the students in a Mathematics test.

Class Frequency
46 – 50 1
41 – 45 5
36 – 40 11
31 – 35 12
26 – 30 11
21 – 25 5
16 – 20 2
11 – 15 1
The frequency distribution for the data is given below. The columns for lb and “less than”
cumulative frequency are added.

Class      f     lb      “<” cf
46 – 50    1     45.5    48
41 – 45    5     40.5    47
36 – 40    11    35.5    42
31 – 35    12    30.5    31
26 – 30    11    25.5    19
21 – 25    5     20.5    8
16 – 20    2     15.5    3
11 – 15    1     10.5    1

Compute $\frac{N}{2}$: since $\frac{N}{2} = \frac{48}{2} = 24$, the class interval containing 24 in the
less than cumulative frequency will be the median class. In this case the median class is the 31 – 35
class interval.

\[ \tilde{x} = x_{lb} + \left[ \frac{\frac{N}{2} - cf_b}{f_m} \right] i = 30.5 + \left[ \frac{24 - 19}{12} \right] 5 = 32.58 \]
The median score is 32.58.
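
The grouped median formula can be sketched as follows. The function name `grouped_median` and the tuple layout `(lower limit, upper limit, frequency)` are hypothetical, and the sketch assumes whole-number class limits so that each lower boundary is the lower limit minus 0.5.

```python
def grouped_median(classes):
    """Median of grouped data given (lower_limit, upper_limit, frequency) tuples."""
    ordered = sorted(classes)                 # lowest class first
    half = sum(f for _, _, f in ordered) / 2  # N / 2
    cf_below = 0                              # cumulative frequency below the current class
    for lo, hi, f in ordered:
        if cf_below + f >= half:              # this is the median class
            width = hi - lo + 1               # class size for whole-number limits
            return (lo - 0.5) + (half - cf_below) / f * width
        cf_below += f

classes = [(46, 50, 1), (41, 45, 5), (36, 40, 11), (31, 35, 12),
           (26, 30, 11), (21, 25, 5), (16, 20, 2), (11, 15, 1)]
print(round(grouped_median(classes), 2))  # 32.58
```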
The Mode of Grouped Data

The mode of grouped data can be approximated using the following formula:
\[ \hat{x} = x_{lb} + \left[ \frac{\Delta_1}{\Delta_1 + \Delta_2} \right] i \]
where: $x_{lb}$ is the lower boundary of the modal class.
$\Delta_1$ is the difference between the frequencies of the modal class and the next
lower class.
$\Delta_2$ is the difference between the frequencies of the modal class and the next
upper class.
$i$ is the size of the class interval.
The modal class is the class interval with the highest frequency.
Example:

1. Compute the mode of the scores of the students in a Mathematics test.


Class Frequency
46 – 50 1
41 – 45 5
36 – 40 11
31 – 35 12
26 – 30 11
21 – 25 5
16 – 20 2
11 – 15 1

The frequency distribution for the data is given below. The column for lb is added.

Class      f     lb
46 – 50    1     45.5
41 – 45    5     40.5
36 – 40    11    35.5
31 – 35    12    30.5
26 – 30    11    25.5
21 – 25    5     20.5
16 – 20    2     15.5
11 – 15    1     10.5

Since class 31 – 35 has the highest frequency, the modal class is 31 – 35.

\[ \hat{x} = x_{lb} + \left[ \frac{\Delta_1}{\Delta_1 + \Delta_2} \right] i = 30.5 + \left[ \frac{12 - 11}{(12 - 11) + (12 - 11)} \right] 5 = 30.5 + [0.5]5 = 30.5 + 2.5 = 33 \]
The mode score is 33.
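
The grouped mode approximation can be sketched in the same way. The function name `grouped_mode` is hypothetical; the sketch assumes whole-number class limits and a single modal class whose frequency differs from at least one neighbour, and it treats a missing neighbour as having frequency 0.

```python
def grouped_mode(classes):
    """Approximate mode of grouped data given (lower_limit, upper_limit, frequency) tuples."""
    ordered = sorted(classes)                          # lowest class first
    freqs = [f for _, _, f in ordered]
    m = freqs.index(max(freqs))                        # position of the modal class
    f_lower = freqs[m - 1] if m > 0 else 0             # next lower class
    f_upper = freqs[m + 1] if m < len(freqs) - 1 else 0  # next upper class
    d1 = freqs[m] - f_lower
    d2 = freqs[m] - f_upper
    lo, hi, _ = ordered[m]
    return (lo - 0.5) + d1 / (d1 + d2) * (hi - lo + 1)

classes = [(46, 50, 1), (41, 45, 5), (36, 40, 11), (31, 35, 12),
           (26, 30, 11), (21, 25, 5), (16, 20, 2), (11, 15, 1)]
print(grouped_mode(classes))  # 33.0
```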
THE QUESTIONNAIRE AND THE LIKERT SCALE
Likert-type question is used if the researcher wants to know the feelings or opinions of
the respondents regarding any topic or issues of interest.
Below are examples of Likert-type statements. The respondents will choose the number
which best represents their feelings regarding the statements. Remember that the statements are
grouped according to a theme.
Choices                    Likert-type Mean Interpretation
5 – strongly agree         4.50 – 5.00 – strongly agree
4 – agree                  3.50 – 4.49 – agree
3 – somewhat agree         2.50 – 3.49 – somewhat agree
2 – disagree               1.50 – 2.49 – disagree
1 – strongly disagree      1.00 – 1.49 – strongly disagree

Items 1 – 3 refer to students’ personal confidence in learning statistics 5 4 3 2 1
1. I am sure that I can learn statistics.
2. I think I can handle different lessons in statistics.
3. I can get good grades in statistics.
Items 4 – 6 refer to the students’ perception on statistics as a subject
4. I think statistics is a worthwhile, necessary subject.
5. I will use statistics in many ways as a professional.
6. I’ll need a good understanding of statistics for my research work.
Items 7 – 9 refer to students’ attitudes on the use of computers in learning
7. Computer makes learning fun and easy
8. I think working with computers would be enjoyable and stimulating
9. Computers help a lot in learning

The table below is the summary and interpretation of the mean responses in the Likert-type of
statements 1 - 3.
5 4 3 2 1 𝑥̅ Interpretation of 𝑥̅
1. 36 51 18 0 1 4.14 Agree
2. 18 44 37 8 1 3.65 Agree
3. 18 48 28 0 1 3.86 Agree
T 72 143 83 8 3 3.88 Agree
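
The means in the summary table are weighted means of the choices 5 to 1, using the number of respondents per choice as weights. A minimal sketch follows; the function names `likert_mean` and `interpret` are hypothetical, and rounding the mean to two decimals before looking it up is an assumption made so that it falls within the interpretation ranges listed above.

```python
def likert_mean(counts):
    """counts: number of respondents choosing 5, 4, 3, 2, 1, in that order."""
    weights = (5, 4, 3, 2, 1)
    return sum(w * c for w, c in zip(weights, counts)) / sum(counts)

def interpret(mean):
    """Map a mean (rounded to 2 decimals) to its interpretation."""
    mean = round(mean, 2)
    if mean >= 4.50:
        return "strongly agree"
    if mean >= 3.50:
        return "agree"
    if mean >= 2.50:
        return "somewhat agree"
    if mean >= 1.50:
        return "disagree"
    return "strongly disagree"

responses = {"1": (36, 51, 18, 0, 1), "2": (18, 44, 37, 8, 1), "3": (18, 48, 28, 0, 1)}
for item, counts in responses.items():
    m = likert_mean(counts)
    print(item, round(m, 2), interpret(m))
```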
MEASURES OF VARIABILITY

Definition
MEASURES OF VARIABILITY OR DISPERSION
These are measures of the average distance of each observation from the center of the distribution.
They measure the homogeneity or heterogeneity of a particular group.

The first part of this module dealt with the concepts of measurements that describe the middle or the
center of the distribution. But the measures of central tendency do not describe how the observations
spread out from the center of the distribution. In this lesson, we will deal with the measures of
variability or spread of a distribution. For the measures of skewness and kurtosis, it will be your task
to read about them; a link will be provided for your reading.

Consider the following measurements, in liters, for two samples of apple juice in tetra packs packed by
companies A and B.

Sample A       Sample B
0.95           1.06
1.00           1.01
0.92           0.88
1.03           0.91
1.10           1.14
x̄ = 1.00       x̄ = 1.00

How far apart are the measurements from one another?

[Figure: dot plots of the Company A and Company B measurements, each on an axis from 0.8 to 1.2 liters]

Both samples have the same mean, 1.00 liters. It is quite obvious that company A packed
apple juice with a more uniform content than company B. We say that the variability or the
dispersion of the observations from the mean is less for sample A than for sample B. Therefore,
in buying apple juice, we would feel more confident that the tetra pack we select will be closer to
the advertised mean if we buy from company A.

A small measure of variability would indicate that the data are:
1. clustered closely around the mean
2. more homogeneous
3. less variable
4. more consistent
5. more uniformly distributed

A big measure of variability would indicate that the data are:
1. far away from the mean
2. heterogeneous
3. more variable
4. less consistent
5. less uniformly distributed

There are 4 major parts/measures under the measures of variability


These are:
1. Range (R): The difference between the highest value and the lowest value in a set
of data Formula: R = H - L
2. Standard Deviation ($\sigma$ for population and $s$ for sample)
Calculator: $\sigma$ or $s$
3. Variance ($\sigma^2$ for population and $s^2$ for sample): the square of the standard deviation
4. Coefficient of Variation (cv): It is used to compare the variability of two or more sets of
data having different units.
Formula: cv = (standard deviation divided by mean) × 100%
Range
The range is the simplest measure of variability. It is the difference between the largest and
smallest measurement.

R = H - L

where R = Range, H = Highest measure, L = Lowest Measure

The main disadvantage of the range is that it does not consider every measure in the data.

Examples:

1. The IQs of 5 members of a family are 108, 112, 127, 118 and 113. Find the
range.

Solution: The range of the IQs is 127 - 108 = 19.

2. The range of each of the sets of scores of the three students is as follows:

Student A    H = 98    L = 92    R = 98 - 92 = 6
Student B    H = 97    L = 90    R = 97 - 90 = 7
Student C    H = 97    L = 90    R = 97 - 90 = 7

Observe that two students are tied. This indicates that the range is not a reliable
measure of dispersion. It is a poor measure of dispersion, particularly if the size of the sample
or population is large. It considers only the extreme values and tells us nothing about the
distribution of numbers in between.

Variance
Since the range considers only two scores in the data set, it is an unreliable measure of variability.
It cannot be used to directly compare two sets of data, and it is an unstable measure of variability,
especially for a very large sample.

Definition
Variance is the average of the squared deviations from the mean. The formulas for finding the variance
for ungrouped data are:

Population Variance:
\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} \]
Sample Variance:
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
Where:
$x$ = individual value/score from the raw data
$\bar{x}$ = the sample mean
$\mu$ = population mean
$N$ = total population
$n$ = number of sample
$\sigma^2$ = the population variance
$s^2$ = the sample variance

Consider the following:

For male group (mean = 84)

Sample A    (x − x̄)    (x − x̄)²
65          -19        361
75          -9         81
85          1          1
95          11         121
100         16         256
                       Σ = 820

a. Treating the data as population, the variance is
\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{820}{5} = 164 \text{ square units} \]
b. Treating the data as sample, the variance is
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{820}{5 - 1} = \frac{820}{4} = 205 \text{ square units} \]

For female group (mean = 84)

Sample B    (x − x̄)    (x − x̄)²
82          -2         4
83          -1         1
84          0          0
85          1          1
86          2          4
                       Σ = 10

a. Treating the data as population, the variance is
\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{10}{5} = 2 \text{ square units} \]
b. Treating the data as sample, the variance is
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{10}{5 - 1} = \frac{10}{4} = 2.5 \text{ square units} \]
Note this
Using the variance as a measure of variability for the two sets of grades, the males showed more variability in
performance. Note that the higher the variance, the more variable or far apart the values are from each other.

Standard Deviation
Since the variance is computed from squared deviations, it is expressed in squared units, which makes
it hard to picture the true spread of the data set. Hence, extract the square root of the variance,
defining another measure of variability called the standard deviation.

Definition
Standard deviation is the square root of the variance.

Population Standard Deviation:
\[ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \]
Sample Standard Deviation:
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]
Where:
$x$ = individual value/score from the raw data
$\bar{x}$ = the sample mean
$\mu$ = population mean
$N$ = total population
$n$ = number of sample
$\sigma$ = the population standard deviation
$s$ = the sample standard deviation

For male group
a. Treating the data as population, the standard deviation is
\[ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} = \sqrt{\frac{820}{5}} = \sqrt{164} = 12.81 \]
b. Treating the data as sample, the standard deviation is
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{820}{5 - 1}} = \sqrt{\frac{820}{4}} = \sqrt{205} = 14.32 \]

For female group
a. Treating the data as population, the standard deviation is
\[ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} = \sqrt{\frac{10}{5}} = \sqrt{2} = 1.41 \]
b. Treating the data as sample, the standard deviation is
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{10}{5 - 1}} = \sqrt{\frac{10}{4}} = \sqrt{2.5} = 1.58 \]

Based on the results, the male group is more dispersed than the female group.
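
The variance and standard deviation of both groups can also be checked with Python's standard `statistics` module, where `pvariance` and `pstdev` divide by N (population) while `variance` and `stdev` divide by n − 1 (sample). The variable names are illustrative.

```python
import statistics

male = [65, 75, 85, 95, 100]
female = [82, 83, 84, 85, 86]

for label, data in (("male", male), ("female", female)):
    print(label)
    print("  population variance:", statistics.pvariance(data))
    print("  sample variance:    ", statistics.variance(data))
    print("  population sd:      ", round(statistics.pstdev(data), 2))
    print("  sample sd:          ", round(statistics.stdev(data), 2))
```

The printed values should agree with the hand computations for both groups: variances of 164 and 205 for the male group, 2 and 2.5 for the female group.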

Note this:

The computation of the variance and the standard deviation is easy if there are only a few cases; however,
if you have a larger set of data, your calculator or your computer can do everything for you. All you have
to do is input the data and interpret the result. To aid you in the computation of the variance and
standard deviation, click the links below; be sure you have your calculator.
For casio fx 570ms and fx 991 ms https://www.youtube.com/watch?v=WTUYZZXU1Ws
For casio 50f ms https://www.youtube.com/watch?v=3m2bFJ43pWs

Coefficient of Variation
When comparing two sets of data with different units, you cannot use the variance or the standard
deviation. You have to use the coefficient of variation.

Definition

Coefficient of variation: It is the ratio of the standard deviation to the mean. It is used to compare
the variability of two or more sets of data even when they are expressed in different units of
measurement.

\[ cv = \frac{sd}{\bar{x}} \cdot 100\% \]
Where:
$cv$ = coefficient of variation
$sd$ = standard deviation
$\bar{x}$ = mean

Example: The following are the time taken (in minutes) to complete a homework by the two
groups of students in a day:
Group A :24, 26, 33, 37, 29, 31
Group B : 26 , 28, 33, 35, 37, 29 , 27, 25
a. Find the range.
b. Find the coefficient of variation.

Solution (Group A):

a. Find the range.

Arrange the data in ascending/descending order: 24, 26, 29, 31, 33, 37
R = H – L
R = 37 – 24
R = 13

b. Find the coefficient of variation.

Step 1. Find the mean.
\[ \bar{x} = \frac{24 + 26 + 29 + 31 + 33 + 37}{6} = 30 \]
Step 2. Construct a table for easier computation later.

x      x − x̄             (x − x̄)²
24     24 – 30 = -6      (-6)² = 36
26     -4                16
29     -1                1
31     1                 1
33     3                 9
37     7                 49
                         Σ(x − x̄)² = 112

Step 3. Solve for the standard deviation.
\[ sd = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{112}{6 - 1}} = 4.73 \]

Step 4. Solve for the coefficient of variation or cv.
\[ cv = \frac{sd}{\bar{x}} \cdot 100\% = \frac{4.73}{30} \cdot 100\% \approx 16\% \]

Follow this process and do the same for the next data (Group B).
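
The same steps can be carried out for both groups with a short sketch. The function name `coefficient_of_variation` is hypothetical, and it uses the sample standard deviation (dividing by n − 1), as in Step 3.

```python
import statistics

def coefficient_of_variation(data):
    """cv = (sample standard deviation / mean) * 100, in percent."""
    return statistics.stdev(data) / statistics.mean(data) * 100

group_a = [24, 26, 33, 37, 29, 31]
group_b = [26, 28, 33, 35, 37, 29, 27, 25]

for name, data in (("Group A", group_a), ("Group B", group_b)):
    data_range = max(data) - min(data)
    print(name, "range:", data_range, "cv (%):", round(coefficient_of_variation(data), 2))
```

For Group A this gives a range of 13 and a cv of about 15.78%, which rounds to the 16% obtained by hand; the Group B line can be used to check your own computation.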
Guiding Principle
The lesser the value of the measure, the more consistent, the more
homogeneous and the less scattered are the observations in the set of data.
Summary
Statistics is a branch of mathematics that deals with the collection, organization or presentation, analysis,
and interpretation of data. Its fundamental purpose is to describe and draw inferences about the numerical
properties of a population.
Two main parts of Statistics: Descriptive Statistics and Inferential Statistics
Population – refers to a large collection of objects, places or things.
Parameter – is any numerical value which describes a population.
Sample – is a small portion or part of a population; a representative of the population in a research study.
Sampling is the process of selecting the elements of a sample from the population being
studied. The methods of sampling include simple random sampling, systematic random
sampling, and stratified random sampling.
Statistic – is any numerical value which describes a sample
Data - are facts, or a set of information gathered or under study.
Qualitative Data – are variables that can be placed into distinct categories, characteristic or
attributes.
Quantitative Data – are numerical and can be ordered or ranked.
Discrete Data – assume exact value and can be obtained through counting.
Continuous Data – assume infinite values within an interval and obtained through
measurement.
Constant – is a characteristic or property of a population or sample which makes the members similar to
each other.
Variable – is a characteristic or property of a population or sample which makes the members different
from each other.
Dependent Variable – A variable that is affected by another variable.
Independent Variable – a variable which affects the dependent variable.
Scales of Measurement: Nominal level of measurement, Ordinal level of measurement, Interval level
of measurement, and ratio level of measurement.
Table is used to present a data in a systematic and organized manner to make its reading and interpretation
simple and easy.

Frequency Distribution is a distribution of the total number of measures or frequencies over arbitrarily
defined categories or classes. The number of measures falling under a class is called class frequency.
There are three measures of central tendency, namely, mean, median, and mode.
Measures of Variability or Dispersion

These are measures of the average distance of each observation from the center of the distribution. They measure
the homogeneity or heterogeneity of a particular group

1. Range - R=H–L
2. Standard Deviation (𝜎 for population and s for sample)
3. Variance ($\sigma^2$ for population and $s^2$ for sample): the square of the standard deviation.
4. Coefficient of Variation (cv): It is to compare the variability of two or more sets of data
having different units.

REFERENCES
Bluman, Allan G (2012). Elementary Statistics: a step by step approach. (8th Ed) New York: McGraw-
Hill,
Blay, Basilia e. (2007). Elementary Statistics. Pasig City: Anvil Publishing, Inc.,
Calmorin, Laurentina P.,Pledad Ma. Lauremelch (2008). Nursing Biostatistics with Computer. Manila:
Rex Bookstore,
Baltazar, E.C, Ragasa, C, Evangelista, J.(2018). Mathematics in the Modern World. C & E.
Publishing:Quezon City Philippines.
Concepcio, Benjamin P. et.al. Business Statistics with Computer Applications. Sta. Monica Printing
Corp.: Manila, Philippines.
Calano, Roel B., et.al. (2009). Biostatistics. (1st ed) Educational Publishing House: Ermita, Manila,
Philippines
