Lesson 2 - Data Collection Organization and Presentation
INTRODUCTION:
In this lesson, we will discuss the different ways to collect, organize
and present data so that they can be easily understood by the reader. Data
collection, organization and presentation are important processes that you must
learn in order to interpret data correctly. You will also learn how to construct
a frequency distribution and a stem-and-leaf display.
OBJECTIVES:
At the end of this lesson, you should be able to:
1. Identify, compare and contrast the different sources of data;
2. Summarize and present data in different forms; and
3. Construct a frequency distribution and a stem-and-leaf display.
Collection of Data
The goal of every statistical study is to collect information that will
be used in making decisions. A decision made from the results of any
statistical study is only as good as the process used to obtain the
information. If the process is flawed, then the resulting decision is
questionable.
Kinds of Survey
a. Census – method of gathering the facts of interest or pertinent data on
every unit of the population.
b. Sample Survey – method by which data from a small but representative
cross-section of the population are scientifically collected and analyzed.
Classification of Data
1. Primary data – data that are collected directly from the
subjects/objects of the study. These subjects/objects may be people,
experimental animals, or the environment. Because the data are
collected by the person who will use them, that person can collect
precisely the information needed. The researcher can specify the
operational definitions used and can eliminate, or at least monitor and
record, extraneous influences on the data as they are gathered. The
collection of primary data, however, is more costly and time-consuming
than the collection of secondary data.
2. Secondary data – these are previously collected data that are found in
publications of both government and non-government institutions, research
papers, books, periodicals, pamphlets, computer files, microfilms or the
internet. Secondary data are useful in that:
i) They fill a need for a specific reference on some point.
ii) They act as reference benchmarks against which other findings may
be tested.
iii) They can provide background material on what has already been
done, thereby providing direction on what further needs to be done.
iv) They can be used as a basis for a research study, as in validating a
research methodology or a mathematical model.
Sampling
I. Slovin's formula
$$n = \frac{N}{1 + Ne^2}$$
where:
n = sample size
N = population size
e = desired margin of error (percent allowance for non-precision because
of the use of a sample instead of the population)
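Assuming the sample-size formula that goes with these definitions is Slovin's formula, n = N / (1 + Ne²), the computation can be sketched in Python. The margin of error e is written as a proportion (5% becomes 0.05). Rounding the result up to the next whole respondent is an assumed convention (some texts round to the nearest integer instead), and `slovin_sample_size` is a hypothetical helper name.

```python
import math


def slovin_sample_size(N: int, e: float) -> int:
    """Sample size n = N / (1 + N * e**2), rounded up (assumed convention).

    N -- population size
    e -- desired margin of error as a proportion, e.g. 0.05 for 5%
    """
    if N <= 0 or not 0 < e < 1:
        raise ValueError("N must be positive and e must be between 0 and 1")
    return math.ceil(N / (1 + N * e ** 2))


# Example: a population of 1,500 with a 5% margin of error.
print(slovin_sample_size(1500, 0.05))  # 1500 / 4.75 = 315.79..., rounded up to 316
```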
II. According to Gay and Mills (2016), the larger the population size, the
smaller the percentage of the population required to get a
representative sample.
For smaller populations, say, N = 100 or fewer, there is little point in
sampling; survey the entire population.
If the population size is around 500 (give or take 100), 50% should be
sampled.
If the population size is around 1,500, 20% should be sampled.
Beyond a certain point (about N = 5,000), the population size is almost
irrelevant and a sample size of 400 will be adequate.
Sampling Techniques
A. Probability or Random Sampling is a procedure wherein every
element of the population is given an equal chance of being selected
in the sample.
Step 3. Starting from the random number obtained, select for the
sample every kth member of the population.
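Only Step 3 of this systematic-sampling procedure appears here, so the sketch below fills in the usual earlier steps as assumptions: the sampling interval is k = N / n (rounded down) and the random start is a whole number from 1 to k. `systematic_sample` is a hypothetical name.

```python
import random


def systematic_sample(population, n, rng=random):
    """Select n members: a random start in 1..k, then every kth member after it.

    Assumes k = N // n (the earlier steps are not shown in the lesson).
    """
    N = len(population)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the population size")
    k = N // n                      # sampling interval
    start = rng.randint(1, k)       # random start, as a 1-based position
    positions = range(start, N + 1, k)
    # When N is not a multiple of n, more than n positions can fit; keep the first n.
    return [population[p - 1] for p in positions][:n]


# Example: 10 respondents from a numbered list of 100 people (k = 10).
people = list(range(1, 101))
print(systematic_sample(people, 10, random.Random(7)))
```

Passing a seeded `random.Random` object makes the selection reproducible; omitting it uses the module-level generator.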
Group     N       Relative Weight       Sample Needed (n)
Male      1000    1000/1500 = .67       .67 × 120 = 80.4 or 80
Female    500     500/1500 = .33        .33 × 120 = 39.6 or 40
Total     1500                          120
You will randomly select 80 of the 1,000 male employees and 40 of the 500
female employees as the respondents for the study you will be conducting.
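The proportional-allocation arithmetic in the table can be sketched as follows. This sketch computes each group's share as size × n / N without first rounding the weights to two decimals; here it gives the same 80 and 40. Rounding each group separately does not always add up exactly to n in other cases. `proportional_allocation` is a hypothetical name.

```python
def proportional_allocation(group_sizes: dict, n: int) -> dict:
    """Allocate a total sample of n in proportion to each group's size."""
    N = sum(group_sizes.values())
    # size * n / N is the relative weight (size / N) times the sample size n
    return {group: round(size * n / N) for group, size in group_sizes.items()}


print(proportional_allocation({"Male": 1000, "Female": 500}, 120))
# {'Male': 80, 'Female': 40}
```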
b. Equal Allocation is done by choosing the same number of samples
from each group regardless of its size. Say the student government
president would like to see whether the opinions of first-year students
differ from those of second-year students. If the required sample size is
100 students, then each year level should have 50 randomly selected
respondents.
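A sketch of equal allocation with random selection inside each group follows. The enrollment lists and their sizes (300 first-year and 250 second-year students) are hypothetical; only the rule of taking the same number, n divided by the number of groups, from each group comes from the text. If n does not divide evenly, this sketch simply drops the remainder.

```python
import random


def equal_allocation_sample(groups: dict, n: int, rng=random) -> dict:
    """Randomly select the same number of respondents from every group."""
    per_group = n // len(groups)    # e.g. 100 // 2 = 50
    # sample() draws without replacement, so no student is picked twice
    return {name: rng.sample(members, per_group) for name, members in groups.items()}


# Hypothetical enrollment lists, identified by ID strings.
students = {
    "First year": [f"1Y-{j:03d}" for j in range(300)],
    "Second year": [f"2Y-{j:03d}" for j in range(250)],
}
picked = equal_allocation_sample(students, 100, random.Random(1))
print({year: len(ids) for year, ids in picked.items()})
# {'First year': 50, 'Second year': 50}
```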
Data Presentation
There are several ways to present data. These are textual, tabular and
graphical presentations.
In the textual form, the researcher uses sentences to convey the
information contained in the data, including only the important figures. The
textual form of presentation is often seen in news reports. For example, here is
an excerpt from a news article:
“The new positives increased the region’s cumulative total to 6,884 with
3,194 active cases. Bacolod City still logged the highest number of new cases
with 106 while Negros Occidental has 53. Iloilo City has 45; Iloilo province, 25;
Capiz, 20; Aklan has two; and one each in Antique and Guimaras. Local cases
totaled 238, of whom 11 are locally stranded individuals (LSIs), two are returning
overseas Filipinos, and another two are authorized persons outside of residence.”
(Source: https://www.pna.gov.ph/articles/1115061)
In the tabular form, the data are presented in rows and columns. This
systematic arrangement of data is called a statistical table. Through this
presentation, data can easily be understood. In addition, you can easily compare
and contrast the data.
A good statistical table has four essential parts:
1. Table heading – includes the table number and table title. The title
should briefly explain the contents of the table.
2. Stub – the items or classifications written in the first column; it
identifies what is contained in the rows.
3. Caption or box head – the items or classifications written in the first
row; it identifies what is contained in the columns.
4. Body – the main part of the table; it contains the substance or the
figures of one's data.
In the construction of a table, the following guidelines should prove
helpful.
1. Every table must be self-explanatory.
2. The title should be clear and descriptive.
3. The title gives information about what, where, how, and when the data
were taken.
Philippines                                                       1,798,227.00        65,123,660.00
Food and Live Animals                                               654,269.00  36.4  21,861,286.00  33.6
Beverages and Tobacco                                                63,843.00   3.6   1,600,578.00   2.5
Crude Materials, Inedible, Except Fuels                             110,882.06   6.2     812,969.00   1.2
Mineral Fuels, Lubricants and Related Materials                      45,979.00   2.6   2,198,818.00   3.4
Animal and Vegetable Oils, Fats and Waxes                            24,288.00   1.4   1,092,686.00   1.7
Chemical and Related Products, n. e. c.                             102,338.00   5.7   4,878,398.00   7.5
Manufactured Goods Classified Chiefly by Material                   293,974.00  16.3  12,437,632.00  19.1
Machinery and Transport Equipment                                   169,139.00   9.4  14,236,488.00  21.9
Miscellaneous Manufactured Articles                                   6,573.00   0.4     357,051.00   0.5
Commodities and Transactions not Elsewhere Classified in the PSCC   326,942.00  18.2   5,647,754.00   8.7
(Stub: the first column, listing the commodity groups. Body: the figures.)
(Source: https://psa.gov.ph)
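The percent columns in this table appear to be each commodity group's share of the Philippines total. The sketch below recomputes the first percent column from the first numeric column as a check; the shortened dictionary labels and the helper name `percent_shares` are chosen only for this example.

```python
FIRST_COLUMN = {
    "Food and Live Animals": 654_269.00,
    "Beverages and Tobacco": 63_843.00,
    "Crude Materials": 110_882.06,
    "Mineral Fuels": 45_979.00,
    "Animal and Vegetable Oils": 24_288.00,
    "Chemicals": 102_338.00,
    "Manufactured Goods": 293_974.00,
    "Machinery and Transport": 169_139.00,
    "Misc. Manufactured Articles": 6_573.00,
    "Not Elsewhere Classified": 326_942.00,
}
TOTAL = 1_798_227.00  # the "Philippines" row


def percent_shares(values: dict, total: float, ndigits: int = 1) -> dict:
    """Each value as a percentage of the total, rounded to one decimal like the table."""
    return {label: round(v / total * 100, ndigits) for label, v in values.items()}


for label, pct in percent_shares(FIRST_COLUMN, TOTAL).items():
    print(f"{label:<30}{pct:>6}")
```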
Types of Graphs
Line graph
The line graph shows the relationship between two sets of quantities.
A line graph is similar to a graph drawn on the Cartesian plane, where points are
plotted using vertical and horizontal axes. The points are then connected with
line segments to form the line graph. A line graph is appropriate for showing
trends over a long period of time. Example of line graph:
Bar graph
The bar graph consists of vertical or horizontal bars of equal widths. The
length of the bars represents the magnitudes of the quantities being compared.
This type of graph is most appropriate for comparing data at a particular time.
[Figure: employment rate (percent, 0 to 100) for Jan 2017, Jan 2018 and Apr 2018]
Pictograph
Another way of representing numerical values is through the use of
pictographs or picture graphs. In this type of chart, actual pictures or facsimiles
of the objects under study are used to represent values. Each figure is
considered a unit representing a definite number.
Generating Graphs Using Statistical Software
Using Excel
1. Open the MS Excel program.
2. Enter the data below:
4. Click "Insert".
5. Highlight the data.
6. Click the desired format of bar graph.
[Figure: bar graph titled "No. of Births in Antique from Jan-Mar 2020", one bar per municipality: Tobias Fornier…, Bugasong, Libertad, San Remigio, Culasi, Anini-y, Laua-an, Sibalom, Valderrama, Tibiao, Barbaza, Pandan, Belison, Caluya, Sebaste, Hamtic; vertical axis from 0 to 180]
Stem   Leaf
0      2234557778888899
1      0112223335667788899
2      01122236
3      0
Standard MS Excel has no built-in function for creating a stem-and-leaf
display. However, activating the MegaStat add-in gives MS Excel additional
functions, so it can also compute more advanced statistical tests.
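Since plain Excel lacks a stem-and-leaf function, here is a minimal Python sketch for non-negative whole numbers, using the tens as the stem and the units digit as the leaf (an assumption; data on other scales need a different stem unit). For illustration it is applied to the 40 observations from the frequency-distribution example in this lesson; `stem_and_leaf` is a hypothetical name.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Map each stem (tens) to its sorted leaves (units digits) as a string.

    Assumes non-negative whole numbers; stems with no leaves are kept so gaps show.
    """
    leaves = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        leaves[stem].append(str(leaf))
    return {s: "".join(leaves.get(s, [])) for s in range(min(leaves), max(leaves) + 1)}


DATA = [10, 16, 18, 20, 23, 26, 29, 33,
        12, 16, 18, 22, 25, 26, 30, 33,
        12, 17, 20, 22, 25, 27, 30, 35,
        15, 18, 20, 23, 25, 28, 32, 36,
        15, 18, 20, 23, 26, 28, 32, 38]

for stem, leaf_digits in stem_and_leaf(DATA).items():
    print(f"{stem} | {leaf_digits}")
# 1 | 022556678888
# 2 | 0000223335556667889
# 3 | 002233568
```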
Frequency Distribution
A frequency distribution is an arrangement of data that shows the number
of times, or frequency, that each different value of a variable occurs.
There are two types of frequency distribution. Qualitative frequency
distributions are usually constructed for discrete variables, while quantitative
frequency distributions are usually constructed for continuous variables.
Year Level F
First Year 40
Second Year 80
Third Year 75
Fourth Year 30
Class Intervals F
90 – 94 35
85 – 89 40
80 – 84 55
75 – 79 25
70 – 74 5
2. Solve for the number of classes or class intervals, k. Two formulas can be
used:
i. If k represents the number of classes and N the total number of
observations, then k is the smallest exponent of 2 such that $2^k \ge N$.
ii. Sturges' Formula
$$k = 1 + 3.322 \log N$$
3. Determine the class size, i, by dividing the range R by the number of classes k:
$$i = \frac{R}{k}$$
The value obtained from this formula should be rounded off to a
more convenient value based on the preference of the investigator.
Adding the class size i to the lower class limit of the preceding class
interval gives the succeeding lower limits. Upper class limits (UL) are obtained
using the following formula:
UL = LL + i – 1 unit of measure
5. Count the number of observations that fall in each of the class intervals.
10 16 18 20 23 26 29 33
12 16 18 22 25 26 30 33
12 17 20 22 25 27 30 35
15 18 20 23 25 28 32 36
15 18 20 23 26 28 32 38
We can use:
N = 40
$2^4 = 16$, $2^5 = 32$, $2^6 = 64$
With this rule, we cannot use k = 5 because $2^5 = 32 < 40$. We use
k = 6 because $2^6 = 64 > 40$, which satisfies the condition stated
above.
or
$$k = 1 + 3.322 \log N$$
$$k = 1 + 3.322 \log 40$$
$$k = 6.32 \approx 6$$
Step 3. Determine the class size i.
$$i = \frac{R}{k} = \frac{38 - 10}{6} = \frac{28}{6} = 4.67 \approx 5$$
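The computations for the number of classes and the class size can be checked with a short Python sketch. The function names are hypothetical; the range is taken as R = highest − lowest value, as in the worked example, and rounding i up with `math.ceil` is just one convenient choice for reaching 5.

```python
import math


def classes_by_powers_of_two(N: int) -> int:
    """Smallest k such that 2**k >= N."""
    k = 0
    while 2 ** k < N:
        k += 1
    return k


def classes_by_sturges(N: int) -> float:
    """Sturges' formula, k = 1 + 3.322 log10 N (before rounding)."""
    return 1 + 3.322 * math.log10(N)


def class_size(highest: float, lowest: float, k: int) -> float:
    """i = R / k, where the range R = highest - lowest (before rounding)."""
    return (highest - lowest) / k


N = 40
print(classes_by_powers_of_two(N))      # 6, since 2**5 = 32 < 40 <= 64 = 2**6
print(round(classes_by_sturges(N), 2))  # 6.32
i = class_size(38, 10, 6)
print(round(i, 2), math.ceil(i))        # 4.67 5
```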
To do this, here is the trick:
2. List all the lower limits of all the class steps. Start with the lower limit of
the lowest class step and add i (class size) to obtain the lower limit of the
next class step.
3. Determine all the upper limits of all the class steps. Start with the upper
limit of the lowest class step by applying the formula:
UL = LL + i - 1
UL = 10 + 5 - 1 = 14
Then add the class size (i) to each upper limit to obtain the upper limit of the
next class step.
Class Intervals   Tally   f
35 – 39
30 – 34
25 – 29
20 – 24
15 – 19
10 – 14
(Moving up the table, each lower limit and each upper limit increases by i = 5.)
Step 5. Count the number of observations that fall in each of the class intervals.
Class Intervals Tally f
35 – 39 3
30 – 34 6
25 – 29 10
20 – 24 9
15 – 19 9
10 – 14 3
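The class limits and counts in the last two steps can be reproduced with the sketch below. It assumes whole-number data (so one unit of measure is 1) and the choices made above: lowest lower limit 10, i = 5 and k = 6. The function names are hypothetical; the resulting frequencies match the Step 5 table.

```python
def class_intervals(lowest_ll: int, i: int, k: int) -> list:
    """(LL, UL) pairs: each LL is the previous LL + i, and UL = LL + i - 1."""
    return [(lowest_ll + j * i, lowest_ll + j * i + i - 1) for j in range(k)]


def frequency_distribution(data, intervals) -> list:
    """Count how many observations fall in each class (limits inclusive)."""
    return [(ll, ul, sum(ll <= x <= ul for x in data)) for ll, ul in intervals]


DATA = [10, 16, 18, 20, 23, 26, 29, 33,
        12, 16, 18, 22, 25, 26, 30, 33,
        12, 17, 20, 22, 25, 27, 30, 35,
        15, 18, 20, 23, 25, 28, 32, 36,
        15, 18, 20, 23, 26, 28, 32, 38]

table = frequency_distribution(DATA, class_intervals(10, 5, 6))
for ll, ul, f in reversed(table):  # highest class first, as in the lesson
    print(f"{ll} - {ul}\t{f}")
print("Total", sum(f for _, _, f in table))
```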
Histogram
The histogram is a series of columns or vertical rectangles, each having as
its base one class interval, and the frequency or number of cases in that class as
its height. Using the frequency distribution given above, the histogram looks like
this:
[Figure: Frequency Histogram]
Frequency polygon
The frequency polygon is the graph of the class mark against the frequency.
The shape of the histogram or the frequency polygon gives an idea of the shape
of the distribution. Using the same frequency distribution as above, the
frequency polygon generated is presented below.
[Figure: Frequency Polygon]
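The points of the frequency polygon are (class mark, frequency) pairs, where the class mark is the midpoint of a class, (LL + UL) / 2. A minimal sketch using the frequency distribution above; `class_marks` is a hypothetical helper name.

```python
INTERVALS = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
FREQUENCIES = [3, 9, 9, 10, 6, 3]


def class_marks(intervals):
    """Midpoint of each class: (lower limit + upper limit) / 2."""
    return [(ll + ul) / 2 for ll, ul in intervals]


polygon_points = list(zip(class_marks(INTERVALS), FREQUENCIES))
print(polygon_points)
# [(12.0, 3), (17.0, 9), (22.0, 9), (27.0, 10), (32.0, 6), (37.0, 3)]
```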
Cumulative frequency distribution
A cumulative frequency distribution, whose graph is called an ogive, tells us how
many observations lie above or below certain values rather than merely recording
the number of observations within intervals. There are two types of ogives: the
“less than” and the “greater than” ogives. The less than ogive is a graph
showing how many values are below each upper class boundary. The greater
than ogive is a graph showing how many values are above each lower class
boundary.
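Both ogives can be tabulated from the frequency distribution in this lesson. This sketch assumes whole-number data, so class boundaries lie half a unit (0.5) outside the class limits: the less than ogive pairs each upper boundary with the cumulative count up to it, and the greater than ogive pairs each lower boundary with the count of observations above it. Function names are hypothetical.

```python
from itertools import accumulate

INTERVALS = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
FREQUENCIES = [3, 9, 9, 10, 6, 3]
HALF_UNIT = 0.5  # assumption: whole-number data, so boundaries are limits -/+ 0.5


def less_than_ogive(intervals, freqs):
    """(upper boundary, number of observations below it)."""
    uppers = [ul + HALF_UNIT for _, ul in intervals]
    return list(zip(uppers, accumulate(freqs)))


def greater_than_ogive(intervals, freqs):
    """(lower boundary, number of observations above it)."""
    lowers = [ll - HALF_UNIT for ll, _ in intervals]
    total = sum(freqs)
    below = [0, *accumulate(freqs)][:-1]  # observations in all lower classes
    return [(b, total - c) for b, c in zip(lowers, below)]


print(less_than_ogive(INTERVALS, FREQUENCIES))
# [(14.5, 3), (19.5, 12), (24.5, 21), (29.5, 31), (34.5, 37), (39.5, 40)]
print(greater_than_ogive(INTERVALS, FREQUENCIES))
# [(9.5, 40), (14.5, 37), (19.5, 28), (24.5, 19), (29.5, 9), (34.5, 3)]
```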