Lesson2 - Data Collection Organization and Presentation
INTRODUCTION:
In this lesson, we will discuss the different ways to collect, organize,
and present data so that readers can easily understand it. Data
collection, organization, and presentation are important processes that you must
learn in order to interpret data correctly. You will also learn
how to construct a frequency distribution and a stem-and-leaf display.
OBJECTIVES:
At the end of this lesson, you should be able to:
1. Identify, compare and contrast the different sources of data;
2. Summarize and present data in different forms; and
3. Construct a frequency distribution and a stem-and-leaf display.
Collection of Data
The goal of every statistical study is to collect information that will
be used in making decisions. A decision based on the results of a
statistical study is only as good as the process used to obtain the
information. If the process is flawed, then the resulting decision is
questionable.
Kinds of Survey
a. Census – method of gathering the facts of interest or pertinent data on
every unit of the population.
b. Sample Survey – method by which data from a small but representative
cross-section of the population are scientifically collected and analyzed.
Classification of Data
1. Primary data – data that are collected directly from the
subjects/objects of the study. These subjects/objects may be people,
experimental animals, or the environment. Because the data are
collected by the individual who will use them, that individual can
collect precisely the information that he wants. He can
specify the operational definitions used and can eliminate, or at least
monitor and record, the extraneous influences on the data as they are
gathered. The collection of primary data, however, is costly and
time-consuming compared to that of secondary data.
Some confidential information can be tactfully collected from the
informants.
Disadvantages of Primary Data
Methods of primary data collection are expensive and
time-consuming.
The personal prejudice and bias of the investigator or enumerator can
defeat the purpose of the survey.
The data may not be reliable if collected carelessly.
2. Secondary data – these are previously collected data that are found in
publications of both government and non-government institutions, research
papers, books, periodicals, pamphlets, computer files, microfilms or the
internet. Secondary data are useful in that
i) They fill a need for a specific reference on some point
ii) They act as reference benchmarks against which other findings may
be tested
iii) They can provide background material on what has already been
done, thereby providing direction on what still needs to be done
iv) They can be used as basis for a research study as in validating a
research methodology or a mathematical model
Sampling
Determining the Sample Size
There is no absolute formula in determining the sample size.
However, there are guidelines on how to identify the appropriate sample
size depending on the known information about the population under
investigation and type of research design used. Here are some of the
guidelines:
I. Slovin’s Formula. This formula is primarily used in descriptive
studies where the population size is known and the margin of error is
set in advance.
$$n = \frac{N}{1 + N e^{2}}$$
where:
n = sample size
N = population size
e = desired margin of error (percent allowance for non-
precision because of the use of the sample instead of
the population)
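The formula above can be sketched in Python as follows. This is only an illustration: the function name, the choice to round up to the next whole respondent, and the example values N = 1,500 and e = 0.05 are assumptions, not part of the lesson.

```python
import math


def slovin_sample_size(population_size, margin_of_error):
    """Return n = N / (1 + N * e^2), rounded up to a whole respondent.

    Rounding up is an assumption made here so that the sample is never
    smaller than the formula requires.
    """
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be a proportion between 0 and 1")
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)


# Hypothetical example: N = 1,500 and a 5% margin of error.
example_n = slovin_sample_size(1500, 0.05)
print(example_n)  # 1500 / 4.75 = 315.79, rounded up to 316
```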
II. According to Gay and Mills (2016), the larger the population size, the
smaller the percentage of the population required to get a
representative sample.
For smaller populations, say, N = 100 or fewer, there is little point in
sampling; survey the entire population.
If the population size is around 500 (give or take 100), 50% should be
sampled.
If the population size is around 1,500, 20% should be sampled.
Beyond a certain point (about N = 5,000), the population size is almost
irrelevant and a sample size of 400 will be adequate.
Sampling Techniques
A. Probability or Random Sampling is a procedure wherein every
element of the population is given an equal chance of being selected
in the sample.
Proportional Allocation – This process chooses sample sizes
proportional to the sizes of the different subgroups or strata.
Equal Allocation – This process chooses the same number of
samples from each group regardless of its size.
Group    N       Relative Weight      Sample needed (n)
Male     1000    1000/1500 = .67      .67*120 = 80.4 or 80
You will then randomly select 80 male employees from the list of 1,000
and 40 female employees from the list of 500 as the respondents
for the study you will be conducting.
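The proportional allocation in this example can be checked with a short Python sketch. The function name is hypothetical; the stratum sizes (1,000 male and 500 female employees) and the total sample of 120 are taken from the example. Because each stratum is rounded separately, the allocations may not always add up exactly to the total sample size for other inputs.

```python
from typing import Dict


def proportional_allocation(strata_sizes: Dict[str, int], sample_size: int) -> Dict[str, int]:
    """Allocate the sample to each stratum in proportion to its size."""
    total = sum(strata_sizes.values())
    return {
        name: round(size * sample_size / total)  # relative weight times n
        for name, size in strata_sizes.items()
    }


strata = {"Male": 1000, "Female": 500}
allocation = proportional_allocation(strata, 120)
print(allocation)  # {'Male': 80, 'Female': 40}
```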
b. Equal Allocation can be done by choosing the same number of samples
from each group regardless of its size. Say the student government
president would like to see if the opinions of the first-year students differ
from those of the second-year students. If the required sample size is 100
students, then each year level should have 50 randomly selected
respondents.
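A minimal sketch of equal allocation in Python is shown below. The rosters and their sizes (300 first-year and 200 second-year students) and the student IDs are hypothetical, as is the function name; only the total of 100 respondents split into 50 per year level comes from the example.

```python
import random


def equal_allocation_sample(groups, total_sample_size, seed=None):
    """Draw the same number of random respondents from every group."""
    per_group, remainder = divmod(total_sample_size, len(groups))
    if remainder:
        raise ValueError("total_sample_size must divide evenly among the groups")
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return {
        name: rng.sample(members, per_group)  # sampling without replacement
        for name, members in groups.items()
    }


# Hypothetical rosters of student IDs for each year level.
groups = {
    "First Year": [f"FY-{i:03d}" for i in range(1, 301)],
    "Second Year": [f"SY-{i:03d}" for i in range(1, 201)],
}
sample = equal_allocation_sample(groups, 100, seed=1)
print({name: len(chosen) for name, chosen in sample.items()})
```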
Step 3. Use all the members of the clusters obtained in step 2 as the
sample.
Data Presentation
There are several ways to present data. These are textual, tabular and
graphical presentations.
In the textual form, the researcher uses sentences to convey the
information contained in the data, incorporating only the important
figures. The textual form of presentation can be seen in news reports. For example, here is an
excerpt from a news article:
“The new positives increased the region’s cumulative total to 6,884 with
3,194 active cases. Bacolod City still logged the highest number of new cases
with 106 while Negros Occidental has 53. Iloilo City has 45; Iloilo province, 25;
Capiz, 20; Aklan has two; and one each in Antique and Guimaras. Local cases
totaled 238, of whom 11 are locally stranded individuals (LSIs), two are returning
overseas Filipinos, and another two are authorized persons outside of residence.”
(Source: https://www.pna.gov.ph/articles/1115061)
In the tabular form, the data are presented in rows and columns. This
systematic arrangement of data is called a statistical table. Through this
presentation, data can easily be understood. In addition, you can easily compare
and contrast the data.
1. Table heading – includes the table number and table title. The title
should briefly explain the contents of the table.
2. Stub – the items or classifications written in the first column, which identify
what is written in the rows.
3. Caption or box head – the items or classifications written in
the first row, which identify what is contained in the columns.
4. Body – the main part of the table, which contains the substance or the
figures of one’s data.
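Table 1. Quantity and Value of Domestic Trade by Mode of
Commodity Section: Second Quarter 2020
[Sample table annotated with its parts, including the table heading and the caption or box head. The surviving row is for the Philippines, with the figures 1,798,227.00 and 65,123,660.00.]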
Types of Graphs
Line graph
The line graph shows the relationship between two sets of quantities.
It is similar to a graph drawn in the Cartesian plane, where the points are
plotted using vertical and horizontal axes. The points are then connected with a
line, which makes up the line graph. A line graph is appropriate for variables that
show trends over a long period of time. Example of a line graph:
Bar graph
The bar graph consists of vertical or horizontal bars of equal width. The
lengths of the bars represent the magnitudes of the quantities being compared.
This type of graph is most appropriate for comparing data at a particular time.
[Figure: bar graph comparing the labor force participation rate, employment rate, unemployment rate, and underemployment rate for Jan 2017, Jan 2018, and Apr 2018, on a vertical scale from 0.0 to 100.0]
Pie chart or pie graph
The pie chart or pie graph is appropriate in comparing the parts with the whole.
[Figure: pie chart for Apr 2018. Employment rate, 53%; labor force participation rate, 34%; underemployment rate, 10%; unemployment rate, 3%]
Pictograph
Another way of representing numerical values is through the use of
pictographs or picture graphs. In this type of chart, actual pictures or facsimiles
of the objects under study are used to represent values. Each figure is
considered a unit representing a definite number.
Generating Graphs Using Statistical Software
Using Excel
1. Open MS Excel program
2. Enter the data below:
4. Click “Insert”
5. Highlight the data.
6. Click the desired format of bar graph
After this series of steps, the result will look like this.
[Figure: the resulting bar graph generated in Excel]
The first column holds the tens digit (the stem), followed by a vertical bar.
Single-digit values are recorded in the stem whose tens digit is 0. Thus, 2 is written as
0 | 2. The first two-digit number is 10, which is recorded in the stem whose tens digit
is 1. Thus, 10, 11, 11, 12, 12, 12 are written as 1 | 0 1 1 2 2 2. Below are
the data and their complete stem-and-leaf diagram.
2 5 8 9 12 13 17 19 22
2 7 8 9 12 15 18 20 22
3 7 8 10 12 16 18 21 23
4 7 8 11 13 16 18 21 26
5 8 8 11 13 17 19 22 30
0 | 2 2 3 4 5 5 7 7 7 8 8 8 8 8 8 9 9
1 | 0 1 1 2 2 2 3 3 3 5 6 6 7 7 8 8 8 9 9
2 | 0 1 1 2 2 2 3 6
3 | 0
10 16 18 20 23 26 29 33
12 16 18 22 25 26 30 33
12 17 20 22 25 27 30 35
15 18 20 23 25 28 32 36
15 18 20 23 26 28 32 38
1 | 0 2 2 5 5 6 6 7 8 8 8 8
2 | 0 0 0 0 2 2 3 3 3 5 5 5 6 6 6 7 8 8 9
3 | 0 0 2 2 3 3 5 6 8
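The construction of the display above can be sketched in Python. This is a minimal illustration that assumes non-negative whole numbers with the tens digit as the stem; the function name is hypothetical, and the data are the 40 values listed just before the display.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (ones digits)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        stems[stem].append(leaf)
    return dict(stems)


data = [
    10, 16, 18, 20, 23, 26, 29, 33,
    12, 16, 18, 22, 25, 26, 30, 33,
    12, 17, 20, 22, 25, 27, 30, 35,
    15, 18, 20, 23, 25, 28, 32, 36,
    15, 18, 20, 23, 26, 28, 32, 38,
]
display = stem_and_leaf(data)
for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem} | {leaves}")
```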
Frequency Distribution
A frequency distribution is an arrangement of data that shows the number
of times or frequency of occurrence of the different values of the variables.
There are two types of frequency distribution. Qualitative frequency
distributions are usually constructed for discrete variables, while quantitative
frequency distributions are usually constructed for continuous variables.
Year Level f
First Year 40
Second Year 80
Third Year 75
Fourth Year 30
Class Intervals f
90 – 94 35
85 – 89 40
80 – 84 55
75 – 79 25
70 – 74 5
2. Solve for the number of classes or class intervals, k. Two formulas can be
used:
i. If k represents the number of classes and N the total number of
observations, then k is the smallest exponent of the
number 2 such that $2^k \ge N$.
ii. $k = 1 + 3.322 \log N$
3. Determine the class size i.
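$$i = \frac{R}{k}$$
where R is the range of the data (highest value minus lowest value) and k is the number of classes.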
The value obtained from this formula should be rounded off to a
more convenient value based on the preference of the investigator.
UL = LL + i – 1 unit of measure
5. Count the number of observations that fall in each of the class intervals.
10 16 18 20 23 26 29 33
12 16 18 22 25 26 30 33
12 17 20 22 25 27 30 35
15 18 20 23 25 28 32 36
15 18 20 23 26 28 32 38
We can use:
N = 40
$2^4 = 16$, $2^5 = 32$, $2^6 = 64$
In this formula, we cannot use k = 5 because $2^5 = 32 < 40$. We will
use k = 6 because $2^6 = 64 \ge 40$.
k = 1 + 3.322 log N
k = 1 + 3.322 log 40
k = 6.32 ≈ 6
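Both ways of finding k can be checked with a short Python sketch. The function names are hypothetical, and the logarithm is assumed to be base 10, which is what reproduces the value 6.32 for N = 40.

```python
import math


def classes_by_power_of_two(n_observations):
    """Return the smallest k such that 2**k >= N."""
    k = 0
    while 2 ** k < n_observations:
        k += 1
    return k


def classes_by_log_formula(n_observations):
    """Return k = 1 + 3.322 log N, using the base-10 logarithm."""
    return 1 + 3.322 * math.log10(n_observations)


k_power = classes_by_power_of_two(40)
k_log = classes_by_log_formula(40)
print(k_power)          # 6, because 2**5 = 32 < 40 <= 64 = 2**6
print(round(k_log, 2))  # 6.32, which rounds to 6 classes
```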
Step 3. Determine the class size i.
$$i = \frac{R}{k} = \frac{38 - 10}{6} = \frac{28}{6} = 4.67 \approx 5$$
Step 4. Determine and enumerate the classes.
To do this, here is the trick:
1. Construct a table containing columns such as class intervals, tally, and frequency.
2. List the lower limits of all the class steps. Start with the lower limit of the
lowest class step and add i (the class size) to obtain the lower limit of the next
class step.
Step 5. Count the number of observations that fall in each of the class intervals.
Class Intervals    Tally    f     <cf    >cf    Class Mark
35 – 39                     3     40      3     37
30 – 34                     6     37      9     32
25 – 29                    10     31     19     27
20 – 24                     9     21     28     22
15 – 19                     9     12     37     17
10 – 14                     3      3     40     12
                       N = 40
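The table above can be reproduced with the following Python sketch, which applies the class size i = 5 and six classes starting at a lower limit of 10. The function and field names are hypothetical, and whole-number data with a unit of measure of 1 are assumed.

```python
def frequency_table(values, lowest_lower_limit, class_size, n_classes):
    """Build rows of class limits, f, <cf, >cf, and class mark (lowest class first)."""
    rows = []
    running_total = 0
    for c in range(n_classes):
        lower = lowest_lower_limit + c * class_size
        upper = lower + class_size - 1  # UL = LL + i - 1 unit of measure
        f = sum(lower <= v <= upper for v in values)
        running_total += f
        rows.append({
            "interval": (lower, upper),
            "f": f,
            "less_cf": running_total,          # observations at or below UL
            "class_mark": (lower + upper) / 2,
        })
    for row in rows:
        # Observations at or above the lower limit of this class.
        row["greater_cf"] = running_total - row["less_cf"] + row["f"]
    return rows


data = [
    10, 16, 18, 20, 23, 26, 29, 33,
    12, 16, 18, 22, 25, 26, 30, 33,
    12, 17, 20, 22, 25, 27, 30, 35,
    15, 18, 20, 23, 25, 28, 32, 36,
    15, 18, 20, 23, 26, 28, 32, 38,
]
table = frequency_table(data, lowest_lower_limit=10, class_size=5, n_classes=6)
for row in reversed(table):  # highest class first, as in the table above
    lower, upper = row["interval"]
    print(lower, "-", upper, row["f"], row["less_cf"], row["greater_cf"], row["class_mark"])
```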
Histogram
The histogram is a series of columns or vertical rectangles, each having as
its base one class interval, and the frequency or number of cases in that class as
its height. Using the frequency distribution given above, the histogram looks like
this:
[Figure: Frequency Histogram]
Frequency polygon
The frequency polygon is the graph of the class mark against the frequency.
The shape of the histogram or the frequency polygon gives an idea of the shape
of the distribution. Using the same frequency distribution as above, the
frequency polygon generated is presented below.
[Figure: Frequency Polygon]
Cumulative frequency distribution
A cumulative frequency distribution, graphed as an ogive, tells us how many
observations lie above or below certain values rather than merely recording the
number of observations within intervals. There are two types of ogives: the
“less than” and the “greater than” ogives. The less than ogive is a graph
showing how many values are below a certain upper class boundary. The
greater than ogive is a graph showing how many values are above a certain
lower class boundary.