
Lesson 2 - Data Collection Organization and Presentation


Lesson 2

Data Collection, Organization and Presentation

INTRODUCTION:
In this lesson, we will be discussing the different ways to collect, organize
and present data so that it can be easily understood by the reader. Data
collection, organization and presentation are important processes that you must
learn in order to interpret data correctly. You will also learn how to
construct a frequency distribution and a stem-and-leaf display.

OBJECTIVES:
At the end of this lesson, you should be able to:
1. Identify, compare and contrast the different sources of data;
2. Summarize and present data in different forms; and
3. Construct a frequency distribution and a stem-and-leaf display.

Collection of Data
The goal of every statistical study is to collect information that will
be used in making decisions. A decision made from the results of any
statistical study is only as good as the process used to obtain the
information. If the process is flawed, then the resulting decision is
questionable.

Methods of Data Collection


1. Survey - an investigation of one or more characteristics of a population.
Most often, surveys are done by asking questions, either through
self-administered questionnaires or personal interviews. Nowadays, with the
advent of technology, online surveys have become prevalent.

Kinds of Survey
a. Census – method of gathering the facts of interest or pertinent data on
every unit of the population.
b. Sample Survey – method by which data from a small but representative
cross-section of the population are scientifically collected and analyzed.

Advantages of a sample survey over a census:


a. Speed and timeliness – data collected on the sample can be gathered
faster while ensuring uniformity of data gathering procedure.
b. Economy – data gathering and analysis are cheaper.
c. Quality and accuracy – when properly conducted, a sample survey
usually yields more accurate results, since a small, highly skilled group
of workers (enumerators) is likely to make fewer errors in the
collection and handling of data than a large census force would.
d. Feasibility – for a large population, it is difficult or sometimes
impossible to gather data from every unit. It is more feasible
to gather data from a sample than from the whole population, e.g., when
testing the lifetime of a bulb.

2. Observation. Observation makes possible the recording of behavior,
but only at the time of occurrence. It is also employed when the
subjects cannot talk or write. In doing observation, the researchers
make use of their senses and observe the condition in its natural state
rather than communicating with their respondents.

3. Existing records. Data from published materials like reports, personal
files, and historical records are utilized.

4. Simulation. A simulation is the use of a mathematical or physical
model to reproduce the conditions of a situation or process. Simulations
allow you to study situations that are impractical or even dangerous to
create in real life.

5. Experiment. In performing an experiment, a treatment is applied to
part of a population and responses are observed. Data are obtained
under controlled conditions.

Classification of Data
1. Primary data – data that are collected directly from the
subjects/objects of the study. These subjects/objects may be people,
experimental animals, or the environment. Because the data are
collected by the person who will use them, that person can collect
precisely the information needed, specify the operational definitions
used, and eliminate, or at least monitor and record, the extraneous
influences on the data as they are gathered. The collection of primary
data, however, is more costly and time-consuming than the use of
secondary data.

Advantages of Primary Data

✓ The data are collected for the first time and are therefore original in
character.
✓ The information collected is accurate.
✓ More information may be collected which may prove useful at the
time of analysis and interpretation of results.
✓ Some confidential information can be tactfully collected from the
informants.

Disadvantages of Primary Data

✓ Methods of primary data collection are expensive and
time-consuming.
✓ The personal prejudice and bias of the investigator or enumerator can
defeat the purpose of the survey.
✓ The data may not be reliable if collected carelessly.

2. Secondary data – these are previously collected data that are found in
publications of both government and non-government institutions, research
papers, books, periodicals, pamphlets, computer files, microfilms or the
internet. Secondary data are useful in that:
i) They fill a need for a specific reference on some point
ii) They act as reference benchmarks against which other findings may
be tested
iii) They can provide background material on what has already been
done, thereby providing direction on what further needs to be done
iv) They can be used as a basis for a research study, as in validating a
research methodology or a mathematical model

Advantages of Secondary Data:


1. Secondary data are collected quickly and cheaply.
2. Official statistics, when available, are generally the most reliable
and are accepted by other agencies.

Disadvantages of Secondary Data:


1. Secondary data may not meet one’s specific needs (example: red tide
modeling, for which the data are not as refined as you would need them to be).
2. It is difficult to assess the accuracy of the information. Oftentimes,
definitions and units of measurement will be different, and the
timeliness of the data is often questioned.

Sampling

Generally, it is impossible to study an entire population. Researchers
typically rely on a sample to obtain the needed information or data. Thus,
it is important to obtain “good data” because the inferences made will be
based on the statistics obtained from these data.

A sampling technique is a procedure used to determine the
members of a sample. A sampling frame is a list, or set, of the elements
belonging to the population from which the sample will be drawn. To avoid
biased data, a researcher must make sure that the sample is representative
of the population.

Determining the Sample Size


There is no absolute formula in determining the sample size.
However, there are guidelines on how to identify the appropriate sample
size depending on the known information about the population under
investigation and type of research design used. Here are some of the
guidelines:
I. Slovin’s Formula. This formula is primarily used in descriptive
studies where the population size is known and the margin of error is
pre-identified.

\[ n = \frac{N}{1 + Ne^{2}} \]

where:
n = sample size
N = population size
e = desired margin of error (percent allowance for non-precision
because of the use of the sample instead of the population)
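
As a minimal sketch, Slovin's formula can be evaluated in a few lines of Python. The function name `slovin_sample_size`, the example values N = 1500 and e = 0.05, and the choice to round the result up to the next whole respondent are assumptions made for illustration; they are not part of the lesson.

```python
import math


def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Return n = N / (1 + N * e**2), rounded up to a whole respondent.

    Rounding up is an assumption of this sketch; the formula itself
    generally yields a fractional value.
    """
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be a proportion between 0 and 1")
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)


# Hypothetical example: N = 1500 with a 5% margin of error.
# 1500 / (1 + 1500 * 0.0025) = 1500 / 4.75 = 315.79, rounded up to 316.
print(slovin_sample_size(1500, 0.05))
```

Note that e is entered as a proportion (0.05), not as a percentage (5).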

II. According to Gay and Mills (2016), the larger the population size, the
smaller the percentage of the population required to get a
representative sample.
• For smaller populations, say, N = 100 or fewer, there is little point in
sampling; survey the entire population.
• If the population size is around 500 (give or take 100), 50% should be
sampled.
• If the population size is around 1,500, 20% should be sampled.
• Beyond a certain point (about N = 5,000), the population size is almost
irrelevant and a sample size of 400 will be adequate.

III. According to Fraenkel and Wallen (2011), the guidelines in
selecting a sample size are as follows:
• For a descriptive study, a sample of 100 would be enough;
• For correlational studies, a sample of 50 can establish a relationship
if there is any;
• For experimental or causal-comparative studies, 30 per group is advised,
but 15 per group is allowed if the variables are tightly controlled.

Sampling Techniques
A. Probability or Random Sampling is a procedure wherein every
element of the population is given a known, nonzero chance (in the
simplest case, an equal chance) of being selected in the sample.

B. Nonprobability or Non-random Sampling is a procedure wherein
not all the elements in the population are given a chance of being
included in the sample.

Common Methods of Probability Sampling


1. Simple Random Sampling – giving each sampling unit an equal
chance of being included in the sample.

a. Fish bowl or lottery method. This sampling technique can be done by
numbering the subjects, placing the numbered cards in a bowl, mixing
them thoroughly, and selecting as many cards as needed.

b. Table of random numbers. This can be done by using a table of
computer-generated random numbers. To select a random sample of,
say, 15 subjects out of 85 subjects, it is necessary to number each subject
from 01 to 85. Then, select a starting number by closing your eyes and
placing your finger on a number in the table; say your finger landed on 12
in the second column. Next, proceed downward until you have selected 15
different numbers between 01 and 85. When you reach the bottom of
the column, go to the top of the next column. If you select a number
greater than 85, the number 00, or a duplicate number, just omit it.
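
The same selection (15 different subjects out of 85) can be sketched with Python's standard `random` module instead of a printed table. The function name `simple_random_sample` and the seed value are assumptions for illustration only; `random.sample` draws without replacement, so duplicates and out-of-range numbers never need to be skipped by hand.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct subject numbers from 1..population_size."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    subject_numbers = range(1, population_size + 1)
    return sorted(rng.sample(subject_numbers, sample_size))


# 15 subjects out of 85, as in the example above (the seed is arbitrary).
chosen = simple_random_sample(85, 15, seed=2020)
print(chosen)
```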

2. Systematic Random Sampling – Samples are selected by taking
every kth individual from a population. The first individual selected
corresponds to a random number between 1 and k.

Procedure in doing Systematic Random Sampling

Step 1. Divide the population size by the sample size and round the
result down to the nearest whole number, k.
Step 2. Use a random-number table or a similar process to obtain a
random number between 1 and k (the random start).
Step 3. Starting from the random number obtained, select for the
sample every kth member of the population.

Example: for a population of 300 and a sample size of 60, k = 300/60 = 5.
Starting from the random start, every 5th element will belong to our
sample until we obtain 60 respondents (the sample size).
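
The three steps can be sketched in Python as follows. This is only an illustration: the function name `systematic_sample`, the seed, and the use of the numbers 1 to 300 as stand-ins for the population members are assumptions.

```python
import random


def systematic_sample(population, sample_size, seed=None):
    """Select every kth member after a random start between 1 and k."""
    rng = random.Random(seed)
    k = len(population) // sample_size       # Step 1: interval, rounded down
    if k < 1:
        raise ValueError("sample_size cannot exceed the population size")
    start = rng.randint(1, k)                # Step 2: random start in 1..k
    # Step 3: 1-based positions start, start + k, start + 2k, ...
    positions = [start + j * k for j in range(sample_size)]
    return [population[p - 1] for p in positions]


members = list(range(1, 301))                # stand-in for 300 population members
sample = systematic_sample(members, 60, seed=1)
print(sample[:5], "...", len(sample), "respondents")
```

Because the start is at most k, the last position is at most k times the sample size, so it never runs past the end of the list.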

3. Stratified Random Sampling - The population is separated into
non-overlapping groups called strata, and a simple random
sample is then obtained from each stratum. The individuals within each
stratum should be homogeneous (or similar) in some way (Blay, 2013).
Proportional Allocation – This process chooses sample sizes
proportional to the sizes of the different subgroups or strata.
Equal Allocation – This process chooses the same number of
samples from each group regardless of its size.

a. Proportional Allocation can be done by choosing sample sizes
proportional to the sizes of the different subgroups or strata
(characteristic). For example, a company has 1500 employees, and you
decide to stratify the group by sex: male and female. The data
you collected show that there are 1000 male employees and 500
female employees in the company. Assume that the sample size
needed for the study is 120. To compute the sample needed in each
stratum, first get the relative weight of each stratum with respect to
the total population, then multiply each weight by the
desired sample size, as shown below.

Stratum   N      Relative Weight     Sample needed (n)
Male      1000   1000/1500 = .67     .67 × 120 = 80.4 or 80
Female    500    500/1500 = .33      .33 × 120 = 39.6 or 40
Total     1500                       120

You will randomly select 80 male employees from the list of 1000 male
employees and 40 female employees from the list of 500 female employees
as the respondents for the study you will be conducting.
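
The proportional allocation above can be sketched in Python. The function name `proportional_allocation` is an assumption for illustration. Unlike the table, which rounds the weights to .67 and .33 first, this sketch uses the exact proportions and rounds only the final counts; for this example both routes give 80 and 40.

```python
def proportional_allocation(strata_sizes, total_sample):
    """Allocate total_sample across strata in proportion to stratum size."""
    population = sum(strata_sizes.values())
    return {
        name: round(size * total_sample / population)
        for name, size in strata_sizes.items()
    }


allocation = proportional_allocation({"Male": 1000, "Female": 500}, 120)
print(allocation)
```

With other stratum sizes, independently rounded counts may not add up exactly to the desired total, so the result should be checked and adjusted by hand if needed.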
b. Equal Allocation can be done by choosing the same number of samples
from each group regardless of its size. Say the student government
president would like to see if the opinions of the first-year students differ
from those of the second-year students. If the required sample size is 100
students, then each year level should have 50 randomly selected
respondents.

4. Cluster Sampling - This is sometimes called area sampling because
the clusters are often geographic areas; it is usually applied when the
population is large. Subjects are selected by dividing the population into
groups (clusters) and then selecting some of the groups randomly. The
final sample includes all the subjects in the randomly selected groups
(clusters).

Procedure in doing Cluster Sampling


Step 1 Divide the entire population into pre-existing segments or
clusters. The clusters are often geographic.
Step 2 Obtain a simple random sample of the clusters.
Step 3 Use all the members of the clusters obtained in step 2 as the
sample.
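
The cluster sampling procedure can be sketched in Python as below. The cluster names, the member labels, the seed, and the function name `cluster_sample` are all hypothetical and serve only to illustrate the steps.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick n_clusters, then keep every member of the chosen ones."""
    rng = random.Random(seed)
    # Step 2: simple random sample of the cluster names.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Step 3: all members of the chosen clusters form the sample.
    sample = [member for name in chosen for member in clusters[name]]
    return chosen, sample


# Step 1: a hypothetical population already divided into four clusters.
clusters = {
    "Cluster A": ["A1", "A2", "A3"],
    "Cluster B": ["B1", "B2", "B3"],
    "Cluster C": ["C1", "C2", "C3"],
    "Cluster D": ["D1", "D2", "D3"],
}
chosen, sample = cluster_sample(clusters, 2, seed=7)
print(chosen)
print(sample)
```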

5. Multi-stage Sampling - The sample is randomly selected through
two or more steps or stages. Suppose a researcher wants to study the
efficacy rate of a newly developed vaccine against a certain epidemic
(a nationwide infectious disease) and needs 6 barangays in which to
administer the vaccine. In the first stage, 2 regions are
selected out of the 17 regions in the Philippines. In the second stage,
one province is selected within each of the 2 regions.
In the third stage, 3 towns are selected in each province.
Lastly, in the fourth stage, one barangay is selected in each town,
giving 2 × 1 × 3 × 1 = 6 sample barangays.

Common Methods of Nonprobability or Non-random Sampling


1. Haphazard or Accidental Sampling. Samples are picked as they
come to the researcher. Many fields, such as archaeology, history,
and medicine, pick samples this way and often incorrectly
assume that the samples picked are
typical of the population they come from.
2. Judgment or Purposive Sampling. The samples are selected by
the researcher subjectively. The researcher will pick a sample that
he/she believes is representative of the population of interest.
3. Convenience Sampling. Selecting a sample based on the
convenience of the researcher or using data from population members
that are readily available.
4. Snowball Sampling. A special nonprobability method used when
the desired sample characteristic is rare. Snowball sampling relies on
referrals from initial subjects to generate additional subjects.

Important Points to Consider When Collecting Data


1. If measurements of some characteristic from people are being
obtained, better results will be achieved if the researcher does the
measuring.
2. The method of data collection may delay the process. Choose a
method that would not produce a low response rate.
3. Ensure that the sample size is large enough and that the sample
is representative of the population.

Data Presentation
There are several ways to present data: textual, tabular, and
graphical presentations.
In the textual form, the researcher uses sentences to convey the
information contained in the data, incorporating only the important figures.
The textual form of presentation can be seen in news reports. For example, here
is an excerpt from a news article:

“The new positives increased the region’s cumulative total to 6,884 with
3,194 active cases. Bacolod City still logged the highest number of new cases
with 106 while Negros Occidental has 53. Iloilo City has 45; Iloilo province, 25;
Capiz, 20; Aklan has two; and one each in Antique and Guimaras. Local cases
totaled 238, of whom 11 are locally stranded individuals (LSIs), two are returning
overseas Filipinos, and another two are authorized persons outside of residence.”
(Source: https://www.pna.gov.ph/articles/1115061)
In the tabular form, the data are presented in rows and columns. This
systematic arrangement of data is called a statistical table. Through this
presentation, data can easily be understood. In addition, you can easily compare
and contrast the data.
A good statistical table has four essential parts:
1. Table heading – includes the table number and table title. The title
should briefly explain the contents of the table.
2. Stub – the items or classifications written in the first column, which
identify what is written in the rows.
3. Caption or box head – the items or classifications written in
the first row, which identify what is contained in the columns.
4. Body – the main part of the table, which contains the substance or the
figures of one’s data.
In the construction of a table, the following guidelines should prove
helpful.
1. Every table must be self-explanatory.
2. The title should be clear and descriptive.
3. The title gives information about what, where, how, and when the data
were taken.

Table 1 Quantity and Value of Domestic Trade by Mode of Commodity Section: Second Quarter 2020

                                                        Second Quarter 2020
Commodity Section Description                           Quantity       Percent Share   Value            Percent Share
                                                        (1)            (2)             (3)              (4)
Philippines                                             1,798,227.00                   65,123,660.00
Food and Live Animals                                   654,269.00     36.4            21,861,286.00    33.6
Beverages and Tobacco                                   63,843.00      3.6             1,600,578.00     2.5
Crude Materials, Inedible, Except Fuels                 110,882.06     6.2             812,969.00       1.2
Mineral Fuels, Lubricants and Related Materials         45,979.00      2.6             2,198,818.00     3.4
Animal and Vegetable Oils, Fats and Waxes               24,288.00      1.4             1,092,686.00     1.7
Chemical and Related Products, n. e. c.                 102,338.00     5.7             4,878,398.00     7.5
Manufactured Goods Classified Chiefly by Material       293,974.00     16.3            12,437,632.00    19.1
Machinery and Transport Equipment                       169,139.00     9.4             14,236,488.00    21.9
Miscellaneous Manufactured Articles                     6,573.00       0.4             357,051.00       0.5
Commodities and Transactions not Elsewhere
Classified in the PSCC                                  326,942.00     18.2            5,647,754.00     8.7
(Source: https://psa.gov.ph)

Parts of the table as labeled in the original figure: the title line is the Table Heading, the row of column labels is the Caption or box head, the first column is the Stub, and the figures are the Body.

In the graphical presentation, the data are presented in graphs, charts, or
diagrams. A graph is a pictorial representation of a set of data that shows
relationships. Some common types of graphs are the line graph, bar graph,
pie graph, and pictograph.

Types of Graphs

Line graph
The line graph shows the relationship between two sets of quantities.
It is similar to a graph drawn in the Cartesian plane, where the points are
plotted using the vertical and horizontal axes. The points are then connected
with a line, which makes up the line graph. The line graph is appropriate for
showing trends in a variable over a long period of time. Example of a line graph:

[Figure: line graph of Daily Confirmed COVID-19 Cases in the Philippines]
(Source: https://www.coronatracker.com/country/philippines/)

Bar graph
The bar graph consists of vertical or horizontal bars of equal widths. The
length of the bars represents the magnitudes of the quantities being compared.
This type of graph is most appropriate for comparing data at a particular time.

[Figure: bar graph of the Labor and Employment Rate in Western Visayas; vertical axis from 0.0 to 100.0; legend: Jan 2017, Jan 2018, Apr 2018]
(Source: https://psa.gov.ph)

Pie chart or pie graph


The pie chart or pie graph is appropriate in comparing the parts with the whole.

[Figure: pie chart of the Labor and Employment Rate in Western Visayas as of April 2018: Employment rate 53%, Labor force participation rate 34%, Underemployment rate 10%, Unemployment rate 3%]
(Source: https://psa.gov.ph)

Pictograph
Another way of representing numerical values is through the use of
pictographs or picture graphs. In this type of chart, actual pictures or facsimiles
of the objects under study are used to represent values. Each figure is
considered a unit representing a definite number.

Generating Graphs Using Statistical Software

Using Excel
1. Open the MS Excel program.
2. Enter the data.
3. Highlight the data.
4. Click “Insert”.
5. In the chart dialog box, click “Column”.
6. Select the desired format of bar graph.

After the series of steps, the result will be like this.

[Figure: bar graph titled “No. of Birth in Antique from Jan- Mar 2020”; vertical axis from 0 to 180; one bar per municipality: Anini-y, Barbaza, Belison, Bugasong, Caluya, Culasi, Hamtic, Laua-an, Libertad, Pandan, Patnongon, San Jose (Capital), San Remigio, Sebaste, Sibalom, Tibiao, Tobias Fornier…, Valderrama]

The Stem-and-Leaf Diagram


A stem-and-leaf diagram is a visual presentation of raw data. In this setup,
each number is broken into its tens digit and its units digit. Each row
starts with a stem (the tens digit), and the digits to its right are the leaves
(the units digits). For example, the first entry in the data below is 22. It is
written as 2 | 2, where the 2 to the left of the bar is the stem and the 2 to
the right is the leaf. Let us consider the set of data below:
22 30 19 26 8 8 12 21 15
18 18 12 8 17 11 3 16 5
12 17 4 22 2 8 13 8 2
9 5 19 13 23 7 20 13 7
8 7 18 11 10 22 9 21 16
In making a stem-and-leaf diagram, it is necessary to arrange the numbers in
ascending or descending order. Arranged in ascending order (reading down each
column), the data look like this.
2 5 8 9 12 13 17 19 22
2 7 8 9 12 15 18 20 22
3 7 8 10 12 16 18 21 23
4 7 8 11 13 16 18 21 26
5 8 8 11 13 17 19 22 30
The first column holds the tens digit, followed by a vertical bar. Single-digit
numbers are recorded in the stem whose tens digit is 0. Thus, 2 is written as
0 | 2. The first two-digit number is 10, which is recorded in the stem whose
tens digit is 1. Thus, 10, 11, 11, 12, 12, 12 are written as 1 | 0 1 1 2 2 2.
Below is the complete stem-and-leaf diagram of the data above.

0 | 22345577788888899
1 | 0112223335667788899
2 | 01122236
3 | 0
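
As a cross-check on the manual construction, a stem-and-leaf diagram for two-digit data can be sketched in Python. The function name `stem_and_leaf` is an assumption for illustration; the data are the 45 values listed above. Each value is split into its tens digit (stem) and units digit (leaf) after sorting.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (tens digit) to a string of its sorted leaves (units digits)."""
    rows = defaultdict(list)
    for value in sorted(values):
        rows[value // 10].append(value % 10)
    return {stem: "".join(str(leaf) for leaf in rows[stem]) for stem in sorted(rows)}


data = [
    22, 30, 19, 26, 8, 8, 12, 21, 15,
    18, 18, 12, 8, 17, 11, 3, 16, 5,
    12, 17, 4, 22, 2, 8, 13, 8, 2,
    9, 5, 19, 13, 23, 7, 20, 13, 7,
    8, 7, 18, 11, 10, 22, 9, 21, 16,
]

diagram = stem_and_leaf(data)
for stem, leaves in diagram.items():
    print(f"{stem} | {leaves}")
```

The number of leaves across all rows should equal the number of observations (45 here), which is a quick way to catch a dropped value.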
Ordinary MS Excel does not have a built-in function for creating a stem-and-leaf
diagram. However, activating the Megastat add-in gives MS Excel more functions,
so that it can also carry out more advanced statistical tests.

Frequency Distribution
A frequency distribution is an arrangement of data that shows the number
of times or frequency of occurrence of the different values of the variables.
There are two types of frequency distribution. Qualitative frequency
distributions are constructed for categorical (qualitative) variables, while
quantitative frequency distributions are constructed for numerical variables,
usually grouped into class intervals.

Example of qualitative frequency distribution:

Distribution of BS Accountancy by Year Level

Year Level F
First Year 40
Second Year 80
Third Year 75
Fourth Year 30

Example of quantitative frequency distribution:

Frequency Distribution of Students and Their Grades

Class Intervals F
90 – 94 35
85 – 89 40
80 – 84 55
75 – 79 25
70 – 74 5

Guidelines in the Construction of Frequency Distribution

1. Determine the range R.


R = highest value – lowest value

2. Solve for the number of classes or class intervals, k. Two formulas can be
used:
i. If k represents the number of classes and N the total number of
observations, then the value of k will be the smallest exponent of the
number 2 such that $2^{k} \geq N$.
ii. Sturges’ Formula

\[ k = 1 + 3.322 \log N \]

where log is the common (base-10) logarithm.

3. Determine the class size i.

\[ i = \frac{R}{k} \]

The value obtained from this formula should be rounded off to a
more convenient value based on the preference of the investigator.

4. Determine and enumerate the classes. Each class is an interval of values
defined by its lower and upper class limits. There must be enough classes to
include the highest score and the lowest score. As a rule, the lowest value in
the data becomes the lower class limit (LL) of the first class interval.
Succeeding lower limits are obtained by adding i to the lower class limit of
the preceding class interval. Upper class limits (UL) are obtained using the
following formula:

UL = LL + i – 1 unit of measure

5. Count the number of observations that fall in each of the class intervals.

Example: Construct a frequency distribution of the orders received by a computer
company in 40 days.

10 16 18 20 23 26 29 33
12 16 18 22 25 26 30 33
12 17 20 22 25 27 30 35
15 18 20 23 25 28 32 36
15 18 20 23 26 28 32 38

Step 1. Determine the range R.


R = highest value – lowest value
R = 38 – 10 = 28

Step 2. Solve for the number of classes or class intervals, k.

We can use:

i. If k represents the number of classes and N the total number of
observations, then the value of k will be the smallest exponent of the
number 2 such that $2^{k} \geq N$.

N = 40
$2^{4} = 16$, $2^{5} = 32$, $2^{6} = 64$
With this rule, we cannot use k = 5 because $2^{5} = 32 < 40$. We will
use k = 6 because $2^{6} = 64 > 40$. This satisfies the condition laid out
above.

or

ii. Sturges’ Formula

\[ k = 1 + 3.322 \log N \]
\[ k = 1 + 3.322 \log 40 \]
\[ k = 6.32 \approx 6 \]

Step 3. Determine the class size i.

\[ i = \frac{R}{k} = \frac{38 - 10}{6} = \frac{28}{6} = 4.67 \approx 5 \]
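
Steps 1 to 3 of the worked example can be checked with a short Python sketch. The variable names are assumptions for illustration, `math.log10` is used for the base-10 logarithm in Sturges' formula, and rounding to the nearest whole number stands in for "a more convenient value".

```python
import math

orders = [
    10, 16, 18, 20, 23, 26, 29, 33,
    12, 16, 18, 22, 25, 26, 30, 33,
    12, 17, 20, 22, 25, 27, 30, 35,
    15, 18, 20, 23, 25, 28, 32, 36,
    15, 18, 20, 23, 26, 28, 32, 38,
]

N = len(orders)                          # total number of observations
R = max(orders) - min(orders)            # Step 1: range

# Step 2, rule i: smallest k such that 2**k >= N.
k_power_rule = 0
while 2 ** k_power_rule < N:
    k_power_rule += 1

# Step 2, rule ii: Sturges' formula with a base-10 logarithm.
k_sturges = 1 + 3.322 * math.log10(N)
k = round(k_sturges)

# Step 3: class size, rounded to the nearest whole number.
i = round(R / k)

print(f"N = {N}, R = {R}")
print(f"2^k rule: k = {k_power_rule}")
print(f"Sturges: k = {k_sturges:.2f}, rounded to {k}")
print(f"class size i = {R / k:.2f}, rounded to {i}")
```

For this data set both rules agree on k = 6; for other values of N they can differ by one class, and either choice is acceptable under the guidelines above.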

Step 4. Determine and enumerate the classes.

To do this, here is the trick:

1. Construct a table containing columns for class intervals, tally, and
frequency (f).

2. List the lower limits of all the class steps. Start with the lower limit of
the lowest class step, which must be equal to or a little lower than the lowest
value in the data. Add i (the class size) to obtain the lower limit of the
next class step.

Class Intervals   Tally   f
35 –
30 –
25 –
20 –
15 –
10 –

3. Determine the upper limits of all the class steps. Start with the upper
limit of the lowest class step by applying the formula:

UL = LL + i – 1
UL = 10 + 5 – 1 = 14

Then add the class size (i) to obtain the upper limit of the next class step.

Class Intervals   Tally   f
35 – 39
30 – 34
25 – 29
20 – 24
15 – 19
10 – 14

Step 5. Count the number of observations that fall in each of the class intervals.

Class Intervals   Tally           f
35 – 39           |||             3
30 – 34           ||||| |         6
25 – 29           ||||| |||||     10
20 – 24           ||||| ||||      9
15 – 19           ||||| ||||      9
10 – 14           |||             3
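
Steps 4 and 5 can be sketched in Python as a check on the tally. The function name `frequency_distribution` is an assumption for illustration; the class limits follow UL = LL + i – 1 with a lowest limit of 10, a class size of 5, and 6 classes, as in the example.

```python
def frequency_distribution(values, lowest_limit, class_size, n_classes):
    """Return (LL, UL, f) for each class, from the lowest class upward."""
    table = []
    for c in range(n_classes):
        lower = lowest_limit + c * class_size    # next LL = previous LL + i
        upper = lower + class_size - 1           # UL = LL + i - 1
        f = sum(1 for v in values if lower <= v <= upper)
        table.append((lower, upper, f))
    return table


orders = [
    10, 16, 18, 20, 23, 26, 29, 33,
    12, 16, 18, 22, 25, 26, 30, 33,
    12, 17, 20, 22, 25, 27, 30, 35,
    15, 18, 20, 23, 25, 28, 32, 36,
    15, 18, 20, 23, 26, 28, 32, 38,
]

table = frequency_distribution(orders, lowest_limit=10, class_size=5, n_classes=6)
# Print from the highest class down, matching the layout of the table above.
for lower, upper, f in reversed(table):
    print(f"{lower} - {upper}: {f}")
```

The frequencies should add up to the number of observations (40), which confirms that every value fell into exactly one class.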

Histogram
The histogram is a series of columns or vertical rectangles, each having as
its base one class interval, and the frequency or number of cases in that class as
its height. Using the frequency distribution given above, the histogram looks like
this:

Frequency Histogram

Frequency polygon
The frequency polygon is the graph of the class mark against the frequency.
The shape of the histogram or the frequency polygon gives an idea of the shape
of the distribution. Using the same frequency distribution as above, the
frequency polygon generated is presented below.

Frequency Polygon
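
The points plotted in a frequency polygon can be computed with a short Python sketch. It assumes the usual definition of the class mark as the midpoint of the class limits, (LL + UL) / 2, which the lesson does not spell out; the variable names are also assumptions.

```python
# (lower limit, upper limit, frequency) for each class, lowest class first.
classes = [
    (10, 14, 3),
    (15, 19, 9),
    (20, 24, 9),
    (25, 29, 10),
    (30, 34, 6),
    (35, 39, 3),
]

# Class mark = midpoint of the class limits (assumed definition).
polygon_points = [((lower + upper) / 2, f) for lower, upper, f in classes]

for mark, f in polygon_points:
    print(f"class mark {mark:g}: frequency {f}")
```

Plotting these (class mark, frequency) pairs and joining them with straight lines gives the frequency polygon.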
Cumulative frequency distribution
A cumulative frequency distribution tells us how many
observations lie above or below certain values rather than merely recording the
number of observations within intervals. Its graph is called an ogive. There
are two types of ogives: the “less than” and the “greater than” ogives. The
less than ogive is a graph showing how many values are below a certain upper
class boundary. The greater than ogive is a graph showing how many values are
above a certain lower class boundary.
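
The cumulative frequencies behind the two ogives can be sketched in Python using the frequency distribution of orders constructed earlier. Two conventions are assumed here and are not stated in the lesson: class boundaries lie half a unit beyond the class limits (for example, 9.5 and 14.5 for the class 10 to 14), and the greater than counts are taken at the lower class boundaries.

```python
from itertools import accumulate

# (lower limit, upper limit, frequency) for each class, lowest class first.
classes = [
    (10, 14, 3),
    (15, 19, 9),
    (20, 24, 9),
    (25, 29, 10),
    (30, 34, 6),
    (35, 39, 3),
]

freqs = [f for _, _, f in classes]
total = sum(freqs)
running = list(accumulate(freqs))            # 3, 12, 21, 31, 37, 40

# "Less than" ogive: number of values below each upper class boundary.
less_than = [(upper + 0.5, cf) for (_, upper, _), cf in zip(classes, running)]

# "Greater than" ogive: number of values above each lower class boundary.
below = [0] + running[:-1]
greater_than = [(lower - 0.5, total - b) for (lower, _, _), b in zip(classes, below)]

print("less than:   ", less_than)
print("greater than:", greater_than)
```

The less than counts rise to the total of 40, while the greater than counts start at 40 and fall, which is why the two ogives slope in opposite directions.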
