Module 2
Data collection is the process of gathering and measuring information on variables of interest in an established, systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate outcomes.

Without proper planning for data collection, a number of problems can occur. If the data collection steps and processes are not properly planned, the research project can end up with a data set that does not serve the purpose for which it was intended. For example, if more than one person is involved in the data collection but the data collectors do not follow consistent practices, they can end up with data that use different units, collection processes, and variable names.

Consequences from Improperly Collected Data

• Inability to answer research questions accurately.
• Inability to repeat and validate the study.
• Distorted findings resulting in wasted resources.
• Misleading other researchers to pursue fruitless avenues of investigation.
• Compromising decisions for public policy.
• Causing harm to human participants and animal subjects.

Steps in Data Gathering

1. Set the objectives for collecting data.
2. Determine the data needed based on the set objectives.
3. Determine the method to be used in data gathering and define the comprehensive data collection points.
4. Design data gathering forms to be used.
5. Collect data.

Choosing a Method of Data Collection

Decision-makers need information that is relevant, timely, accurate and usable. The cost of obtaining, processing and analyzing these data is high. The challenge is to find ways that lead to information that is cost-effective, relevant, timely and important for immediate use. Some methods pay attention to timeliness and reduction in cost. Others pay attention to accuracy and the strength of the method in using scientific approaches.

Statistical data may be classified under two categories, depending upon the sources: Primary Data and Secondary Data.

SOURCES OF DATA

Whether conducting research in the social sciences, humanities, arts, or natural sciences, the ability to distinguish between primary and secondary sources is essential.

Primary Sources - Provide a first-hand account of an event or time period and are considered to be authoritative. They represent original thinking, report on discoveries or events, or share new information. Often these sources are created at the time the events occurred, but they can also include sources that are created later. They are usually the first formal appearance of original research.
Primary Data - are data documented by the primary source. The data collectors documented the data themselves.

The first-hand information obtained by the investigator is more reliable and accurate, since the investigator can extract the correct information by removing doubts, if any, in the minds of the respondents regarding certain questions. High response rates might be obtained since the answers to various questions are obtained on the spot. It permits explanation of questions concerning difficult subject matter.

Secondary Sources - offer an analysis, interpretation or a restatement of primary sources and are considered to be persuasive. They often involve generalisation, synthesis, interpretation, commentary or evaluation in an attempt to convince the reader of the creator's argument. They often attempt to describe or explain primary sources.

Secondary Data - are data documented by a secondary source. The data collectors had the data documented by other sources.

Data are primary for the agency that collected them and become secondary for someone else who uses these data for his or her own purposes.

Secondary data are less expensive to collect, both in money and time. These data can also be better utilized, and sometimes the quality of such data may be better because they might have been collected by persons who were specially trained for that purpose.

On the other hand, such data must be used with great care, because they may also be full of errors. First, the purpose of the collection of the data by the primary agency may have been different from the purpose of the user of these secondary data. Secondly, there may have been bias introduced, the size of the sample may have been inadequate, or there may have been arithmetic or definition errors. Hence, it is necessary to critically investigate the validity of the secondary data.

The primary data can be collected by the following five methods:

1. Direct personal interviews - The researcher has direct contact with the interviewee. The researcher gathers information by asking questions to the interviewee.

2. Indirect/Questionnaire Method - This method of data collection involves sourcing and accessing existing data that were originally collected for the purpose of the study.

Designing good “questioning tools” forms an important and time-consuming phase in the development of most research proposals. Once the decision has been made to use these techniques, the following questions should be considered before designing our tools:

• What exactly do we want to know, according to the objectives and variables we identified earlier? Is questioning the right technique to obtain all answers, or do we need additional techniques, such as observations or analysis of records?

• Of whom will we ask questions and what techniques will we use? Do we understand the topic sufficiently to design a questionnaire, or do we need some loosely structured interviews with key informants or a focus group discussion first to orient ourselves?
7. Write special instructions for interviewers or respondents.

9. Always test your questions before taking the survey. (Pre-test)

An open-ended question is a type of question that does not include response categories. The respondent is not given any possible answers to choose from. This type of question is usually appropriate for collecting subjective data. It permits free responses that should be recorded in the respondent’s own words.

Question wording and question order have a large effect on the responses obtained.

Two surveys were taken in late 1993/early 1994 about Elvis Presley.

One survey asked: “In the past few years, there have been a lot of rumors and stories about whether Elvis Presley is really dead. How do you feel about this? Do you think there is any possibility that these rumors are true and that Elvis Presley is still alive, or don’t you think so?”
It gives relatively more accurate data on behavior and activities, but it is subject to the investigator's or observer's own biases, prejudices, and desires, and it needs more resources and skilled human power when high-level machines are used.

The secondary data can be collected by the following five methods:

1. Published reports in newspapers and periodicals.
2. Financial data reported in annual reports.
3. Records maintained by the institution.
4. Internal reports of the government departments.
5. Information from official publications.

• Always investigate the validity and reliability of the data by examining the collection method employed by your source.

size can produce accuracy of results. Moreover, the results from a small sample size will be questionable. A sample size that is too large will result in wasted money and time, because an adequate sample will normally give an accurate result.

The sample size is typically denoted by n and it is always a positive integer. No exact sample size can be mentioned here, and it can vary in different research settings. However, all else being equal, a larger sample leads to increased precision in estimates of various properties of the population.

Take Note!

- Representativeness, not size, is the more important consideration.
- If you use complex statistics, you may need a minimum of 100 or more in your sample (varies with method).
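The point that a larger sample increases precision can be checked numerically. The sketch below is an illustration only and is not part of the original module: it assumes simple random sampling of a proportion and the usual normal-approximation margin of error, e = Z * sqrt(p(1 - p) / n). The function name `margin_of_error` and the defaults (Z = 1.96, p = 0.5) are assumptions chosen for the example.

```python
import math


def margin_of_error(n, z=1.96, p=0.5):
    """Approximate margin of error for a sample proportion.

    Assumes simple random sampling and the normal approximation;
    z and p are illustrative defaults, not values from the module.
    """
    if n <= 0:
        raise ValueError("sample size n must be a positive integer")
    return z * math.sqrt(p * (1 - p) / n)


# Each step quadruples the sample size.
for n in (100, 400, 1600):
    print(n, round(margin_of_error(n), 4))
```

Under these assumptions, quadrupling the sample size only halves the margin of error, which is one reason an oversized sample wastes money and time.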
Desired Confidence Level    Z-Score
80%                         1.28
85%                         1.44
90%                         1.65
95%                         1.96
99%                         2.58
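The z-scores in the table can be reproduced from the standard normal distribution. The following is a minimal sketch, assuming a two-sided confidence interval; the function name `z_score` is an assumption for illustration, while `statistics.NormalDist` is part of the Python standard library (3.8 and later).

```python
from statistics import NormalDist


def z_score(confidence_level):
    """Two-sided critical value of the standard normal distribution."""
    if not 0 < confidence_level < 1:
        raise ValueError("confidence_level must be between 0 and 1")
    # Split the leftover probability equally between the two tails.
    upper_tail = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - upper_tail)


for level in (0.80, 0.85, 0.90, 0.95, 0.99):
    print(f"{level:.0%}  {z_score(level):.3f}")
```

The 90% value comes out near 1.645, which tables commonly round to 1.65 as above or to 1.64.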
3. Degree of Variability

precision, is the range in which the true value of the population is estimated to be.

• Estimating Proportion (Infinite Population)

$$n \ge \left(\frac{Z}{e}\right)^{2} p(1 - p)$$

Where:
Z is the z-score corresponding to level of confidence.

The conservative formula using the strong law of large numbers:

$$n \ge \frac{N}{1 + Ne^{2}}$$

Where:
Confidence level is 95%.
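The two sample-size formulas above can be sketched in a few lines of Python. This is an illustration under stated assumptions: e is read as the margin of error expressed as a decimal, p as the estimated population proportion (defaulting to 0.5, the most conservative choice), and N as the population size. The function names are hypothetical, and results are rounded up because a sample size must be a whole number.

```python
import math


def sample_size_proportion(z, e, p=0.5):
    """n >= (Z / e)^2 * p * (1 - p), rounded up to a whole unit."""
    if e <= 0 or not 0 <= p <= 1:
        raise ValueError("need e > 0 and 0 <= p <= 1")
    return math.ceil((z / e) ** 2 * p * (1 - p))


def sample_size_finite(population_size, e):
    """n >= N / (1 + N * e^2), rounded up to a whole unit."""
    if population_size <= 0 or e <= 0:
        raise ValueError("need N > 0 and e > 0")
    return math.ceil(population_size / (1 + population_size * e ** 2))


# Illustrative inputs: 95% confidence (Z = 1.96) and a 5% margin of error.
print(sample_size_proportion(z=1.96, e=0.05))
print(sample_size_finite(population_size=1000, e=0.05))
```

With N = 1000 and e = 0.05 the second function reproduces the 286 students of the module's worked example.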
Example:
A researcher plans to conduct a survey about the food preferences of BS Stat students. If the population of students is 1000, find the sample size if the error is 5%.

Solution:

$$n \ge \frac{1000}{1 + 1000(0.05)^{2}} = 285.71$$

The researcher needs to survey 286 BS Stat students.

• Finite Population Correction

If the population is small, then the sample size can be reduced slightly:

$$n \ge \frac{n_0}{1 + \dfrac{n_0 - 1}{N}}$$

BASIC SAMPLING DESIGN

The goal in sampling is to obtain individuals for a study in such a way that accurate information about the population can be obtained.

Reason for Sampling

- It is important that the individuals included in a sample represent a cross section of individuals in the population.
- If a sample is not representative, it is biased. You cannot generalize to the population from your statistical data.

Some definitions are needed to make the notion of a good sample more precise.
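The finite population correction given in this section, n >= n0 / (1 + (n0 - 1) / N), can be sketched as follows. The function name is hypothetical, and the starting value n0 = 385 is an assumption for illustration (the infinite-population size for Z = 1.96, e = 0.05, p = 0.5); it does not come from the module.

```python
import math


def finite_population_correction(n0, population_size):
    """Adjust an initial sample size n0 for a population of size N.

    Implements n >= n0 / (1 + (n0 - 1) / N), rounded up to a whole unit.
    """
    if n0 <= 0 or population_size <= 0:
        raise ValueError("n0 and population_size must be positive")
    return math.ceil(n0 / (1 + (n0 - 1) / population_size))


# Illustrative values only: n0 = 385 is an assumed starting point.
print(finite_population_correction(385, 1000))       # small population
print(finite_population_correction(385, 10 ** 9))    # very large population
```

For a very large population the correction leaves the sample size essentially unchanged, which matches the statement that the reduction applies when the population is small.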
- They require the use of a complete listing of the elements of the universe called the sampling frame.
- The probabilities of selection are known.

Sampling Procedure

- Ask the question: can I generalize to the general population from the accessible population?

$$k = \frac{N}{n} = \frac{\text{Population Size}}{\text{Sample Size}}$$

Example:
We want to select a sample of 50 students from 500 students. Under this method, every kth item is picked up from the sampling frame.

Solution:

Simple Random Sampling

- Most basic method of drawing a probability sample.
- Assigns equal probabilities of selection to each possible sample.

• Stratified Random Sampling

- It is obtained by separating the population into non-overlapping groups called strata and then obtaining a simple random sample from each stratum.

Solution:

Given:

$$n_1 = \left(\frac{n}{N}\right) N_1 = \left(\frac{50}{500}\right) 200 = 20$$

$$n_2 = \left(\frac{n}{N}\right) N_2 = \left(\frac{50}{500}\right) 300 = 30$$

The sample sizes are 20 from A and 30 from B. Then the units from each institution are to be selected by simple random sampling.
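The sampling interval k = N / n and the proportional allocation n_i = (n / N) N_i used above can be sketched in Python. This is a minimal illustration under assumptions: the function names are hypothetical, the sampling frame is a plain list of student numbers, and the random start within the first interval is a common convention rather than something the module states.

```python
import random


def systematic_sample(frame, n, rng=None):
    """Pick every k-th unit from the frame after a random start."""
    rng = rng or random.Random()
    k = len(frame) // n              # sampling interval k = N / n
    if k < 1:
        raise ValueError("sample size exceeds the size of the frame")
    start = rng.randrange(k)         # random position in the first interval
    return [frame[start + i * k] for i in range(n)]


def proportional_allocation(strata_sizes, n):
    """Allocate n across strata in proportion to size: n_i = (n / N) * N_i.

    Rounding can make the total differ from n when the shares are not
    whole numbers; that case is not handled here.
    """
    total = sum(strata_sizes.values())
    return {name: round(n * size / total) for name, size in strata_sizes.items()}


students = list(range(1, 501))       # hypothetical frame of 500 students
sample = systematic_sample(students, 50, random.Random(0))
allocation = proportional_allocation({"A": 200, "B": 300}, 50)
print(sample[:5], allocation)
```

With 500 students and a sample of 50 the interval is k = 10, and the allocation for institutions A and B matches the 20 and 30 computed in the solution.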
Example:

Disadvantage: In actual field applications, adjacent households tend to have more similar characteristics than households distantly apart.

1. Organize the sampling process into stages where the unit of analysis is systematically grouped.

3. Systematically apply the sampling technique to each stage until the unit of analysis has been selected.

Example:
Suppose we wish to study the expenditure patterns of households in NCR. We can select a sample of households for this study using simple three-stage sampling.
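A three-stage selection like the one described can be sketched as below. The module does not name the three stages, so the choice of cities, then barangays, then households is an assumption made purely for illustration, as are the function name, the made-up frame, and the use of simple random sampling at every stage.

```python
import random


def three_stage_sample(frame, n_cities, n_barangays, n_households, rng=None):
    """Simple random sampling applied at each of three nested stages."""
    rng = rng or random.Random()
    selected = []
    # Stage 1: sample cities (hypothetical first-stage units).
    for city in rng.sample(sorted(frame), n_cities):
        barangays = frame[city]
        # Stage 2: sample barangays within each selected city.
        for barangay in rng.sample(sorted(barangays), n_barangays):
            households = barangays[barangay]
            # Stage 3: sample households within each selected barangay.
            for household in rng.sample(households, n_households):
                selected.append((city, barangay, household))
    return selected


# Hypothetical frame: 4 cities x 5 barangays x 20 households, all labels invented.
frame = {
    f"City{c}": {
        f"Brgy{c}-{b}": [f"HH{c}-{b}-{h}" for h in range(20)]
        for b in range(5)
    }
    for c in range(4)
}
sample = three_stage_sample(frame, 2, 2, 3, random.Random(1))
print(len(sample), sample[0])
```

The unit of analysis, the household, is reached only at the last stage, so a full list of households is needed only inside the barangays that were actually selected.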
1. Non-sampling Error
1. Non-responses
https://data36.com/statistical-bias-types-explained/
2. Sampling Error