3T2324 Module 2 - 3
Objectives
• Primary: data collected specifically for the desired analysis. The most common way
to collect it is by conducting a survey.
The interviewer can repeat a question that the respondent has not fully understood until it
suits the respondent's level. However, this method is time-consuming, expensive, and
has limited field coverage.
b. The Indirect or Questionnaire Method
This method makes use of a written questionnaire. The researcher distributes the
questionnaire to the respondents either by personal delivery or by mail. Using this
method, the researcher can save a lot of time and money in gathering the information
needed because questionnaires can be given to a large number of respondents at the
same time.
However, the researcher cannot expect that all distributed questionnaires will be
retrieved because some respondents simply ignore the questionnaires. In addition,
clarification cannot be made if the respondent does not understand the question.
2. The Registration Method
This method of collecting data is governed by laws. For example, birth and death rates
are registered in the NSO for records and future use. The number of registered cars can
be found at LTO. The list of registered voters in the Philippines can be found at
COMELEC.
3. Retrospective Study
- Uses either all of the data or a sample of it; such data is also called historical data.
- Data recorded internally by a company, such as sales and transactions, is a
type of primary data. Primary data is collected directly from the source
for a specific purpose or research question and has not been previously collected
or analyzed by others.
Advantage
➢ Quickest and easiest way to collect process data.
Disadvantage
➢ Provides limited information.
4. Observational Study
Advantage
- May give valuable information, but it is usually limited because you just altered a part of the
system.
5. The Experimental Design
This method is usually used to find out cause and effect relationships. Scientific
researchers often use this method. For example, agriculturists would like to know the
effect of a new brand of fertilizer on the growth of plants. The new kind of fertilizer will
be applied to ten sets of plants, while another ten sets of plants will be given another
fertilizer. The growth of the plants will then be compared to determine which fertilizer
is better.
Advantage:
We can establish a cause-and-effect relationship, unlike in retrospective and observational
studies, where we are only informed about interesting phenomena.
6. Simulation Study
Simulation data gathering refers to the process of collecting data from a simulation,
which is a computer model of a system that mimics its behavior.
Advantage:
Simulation is a powerful technique and can be used to model many different types of
systems.
➢ Cost-effective
➢ Time Efficient
➢ Safe-testing
➢ Increased understanding
➢ Optimization
Sources of Secondary Data
• Books/Records
• Published censuses or other statistical data
• Data archives
• Internet/Research articles
Sampling Techniques
Objectives
A good sample has accurate responses to the items of interest. When a response in the
survey differs from the true value, measurement error has occurred.
a. People sometimes do not tell the truth.
b. People do not always understand the question.
c. People forget.
d. People give different answers to different interviewers.
e. People may say what they think an interviewer wants to hear or what they think will
impress the interviewer.
f. Certain words mean different things to different people.
1. Sampling error - the error that results from taking one sample instead of examining
the whole population.
2. Non-sampling error - selection bias and measurement error are types of non-
sampling error. These are the errors that cannot be attributed to the sample-to-
sample variability.
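Sampling error can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population (the integers 1 to 1000), the sample size of 50, and the helper name `simulate_sample_means` are all hypothetical and not from the module. It draws many simple random samples and shows that each sample mean differs from the true population mean, which is the sample-to-sample variability described above. It does not model non-sampling error.

```python
import random
import statistics


def simulate_sample_means(population, sample_size, repetitions, seed=None):
    """Draw repeated simple random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repetitions)
    ]


# Hypothetical population: the integers 1..1000 (true mean is 500.5).
population = list(range(1, 1001))
true_mean = statistics.mean(population)

# 200 samples of 50 units each; every sample gives a slightly different mean.
means = simulate_sample_means(population, sample_size=50, repetitions=200, seed=42)

print("true mean:", true_mean)
print("smallest and largest sample mean:", min(means), max(means))
```

The spread between the smallest and largest sample mean is the sampling error at work; a census of all 1000 units would have none.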
1. Sampling can provide reliable information at far less cost than a census.
2. Data can be collected more quickly, so estimates can be published in a timely
fashion
3. Estimates based on sample surveys are often more accurate than those based on a
census because investigators can be more careful when collecting data.
Get the smallest 4 of the random numbers which leads us to sample units {3,4,5,10}
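The selection rule above (attach a random number to every unit, then keep the units with the smallest numbers) can be sketched as follows. This is a minimal illustration, assuming a hypothetical population of 10 units labeled 1 to 10; the function name `srs_by_random_numbers` is also an assumption. Because the random numbers differ from run to run, it will not necessarily reproduce the units {3, 4, 5, 10}.

```python
import random


def srs_by_random_numbers(units, n, seed=None):
    """Simple random sample: tag each unit with a random number, keep the n smallest."""
    rng = random.Random(seed)
    # Pair every unit with its own random number in [0, 1).
    tagged = [(rng.random(), unit) for unit in units]
    # Sort by the random number only, smallest first.
    tagged.sort(key=lambda pair: pair[0])
    # The units attached to the n smallest random numbers form the sample.
    return sorted(unit for _, unit in tagged[:n])


# Hypothetical population of 10 units; select 4 of them.
units = list(range(1, 11))
sample = srs_by_random_numbers(units, n=4, seed=0)
print(sample)
```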
Steps:
1. Ask “What is expected of the sample, and how much precision do I need?”
“What are the consequences of the sample results?”
“How much error is tolerable?”
• Example: Unemployment Rate Survey >> Unemployment Rate
• Only the investigators in the study can say how much precision is needed.
• Specify the tolerable error: Desired precision: $P(|\bar{y} - \bar{y}_{\mu}| \le e) = 1 - \alpha$
• The investigator must decide on reasonable values for α and e; e is called the margin
of error in many surveys while α is the level of significance.
• For many surveys of people in which a proportion is measured, e = 0.03 and α =
0.05.
2. Find an equation relating the sample size n and your expectations of the sample.
Estimate any unknown quantities if the sample size formula requires it.
Solution:
For e = 10%, Slovin's Formula suggests $n = \frac{10{,}000}{1 + 10{,}000(0.10)^2} \approx 99$
For e = 5%, we have $n = \frac{10{,}000}{1 + 10{,}000(0.05)^2} \approx 385$
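The Slovin's Formula computations, n = N / (1 + N e^2), can be checked with a short script. This is a minimal sketch; the function name `slovin` is a hypothetical helper, and the rounding is left to the caller because the module rounds to the nearest whole number here and rounds up in a later example.

```python
import math


def slovin(population_size, margin_of_error):
    """Slovin's formula: n = N / (1 + N * e**2), returned unrounded."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    return population_size / (1 + population_size * margin_of_error ** 2)


n_10 = slovin(10_000, 0.10)   # about 99.01
n_05 = slovin(10_000, 0.05)   # about 384.62

print(round(n_10))            # 99
print(round(n_05))            # 385
print(math.ceil(slovin(2_000, 0.05)))  # 334 when rounding up
```

Rounding up with `math.ceil` is the more conservative choice, since it never leaves the sample slightly short of the computed size.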
b. Cochran’s Sample Size Formula (proportion): $n = \frac{p(1 - p)z^2}{e^2}$
Where, n = sample size
p = the population proportion
e = acceptable margin of error
z = z-score at significance level (α)
α = 0.10: z = 1.645 (confidence level 1 − α = 90%)
α = 0.05: z = 1.96 (confidence level 1 − α = 95%)
α = 0.01: z = 2.576 (confidence level 1 − α = 99%)
We can use Cochran’s formula if the population size is unknown but large. The known
(or estimated) population proportion is used to calculate the sample size needed for the
required level of precision and confidence level. Cochran’s formula is most suitable for a
large population, but if the population of interest is relatively small, there is a modified
Cochran’s Sample Size Formula.
A researcher wishes to estimate, with 95% confidence, the proportion of people who
own a home computer. A previous study shows that 40% of those interviewed had a
computer at home. The researcher wishes to be accurate within 2% of the true
proportion. Find the minimum sample size necessary using Cochran’s Sample Size
Formula.
Solution:
Since the confidence level is 95%, z = 1.96; e = 0.02; the estimated proportion is
$\hat{p} = 0.40$ and $1 - \hat{p} = 0.60$.
The sample size needed is $n = \frac{0.40(1 - 0.40)(1.96)^2}{(0.02)^2} = 2304.96$, or
2305 people to interview.
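The home computer example can be verified numerically. The sketch below is a minimal check of Cochran's formula n = p(1 − p)z² / e²; the function name `cochran` is a hypothetical helper, and the z-score is passed in directly rather than looked up from α.

```python
import math


def cochran(p, e, z):
    """Cochran's sample size for a proportion: n = p * (1 - p) * z**2 / e**2."""
    if not 0 < p < 1:
        raise ValueError("p must be between 0 and 1")
    if e <= 0:
        raise ValueError("e must be positive")
    return p * (1 - p) * z ** 2 / e ** 2


# Values from the example: p-hat = 0.40, e = 0.02, z = 1.96 (95% confidence).
n_raw = cochran(p=0.40, e=0.02, z=1.96)
n = math.ceil(n_raw)  # round up so the precision target is still met

print(round(n_raw, 2))  # 2304.96
print(n)                # 2305
```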
Note: If you find at this point that the sample size you calculated in step 2 is much larger than you
can afford, go back, adjust some of your expectations for the survey, and try again.
Systematic sampling is used as a proxy for simple random sampling when no list of the
population is available. Individuals are selected at a pre-determined sampling interval (k),
beginning from a random starting point.
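Selection at a fixed interval from a random start can be sketched as follows. This is an illustration only: the function name `systematic_sample`, the choice k = N // n, and the 100 hypothetical student IDs are assumptions, not part of the module. The code assumes the units can at least be visited in some order (for example, students as they enter a building), even if no formal list exists beforehand.

```python
import random


def systematic_sample(units, n, seed=None):
    """Pick every k-th unit after a random start, with k = len(units) // n."""
    if not 0 < n <= len(units):
        raise ValueError("n must be between 1 and the number of units")
    k = len(units) // n            # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)       # random starting point within the first interval
    return [units[start + i * k] for i in range(n)]


# Hypothetical ordering of 100 students; choose 10 of them (k = 10).
students = list(range(1, 101))
chosen = systematic_sample(students, n=10, seed=1)
print(chosen)
```

Every selected student is exactly k positions after the previous one, so only the starting point is random.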
Example: Conducting a survey about the communication skills of students in university X
➢ If you want to study whether a certain brand of bath oil is an effective mosquito
repellent, you should perform a controlled experiment, not take a survey. You should
take a survey if you want to estimate how many people use the bath oil as a
mosquito repellent, or if you want to estimate how many mosquitoes are in an area.
Advantage:
1. It is the best option if your goal is to profile different groups.
2. Every group in the population is represented in the sample data.
1. Create a table in order. Label each group for convenience.
2. Solve for the total sample size n using Slovin’s Formula:
$n = \frac{2000}{1 + 2000(0.05)^2} = 333.33333 \approx 334$ (round up)

Group   Financial Group            N      n
A       PHP 50,000 and above       120
B       PHP 40,000-PHP 49,999      250
C       PHP 30,000-PHP 39,999      210
D       PHP 20,000-PHP 29,999      400
E       PHP 10,000-PHP 19,999      900
F       below PHP 10,000           120
Total                              2000   334
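3. Solve for the sample size of each stratum (in this example, each financial group)
using proportionate sampling.
$n_A = n \times \frac{N_A}{N} = 334 \times \frac{120}{2000} \approx 20$
$n_B = n \times \frac{N_B}{N} = 334 \times \frac{250}{2000} \approx 42$
$n_C = n \times \frac{N_C}{N} = 334 \times \frac{210}{2000} \approx 35$
$n_D = n \times \frac{N_D}{N} = 334 \times \frac{400}{2000} \approx 67$
$n_E = n \times \frac{N_E}{N} = 334 \times \frac{900}{2000} \approx 150$
$n_F = n \times \frac{N_F}{N} = 334 \times \frac{120}{2000} \approx 20$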
Group   Financial Group            N      n
A       PHP 50,000 and above       120    20
B       PHP 40,000-PHP 49,999      250    42
C       PHP 30,000-PHP 39,999      210    35
D       PHP 20,000-PHP 29,999      400    67
E       PHP 10,000-PHP 19,999      900    150
F       below PHP 10,000           120    20
Total                              2000   334

4. Verify if the sum of the sample sizes of the stratified groups is equal to the computed
sample size.
20 + 42 + 35 + 67 + 150 + 20 = 334
∴ Engr. Donneth Dave will survey 20 from Group A, 42 from Group B, 35 from
Group C, 67 from Group D, 150 from Group E, and 20 from Group F, forming
a sample of 334 out of 2000.
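The four steps of this stratified example can be reproduced in a few lines. This is a minimal sketch using the stratum sizes from the table; the function name `proportional_allocation` is a hypothetical helper. Rounding each stratum to the nearest whole number happens to sum to 334 here, but that is not guaranteed for other data, which is why step 4 (the verification) matters.

```python
import math


def proportional_allocation(strata_sizes, n):
    """Allocate a total sample of n across strata in proportion to stratum size."""
    total = sum(strata_sizes.values())
    return {
        group: round(n * size / total)   # n_h = n * N_h / N, nearest whole number
        for group, size in strata_sizes.items()
    }


# Stratum sizes N_h from the table (financial groups A to F).
strata = {"A": 120, "B": 250, "C": 210, "D": 400, "E": 900, "F": 120}
N = sum(strata.values())                        # 2000

# Step 2: total sample size by Slovin's formula with e = 0.05, rounded up.
n = math.ceil(N / (1 + N * 0.05 ** 2))          # 334

# Step 3: sample size per stratum.
allocation = proportional_allocation(strata, n)
print(allocation)  # {'A': 20, 'B': 42, 'C': 35, 'D': 67, 'E': 150, 'F': 20}

# Step 4: verify that the stratum samples add up to the total.
print(sum(allocation.values()) == n)  # True
```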
Cluster sampling is similar to stratified random sampling: the total population is divided
into clusters, and simple random sampling is used in each cluster. Clusters are usually
based on geographic area.
Example: A food company will offer a new food product in the market and will first survey
a few selected food scientists about it. They are the subject-matter experts, and their opinions
and insights will be very valuable in creating the product.
3. Quota sampling: The sample is selected based on certain quotas or predetermined
criteria, such as age, educational attainment, gender or income level. Quota Sampling
is one of the most preferred methods of non-probability sampling because it forces the
inclusion of members of different subpopulations.
Example: There are 200 observation units in a population: 100 are men and 100 are
women. If a sample of 20 is needed for the study, 10 men and 10 women may be
interviewed.
4. Snowball sampling: The sample is selected based on referrals from other members of
the population. This type of sampling is used if the population of interest is hard to find
like people with disabilities or certain diseases, drug users, victims of a specific crime.
Example: Conducting a survey of people who have a specific autoimmune disease.
There is no defined sample size formula for non-probability sampling, but there are
recommendations and rules of thumb:
1. Sample sizes larger than 30 and less than 500 are appropriate for most research.
4. For simple experimental research with tight experimental controls (matched pairs,
etc.), successful research is possible with samples as small as 10 to 20 in size.
• Introduction to the concepts of statistics
• Statistical Inquiry
• Importance of statistics
• Level of measurements
• Types of data and variables
• Data Collection Method
• Sampling Techniques
“Better data builds better evidence, which informs better decisions. Those decisions affect
our health, wealth and happiness.” - Earlham Institute
Source: clarkstoneconsulting.com
• Sampling Design and Analysis, Sharon Lohr
• Qualtrics Documentation and References
• Sekaran, U., 2003. Research methods for business: A skill building approach. John Wiley & Sons.