3T2324 Module 2 - 3

Uploaded by

202310052

Data Collection

Objectives

▪ Enumerate and explain the different ways of collecting data


▪ Identify the best data collection scheme to be used in some situations
Data - consists of information coming from observations, counts, measurements,
or responses. Most data can be put into the following categories:

Categorized based on the nature of collection:

• Primary – data collected specifically for the analysis desired. The most common way of collecting primary data is a survey.

• Secondary – data that have already been collected/compiled and are available for statistical analysis.
1. Survey Study - a systematic method for gathering information. It is an investigation of one or more characteristics of a population. There are two ways of gathering data via a survey:

a. The Direct or Interview Method


In this method, the researcher has direct contact with the interviewee and obtains the information needed by asking questions. This method gives precise and consistent information because clarifications can be made: the interviewer can repeat or rephrase a question that the respondent did not fully understand until it suits the respondent’s level. However, this method is time-consuming, expensive, and has limited field coverage.
b. The Indirect or Questionnaire Method
This method makes use of a written questionnaire. The researcher distributes the
questionnaire to the respondents either by personal delivery or by mail. Using this
method, the researcher can save a lot of time and money in gathering the information
needed because questionnaires can be given to a large number of respondents at the
same time.

However, the researcher cannot expect that all distributed questionnaires will be
retrieved because some respondents simply ignore the questionnaires. In addition,
clarification cannot be made if the respondent does not understand the question.
2. The Registration Method

This method of collecting data is governed by laws. For example, births and deaths are registered with the NSO for records and future use. The number of registered cars can be found at the LTO, and the list of registered voters in the Philippines can be found at COMELEC.
3. Retrospective Study

- Uses either all or a sample of data recorded in the past; such data are also called historical data.
- Data recorded internally by a company, such as sales and transactions, are a type of primary data. Primary data are collected directly from the source for a specific purpose or research question and have not been previously collected or analyzed by others.

Advantage
➢ Quickest and easiest way to collect process data.

Disadvantage
➢ Provides limited information.
4. Observational Study

- Simply observes the process or population during a period of routine operation.
- The researcher interacts with or disturbs the process only as much as is required to obtain data on the system.

Advantage
- May give valuable information, but the information is usually limited because only the conditions that occur during routine operation (possibly slightly altered by the data collection itself) are observed.
5. The Experimental Design
This method is usually used to find out cause and effect relationships. Scientific
researchers often use this method. For example, agriculturists would like to know the
effect of a new brand of fertilizer on the growth of plants. The new kind of fertilizer will
be applied to ten sets of plants, while another ten sets of plants will be given another
fertilizer. The growth of the plants will then be compared to determine which fertilizer
is better.

Advantage:
We can establish cause-and-effect relationships, unlike in retrospective and observational studies, which only inform us about interesting phenomena.
6. Simulation Study
Simulation data gathering refers to the process of collecting data from a simulation,
which is a computer model of a system that mimics its behavior.

Advantage:
Simulation is a powerful technique and can be used to model many different types of
systems.
➢ Cost-effective
➢ Time Efficient
➢ Safe-testing
➢ Increased understanding
➢ Optimization
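As a small illustration of collecting data from a computer model, here is a minimal Python sketch. The "system" is an assumption chosen only for illustration (two fair dice rolled many times), and the function name `simulate_dice_sums` is hypothetical. The model mimics the system's behavior, and the recorded relative frequencies are the simulated data.

```python
import random
from collections import Counter

def simulate_dice_sums(trials: int, seed: int = 0) -> Counter:
    """Simulate rolling two fair dice `trials` times and count each sum (2..12)."""
    rng = random.Random(seed)  # seeded so the simulated data are reproducible
    counts = Counter()
    for _ in range(trials):
        total = rng.randint(1, 6) + rng.randint(1, 6)
        counts[total] += 1
    return counts

if __name__ == "__main__":
    counts = simulate_dice_sums(10_000)
    for total in range(2, 13):
        # Relative frequency of each sum, collected from the simulated system
        print(total, counts[total] / 10_000)
```

A real simulation study would replace the dice with a model of the process of interest; the pattern (build a model, run it, record outputs) is the same.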
Sources of Secondary Data

• Books/Records
• Published censuses or other statistical data
• Data archives
• Internet/Research articles
Sampling Techniques
Objectives

▪ Enumerate and illustrate the different sampling techniques


▪ Identify the most appropriate sampling technique to be employed on given
situations
“When statistics are not based on strictly accurate calculations, they
mislead instead of guide. The mind easily lets itself be taken in by the false
appearance of exactitude which statistics retain in their mistakes, and
confidently adopts errors clothed in the form of mathematical truth.” —
Alexis de Tocqueville, Democracy in America

Requirements of a Good Sample

A good sample is a “scaled-down” version of the population, mirroring every characteristic of the whole population.
Terms
➢ Observation Unit/element – the basic unit of observation, an object on which a measurement is taken. Example: in a human population, the observation unit is an individual.
➢ Target Population – the complete collection of observations we want to study.
➢ Sample – subset of a population.
➢ Sampled Population – the collection of all possible observation units that might have
been chosen in a sample.
➢ Sampling unit – a unit that can be selected for a sample.
➢ Sampling frame – A list, map, or other specification of sampling units in the
population from which a sample may be selected.
Selection Bias
A good sample will be as free from selection bias as possible. Selection bias occurs when some part of the target population is not in the sampled population, or, more generally, when some population units are sampled at a different rate than intended by the investigator.

Example:
a. A sample selection procedure that depends on some characteristic associated with the properties of interest.
b. Deliberately or purposively selecting a ‘representative’ sample.
c. Misspecifying the target population.
d. Undercoverage and overcoverage.
e. Nonresponse.
Measurement Error
A good sample has accurate responses to the items of interest. When a response in the survey differs from the true value, measurement error has occurred. Obtaining accurate responses is challenging in all types of surveys, but particularly so in surveys of people:
a. People sometimes do not tell the truth.
b. People do not always understand the question.
c. People forget.
d. People give different answers to different interviewers.
e. People may say what they think an interviewer wants to hear or what they think will impress the interviewer.
f. Certain words mean different things to different people.
1. Sampling error - the error that results from taking one sample instead of examining
the whole population.

2. Non-sampling error - selection bias and measurement error are types of non-sampling error. These are errors that cannot be attributed to sample-to-sample variability.
1. Sampling can provide reliable information at far less cost than a census.
2. Data can be collected more quickly, so estimates can be published in a timely fashion.
3. Estimates based on sample surveys are often more accurate than those based on a
census because investigators can be more careful when collecting data.

“Sampling is not mere substitution of a partial coverage for a total coverage. Sampling is the science and art of controlling and measuring the reliability of useful statistical information through the theory of probability.”
In a probability sample:
➢ each unit in the population has a known probability of selection.
➢ a random number table or other randomization mechanism is used to choose the specific units to be included in the sample.
➢ the investigator can use a relatively small sample to make inferences about an arbitrarily large population.

Types of Probability Samples:


a. Simple Random Sample
b. Systematic Random Sample
c. Stratified Random Sample
d. Cluster Sample
Simple random sampling is the most basic form of probability sampling and provides the theoretical basis for the more complicated forms.
Two ways of taking a simple random sample:

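1. A Simple Random Sample with Replacement (SRSWR) of size n from a population of N units can be thought of as drawing n independent samples of size 1. One unit is randomly selected from the population to be the first sampled unit, with probability 1/N. The sampled unit is then replaced in the population, and the next unit is again selected with probability 1/N; this is repeated until n units are drawn, so the sample might include duplicates.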

2. A Simple Random Sample without Replacement (SRSWOR) is generally preferred to an SRSWR. It is selected so that every possible subset of n distinct units in the population has the same probability of being selected as the sample. The probability that a given unit i is included in the sample is $P(i) = n/N$.
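The two schemes above map directly onto Python's standard `random` module: `Random.choices` draws with replacement and `Random.sample` draws without replacement. The sketch below assumes units are numbered 1..N; the helper names `srswr`, `srswor`, and `inclusion_rate` are hypothetical. The last helper checks empirically that each unit's inclusion probability under SRSWOR is close to n/N.

```python
import random

def srswr(population, n, rng):
    """SRSWR: n independent draws, each unit chosen with probability 1/N per draw.
    The same unit can be drawn more than once (duplicates allowed)."""
    return rng.choices(population, k=n)

def srswor(population, n, rng):
    """SRSWOR: every subset of n distinct units is equally likely."""
    return rng.sample(population, n)

def inclusion_rate(unit, population, n, reps, rng):
    """Fraction of repeated SRSWOR draws that contain `unit` (should approach n/N)."""
    hits = sum(unit in srswor(population, n, rng) for _ in range(reps))
    return hits / reps

if __name__ == "__main__":
    rng = random.Random(2024)      # fixed seed for a reproducible demo
    units = list(range(1, 11))     # N = 10 units
    print("SRSWR :", srswr(units, 4, rng))
    print("SRSWOR:", srswor(units, 4, rng))
    print("P(1 in sample) ~", inclusion_rate(1, units, 4, 20_000, rng))  # near 4/10
```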
How to take an SRS?

1. List all possible observation units in the population (the sampling frame).
2. Assign each unit a number.
3. Use computer-generated pseudo-random numbers to select the sample. One method of selecting an SRS of size n from a population of size N is to generate N random numbers between 0 and 1, one for each unit, and select the n units with the smallest numbers.
Example: N = 10, n = 4

Unit i          1      2      3      4      5      6      7      8      9      10
Random number   0.837  0.636  0.465  0.609  0.154  0.766  0.821  0.723  0.988  0.469

The 4 smallest random numbers are 0.154, 0.465, 0.469, and 0.609, which lead us to sample units {3, 4, 5, 10}.
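The random-number method in the example can be sketched in a few lines of Python. The helper names `srs_from_keys` and `srs` are hypothetical; the first reproduces the worked example from the table's random numbers, and the second draws fresh uniform numbers with Python's `random` module.

```python
import random

def srs_from_keys(keys, n):
    """Return the unit numbers (1..N) with the n smallest random keys, sorted."""
    ranked = sorted(range(1, len(keys) + 1), key=lambda i: keys[i - 1])
    return sorted(ranked[:n])

def srs(N, n, rng):
    """Draw a new SRS of size n from units 1..N using one uniform key per unit."""
    keys = [rng.random() for _ in range(N)]
    return srs_from_keys(keys, n)

# Random numbers from the example table (units 1..10)
table = [0.837, 0.636, 0.465, 0.609, 0.154, 0.766, 0.821, 0.723, 0.988, 0.469]
print(srs_from_keys(table, 4))        # [3, 4, 5, 10]
print(srs(10, 4, random.Random(11)))  # a new random sample of 4 units
```

Because the keys are independent and identically distributed, every set of n units is equally likely to hold the n smallest keys, which is why this yields an SRSWOR.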
Steps:
1. Ask “What is expected of the sample, and how much precision do I need?”
“What are the consequences of the sample results?”
“How much error is tolerable?”
• Example: In an unemployment rate survey, the quantity to be estimated is the unemployment rate.
• Only the investigators in the study can say how much precision is needed.
• Specify the tolerable error, that is, the desired precision: $P(|\bar{y} - \bar{y}_U| \le e) = 1 - \alpha$, where $\bar{y}$ is the sample mean and $\bar{y}_U$ is the population mean.
• The investigator must decide on reasonable values for α and e; e is called the margin of error in many surveys, while α is the level of significance.
• For many surveys of people in which a proportion is measured, e = 0.03 and α = 0.05.
2. Find an equation relating the sample size n and your expectations of the sample. Estimate any unknown quantities if the sample size formula requires them.

Sample Size Estimation


How many responses do you really need? This simple question is a never-ending
quandary for researchers. A larger sample can yield more accurate results — but
excessive responses can be pricey. (Qualtrics)

Some Sample Size Formula


a. Yamane or Slovin’s Formula: $n = \frac{N}{1 + Ne^2}$

where n = sample size
      N = population size
      e = margin of error

• We can use Yamane’s formula if the population is finite with a known size and a simple random sampling technique is used.
A group of researchers will conduct a survey to find out the opinion of residents of a
particular community regarding the oil price hike. If there are 10,000 residents in the
community and the researchers plan to use a sample using a 10% margin of error, what
should the sample size be? If the researchers would like to use a 5% margin of error,
what should the sample size be?

Solution:
For e = 10%, Slovin’s formula suggests $n = \frac{10{,}000}{1 + 10{,}000(0.10)^2} = \frac{10{,}000}{101} \approx 99$.
For e = 5%, we have $n = \frac{10{,}000}{1 + 10{,}000(0.05)^2} = \frac{10{,}000}{26} \approx 385$.
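Slovin's formula is a one-liner; the minimal sketch below (the function name `slovin` is hypothetical) prints both the nearest whole number, as in the worked solution, and the rounded-up value. Rounding up is the more conservative convention (it never gives fewer respondents than the formula asks for) and gives 100 rather than 99 for e = 10%.

```python
import math

def slovin(N: int, e: float) -> float:
    """Yamane/Slovin sample size n = N / (1 + N e^2), before rounding."""
    return N / (1 + N * e ** 2)

for e in (0.10, 0.05):
    n = slovin(10_000, e)
    print(f"e={e:.2f}: n={n:.2f}, nearest={round(n)}, rounded up={math.ceil(n)}")
```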
b. Cochran’s Sample Size Formula (proportion): $n = \frac{p(1 - p)z^2}{e^2}$
where n = sample size
      p = the population proportion
      e = acceptable margin of error
      z = z-score at significance level α:
          α = 0.10 → z = 1.645 (confidence level 1 − α = 90%)
          α = 0.05 → z = 1.96 (confidence level 1 − α = 95%)
          α = 0.01 → z = 2.576 (confidence level 1 − α = 99%)
We can use Cochran’s formula if the population size is unknown but large. The formula uses the required level of precision (e), the confidence level (through z), and the estimated proportion p of the attribute present in the population to calculate the essential sample size. Cochran’s formula is most suitable for a large population; if the population of interest is relatively small, there is a modified Cochran’s Sample Size Formula.
A researcher wishes to estimate, with 95% confidence, the proportion of people who
own a home computer. A previous study shows that 40% of those interviewed had a
computer at home. The researcher wishes to be accurate within 2% of the true
proportion. Find the minimum sample size necessary using Cochran’s Sample Size
Formula.

Solution:
Since the confidence level is 95%, z = 1.96; e = 0.02; the estimated proportion is $\hat{p} = 0.40$, so $1 - \hat{p} = 0.60$.
The sample size needed is $n = \frac{(0.40)(1 - 0.40)(1.96)^2}{(0.02)^2} = 2304.96$, or 2305 people to interview.
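The same calculation as a Python sketch (the function names `z_score` and `cochran` are hypothetical). Instead of looking up z in the table, `z_score` computes the two-sided critical value from α with the standard library's `statistics.NormalDist().inv_cdf`; the result is rounded up because a fractional respondent is not possible.

```python
import math
from statistics import NormalDist

def z_score(alpha: float) -> float:
    """Two-sided critical value, e.g. about 1.96 for alpha = 0.05."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def cochran(p: float, e: float, z: float) -> int:
    """Cochran's n = p(1 - p) z^2 / e^2, rounded up to a whole respondent."""
    return math.ceil(p * (1 - p) * z ** 2 / e ** 2)

print(cochran(0.40, 0.02, 1.96))           # 2305, using the tabulated z
print(cochran(0.40, 0.02, z_score(0.05)))  # 2305, using z computed from alpha
```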

Note: If you find at this point that the sample size you calculated in step 2 is much larger than you can afford, go back, adjust some of your expectations for the survey, and try again.
Systematic sampling is used as a proxy for simple random sampling when there is no list of the population. Individuals are selected at a predetermined interval k (the sampling interval), starting from a randomly chosen point.
Example: Conducting a survey about the communication skills of students in university X

Step 1. Define the population size (N) = 1000.
Step 2. Compute the sample size (n) = 100 (assume that this is computed via a sample size formula).
Step 3. Decide the sampling interval: k = N/n = 1000/100 = 10.
Step 4. Select a random number between 1 and k. This will be the starting point for the sample selection.
Step 5. Select every kth element: starting from the random starting point, select every kth element in the population until the sample size is reached.
Example: To select a systematic sample of 45 students from a list of 45,000 students at a university, the sampling interval is k = 45,000/45 = 1000. Suppose the random integer we choose is 597. Then the students numbered 597, 1597, 2597, ..., 44,597 would be in the sample.
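The steps above can be sketched in Python as follows. The function name `systematic_sample` is hypothetical, and the sketch assumes units are numbered 1..N and that k is taken as the whole-number part of N/n when N/n is not an integer.

```python
import random
from typing import List, Optional

def systematic_sample(N: int, n: int, start: Optional[int] = None,
                      rng: Optional[random.Random] = None) -> List[int]:
    """Return units start, start + k, start + 2k, ... with k = N // n."""
    k = N // n  # sampling interval
    if start is None:
        if rng is None:
            rng = random.Random()
        start = rng.randint(1, k)  # random starting point between 1 and k
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    return [start + j * k for j in range(n)]

sample = systematic_sample(45_000, 45, start=597)
print(sample[:3], sample[-1])  # [597, 1597, 2597] 44597
```

Since the start is at most k, the last selected unit is at most n·k ≤ N, so the sample never runs past the end of the list.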
Simple random samples are usually easy to design and easy to analyze. But they are
not the best design to use in the following situations:

➢ If you want to study whether a certain brand of bath oil is an effective mosquito
repellent, you should perform a controlled experiment, not take a survey. You should
take a survey if you want to estimate how many people use the bath oil as a
mosquito repellent, or if you want to estimate how many mosquitoes are in an area.

➢ If you are interested in the proportion of mosquitoes in Metro Manila that carry an encephalitis virus, you would need to sample different areas and then examine some or all of the mosquitoes found in those areas, using a form of cluster sampling.
On the other hand, simple random sampling is a good choice in the following situations:

➢ Little extra information is available that can be used when designing the survey. If your sampling frame is merely a list of university students’ names in alphabetical order and you have no additional information such as major or year, simple random or systematic sampling is probably the best probability sampling strategy.

➢ The primary interest is in multivariate relationships, such as regression equations, that hold for the whole population, and there are no compelling reasons to take a stratified or cluster sample. Multivariate analyses can be done in complex samples, but they are much easier to perform and interpret in an SRS.
Stratified random sampling partitions the population into subclasses with notable distinctions, which are called strata. To stratify a population means to classify or separate people into groups according to some of their characteristics, such as rank, income, education, sex, or ethnic background.

Example: Conducting a survey about the communication skills of students in university X


Information: Strata are based on age. Population (N) = 180,000; Sample size (n) = 50,000

1. Compute each stratum’s proportion: (Number in stratum) / N.
2. Multiply the proportion by the desired sample size (n).

Age Group             18-24    25-35    35+      Total
Age Bracket Size      90,000   60,000   30,000   180,000
Stratum Sample Size   25,000   16,667   8,333    50,000

Advantage:
1. It is the best option if your goal is to profile different groups.
2. Every subgroup of the population is represented in the sample.
1. Create a table in order. Label each group for convenience.

Group   Financial Group           N
A       PHP 50,000 and above      120
B       PHP 40,000-PHP 49,999     250
C       PHP 30,000-PHP 39,999     210
D       PHP 20,000-PHP 29,999     400
E       PHP 10,000-PHP 19,999     900
F       below PHP 10,000          120
        Total                     2000

2. Solve for the total sample size n using Slovin’s formula:

$n = \frac{2000}{1 + 2000(0.05)^2} = 333.33 \approx 334$ (rounded up)
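3. Solve for the sample size of each stratum (in this example, each financial group) using proportionate sampling, $n_h = n \times \frac{N_h}{N}$:

$n_A = n \times \frac{N_A}{N} = 334 \times \frac{120}{2000} \approx 20$
$n_B = n \times \frac{N_B}{N} = 334 \times \frac{250}{2000} \approx 42$
$n_C = n \times \frac{N_C}{N} = 334 \times \frac{210}{2000} \approx 35$
$n_D = n \times \frac{N_D}{N} = 334 \times \frac{400}{2000} \approx 67$
$n_E = n \times \frac{N_E}{N} = 334 \times \frac{900}{2000} \approx 150$
$n_F = n \times \frac{N_F}{N} = 334 \times \frac{120}{2000} \approx 20$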
4. Verify that the sum of the sample sizes of the stratified groups is equal to the computed sample size.

Group   Financial Group           N      n
A       PHP 50,000 and above      120    20
B       PHP 40,000-PHP 49,999     250    42
C       PHP 30,000-PHP 39,999     210    35
D       PHP 20,000-PHP 29,999     400    67
E       PHP 10,000-PHP 19,999     900    150
F       below PHP 10,000          120    20
        Total                     2000   334

20 + 42 + 35 + 67 + 150 + 20 = 334

∴ Engr. Donneth Dave will survey 20 from Group A, 42 from Group B, 35 from
Group C, 67 from Group D, 150 from Group E, and 20 from Group F, forming
a sample of 334 out of 2000.
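Both stratified examples follow the same recipe, sketched here in Python. The helper names `slovin` and `proportional_allocation` are hypothetical; allocations are rounded to the nearest whole unit, as in the worked examples. Because each stratum is rounded separately, the total can occasionally differ from n by a unit or two, which is exactly what the verification step catches.

```python
import math

def slovin(N, e):
    """Yamane/Slovin total sample size before rounding: N / (1 + N e^2)."""
    return N / (1 + N * e ** 2)

def proportional_allocation(stratum_sizes, n):
    """Proportionate allocation n_h = n * N_h / N, rounded to the nearest unit."""
    N = sum(stratum_sizes)
    return [round(n * N_h / N) for N_h in stratum_sizes]

# Age-group example: n = 50,000 from N = 180,000
ages = proportional_allocation([90_000, 60_000, 30_000], 50_000)
print(ages, sum(ages))  # [25000, 16667, 8333] 50000

# Financial-group example: total n from Slovin (e = 0.05), rounded up
groups = {"A": 120, "B": 250, "C": 210, "D": 400, "E": 900, "F": 120}
n = math.ceil(slovin(sum(groups.values()), 0.05))  # 334
alloc = dict(zip(groups, proportional_allocation(list(groups.values()), n)))
print(n, alloc)

# Verification step: the stratum sample sizes should add back up to n
if sum(alloc.values()) != n:
    print("Adjust one stratum so the allocations add up to", n)
```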
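Cluster sampling is similar to stratified random sampling in that the total population is divided into groups, here called clusters. However, instead of sampling from every group, a simple random sample of clusters is selected; then either every unit in each selected cluster is observed, or a simple random sample is taken within each selected cluster. Clusters are usually based on geographic area.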

➢ Sampling Unit = Cluster


Example: Barangay 1,2,…,50
Select barangays using SRSWOR -> 1, 5, 12, 13, 19, 37, 46
➢ One-Stage Cluster Sampling
Selected: 1, 5, 12, 13, 19, 37, 46
Survey every unit in each selected barangay.

➢ Two-Stage Cluster Sampling
Selected: 1, 5, 12, 13, 19, 37, 46
Conduct simple random sampling within each selected barangay.
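A minimal Python sketch of one-stage and two-stage cluster sampling, assuming a hypothetical sampling frame of 50 barangays, each with a made-up number of numbered households. The function names are hypothetical, and the seeded run will not reproduce the barangays listed above.

```python
import random

def one_stage_cluster(clusters, m, rng):
    """SRSWOR of m clusters; every unit in a chosen cluster is surveyed."""
    chosen = rng.sample(sorted(clusters), m)
    return {c: list(clusters[c]) for c in sorted(chosen)}

def two_stage_cluster(clusters, m, k, rng):
    """Stage 1: SRSWOR of m clusters. Stage 2: SRSWOR of up to k units in each."""
    chosen = rng.sample(sorted(clusters), m)
    return {c: sorted(rng.sample(clusters[c], min(k, len(clusters[c]))))
            for c in sorted(chosen)}

if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical frame: barangays 1..50, each with 20-60 numbered households
    barangays = {b: list(range(1, rng.randint(20, 60) + 1)) for b in range(1, 51)}
    print(sorted(one_stage_cluster(barangays, 7, rng)))
    print({b: len(h) for b, h in two_stage_cluster(barangays, 7, 10, rng).items()})
```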
Non-probability sampling is a method of selecting sampling units from a target
population using a subjective or non-random method.

Types of Non-Random Sampling


1. Convenience sampling: The sample is selected based on accessibility or
convenience. It is the least effective of the non-probability sampling methods but
if there are logistics and time constraints, it may be the only option.

Example: Conducting a survey in a department store, where the interviewer selects any person who happens to walk by.
2. Purposive Sampling: A method for obtaining sample units in which researchers use their expertise to choose qualified participants to take the survey who will help the research study meet its goals. The researchers pick these participants purposively.

Example: A food company will offer a new food product in the market and will first survey a few selected food scientists about it. They are the subject-matter experts, and their opinions and insights will be very valuable in creating the product.
3. Quota sampling: The sample is selected based on certain quotas or predetermined
criteria, such as age, educational attainment, gender or income level. Quota Sampling
is one of the most preferred methods of non-probability sampling because it forces the
inclusion of members of different subpopulations.

Example: There are 200 observation units in a population: 100 men and 100 women. If 20 samples are needed for the study, 10 men and 10 women may be interviewed.
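The quota example can be sketched in Python, assuming (hypothetically) that respondents arrive one at a time and the interviewer accepts them until each group's quota is filled. The function name `quota_sample` is hypothetical. Who fills each quota depends on arrival order rather than on a randomization mechanism, which is what makes this a non-probability method.

```python
import random

def quota_sample(stream, quotas, key):
    """Accept respondents in arrival order until every group's quota is filled."""
    remaining = dict(quotas)
    sample = []
    for person in stream:
        group = key(person)
        if remaining.get(group, 0) > 0:  # this group still has room
            sample.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):  # all quotas met, stop interviewing
            break
    return sample

# Hypothetical population: 100 men and 100 women arriving in no particular order
rng = random.Random(3)
people = [("M", i) for i in range(100)] + [("F", i) for i in range(100)]
rng.shuffle(people)
picked = quota_sample(people, {"M": 10, "F": 10}, key=lambda p: p[0])
print(len(picked))  # 20
```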
4. Snowball sampling: The sample is selected based on referrals from other members of
the population. This type of sampling is used if the population of interest is hard to find
such as people with disabilities or certain diseases, drug users, or victims of a specific crime.

Example: Conducting a survey to those people who have specific auto-immune disease.
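Snowball recruitment can be pictured as a breadth-first walk through a referral network: interview the initial contacts, then the people they refer, and so on. The sketch below uses a small hypothetical network; the function name `snowball_sample` and the people `P1`..`P6` are made up for illustration.

```python
from collections import deque

def snowball_sample(referrals, seeds, target):
    """Recruit seeds first, then the people they refer, wave by wave."""
    sample, seen = [], set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < target:
        person = queue.popleft()
        sample.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:  # never recruit the same person twice
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referral network among people with a specific disease
referrals = {"P1": ["P2", "P3"], "P2": ["P4"], "P3": ["P4", "P5"], "P5": ["P6"]}
print(snowball_sample(referrals, ["P1"], target=5))  # ['P1', 'P2', 'P3', 'P4', 'P5']
```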
There is no defined sample size formula for non-probability sampling, but there are recommendations and rules of thumb:
1. Sample sizes larger than 30 and less than 500 are appropriate for most research.

2. Where samples are to be broken into sub-samples (males/females, juniors/seniors, etc.), a minimum sample size of 30 for each category is necessary.

3. In multivariate research (including multiple regression analyses), the sample size should be several times (preferably 10 times or more) as large as the number of variables in the study. (The sample-to-variable ratio suggests a minimum of 15-20 observations per independent variable.)

4. For simple experimental research with tight experimental controls (matched pairs, etc.), successful research is possible with samples as small as 10 to 20 in size.
• Introduction to the concepts of statistics
• Statistical Inquiry
• Importance of statistics
• Level of measurements
• Types of data and variables
• Data Collection Method
• Sampling Techniques

“Better data builds better evidence, which informs better decisions. Those decisions affect our health, wealth and happiness.” - Earlham Institute

Source: clarkstoneconsulting.com
• Lohr, S. L. Sampling: Design and Analysis.
• Qualtrics. Documentation and references.
• Sekaran, U. (2003). Research methods for business: A skill building approach. John Wiley & Sons.
