
Sampling



A sample is a smaller (but hopefully representative) collection of units from a population
used to determine truths about that population (Field, 2005).

Why sample?

  Resources (time, money) and workload
  Gives results with known accuracy that can be calculated mathematically
  The sampling frame is the list from which the potential respondents are drawn,
for example a registrar's office or class rosters. Be careful of sampling errors.

Population

A population can be defined as including all people or items with the characteristic one
wishes to understand. Because there is very rarely enough time or money to gather information
from everyone or everything in a population, the goal becomes finding a representative sample
or subset of that population. Note also that the population from which the sample is drawn may
not be the same as the population about which we actually want information. Often there is large
but not complete overlap between these two groups due to frame issues etc. Sometimes they may
be entirely separate, for instance, we might study rats in order to get a better understanding of
human health, or we might study records from people born in 2008 in order to make predictions
about people born in 2009.

What is your population of interest?

 To whom do you want to generalize your results?


 Can you sample the entire population?

Factors that influence sample representativeness

 Sampling procedure
 Sample size
 Participation (responses)

When might you sample the entire population?


 When your population is very small
 When you have extensive resources
 When you don’t expect a very high response

Sampling frame

In the most straightforward case, such as the sentencing of a batch of material from
production (acceptance sampling by lots), it is possible to identify and measure every single item
in the population and to include any one of them in our sample. However, in the more general
case this is not possible. There is no way to identify all rats in the set of all rats. Where voting is
not compulsory, there is no way to identify which people will actually vote at a forthcoming
election. As a remedy, we seek a sampling frame which has the property that we can identify
every single element and include any in our sample. The sampling frame must be representative
of the population.

[Figure: nested levels, from the theoretical population to the study population to the sampling frame to the sample]

Process

The sampling process comprises several stages:

 Defining the population of concern


 Specifying a sampling frame, a set of items or events possible to measure
 Specifying the sampling method for selecting items or events from the frame
 Determining the sample size
 Implementing the sampling plan
 Sampling and data collection
 Reviewing the sampling process

Types of samples

 Probability (random) samples

 simple random sampling


 systematic sampling
 stratified random sampling
 cluster sampling
 multistage sampling
 multiphase sampling

  Non-probability samples
 convenience sample
 purposive sample
 quota sample

Probability sampling
A probability sampling scheme is one in which every unit in the population has a chance
(greater than zero) of being selected in the sample, and this probability can be accurately
determined. When every element in the population does have the same probability of selection,
this is known as an EQUAL PROBABILITY OF SELECTION (EPS) design. Such designs are
also referred to as self-weighting, because all sampled units are given the same weight.

1. Simple random sampling


  Applicable when the population is small, homogeneous and readily available
  All subsets of the frame of a given size are given an equal probability. Each element of the frame
thus has an equal probability of selection.
  It provides for the greatest number of possible samples. This is done by assigning a
number to each unit in the sampling frame.
  A table of random numbers or a lottery system is used to determine which units are
to be selected.
  Estimates are easy to calculate
  Simple random sampling is always an EPS design, but not all EPS designs are
simple random sampling

Disadvantages:

  If the sampling frame is large, this method is not practical.
  If minority subgroups of interest exist in the population, they may not be present in the sample
in sufficient numbers for study.
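
The numbering-and-lottery procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the frame of 200 student labels, the sample size of 20 and the seed are all hypothetical values chosen for the example.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)  # a seed is used only to make the draw repeatable
    return rng.sample(frame, n)

# Hypothetical sampling frame: 200 numbered students.
frame = [f"student_{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(frame, n=20, seed=42)
print(sample[:5])
```

Because `random.sample` draws without replacement, no unit can appear twice in the sample.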

2. Systematic sampling
 It relies on arranging the target population according to some ordering scheme
and then selecting elements at regular intervals through that ordered list.
  Systematic sampling involves a random start and then proceeds with the selection
of every kth element from then onwards. In this case, k = population size / sample
size.
 It is important that the starting point is not automatically the first in the list, but is
instead randomly chosen from within the first to the kth element in the list.
 A simple example would be to select every 10th name from the telephone
directory (an every 10th sample, also referred to as sampling with a skip of 10).
 As described above, systematic sampling is an EPS method, because all elements
have the same probability of selection (in the example given, one in ten). It is not
simple random sampling because different subsets of the same size have different
selection probabilities, e.g., the set (4, 14, 24,…, 994) has a one in ten probability
of selection, but the set (4,13,24,34,…) has zero probability of selection.
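
The random start and fixed skip can be sketched as follows. This is a minimal sketch that assumes the population size is an exact multiple of the sample size, so that k is a whole number; the frame of 1000 numbered entries is hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every kth unit after a random start within the first k units."""
    N = len(frame)
    k = N // n                      # sampling interval (skip); assumes N is a multiple of n
    rng = random.Random(seed)
    start = rng.randint(1, k)       # 1-based random start between 1 and k inclusive
    positions = list(range(start, N + 1, k))[:n]
    return start, [frame[p - 1] for p in positions]

# Hypothetical frame: entries numbered 1 to 1000, sample of 100, so k = 10.
frame = list(range(1, 1001))
start, selected = systematic_sample(frame, n=100, seed=1)
print(start, selected[:4])
```

Only k distinct samples are possible (one per starting point), which is why this is an EPS design but not simple random sampling.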

Advantages:

 Sample easy to select


 Suitable sampling frame can be identified easily
 Sample evenly spread over entire reference population
Disadvantages:

  Sample may be biased if a hidden periodicity in the population coincides with that of
the selection.
 Difficult to assess precision of estimate from one survey.

3. Stratified sampling
  Where the population embraces a number of distinct categories, the frame can be
organized into separate “strata”. Each stratum is then sampled as an independent
sub-population, out of which individual elements can be randomly selected.
  Every unit in a stratum has the same chance of being selected.
  Using the same sampling fraction for all strata ensures proportionate representation in
the sample.
  Adequate representation of minority sub-groups of interest can be ensured by
stratification and by varying the sampling fraction between strata as required.
  Finally, since each stratum is treated as an independent population, different sampling
approaches can be applied to different strata. For example, men and women can form two strata.
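
A proportionate stratified sample, with the same sampling fraction in every stratum, might look like the sketch below. The population of 600 women and 400 men and the 10% sampling fraction are hypothetical; varying the fraction per stratum would only require passing a different fraction for each stratum.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Sample each stratum independently using one common sampling fraction."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)   # build one sub-frame per stratum
    sample = {}
    for name, members in strata.items():
        n_h = round(len(members) * fraction)    # stratum sample size
        sample[name] = rng.sample(members, n_h) # simple random sample within the stratum
    return sample

# Hypothetical population: 600 women ("F") and 400 men ("M").
population = [("F", i) for i in range(600)] + [("M", i) for i in range(400)]
chosen = stratified_sample(population, stratum_of=lambda u: u[0], fraction=0.1, seed=1)
print({name: len(members) for name, members in chosen.items()})
```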

Disadvantages:

  First, the sampling frame of the entire population has to be prepared separately for each
stratum.
 Second, when examining multiple criteria, stratifying variables may be related to
some, but not to others, further complicating the design, and potentially reducing
the utility of the strata.
 Finally, in some cases (such as designs with a large number of strata, or those
with a specified minimum sample size per group), stratified sampling can
potentially require a larger sample than would other methods.

4. Cluster sampling
  Cluster sampling is an example of “two-stage sampling”.
  First stage: a sample of areas is chosen.
  Second stage: a sample of respondents within those areas is selected.
  The population is divided into clusters of homogeneous units, usually based on
geographical contiguity.
 Sampling units are groups rather than individuals.
 A sample of such clusters is then selected.
 All units from the selected clusters are studied.

Identification of clusters

  List all cities, towns, villages and wards of cities in the target area under study,
with their populations.
  Calculate the cumulative population and divide it by 30; this gives the sampling interval.
  Select a random number less than or equal to the sampling interval, having the same number of
digits. The community whose cumulative population includes this number forms the 1st cluster.
  Random number + sampling interval = 2nd cluster.
  2nd cluster + sampling interval = 3rd cluster.
  Last or 30th cluster = 29th cluster + sampling interval.
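
The cumulative-population procedure above can be sketched as follows. This is an illustrative sketch under stated assumptions: the list of 60 villages and their populations is invented, and a community is taken to contain a cluster when the running number falls within its cumulative population range, so a large community can receive more than one cluster.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def select_clusters(communities, n_clusters=30, seed=None):
    """communities: list of (name, population) pairs covering the target area."""
    rng = random.Random(seed)
    names = [name for name, _ in communities]
    cumulative = list(accumulate(pop for _, pop in communities))
    interval = cumulative[-1] // n_clusters        # sampling interval
    start = rng.randint(1, interval)               # random number <= sampling interval
    picks = []
    for i in range(n_clusters):
        target = start + i * interval              # previous number + sampling interval
        idx = bisect_left(cumulative, target)      # first community whose cumulative total reaches target
        picks.append(names[idx])
    return interval, start, picks

# Hypothetical target area: 60 villages of varying size.
communities = [(f"village_{i}", 500 + 37 * i) for i in range(1, 61)]
interval, start, picks = select_clusters(communities, seed=11)
print(interval, start, picks[:3])
```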

Cluster sampling types

 One-stage sampling: all of the elements within selected clusters are included in
the sample.
 Two-stage sampling: a subset of elements within selected clusters are randomly
selected for inclusion in the sample.
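
The difference between the two types can be sketched with a single function, where a hypothetical `per_cluster` argument switches between them. The ten villages of twenty houses each are invented for the illustration.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """One-stage if per_cluster is None (take everyone); otherwise two-stage."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # first stage: sample whole clusters
    sample = {}
    for name in chosen:
        members = clusters[name]
        if per_cluster is None:
            sample[name] = list(members)                # one-stage: all elements included
        else:
            size = min(per_cluster, len(members))
            sample[name] = rng.sample(members, size)    # two-stage: random subset of elements
    return sample

# Hypothetical clusters: 10 villages with 20 houses each.
clusters = {f"village_{i}": [f"v{i}_house_{j}" for j in range(20)] for i in range(1, 11)}
one_stage = cluster_sample(clusters, n_clusters=3, seed=3)
two_stage = cluster_sample(clusters, n_clusters=3, per_cluster=5, seed=3)
print(sum(len(v) for v in one_stage.values()), sum(len(v) for v in two_stage.values()))
```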

Advantages:

 Cuts down on the cost of preparing a sampling frame.


 This can reduce travel and other administrative costs.

Disadvantages:

  Sampling error is higher than for a simple random sample of the same size.

  Cluster sampling is often used to evaluate vaccination coverage in EPI.

Difference between strata and clusters

 Although strata and clusters are both non-overlapping sub-sets of the population,
they differ in several ways.
 All strata are represented in the sample, but only a subset of clusters are in the
sample.
 With stratified sampling, the best survey results occur when elements within
strata are internally homogeneous. However, with cluster sampling, the best
results occur when elements within clusters are internally heterogeneous.

5. Multi-stage sampling
  A complex form of cluster sampling in which two or more levels of units are
embedded one in the other.
  First stage: a number of districts is chosen at random in all states.
  Second stage: a number of villages is chosen at random.
  Third stage: the units are houses.
  All ultimate units (houses, for instance) selected at the last step are surveyed.
  This technique is essentially the process of taking random samples of preceding
random samples.
  Not as effective as true random sampling, but it probably solves more of the
problems inherent to random sampling.
  An effective strategy because it banks on multiple randomizations. As such, it is
extremely useful.
  Multi-stage sampling is used frequently when a complete list of all members of the
population does not exist or is inappropriate.
  Moreover, by avoiding the use of all sample units in all selected clusters,
multistage sampling avoids the large, and perhaps unnecessary, costs associated
with traditional cluster sampling.
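
The idea of taking random samples of preceding random samples can be sketched with a nested frame. Everything here is hypothetical: two states, five districts per state, four villages per district and ten houses per village, with two districts, two villages and three houses drawn at the successive stages.

```python
import random

def multistage_sample(frame, n_districts, n_villages, n_houses, seed=None):
    """frame: state -> district -> village -> list of houses."""
    rng = random.Random(seed)
    surveyed = []
    for state, districts in frame.items():
        for d in rng.sample(sorted(districts), n_districts):        # first stage: districts
            villages = districts[d]
            for v in rng.sample(sorted(villages), n_villages):      # second stage: villages
                surveyed.extend(rng.sample(villages[v], n_houses))  # third stage: houses
    return surveyed

# Hypothetical nested frame.
frame = {
    f"state_{s}": {
        f"district_{s}{d}": {
            f"village_{s}{d}{v}": [f"house_{s}{d}{v}_{h}" for h in range(10)]
            for v in range(4)
        }
        for d in range(5)
    }
    for s in range(2)
}
houses = multistage_sample(frame, n_districts=2, n_villages=2, n_houses=3, seed=5)
print(len(houses), houses[:2])
```

Note that only the selected villages need a list of houses, which is the practical appeal when no complete list of the population exists.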

6. Multi-phase sampling
  Part of the information is collected from the whole sample and part from a subsample.
  In a TB survey: MT in all cases (phase I)
  X-ray chest in MT +ve cases (phase II)
  Sputum examination in X-ray +ve cases (phase III)
  It is a survey that is less costly, less laborious and more purposeful

Non-probability sampling
Non-probability sampling is any sampling method where some elements of the population have no chance of selection
(these are sometimes referred to as out of coverage or undercovered), or where the probability of
selection can’t be accurately determined. It involves the selection of elements based on
assumptions regarding the population of interest, which form the criteria for selection. Hence,
because the selection of elements is non-random, non-probability sampling does not allow the
estimation of sampling errors. For example, suppose we visit every household in a given street and
interview the first person to answer the door. In any household with more than one occupant, this
is a non-probability sample, because some people are more likely to answer the door (e.g. an
unemployed person who spends most of their time at home is more likely to answer than an
employed housemate who might be at work when the interviewer calls) and it’s not practical to
calculate these probabilities.

Non-response effects may turn any probability design into a non-probability design if the
characteristics of non-response are not well understood, since non-response effectively modifies
each element’s probability of being sampled.

1. Convenience sampling
  Sometimes known as grab, opportunity, accidental or
haphazard sampling.
  A type of non-probability sampling which involves the sample being drawn
from that part of the population which is close to hand, that is, readily
available and convenient.
  The researcher using such a sample cannot scientifically make generalizations
about the total population from this sample because it would not be
representative enough.
  For example, if the interviewer were to conduct a survey at a shopping center
early in the morning on a given day, the people that he/she could interview
would be limited to those present there at that time. They would not
represent the views of other members of society in the area, which could be captured only if the
survey were conducted at different times of day and several times per week.
  This type of sampling is most useful for pilot testing.
  In social science research, snowball sampling is a similar technique, where
existing study subjects are used to recruit more subjects into the sample.

2. Purposive sampling/ judgmental sampling


 The researcher chooses the sample based on who they think would be
appropriate for the study. This is used primarily when there is a limited
number of people that have expertise in the area being researched.

3. Quota sampling
 The population is first segmented into mutually exclusive sub-groups, just as
in stratified sampling.
  Then judgment is used to select subjects or units from each segment based on a
specified proportion.
  For example, an interviewer may be told to sample 200 females and 300 males
between the ages of 45 and 60.
  It is this second step which makes the technique one of non-probability
sampling.
  In quota sampling, the selection of the sample is non-random.
  For example, interviewers might be tempted to interview those who look most
helpful. The problem is that these samples may be biased because not
everyone gets a chance of selection. This non-random element is its greatest
weakness, and quota versus probability sampling has been a matter of controversy for
many years.

______________________________________________________________________________

Post-stratification

Stratification is sometimes introduced after the sampling phase in a process called
“post-stratification”. This approach is typically implemented due to a lack of prior knowledge of an
appropriate stratifying variable or when the experimenter lacks the necessary information to
create a stratifying variable during the sampling phase. Although the method is susceptible to the
pitfalls of post hoc approaches, it can provide several benefits in the right situation.
Implementation usually follows a simple random sample. In addition to allowing for
stratification on an ancillary variable, post-stratification can be used to implement weighting,
which can improve the precision of a sample’s estimates.
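
As a rough illustration of post-stratification weighting, the sketch below re-weights each stratum's sample mean by its known population share. The data are invented: a sample of eight men and two women drawn from a population assumed to be split 50/50, with made-up measurement values.

```python
from collections import defaultdict

def post_stratified_mean(sample, population_shares):
    """sample: list of (stratum, value); population_shares: stratum -> share of population."""
    by_stratum = defaultdict(list)
    for stratum, value in sample:
        by_stratum[stratum].append(value)
    # Weight each stratum mean by its population share rather than its sample share.
    return sum(
        share * (sum(by_stratum[s]) / len(by_stratum[s]))
        for s, share in population_shares.items()
    )

# Hypothetical simple random sample that happens to over-represent men.
sample = [("M", 10.0)] * 8 + [("F", 20.0)] * 2
raw_mean = sum(value for _, value in sample) / len(sample)
adjusted_mean = post_stratified_mean(sample, {"M": 0.5, "F": 0.5})
print(raw_mean, adjusted_mean)
```

The sketch assumes every stratum has at least one sampled unit; an empty stratum would need to be merged with another before weighting.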

Over-sampling

Choice-based sampling is one of the stratified sampling strategies. In this approach, the data are
stratified on the target and a sample is taken from each stratum so that the rare target class will be
more represented in the sample. The model is then built on this biased sample. The effects of the
input variables on the target are often estimated with more precision with the choice-based
sample, even when a smaller overall sample size is taken, compared to a random sample. The results
usually must be adjusted to correct for the oversampling.
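
One common way to make that adjustment is to weight each sampled unit by the inverse of its class's sampling fraction. The sketch below is only an illustration of that idea, with invented counts: a population of 200 rare and 9800 common cases, from which all 200 rare cases and 200 common cases are sampled.

```python
def inverse_sampling_weights(population_counts, sample_counts):
    """Weight per class = units in the population / units sampled from that class."""
    return {c: population_counts[c] / sample_counts[c] for c in sample_counts}

# Hypothetical counts for a rare target class.
population_counts = {"rare": 200, "common": 9800}
sample_counts = {"rare": 200, "common": 200}
weights = inverse_sampling_weights(population_counts, sample_counts)

# Share of the rare class before and after weighting.
raw_share = sample_counts["rare"] / sum(sample_counts.values())
weighted_share = (sample_counts["rare"] * weights["rare"]) / sum(
    sample_counts[c] * weights[c] for c in sample_counts
)
print(raw_share, weighted_share)
```

With these weights the rare class returns to its population share of 2%, even though it makes up half of the sample.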

Panel sampling

Panel sampling is the method of first selecting a group of participants through a random sampling method and
then asking that group for the same information again several times over a period of time.
Each participant is therefore given the same survey or interview at two or more time points; each
period of data collection is called a “wave”. This sampling method is often chosen for large-scale or
nation-wide studies in order to gauge changes in the population with regard to any number of
variables, from chronic illness to job stress to weekly food expenditures. Panel sampling can also
be used to inform researchers about within-person health changes due to age or to help explain
changes in continuous dependent variables such as spousal interaction. There have been several
proposed methods of analyzing panel sample data, including growth curves.

Matched random sampling

Matched random sampling is a method of assigning participants to groups in which pairs of participants are first
matched on some characteristic and then individually assigned randomly to groups. The
procedure for matched random sampling can be described in the following two contexts. The first is two
samples in which the members are clearly paired or are matched explicitly by the researcher, for
example, IQ measurements or pairs of identical twins. The second is samples in which the same attribute
or variable is measured twice on each subject under different circumstances, commonly called
repeated measures. Examples include the times of a group of athletes for 1500m before and after
a week of special training, and the milk yields of cows before and after being fed a particular diet.
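
The first context, matching and then assigning at random within each pair, might be sketched as below. The matching rule is an assumption made for the example: participants are sorted on the matching characteristic and adjacent participants are paired. The eight participants and their IQ scores are invented.

```python
import random

def matched_pair_assignment(participants, key, seed=None):
    """Pair neighbours after sorting on the matching characteristic, then randomize within pairs."""
    rng = random.Random(seed)
    ordered = sorted(participants, key=key)
    group_a, group_b = [], []
    for i in range(0, len(ordered) - 1, 2):     # an unpaired last participant is left out
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)                       # random assignment within the matched pair
        group_a.append(pair[0])
        group_b.append(pair[1])
    return group_a, group_b

# Hypothetical participants as (name, IQ score).
participants = [("p1", 98), ("p2", 121), ("p3", 103), ("p4", 119),
                ("p5", 95), ("p6", 110), ("p7", 108), ("p8", 101)]
group_a, group_b = matched_pair_assignment(participants, key=lambda p: p[1], seed=2)
print(group_a)
print(group_b)
```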

Probability-proportional-to-size (PPS) sampling

In some cases the sample designer has access to an auxiliary variable or size measure,
believed to be correlated to the variable of interest, for each element in the population. These data
can be used to improve accuracy in sample design. One option is to use the auxiliary variable as
a basis for stratification, as discussed above.

Another option is probability-proportional-to-size (PPS) sampling, in which the selection
probability for each element is set to be proportional to its size measure, up to a maximum of 1.
In a simple PPS design, these selection probabilities can then be used as the basis for Poisson
sampling. However, this has the drawback of variable sample size, and different portions of
the population may still be over- or under-represented due to chance variation in selection. To
address this problem, PPS may be combined with a systematic approach.
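
A minimal sketch of Poisson sampling with PPS probabilities is shown below, using six size measures (150, 180, 200, 220, 260 and 490) and an expected sample size of three. The function names are hypothetical, and capping at 1 without redistributing the excess is a simplification.

```python
import random

def pps_inclusion_probabilities(sizes, n):
    """Selection probability proportional to size, capped at 1."""
    total = sum(sizes)
    return [min(1.0, n * s / total) for s in sizes]

def poisson_sample(probabilities, seed=None):
    """Include each unit independently with its own probability; sample size varies."""
    rng = random.Random(seed)
    return [i for i, p in enumerate(probabilities) if rng.random() < p]

sizes = [150, 180, 200, 220, 260, 490]
probs = pps_inclusion_probabilities(sizes, n=3)
draw = poisson_sample(probs, seed=7)
print(probs)   # probabilities sum to the expected sample size
print(draw)    # indices of selected units; the count can differ from 3
```

Because each unit is included independently, repeated draws give samples of different sizes, which is the drawback noted above.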

Suppose we have six schools with populations of 150, 180, 200, 220, 260 and 490 students
respectively (total 1500 students), and we want to use student population as the basis for a PPS
sample of size three. To do this, we could allocate the first school numbers 1 to 150, the second
school 151 to 330 (= 150 + 180), the third school 331 to 530, and so on to the last school (1011 to
1500). We then generate a random start between 1 and 500 (equal to 1500/3) and count through
the school populations by multiples of 500. If our random start was 137, we would select the
schools which have been allocated numbers 137, 637, and 1137, i.e. the first, fourth, and sixth
schools.
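
The school example can be checked with a short sketch. The function name `pps_systematic` is hypothetical, and the random start is passed in as 137 to reproduce the worked figures rather than drawn at random.

```python
from bisect import bisect_left
from itertools import accumulate

def pps_systematic(sizes, n, start):
    """Systematic PPS: step through cumulative sizes at a fixed interval from a given start."""
    cumulative = list(accumulate(sizes))           # upper end of each unit's number range
    interval = cumulative[-1] // n                 # 1500 / 3 = 500 in the example
    points = [start + i * interval for i in range(n)]
    selected = [bisect_left(cumulative, p) for p in points]  # unit whose range contains each point
    return selected, points

schools = [150, 180, 200, 220, 260, 490]
selected, points = pps_systematic(schools, n=3, start=137)
print(points)                      # [137, 637, 1137]
print([i + 1 for i in selected])   # [1, 4, 6]
```

In practice the start would be drawn with something like `random.randint(1, 500)`.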

The PPS approach can improve accuracy for a given sample size by concentrating the sample
on large elements that have the greatest impact on population estimates. PPS sampling is
commonly used for surveys of businesses, where element size varies greatly and auxiliary
information is often available. For instance, a survey attempting to measure the number of
guest-nights spent in hotels might use each hotel’s number of rooms as an auxiliary variable.

Event sampling methodology (ESM)

This is a new form of sampling method that allows researchers to study ongoing
experiences and events that vary across and within days in their naturally occurring environment.
Because of the frequent sampling of events inherent in ESM, it enables researchers to measure the
typology of activity and detect the temporal and dynamic fluctuations of work experiences.
The popularity of ESM as a new form of research design has increased over recent years because it
addresses the shortcomings of cross-sectional research: where they were once unable to, researchers can
now detect intra-individual variances across time. In ESM, participants are asked to record their
experiences and perceptions in a paper or electronic diary. There are three types of ESM:

  Signal-contingent: Random beeping notifies participants to record data. The advantage of
this type of ESM is minimization of recall bias.
  Event-contingent: records data when certain events occur.
  Interval-contingent: records data according to the passing of a certain period of time.

ESM has several disadvantages. It is sometimes perceived as invasive and intrusive by
participants. It can also lead to self-selection bias: it may be that only certain types of
individuals are willing to participate in this type of study, creating a non-random sample.
Another concern is participant cooperation. Participants may not actually fill out their diaries
at the specified times. ESM can also substantially change the phenomena being studied: reactivity
and priming may occur, such that repeated measurements may cause changes in the
participant’s experiences. It can also be highly vulnerable to common method variance.
Further, it is important to consider whether or not an appropriate dependent variable is being
used in an ESM study. For example, it might be logical to use ESM in order to answer research
questions which involve dependent variables with a great deal of variation throughout the day.
Thus variables such as change in mood, change in stress level, or the immediate impact of
particular events may be best studied using ESM methodology. However, it is not likely that
utilizing ESM will yield meaningful predictions when measuring someone performing a
repetitive task throughout the day or when dependent variables are long-term in nature
(e.g. coronary heart problems).
