Class Notes On Sampling
Sampling
Wikipedia defines sampling as follows: “In statistics and survey
methodology, sampling is concerned with the selection of a subset of individuals from
within a statistical population to estimate characteristics of the whole population”. The
three main advantages of sampling are that the cost is lower, data collection is faster,
and, because the data set is smaller, it is possible to ensure homogeneity and to improve
the accuracy and quality of the data.
The sampling plan is then specified, which means indicating how the
decisions made so far are to be implemented. This is the operational process, i.e. the
fieldwork, and it needs to be spelled out in detail: the interview procedure, the
sampling element, the time of the interview, and the operational process of selecting the
specific sampling unit and sampling element.
6. Sampling and data collection
This is the final step in the sampling process. A good deal of office and fieldwork
is involved in the actual selection of the sampling elements.
SAMPLING TECHNIQUES
When a decision has to be taken about the most appropriate technique of
sampling, the basic choice is between the probability and the non-probability techniques
of sampling. A traditional sampling method has been shown in figure 1.1 below.
In a simple random sample ('SRS') of a given size, all such subsets of the frame
are given an equal probability. Each element of the frame thus has an equal probability
of selection: the frame is not subdivided or partitioned. Furthermore, any given pair of
elements has the same chance of selection as any other such pair (and similarly for
triples, and so on). This minimises bias and simplifies analysis of results. In particular,
the variance between individual results within the sample is a good indicator of variance
in the overall population, which makes it relatively easy to estimate the accuracy of
results.
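A minimal Python sketch of the idea, assuming the frame is simply a list of house numbers (the frame, the sample size and the helper names are illustrative, not taken from these notes). It draws a sample in which every subset of the given size is equally likely, and then uses the variance within the sample to estimate the standard error of the sample mean.

```python
import math
import random
import statistics

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(frame), n)

def standard_error_of_mean(values):
    """Estimate the standard error of the sample mean from the
    within-sample variance (finite population correction ignored)."""
    return math.sqrt(statistics.variance(values) / len(values))

# Hypothetical frame: 1000 house numbers.
frame = list(range(1, 1001))
sample = simple_random_sample(frame, 50, seed=42)

print("sample mean:", statistics.mean(sample))
print("estimated standard error:", standard_error_of_mean(sample))
```

The seed is fixed only so that the sketch is repeatable; in real fieldwork the draw would not be seeded this way.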
SRS may also be cumbersome and tedious when sampling from an unusually
large target population. In some cases, investigators are interested in research
questions specific to subgroups of the population. For example, researchers might be
interested in examining whether cognitive ability as a predictor of job performance is
equally applicable across racial groups. SRS cannot accommodate the needs of
researchers in this situation because it does not provide subsamples of the population.
Stratified sampling, which is discussed below, addresses this weakness of SRS.
b) Systematic Sampling
For example, suppose we wish to sample people from a long street that starts in
a poor area (house No. 1) and ends in an expensive district (house No. 1000). A simple
random selection of addresses from this street could easily end up with too many from
the high end and too few from the low end (or vice versa), leading to an
unrepresentative sample. Selecting (e.g.) every 10th street number along the street
ensures that the sample is spread evenly along the length of the street, representing all
of these districts. (Note that if we always start at house #1 and end at #991, the sample
is slightly biased towards the low end; by randomly selecting the start between #1 and
#10, this bias is eliminated.)
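The street example can be sketched in Python as follows. The function name and the seed are illustrative assumptions; the only ideas taken from the notes are the fixed interval of 10 and the random start between house #1 and house #10.

```python
import random

def systematic_sample(frame, interval, seed=None):
    """Pick a random start within the first interval, then every
    interval-th element after it."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # index 0..interval-1, i.e. house #1..#10
    return frame[start::interval]

houses = list(range(1, 1001))        # house No. 1 to house No. 1000
sample = systematic_sample(houses, 10, seed=1)

print(len(sample), "houses sampled, first:", sample[0], "last:", sample[-1])
```

Whatever the start, the sample contains 100 houses spaced evenly along the street.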
For example, consider a street where the odd-numbered houses are all on the
north (expensive) side of the road, and the even-numbered houses are all on the south
(cheap) side. Under the sampling scheme given above, it is impossible to get a
representative sample; either the houses sampled will all be from the odd-numbered,
expensive side, or they will all be from the even-numbered, cheap side.
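A short Python check of this claim, assuming the same 1000-house street and the interval of 10 used above (the function name and the side labels are illustrative). For every possible start it records which sides of the road the sample touches.

```python
def sides_sampled(start, interval, n_houses=1000):
    """Return the set of street sides hit by a systematic sample."""
    sides = set()
    for house in range(start, n_houses + 1, interval):
        # Odd numbers are on the north side, even numbers on the south side.
        sides.add("north" if house % 2 == 1 else "south")
    return sides

for start in range(1, 11):
    print("start at house", start, "->", sides_sampled(start, 10))
```

Because the interval is even, every start yields houses from one side only.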
c) Stratified Sampling
Where the population embraces a number of distinct categories, the frame can
be organized by these categories into separate "strata." Each stratum is then sampled
as an independent sub-population, out of which individual elements can be randomly
selected. There are several potential benefits to stratified sampling.
First, dividing the population into distinct, independent strata can enable researchers to
draw inferences about specific subgroups that may be lost in a more generalized
random sample.
Second, utilizing a stratified sampling method can lead to more efficient statistical
estimates (provided that strata are selected based upon relevance to the criterion in
question, instead of availability of the samples). Even if a stratified sampling approach
does not lead to increased statistical efficiency, such a tactic will not result in less
efficiency than would simple random sampling, provided that the sample drawn from each
stratum is proportional to that stratum's share of the population.
Third, it is sometimes the case that data are more readily available for individual,
pre-existing strata within a population than for the overall population; in such cases,
using a stratified sampling approach may be more convenient than aggregating data
across groups (though this may potentially be at odds with the previously noted
importance of utilizing criterion-relevant strata).
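The proportional allocation mentioned under the second benefit can be sketched in Python as below. The strata, their sizes and the function name are hypothetical; the sketch only assumes that each stratum is sampled independently by simple random sampling, with a sample size proportional to the stratum's share of the population.

```python
import random

def proportional_stratified_sample(strata, n, seed=None):
    """strata maps a stratum name to the list of its elements.
    Each stratum gets a share of n proportional to its size."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Rounding means the shares may not add up to exactly n.
        n_h = round(n * len(members) / total)
        sample[name] = rng.sample(members, min(n_h, len(members)))
    return sample

# Hypothetical population of 1000 split into three strata.
strata = {
    "A": [f"A{i}" for i in range(600)],
    "B": [f"B{i}" for i in range(300)],
    "C": [f"C{i}" for i in range(100)],
}
sample = proportional_stratified_sample(strata, 50, seed=3)
for name, chosen in sample.items():
    print(name, len(chosen))
```

With these sizes the allocation is 30, 15 and 5, which keeps each stratum's weight in the sample equal to its weight in the population.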
There are, however, some potential drawbacks to using stratified sampling. First,
identifying strata and implementing such an approach can increase the cost and
complexity of sample selection, as well as leading to increased complexity of population
estimates. Second, when examining multiple criteria, stratifying variables may be
related to some, but not to others, further complicating the design, and potentially
reducing the utility of the strata. Finally, in some cases (such as designs with a large
number of strata, or those with a specified minimum sample size per group), stratified
sampling can potentially require a larger sample than would other methods (although in
most cases, the required sample size would be no larger than would be required for
simple random sampling).
A stratified sampling approach is most effective when three conditions are met
d) Cluster sampling
Sometimes it is more cost-effective to select respondents in groups ('clusters').
Sampling is often clustered by geography, or by time periods. (Nearly all samples are in
some sense 'clustered' in time - although this is rarely taken into account in the
analysis.) For instance, if surveying households within a city, we might choose to select
100 city blocks and then interview every household within the selected blocks.
Clustering can reduce travel and administrative costs. In the example above, an
interviewer can make a single trip to visit several households in one block, rather than
having to drive to a different block for each household.
It also means that one does not need a sampling frame listing all elements in the
target population. Instead, clusters can be chosen from a cluster-level frame, with an
element-level frame created only for the selected clusters. In the example above, the
sample only requires a block-level city map for initial selections, and then a household-
level map of the 100 selected blocks, rather than a household-level map of the whole
city.
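The city-block example can be sketched in Python as follows. The number of blocks in the city (500), the eight households per block and all names are assumptions made for illustration. The point of the sketch is that the household-level listing is built only for the blocks that were selected.

```python
import random

listed_blocks = []  # records which blocks needed a household-level listing

def households_in_block(block_id):
    """Hypothetical fieldwork step: enumerate the households in one block."""
    listed_blocks.append(block_id)
    return [f"block{block_id}-house{i}" for i in range(1, 9)]

def cluster_sample(cluster_ids, list_elements, n_clusters, seed=None):
    """Select clusters at random, then take every element in each one."""
    rng = random.Random(seed)
    chosen = rng.sample(list(cluster_ids), n_clusters)
    return {c: list_elements(c) for c in chosen}

blocks = range(1, 501)  # cluster-level frame: block numbers only
sample = cluster_sample(blocks, households_in_block, 100, seed=7)

print(len(sample), "blocks selected,",
      sum(len(h) for h in sample.values()), "households interviewed")
```

Only 100 of the 500 blocks are ever enumerated at household level, which mirrors the saving described above.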
1. Are there controls within the research design or experiment which can serve to
lessen the impact of a non-random convenience sample, thereby ensuring the
results will be more representative of the population?
2. Is there good reason to believe that a particular convenience sample would or
should respond or behave differently than a random sample from the same
population?
3. Is the question being asked by the research one that can adequately be
answered using a convenience sample?
In social science research, snowball sampling is a similar technique, where
existing study subjects are used to recruit more subjects into the sample. Some variants
of snowball sampling, such as respondent driven sampling, allow calculation of
selection probabilities and are probability sampling methods under certain conditions.
b) Quota sampling
In quota sampling, the population is first segmented into mutually exclusive sub-
groups, just as in stratified sampling. Then judgement is used to select the subjects or
units from each segment based on a specified proportion. For example, an interviewer
may be told to sample 200 females and 300 males between the ages of 45 and 60.
It is this second step which makes the technique one of non-probability sampling.
In quota sampling the selection of the sample is non-random. For example, interviewers
might be tempted to interview those who look most helpful. The problem is that these
samples may be biased because not everyone gets a chance of selection. This non-random
element is its greatest weakness, and quota versus probability sampling has been a matter of
controversy for many years.
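A rough Python sketch of how quotas get filled, using the 200 females and 300 males aged 45 to 60 from the example above. The stream of people, the field names and the rule of taking whoever is met first are assumptions; the last one stands in for the interviewer's judgement and is exactly the non-random step.

```python
def fill_quotas(stream, quotas, category_of):
    """Take people in the order they are met until every quota is full."""
    remaining = dict(quotas)
    selected = []
    for person in stream:
        category = category_of(person)
        if remaining.get(category, 0) > 0:
            selected.append(person)
            remaining[category] -= 1
        if not any(remaining.values()):
            break
    return selected

def category_of(person):
    """Quota cell for a person, or None if outside the age range."""
    return person["sex"] if 45 <= person["age"] <= 60 else None

# Hypothetical stream of passers-by, in the order the interviewer meets them.
people = [{"sex": "F" if i % 2 else "M", "age": 40 + i % 25}
          for i in range(2000)]
selected = fill_quotas(people, {"F": 200, "M": 300}, category_of)

print(len(selected), "people selected")
```

The quotas are met exactly, yet anyone who appears late in the stream has no chance of selection, so selection probabilities cannot be calculated.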
c) Purposive sampling
CONCLUSION
A good sample design or technique requires the judicious balancing of four broad
criteria: it should be goal oriented, measurable, practicable, and economical.
These four criteria come into conflict with each other in most cases, and the
researcher should carefully balance them in order to select a really good sample
design. As there is no unique method or procedure by which one can select a good
sample, one has to compare several sample designs that can be used in a survey. This
means that one has to weigh the pros and cons, the strong and weak points, of the
various sample designs in respect of these four criteria before selecting the best
possible one. In some cases,
more than two sample designs have to be selected to carry out the survey.
References:
Marketing Research: Concepts, Practices & Cases; by Sunanda Easwaran
Marketing Research (Third Edition); by G C Beri
www.statisticaloutsourcingservices.com
http://en.wikipedia.org
www.education.ucsb.edu
brettscaife.net/statistics