Module 4 - Participant Sampling
Learning Outcomes
- Understand how a study's sampling procedure impacts the generalizability of its results
- Understand the concept of random sampling error
- Explain how the procedure that is used to collect a sample and the sample size impact
the magnitude of random sampling error
- Explain the role of a sample frame in the sampling process
- Recognize and distinguish different approaches to sampling, including simple random
sampling, stratified sampling, convenience sampling, and snowball sampling
- Use a research randomizer to generate random numbers that can be used for a
sampling task
- Recognize when a particular sampling procedure should be used for a given sampling
task
- Think critically and discuss concerns about possible cultural biases in psychological
studies that rely on convenience samples
- Coverage error: failure to include members of the population of interest within the
sampling frame
- Multi-stage sampling: type of probability sampling that involves conducting random
sampling in a sequence of stages at multiple levels of organization of the population
- Probability sampling: sampling procedure where the probability of any given case being
selected can be determined and the cases are sampled independently of each other
- Representative sample: a sample whose properties accurately reflect those of the target
population
- Sampling frame: defined sense of the population that identifies population members for
sampling (ex. list of all individuals in the target population)
- Simple random sampling: every individual listed in the sample frame has equal
probability of being selected for inclusion in the sample
Collecting Samples
- Sampling is a multi-step process, and researchers need to be mindful of the choices that
they make for each of these steps in the process
1. Identify target population to which you intend to generalize the result of your study
- Depending on research question, could be fairly narrow or broader population
2. Define a sampling frame that identifies the members of the population and that you can
use to draw your study from
- A sampling frame is usually not a perfectly comprehensive listing of the
population
- Ex. if population is Ontario adults and you choose to use the Ontario
telephone directories as your sampling frame - this is not a comprehensive
source since not all adults have phones
- These exclusions are referred to as coverage error
- Ultimately, the results of a study can only be generalized to the population that
falls within the sampling frame, not to the cases that were excluded from the
sampling frame
- So if telephone directories were used, the results of your study would only
be able to be generalized to the narrower population of individuals listed
in Ontario telephone directories rather than the broader population that
you intended to generalize your results to
3. Collect a representative sample (select individuals within sampling frame to recruit
participants in your study)
- A sample whose properties accurately reflect those of the target population
- To achieve a representative sample researchers strive to use probability
sampling techniques
- A sampling procedure qualifies as probability sampling if it fulfills the following
criteria:
- the probability (or likelihood) that a given member of the population will be
selected into the sample is a known quantity (e.g., the researcher is able
to say that the chance that any given case will be sampled is something
like 1 in 50, or 1 in 126, or 1 in 500, etc.), and
- the cases are selected independently of each other, meaning that the
selection of any given case into the sample does not change the
probability that any other case will be sampled
- One type of probability sampling technique is simple random sampling (SRS)
- Every individual who is listed in the sample frame has an equal probability
of being selected for inclusion in the sample
- Ex. lottery system in which the name of each member of the
population is listed on a separate paper and mixed
- In SRS the samples are usually drawn without replacement, which
means that an individual can only be drawn once for inclusion in the study
- Researchers usually use a computerized random number generator to produce a
table of random numbers that they can use to select cases from their sample
frame
- To do this, researchers first must determine the size of the sample (n) that
they intend to select from the sample frame (which lists N cases)
- Then, they assign a number from 1 to N to each individual listed in their
sample frame
- Next, they use a random number generator to generate a list
of random numbers and select the cases with the matching numbers until
they reach their target sample size (n)
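The numbering-and-drawing procedure above can be sketched in a few lines of Python. This is only an illustration: the frame of 500 made-up names, the function name `simple_random_sample`, and the seed are assumptions, not part of the course material.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n cases from the sampling frame without replacement."""
    rng = random.Random(seed)
    N = len(frame)
    # Number the cases 1..N, then draw n distinct random numbers.
    drawn_numbers = rng.sample(range(1, N + 1), n)
    # Map each drawn number back to the matching case in the frame.
    return [frame[number - 1] for number in drawn_numbers]

# Hypothetical sampling frame of N = 500 listed individuals.
frame = [f"person_{i}" for i in range(1, 501)]
sample = simple_random_sample(frame, n=20, seed=42)
print(sample)
```

Because the numbers are drawn without replacement, no individual can appear twice, and every listed case has the same chance (here 20 in 500) of being selected.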
- Simple random sampling is not always a practical sampling method
to implement especially if the target population is widely dispersed
- Ex. if a researcher sought to randomly select Ontario citizens
for in-person interviews - there are 414 towns/cities in
Ontario, so with simple random sampling the researcher would
likely have to travel to most of these towns and cities to
conduct these interviews - this could be very inefficient and the
researcher might not have the resources and personnel to
accomplish this
- Multi-stage sampling involves conducting random sampling in a
sequence of stages across multiple levels of organization of the population
- In the example where a researcher is seeking to collect a representative
sample of Ontario citizens to interview, they could begin by randomly
selecting 40 towns within Ontario
- Next, they might randomly select 10 street blocks within each of the
selected towns
- Finally, they might randomly select 5 households on each
of the selected blocks
- This way a researcher would be able to sample 2000
Ontarians (40 towns x 10 blocks x 5 households), but they
would only need to send their research personnel to 40
separate towns to conduct these interviews (rather than
potentially having to visit 400+ towns)
- Because random sampling was implemented at each stage
of selection, this should help to promote
representativeness of the sample
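The three stages of the Ontario example can be sketched as nested random draws. The number of blocks per town and households per block below are invented for illustration; only the 414 towns and the 40/10/5 selection counts come from the example above.

```python
import random

rng = random.Random(0)

# Hypothetical frame: 414 towns, each with 50 blocks of 30 households.
TOWNS, BLOCKS_PER_TOWN, HOUSEHOLDS_PER_BLOCK = 414, 50, 30

sample = []
# Stage 1: randomly select 40 towns.
for town in rng.sample(range(TOWNS), 40):
    # Stage 2: randomly select 10 blocks within each selected town.
    for block in rng.sample(range(BLOCKS_PER_TOWN), 10):
        # Stage 3: randomly select 5 households on each selected block.
        for household in rng.sample(range(HOUSEHOLDS_PER_BLOCK), 5):
            sample.append((town, block, household))

towns_visited = {town for town, _, _ in sample}
print(len(sample), len(towns_visited))  # 2000 households in 40 towns
```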
- Strata: subgroups of the population who share some characteristic in common (ex.
Generational cohorts within a population - Baby Boomers, GenXers, Millennials)
- Stratified random sampling: type of probability sampling in which the researcher
randomly samples within specified strata
Stratified Random Sampling
- In many studies a researcher seeks to compare the responses of defined segments of
the population called strata
- Ex. a researcher studying stress in the UW population might intend to compare
the prevalence of stress among students in each of the different faculties (ex.
Arts, AHS, Engineering)
- If SRS was used, her final sample might not include enough participants
from some of the smaller faculties to allow for meaningful comparisons
with the larger faculties
- To ensure the sample will contain sufficient numbers of participants to
permit meaningful comparisons of these groups the researcher can use
stratified random sampling instead of simple random sampling
- Stratified random sampling: the researcher first reorganizes the sample frame to
identify cases that belong to each of the specific groups that she wishes to
compare - researcher would take the list of registered students and segment it by
faculty, separating out the names on the list into each of the separate faculties
- Researcher would then choose a random sample of participants from
within each of these faculty lists (ex. Randomly select n cases from the
Arts faculty, and n cases from the AHS faculty etc.)
Illustration of Stratified Random Sampling
- Imagine that you had a list of 420 UW students who participated in a volunteer program
- You want to study their satisfaction with their experience in the program and you would
like to be able to compare the average satisfaction levels of students from different
faculties
- To enable these comparisons you plan to interview 20 of these students from each of the
UW faculties
1. Reorganize table into separate lists for each of the faculties (the strata in your stratified
sample)
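A minimal sketch of this illustration follows, assuming the second step is the one described earlier (a random draw of 20 students within each faculty list). The student names and the even split of 70 volunteers per faculty are made up; the notes do not say how the 420 students are distributed.

```python
import random
from collections import defaultdict

rng = random.Random(1)

# Hypothetical frame: 420 volunteers, 70 per faculty (made-up split).
faculties = ["Arts", "AHS", "Engineering", "Environment", "Math", "Science"]
frame = [(f"student_{i}", faculties[i % 6]) for i in range(420)]

# Step 1: reorganize the frame into one list per stratum (faculty).
strata = defaultdict(list)
for name, faculty in frame:
    strata[faculty].append(name)

# Step 2: draw a simple random sample of 20 within each stratum.
sample = {faculty: rng.sample(members, 20) for faculty, members in strata.items()}

for faculty, chosen in sample.items():
    print(faculty, len(chosen))
```

Sampling within each faculty list guarantees 20 interviews per faculty, which a single simple random sample of the whole list would not.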
- Random sampling error: discrepancies between the sample statistics and population
parameters that are due to random, or chance-based, differences between the sample
and population
- Variance: the amount of variability in some measured quantity
When the Sample Statistic Does Not Reflect the Population
- In everyday life we have experiences where a sample of observations leads to a
misleading impression just due to chance
- Random sampling error refers to any discrepancies between the sample statistics and
the population parameters that occur due to such chance factors
- Random sampling is the most reliable approach for collecting a representative sample of
the target population
- However, because the sample is just a subset of the population there is some
likelihood that the characteristics of the sample will differ from the characteristics
of the population just based on chance, even if the participants were selected
through random sampling
- Ex. imagine that there are 20 students all taking the same class
- 15 (75%) have a favourable opinion of the course and 5 (25%)
have an unfavourable opinion
- Suppose you randomly draw the names of 4 students to fill out the course
evaluation
- By chance 2 of the sampled students happen to come from the
group with a favourable opinion and the other 2 come from the
group with an unfavourable opinion
- Thus, in the sample the favourable to unfavourable ratio is 1:1, whereas the
actual population ratio is 3:1
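How likely is a misleading draw like this one? For this small class the chances can be computed exactly by counting the possible samples of 4 (a hypergeometric calculation). The sketch below is an added illustration; the variable and function names are assumptions.

```python
from math import comb

N, favourable, n = 20, 15, 4   # class size, favourable opinions, sample size
unfavourable = N - favourable

def prob_favourable_in_sample(k):
    """Probability that exactly k of the n sampled students are favourable."""
    return comb(favourable, k) * comb(unfavourable, n - k) / comb(N, n)

probs = {k: prob_favourable_in_sample(k) for k in range(n + 1)}
for k, p in probs.items():
    print(f"{k} favourable, {n - k} unfavourable: {p:.3f}")
```

The 2:2 split in the example occurs in about 22% of possible draws (1050 of 4845), and a sample that exactly matches the 3:1 population ratio occurs in fewer than half of them, so sizeable chance discrepancies are common with a sample this small.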
- The more variability there is in the population on the
attribute that is being measured, the higher the random
sampling error will tend to be
- Fortunately, there’s a fairly straightforward way
to lower the magnitude of random sampling
error - simply increase your sample size
- All else being equal, you will have a larger
random sampling error if you draw a small
sample from a population than you would have
if you drew a larger sample from that population
- A relatively large sample will give you a more reliable
estimate of your target population parameter than a
relatively small sample would give you
- Researchers can use what they know about the
population’s general heterogeneity to estimate how
much variability there is likely to be in the
characteristics they are attempting to measure in order
to plan how large a sample they will need to address
random sampling error
- Ex. if a group of researchers is studying a highly
heterogeneous population that has extensive diversity in socioeconomic status,
ethnicity, age, religious background etc. then they might anticipate that there will
be relatively high variability in the psychological characteristics that they
measure, and thus they would plan to recruit a relatively large sample
- If the same study was conducted in a very homogeneous population whose
members tend to be quite similar in their backgrounds and circumstances,
then the researchers might anticipate relatively low variability in the
characteristics they measure - therefore a smaller sample would be sufficient
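The link between population variability, sample size, and random sampling error can be written compactly. The formula below is the standard error of a sample mean under simple random sampling from a large population; the symbols (σ for the population standard deviation, n for the sample size, SE for the standard error) are introduced here and do not appear in the notes above.

```latex
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}
\qquad\Longrightarrow\qquad
n = \left(\frac{\sigma}{\mathrm{SE}_{\text{target}}}\right)^{2}
```

Read this way, a population that is twice as variable needs four times the sample to reach the same precision, and cutting the random sampling error in half also requires four times as many participants.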
Sample Size and Variance Interactive Demonstration
- Sample size and variance in an outcome measure or observation, such as stress level,
can impact how reliable a statistic is, and thus the degree to which we can trust the
results of the study
- Red curve: the mean (average score) and variance (distribution of scores) within the
population from which we are sampling
- Blue bars: reflect the number or frequency of individuals in the sample and their score
- If our sample happens to have a lot of variance in it then we’ll need to use a
much larger sample size in order to get a reliable estimate that approximates the
true population mean
- Ex. we are measuring the prevalence of stress in 2 populations of undergraduates - we
will consider drawing a sample from 2 populations that have the same average level of
stress (mean, µ) of 13, as measured using a perceived stress scale
- Population A has high variance in stress scores, meaning that the scores of the
individuals in this population differ quite a bit from each other and from the population
mean, with many individuals whose stress scores are considerably lower and many others
whose scores are considerably higher than the population mean
- Ex. this population might consist of students at different levels of study (first
through fourth year students) and in many different majors across all of the
different faculties at the university - because these majors differ quite a bit in the
levels of stress and competitive pressures that they place on students and
because stress levels might differ quite a bit depending on a student’s study
term, we would expect there to be quite a wide range of variability in how
stressed these students are
- Population B has low variance in stress scores, meaning that the scores of the
individuals in this population do not differ as much from each other and from the
population mean, and there are relatively few individuals whose stress scores are
considerably lower or considerably higher than the population mean
- Ex. this population might consist of students who are all in the same year of study
(second year students only) and all registered in the same major - because the
major and level of study are the same for all these students there might be a lot
less variability in how stressed these students are
- Now we will draw a series of relatively small and relatively large samples from each
population to see how the variance in the population and the sample size influence the
reliability of the sample estimates
- Starting with population A:
1. Set the population variance to be high to reflect the fact that this population has
high variance in stress scores
2. Set the sample size to be low: draw 25 cases from this population
- Write down the sample mean that you get after you draw your first sample
of 25 cases [12.90107]
- Select ‘resample’ 5 more times to draw more samples of 25 cases from
this population and again write down the sample means. You should now
have 6 sample means
- 6 sample means: 12.90107, 12.56030, 12.48131, 13.26972, 12.81068,
13.16858
3. Compare the 6 sample means that you got when you drew these 6 small
samples from Population A. How much do they differ from each other and how
much do they differ from the actual population mean of 13?
4. Now let’s see what happens when you increase your sample size. Set the
sample size to draw 250 cases from this population
- Write down the sample mean that you get after you draw your first sample
of 250 cases [13.07251]
- Select ‘resample’ 5 more times to draw more samples of 250 cases from
this population and again write down the sample means. You should now
have 6 sample means from this larger sample
- 6 sample means: 13.07251, 13.13019, 13.02103, 12.91974, 13.05259,
12.86772
5. Compare the 6 sample means that you got using the large samples from
Population A. How much do they differ from each other and how much do they
differ from the actual population mean?
6. How do your results differ when using smaller versus larger samples?
- In the graph, the blue bars cluster closer to the red curve when the
sample size is bigger
- Now let’s follow the same procedure with Population B
1. Set the population variance to be low to reflect the fact that this population has
low variance in stress scores
2. Follow steps 2-6 from above but sampling from this lower variance population
What you may have noticed is that when the variance in the population is high and our sample
size is small, the mean estimates vary quite a bit from each other and from the true population
mean. Often in psychology we do not know the true population mean, so we approximate it by
trying to sample in a way that reduces variance due to sampling error or noise, and we try to use
large enough sample sizes that we can trust our estimates of the population mean.
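For readers without access to the interactive demonstration, the same exercise can be approximated with a short simulation. This is a sketch under stated assumptions: scores are drawn from a normal distribution with mean 13, and the two standard deviations (6 and 2) are invented stand-ins for the "high" and "low" variance settings, whose actual values are not given in these notes.

```python
import random
import statistics

MU = 13  # population mean stress score used in the demonstration

def sample_means(sd, n, draws=6, seed=0):
    """Means of `draws` independent random samples of size n."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(MU, sd) for _ in range(n))
            for _ in range(draws)]

# Assumed standard deviations; the demonstration's actual settings are unknown.
populations = {"A (high variance)": 6.0, "B (low variance)": 2.0}

spread = {}
for label, sd in populations.items():
    for n in (25, 250):
        print(label, n, [round(m, 2) for m in sample_means(sd, n)])
        # Spread of the sample means over many resamples.
        spread[(label, n)] = statistics.stdev(sample_means(sd, n, draws=500))

for key, value in spread.items():
    print(key, round(value, 3))
```

The second loop summarizes how much the sample means bounce around over 500 resamples: the spread should shrink when the sample size grows from 25 to 250 and when the population variance is lower.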
- Systematic sampling error: discrepancies between the sample estimate and the
population value that occur when certain members of the population are less likely to be
included in the sample compared to other members of the population
- Selective nonresponse: a type of systematic sampling error that arises when certain
members of the population are less available or less motivated to participate in a study
and thus are underrepresented in the sample
When the Distribution of Members of the Sample Does Not Reflect the Population
- While random sampling error is a pretty straightforward problem to deal with, the other
major type of sampling error - systematic sampling error - poses more serious
challenges
- It occurs when certain members of the population are more likely to be included in the
sample than other members
- Systematic oversampling or undersampling of certain members of the population can
lead to a distorted estimate of the population parameter if the variable that influences the
likelihood of being sampled is related to the variable(s) that the researcher is trying to
estimate
- Ex. researchers who are trying to estimate the prevalence of stress in the
population of Waterloo students will systematically underestimate this value if
something about their recruitment procedure causes highly stressed students to
be less likely to choose to participate in the survey compared to students who are
less stressed
- Suppose highly stressed students feel like they don’t have time to devote
to the survey because they’re feeling pressured about their regular
coursework - if highly stressed students are more likely to opt out of
participating in the survey then they will be undersampled relative to their
share of the population
- The undersampling of the most stressed students will mean that the
prevalence of stress that is recorded in the sample will be lower than the
actual prevalence of stress in the undergrad population
- Even if a researcher randomly selects cases from
the population, there are other factors that might introduce
systematic bias into the sample
- Selective nonresponse: individuals typically need to
consent to participate in a study. If the individuals who
choose to participate differ systematically from those who
opt out then this will cause the sample to be
unrepresentative of the population
- Ex. if in an election poll the supporters of one of the
candidates are predisposed to distrust the pollster then they would be
more likely to refuse to participate and consequently the poll may
underestimate the prevalence of support for this candidate in the
population
- In the 2016 US Presidential election, many commentators
speculated that pre-election polls might have underestimated
support for Donald Trump because his supporters may have been
more likely to refuse to participate in the polls compared to Hillary
Clinton's supporters, perhaps because Trump's voters distrusted the
mainstream news media (e.g., CNN, The New York Times) that
sponsor these polls
- Systematic sampling errors due to factors like selective nonresponse are less
easy to correct for than random sampling error
- Unlike with random sampling error, collecting a larger sample does not
help - it will only lead the researchers to be more confident about a biased
estimate
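A small simulation can illustrate why a bigger sample does not fix selective nonresponse. Everything numeric here is an assumption chosen for illustration: a true prevalence of 40% highly stressed students, and response rates of 30% for highly stressed students versus 60% for everyone else.

```python
import random

TRUE_PREVALENCE = 0.40  # assumed share of highly stressed students

def surveyed_prevalence(n_invited, seed=0):
    """Share of respondents who are highly stressed, under selective nonresponse."""
    rng = random.Random(seed)
    respondents = []
    for _ in range(n_invited):
        stressed = rng.random() < TRUE_PREVALENCE
        # Assumed response rates: stressed students opt out more often.
        p_respond = 0.3 if stressed else 0.6
        if rng.random() < p_respond:
            respondents.append(stressed)
    return sum(respondents) / len(respondents)

estimates = {n: surveyed_prevalence(n) for n in (100, 1_000, 100_000)}
for n, estimate in estimates.items():
    print(n, round(estimate, 3))
```

Under these assumed rates the survey estimate settles near 25% (0.12 / 0.48) rather than the true 40% as the number of invitations grows: the larger sample makes the estimate more stable, but it stabilizes around the wrong value.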
- Researchers should be careful to examine their samples to see if there is
evidence of systematic sampling error such as a low overall response rate or
patterns indicating that the individuals who refuse to participate differ on some
systematic basis (ex. Higher response rates in more affluent neighbourhoods
compared to less affluent neighbourhoods)
- Some ways researchers can avoid systematic sampling errors such as selective
nonresponse:
- Adding incentives to participate, sending an advance letter to inform
potential participants when and why they are being contacted for the
study, or adjusting the recruitment process to ensure that participants with
a variety of backgrounds and interests will feel welcome to participate and
will be motivated to take part in the study
- When there is evidence of systematic error in a sample such as selective
nonresponse of certain demographic groups then researchers may attempt to
make certain statistical adjustments to correct for these sample biases such as
applying higher weights to the responses of underrepresented groups. If these
statistical corrections are applied carefully they can help to mitigate systematic
sampling biases
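As a sketch of the weighting idea, the example below gives each group a weight equal to its population share divided by its share of the respondents. The group labels, counts, and mean scores are invented for illustration, and the correction only works to the extent that respondents in a group resemble the nonrespondents in that same group.

```python
# Hypothetical survey: less affluent neighbourhoods are underrepresented.
population_share = {"affluent": 0.5, "less_affluent": 0.5}
respondents = {"affluent": 70, "less_affluent": 30}      # respondent counts
group_mean = {"affluent": 10.0, "less_affluent": 16.0}   # mean stress score

total = sum(respondents.values())

# Weight = population share / sample share, so underrepresented groups count more.
weights = {g: population_share[g] / (respondents[g] / total) for g in respondents}

unweighted = sum(respondents[g] * group_mean[g] for g in respondents) / total
weighted = (sum(weights[g] * respondents[g] * group_mean[g] for g in respondents)
            / sum(weights[g] * respondents[g] for g in respondents))

print(round(unweighted, 2), round(weighted, 2))  # 11.8 13.0
```

The unweighted mean (11.8) is pulled toward the overrepresented group, while the weighted mean (13.0) matches what an evenly split sample would have produced.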
Biased Sampling
Think and Respond
- You may have noticed that many of your social network contacts are similar to you in
political views, ethnicity, age, religious beliefs, educational background, musical tastes,
sexual preferences, and many other ways. For example, if you're a secular humanist
then it's likely that your social contacts tend to be more secular than the broader
population. You thus might be relatively underexposed to the ideas and opinions of
religious people. This might lead you to underestimate the role that religion plays in
many people's lives.
- Make efforts to expose yourself to information outside your personal network, especially
information sources that might be on the opposing side of most of your contacts. For
example, if you're a liberal and most of your friends and acquaintances are also liberals,
then make an effort to visit websites that have a conservative point-of-view to see how
conservatives might be interpreting some of the issues and news topics that you and
your friends are discussing or vice versa if you are conservative.
- If most of your contacts have a similar opinion on an issue but a few of your contacts
have an opposite opinion on that issue then make extra efforts to engage with the
contacts who have this minority opinion and ask them to direct you to other sources of
information on the topic when it comes up in conversation. This way you can leverage
contacts who are different from you to expose yourself to a more diverse range of
opinions and ideas, enabling you to make more informed decisions and conclusions.
Nonprobability Samples
- Nonprobability samples: Samples for which the researcher cannot determine the
probability that various members of the population are included in the sample.
- Convenience sampling: Samples that are recruited based on the researcher's
convenience rather than based on their representativeness of the population.
- Location-dependent sampling: Sampling that relies on the clustering of members of the
target population at accessible physical locations or online forums.
- Snowball sampling: Sampling that uses social network contacts of an initial set of
participants to recruit other members of a target population.
- Homogeneity bias: Reliable tendency of people to affiliate with others who are similar to
them in a number of sociological and psychological features such as demographic
characteristics, personality traits, social identity characteristics, beliefs, and attitudes.
- Triangulation strategy: This is the strategy of using a variety of methods or approaches
to recruiting participants in a study (e.g., sampling from different locations) in an effort to
counteract sampling biases associated with any particular recruitment method.
A Representative Sample is Not Always Possible
- Although probability samples such as random samples are ideal for the purpose of
generalizing the results of a study to a target population, in many cases researchers may
be unable to collect a representative sample
- These cases are called nonprobability samples because the researcher does not know
the probability that particular cases in the population will be selected into the sample
- A common type of nonprobability sampling is convenience sampling
- A convenience sample is constructed based on the accessibility of its members
to the researcher rather than based on the probability that its members’
characteristics will reflect those of the broader population
- Often used in experimental research where the goal is to test some hypothesis
about causal mechanisms, not to estimate the characteristics of some population
of interest
- Typically in experimental research there is an assumption that the
processes studied are so fundamental that they can be generalized beyond the
relatively narrow samples of convenience that are typically used
- Representative samples are most critical when a researcher is trying to form a reliable
estimate of some population value
- Ex. in political surveys a representative sample of voters’ candidate preferences
is needed to derive a reliable estimate of which candidate will win on election day
- Some psychological studies do aim to estimate population values
- Ex. A clinical psychologist may seek to estimate the prevalence of various
psychological disorders in some communities. In cases such as this a
representative sample would be necessary to construct a reliable estimate
- However, most psychological studies do not seek to estimate specific population values
on some psychological variable. Instead, they seek to test a hypothesis about the causal
mechanisms that influence some psychological response
- Samples of convenience have low external
validity and thus are a poor basis for generalization
about population characteristics
- However, in many experimental
psychology studies the priority is
maximizing internal validity to test hypotheses about
psychological mechanisms, and the researchers are
less concerned about whether the results of their
research have external validity for generalizing the
results beyond the sample
- Another context where nonprobability sampling strategies might need to be used is
research that studies hidden populations
- Hidden populations are populations where there is not any existing,
comprehensive public record of the membership of the population that can be
used to form a sampling frame
- Ex. population might be hidden when its members share some socially
stigmatized characteristic that is relatively uncommon and concealable
(ex. People with opiate addictions, sexual minorities, undocumented
immigrants, political extremists)
- If a researcher seeks to study the members of some hidden population then they cannot
use the usually recommended strategy of randomly sampling cases from a membership
list. Instead, for hidden populations researchers often use a combination of
location-dependent sampling and snowball sampling
- Location-dependent sampling relies on the fact that people who share a
characteristic tend to cluster together in physical (or virtual) spaces
- Ex. Researcher who is seeking to document the psychological effects of
homophobia on LGBT individuals might seek to recruit a sample of LGBT
individuals by advertising her study through a network of LGBT support
groups
- Capitalizing on clustering of people in reliable locations can be a convenient way to find
and recruit people from hidden populations. However, one shortcoming of
location-dependent sampling is that the individuals who cluster in these accessible
locations might not be representative of that hidden population as a whole
- Ex. consider using snowball sampling to recruit victims of human trafficking. If this
method tends to recruit the more socially connected victims, then it would oversample the
least vulnerable members of this population. The members of the population who are more
socially connected are relatively less hidden and thus may have access to victims' services,
support networks, and legal supports. By contrast, less socially connected members of the
population are likely to be living more restricted and controlled lives that cut them off from
these external resources. To the extent that this research method oversamples the more
socially connected victims of human trafficking, it could lead researchers to underestimate
how oppressed and vulnerable many victims of human trafficking actually may be.
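The oversampling of well-connected members can be illustrated with a toy simulation of snowball recruitment. The network below is entirely made up (1000 members, 20% of whom attract ten times as many ties), as are the recruitment rules (10 random seeds, up to 3 referrals each, 3 waves); it is a sketch of the mechanism, not a model of any real population.

```python
import random

rng = random.Random(7)

# Hypothetical population: 1000 members, 20% of them well connected.
N = 1000
well_connected = [i < 200 for i in range(N)]
tie_weight = [10 if wc else 1 for wc in well_connected]

# Build a contact network: well-connected members attract more ties.
contacts = {i: set() for i in range(N)}
for _ in range(3000):
    a, b = rng.choices(range(N), weights=tie_weight, k=2)
    if a != b:
        contacts[a].add(b)
        contacts[b].add(a)

# Snowball: 10 random seeds, each participant refers up to 3 contacts, 3 waves.
sample = set(rng.sample(range(N), 10))
wave = list(sample)
for _ in range(3):
    next_wave = []
    for person in wave:
        new = [c for c in contacts[person] if c not in sample]
        for referred in rng.sample(new, min(3, len(new))):
            sample.add(referred)
            next_wave.append(referred)
    wave = next_wave

share_population = sum(well_connected) / N
share_sample = sum(well_connected[i] for i in sample) / len(sample)
print(len(sample), share_population, round(share_sample, 2))
```

Because referrals travel along social ties, members with many ties are reached far more often than their 20% share of the population would suggest, while isolated members can only enter the sample as seeds.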
Summary
- Sampling involves a series of choices that researchers make for recruiting the
participants in their studies. These choices determine how confidently they can
generalize their results beyond their sample. First, researchers choose to specify what
target population they intend to generalize their results to. Second, researchers choose
a sampling frame to identify the members of their target population. The quality of the
sampling frame that researchers choose determines the coverage error in their sample.
Third, researchers choose a method to select cases within their sampling frame for
participation in the study. Random selection of cases from the sampling frame is one of
the most useful methods for ensuring that the study will have a representative sample
- Random and systematic errors in the sampling process can lead the sample results to
deviate from whatever population values the researcher is estimating. Researchers can
mitigate random sampling error by increasing their sample size. Systematic sampling
error is a more challenging problem because it involves bias in either the recruitment
process or in participant responsiveness. To avoid systematic error researchers try to
monitor their sampling process to detect signs of bias and then implement measures to
eliminate or statistically adjust for these biases
- While methods such as random sampling are the preferred approach for recruiting a
representative sample of a population, psychological researchers often rely on samples
that have questionable representativeness, such as samples of convenience.
Convenience samples may be adequate for many research purposes such as in
experiments where the researcher does not aim to generalize the results to estimate a
population value with precision but instead aims to test the validity of a hypothesis about
some psychological processes. Psychological researchers may also need to rely on
other sampling methods that have questionable representativeness such as
location-dependent sampling and snowball sampling when they try to recruit members of
hidden populations that are composed of individuals who have rare or potentially
stigmatized characteristics. In such cases where the representativeness of the sample is
questionable, researchers will need to be cautious about basing any broader
generalizations about the characteristics of the group on the qualities that they observe
in their samples