Sampling
• Sampling unit: An individual unit of a
population, for example:
• Person (subject)
• House
• Street
• Event
• Behavior
• Sample: Any group of individuals taken from a
population. The sample is supposed to be
representative (unbiased, unselected, random).
• Sample size: The number of individuals or
observations in the sample.
• Sampling variation: The variation between one
sample and another.
• Sampling error: The difference between the
population mean and the sample mean.
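A small simulation can make sampling variation and sampling error concrete. This is only a sketch: the population (the integers 1 to 100), the sample size, and the function name `sample_means` are illustrative assumptions, not part of the slides.

```python
import random
from statistics import mean

def sample_means(population, sample_size, n_samples, seed=None):
    """Draw several simple random samples and return the mean of each."""
    rng = random.Random(seed)
    return [mean(rng.sample(population, sample_size)) for _ in range(n_samples)]

# Hypothetical population: the integers 1..100 (population mean 50.5).
population = list(range(1, 101))
population_mean = mean(population)

means = sample_means(population, sample_size=10, n_samples=5, seed=1)
# Sampling variation: the sample means differ from one sample to the next.
# Sampling error: each sample mean minus the population mean.
errors = [m - population_mean for m in means]
print(means)
print(errors)
```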
Sample size determination
• Statistical methods can be used to determine
sample sizes for RCTs and single parameter
estimation (e.g. prevalence)
• Why does it matter?
– Too few subjects: realistic improvements are unlikely to
be distinguished from chance variation.
– Too many subjects: an unnecessary number of
subjects receive the inferior treatment.
• Efficient use of resources
Prior information
n = (1.96 × 48 / 20)^2 ≈ 22
i.e. a sample of about 22 would enable us to estimate the population
mean to within ±20 (with 95% confidence), taking 48 as the prior
estimate of the standard deviation.
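The calculation can be sketched in Python, reading 48 as a prior estimate of the standard deviation and 20 as the required half-width of the 95% interval (both readings are assumptions drawn from the formula). The function name `sample_size_for_mean` is hypothetical.

```python
import math

def sample_size_for_mean(sd, margin, z=1.96):
    """Unrounded n needed to estimate a mean to within +/- margin,
    given a prior standard deviation sd: n = (z * sd / margin)^2."""
    return (z * sd / margin) ** 2

n_raw = sample_size_for_mean(sd=48, margin=20)
print(round(n_raw, 2))   # about 22.13
print(math.ceil(n_raw))  # rounding up to guarantee the precision gives 23
```

The slide rounds to the nearest whole number (22); rounding up is the more conservative convention.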
Estimating a proportion to a
required precision
• Useful when estimating a prevalence
• Problem: the standard error of a proportion
depends on the proportion itself, the
quantity we are trying to estimate!
– We need an initial estimate
– 95% confidence interval = p ± 1.96 × sqrt(p(1 − p) / n)
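Setting the half-width 1.96 × sqrt(p(1 − p)/n) equal to a required precision d and solving for n gives n = 1.96² × p(1 − p) / d². The sketch below applies this; the initial estimate of 10% and the precision of ±3 percentage points are hypothetical values chosen for illustration, as are the function names.

```python
import math

def sample_size_for_proportion(p_initial, margin, z=1.96):
    """Smallest n for which the 95% CI half-width is at most `margin`,
    using an initial estimate p_initial of the proportion."""
    return math.ceil(z ** 2 * p_initial * (1 - p_initial) / margin ** 2)

def ci_for_proportion(p, n, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Hypothetical inputs: initial prevalence estimate 10%, precision +/- 3 points.
n = sample_size_for_proportion(p_initial=0.10, margin=0.03)
low, high = ci_for_proportion(0.10, n)
print(n, low, high)
```

When no initial estimate is available, p = 0.5 is the most conservative choice, since p(1 − p) is largest there.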
Example: prevalence of a disease
Systematic sampling example:
• If the population has 100 cases and you want a sample of 10,
the sampling ratio is 1/10 and k = 10
• Randomize the order of cases in the sampling frame
• Use a random number table to select the first case from among the first k
• Sample every kth case thereafter
• Advantage
– Quick, efficient, saves time and energy
• Disadvantage
– Not entirely bias free; each item does not have an equal chance of
being selected
– The system for selecting subjects may introduce systematic error
– Cannot generalize beyond the population actually sampled
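The every-kth-case procedure can be sketched as follows. The function name `systematic_sample` and its `shuffle` flag are hypothetical, and Python's pseudo-random generator stands in for the random number table.

```python
import random

def systematic_sample(frame, sample_size, shuffle=False, seed=None):
    """Take every kth case after a random start within the first k cases."""
    rng = random.Random(seed)
    cases = list(frame)
    if shuffle:
        # Optional step: randomize the order of the sampling frame first.
        rng.shuffle(cases)
    k = len(cases) // sample_size   # sampling interval
    start = rng.randrange(k)        # random start among the first k cases
    return cases[start::k][:sample_size]

# Population of 100, sample of 10, so k = 10.
frame = list(range(1, 101))
sample = systematic_sample(frame, sample_size=10, seed=3)
print(sample)
```

With `shuffle=False` the selected cases are exactly k positions apart, which makes the mechanism easy to see; it also shows why a periodic pattern in the frame can introduce systematic error.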
Cluster (Area) Sampling
• Randomly select groups (clusters); all members of the selected groups are
subjects
• Appropriate when
– a list of the members of the population cannot be obtained
– little is known about the population's characteristics
– the population is scattered over a large geographic area
• Advantage
– More practical, less costly
• Conclusions should be stated in terms of the cluster (the sampling unit,
e.g. the school)
• Sample size is the number of clusters
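A minimal sketch of cluster sampling, assuming the population is available as a mapping from cluster name to its members. The school names, pupil labels, and the function name `cluster_sample` are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters; every member of a selected
    cluster becomes a subject."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters: four schools with their pupils.
schools = {
    "School A": ["a1", "a2", "a3"],
    "School B": ["b1", "b2"],
    "School C": ["c1", "c2", "c3", "c4"],
    "School D": ["d1", "d2", "d3"],
}
selected = cluster_sample(schools, n_clusters=2, seed=0)
print(selected)
# The sample size, in the sense used above, is the number of clusters.
print(len(selected))
```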
Multistage Sampling
• Stage 1
– randomly sample clusters (schools)
• Stage 2
– randomly sample individuals from the schools
selected
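The two stages can be sketched as below, again assuming a mapping from school to pupils. All names, including `multistage_sample`, are hypothetical.

```python
import random

def multistage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly sample clusters.
    Stage 2: randomly sample individuals within each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical data: three schools with ten pupils each.
schools = {
    f"School {letter}": [f"{letter.lower()}{i}" for i in range(1, 11)]
    for letter in "ABC"
}
result = multistage_sample(schools, n_clusters=2, n_per_cluster=3, seed=0)
print(result)
```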
Deliberate (Quota) Sampling
• Similar to stratified random sampling
• Technique
– Quotas set using some characteristic of the
population thought to be relevant
– Subjects selected non-randomly to meet quotas (usually
convenience sampling)
• Disadvantage
– Selection bias
– Cannot set quotas for all characteristics important to
the study
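To illustrate why quota sampling is non-random, the sketch below fills quotas with whoever arrives first, which is one way convenience selection might work. The arrival list, the quota values, and the function name `quota_sample` are all hypothetical.

```python
def quota_sample(arrivals, quotas, key):
    """Take subjects in arrival order (non-randomly) until each quota is met."""
    remaining = dict(quotas)
    selected = []
    for person in arrivals:
        group = key(person)
        if remaining.get(group, 0) > 0:
            selected.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # every quota has been filled
    return selected

# Hypothetical arrivals and quotas on one characteristic (sex).
arrivals = [
    {"id": 1, "sex": "F"}, {"id": 2, "sex": "F"}, {"id": 3, "sex": "M"},
    {"id": 4, "sex": "F"}, {"id": 5, "sex": "M"}, {"id": 6, "sex": "M"},
]
chosen = quota_sample(arrivals, quotas={"F": 2, "M": 1}, key=lambda p: p["sex"])
print([p["id"] for p in chosen])
```

The quotas are met exactly, but later arrivals never have a chance of selection, which is the source of the selection bias noted above.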
Other sampling methods
• Convenience Sampling
– Examples: intact classes, volunteers, survey respondents
(low return), a typical group, a typical person
– Disadvantage: selection bias
• Purposive Sampling
– Establish the criteria necessary for inclusion in the
study and find a sample that meets them.
• Snowball sampling
The Three Universal Assumptions of Analysis of Variance
1. Independence
2. Normality
3. Homogeneity of Variance
Overview of the concepts
• Model I (Assessing treatment effects)
• Comparison of mean values of several groups.
Why ANOVA?
• Model I (Assessing treatment effects)
• ANOVA is an extension of the commonly used t-test for
comparing the means of two groups.
• The aim is a comparison of mean values of several groups.
• The tool is an assessment of variances.
Model I: t-test versus ANOVA
• Why not multiple t-tests?
– With several groups, many t-tests are necessary for pairwise
comparisons, e.g. 6 tests for 4 groups.
– Multiple comparisons inflate the type I error rate, i.e. too often one will
get a “significant” result (a P-value below 5%) by chance alone.
• Thus, ANOVA is useful when dealing with several groups.
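The arithmetic behind this can be sketched in a few lines. The family-wise error calculation assumes the tests are independent, which pairwise t-tests on shared groups are not, so the figure is only a rough indication.

```python
import math

def n_pairwise_tests(n_groups):
    """Number of pairwise comparisons among n_groups groups."""
    return math.comb(n_groups, 2)

def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive across n_tests tests,
    under the simplifying assumption that the tests are independent."""
    return 1 - (1 - alpha) ** n_tests

tests = n_pairwise_tests(4)
fwer = familywise_error(tests)
print(tests)           # 6
print(round(fwer, 3))  # 0.265
```

With 4 groups, the chance of at least one spurious "significant" result is therefore roughly 26% rather than 5%.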
Model I ANOVA – Short summary
• Plot your data
• Generally, the procedure is robust to deviations from
normality. However, it is sensitive to outliers, so
check for outliers within groups.
• Variance homogeneity may be tested with Bartlett's test.
• Cochran's test can be used to test for variance outliers.
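The two test statistics can be sketched in plain Python. This computes only the statistics, not P-values or critical values, and the function names and data are hypothetical. Bartlett's statistic is referred to a chi-square distribution with k − 1 degrees of freedom; Cochran's C, as written here, assumes equal group sizes.

```python
import math
from statistics import variance

def bartlett_statistic(groups):
    """Bartlett's statistic for homogeneity of variance and its
    degrees of freedom (k - 1)."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [variance(g) for g in groups]
    n_total = sum(sizes)
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction, k - 1

def cochran_c(groups):
    """Cochran's C: largest group variance as a share of the sum of variances."""
    variances = [variance(g) for g in groups]
    return max(variances) / sum(variances)

# Made-up data: the middle group has a much larger variance.
equal = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
unequal = [[1.0, 2.0, 3.0], [0.0, 5.0, 10.0], [7.0, 8.0, 9.0]]
print(bartlett_statistic(equal), cochran_c(equal))
print(bartlett_statistic(unequal), cochran_c(unequal))
```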
Control group (C) versus treatment groups
• Often, focus is on effects in treatment groups versus the control group.
• Apply Dunnett's test, based on the principle of “least significant
difference” (LSD), i.e. critical t-values for differences between treatment
groups and the control group are adjusted