Part 5 of 5 - Experimental design lecture series. This one focuses on sample size calculations and introduces some of the commonly used statistical tests (for normally distributed data). Toward the end it covers type I and II errors, alpha/beta and reducing variability.
2. Let's move on to sample size determination
How many samples do you need?
I know this one! You always do 3 experiments! Am I right?
Nope.
OK, do you do experiments until your data are significant?
No. You do a calculation before you start, based on your hypothesis and experiment design
3. What if I am doing a descriptive study?
Sample size calculations are for when you are formally testing a hypothesis
Exploratory studies don't need them, but without the stats to back up your data you won't be able to make strong conclusions
4. Sample size calculations are actually not too hard... 3 steps
Step 1: Identify your variables
Step 2: Pick the right test (sounds scary, isn't really)
Step 3: Plug some numbers into a calculator
5. Let's briefly talk about stats
The good news is that you should usually be planning your experiment in a way that makes your stats as simple as possible
To be able to calculate sample size you will need a good idea of what test you plan to run in the end
Oh, dear!
If your plan is complex, then you definitely should get some expert help
Let's have a quick look at some of the common tests
Phew!
6. Before we move on, let's consider an experiment where you have more than one hypothesis to test
The sample size calculation you do will be based on the primary outcome measure
How do I know which is the primary hypothesis?
I always consider the primary outcome to be the real reason I did the experiment
Secondary outcomes are bonus information, or details that help explain the data
7. Types of Variable
Dependent/Outcome: What you actually measure
Independent/Factor: The bits you control, how your samples are grouped
Class of Variable
Categorical: Discrete answers: yes/no, male/female, small/medium/large
Continuous: Answers can be anything: height, weight, age, speed, distance, power, concentration
Reminder: For both types of variable, there are two classes: categorical and continuous
For your experiment, first identify what class your variables will fall into
Then look up the type of test that is appropriate for those combinations
Simple!
8. The tests that you decide to use will ultimately depend on whether your data are normally distributed or not
You'll need to formally test for normality once you have your data
However, for the purposes of experimental design, I usually base my calculations on the data being normally distributed
[Figure: Normal Distribution vs Non-Normal Distribution]
This simplifies things and means you only need to choose between about half as many tests
I'm not convinced I'm normal!
OK, on to the tests!!
9. Chi Squared
Types of Variable
Outcome Variables: Categorical
Factor Variables: Categorical
Example please!
Hypothesis: mice carrying the green transgene are more likely to be female than male
[Table: Male / Female by Control / Transgenic]
10. Chi Squared
Types of Variable
Outcome Variables: Categorical
Factor Variables: Categorical
Your data is simply the number of animals in each group
[Table: Male / Female by Control / Transgenic; counts 4, 4, 2, 6]
Chi squared tests how likely it is that the observed proportions arose by chance
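As a rough illustration of what the chi squared test does with those counts, here is a minimal Python sketch. The layout of the table is an assumption (control: 4 male, 4 female; transgenic: 2 male, 6 female), since the slide only lists the four numbers, and the function name is made up for this example. It uses no continuity correction, and with expected counts this small an exact test would usually be preferred in practice.

```python
import math

def chi_squared_2x2(table):
    """Return (statistic, p_value) for a 2 x 2 table of counts (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if outcome and factor were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper tail of the chi squared
    # distribution equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Assumed layout: rows = control, transgenic; columns = male, female.
counts = [[4, 4], [2, 6]]
statistic, p_value = chi_squared_2x2(counts)
print(f"chi squared = {statistic:.3f}, p = {p_value:.3f}")
```

With only 8 animals per group the p value is well above 0.05, which is exactly why the sample size needs to be worked out before the experiment starts.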
11. Chi Squared
Types of Variable
Outcome Variables: Categorical
Factor Variables: Categorical
You can use Chi squared with multiple categories in both outcome and factor variables
[Table: Male / Female by genotype -/-, +/-, +/+]
Do you call those Chi rectangles?
No, Conro!
12. T tests
Types of Variable
Outcome Variables: Continuous
Factor Variables: Categorical
Example please!
If you planned to compare the weights of your 2 transgenic lines
[Graph: weight measurements for -/- and +/+]
I find it helps if you imagine the graphs your data will generate
Note: use T tests for comparing 2 groups only!
13. Independent T tests
Types of Variable
Outcome Variables: Continuous
Factor Variables: Categorical
Independent or paired T test?
If your groups are unrelated, then you will use the independent T test
14. Paired T tests
Types of Variable
Outcome Variables: Continuous
Factor Variables: Categorical
If you are measuring the response in the same individuals, use a paired T test
[Diagram: baseline reading 1, + treatment 1, measurement A; baseline reading 2, + treatment 2, measurement B; graph of A vs B]
15. ANOVA
Types of Variable
Outcome Variables: Continuous
Factor Variables: Categorical
T tests are good for comparing 2 groups, but if you are going to have 3 or more you'll likely need ANOVA
These are pretty common: for example comparing wild-type, heterozygous and homozygous mice (+/+, +/-, -/-)
Or if you have untreated, control treatment(s) and test treatments
16. 1 way or repeated measures ANOVA
Types of Variable
Outcome Variables: Continuous
Factor Variables: Categorical
Like for T tests, there are different types of ANOVA depending on whether your samples are independent or not
A one-way ANOVA is for independent samples, like our mouse line examples (-/-, +/-, +/+)
A repeated measures ANOVA is for multiple paired samples
17. 2 way ANOVA
Types of Variable
Outcome Variables: Continuous
Factor Variables: Categorical
If you break the factor variables into two, you will need a 2-way ANOVA
For example, if you plan to analyse your data looking at genotype AND gender effects then you will need a 2-way ANOVA
[Graph: -/-, +/-, +/+ shown separately for Male and Female]
18. Pearson's Regression
Types of Variable
Outcome Variables: Continuous
Factor Variables: Continuous
OK, let's move on to situations where you are looking to see if your two variables are correlated
The most common test here is Pearson's regression or Pearson's correlation
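For a correlation design, one common way to estimate sample size is the Fisher z approximation. The sketch below is an illustration under that assumption, not the method of any particular calculator; the function name and the example correlation of 0.3 are hypothetical, and the result assumes a two-sided test on roughly normal data.

```python
import math
from statistics import NormalDist

def samples_for_correlation(expected_r, alpha=0.05, power=0.80):
    """Approximate number of samples needed to detect a correlation of expected_r."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    fisher_z = math.atanh(expected_r)              # Fisher transformation of r
    return math.ceil(((z_alpha + z_power) / fisher_z) ** 2 + 3)

for r in (0.3, 0.5, 0.7):
    print(f"expected r = {r}: about {samples_for_correlation(r)} samples")
```

Weak correlations need many more samples than strong ones, so the expected strength of the relationship matters as much here as the effect size does for a T test.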
19. Logistic Regression
Types of Variable
Outcome Variables: Categorical (usually binary)
Factor Variables: Continuous
A different form of regression, the logistic regression, is used for categorical outcome data with continuous factor data
An example might be if you want to see if there is a correlation between the age of your donor and whether a disease is present or not
20. MANOVA
Types of Variable
Outcome Variables: Multiple Continuous
Factor Variables: Categorical
Last one (of the common ones!). The MANOVA
This is a special type of ANOVA where you have more than one outcome variable
Classic examples are when you are comparing lots of different mRNAs in the same sample
Or lots of different clinical details in the same patient
[Graph: Gene 1, Gene 2, Gene 3. NB: all 3 genes measured in same animal]
21. MANOVA
Types of Variable
Outcome Variables: Multiple Continuous
Factor Variables: Categorical
If you plan to do multiple ANOVAs or T tests in this situation you will underestimate your sample size
The MANOVA is specifically designed for these multiple comparison tests
[Graph: Gene 1, Gene 2, Gene 3. NB: all 3 genes measured in same animal]
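To see why running several separate tests causes trouble, the sketch below works out the chance of at least one false positive across a family of tests, and the stricter per-test threshold a Bonferroni correction would demand. It assumes the tests are independent, which three genes from the same animal will not strictly be, so treat the numbers as a rough guide; the function names are made up for this example.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test threshold that keeps the overall false positive rate near alpha."""
    return alpha / n_tests

for n_tests in (1, 3, 10):
    overall = familywise_error_rate(0.05, n_tests)
    per_test = bonferroni_alpha(0.05, n_tests)
    print(f"{n_tests} tests: overall error {overall:.3f}, per-test alpha {per_test:.4f}")
```

A stricter per-test threshold means each comparison needs more samples to reach the same power, which is why a sample size based on a single uncorrected test comes out too small.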
22. MANOVA
Types of Variable
Outcome Variables: Multiple Continuous
Factor Variables: Categorical
Having multiple comparisons will increase the sample size you need.
This is another time for you to consider: do you really want to ask lots of questions?
[Graph: Gene 1, Gene 2, Gene 3. NB: all 3 genes measured in same animal]
23. MANOVA or ANOVA
Remember though, when you do your power analysis it is based only on the primary outcome
This means that you may only have one outcome variable as far as power analysis goes
The secondary outcomes will not be as robustly tested, but you can still report the data.
BUT, make it clear when you write up your results that you have not accounted for multiple comparisons for these variables
Oh, no. I'm confused!
24. MANOVA or ANOVA
You're doing a clinical trial of a new drug that decreases blood pressure
Blood pressure measurements are your primary outcome
Secondary outcomes might be other clinical measurements like heart rate, white cell count, reflexes, blood sugars etc.
The secondary outcomes are interesting but you aren't designing the trial to test them; they're not part of your sample calculations.
ANOVA is OK
26. There are other tests for more complex designs, but get help for those!
Time for you to do something!

Test           Outcome   Factor
Chi Square     Cat       Cat
T Test         Con       Cat (2 only)
1 way ANOVA    Con       Cat
2 way ANOVA    Con       Cat + Cat
Spearman       Con       Con
Logistic       Cat       Con
MANOVA         Con 2+    Cat
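The lookup table above can also be written as a small decision function. This is only a sketch of the table's logic for normally distributed data: the function name and arguments are invented for illustration, and it says nothing about paired versus independent designs, which still need to be decided separately.

```python
def choose_test(outcome, factor, n_outcomes=1, n_factors=1, n_groups=2):
    """Suggest a common test from the class of the outcome and factor variables."""
    if outcome == "categorical" and factor == "categorical":
        return "Chi squared"
    if outcome == "categorical" and factor == "continuous":
        return "Logistic regression"
    if outcome == "continuous" and factor == "continuous":
        return "Correlation (Pearson or Spearman)"
    if outcome == "continuous" and factor == "categorical":
        if n_outcomes >= 2:
            return "MANOVA"
        if n_factors >= 2:
            return "2 way ANOVA"
        if n_groups == 2:
            return "T test"
        return "1 way ANOVA"
    raise ValueError("outcome and factor must be 'categorical' or 'continuous'")

# Weights of three genotypes: continuous outcome, one categorical factor, 3 groups.
print(choose_test("continuous", "categorical", n_groups=3))
```

Anything that does not fit one of these branches is a sign the design is complex enough to need expert help.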
27. Protein B expression is increased in more than 50% of squamous cell carcinoma cancers
Paired normal and cancer tissues probed with antibodies against protein B. Compare staining intensity in the normal tissue against the cancer tissue from the same patient. Staining intensity scored as "reduced", "same" or "increased"
First identify the variables and classify them as categorical or continuous
Outcome Variables:
Factor Variables:
28. Protein B expression is increased in more than 50% of squamous cell carcinoma cancers
Paired normal and cancer tissues probed with antibodies against protein B. Compare staining intensity in the normal tissue against the cancer tissue from the same patient. Staining intensity scored as "reduced", "same" or "increased"
Outcome Variables: Categorical: reduced / same / increased
Factor Variables: Categorical: cancer / no cancer
29. Let's test the hypothesis in a different way
Protein B expression is increased in more than 50% of squamous cell carcinoma cancers
Paired normal and cancer tissues probed with antibodies against protein B. Compare staining intensity in the normal tissue against the cancer tissue from the same patient. Staining intensity scored as signal intensity 0-255
Outcome Variables:
Factor Variables:
30. Protein B expression is increased in more than 50% of squamous cell carcinoma cancers
Paired normal and cancer tissues probed with antibodies against protein B. Compare staining intensity in the normal tissue against the cancer tissue from the same patient. Staining intensity scored as signal intensity 0-255
Outcome Variables: Continuous: signal intensity
Factor Variables: Categorical: cancer / no cancer
Paired T test
31. Let's add a little more
Protein B expression is increased in more than 50% of squamous cell carcinoma cancers
Paired normal and cancer tissues probed with antibodies against protein B. Compare staining intensity in the normal tissue against the cancer tissue from the same patient. Staining intensity scored as signal intensity 0-255.
We also want to determine if patients with high staining intensity have also developed metastasis
Outcome Variables:
Factor Variables:
32. Protein B expression is increased in more than 50% of squamous cell carcinoma cancers
Paired normal and cancer tissues probed with antibodies against protein B. Compare staining intensity in the normal tissue against the cancer tissue from the same patient. Staining intensity scored as signal intensity 0-255.
We also want to determine if patients with high staining intensity have also developed metastasis
Outcome Variables: Continuous: signal intensity
Factor Variables: Categorical: no cancer / cancer no metastasis / cancer with metastasis
33. The mRNA for protein A is decreased and mRNA for protein B is increased in RNA extracted from squamous cell carcinoma tissue compared to RNA isolated from normal skin
Paired normal skin and cancer tissues obtained, RNA isolated and reverse transcribed to cDNA. Quantitative PCR used to determine abundance of mRNA for protein B and mRNA for protein A, measured relative to a reference transcript
Outcome Variables:
Factor Variables:
34. The mRNA for protein A is decreased and mRNA for protein B is decreased in RNA extracted from squamous cell carcinoma tissue compared to RNA isolated from normal skin
Paired normal skin and cancer tissues obtained, RNA isolated and reverse transcribed to cDNA. Quantitative PCR used to determine abundance of mRNA for protein B and mRNA for protein A, measured relative to a reference transcript
Outcome Variables: Continuous x 2: mRNA for Protein A, mRNA for Protein B
Factor Variables: Categorical: cancer / no cancer
35. The mRNA for protein A is decreased and mRNA for protein B is decreased in RNA extracted from squamous cell carcinoma tissue compared to RNA isolated from normal skin
Paired normal skin and cancer tissues obtained, RNA isolated and reverse transcribed to cDNA. Quantitative PCR used to determine abundance of mRNA for protein B and mRNA for protein A, measured relative to a reference transcript. Ratio of mRNA A to mRNA B determined for each tissue
You might be interested in the ratio of A to B rather than absolute levels of both
Outcome Variables:
Factor Variables:
36. The mRNA for protein A is decreased and mRNA for protein B is decreased in RNA extracted from squamous cell carcinoma tissue compared to RNA isolated from normal skin
Paired normal skin and cancer tissues obtained, RNA isolated and reverse transcribed to cDNA. Quantitative PCR used to determine abundance of mRNA for protein B and mRNA for protein A, measured relative to a reference transcript. Ratio of mRNA A to mRNA B determined for each tissue
Outcome Variables: Continuous: mRNA A : mRNA B
Factor Variables: Categorical: cancer / no cancer
Repeated measures
37. Last one!
Squamous cell carcinoma cells induced to overexpress protein B display increased invasion compared with control treated cells.
Squamous cell carcinoma cells will either be induced to express protein B, a control protein "C", or not treated, then seeded onto either a skin substitute or onto pure collagen. After 48 hours the distance migrated into each substrate will be measured
Outcome Variables:
Factor Variables:
38. Squamous cell carcinoma cells induced to overexpress protein B display increased invasion compared with control treated cells.
Squamous cell carcinoma cells will either be induced to express protein B, a control protein "C", or not treated, then seeded onto either a skin substitute or onto pure collagen. After 48 hours the distance migrated into each substrate will be measured
Outcome Variables: Continuous: distance migrated
Factor Variables: Categorical: untreated, +Protein B or +control protein C
Factor Variables: Categorical: skin substitute or pure collagen
39. Right, now I know what test I am going to do, how do I decide on sample sizes?
It's quite easy actually!
You can use an online calculator
I like: www.powerandsamplesize.com
Pick the appropriate test and fill in the details it asks for
You'll need to know some terms about what to put where, so we'll quickly cover the big ones and what they mean
40. It's asking for a type I error rate, α. What's that?
This is your critical p value written as a percentage
P = 0.05 = 5%
This number represents the probability that you would have observed the effect even though no effect exists
The P value will ultimately be a measure of how confident you can be that you haven't got a false positive
How likely is it that this difference could have occurred by chance?
41. Everyone uses p<0.05 right? So I want 5% here.
5% is a common cut off but it doesn't mean it's appropriate for your study!
You might want 1% or even 0.1% if the impact of a false positive would be problematic, such as in a drug safety trial
How likely is it that this difference could have occurred by chance?
42. It's asking for a Power, 1 − β. What's that?
The power is 1 minus the false negative rate (β)
It's the probability that you would observe a difference if there actually is a difference
This number is written as a fraction of 1 but usually described as a percentage
So, 0.8 would be described as 80% power
The higher the better for this number, but again make it fit your question.
How likely is it that you could have detected a difference if there actually was one?
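To make power a little more concrete, here is a minimal sketch that estimates the power of a two-group comparison for a given number of samples. It uses a normal approximation to the independent T test with equal group sizes and equal standard deviations, so the figures are slightly optimistic for small groups; the function name and example numbers are assumptions for illustration only.

```python
from statistics import NormalDist

def approximate_power(n_per_group, effect_size, sd, alpha=0.05):
    """Approximate power to detect a difference in means of effect_size between two groups."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)    # two-sided critical value
    standard_error = sd * (2 / n_per_group) ** 0.5   # SE of the difference in means
    return NormalDist().cdf(effect_size / standard_error - z_alpha)

# Hypothetical example: expected difference of 1 unit, standard deviation of 1 unit.
for n in (5, 10, 16, 30):
    print(f"n = {n} per group: power is about {approximate_power(n, 1.0, 1.0):.2f}")
```

Power climbs quickly at first and then flattens, so past a certain point extra samples buy very little.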
43. OK, back to the calculator. It's asking for means and standard deviations
Yes, the calculator needs a prediction for what you are going to see
You'll need two predictions:
The effect size: what the means of your different populations will be
And an estimate of the variation within each population
[Graphs: Mean A and Mean B with the effect size between them; distribution curve for Sample A]
44. Let's have a quick look at why you need those numbers, and what they mean with respect to power and p values
[Graphs: Mean A and Mean B with the effect size between them; distribution curve for Sample A]
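The way means, standard deviation, alpha and power combine can be sketched with the standard normal-approximation formula for comparing two independent groups. This is an illustration, not the exact method used by any particular online calculator (T-based calculators typically return one or two more samples per group); the function name and the example values are hypothetical.

```python
import math
from statistics import NormalDist

def samples_per_group(mean_a, mean_b, sd, alpha=0.05, power=0.80):
    """Approximate samples per group for a two-sided comparison of two independent means."""
    effect_size = abs(mean_a - mean_b)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / effect_size) ** 2)

# Hypothetical predictions: means of 10 and 11 with a standard deviation of 1.
print(samples_per_group(10, 11, 1))               # alpha 5%, power 80%
print(samples_per_group(10, 11, 1, alpha=0.01))   # stricter type I error rate
print(samples_per_group(10, 11, 1, power=0.90))   # higher power
```

A stricter alpha or a higher power both push the required number up, which is why those two choices should fit the question rather than default to 5% and 80%.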
47. Where do the effect size and variation numbers come from?
Look at all this juicy pilot data
You could have 1000 samples and see a really small difference, but it might not matter in the grand scheme of things
The effect size should be chosen not just on what you will be able to see but also based on the real-world importance of the effect
48. Look at all this juicy pilot data
Variation and predictions for effect size will ideally come from pilot data
Sometimes this won't be possible, in which case you should turn to published work
You won't be able to find your exact experiment but you should be able to find something similar enough to make a prediction
49. Hopefully you spotted an important point in the last little bit
If you can reduce the variation between your experimental units, you will need fewer samples or be more confident with the same number of samples
This is one of the goals of your pilot experiments: to get the experimental conditions as tight as possible so that you limit the variation between samples to true biological variability
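The benefit of tighter data can be put in numbers by asking what the smallest detectable difference is for a fixed number of samples. The sketch below uses the same normal approximation for two independent groups as a typical sample size formula; the function name and values are assumptions chosen for illustration.

```python
from statistics import NormalDist

def detectable_effect(n_per_group, sd, alpha=0.05, power=0.80):
    """Smallest difference in means detectable with the given samples per group."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * sd * (2 / n_per_group) ** 0.5

# Same 16 samples per group, progressively tighter experimental conditions.
for sd in (2.0, 1.0, 0.5):
    print(f"sd = {sd}: detectable difference is about {detectable_effect(16, sd):.2f}")
```

Halving the standard deviation halves the detectable difference for the same samples, or equivalently cuts the samples needed for the same difference to about a quarter.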
50. What about technical
repeats? How many of them
do you need?
Good question!
Do you remember why you
do technical repeats?
Experiment #1 Final analysis
51. They increase the accuracy
of measurement for each
sample
Yes, by decreasing the intra-
individual variability you
should get a more accurate
measurement
Experiment #1 Final analysis
Outliers or mistakes will have
a smaller effect the more
technical repeats you do
There will therefore be
less overall variability in
your final sample
numbers
52. So should I do as many as
possible?
Final analysis
No, not necessarily, that
would be a waste of time
and money.
You need to consider
where the variability in
your experiments will be
and how much value each
technical repeat adds
Probably going to want
examples here!
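One way to judge the value of each technical repeat is to split the variability into a between-sample part and a within-sample part. In the sketch below the averaged value for each sample has variance equal to the between-sample variance plus the within-sample variance divided by the number of technical repeats. The function name and the standard deviations are invented for illustration, and the model assumes the two sources of variation are independent.

```python
def sd_of_sample_values(between_sd, within_sd, technical_repeats):
    """Standard deviation of per-sample averages after averaging technical repeats."""
    return (between_sd ** 2 + within_sd ** 2 / technical_repeats) ** 0.5

scenarios = {
    "high between, low within": (1.0, 0.3),
    "high between, high within": (1.0, 1.0),
}
for label, (between_sd, within_sd) in scenarios.items():
    values = [sd_of_sample_values(between_sd, within_sd, m) for m in (1, 2, 3, 6)]
    print(label, [round(v, 2) for v in values])
```

When the within-sample variation is small, extra technical repeats barely change the overall spread, so the effort is better spent on more samples. When it is large, the first few repeats help a lot and then the returns diminish.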
53. Protein B expression is increased in more than 50% of squamous cell carcinoma cancers
Paired normal and cancer tissues probed with antibodies against protein B. Compare staining intensity in the normal tissue against the cancer tissue from the same patient. Staining intensity scored as "reduced", "same" or "increased"
The question to ask yourself is: how variable will the data be when I test the same sample, and how does that compare to the between-sample variability?
Cancers vary a lot between individuals
Our data should be pretty clear for each sample, though there will be variations within each sample
It's really a cost vs reward analysis. How much does each tech repeat cost (time, money and ethics) vs how much value it adds?
Between Samples Variability: High
Within sample Variability: Low
Tech repeats: Sample processing: low number OK
Not much value from technical repeats, better to just increase sample numbers
54. The mRNA for protein A is decreased and mRNA for protein B is decreased in RNA extracted from squamous cell carcinoma tissue compared to RNA isolated from normal skin
Paired normal skin and cancer tissues obtained, RNA isolated and reverse transcribed to cDNA. Quantitative PCR used to determine abundance of mRNA for protein B and mRNA for protein A, measured relative to a reference transcript
Here the samples are the same as last time, so again high inter-sample variability
So, we know we will need a large number of samples
The technique this time also has more variability within it: RNA degradation during storage, RNA isolation steps and pipetting errors during reverse transcription and qPCR
So we probably need more technical repeats as well
Between Samples Variability: High
Within sample Variability: High
55. The mRNA for protein A is decreased and mRNA for protein B is decreased in RNA extracted from squamous cell carcinoma tissue compared to RNA isolated from normal skin
Paired normal skin and cancer tissues obtained, RNA isolated and reverse transcribed to cDNA. Quantitative PCR used to determine abundance of mRNA for protein B and mRNA for protein A, measured relative to a reference transcript
You want the repeats to address the areas with the highest variability: the tissue extraction level
But, logistically, there might be challenges
Sample availability might mean we don't have the opportunity to repeat at the whole tissue level
Between Samples Variability: High
Within sample Variability: High
Tech repeats: Sample processing + RT-PCR and qPCR
56. Squamous cell carcinoma cells induced to overexpress protein B display increased invasion compared with control treated cells.
Squamous cell carcinoma cells will either be induced to express protein B, a control protein "C", or not treated, then seeded onto a skin substitute. After 48 hours the distance migrated into each substrate will be measured
Let's assume here you are using cells from only one donor (e.g. an immortalised cell line)
So variability will come from what condition the cells are in
And maybe different levels of protein expression
Between Samples Variability: Med
Within sample Variability:
Tech repeats:
57. Squamous cell carcinoma cells induced to overexpress protein B display increased invasion compared with control treated cells.
Squamous cell carcinoma cells will either be induced to express protein B, a control protein "C", or not treated, then seeded onto a skin substitute. After 48 hours the distance migrated into each substrate will be measured
The assay itself should be quite tight. You will have the option of measuring lots of cells
Probably don't need many technical repeats per assay; rather, doing more biological repeats would be a better use of resources
Between Samples Variability: Med
Within sample Variability: Low
Tech repeats: Low
58. Between Samples Variability
Within sample Variability
Tech repeats
Note that you might disagree
with some of the previous
assertions
And really, it will be your data
that tells you what you need to
do
But be aware, that adding lots of
technical repeats may not be
worth the time and money you
invest in doing them
OK, got it.
59. Ethics, Cost and Feasibility
Once you have done your sample
size calculation you have three
important decisions to make
60. Ethics, Cost and Feasibility
1. Can I ethically justify using these numbers of patients/animals?
• Does the end justify the means?
• Will I be able to obtain ethical approval?
61. Ethics, Cost and Feasibility
2. Can I afford to process that number of
samples?
62. Ethics, Cost and Feasibility
3. Is it feasible to process that number of samples?
• Will I be able to recruit enough participants?
• Will I be able to do the experiments within my study timeframe?
63. Ethics, Cost and Feasibility
What do you do if the
answer to these questions is
no?
Well, first you should be pleased
that you asked the question
before wasting loads of time
doing the experiment!
Secondly you can go back and
see if you can tighten up your
research question or modify your
experimental design
64. Ethics, Cost and Feasibility
You can reduce the number of samples required by:
Reducing variability in your samples, possibly by controlling for confounders
Or, reducing the number of comparisons being made.
Remember, it's better to answer one question well rather than adding lots of extra small questions
66. Don't jump in too soon!
3 final questions
1. Consider the ethics again.
Are you fully satisfied that you are going to do the least amount of harm to effectively answer your question?
67. 2. What would the devil's advocate say about your experimental design?
Are your controls sufficient to rule out alternative interpretations? Is your randomisation technique truly random? Are your experimental units truly independent?
68. 3. Have you had someone else's input?
Before you jump in, get someone independent to have a look. They might come up with additional confounders or spot ways where you can generate more robust data
69. It's worth spending time on experimental planning
Making sure things are right will help make sure you generate useful data
Hopefully this series of videos has helped you identify what to think about.
70. Part 5 Recap.
Don't be put off by sample size determination. It will stop you wasting time and money on poorly planned experiments
Identify whether your variables are continuous or categorical, and the rest is easy.
Use your pilot studies to reduce variability in your experiment and to provide the numbers you need to perform power analysis
Don't be afraid to redesign your experiment if you have any qualms about research ethics or the likelihood of generating meaningful data.