Biostatistics: Descriptive Statistics
Introduction
Descriptive Statistics
Introduction
Introduction to the basic concepts of statistics as applied to
problems in biological science.
Goal of the course
Understand statistical concepts (population, sample, t-test, slope, significance, etc.);
Identify appropriate methods for your data (e.g., paired t-test or independent t-test, one-way block or two-way ANOVA);
Select the correct SAS procedures to analyze data (several SAS procedures may serve the same purpose; choose the one that is most suitable);
Read and interpret scientific papers.
Biostatistics + Computer Applications
Why study Biostatistics?
Statistical methods are widely used in the biological field;
Examples come from the biological field and are practical and useful;
Focus on application instead of mathematical derivation;
Helps you evaluate published papers critically.
Statistics - the science and art of obtaining reliable results and conclusions from data that are subject to variation.
Biostatistics (Biometry) - the application of statistics to the biological sciences.
Why Computer Applications?
Many statistical methods (ANOVA, regression, etc.) are difficult and complicated;
Advances in computer technology and statistical software make statistical methods much easier to apply today than in the past;
Software such as SAS takes time to learn.
Is Biostatistics hard to study?
Several factors make statistics hard for some students to learn:
The terminology is deceptive. To understand statistics, you have to understand that the statistical meanings of terms such as significant, error, and hypothesis are distinct from the ordinary uses of these words.
Statistics requires mastering abstract concepts. It is not easy to think about theoretical concepts such as populations, probability distributions, and null hypotheses.
Is Biostatistics hard? (cont)
Statistics is at the interface of mathematics and science. To really grasp the concepts of statistics, you need to be able to think about them from both angles.
The derivation of many statistical tests involves difficult math. However, you can learn to use statistical tests and interpret the results even if you do not fully understand how they work. You only need to know enough about how each tool works to avoid using it in inappropriate situations.
In short, you can calculate statistical tests and interpret results even if you don't understand how the equations were derived, as long as you know enough to use the tests appropriately.
Questions about this class
Is this class going to be hard?
No. The concepts are easy and the procedures are clear.
Why do we spend time on theoretical material?
It helps you understand the applications.
Do we need to know all of this material?
You may not need all of it, but be prepared.
Role of Statistics in Biological Science
Step | Science | Statistics
1 | Idea or question | Mathematical model / make hypothesis
2 | Collect data / observations | Study design
3 | Describe data / observations | Descriptive statistics
4 | Assess the strength of evidence for / against the hypothesis | Inferential statistics
Contents of the course
Descriptive statistics
Graphs, tables, mean and standard deviation
Inferential statistics
Probability and distributions
Hypothesis tests
Analysis of variance (ANOVA)
Correlation and regression analysis
Other special topics
Basic Concept
Data
Numerical facts, measurements, or observations obtained from an investigation or experiment aimed at answering a question
Statistical analyses deal with numbers
Variable
A characteristic that can take on different values for different persons, places, or things
Statistical analyses need variability; otherwise there is nothing to study
Examples:
Concentration of a substance, pH values obtained from atmospheric precipitation, birth weights of babies whose mothers are smokers, etc.
Basic Concept (cont.)
Type of Variable
Continuous variable
Between any two values of the variable, there is another possible value
Examples: height, weight, concentration
Discrete variable
Only separate, countable values are possible (typically integers, such as counts)
Examples: number of people, plants, etc.
Basic Concept (cont.)
Population
Population: a set or collection of objects we are interested in (finite or infinite).
Parameter: a descriptive measure of a population.
Basic Concept (cont.)
[Diagram: a sample is drawn from the population; properties of the sample are generalized to the population, and a sample statistic is used to predict the population parameter]
Sample → Population, Statistic → Parameter
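The relationship between a statistic and a parameter can be illustrated with a small simulation. This is a minimal sketch that assumes a hypothetical, simulated population of 1,000 plant heights (none of these numbers come from the course data): the parameter is computed from every member of the population, and the statistic is the same measure computed from a random sample and used to estimate it.

```python
import random
import statistics

# Hypothetical population for illustration only: 1,000 simulated plant heights (cm).
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.gauss(50.0, 5.0) for _ in range(1000)]

# Parameter: a descriptive measure of the whole population (here its mean, mu).
mu = statistics.fmean(population)

# Sample: 30 objects drawn at random, without replacement, from the population.
sample = rng.sample(population, 30)

# Statistic: the same measure computed on the sample; it is used to estimate mu.
x_bar = statistics.fmean(sample)

print(f"parameter mu    = {mu:.2f}")
print(f"statistic x_bar = {x_bar:.2f}")
```

Rerunning with a different seed gives a different sample and therefore a different x_bar, while mu stays fixed for a given population; that sample-to-sample variation is what inferential statistics has to account for.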
Descriptive Statistics
Graphical Summaries
Frequency distribution
Histogram
(Barplot, Boxplot)
Numerical Summaries
Location - mean, median, mode.
Frequency Distribution (discrete var.)
1, 4, 1, 0, 0, 1, 0, 0, 2, 3, 1, 2, 3, 1, 0, 2, 0, 1, 2,
………………………….
1, 2, 3, 2, 1, 1, 0, 5, 0, 0, 1, 0, 1, 0, 2, 4, 7, 2, 1,0
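A frequency distribution for a discrete variable counts how often each value occurs. The Python sketch below builds one, together with relative and cumulative relative frequencies, from only the two fully listed rows of plants-per-quadrat counts above; the elided middle of the data set is left out, so the counts are illustrative and are not the full results. The helper name `frequency_table` is hypothetical.

```python
from collections import Counter

# Plants per quadrat: ONLY the two fully listed rows; the elided middle rows are omitted.
counts = [1, 4, 1, 0, 0, 1, 0, 0, 2, 3, 1, 2, 3, 1, 0, 2, 0, 1, 2,
          1, 2, 3, 2, 1, 1, 0, 5, 0, 0, 1, 0, 1, 0, 2, 4, 7, 2, 1, 0]


def frequency_table(data):
    """Return (value, frequency, relative frequency, cumulative relative frequency) rows."""
    n = len(data)
    freq = Counter(data)
    rows = []
    cumulative = 0
    # Step through every integer from the smallest to the largest value so that
    # values that never occur still appear with frequency 0.
    for value in range(min(data), max(data) + 1):
        f = freq[value]  # a Counter returns 0 for missing keys
        cumulative += f
        rows.append((value, f, f / n, cumulative / n))
    return rows


table = frequency_table(counts)
print("Plants/quadrat   f   rel. f   cum. rel. f")
for value, f, rel, cum in table:
    print(f"{value:>14} {f:>3} {rel:8.3f} {cum:12.3f}")
```

Dividing the running count by n (rather than adding up rounded relative frequencies) guarantees that the last cumulative relative frequency is exactly 1.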
Frequency Distribution (cont. var.)
Histogram (Bar graph) and Frequency Polygon
[Figure: frequency histogram (bar graph) and frequency polygon of plants per quadrat (x-axis: Plants/Quadrat; y-axis: Frequency)]
[Figure: frequency histogram and frequency polygon of length, grouped in 0.5 mm class intervals from 27 to 33 mm (x-axis: Length (mm); y-axis: Frequency)]
[Figure: cumulative relative frequency of length (x-axis: Length (mm); y-axis: cumulative relative frequency, 0 to 100)]
Descriptive Statistics:
Measures of Location
Arithmetic mean (population)
Definition
$$\mu = \frac{1}{N}(X_1 + X_2 + \dots + X_N) = \frac{\sum_{i=1}^{N} X_i}{N}$$
The mean, more accurately called the arithmetic mean, is defined as the sum of the observed measurements divided by the number of observations.
Arithmetic mean (sample)
$$\bar{x} = \frac{1}{n}(X_1 + X_2 + \dots + X_n) = \frac{\sum_{i=1}^{n} X_i}{n}$$
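The population and sample formulas are the same computation applied to different data: all N values of a population give μ, and the n values of a sample give x̄. A minimal Python sketch, with a hypothetical three-value data set and hypothetical helper names, checks that the summation form and the term-by-term form (1/n)X_1 + ... + (1/n)X_n agree:

```python
import math


def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations.

    The formula is the same for a population (all N values, giving mu) and for
    a sample (n values, giving x-bar); only the data passed in differ."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


def mean_term_by_term(values):
    """Equivalent form (1/n)X_1 + (1/n)X_2 + ... + (1/n)X_n."""
    n = len(values)
    return sum(x / n for x in values)


data = [2, 4, 9]  # hypothetical observations
print(arithmetic_mean(data))  # 5.0
print(math.isclose(arithmetic_mean(data), mean_term_by_term(data)))  # True
```

`math.isclose` is used for the comparison because dividing each term by n first can introduce tiny floating-point rounding differences.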
Arithmetic mean (sample)
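Properties:
1. $$\sum_{i=1}^{n}(X_i - \bar{x}) = 0$$
2. $$\sum_{i=1}^{n}(X_i - \bar{x})^2 = \min$$
Proof:
1. $$\sum_{i=1}^{n}(X_i - \bar{x}) = \sum_{i=1}^{n} X_i - n\bar{x} = 0$$
2. Let $x' = \bar{x} + e$ be any other value. Then
$$\sum_{i=1}^{n}(X_i - x')^2 = \sum_{i=1}^{n}(X_i - \bar{x} - e)^2 = \sum_{i=1}^{n}[(X_i - \bar{x}) - e]^2$$
$$= \sum_{i=1}^{n}[(X_i - \bar{x})^2 - 2e(X_i - \bar{x}) + e^2] = \sum_{i=1}^{n}(X_i - \bar{x})^2 - 2e\sum_{i=1}^{n}(X_i - \bar{x}) + ne^2$$
$$= \sum_{i=1}^{n}(X_i - \bar{x})^2 + ne^2$$
The middle term vanishes by property 1. Because $ne^2 \ge 0$, the sum of squares is smallest when $e = 0$, that is, when $x' = \bar{x}$.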
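Both properties can be checked numerically. This Python sketch uses a hypothetical data set of n = 5 values (mean 8) and a hypothetical helper `sum_sq_about`: the deviations from the mean sum to zero, and moving the reference point away from x̄ by e increases the sum of squares by exactly ne², as the proof shows.

```python
def sum_sq_about(values, c):
    """Sum of squared deviations of the values about an arbitrary point c."""
    return sum((x - c) ** 2 for x in values)


data = [2, 4, 9, 11, 14]  # hypothetical observations, n = 5
n = len(data)
x_bar = sum(data) / n      # 40 / 5 = 8.0

# Property 1: deviations from the mean sum to zero.
print(sum(x - x_bar for x in data))  # 0.0

# Property 2: shifting away from x_bar by e adds exactly n * e**2 to the sum of squares.
ss_min = sum_sq_about(data, x_bar)
for e in (-2.0, -0.5, 0.5, 2.0):
    print(e, sum_sq_about(data, x_bar + e) - ss_min, n * e ** 2)
```

The shifts were chosen as halves and whole numbers so the floating-point results are exact; with arbitrary data the two columns agree up to rounding error.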
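Median
Definition
Middle value of the data set after ordering from smallest to largest
If n is odd: the ((n + 1)/2)th ordered value
If n is even: the average of the (n/2)th and (n/2 + 1)th ordered values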
Mode
Definition
Value that occurs most frequently in a data set
Example
2 3 4 5 3 4 5 6 7 5 3 2 5: mode Mo = 5
If all values are different, there is no mode
There may be more than one mode
Bimodal or multimodal
Not used very frequently in practice
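The mode rules above (no mode when all values differ; possibly several modes) translate directly into a count-and-compare procedure. A minimal Python sketch with a hypothetical helper `modes`, applied to the example data:

```python
from collections import Counter


def modes(values):
    """Return the mode(s) in ascending order, or [] when every value is different."""
    counts = Counter(values)
    top = max(counts.values())  # raises ValueError on empty input
    if top == 1:
        return []  # all values different: no mode
    # Several values may tie for the highest count (bimodal or multimodal data).
    return sorted(v for v, c in counts.items() if c == top)


print(modes([2, 3, 4, 5, 3, 4, 5, 6, 7, 5, 3, 2, 5]))  # [5]
print(modes([1, 2, 3]))                                # []
print(modes([1, 1, 2, 2, 3]))                          # [1, 2]
```

Python's `statistics.multimode` (Python 3.8 and later) follows a different convention: when every value is distinct it returns all of them instead of reporting no mode, which is why the sketch checks the highest count explicitly.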
Example: Central Location
Suppose the ages of the 10 trees you are studying are:
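34, 24, 56, 52, 21, 44, 64, 34, 42, 46
Then the mean age of this group is:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{34 + 24 + 56 + 52 + 21 + 44 + 64 + 34 + 42 + 46}{10} = \frac{417}{10} = 41.7 \text{ years}$$
The mean is commonly used.
To find the median, first order the data:
21, 24, 34, 34, 42, 44, 46, 52, 56, 64
$$\text{Median} = \frac{X_{(10/2)} + X_{(10/2+1)}}{2} = \frac{X_{(5)} + X_{(6)}}{2} = \frac{42 + 44}{2} = 43 \text{ years}$$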
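The tree-age calculation can be reproduced in a few lines. This Python sketch writes the median rule out explicitly (the helper name `median` is hypothetical; the standard library's `statistics.median` gives the same result) so that the odd and even cases match the definition:

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: the ((n + 1)/2)th value (1-based)
    # even n: average of the (n/2)th and (n/2 + 1)th values (1-based positions)
    return (ordered[mid - 1] + ordered[mid]) / 2


ages = [34, 24, 56, 52, 21, 44, 64, 34, 42, 46]  # tree ages (years)
mean_age = sum(ages) / len(ages)
print(f"mean = {mean_age:.1f} years")    # mean = 41.7 years
print(f"median = {median(ages)} years")  # median = 43.0 years
```

Python lists are 0-based, so the 5th and 6th ordered values sit at indices 4 and 5.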
Geometric mean
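If we calculate the mean of each set:
Set 1: 100, 30, 20, 7, -20, -30, -100 (n = 7, x̄ = 1)
Set 2: 10, 3, 2, 7, -2, -3, -10 (n = 7, x̄ = 1)
The two sets have the same mean but very different spreads. How do we measure dispersion (spread, variability)?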
Descriptive Statistics
Measures of Dispersion
Common measures
Range
Variance and Standard deviation
Coefficient of variation
Many distributions are well described by measures of location and dispersion
Range
Range = largest value - smallest value
R2 = 20 (Set 2: 10 - (-10))
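The range needs only the largest and smallest values. A minimal Python sketch for the two example sets (the helper is named `value_range`, a hypothetical name chosen to avoid shadowing Python's built-in `range`):

```python
def value_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)


set1 = [100, 30, 20, 7, -20, -30, -100]
set2 = [10, 3, 2, 7, -2, -3, -10]
print(value_range(set1))  # 200
print(value_range(set2))  # 20
```

Because it uses only the two extreme observations, the range is very sensitive to a single unusual value; the variance and standard deviation use every observation.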
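Variance and standard deviation (population)
$$\sigma^2 = \frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}$$ as variance
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}}$$ as standard deviation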
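The population formulas divide by N because every member of the population is measured. This Python sketch assumes a small hypothetical population of N = 8 values, chosen so the arithmetic is easy to check by hand, and cross-checks the result against the standard library's `statistics.pvariance` and `statistics.pstdev`:

```python
import math
import statistics


def population_variance(values):
    """sigma^2 = sum((X_i - mu)^2) / N, dividing by N because every member is measured."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


# Hypothetical population of N = 8 values (mu = 5).
population = [2, 4, 4, 4, 5, 5, 7, 9]
sigma2 = population_variance(population)   # 32 / 8 = 4.0
sigma = math.sqrt(sigma2)                  # 2.0

print(sigma2, sigma)  # 4.0 2.0
print(math.isclose(sigma2, statistics.pvariance(population)))  # True
print(math.isclose(sigma, statistics.pstdev(population)))      # True
```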
Variance and standard deviation (sample)
$$s^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{x})^2}{n - 1}$$ as mean square, or sample variance
$$s = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{x})^2}{n - 1}}$$ as standard deviation
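For a sample the divisor is n - 1 instead of N. This Python sketch applies the sample formulas to the tree ages from the central-location example and cross-checks them against `statistics.variance` and `statistics.stdev`, which also divide by n - 1. The helper name `sample_variance` is hypothetical.

```python
import math
import statistics


def sample_variance(values):
    """s^2 = sum((X_i - x_bar)^2) / (n - 1): the mean square, or sample variance."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


ages = [34, 24, 56, 52, 21, 44, 64, 34, 42, 46]  # tree ages (years)
s2 = sample_variance(ages)
s = math.sqrt(s2)

print(math.isclose(s2, statistics.variance(ages)))  # True
print(math.isclose(s, statistics.stdev(ages)))      # True
print(f"s^2 = {s2:.2f} years^2, s = {s:.2f} years")
```

Note the units: s² is in squared units of the data, while s is back in the original units, which is one reason the standard deviation is usually reported.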
Variance and standard deviation
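$$s^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{x})^2}{n - 1}$$
$$SS = \sum_{i=1}^{n}(X_i - \bar{x})^2 = \sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}$$
Corrected Sum of Squares (CSS)
df = n - 1: degrees of freedom
n - 1 is used because the deviations have to sum to zero: if we know n - 1 deviations, the nth deviation is determined.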
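The definitional and computational forms of SS give the same number, and the zero-sum constraint on the deviations is what removes one degree of freedom. This Python sketch checks both points on the tree ages from the central-location example:

```python
import math

ages = [34, 24, 56, 52, 21, 44, 64, 34, 42, 46]  # tree ages (years)
n = len(ages)
x_bar = sum(ages) / n

# Definitional form: SS = sum of (X_i - x_bar)^2
ss_definition = sum((x - x_bar) ** 2 for x in ages)

# Computational (shortcut) form: SS = sum of X_i^2 - (sum of X_i)^2 / n
ss_shortcut = sum(x * x for x in ages) - sum(ages) ** 2 / n

df = n - 1  # degrees of freedom

# The deviations sum to zero, so the last one is fixed by the other n - 1.
deviations = [x - x_bar for x in ages]
last_from_others = -sum(deviations[:-1])

print(math.isclose(ss_definition, ss_shortcut))        # True
print(math.isclose(last_from_others, deviations[-1]))  # True
print(f"SS = {ss_shortcut:.1f}, df = {df}, s^2 = {ss_shortcut / df:.2f}")
```

The shortcut form is convenient by hand, but on a computer it can lose precision when the values are large relative to their spread (subtracting two nearly equal large numbers), so software generally prefers the definitional form or an equivalent stable algorithm.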
Example:
Set 1: 100, 30, 20, 7, -20, -30, -100
Set 2: 10, 3, 2, 7, -2, -3, -10
Calculate x̄, s², s and CV (CV = s / x̄).
Set | x̄ | s² | s | CV
1 | 1 | 3773.7 | 61.4 | 61.4
2 | 1 | 44.7 | 6.7 | 6.7
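The table can be reproduced directly from the definitions. This Python sketch computes x̄, s² (divisor n - 1), s and CV = s / x̄ for both sets; the helper name `describe` is hypothetical, and CV is kept as a ratio, as in the table (multiply by 100 for a percentage).

```python
import math


def describe(values):
    """Return (mean, sample variance, sample SD, coefficient of variation)."""
    n = len(values)
    x_bar = sum(values) / n
    s2 = sum((x - x_bar) ** 2 for x in values) / (n - 1)
    s = math.sqrt(s2)
    cv = s / x_bar  # ratio; multiply by 100 for percent
    return x_bar, s2, s, cv


sets = {
    1: [100, 30, 20, 7, -20, -30, -100],
    2: [10, 3, 2, 7, -2, -3, -10],
}
print("Set  mean        s2      s     CV")
for label, values in sets.items():
    x_bar, s2, s, cv = describe(values)
    print(f"{label:>3} {x_bar:5.1f} {s2:9.1f} {s:6.1f} {cv:6.1f}")
```

Because both means equal 1, CV equals s here. In general CV is most meaningful for data measured on a ratio scale with a positive mean; when the mean is close to zero, as with these sets that include negative values, it becomes very large and unstable.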
Descriptive Statistics (Summary)
Graphical Summaries
Frequency distribution
Histogram
Boxplot
Numerical Summaries
Location - mean, median, mode.
Software
Statistical software
SAS
SPSS
Stata
BMDP
MINITAB
Graphical software
SigmaPlot
Harvard Graphics
PowerPoint
Excel
Box Plots (explain later)
Box Plots
[Figure: box plots for four drug groups (x-axis: Drug, groups 1 to 4)]