Introduction To Biostatistics: Part 1
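Mean squared error (MSE) of an estimator M of a parameter θ:
$$\mathrm{MSE}(M) = E[(M-\theta)^2] = \mathrm{Var}(M) + [E(M)-\theta]^2 = \mathrm{Var}(M) + [\mathrm{Bias}(M)]^2$$
Accuracy (MSE) = precision (variance) + bias (squared)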
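The decomposition can be checked numerically. The R sketch below assumes an illustrative setup that is not taken from the slides: samples of size 15 from a normal population with mean θ = 10, and a deliberately biased estimator M = 0.8 × (sample mean). The simulated MSE equals the simulated variance plus the squared bias.

```r
set.seed(42)
theta <- 10     # true population mean (assumed)
n     <- 15     # sample size (assumed)
reps  <- 10000  # number of simulated samples

# Biased estimator: shrink the sample mean towards zero
M <- replicate(reps, 0.8 * mean(rnorm(n, mean = theta, sd = 2)))

mse      <- mean((M - theta)^2)    # accuracy
variance <- mean((M - mean(M))^2)  # precision (divisor reps, so the identity holds exactly)
bias     <- mean(M) - theta        # systematic error, about 0.8 * 10 - 10 = -2

round(c(MSE = mse, Var = variance, Bias2 = bias^2,
        Var_plus_Bias2 = variance + bias^2), 3)
```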
Lecture 1. Brief history, basic concepts and descriptive statistics
Biostatistics
Xinhai Li
Data
Summarizing Data
Frequency Distribution
Cumulative Distributions
Relative Frequency Distribution
Percent Frequency Distribution
Bar Graph
Histogram
Pie Chart
Dot Plot
Frequency Distribution for Qualitative Data
A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several nonoverlapping classes.
The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.
Frequency Distribution
Guests staying at Holiday Inn were asked to rate the quality of their accommodations as being Poor, Below Average, Average, Above Average, or Excellent.

Rating           Frequency
Poor                 2
Below Average        3
Average              5
Above Average        9
Excellent            1
Total               20
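A minimal R sketch of the same frequency distribution, rebuilding the 20 individual responses from the counts in the table (the variable names are illustrative):

```r
# Holiday Inn ratings rebuilt from the frequencies in the table above
levs <- c("Poor", "Below Average", "Average", "Above Average", "Excellent")
ratings <- factor(rep(levs, times = c(2, 3, 5, 9, 1)), levels = levs)

# table() counts the items in each (nonoverlapping) class, in level order
freq <- table(ratings)
freq
sum(freq)   # total number of guests
```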
An example for quantitative data:
Hudson Auto Repair
Sample of Parts Cost ($) for 50 Tune-ups
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
Frequency Distribution
Guidelines for selecting the number of classes
Use between 5 and 20 classes.
Data sets with a larger number of elements usually require a larger number of classes.
Smaller data sets usually require fewer classes.
Use classes of equal width.
Approximate class width = (Largest Data Value - Smallest Data Value) / Number of Classes
Frequency Distribution
For Hudson Auto Repair, if we choose six classes:
Approximate class width = (109 - 52)/6 = 9.5 ≈ 10

Parts Cost ($)   Frequency
50-59                2
60-69               13
70-79               16
80-89                7
90-99                7
100-109              5
Total               50
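One way to reproduce this table in R, as a sketch: `cut()` assigns each cost to a class and `table()` counts the classes. Treating each class as closed on the left, [50, 60), [60, 70) and so on, is an assumption consistent with the integer-valued costs.

```r
# Hudson Auto Repair: parts cost ($) for the sample of 50 tune-ups
parts <- c(91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
           71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
           104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
           85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
           62, 82, 98, 101, 79, 105, 79, 69, 62, 73)

# Approximate class width for six classes (rounded up to 10 in the text)
(max(parts) - min(parts)) / 6

# right = FALSE closes each interval on the left: [50, 60), ..., [100, 110)
breaks <- seq(50, 110, by = 10)
class_labels <- c("50-59", "60-69", "70-79", "80-89", "90-99", "100-109")
cost_class <- cut(parts, breaks = breaks, labels = class_labels, right = FALSE)
freq <- table(cost_class)
freq
```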
Relative Frequency Distribution
The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class.
A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.
Percent Frequency Distribution
The percent frequency of a class is the relative frequency multiplied by 100.
A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.
Relative Frequency and Percent Frequency Distributions
Holiday Inn Quality Ratings

Rating           Relative Frequency   Percent Frequency
Poor                   .10                  10
Below Average          .15                  15
Average                .25                  25
Above Average          .45                  45
Excellent              .05                   5
Total                 1.00                 100

Example: Excellent received 1 of the 20 ratings, so its relative frequency is 1/20 = .05.
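As a sketch, base R's `prop.table()` turns the frequency counts into relative frequencies, and multiplying by 100 gives the percent frequencies:

```r
levs <- c("Poor", "Below Average", "Average", "Above Average", "Excellent")
ratings <- factor(rep(levs, times = c(2, 3, 5, 9, 1)), levels = levs)

rel_freq <- prop.table(table(ratings))   # count / total, e.g. 1/20 = .05
pct_freq <- 100 * rel_freq               # percent frequency

round(rbind(Relative = rel_freq, Percent = pct_freq), 2)
```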
Relative Frequency and Percent Frequency Distributions
Hudson Auto Repair

Parts Cost ($)   Relative Frequency   Percent Frequency
50-59                  .04                   4
60-69                  .26                  26
70-79                  .32                  32
80-89                  .14                  14
90-99                  .14                  14
100-109                .10                  10
Total                 1.00                 100

Example: the 50-59 class contains 2 of the 50 tune-ups, so its relative frequency is 2/50 = .04.
Relative Frequency and Percent Frequency Distributions
Insights gained from the percent frequency distribution:
Only 4% of the parts costs are in the $50-59 class.
30% of the parts costs are under $70.
The greatest percentage (32%, or almost one-third) of the parts costs are in the $70-79 class.
10% of the parts costs are $100 or more.
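The four insights can be checked directly against the 50 raw costs. In this R sketch, the mean of a logical vector is the proportion of TRUE values:

```r
parts <- c(91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
           71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
           104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
           85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
           62, 82, 98, 101, 79, 105, 79, 69, 62, 73)

pct <- 100 * c(
  in_50_59     = mean(parts >= 50 & parts <= 59),  # share in the $50-59 class
  under_70     = mean(parts < 70),                 # share under $70
  in_70_79     = mean(parts >= 70 & parts <= 79),  # share in the $70-79 class
  at_least_100 = mean(parts >= 100)                # share at $100 or more
)
pct
```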
Our class
students <- read.csv('D:/ioz/statistics/2012/students.csv', header=TRUE)
students$ID <- as.character(students$ID)
head(students)
# order ID name visits email
# 33 201028007610020 2093 163.com
# 3 201028016215017 222 163.com
# 111 201028016215018 99 163.com
# 4 201028016215019 130 163.com
# 25 201128000206033 130 163.com
# 56 201128000206061 282 mails.gucas.ac.cn
nrow(students) # 115
family.name <- substr(students$name, 1, 1)   # first character = family name
length(unique(family.name)) # 62
f.name <- table(family.name)[table(family.name) > 1]   # names shared by 2+ students
f.name <- as.table(f.name)
barplot(f.name, ylab='Number')
[Figure: bar plot of the family names shared by more than one student; y-axis: Number]
email <- table(students$email)[table(students$email) > 2]
class(email) # array
email <- as.table(email)
barplot(email, ylab='Number')
[Figure: bar plot of email domains used by more than two students (126.com, 163.com, gmail.com, mails.gucas.ac.cn, qq.com, sina.com); y-axis: Number]
hist(students$visits, freq=TRUE, nclass=15, xlab='Times')
[Figure: Histogram of students$visits; x-axis: Times (0 to 2000), y-axis: Frequency]
Bar Graph
R function: barplot()
[Figure: example bar graph; y-axis: Number]
A bar graph is a graphical device for depicting qualitative data.
Specify the labels that are used for each of the classes on one axis (usually the horizontal axis).
A frequency, relative frequency, or percent frequency scale can be used for the other axis (usually the vertical axis).
Use a bar of fixed width drawn above each class label.
The bars are separated to emphasize the fact that each class is a separate category.
Histogram
Another common graphical presentation of quantitative data is a
histogram.
The variable of interest is placed on the horizontal axis.
A rectangle is drawn above each class interval with its height
corresponding to the interval's frequency, relative frequency, or percent frequency.
Unlike a bar graph, a histogram has no natural separation between
rectangles of adjacent classes.
R code
hist(rnorm(100),nclass=6)
Pie Chart
[Figure: pie chart of Holiday Inn Quality Ratings]
R code
x=sample(1:100,6,replace=TRUE)
names(x)=c('A','B','C','D','E','F')
pie(x)
The pie chart is a commonly used graphical device for presenting relative frequency distributions for qualitative data.
First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class.
Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.
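The sector sizes follow from the same rule. This R sketch computes the angle for each Holiday Inn rating from its relative frequency; the Average class (.25) gets 90 degrees as in the example above:

```r
freq <- c(Poor = 2, "Below Average" = 3, Average = 5,
          "Above Average" = 9, Excellent = 1)

# Sector angle = relative frequency x 360 degrees
degrees <- freq / sum(freq) * 360
degrees

# pie(freq) would draw the chart; pie() rescales the counts to proportions itself
```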
Dot Plot
One of the simplest graphical summaries of data is a dot plot.
A horizontal axis shows the range of data values.
Then each data value is represented by a dot placed above the axis.
[Figure: dot plot of Tune-up Parts Cost; x-axis: Cost ($), 50 to 110]
Cumulative Distributions
Cumulative frequency distribution: shows the number of items with values less than or equal to the upper limit of each class.
Cumulative relative/percent frequency distribution: shows the proportion/percentage of items with values less than or equal to the upper limit of each class.
R code
x=seq(-5,5,by=0.1)
plot(x, pnorm(x,mean=0,sd=1), type='l')
Cumulative Distributions
Hudson Auto Repair

Cost ($)   Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percent Frequency
≤ 59               2                       .04                             4
≤ 69              15                       .30                            30
≤ 79              31                       .62                            62
≤ 89              38                       .76                            76
≤ 99              45                       .90                            90
≤ 109             50                      1.00                           100

Example: 2 + 13 = 15 tune-ups have parts costs of $69 or less, and 15/50 = .30.
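A short R sketch that follows the definition literally, counting the costs less than or equal to each upper class limit:

```r
parts <- c(91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
           71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
           104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
           85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
           62, 82, 98, 101, 79, 105, 79, 69, 62, 73)

upper <- c(59, 69, 79, 89, 99, 109)                        # upper class limits
cum_freq <- sapply(upper, function(u) sum(parts <= u))     # items <= each limit
cum_rel  <- cum_freq / length(parts)                       # cumulative relative frequency
cum_pct  <- 100 * cum_rel                                  # cumulative percent frequency
data.frame(upper, cum_freq, cum_rel, cum_pct)
```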
Stem-and-Leaf Display
A stem-and-leaf display shows both the rank order and shape of the distribution of the data.
It is similar to a histogram on its side, but it has the advantage of showing the actual data values.
The first digits of each data item are arranged to the left of a vertical line.
To the right of the vertical line we record the last digit for each item in rank order.
Each line in the display is referred to as a stem.
Each digit on a stem is a leaf.
Example: Leaf Unit = 0.1
If we have data with values such as
8.6 11.7 9.4 9.1 10.2 11.0 8.8
a stem-and-leaf display of these data will be

Leaf Unit = 0.1
 8 | 6 8
 9 | 1 4
10 | 2
11 | 0 7
Example: Leaf Unit = 10
If we have data with values such as
1806 1717 1974 1791 1682 1910 1838
a stem-and-leaf display of these data will be

Leaf Unit = 10
16 | 8
17 | 1 9
18 | 0 3
19 | 1 7

The 82 in 1682 is rounded down to 80 and is represented as an 8.
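The construction of both displays can be sketched in R. `stem_leaf()` below is a hypothetical helper, not a base R function (base R's `stem()` draws similar displays but chooses its own stem width); it truncates digits below the leaf unit, which matches the rounding-down rule for 1682.

```r
# Hypothetical helper: one text row "stem | leaves" per stem
stem_leaf <- function(x, leaf_unit) {
  # Express each value in leaf units and drop the remaining digits;
  # the tiny tolerance guards against floating-point error (e.g. 8.6 / 0.1)
  scaled <- floor(x / leaf_unit + 1e-9)
  stems  <- scaled %/% 10   # everything left of the leaf digit
  leaves <- scaled %% 10    # the leaf digit
  sapply(seq(min(stems), max(stems)), function(s) {
    paste(s, "|", paste(sort(leaves[stems == s]), collapse = " "))
  })
}

stem_leaf(c(8.6, 11.7, 9.4, 9.1, 10.2, 11.0, 8.8), leaf_unit = 0.1)
stem_leaf(c(1806, 1717, 1974, 1791, 1682, 1910, 1838), leaf_unit = 10)
```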
Probability density function (PDF)
A probability density function (pdf) is a function that represents a probability distribution in terms of integrals.
Formally, a probability distribution has density f(x) such that the probability of the interval [a, b] is given by
$$P(a \le X \le b) = \int_a^b f(x)\,dx$$
Intuitively, if a probability distribution has density f(x), then the infinitesimal interval [x, x + dx] has probability f(x) dx.
x=seq(-5,5,by=0.1)
plot(x, dnorm(x,mean=0,sd=1), type='l')
The total area under the graph is 1:
$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$
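As an illustration with the standard normal density (the interval [-1.96, 1.96] is an arbitrary choice), R's `integrate()` computes the area under `dnorm()`. It matches the probability from the cumulative distribution function `pnorm()`, and the total area comes out as 1:

```r
a <- -1.96
b <- 1.96

p_interval <- integrate(dnorm, lower = a, upper = b)$value        # area over [a, b]
p_cdf      <- pnorm(b) - pnorm(a)                                 # same probability via the CDF
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value   # whole real line

c(p_interval = p_interval, p_cdf = p_cdf, total_area = total_area)
```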
Descriptive statistics
Are the scores generally high or generally low?
Where the center of the distribution tends to be located
Three measures of central tendency:
Mode
Median
Mean
Mode
The most frequently occurring score
Report mode when using nominal scale, the
most frequently occurring category
Based on the simple frequency of each score
If you have a rectangular distribution, do not
report the mode
Unimodal, bimodal, multimodal, antimode
Example of Mode
Measurements x: 3, 5, 5, 1, 7, 2, 6, 7, 0, 4
In this case the data have two modes: 5 and 7.
Both measurements are repeated twice.
Example of Mode
Measurements x: 3, 5, 1, 1, 4, 7, 3, 8, 3
Mode: 3
Notice that it is possible for a data set not to have any mode.
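Base R has no built-in function for the statistical mode (`mode()` returns an object's storage mode), so the R sketch below uses a hypothetical helper `stat_mode()` that returns every most-frequent value, or `NULL` when all values are equally frequent. It is applied to the measurements from the two examples above:

```r
# Hypothetical helper: all most-frequent values, or NULL if there is no mode
stat_mode <- function(x) {
  counts <- table(x)
  if (all(counts == counts[1])) return(NULL)   # every value equally frequent
  as.numeric(names(counts)[counts == max(counts)])
}

stat_mode(c(3, 5, 5, 1, 7, 2, 6, 7, 0, 4))   # 5 7 (bimodal)
stat_mode(c(3, 5, 1, 1, 4, 7, 3, 8, 3))      # 3
stat_mode(c(1, 2, 3, 4))                     # NULL (no mode)
```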
Median
Score at the 50th percentile
For a normal distribution the median is the same as the mode
Arrange scores from lowest to highest: if there is an odd number of scores, the median is the one in the middle; if there is an even number of scores, average the two scores in the middle
Used with an ordinal scale and when the distribution is skewed
Example of Median
Measurements x: 3, 5, 5, 1, 7, 2, 6, 7, 0, 4 (sum = 40)
Ranked x: 0, 1, 2, 3, 4, 5, 5, 6, 7, 7
Median: (4 + 5) / 2 = 4.5
Notice that only the two central values are used in the computation.
The median is not sensitive to extreme values.
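The median example, and the claim that the median resists extreme values, can be checked in R. Replacing the largest score with 70 is an illustrative assumption:

```r
x <- c(3, 5, 5, 1, 7, 2, 6, 7, 0, 4)
sort(x)          # 0 1 2 3 4 5 5 6 7 7
median(x)        # (4 + 5) / 2 = 4.5

# Replace the first largest score (7) with an extreme value
x_out <- replace(x, which.max(x), 70)
median(x_out)    # still 4.5
mean(x)          # 4
mean(x_out)      # 10.3: the mean is pulled towards the extreme value
```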
Mean
Score at the exact mathematical center of the distribution (average)
Used with interval and ratio scales, and when the distribution is symmetrical and unimodal
Not accurate when the distribution is skewed, because it is pulled towards the tail
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
Uses of the Mean
Describes scores
Deviations from the mean give us the error of our estimate of the score, with total error equal to zero
Predict scores
Describe a score's location
Describe the population mean (μ), which is a parameter
Deviations around the Mean
The score minus the mean
Include the plus or minus sign
The sum of deviations from the mean always equals zero: $\sum (X - \bar{X}) = 0$
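A small R check of the mean formula and of the zero-sum property, using the same ten measurements as the mode and median examples:

```r
x <- c(3, 5, 5, 1, 7, 2, 6, 7, 0, 4)
n <- length(x)

x_bar <- sum(x) / n         # mean formula: sum of the scores divided by n
deviations <- x - x_bar     # signed deviations (keep the plus/minus sign)

x_bar                       # 4
sum(deviations)             # 0
sum(x - 5)                  # -10: around any other value the sum is not zero
```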
Range
Report the maximum difference between the lowest and highest scores
Semi-interquartile range, used with the median: one half the distance between the scores at the 25th and 75th percentiles
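In R, a sketch of both measures for the same ten measurements. Percentiles depend on the interpolation rule; `quantile()` and `IQR()` use R's default (type 7), so other software may give slightly different quartiles:

```r
x <- c(3, 5, 5, 1, 7, 2, 6, 7, 0, 4)

diff(range(x))                      # range: 7 - 0 = 7

q <- quantile(x, c(0.25, 0.75))     # 25th and 75th percentiles: 2.25 and 5.75
(q[[2]] - q[[1]]) / 2               # semi-interquartile range: 1.75
IQR(x) / 2                          # same value via IQR()
```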
Measures of Variability
Extent to which the scores differ from each other, or how spread out the scores are
Tells us how accurately the measure of central tendency describes the distribution
Shape of the distribution
Why do we care about variability?
Where would you rather vacation: LA Bungalows, where the mean temperature is 24 degrees, or Sahara Condos, where the mean temperature is also 24 degrees?
LA temperature range: day = 26, night = 22
Sahara temperature range: day = 40, night = 8
Variance
Uses the deviation from the mean
Remember, the sum of the deviations always equals zero, so you have to square each of the deviations
$S_X^2$ = sum of squared deviations divided by the number of scores:
$$S_X^2 = \frac{\sum (X - \bar{X})^2}{N}$$
Provides information about the relative variability
Some Limits
It isn't the average deviation
Interpretation doesn't make sense because:
The number is too large
And it is a squared value
The standard deviation (SD)
Take the square root of the variance: $S_X = \sqrt{S_X^2}$
Uses the same units of measurement as the raw scores
How much scores deviate below and above the mean
The standard deviation (SD)
Standard deviation ≈ the mean of deviations from the mean (sort of):
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$$
The larger the standard deviation, the more variability there is in the scores
The standard deviation is somewhat less sensitive to extreme outliers than the range (as N increases)
The deviation (definitional) formula for the sample standard deviation
$$S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N}}$$
What's the difference between this formula and the population standard deviation?
In the population formula, all the Xs represent the entire population (and the mean is μ). In this formula, the Xs represent a sample (and the mean is X̄).
Standard Deviation: Example

X        X - X̄      (X - X̄)²
21       -5.8        33.64
25       -1.8         3.24
24       -2.8         7.84
30        3.2        10.24
34        7.2        51.84
Mean = 26.8   Sum = 0   Sum = 106.8

$$S = \sqrt{\frac{106.8}{5}} = \sqrt{21.36} = 4.62$$
Calculating S using the raw-score formula
$$S = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}}$$
To calculate $\sum X^2$ you square all the scores first and then sum them
To calculate $(\sum X)^2$ you sum all the scores first and then square them
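The worked example (X = 21, 25, 24, 30, 34) can be verified in R with both the definitional and the raw-score formulas; in this sketch the two agree, giving S = √21.36 ≈ 4.62:

```r
X <- c(21, 25, 24, 30, 34)
N <- length(X)

# Definitional formula: square root of (sum of squared deviations / N)
SS    <- sum((X - mean(X))^2)                 # 106.8
S_def <- sqrt(SS / N)                         # sqrt(21.36)

# Raw-score formula: sum(X^2) squares first; sum(X)^2 sums first
S_raw <- sqrt((sum(X^2) - sum(X)^2 / N) / N)

c(SS = SS, S_def = S_def, S_raw = S_raw)
```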
Population and sample variance and standard deviation
When we have data from the entire population, we use μ (not x̄) to compute $\sigma_X$ using the same formula
Variance and standard deviation of the sample are biased estimates of the population variance and standard deviation
Estimating the population standard deviation from a sample
S, the sample standard deviation, is usually a little smaller than the population standard deviation. Why?
The sample mean minimizes the sum of squared deviations (SS). Therefore, if the sample mean differs at all from the population mean, the SS computed around the sample mean will be an underestimate of the SS around the population mean.
Therefore, statisticians alter the formula of the sample standard deviation by subtracting 1 from N.
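The underestimate can be seen in a simulation. This R sketch assumes a normal population with known variance σ² = 4 and samples of size N = 5; the divide-by-N estimate averages about σ²(N - 1)/N = 3.2, while the divide-by-(N - 1) estimate averages about 4:

```r
set.seed(1)
sigma2 <- 4    # known population variance (assumed)
N      <- 5    # small sample size (assumed)

est <- replicate(20000, {
  x  <- rnorm(N, mean = 0, sd = sqrt(sigma2))
  SS <- sum((x - mean(x))^2)   # computed around the sample mean
  c(div_N = SS / N, div_N_minus_1 = SS / (N - 1))
})

m <- rowMeans(est)   # average of each estimator over all samples
m
```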
Formulas for s-hat (estimated)
Definitional formula:
$$\hat{s} = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$$
Raw-score formula:
$$\hat{s} = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}}$$
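Applied to the earlier worked example (X = 21, 25, 24, 30, 34), both N - 1 formulas give the same value as R's built-in `sd()`, which also uses the N - 1 divisor, as does `var()`:

```r
X <- c(21, 25, 24, 30, 34)
N <- length(X)

s_def <- sqrt(sum((X - mean(X))^2) / (N - 1))        # definitional, N - 1
s_raw <- sqrt((sum(X^2) - sum(X)^2 / N) / (N - 1))   # raw-score, N - 1

# sqrt(106.8 / 4) = 5.17; var() = 106.8 / 4 = 26.7
c(s_def = s_def, s_raw = s_raw, sd = sd(X), var = var(X))
```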
The estimated variance
The standard deviation squared:
$$\hat{s}^2 = \frac{\sum (X - \bar{X})^2}{N - 1} \qquad \sigma_X^2 = \frac{\sum (X - \mu)^2}{N}$$
The variance is not a very useful descriptive statistic, but it is a very important value you will use in other techniques (e.g., the analysis of variance)
For a normal distribution
The sample mean is a good estimate of the population mean
The estimate of the population variance and standard deviation tells us how spread out the scores are
About 68% of the scores are within -1 and +1 $S_X$ of the mean
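A quick R check of the 68% figure, using simulated normal scores (mean 50 and SD 10 are illustrative assumptions) alongside the exact normal probability from `pnorm()`:

```r
set.seed(2)
scores <- rnorm(100000, mean = 50, sd = 10)   # simulated normal scores
m <- mean(scores)
s <- sd(scores)

within <- mean(abs(scores - m) <= s)   # share within one SD of the mean
within                                 # about 0.68
pnorm(1) - pnorm(-1)                   # exact normal value, about 0.6827
```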
Standard error
The standard error of the mean for a sample of size n is the sample's standard deviation divided by √n. It therefore estimates the standard deviation of the sample mean, that is, how much the sample mean typically varies around the population mean from sample to sample.
$$SE_{\bar{x}} = \frac{s}{\sqrt{n}}$$
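A minimal R sketch: the standard error for the earlier worked example, plus a simulation (assumed population N(0, 1), samples of n = 25) showing that the spread of many sample means is close to σ/√n = 0.2:

```r
X  <- c(21, 25, 24, 30, 34)
se <- sd(X) / sqrt(length(X))   # s / sqrt(n)
se                              # about 2.31

# Spread of sample means from repeated samples of size 25
set.seed(3)
means <- replicate(10000, mean(rnorm(25)))
sd(means)                       # about 1 / sqrt(25) = 0.2
```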
Coefficient of variation
In probability theory and statistics, the coefficient of variation (CV) is a normalized measure of dispersion of a probability distribution. It is defined as the ratio of the standard deviation σ to the mean μ, usually expressed as a percentage:
$$CV = \frac{\sigma}{\mu} \times 100$$
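As a sketch, the CV in R using the sample standard deviation; `cv()` is a hypothetical helper. Because the CV has no units, rescaling the data (here by an arbitrary factor of 2.54) leaves it unchanged:

```r
# Hypothetical helper: CV in percent, using the sample SD
cv <- function(x) 100 * sd(x) / mean(x)

X <- c(21, 25, 24, 30, 34)
cv(X)          # about 19.3 (%)
cv(X * 2.54)   # same value after a change of units
```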
Skewness
Symmetrical distribution
Symmetric
Left tail is the mirror image of the right tail
Examples: heights and weights of people
[Figure: symmetric relative frequency histogram; y-axis: Relative Frequency, 0 to .35]
Skewness
Asymmetrical distribution
Moderately Skewed Left
A longer tail to the left
Example: exam scores
[Figure: left-skewed relative frequency histogram; y-axis: Relative Frequency, 0 to .35]
Skewness
Asymmetrical distribution
Examples: income, populations of countries
[Figure: frequency distribution; x-axis: Value, y-axis: Frequency]
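Skewness
A measure of skewness based on the 3rd moment about the mean:
$$\text{skewness} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^3}{(N-1)\,s^3} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{(n-1)\left[\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}\right]^{3/2}}$$
Simpler measures based on the mean, mode, and median:
$$\text{skewness} = \frac{\text{mean} - \text{mode}}{s} \qquad \text{skewness} = \frac{3(\text{mean} - \text{median})}{s}$$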
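Both kinds of measure can be computed in R. The helpers `skew3()` and `median_skew()` are hypothetical names; the exponential sample is an illustrative right-skewed data set, and the symmetric set has skewness exactly 0:

```r
# Third-moment skewness: sum((x - mean)^3) / ((N - 1) * s^3), s = sd() with N - 1
skew3 <- function(x) {
  N <- length(x)
  sum((x - mean(x))^3) / ((N - 1) * sd(x)^3)
}

# Mean-median measure: 3 * (mean - median) / s
median_skew <- function(x) 3 * (mean(x) - median(x)) / sd(x)

set.seed(4)
right_skewed <- rexp(5000)          # long right tail
symmetric    <- c(-2, -1, 0, 1, 2)

c(third_moment = skew3(right_skewed), mean_median = median_skew(right_skewed))  # both positive
skew3(symmetric)                                                                 # 0
```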
Skewness
[Figure: frequency distribution; x-axis: Value, y-axis: Frequency]
Skewed Right - Positive Skewness
[Figure: histogram of Number of Music CDs of Spring 1998 Stat 250 Students; x-axis: Number of Music CDs (0 to 400), y-axis: Frequency]
Kurtosis
Measures of Kurtosis
Kurtosis is a measure of the flatness or peakedness of a distribution
Normal Kurtosis - Mesokurtic
Flat Kurtosis - Platykurtic
Peaked Kurtosis - Leptokurtic
A measure of kurtosis based on the 4th moment about the mean
Kurtosis
$$\text{kurtosis} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^4}{(N-1)\,s^4}$$
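A sketch of the fourth-moment measure in R (`kurt4()` is a hypothetical helper), applied to large simulated samples: a normal sample gives about 3 (mesokurtic), a uniform sample about 1.8 (platykurtic), and a t distribution with 10 degrees of freedom, whose theoretical value is 4, comes out above 3 (leptokurtic):

```r
# Fourth-moment kurtosis: sum((x - mean)^4) / ((N - 1) * s^4), s = sd() with N - 1
kurt4 <- function(x) {
  N <- length(x)
  sum((x - mean(x))^4) / ((N - 1) * sd(x)^4)
}

set.seed(5)
k_norm <- kurt4(rnorm(100000))           # about 3: mesokurtic
k_unif <- kurt4(runif(100000))           # about 1.8: flat, platykurtic
k_t    <- kurt4(rt(100000, df = 10))     # above 3: peaked, leptokurtic
c(normal = k_norm, uniform = k_unif, t10 = k_t)
```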