Lecture 1
1A1E
STA1EB1
What is Statistics?
We are constantly exposed to collections of facts,
or data, both in our professional capacities and in
everyday activities.
• Employment statistics
• Income and expenditure
• Accident statistics
• Population statistics
• Birth and death
• Exports and imports, etc.
Data is the Latin word for "those that are given" and can be thought of as the results, outcomes or actual values of observations.
Descriptive vs Inferential
Statistics has two branches:
• Descriptive statistics: collection, organization and presentation of data
• Inferential statistics: drawing conclusions
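Probability and Statistical Inference
• Probability: reasoning from a known population to the samples it can produce
• Statistical inference: reasoning from an observed sample back to the population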
Collecting Data
Statistics deals not only with the
organization and analysis of data
once it has been collected but also
with the development of techniques
for collecting the data.
But
2008 Presidential election
Sample size: 2847
52% Obama; 42% McCain
Results: 53% Obama; 46% McCain
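A small simulation hints at how a sample of only 2847 voters can land close to the final result. It is a sketch under stated assumptions, not a model of the actual poll: a hypothetical population of 1 000 000 voters in which exactly 53% support Obama, other candidates ignored, and a simple random sample drawn with Python's `random.sample`. The function name `sample_proportion`, the population size and the seed are illustrative choices.

```python
import random


def sample_proportion(pop_size: int, p_true: float, n: int, seed: int = 1) -> float:
    """Draw a simple random sample of n voters from a hypothetical population
    in which a fraction p_true supports one candidate; return the sample share."""
    rng = random.Random(seed)
    supporters = int(pop_size * p_true)  # voters numbered below this support the candidate
    sample = rng.sample(range(pop_size), n)  # n distinct voters, each equally likely
    return sum(1 for voter in sample if voter < supporters) / n


# Hypothetical population: 1 000 000 voters, 53% supporting Obama (the reported result)
share = sample_proportion(1_000_000, 0.53, 2847)
print(f"Sample share supporting Obama: {share:.1%}")
```

Re-running with different seeds gives sample shares that typically stay within about two percentage points of 53%; describing that variability is exactly what probability is used for.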
Probability Sampling Methods
Selection methods where the sample members are selected from the target population on a purely random (chance) basis:
• Simple random sampling
• Systematic random sampling
• Stratified random sampling
• Cluster random sampling
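Simple Random Sampling
• Every possible sample of size n from the population has the same chance of being selected.
• The sampling plan or experimental design determines the amount of information you can extract, and often allows you to measure the reliability of your inference.
[Figure: Population]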
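Simple random sampling is what Python's `random.sample` does: it returns n distinct units, and every subset of size n is equally likely. The sketch below wraps it in a hypothetical helper `simple_random_sample`; the sampling frame of student numbers 1 to 100 is an assumption for illustration.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def simple_random_sample(population: Sequence[T], n: int,
                         seed: Optional[int] = None) -> List[T]:
    """Select n distinct units so that every subset of size n is equally likely."""
    if not 0 <= n <= len(population):
        raise ValueError("sample size must be between 0 and the population size")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(population, n)


# Hypothetical sampling frame: student numbers 1 to 100
frame = list(range(1, 101))
print(simple_random_sample(frame, 10, seed=42))
```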
Cluster Random Sampling
• Used for a large and geographically dispersed population with natural clusters that are similar in profile to each other
• Take a simple random sample of clusters, then take random samples of the sampling units within each selected cluster
[Figure: Population]
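The two stages described above can be sketched in Python, assuming each cluster is stored as a list of its sampling units. The helper name `cluster_sample` and the frame of 10 towns with 20 households each are hypothetical; stage 2 here takes a simple random sample inside each chosen cluster, as the slide describes.

```python
import random
from typing import Dict, List, Optional


def cluster_sample(clusters: Dict[str, List[str]], n_clusters: int,
                   n_per_cluster: int, seed: Optional[int] = None) -> Dict[str, List[str]]:
    """Two-stage cluster sample: an SRS of clusters, then an SRS of units in each."""
    rng = random.Random(seed)
    # Stage 1: simple random sample of clusters (keys sorted so a seed is reproducible)
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: simple random sample of sampling units within each chosen cluster
    return {name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
            for name in chosen}


# Hypothetical frame: 10 towns, each with 20 households
towns = {f"town{i}": [f"town{i}-hh{j}" for j in range(20)] for i in range(10)}
print(cluster_sample(towns, n_clusters=3, n_per_cluster=5, seed=7))
```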
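Systematic Random Sampling
• Uses a sampling frame (an address list or database)
• The first sampling unit is selected randomly from the sampling frame
• Subsequent sampling units are selected at uniform intervals
Example: population size $N = 15\,000$, sample size $n = 500$
Sampling interval: $k = \frac{N}{n} = \frac{15\,000}{500} = 30$
If the randomly selected first unit is number 2, the sample consists of units 2, 32, 62, …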
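The systematic procedure for the example ($N = 15\,000$, $n = 500$) can be sketched in Python. The helper `systematic_sample` is hypothetical; it draws the random start uniformly from 1 to k, and when N is not a multiple of n it rounds k down and keeps the first n units, which is only one of several possible conventions.

```python
import random
from typing import List, Optional


def systematic_sample(N: int, n: int, start: Optional[int] = None,
                      seed: Optional[int] = None) -> List[int]:
    """Systematic random sample of unit numbers from a frame numbered 1..N."""
    k = N // n  # sampling interval; 15000 // 500 = 30 in the example
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start among the first k units
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and the sampling interval k")
    # Every k-th unit from the start, truncated to n units
    return list(range(start, N + 1, k))[:n]


sample = systematic_sample(15000, 500, start=2)
print(sample[:3], len(sample))  # [2, 32, 62] 500
```

Passing `start=2` reproduces the example; omitting it lets the function choose the random start.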
Types of Data
In determining the most appropriate ways
to summarize or analyse data, it is useful to
classify variables as either categorical or
quantitative.
Quantitative data
Some quantitative data is obtained by measuring, rather than counting, the value of a variable:
• Weight of an individual
• Reaction time to a particular stimulus
• Time taken to complete a test
Histograms for "Counting" variables
Example
A bag of M&Ms contains 25 candies:
Raw Data: m m m m m m m m m m
m m m m m m m m m m
m m m m m
Frequency distribution:
Example
The ages of 50 lecturers at UJ:
34 48 70 63 52 52 35 50 37 43 53 43 52 44
42 31 36 48 43 26 58 62 49 34 48 53 39 45
34 59 34 66 40 59 36 41 35 36 62 34 38 28
43 50 30 43 32 44 58 53
Frequency Distribution:
Step 1: Determine the range
$\text{Range} = \text{maximum} - \text{minimum} = 70 - 26 = 44$
The first class starts at 20.
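Step 1 can be checked directly against the raw data. A minimal Python sketch, with the 50 ages copied from the example:

```python
# Ages of the 50 lecturers, copied from the example
ages = [34, 48, 70, 63, 52, 52, 35, 50, 37, 43, 53, 43, 52, 44,
        42, 31, 36, 48, 43, 26, 58, 62, 49, 34, 48, 53, 39, 45,
        34, 59, 34, 66, 40, 59, 36, 41, 35, 36, 62, 34, 38, 28,
        43, 50, 30, 43, 32, 44, 58, 53]

# Step 1: range = maximum - minimum
data_range = max(ages) - min(ages)
print(max(ages), min(ages), data_range)  # 70 26 44
```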
Step 5: Determine the class limits
$\text{Upper limit of 1st class} = \text{lower limit} + \text{class width} = 20 + 10 = 30$
Age         Tally                  Frequency   Relative frequency   Percent
[20 ; 30)   ||                     2           2/50 = 0.04          4%
[30 ; 40)   ||||| ||||| ||||| |    16          16/50 = 0.32         32%
[40 ; 50)   ||||| ||||| |||||      15          15/50 = 0.30         30%
[50 ; 60)   ||||| ||||| ||         12          12/50 = 0.24         24%
[60 ; 70)   ||||                   4           4/50 = 0.08          8%
[70 ; 80)   |                      1           1/50 = 0.02          2%
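The table above can be reproduced with a short Python sketch that uses left-inclusive classes $[a ; a + w)$, so a value equal to a class's lower limit belongs to that class. The helper `frequency_table` is hypothetical and assumes `start` is no larger than the smallest observation.

```python
from typing import List, Tuple

# Ages of the 50 lecturers, copied from the example
AGES = [34, 48, 70, 63, 52, 52, 35, 50, 37, 43, 53, 43, 52, 44,
        42, 31, 36, 48, 43, 26, 58, 62, 49, 34, 48, 53, 39, 45,
        34, 59, 34, 66, 40, 59, 36, 41, 35, 36, 62, 34, 38, 28,
        43, 50, 30, 43, 32, 44, 58, 53]


def frequency_table(data: List[int], start: int, width: int) -> List[Tuple[str, int, float]]:
    """Classes [lower ; lower + width) from `start` (assumed <= min(data)) until max(data) is covered."""
    n = len(data)
    rows = []
    lower = start
    while lower <= max(data):
        upper = lower + width
        # Left inclusion: the lower limit is counted, the upper limit is not
        freq = sum(1 for x in data if lower <= x < upper)
        rows.append((f"[{lower} ; {upper})", freq, freq / n))
        lower = upper
    return rows


for label, freq, rel in frequency_table(AGES, start=20, width=10):
    print(f"{label:<10} {freq:>3}   {rel:.2f}   {rel:.0%}")
```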
Histogram
[Figure: histogram of the lecturers' ages, classes formed by the method of left inclusion; x-axis: Bin; y-axis: Frequency]
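Without plotting software, a rough text version of this histogram can be printed from the frequency table, with one bar per class whose length equals the class frequency. This Python sketch uses a hypothetical helper `text_histogram` and draws horizontal bars, unlike the vertical bars of the figure.

```python
from typing import List


def text_histogram(labels: List[str], freqs: List[int], symbol: str = "*") -> str:
    """One row per class; the bar length equals the class frequency."""
    return "\n".join(f"{label:<10} {symbol * f} ({f})" for label, f in zip(labels, freqs))


# Classes and frequencies from the lecturers' ages table
labels = ["[20 ; 30)", "[30 ; 40)", "[40 ; 50)", "[50 ; 60)", "[60 ; 70)", "[70 ; 80)"]
freqs = [2, 16, 15, 12, 4, 1]
print(text_histogram(labels, freqs))
```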
Histogram Shapes
Histograms come in a variety of shapes:
• Symmetric
• Skewed to the left
• Skewed to the right
• Bimodal
Describing the Distribution
[Figure: histogram of the lecturers' ages; x-axis: Bin; y-axis: Frequency]
Outliers? No