Lecture 1

Uploaded by Clarence
Copyright © All Rights Reserved

Statistics

1A1E
STA1EB1

Lecture 1
What is Statistics?
We are constantly exposed to collections of facts, or data, both in our professional capacities and in everyday activities:

• Employment statistics
• Income and expenditure
• Accident statistics
• Population statistics
• Birth and death
• Exports and imports, etc.

The discipline of statistics provides methods for organizing and summarizing data and for drawing conclusions based on the information contained in the data.
Definitions
• A population is the set of all measurements of interest.
• A sample is a subset of a population, obtained through some process (sampling), for the purpose of investigating the properties of the underlying population.

[Figure: a population, described by a parameter, and a sample drawn from it, described by a statistic]
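To make the parameter/statistic distinction concrete, here is a minimal Python sketch with made-up data (the population of 1,000 ages is purely hypothetical). The population mean is a parameter. The mean of a random sample, often written x̄, is a statistic that estimates it.

```python
import random
from statistics import mean

# Hypothetical population: ages of 1,000 people (made-up values for illustration)
rng = random.Random(42)
population = [rng.randint(18, 80) for _ in range(1000)]

mu = mean(population)                 # parameter: describes the whole population
sample = rng.sample(population, 50)   # a random sample of n = 50 units
x_bar = mean(sample)                  # statistic: computed from the sample only

print(f"population mean (parameter) = {mu:.2f}")
print(f"sample mean (statistic)      = {x_bar:.2f}")
```

In practice the population is usually too large to measure, so only the statistic is available. Inference uses it to say something about the unknown parameter.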
Definitions
• A variable is any attribute or characteristic (such as age, weight or length) of an object being investigated that is capable of varying from one object to another in the population.
• The object or individual on which a variable is measured is an experimental unit (case).
• Data is the Latin word for "those that are given", and can be thought of as the results, outcomes or actual values of observations.
Descriptive vs Inferential
STATISTICS
• Descriptive: collection, organizing, presentation
• Inferential: drawing conclusions
Probability and Statistical Inference
[Figure: probability reasons from the population to the sample; statistical inference reasons from the sample back to the population]
Collecting Data
Statistics deals not only with the organization and analysis of data once it has been collected, but also with the development of techniques for collecting the data.

If data is not properly collected, an investigator may not be able to answer the questions under consideration with a reasonable degree of confidence.
Sampling
Example: the 2008 Presidential election
• Population: America, size: 300 million
• Sample size: 2847; sample: 52% Obama, 42% McCain
• Results: 53% Obama, 46% McCain
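A quick simulation shows why a sample of only 2847 people can track a population of 300 million. This is a simplified sketch that makes two assumptions. First, the true population share is taken to be 0.53, the election result on this slide. Second, the population is so large that each respondent can be treated as an independent random draw. The helper `poll` is hypothetical.

```python
import random

def poll(p, n, seed=None):
    """Simulate asking n randomly chosen voters; each supports the candidate with probability p."""
    rng = random.Random(seed)
    supporters = sum(rng.random() < p for _ in range(n))
    return supporters / n

p_assumed = 0.53     # assumption: population share taken from the election result
n = 2847             # sample size from the slide
estimates = [poll(p_assumed, n, seed=s) for s in range(5)]
print([round(e, 3) for e in estimates])
```

Each simulated poll lands close to 0.53 but not exactly on it. This sample-to-sample variation is what statistical inference has to account for.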
Probability Sampling Methods
Selection methods in which the sample members are selected from the target population on a purely random (chance) basis:
• Simple random sampling
• Systematic random sampling
• Stratified random sampling
• Cluster random sampling
Simple Random Sampling
• The sampling plan or experimental design determines the amount of information you can extract, and often allows you to measure the reliability of your inference.
• Simple random sampling is a method of sampling that gives each possible sample of size n an equal probability of being selected.
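The "equal probability for every possible sample" property can be checked empirically. This Python sketch assumes a hypothetical population of only N = 6 units and a sample size of n = 2, which is small enough to list all 15 possible samples. It uses the standard library's `random.sample`, which draws without replacement.

```python
import itertools
import math
import random
from collections import Counter

N, n = 6, 2                        # hypothetical small population and sample size
population = list(range(1, N + 1))
rng = random.Random(2024)
trials = 30_000

# Draw many simple random samples; record each as an unordered set of units
counts = Counter(frozenset(rng.sample(population, n)) for _ in range(trials))

possible = math.comb(N, n)         # number of possible samples of size n
print(f"{possible} possible samples, each with probability {1 / possible:.4f}")
for subset in itertools.combinations(population, n):
    print(sorted(subset), round(counts[frozenset(subset)] / trials, 4))
```

Every observed relative frequency should be close to 1/15 ≈ 0.0667.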
Example
• There are 29 students in a statistics class. The lecturer wants to choose 5 students to form a group. How should she proceed?
1. Give each student a number from 01 to 29.
2. Choose 5 pairs of random digits from the random number table.
3. If a number larger than 29 is chosen, simply select another number (likewise skip 00 and any number that has already been chosen).
4. The five students with those numbers form the group.
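The random-number-table procedure can be mimicked in Python. In this sketch a seeded generator stands in for the printed table, which is an assumption, and the function name `choose_group` is hypothetical. Each draw plays the role of one pair of random digits (00 to 99). Pairs that are out of range or already used are skipped.

```python
import random

def choose_group(class_size=29, group_size=5, seed=None):
    """Mimic reading two-digit pairs from a random number table."""
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < group_size:
        pair = rng.randint(0, 99)            # one pair of random digits, 00..99
        if 1 <= pair <= class_size and pair not in chosen:
            chosen.append(pair)              # keep a valid, unused student number
        # otherwise (00, a number > 29, or a repeat): read the next pair
    return chosen

group = choose_group(seed=7)
print([f"{k:02d}" for k in group])
```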
Stratified Random Sampling
• The population is heterogeneous
• Divide the population into homogeneous strata
• Take a simple random sample from each stratum, in proportion to the relative size of each stratum
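Proportional allocation can be sketched in a few lines of Python. The strata, which are 1,000 students split by year of study, and the function name `stratified_sample` are hypothetical. Each stratum contributes about n × (stratum size / population size) units, chosen by simple random sampling within that stratum.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Proportional allocation: stratum h contributes about n * N_h / N units."""
    rng = random.Random(seed)
    N = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / N)        # rounding can make the total differ slightly from n
        sample[name] = rng.sample(units, n_h)  # simple random sample within the stratum
    return sample

# Hypothetical population of 1,000 students split by year of study
strata = {
    "first year": [f"F{i:03d}" for i in range(600)],
    "second year": [f"S{i:03d}" for i in range(300)],
    "third year": [f"T{i:03d}" for i in range(100)],
}
result = stratified_sample(strata, 50, seed=3)
print({name: len(units) for name, units in result.items()})
# {'first year': 30, 'second year': 15, 'third year': 5}
```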
Cluster Random Sampling
• A large and geographically dispersed population with natural clusters that are similar in profile to each other
• Take a simple random sample of clusters, and then random samples of the sampling units within each cluster
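The two stages described above (a random sample of clusters, then random samples within each chosen cluster) can be sketched in Python. The population of 20 towns with 50 households each is hypothetical, and so is the function name `two_stage_cluster_sample`.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    rng = random.Random(seed)
    # Stage 1: simple random sample of clusters
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: simple random sample of sampling units inside each chosen cluster
    return {c: rng.sample(clusters[c], n_per_cluster) for c in chosen}

# Hypothetical population: 20 towns (clusters), each with 50 households
clusters = {
    f"town{t:02d}": [f"town{t:02d}-hh{h:02d}" for h in range(50)]
    for t in range(20)
}
result = two_stage_cluster_sample(clusters, n_clusters=4, n_per_cluster=10, seed=5)
for town, households in result.items():
    print(town, households[:3], "...")
```

Only the chosen towns have to be visited, which is why this design suits large, geographically dispersed populations.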
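Systematic Random Sampling
• Uses a sampling frame (an address list or database)
• The first sampling unit is selected randomly from the sampling frame
• Subsequent sampling units are selected at uniform intervals

Example: population N = 15 000, sample size n = 500, so the sampling interval is $k = \frac{N}{n} = \frac{15\,000}{500} = 30$. If the randomly selected first unit is number 2, the sample consists of units 2, 32, 62, …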
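A minimal Python sketch of systematic sampling follows. It assumes the frame is numbered 1 to N and that the random start is drawn from the first k units. The function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Select every k-th unit of a frame numbered 1..N, after a random start."""
    k = N // n                                       # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, k)    # random start among the first k units
    return list(range(start, N + 1, k))[:n]

sample = systematic_sample(15_000, 500, start=2)     # the slide's example: k = 30
print(sample[:3], "...", sample[-1])                 # [2, 32, 62] ... 14972
```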
Types of Data
In determining the most appropriate ways to summarize or analyse data, it is useful to classify variables as either categorical or quantitative:

• A categorical variable divides the cases into groups, placing each case into exactly one of two or more categories.
• A quantitative variable measures or records a numerical quantity for each case.
Graphs for categorical variables
[Figure: bar chart and pie chart]
Quantitative data
Some quantitative data is obtained by counting to determine the value of a variable:

• The number of traffic offences a person received during the last year
• The number of vehicles arriving at a tollgate during a particular period
• The number of students in this class

whereas other data is obtained by taking measurements:

• Weight of an individual
• Reaction time to a particular stimulus
• Time taken to complete a test
Histograms for “Counting”
variables

Example
• A bag of M&Ms contains 25 candies:
• Raw data: m m m m m m m m m m m m m m m m m m m m m m m m m
• Frequency distribution:

Color    Tally    Frequency  Relative frequency  Percent
Red      mmm      3          3/25 = 0.12         12%
Blue     mmmmmm   6          6/25 = 0.24         24%
Green    mmmm     4          4/25 = 0.16         16%
Orange   mmmmm    5          5/25 = 0.20         20%
Brown    mmm      3          3/25 = 0.12         12%
Yellow   mmmm     4          4/25 = 0.16         16%
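The frequency, relative frequency and percent columns can be computed directly from raw categorical data. The slide's original colour sequence is not recoverable, so this sketch rebuilds a hypothetical raw list from the table's frequencies. The order of the candies does not affect the counts.

```python
from collections import Counter

# Hypothetical raw data rebuilt from the table's frequencies (original order unknown)
candies = (["Red"] * 3 + ["Blue"] * 6 + ["Green"] * 4 +
           ["Orange"] * 5 + ["Brown"] * 3 + ["Yellow"] * 4)

counts = Counter(candies)            # frequency of each colour
n = len(candies)                     # 25 candies
table = {color: (f, f / n, 100 * f / n) for color, f in counts.items()}

for color, (f, rel, pct) in table.items():
    print(f"{color:<7}{f:>3}  {f}/{n} = {rel:.2f}  {pct:.0f}%")
```

The relative frequencies always sum to 1, and the percents to 100%.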
Histograms for “Measurement” variables
Constructing a histogram for measurement data entails subdividing the measurement axis into a suitable number of class intervals, or classes, such that each observation is contained in exactly one class.

• Divide the range of the data into 5-20 subintervals of equal length
• Calculate the approximate width of the subinterval as range/number of subintervals
• Round the approximate width UP to a convenient value
• Use the method of left inclusion: include the left endpoint, but not the right, in your tally
• Create a statistical table including the subintervals, their frequencies and relative frequencies
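The width and class-limit rules above can be written compactly. In this sketch the symbols k (number of classes), w (class width) and L (lower limit of the first class) are introduced only for illustration. The conditions L ≤ min and L + kw > max ensure that every observation falls in exactly one class.

```latex
w \ge \frac{\max - \min}{k} \quad \text{(rounded up to a convenient value)}

\text{Class } i = [\,L + (i-1)w \,;\, L + iw\,), \qquad i = 1, 2, \dots, k

x \text{ is tallied in class } i = \left\lfloor \frac{x - L}{w} \right\rfloor + 1 \quad \text{(left inclusion)}
```

For example, with L = 20 and w = 10, the observation x = 30 gives ⌊10/10⌋ + 1 = 2. It is therefore tallied in the second class [30 ; 40), not the first.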
Example
The ages of 50 lecturers at UJ:
34 48 70 63 52 52 35 50 37 43 53 43 52 44
42 31 36 48 43 26 58 62 49 34 48 53 39 45
34 59 34 66 40 59 36 41 35 36 62 34 38 28
43 50 30 43 32 44 58 53

Frequency Distribution:
Step 1: Determine the range
$\text{Range} = \max - \min$
The range = 70 − 26 = 44

Step 2: Determine the number of classes
Between 5 and 12, depending on the sample size
We choose to use 6 intervals

Example
Step 3: Determine the class width
$\text{Class width} \approx \dfrac{\text{Range}}{\text{Number of classes}}$
Minimum class width = 44/6 = 7.33
Convenient class width = 10

Step 4: Determine the lower limit of the 1st class
$\text{Lower limit of 1st class} \le \min$
Start at 20
Example
Step 5: Determine the class limits
$\text{Upper limit of 1st class} = \text{lower limit} + \text{class width}$
$\text{Lower limit of 2nd class} = \text{upper limit of 1st class}$
$\text{Upper limit of 2nd class} = \text{lower limit} + \text{class width}$
[20 ; 30), [30 ; 40), etc. (use the method of left inclusion)

Step 6: Determine the frequency
A count of the number of values in each class
Age        Tally                 Frequency  Relative frequency  Percent
[20 ; 30)  ||                    2          2/50 = .04          4%
[30 ; 40)  ||||/ ||||/ ||||/ |   16         16/50 = .32         32%
[40 ; 50)  ||||/ ||||/ ||||/     15         15/50 = .30         30%
[50 ; 60)  ||||/ ||||/ ||        12         12/50 = .24         24%
[60 ; 70)  ||||                  4          4/50 = .08          8%
[70 ; 80)  |                     1          1/50 = .02          2%
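Steps 1 to 6 can be reproduced in Python for the 50 ages listed above. The choices of 6 classes, width 10 and start 20 are taken from the slides. In this sketch, integer division maps each age to its class, which implements left inclusion: an age of exactly 30 goes into [30 ; 40).

```python
ages = [34, 48, 70, 63, 52, 52, 35, 50, 37, 43, 53, 43, 52, 44,
        42, 31, 36, 48, 43, 26, 58, 62, 49, 34, 48, 53, 39, 45,
        34, 59, 34, 66, 40, 59, 36, 41, 35, 36, 62, 34, 38, 28,
        43, 50, 30, 43, 32, 44, 58, 53]

data_range = max(ages) - min(ages)     # Step 1: 70 - 26 = 44
k = 6                                  # Step 2: chosen number of classes
min_width = data_range / k             # Step 3: 44/6 = 7.33...
width = 10                             #   rounded up to a convenient value
lower = 20                             # Step 4: lower limit of 1st class <= min
assert width >= min_width and lower <= min(ages) and lower + k * width > max(ages)

# Steps 5-6: classes [lower + i*width ; lower + (i+1)*width), left inclusion
freq = [0] * k
for age in ages:
    freq[(age - lower) // width] += 1  # e.g. 30 -> index 1, i.e. [30 ; 40)

n = len(ages)
for i, f in enumerate(freq):
    a, b = lower + i * width, lower + (i + 1) * width
    print(f"[{a} ; {b})  {f:>2}  {f}/{n} = {f / n:.2f}  {100 * f / n:.0f}%")
```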
Histogram
[Figure: histogram of the lecturers' ages, Frequency against Bin (class limits 20 to 80), classes formed by the method of left inclusion]
Histogram Shapes
Histograms come in a variety of shapes:
• Bimodal
• Symmetric
• Skewed to the left
• Skewed to the right
Describing the Distribution
[Figure: histogram of the lecturers' ages, Frequency against Bin (20 to 80)]
• Shape? Skewed right
• Outliers? No
• What proportion of the lecturers are younger than 40?
  (2 + 16)/50 = 18/50 = 0.36
• What is the probability that a randomly selected lecturer is 50 or older?
  (12 + 4 + 1)/50 = 17/50 = 0.34
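Both answers can be checked from the class frequencies in the frequency table. This minimal sketch works only because 40 and 50 are class limits: with left inclusion, "younger than 40" is exactly the classes below 40, and "50 or older" is exactly the classes starting at 50 or above. Python's `fractions.Fraction` keeps the answers exact.

```python
from fractions import Fraction

# Class frequencies from the frequency table, keyed by lower class limit
freq = {20: 2, 30: 16, 40: 15, 50: 12, 60: 4, 70: 1}
n = sum(freq.values())                      # 50 lecturers

younger_than_40 = Fraction(sum(f for lower, f in freq.items() if lower < 40), n)
fifty_or_older = Fraction(sum(f for lower, f in freq.items() if lower >= 50), n)

print(younger_than_40, float(younger_than_40))   # 9/25 0.36
print(fifty_or_older, float(fifty_or_older))     # 17/50 0.34
```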
Reference: Chapter 1, Sections 1.1 & 1.2
