Statistics Grade 12
STATISTICS
STATISTICS GRADE 10
Learners must be able to:
STATISTICS
A set of mathematical tools used to describe and make judgements about data. Statistics
is very closely related to Probability. In Probability, we know how a process works and
we want to predict what the outcomes of that process will be. In Statistics, we do not
know how the process works, but we can observe its outcomes, i.e. we
want to use information about the outcomes to learn about the nature of the
process. Statistics can be either descriptive or inferential.
Descriptive Statistics
Descriptive statistics is where data from a sample are organised and summarised in tables and
displayed graphically. Data is a set of facts. There are two types of data: qualitative data
and quantitative data. Quantitative data is further classified as discrete (counting) or
continuous (measurement).
Location: What is the centre or average value? Here we use measures of central
tendency
Spread: What is the spread of the values? Do they fall closely around the centre or
far apart? (Five number summary)
Shape: What is the shape of the data, Bell shaped or Skewed? Symmetric?
Outliers: Are there any extreme or unusual observations?
Measures of central tendency are numbers that describe what is average or typical
of the data. They tell us where the middle of a distribution lies.
Mean: the value that is equal to the sum of a list of numbers divided by the
number of numbers (the same as the average). It is the most common measure of
central tendency. Mean = Sum of the data / Number of data values
Median: the value such that half of the numbers in a list are above it and half are
below it. To find the median, we must first put the list of numbers in ascending or
descending order. The median is the middle score (for an odd number of items) or the
average of the two middle scores (for an even number of items) in a distribution. The position
of the median is given by the formula:
Position of median = (n + 1) / 2, where n is the number of items in the list.
Mode: the value that occurs most frequently in a list of numbers. It does not
involve any calculations or ordering of data.
EXAMPLE 1: Suppose you have observed the following values of daily sales of kotas at
your school: 40, 56, 38, 38, 63, 59, 52, 49, 46.
Mean = Σx / N
Mean = (40 + 56 + 38 + 38 + 63 + 59 + 52 + 49 + 46) / 9 = 441 / 9 = 49
Arranged in ascending order, the values are 38, 38, 40, 46, 49, 52, 56, 59, 63. The median is 49:
four numbers are less than 49 and four numbers are greater than 49. It is easy to find the
median if there is an odd number of items in the list. If there is an even number of items in
the list, the median is the average of the two middle numbers. Suppose we are given the
following set of data: 41, 44, 46, 47, 48, 49, 52, 53. The two numbers closest to the middle
are 47 and 48, so the median is (47 + 48) / 2 = 47.5.
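The calculations in Example 1 can be checked with a short Python sketch. It uses Python's standard statistics module, which is not part of these notes; the variable names are chosen only for illustration.

```python
from statistics import mean, median, mode

# Daily kota sales from Example 1
sales = [40, 56, 38, 38, 63, 59, 52, 49, 46]

print(mean(sales))    # 49: sum of the values divided by the number of values
print(median(sales))  # 49: middle value once the list is sorted
print(mode(sales))    # 38: the value that occurs most often

# Even number of items: the median is the average of the two middle values
even_data = [41, 44, 46, 47, 48, 49, 52, 53]
print(median(even_data))  # 47.5
```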
Find the mean, mode and median for each set of data
(a) 10 5 8 15 7 9 3 30 3
(b) 36 35 37 29 39 36 340 35 36
(c) 76 84 88 90 78 80 84 88 92 80 86 84
(d) 98 97 101 104 105 102 103 100 101 99
The best average in a situation is the one that best reflects what is most typical.
Different measures of central tendency are used to describe different situations.
MEAN
It is useful when the data contain no extremely large or small values that would distort
the result, e.g. the time it takes each learner to get to school.
MODE
The mode is the preferred average when the data is not numerical. It is used to determine
the most frequently chosen category, e.g. the favourite kwaito group.
MEDIAN
It is used when extreme values distort the mean, e.g. the cost of houses in your
community.
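To see why the median is preferred when extreme values are present, here is a small Python sketch. The house prices are hypothetical values made up for illustration; they do not come from these notes.

```python
from statistics import mean, median

# Hypothetical house prices in thousands of rand (illustrative values only)
prices = [450, 480, 500, 520, 550, 4000]

print(round(mean(prices), 2))  # 1083.33: pulled upwards by the one expensive house
print(median(prices))          # 510.0: still close to a typical price
```

Five of the six houses cost about R500 000, so the median describes the community better than the mean does.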
Would you use the mean, the median, or the mode for each situation? Explain
With three averages to choose from (mean, median and mode), which should we use?
The following table shows the advantages and disadvantages of these different
averages.
This table shows the monthly salary of people who work at a garden centre.
Question
Answer
Even though the range R0 – R9 999 contains the largest number of people, the next two
ranges have comparable numbers, so the mode is not representative of the data.
Question
Almost everyone earns under R30 000. The mean would be distorted by the fact that
two people earn much more than this.
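The five number summary is a mathematical tool used to quickly summarise and gain insight
into a set of data. It provides a summary of the distribution of the data. The five number
summary consists of the following: the minimum, the lower quartile (Q1), the median (Q2),
the upper quartile (Q3) and the maximum.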
Example 1:
Min = 3
Max = 24
Median = (11 + 13) / 2 = 12
Lower Quartile
3, 7, 10, 10, 11 are on the left of 12; the middle value of these, the first 10, is the lower quartile.
Upper Quartile
13, 15, 19, 20, 24 are on the right of 12; the middle value of these, 19, is the upper quartile.
Find the five number summary for the following set of data
Min = 2
Max = 28
Lower Quartile: 2, 6, 9, 13 are on the left of the median (the first 18), therefore
(6 + 9) / 2 = 7.5 is the lower quartile.
Upper Quartile: 18, 22, 26, 28 are on the right of the median (the first 18), therefore
(22 + 26) / 2 = 24 is the upper quartile.
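The two worked examples can be reproduced with a short Python sketch. The data lists below are an assumption: they are pieced together from the values quoted in the solutions, because the original data lines are not shown. The function name is chosen for illustration, and the quartiles are found as the medians of the values to the left and right of the median, as in the examples.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]           # values to the left of the median
    upper = values[half + n % 2:]   # values to the right of the median
    return (values[0], median(lower), median(values), median(upper), values[-1])

# Data reconstructed from the worked solutions (assumed, see note above)
example_1 = [3, 7, 10, 10, 11, 13, 15, 19, 20, 24]
example_2 = [2, 6, 9, 13, 18, 18, 22, 26, 28]

print(five_number_summary(example_1))  # (3, 10, 12.0, 19, 24)
print(five_number_summary(example_2))  # (2, 7.5, 18, 24.0, 28)
```

Calculators and software packages may use slightly different quartile rules, so their answers can differ from this method for some data sets.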
a) 1, 3, 4, 5, 6, 7, 9
b) 1, 3, 4, 6, 8, 9, 10, 4
c) 84, 89, 89, 64, 78
d) 26, 13, 35, 76, 44, 58
e) 1, 13, 17, 7, 10, 2, 9, 3, 26, 18, 24, 2, 29
Step 1: Label either a vertical axis or a horizontal axis with numbers from the minimum to
the maximum of the data.
Step 2: Draw a box with its lower end at Q1 and its upper end at Q3.
Step 3: Draw a line through the box at the median.
Step 4: Draw a line from the Q1 end of the box to the smallest data value that is not further
than 1.5 × (Q3 – Q1) from Q1. Draw a line from the Q3 end of the box to the largest data
value that is not further than 1.5 × (Q3 – Q1) from Q3.
BOX PLOTS
Box Plots (Box and Whisker Diagrams) are another way of displaying data for
comparison.
To draw a Box Plot we need FIVE pieces of information:
- 1. The Minimum Value
- 2. Lower Quartile (Q1)
- 3. The Median (Q2)
- 4. Upper Quartile (Q3)
- 5. Maximum Value
They are drawn in the following way:
[Figure: box and whisker diagram with the five values above labelled 1 to 5]
BOX =The location of the middle 50% of the data
WHISKERS = How the data is spread overall
1. If the median is near the centre of the box and the two whiskers are
approximately equal in length, then the distribution is symmetric.
2. If the median is left of the centre of the box and/or the right whisker is substantially
longer than the left whisker, the distribution is right skewed.
3. If the median is right of the centre of the box and/or the left whisker is substantially
longer than the right whisker, the distribution is left skewed.
NOTE:
Min = 16 is greater than Q1 – 1.5 (Q3 – Q1) = 15.5
Max = 23 is less than Q3 + 1.5 (Q3 – Q1)
The whiskers therefore extend to the minimum and the maximum values.
121 126 130 132 143 137 141 144 148 205
125 128 131 133 135 139 141 147 153 213
SOLUTIONS
(a)
Q1 = 130.5 Q2 = 138 Q3 = 145.5
Min =121 Max = 213 Range = 92
IQR = 15 UF = 168 LF = 108
(b) Extreme outliers (more than 3 × IQR from Q3)
[Figure: box plot of Weight on an axis from 100 to 260]
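The values in the solution can be checked with the Python sketch below. It assumes that UF and LF stand for the upper and lower fences, Q3 + 1.5 × IQR and Q1 – 1.5 × IQR, which matches the numbers given. The variable names are chosen for illustration.

```python
from statistics import median

# Data from the example above
weights = [121, 126, 130, 132, 143, 137, 141, 144, 148, 205,
           125, 128, 131, 133, 135, 139, 141, 147, 153, 213]

values = sorted(weights)
n = len(values)
q1 = median(values[:n // 2])            # median of the lower half
q3 = median(values[n // 2 + n % 2:])    # median of the upper half
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr            # LF (assumed meaning)
upper_fence = q3 + 1.5 * iqr            # UF (assumed meaning)

# Values outside the fences are outliers
outliers = [v for v in values if v < lower_fence or v > upper_fence]
# Values more than 3 x IQR above Q3 are extreme outliers
extreme = [v for v in values if v > q3 + 3 * iqr]

# Whiskers end at the last data values that are still inside the fences
whisker_low = min(v for v in values if v >= lower_fence)
whisker_high = max(v for v in values if v <= upper_fence)

print(q1, q3, iqr)                # 130.5 145.5 15.0
print(lower_fence, upper_fence)   # 108.0 168.0
print(outliers, extreme)          # [205, 213] [205, 213]
print(whisker_low, whisker_high)  # 121 153
```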
EXAMPLE 2
89 90 87 95 86 81
111 108 83 88 91 79
GRADE 11
1. Represent measures of central tendency and dispersion in univariate numerical data by:
   Using ogives, and
   Calculating the variance and standard deviation of sets of data manually (for small sets
   of data) and using calculators (for larger sets of data) and representing results graphically.
2. Represent skewed data in box and whisker diagrams and frequency polygons. Identify
   outliers.
GRADE 12
1. Represent bivariate numerical data as a scatter plot and suggest intuitively and by simple
   investigation whether a linear, quadratic or exponential function would best fit the data.
2. Use a calculator to calculate the linear regression line which best fits a given set of
   bivariate numerical data.
3. Use a calculator to calculate the correlation coefficient of a set of bivariate numerical
   data and make relevant deductions.
A. Measures of dispersion
These are statistical measures that summarise the amount of spread or variation in the
distribution of values of a variable. They indicate the extent to which the data is spread out
from the centre, i.e. how far apart or how close the data values are to each other and to
the mean.
The dispersion is high if most of the data values are far away from one another.
The dispersion is low (but greater than zero) if the data values are close to one
another.
1. Range
The difference between the highest (maximum) and the lowest (minimum) values in the
distribution.
Example 2:
Weekly income of 6 mine workers
R1 800 R2 000 R2 300 R2 100 R2 500 R1 450
RANGE = Max – Min = 2 500 – 1 450 = 1 050
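The range of the weekly incomes can be checked with a few lines of Python. This is only a sketch and the variable names are chosen for illustration.

```python
# Weekly incomes in rand from Example 2
incomes = [1800, 2000, 2300, 2100, 2500, 1450]

data_range = max(incomes) - min(incomes)
print(data_range)  # 1050, i.e. 2 500 minus 1 450
```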
2. Variance
A measure of the spread. It is defined as the average of the squared deviations
from the mean. It involves measuring the distance between each of the values and
the mean.
Ages in years (x)    (x − x̄)    (x − x̄)²
20
24
26
27
28
TOTAL
3. Standard Deviation
It indicates how a set of values relates to the mean. It is the square root
of the variance.
Use the information from Example 2 to calculate the standard deviation manually
as well as with a calculator.
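The manual method can be sketched in Python using the ages from the variance table (20, 24, 26, 27, 28). The sketch assumes the population formula, dividing by the number of values, because the notes define variance as the average of the squared deviations. The variable names are chosen for illustration, and the same steps apply to any other small data set.

```python
from statistics import mean, pstdev, pvariance

ages = [20, 24, 26, 27, 28]

x_bar = mean(ages)                        # mean age: 25
deviations = [x - x_bar for x in ages]    # the (x - mean) column
squared = [d ** 2 for d in deviations]    # the (x - mean) squared column

variance = sum(squared) / len(ages)       # average of the squared deviations
std_dev = variance ** 0.5                 # square root of the variance

print(deviations)         # [-5, -1, 1, 2, 3]
print(squared)            # [25, 1, 1, 4, 9]
print(variance)           # 8.0
print(round(std_dev, 2))  # 2.83

# Cross-check against the library's population formulas
print(pvariance(ages), round(pstdev(ages), 2))
```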
Constructing an ogive
[Figure: ogive (cumulative frequency curve) with readings taken at 1/4, 1/2 and 3/4 of the total frequency]
LQ = 21
Median = 27
UQ = 38
IQR = 38 – 21 = 17 mins
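The points of an ogive come from the running totals of the class frequencies, plotted against the upper boundary of each class. The Python sketch below illustrates this. The class boundaries and frequencies are hypothetical values made up for illustration, and the helper read_off is also an assumption: it joins the points with straight lines, which only approximates a reading taken from a smooth hand-drawn curve.

```python
from itertools import accumulate

# Hypothetical grouped data (not from these notes)
upper_bounds = [10, 20, 30, 40, 50, 60]   # upper boundary of each class
frequencies = [4, 10, 22, 14, 7, 3]       # number of items in each class

cumulative = list(accumulate(frequencies))   # running totals
# The ogive starts at the lower boundary of the first class with frequency 0
ogive_points = [(0, 0)] + list(zip(upper_bounds, cumulative))

def read_off(points, target):
    """Estimate the x-value where the cumulative frequency reaches target."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            return x0 + (target - c0) * (x1 - x0) / (c1 - c0)
    raise ValueError("target is outside the cumulative frequency range")

total = cumulative[-1]
lq = read_off(ogive_points, total / 4)
med = read_off(ogive_points, total / 2)
uq = read_off(ogive_points, 3 * total / 4)

print(cumulative)  # [4, 14, 36, 50, 57, 60]
print(round(lq, 1), round(med, 1), round(uq, 1), round(uq - lq, 1))
```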
QUESTION 1
The data below shows the number of people visiting a local clinic per day to be
vaccinated against measles.
5 12 19 29
35 23 15 33
37 21 26 18
23 18 13 21
18 22 20
QUESTION 2
The table below shows the amount of time (in hours) that learners aged between 14 and
18 spent watching television during 3 weeks of the holiday.
2.1 Draw an ogive (cumulative frequency curve) to represent the above data
2.2 Write down the modal class of the data
2.3 Use the ogive (cumulative frequency curve) to estimate the number of learners
who watched television more than 80% of the time
QUESTION 3
In the 2008 Beijing Paralympic Games, South Africa was in sixth place in the Overall
Medals Standing, having won 21 gold, 3 silver and 6 bronze medals. South Africa's top
athletes were Natalie du Toit, Oscar Pistorius and Hilton Langenhoven. China was in
first position, having achieved a total of 211 medals, 89 of them gold. The numbers of gold
medals won by the countries in positions 2 to 20 have been grouped into class intervals and
listed in the table that follows.
A group of Grade 11 learners were interviewed about using a certain application to send
SMS messages. The number of SMS messages, m, sent by each learner was summarised
in the histogram below.
[Histogram: horizontal axis "Number of SMS messages (m)", marked from 2 to 18 in steps of 2; vertical axis marked from 0 to 25; bar labels recovered: 14, 15, 7, 2]