
Statistics Grade 12


STATISTICS

STATISTICS GRADE 10
Learners must be able to:

(a) Collect, organise and interpret numerical data in order to determine:

• Measures of central tendency
• Five number summary
• Box and whisker diagram
• Measures of dispersion

STATISTICS

Statistics is a set of mathematical tools used to describe and make judgements about data.
It is very closely related to Probability. In Probability, we know how a process works and
we want to predict what the outcomes of that process will be. In Statistics, we don’t
know how the process works but we can observe its outcomes, i.e. we want to use
information about the outcomes to learn about the nature of the process. Statistics can
be either descriptive or inferential.

Descriptive Statistics

Descriptive statistics is the branch in which data from a sample are organised and
summarised in tables and displayed graphically. Data is a set of facts. There are two
types of data: qualitative data and quantitative data. Quantitative data is further
classified as discrete (counting) and continuous (measurement).

Four features of Quantitative Data

• Location: What is the centre or average value? Here we use measures of central
  tendency.
• Spread: What is the spread of the values? Do they fall closely around the centre or
  far apart? (Five number summary)
• Shape: What is the shape of the data: bell-shaped or skewed? Symmetric?
• Outliers: Are there any extreme or unusual observations?

Kutlwanong Centre for Maths, Science & Technology Page 1


Measures of Central Tendency

Measures of central tendency are numbers that describe what is average or typical of the
data. They tell us where the middle of a distribution lies.

• Mean: the sum of a list of numbers divided by the number of numbers (the same as the
  average). It is the most common measure of central tendency.
  Mean = Sum of the data / Number of data values
• Median: the value such that half of the numbers in a list are above it and half are
  below it. To find the median, first put the list of numbers in ascending or
  descending order. The median is the middle score (for an odd number of items) or the
  average of the two middle scores (for an even number of items). The position of the
  median is given by the formula:

  Position of median = $\frac{n+1}{2}$, where n is the number of items in the list.

• Mode: the value that occurs most frequently in a list of numbers. It does not
  involve any calculations or ordering of data.

EXAMPLE 1: Suppose you have observed the following values of daily sales of kotas at
your school: 40, 56, 38, 38, 63, 59, 52, 49, 46.

Mean = $\frac{\Sigma x}{N}$

Mean = $\frac{40+56+38+38+63+59+52+49+46}{9} = \frac{441}{9} = 49$

Mode is 38 because it occurs the greatest number of times.

Median: first arrange the data in ascending order.

38, 38, 40, 46, 49, 52, 56, 59, 63. The median is 49: four numbers are less than 49 and
four numbers are greater than 49. It is easy to find the median if there is an odd number
of items in the list. If there is an even number of items, the median is the average of
the two middle numbers. Suppose we are given the following set of data:
41, 44, 46, 47, 48, 49, 52, 53. The two numbers closest to the middle are 47 and 48, so
the median is

$\frac{47+48}{2} = 47.5$
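
The calculations in Example 1 can be checked with a short script. This is only a sketch using Python's standard `statistics` module; the variable names `sales` and `even_list` are chosen here for illustration and are not part of the notes.

```python
from statistics import mean, median, multimode

# Daily kota sales from Example 1
sales = [40, 56, 38, 38, 63, 59, 52, 49, 46]
# The even-length list used to illustrate the median
even_list = [41, 44, 46, 47, 48, 49, 52, 53]

print(mean(sales))        # 49  (441 divided by 9)
print(median(sales))      # 49  (middle value of the sorted list)
print(multimode(sales))   # [38] (returns a list because there can be more than one mode)
print(median(even_list))  # 47.5 (average of 47 and 48)
```

`multimode` is used rather than `mode` so that a data set with several modes reports all of them.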



PRACTICE EXERCISE 1

Find the mean, mode and median for each set of data
(a) 10 5 8 15 7 9 3 30 3
(b) 36 35 37 29 39 36 340 35 36
(c) 76 84 88 90 78 80 84 88 92 80 86 84
(d) 98 97 101 104 105 102 103 100 101 99



APPLICATION OF MEAN, MODE AND MEDIAN

The best average in a situation is the one that best reflects what is most typical. The
measures of central tendency are used to describe various situations.
MEAN

It is useful in situations where there are no extremely large or small values to distort
the result, e.g. the time it takes each learner to get to school.
MODE

Mode is the preferred average when the data is not numerical. It is used in determining
the most frequently chosen category e.g. the favourite kwaito group.
MEDIAN

It is used when extreme measures distort the mean e.g. the cost of houses in your
community.
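
To see why the median is preferred when extreme values are present, the sketch below compares the mean and the median before and after one very expensive house is added. The house prices are hypothetical values made up for illustration; they do not come from the notes.

```python
from statistics import mean, median

# Hypothetical house prices in rand (illustrative values only)
prices = [450_000, 480_000, 500_000, 520_000, 550_000]
# The same community after one very expensive house is added
with_mansion = prices + [5_000_000]

print(mean(prices), median(prices))              # 500000 500000
print(mean(with_mansion), median(with_mansion))  # 1250000 510000.0
```

One extreme value more than doubles the mean, while the median moves only slightly.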



PRACTICE EXERCISE 2

Would you use the mean, the median, or the mode for each situation? Explain

1. The heights of the members of the basketball team


2. The distance members of your class live from the school
3. The rainfall for the month of August in your community
4. The number of times each class member went to a mall last week
5. The cost of new cars
6. The amount of TV each student watches each week
7. The amount of time spent doing homework by the members of your class



Advantages and disadvantages of mean, median and mode

With three averages to choose from (mean, median and mode), which should we use?
The following table shows the advantages and disadvantages of these different
averages.

Average               Advantages                        Disadvantages

Mean                  All the data is used to find      Very large or very small numbers
                      the answer                        can distort the answer
Median                Very big and very small values    Takes a long time to calculate for
                      don’t affect it                   a very large set of data
Mode or modal class   The only average we can use       1. There may be more than one mode
                      when data is not numerical        2. There may be no mode at all if
                                                           none of the data is the same
                                                        3. It may not accurately represent
                                                           the data
Example

This table shows the monthly salary of people who work at a garden centre.

Monthly Salary (R)    Number of people

0 – 9,999             10
10,000 – 19,999       9
20,000 – 29,999       9
30,000 – 39,999       1
40,000 – 49,999       1

The modal class is R0 – R9,999.

Question

What is the disadvantage of using the modal class?

Answer

Even though the range R0 – R9,999 contains the greatest number of people, the next two
ranges have comparable numbers, so the modal class is not representative of the data.

Question

What is the disadvantage of using the mean?

Answer

Almost everyone earns under R30,000. The mean would be distorted by the fact that
two people earn much more than this.



FIVE NUMBER SUMMARY

It is a mathematical tool used to quickly summarise and gain insight into a set of data.
It provides a summary of the distribution of the data. The five number summary consists
of the following:

1. Minimum: the smallest value in the data


2. Lower Quartile(Q1): the median of the data to the left of the location of the
overall median
3. Median(Q2): the middle number in the list
4. Upper Quartile(Q3): the median of the data to the right of the location of the
overall median.
5. Maximum: the largest value in the data

How to do the five number summary

• Identify the minimum and maximum
• Find the median
• Determine the lower and upper quartiles guided by the median

Example 1:

Suppose we are given the following set of data

19, 11, 7, 24, 13, 15, 10, 3, 10, 20

Write in ascending order

3, 7, 10, 10, 11, 13, 15, 19, 20, 24

Min = 3

Max = 24

Median = $\frac{11+13}{2} = 12$

Lower Quartile

3, 7, 10, 10, 11 are on the left of 12, therefore the first 10 is the lower quartile.

Upper Quartile

13, 15, 19, 20, 24 are on the right of 12, therefore 19 is the upper quartile.



Example 2:

Find the five number summary for the following set of data

6, 22, 13, 2, 18, 28, 26, 9, 18

Write in ascending order

2, 6, 9, 13, 18, 18, 22, 26, 28

Min = 2

Max = 28

Median = the first 18

Lower Quartile: 2 6 9 13 are on the left side of the first 18, therefore (6+9)/2=7.5 is the
lower quartile.

Upper Quartile: 18 22 26 28 are on the right of the first 18, therefore (22+26)/2=24 is
the upper quartile.
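
The method used in Examples 1 and 2 (find the median, then take the median of the values on each side of it) can be sketched as a small function. The function name `five_number_summary` is hypothetical, and the sketch assumes the list has at least two values. Other quartile conventions exist and can give slightly different answers, so this follows the notes rather than any particular calculator or software default.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]          # values to the left of the median position
    upper = values[half + n % 2:]  # values to the right (skips the middle value when n is odd)
    return values[0], median(lower), median(values), median(upper), values[-1]

# Example 1 (even number of values)
print(five_number_summary([19, 11, 7, 24, 13, 15, 10, 3, 10, 20]))  # (3, 10, 12.0, 19, 24)
# Example 2 (odd number of values)
print(five_number_summary([6, 22, 13, 2, 18, 28, 26, 9, 18]))       # (2, 7.5, 18, 24.0, 28)
```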



PRACTICE EXERCISE 3

Identify the five number summary in each of the following

a) 1, 3, 4, 5, 6, 7, 9
b) 1, 3, 4, 6, 8, 9, 10, 4
c) 84, 89, 89, 64, 78
d) 26, 13, 35, 76, 44, 58
e) 1, 13, 17, 7, 10, 2, 9, 3, 26, 18, 24, 2, 29



BOX AND WHISKER DIAGRAM

It is a graphical display of the five number summary. It provides an alternative to a
histogram, a dotplot and a stem-and-leaf plot.

[Figure: box and whisker diagram with the box and the two whiskers labelled]

HOW TO DRAW A BOX AND WHISKER

Step 1: Label either a vertical axis or a horizontal axis with numbers from min to max of
the data.

Step 2: Draw box with lower end at Q1 and upper end at Q3.

Step 3: Draw a line through the box at the median.

Step 4: Draw a line from the Q1 end of the box to the smallest data value that is not
further than 1.5 × (Q3 – Q1) from Q1. Draw a line from the Q3 end of the box to the
largest data value that is not further than 1.5 × (Q3 – Q1) from Q3.

BOX PLOTS

• Box plots (box and whisker diagrams) are another way of displaying data for
  comparison.
• To draw a box plot we need FIVE pieces of information:
  1. The Minimum Value
  2. Lower Quartile (Q1)
  3. The Median (Q2)
  4. Upper Quartile (Q3)
  5. Maximum Value
• They are drawn in the following way:

[Figure: box plot with the five values above marked 1 to 5 from left to right]

• BOX = the location of the middle 50% of the data
• WHISKERS = how the data is spread overall



Step 5: Mark data points further than 1.5 x IQR from either edge of the box with an
asterisk. Points represented with asterisks are considered to be “outliers”.

DISTRIBUTION SHAPE OF BOX AND WHISKER

1. If the median is near the centre of the box and the two horizontal lines (whiskers)
are approximately equal in length, then the distribution is symmetric.

[Figure: symmetric box plot on an axis from 25 to 75]

2. If the median is left of the centre of the box and/or the right line is substantially
longer than the left line, the distribution is right skewed.

[Figure: right-skewed box plot on an axis from 0 to 20]

3. If the median is right of the centre of the box and/or the left line is substantially
longer than the right line, the distribution is left skewed.

[Figure: left-skewed box plot on an axis from 13 to 21]



BOXPLOT

NOTE:

Min = 16 is greater than Q1 – 1.5(Q3 – Q1) = 18.5 – 1.5(2) = 15.5,
SO… stop at min.

Max = 23 is less than Q3 + 1.5(Q3 – Q1) = 20.5 + 1.5(2) = 23.5,
SO… stop at max.

[Figure: box plot on an axis from 16 to 23 with Min, Q1, Median, Q3 and Max labelled]



EXAMPLE 1

The weights of 20 randomly selected juniors at MSHS are recorded below:

121 126 130 132 143 137 141 144 148 205

125 128 131 133 135 139 141 147 153 213

(a) Construct a boxplot of the data


(b) Determine if there are any mild or extreme outliers.

SOLUTIONS
(a)
Q1 = 130.5    Q2 = 138    Q3 = 145.5
Min = 121     Max = 213   Range = 92
IQR = 15      UF (upper fence) = 168    LF (lower fence) = 108

(b) Extreme outliers (more than 3 × IQR above Q3)

[Figure: box plot of Weight on an axis from 100 to 260]
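
The values in this solution can be reproduced with the sketch below. The quartiles use the median-of-halves method from the five number summary section. The `classify` helper is hypothetical, and the rule it applies (mild outliers lie beyond 1.5 × IQR from the box, extreme outliers beyond 3 × IQR) is a common convention that is assumed here; the notes only state the 3 × IQR rule explicitly.

```python
from statistics import median

weights = [121, 126, 130, 132, 143, 137, 141, 144, 148, 205,
           125, 128, 131, 133, 135, 139, 141, 147, 153, 213]

values = sorted(weights)
n = len(values)
q1 = median(values[:n // 2])          # median of the lower half
q3 = median(values[n // 2 + n % 2:])  # median of the upper half
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

def classify(x):
    """Label a value 'extreme', 'mild' or None (assumed 1.5 and 3 x IQR convention)."""
    if x < q1 - 3 * iqr or x > q3 + 3 * iqr:
        return "extreme"
    if x < lower_fence or x > upper_fence:
        return "mild"
    return None

outliers = {x: classify(x) for x in values if classify(x)}
print(q1, q3, iqr, lower_fence, upper_fence)  # 130.5 145.5 15.0 108.0 168.0
print(outliers)                               # {205: 'extreme', 213: 'extreme'}
```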

EXAMPLE 2



The following are the scores of 12 members of a women’s golf team in tournament
play:

89 90 87 95 86 81

111 108 83 88 91 79

(a) Construct a boxplot of the data.


(b) Are there any mild or extreme outliers?
(c) Comment



GRADE 11 AND 12

Learners must be able to:

GRADE 11
1. Represent measures of central tendency and dispersion in univariate numerical data by:
   • Using ogives, and
   • Calculating the variance and standard deviation of sets of data manually (for small
     sets of data) and using calculators (for larger sets of data) and representing
     results graphically.
2. Represent skewed data in box and whisker diagrams and frequency polygons. Identify
   outliers.

GRADE 12
1. Represent bivariate numerical data as a scatter plot and suggest intuitively and by
   simple investigation whether a linear, quadratic or exponential function would best
   fit the data.
2. Use a calculator to calculate the linear regression line which best fits a given set
   of bivariate numerical data.
3. Use a calculator to calculate the correlation co-efficient of a set of bivariate
   numerical data and make relevant deductions.

A. Measures of dispersion

Measures of dispersion are statistical measures that summarise the amount of spread or
variation in the distribution of values of a variable. They indicate the extent to which
data is spread out from the centre, i.e. how far apart or how close the data values are
to each other and to the mean.

• The dispersion is high if most of the data values are far away from one another.
• The dispersion is low (but greater than zero) if the data values are close to one
  another.

There are FOUR main measures of dispersion

1. Range
The difference between the highest (maximum) and the lowest (minimum) value in the
distribution.

Example 2:
Weekly income of 6 mine workers
R1 800 R2 000 R2 300 R2 100 R2 500 R1 450
RANGE = Max – Min = 2 500 – 1 450 = 1 050



2. Variance

The measure of the spread. It is also defined as the average of the squared deviations
from the mean. It involves the measuring of the distance between each of the values and
the mean.

How to calculate the variance

• Calculate the mean.
• For each value in the data, subtract the mean (the deviation) and then square the
  result.
• Calculate the average of those squares using the formula:

  $s^2 = \frac{\Sigma f (x - \bar{x})^2}{n}$

Ages in years (x)    $(x - \bar{x})$    $(x - \bar{x})^2$
20
24
26
27
28
TOTAL

3. Standard Deviation
It measures how far, on average, the values in a set lie from the mean. It is the
square root of the variance.
Use the information from example 2 to calculate the standard deviation manually
as well as with a calculator.
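
As a sketch of the manual method, the script below works through the ages table from the variance section (20, 24, 26, 27, 28), assuming each age occurs once (f = 1) and dividing by n as in the formula given there. The variable names are chosen for illustration.

```python
from math import sqrt

ages = [20, 24, 26, 27, 28]
n = len(ages)

mean_age = sum(ages) / n                            # 125 / 5 = 25.0
squared_devs = [(x - mean_age) ** 2 for x in ages]  # squared deviations from the mean
variance = sum(squared_devs) / n                    # average of the squared deviations
std_dev = sqrt(variance)                            # standard deviation

# One row per age: x, deviation, squared deviation
for x, sq in zip(ages, squared_devs):
    print(x, x - mean_age, sq)

print("variance:", variance)           # variance: 8.0
print("std dev:", round(std_dev, 2))   # std dev: 2.83
```

The standard library functions `statistics.pvariance` and `statistics.pstdev` use the same divide-by-n definition, whereas `statistics.variance` and `statistics.stdev` divide by n − 1 and give different results.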

4. Interquartile Range

The difference between the 3rd and the 1st quartile, IQR = Q3 – Q1 (the spread of the
middle half of the data).
Example 3:
Find the IQR for the following data
45 47 49 51 57 62 68

First find Q1 and Q3

Q1 = 47 and Q3 = 62
So IQR = 62 – 47 = 15



OGIVES

An ogive is the graphical presentation of a cumulative frequency distribution. Cumulative
frequency graphs are used to estimate the median and the interquartile range of a set of
data. Cumulative means adding up.

Constructing an ogive

• Construct a cumulative frequency table as the first step.
• Make a running total of the frequencies.
• Put the end points of the class intervals, not the class intervals themselves, on the
  x-axis.
• Plot each cumulative frequency at the end point of its class interval.
• Join the points with straight lines; the graph usually has an “S” shape.
• Find the median by drawing across from the middle of the cumulative frequency axis.
• Find LQ and UQ by drawing across from $\frac{1}{4}$ and $\frac{3}{4}$ of the way up
  the c.f. axis.

Cumulative Frequency Curves

• Cumulative Frequency = a running total of the data
• Median = half way up the Cumulative Frequency
• Lower Quartile (LQ) = $\frac{1}{4}$ of the way up the Cumulative Frequency
• Upper Quartile (UQ) = $\frac{3}{4}$ of the way up the Cumulative Frequency
• Inter Quartile Range = UQ – LQ



BOX PLOT FROM CUMULATIVE FREQUENCY CURVE

[Figure: cumulative frequency curve (both axes from 0 to 70) with the box plot drawn
beneath it. Values read off at 1/4, 1/2 and 3/4 of the way up: LQ = 21, Median = 27,
UQ = 38, so IQR = 38 – 21 = 17 mins]
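
Reading the quartiles off an ogive amounts to linear interpolation between the plotted points. The sketch below illustrates this with a hypothetical grouped data set (the class boundaries and frequencies are made up for illustration and are not the data behind the figure). The helper name `read_off` is also hypothetical.

```python
from itertools import accumulate

# Hypothetical grouped data (illustrative only)
lower_bound = 0                       # lower boundary of the first class
upper_bounds = [10, 20, 30, 40, 50]   # upper boundary of each class
frequencies = [4, 10, 16, 8, 2]

# Running total of the frequencies: [4, 14, 30, 38, 40]
cumulative = list(accumulate(frequencies))

# Ogive points: start at (lower boundary, 0), then (upper boundary, cumulative frequency)
points = [(lower_bound, 0)] + list(zip(upper_bounds, cumulative))

def read_off(target):
    """Estimate the x-value where the straight-line ogive reaches `target`."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            return x0 + (target - c0) * (x1 - x0) / (c1 - c0)
    raise ValueError("target outside the cumulative frequency range")

total = cumulative[-1]
lq, med, uq = (read_off(total * f) for f in (0.25, 0.5, 0.75))
print(lq, med, uq, uq - lq)  # 16.0 23.75 30.0 14.0
```

Values read from a hand-drawn graph will be approximate, so small differences from the interpolated values are expected.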



Practice Exercise

a) Draw a cumulative frequency graph for the following data:

Height (cm)    Frequency
150≤h<155      4
155≤h<160      22
160≤h<165      56
165≤h<170      32
170≤h<175      5

b) Estimate the median from the graph
c) Estimate the interquartile range from the graph



PRACTICE EXERCISE

QUESTION 1

The data below shows the number of people visiting a local clinic per day to be
vaccinated against measles.

5 12 19 29

35 23 15 33

37 21 26 18

23 18 13 21

18 22 20

1.1 Determine the mean of the given data
1.2 Calculate the standard deviation of the data
1.3 Determine the number of people vaccinated against measles that lies within ONE
standard deviation of the mean
1.4 Determine the interquartile range for the data
1.5 Draw a box and whisker diagram to represent the data
1.6 Identify any outliers in the data set. Substantiate your answer.

QUESTION 2

The table below shows the amount of time (in hours) that learners aged between 14 and
18 spent watching television during 3 weeks of the holiday.

Time (hours) Cumulative frequency


0≤t<20 25
20≤t<40 69
40≤t<60 129
60≤t<80 157
80≤t<100 166
100≤t<120 172

2.1 Draw an ogive (cumulative frequency curve) to represent the above data
2.2 Write down the modal class of the data
2.3 Use the ogive (cumulative frequency curve) to estimate the number of learners
who watched television more than 80% of the time



2.4 Estimate the mean time (in hours) that learners spent watching television during
3 weeks of the holiday

QUESTION 3

In the 2008 Beijing Paralympic Games, South Africa was in sixth place in the Overall
Medals Standing, having won 21 gold, 3 silver and 6 bronze medals. South Africa’s top
athletes were Natalie du Toit, Oscar Pistorius and Hilton Langenhoven. China was in
first position having achieved a total of 211 medals, 89 of them gold. The numbers of
gold medals won by the countries in positions 2 to 20 have been grouped into class
intervals and listed in the table that follows.

Number of gold    Number of    Cumulative    Points to plot
medals won        countries    frequency     for an ogive
5≤x<10            6
10≤x<15           4
15≤x<20           a            14
20≤x<25           3
25≤x<30           0
30≤x<35           0
35≤x<40           b            18
40≤x<45           c
TOTAL             19

a) Calculate the values for a, b and c


a=
b=
c=
b) Complete the given table
c) Draw a cumulative frequency curve to illustrate the given data set
d) On your graph, record where you would read the following, and write down your
results below correct to the nearest whole number.

Lower quartile: ...


Median: ...
Upper quartile: ...



QUESTION 4

A group of Grade 11 learners were interviewed about using a certain application to send
SMS messages. The number of SMS messages, m, sent by each learner was summarised
in the histogram below.

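[Figure: Histogram showing the number of SMS messages sent by learners. Horizontal axis:
Number of SMS messages (m), marked from 2 to 18. Vertical axis: Frequency, marked from 0
to 50. Bar labels as extracted (order along the axis not preserved): 36, 31, 29, 26, 14,
15, 7, 2]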

4.1 Complete the cumulative frequency table


4.2 Draw an ogive (cumulative frequency curve) to represent the data
4.3 Estimate the percentage of the learners who sent more than 11 messages using
this application
4.4 In which direction is the data skewed?
