KWV Education Statistics
Data refers to the pieces of information that have been observed and recorded from an
experiment or a survey.
✓ Ungrouped data
➢ and it is difficult to analyse the data, so the stem and leaf is used to arrange the data.
The data is listed in intervals that depend on the place value of the digits of each data value.
➢ The leaf is the units digit, i.e. the digit furthest to the right in the number.
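The stem and leaf arrangement can be sketched in a few lines of Python. This is a minimal illustration, assuming whole-number data where the stem is everything to the left of the units digit; the function name and the marks are made up for the example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group whole numbers by stem (tens and above) and leaf (units digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 35 -> stem 3, leaf 5
        plot[stem].append(leaf)
    return dict(plot)

marks = [23, 27, 31, 35, 35, 42, 48, 50]  # made-up data
for stem, leaves in stem_and_leaf(marks).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```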
➢ Back to back
➢ One of the first steps to processing a large set of raw data is to arrange the data values
➢ and then count how many of each data value there are in each group.
➢ The groups are usually based on some sort of interval of data values, so data values that
➢ Note that
➢ Once the data has been collected, it must be organised in a manner that allows for the
➢ Bar graphs, histograms and pie charts will be drawn directly from the data.
➢ A bar chart is used to present data where each observation falls into a specific category.
➢ The frequencies (or percentages) are listed along the y-axis and the categories are listed
➢ The bars are of equal width and should not touch neighbouring bars.
➢ A compound bar chart (also called component bar chart) is a variant: here the bars are
➢ If percentages are used for various components of a compound bar, then the total bar
➢ The compound bar chart is a little more complex but if this method is used sensibly, a
➢ It is often useful to look at the frequency with which certain values fall in pre-set groups
➢ The choice of the groups should be such that they help highlight features in the data.
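Counting how many values fall in each pre-set group might look like the sketch below. It assumes equal-width groups that include their lower boundary and exclude their upper boundary; the function name, the boundaries and the heights are illustrative only.

```python
def frequency_table(values, start, width, n_groups):
    """Count how many values fall in each interval [lower, upper)."""
    counts = [0] * n_groups
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_groups:  # ignore values outside the chosen groups
            counts[index] += 1
    return [((start + i * width, start + (i + 1) * width), counts[i])
            for i in range(n_groups)]

heights = [151, 155, 158, 160, 162, 167, 169, 171, 174, 178]  # made-up data
for (lower, upper), count in frequency_table(heights, 150, 10, 3):
    print(f"{lower} <= x < {upper}: {count}")
```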
➢ If these grouped values are plotted in a manner similar to a bar graph, then the resulting
➢ The same data used to plot a histogram are used to plot a frequency polygon, except the
pair of data values are plotted as a point and the points are joined with straight lines.
➢ frequency distributions, provided that the data has been grouped in the same way and
✓ Pie Charts
➢ A pie chart is a graph that shows which categories make up a specific section
of the data, and what contribution each category makes to the entire set of data.
➢ A pie chart is based on a circle, and each category is represented as a wedge of the
Angular Size
4. Check that the total degrees for the different wedges add up to close to 360 degrees.
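As a rough sketch of the angular size calculation, each wedge can be given an angle equal to its share of the total multiplied by 360 degrees. The function name and the category counts are assumptions made for the example.

```python
def wedge_angles(frequencies):
    """Return the angle in degrees for each category's wedge."""
    total = sum(frequencies.values())
    return {category: count * 360 / total
            for category, count in frequencies.items()}

transport = {"Walk": 12, "Bus": 18, "Car": 6, "Bicycle": 4}  # made-up data
angles = wedge_angles(transport)
print(angles)
print("Total degrees:", sum(angles.values()))  # should be close to 360
```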
raw data
➢ All graphs that have been studied until this point (bar, compound bar,
histogram, frequency polygon and pie) are drawn from grouped data.
➢ Line and broken line graphs are plots of a dependent variable as a function of
➢ Usually a line graph is plotted after a table has been provided showing the
➢ Just as in (x; y) graphs, each of the pairs results in a specific point on the
graph, and being a line graph these points are connected to one another by a
➢ Many other line graphs exist; they all connect the points by lines, not
basic relationship between the given pairs of variables, and between these
➢ Summarising Data
If the data set is very large, it is useful to be able to summarise the data set by calculating a few
quantities that give information about how the data values are spread and about the central
✓ Mean or Average
➢ The mean (also known as the arithmetic mean) is simply the arithmetic average of a group
of numbers (or data set) and is shown by placing a bar over the variable, e.g. x̄.
➢ The mean of a set of values is calculated by adding up all the values in the set and
2. Count how many data values there are in the data set.
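The calculation of the mean (add up the values, then divide by how many there are) can be sketched as follows; the scores are made up.

```python
def mean(values):
    """Add up all the values and divide by the number of values."""
    return sum(values) / len(values)

scores = [4, 8, 15, 16, 23, 42]  # made-up data
print(mean(scores))
```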
➢ Median
➢ The median of a set of data is the data value in the central position, when the
data set has been arranged from highest to lowest or from lowest to highest.
➢ There are an equal number of data values on either side of the median value.
➢ Ungrouped data
2. Count how many data values there are in the data set.
✓ An easy way to determine the central position or positions for any ordered
data set is to take the total number of data values, add 1, and then divide
by 2.
✓ If the number you get is a whole number, then that is the central position.
✓ If the number you get is a fraction, take the two whole numbers on either
side of the fraction, as the positions of the data values that must be
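The position rule above can be sketched in Python. This sketch assumes that, when the central position is a fraction, the two neighbouring data values are averaged (the usual definition of the median for an even number of values).

```python
def median(values):
    """Find the median using the (n + 1) / 2 position rule."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2  # 1-based central position
    if position.is_integer():
        return ordered[int(position) - 1]
    # Fractional position: average the values on either side of it.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2

print(median([7, 3, 9, 1, 5]))      # odd number of values
print(median([7, 3, 9, 1, 5, 11]))  # even number of values
```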
✓ Mode
➢ The mode is the data value that occurs most often, i.e. it is the most frequent value or
❖ For ungrouped
❖ For grouped
➢ The mean, median and mode are measures of central tendency, i.e. they provide
information on the central data values in a set.
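Of the three measures, the mode for ungrouped data is found by counting how often each value occurs. A minimal sketch, assuming that every value tied for the highest count should be reported:

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

print(modes([2, 3, 3, 5, 7, 3, 5]))
print(modes([1, 1, 2, 2, 3]))
```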
➢ When describing data it is sometimes useful (and in some cases necessary) to determine
the spread of a distribution. Measures of dispersion provide information on how the
data values in a set are distributed around the mean value.
➢ Some measures of dispersion are range, percentiles and quartiles.
✓ Range
➢ The range of a data set is the difference between the lowest value and the highest value
in the set.
Method: Calculating the range
1. Find the highest value in the data set.
2. Find the lowest value in the data set.
3. Subtract the lowest value from the highest value. The difference is the range.
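The three steps translate directly into code; the data values here are made up.

```python
def data_range(values):
    """Subtract the lowest value from the highest value."""
    return max(values) - min(values)

print(data_range([12, 7, 3, 19, 8]))
```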
✓ Quartiles
➢ Quartiles are the three data values that divide an ordered data set into four groups
containing equal numbers of data values.
➢ Lower quartile Q1: P = (n + 1)/4, where n is the number of data values
➢ The lowest 25% of the data is found below the first quartile value
➢ The median, or second quartile divides the set into two equal sections.
➢ Upper quartile Q3: P = 3(n + 1)/4
➢ The lowest 75% of the data set should be found below the third quartile
➢ Note: for grouped data, do not add 1 when calculating positions (use n in place of n + 1)
➢ The interquartile range (IQR) is a measure which provides information about the spread of a
data set.
➢ It is calculated by subtracting the first quartile from the third quartile, giving the
range of the middle half of the data set, trimming off the lowest and highest quarters.
Semi-IQR = (Q3 - Q1)/2
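A sketch of the quartile calculations for ungrouped data follows. It assumes the quartile positions are (n + 1)/4 and 3(n + 1)/4, mirroring the (n + 1)/2 rule for the median, and that a fractional position means averaging the two neighbouring values. Other textbooks and software interpolate differently, so results may not match a calculator exactly.

```python
def value_at_position(ordered, position):
    """Value at a 1-based position; average the neighbours if it is fractional."""
    if position == int(position):
        return ordered[int(position) - 1]
    low = int(position)
    return (ordered[low - 1] + ordered[low]) / 2

def quartile_summary(values):
    """Return (Q1, Q3, IQR, semi-IQR) for ungrouped data."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at_position(ordered, (n + 1) / 4)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    iqr = q3 - q1
    return q1, q3, iqr, iqr / 2

print(quartile_summary([2, 4, 6, 8, 10, 12, 14]))
```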
✓ Percentiles
➢ Percentiles are the 99 data values that divide a data set into 100 groups.
➢ except the aim is to divide the data values into 100 groups instead of the 4 groups
required by quartiles.
➢ Position : P =
2. Count how many data values there are in the data set.
3. Divide the number of data values by 100. The result is the number of data values per group.
4. Determine the data values corresponding to the first, second and third quartiles using the
1. Minimum value
2. Maximum value
3. The median
The median is the middlemost number when the data is arranged from smallest to largest.
The upper and lower quartiles are the median of the upper
✓ Outliers
➢ An outlier is a point or score which is widely separated from the other points or scores. This is
mostly applicable to scatter plots and box and whisker plots.
➢ To check for an outlier:
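One widely used check, which these notes do not spell out and which is therefore an assumption here, treats any value more than 1.5 times the interquartile range below Q1 or above Q3 as an outlier. A minimal sketch, with the quartiles passed in by the caller:

```python
def find_outliers(values, q1, q3):
    """Flag values beyond 1.5 x IQR from the quartiles (assumed rule)."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [value for value in values
            if value < lower_fence or value > upper_fence]

data = [2, 4, 6, 8, 10, 12, 40]  # made-up data with Q1 = 4 and Q3 = 12
print(find_outliers(data, q1=4, q3=12))
```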
➢ Box and Whisker Plots allow us to interpret the spread of the data more easily.
➢ The Box is the part from the lower quartile to the upper quartile
➢ and the whiskers are the lines on either end of the box.
➢ The end point of the whiskers gives us the minimum and maximum values. NB; it
➢ It is very important to note that the first 25% (first quarter) of results lies between the
minimum value and the lower quartile.
➢ The next 25% (second quarter) of results lies between the lower quartile and the median.
➢ The third quarter lies between the median and the upper quartile and the last quarter of
data lies between the upper quartile and the maximum value.
Shape of a data set: this describes how the data is distributed relative to the mean and median.
✓ Positively skewed
If the mean > median then the data is positively skewed (skewed to the right). This means that
✓ Negatively skewed
If the mean < median then the data is negatively skewed (skewed to the left). This means that
✓ Symmetrical (not skewed)
➢ Symmetrical data sets are balanced on either side of the median (spread fairly evenly).
➢ If the mean, median, and mode are approximately equal to each other, the distribution
➢ With both the mean and median known, the following can be concluded:
NB: The longer whisker shows the greater variability and/or spread.
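The comparison of mean and median can be sketched as a small function. The tolerance parameter is a hypothetical addition so that values which are only approximately equal can still be treated as symmetrical.

```python
def describe_shape(mean, median, tolerance=0.0):
    """Classify the shape of a data set by comparing the mean and the median."""
    if mean - median > tolerance:
        return "positively skewed"  # skewed to the right
    if median - mean > tolerance:
        return "negatively skewed"  # skewed to the left
    return "symmetrical"

print(describe_shape(mean=58, median=52))
print(describe_shape(mean=47, median=52))
print(describe_shape(mean=50.2, median=50, tolerance=0.5))
```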
➢ Ogive Curves (Cumulative Frequency curves)
In mathematics, the name ogive is applied to any continuous cumulative frequency polygon.
➢ The cumulative frequency is the running total of the frequencies up to the upper boundary
of a specific interval.
➢ The final cumulative frequency value is always equal to the sum of all the frequencies, so
the last number in the cumulative frequency column gives the total frequency.
✓ Drawing of Ogive
➢ To plot the graph we plot the cumulative frequency value against the end point value
One extra point is obtained by plotting (x; 0), where x is the lower boundary of the lowest class
interval. This is done because all the values must lie above x.
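The points needed for an ogive can be generated as in the sketch below: a running total of the frequencies is paired with the upper boundary of each class, and the extra starting point (x; 0) is added at the front. The class boundaries and frequencies are made up.

```python
from itertools import accumulate

def ogive_points(lower_boundary, upper_boundaries, frequencies):
    """Return (boundary, cumulative frequency) pairs, starting at (x; 0)."""
    cumulative = list(accumulate(frequencies))
    return [(lower_boundary, 0)] + list(zip(upper_boundaries, cumulative))

# Classes 0-10, 10-20, 20-30 and 30-40 with made-up frequencies.
points = ogive_points(0, [10, 20, 30, 40], [3, 7, 12, 5])
print(points)
print("Total frequency:", points[-1][1])
```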
Frequency table
➢ Find the position of each and then use the cumulative frequency.
➢ Cumulative frequency curves make it very simple to answer questions that involve
➢ For box and whisker: the right lower interval indicate the minimum value and the right
Add all the y-values of the cumulative frequency and then divide the total by the number of points.
The bars 'touch', meaning that we are working with continuous data. How to complete the
➢ The variance and standard deviation are measures of how spread out a set of data is.
➢ In other words, they are measures of variability. The standard deviation is a measure of the
average distance between the values of the data in the set and the mean.
➢ If the data values are all similar, then the standard deviation will be low (closer to 0).
➢ If the data values are highly variable, then the SD is high (further from 0).
➢ The standard deviation is always a positive number and is always measured in the same
➢ For example, if the data are distance measurements in metres, the standard deviation
➢ If the data is more closed together the Standard deviation and mean
The variance is the average squared deviation of each number from the mean.
➢ When data elements are tightly clustered together, the standard deviation
and variance are small; when they are spread apart, the standard deviation and the
variance are relatively large.
✓ A data set with more data near the mean will have less spread and a smaller
standard deviation
✓ A data set with lots of data far from the mean will have a greater
spread and a larger standard deviation.
✓ Two methods
2. A calculator method
Method 1
➢ For ungrouped
x (score) | (x - x̄) | (x - x̄)²
4. Add all the values in column 3, and divide by the total number of original values. i.e.: find
5. To find the standard deviation, σ, square root the answer found in step 4.
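The table method for ungrouped data can be mirrored in code. As in the steps above, the sum of the squared deviations is divided by the total number of values (the population form), and the standard deviation is the square root of that result. The scores are made up.

```python
import math

def variance_and_sd(values):
    """Return (variance, standard deviation), dividing by the number of values."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]  # the (x - mean)^2 column
    variance = sum(squared_deviations) / n
    return variance, math.sqrt(variance)

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data
print(variance_and_sd(scores))
```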
The table used for a set of grouped data is slightly different as the frequency has to be taken
5. Add all the values in column 6, and divide by the total number of original values. i.e.: find
6. To find the standard deviation, σ, square root the answer found in step 5.
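For grouped data, a sketch of the same calculation is given below. It assumes each class is represented by its midpoint and that every squared deviation is weighted by the class frequency; the exact column layout of the original table is not reproduced here.

```python
import math

def grouped_variance_and_sd(midpoints, frequencies):
    """Return (variance, standard deviation) for grouped data."""
    n = sum(frequencies)  # total number of original values
    mean = sum(f * x for x, f in zip(midpoints, frequencies)) / n
    weighted_squares = sum(f * (x - mean) ** 2
                           for x, f in zip(midpoints, frequencies))
    variance = weighted_squares / n
    return variance, math.sqrt(variance)

# Classes 0-10, 10-20 and 20-30 with made-up frequencies.
print(grouped_variance_and_sd([5, 15, 25], [2, 4, 2]))
```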
Method 2
The variance and/or standard deviation can be calculated easily with a calculator:
Although you are encouraged to use a calculator to calculate the standard deviation, you must
Step 3: Enter the numbers one by one followed by the equals after each number.
Once you have completed entering all the data as described in step 3, press the AC button once.
Step 4: Press the “SHIFT” button and then the “1” button. (Notice that Mean is also an option
Select 4: VAR
➢ Note that when using the calculator be sure to put the frequency mode on.
➢ On the CASIO fx-82ES PLUS, this is done by pressing “SHIFT” then “SET UP”.
• The larger the standard deviation, the greater the variability of the data (the greater the
• A large standard deviation indicates that the data values are far from the mean and a
small standard deviation indicates that they are clustered closely around the mean.
➢ In science, the scatter plot is widely used to present measurements of two or more
related variables.
➢ It is particularly useful when the variable on the y-axis is thought to depend on
the values of the variable on the x-axis (usually the independent variable).
• The data points are plotted but not joined; the resulting pattern indicates the type and
➢ The line of best fit is the line drawn with the aim of having the same number of points above the line
➢ a represents the y-intercept
➢ MODE 2
➢ PRESS 2: A + BX
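The values of a and b in y = a + bx can also be worked out by hand or in code. The sketch below uses the least-squares formulas, on the assumption that this is what the calculator's A + BX mode computes; the function name and data are illustrative.

```python
def line_of_best_fit(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # gradient
    a = mean_y - b * mean_x  # y-intercept
    return a, b

a, b = line_of_best_fit([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])  # made-up data
print(f"y = {a} + {b}x")
```

A line fitted this way will not always have exactly the same number of points above and below it, which is the aim when drawing the line by eye.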
✓ Interpretation
❖ Relationship (correlation)
➢ Positive trend/gradient: as the variable on the x-axis increases, the variable on the
➢ Negative gradient/trend: as the variable on the x-axis increases, the variable on the y-
axis decreases.
❖ Correlation (r)
➢ If the points on the scatter plot are close to the line of best fit, we have a strong
➢ If the points are not so close to the line of best fit, we have a weak correlation or
➢ In analysing the scatter plot, you look for a pattern in the way the points lie.
➢ Certain patterns tell you that correlations (relationships) exist between the two variables.
➢ When describing the relationship between two variables displayed on a scatter plot, we should
comment on:
The form – whether it is linear or non-linear (either a quadratic or exponential curve).
The direction – whether it is positive or negative
The strength – whether it is strong, moderate or weak.
The points are scattered randomly over the graph indicating no pattern between the two sets of
data. This tells you that there is no correlation between the two variables.
The points show a 'band' that slopes upwards from bottom left to top right. As one variable
increases, the other variable also increases. Such a pattern shows a strong positive correlation.
The points show a 'band' that slopes downwards from top left to bottom right. As one
variable increases, the other variable decreases. Such a pattern shows a strong negative correlation.
The points are obviously clustered from bottom left to top right, but are not clustered together as
closely as with the strong positive correlation.
The points are obviously clustered from top left to bottom right, but are not clustered together as
closely as with the strong negative correlation.
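The direction and strength described above can be put into a number with the correlation coefficient r. The sketch below assumes r refers to Pearson's correlation coefficient, which lies between -1 and 1: values near 1 or -1 indicate a strong linear relationship and values near 0 indicate a weak one or none.

```python
import math

def correlation(xs, ys):
    """Return Pearson's correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(correlation([1, 2, 3, 4, 5], [3, 5, 7, 9, 11]))  # points rise together
print(correlation([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))  # one rises, one falls
```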
➢ In many real-life situations, scatter plots follow patterns that are approximately linear.
However, it might sometimes look as though there is no correlation between the
➢ The points might look as though they are randomly scattered over the plane.
➢ However, on closer inspection you may be able to recognise a quadratic or an
exponential shape to the pattern of points or any other pattern.
The points in this scatter plot follow a curve from left to right, and show an exponential
KWV QP: 01