1A Sources of Data
TYPES OF DATA
Qualitative data – tends to be non-numerical because it looks at characteristics, descriptions, opinions,
feelings, estimates.
Quantitative data – tends to be numerical because it is measurable. Therefore, it will often have units.
Discrete data – can take only certain values within a range.
Continuous data – can be any value, and is only restricted by the level of accuracy of the measuring
instrument used.
Examples: State the type of data for each example given below:
1. What something smells like
2. The number of hours someone studied
3. Your favourite place to go
4. How good food tastes
5. The sound level on a stereo system
6. Your score on your last Maths test
7. The height of your favourite sports personality
8. The weight of your pet
9. The most popular names for babies each year
Queen’s College
Mathematics Department
Mr. Goodridge, Mrs. Maxwell Page 1
Topic 1 Sources of Data
Read Pgs 1 – 4 (Discrete & Continuous Data)
DEFINITIONS
Population – all of the elements, persons, animals or units that fall into the set or group under analysis.
Sample – a subset of the population which is studied in order to make a determination about what the
population is like.
Survey – a way of gathering information about a population.
Census – a survey taken from an entire population.
Sample survey – a survey taken using any portion of a population apart from the whole.
Parameter – a numerical value that is characteristic of a population and can be found or estimated by
calculation using survey data. E.g. the mean, variance, etc.
Statistic – a number that represents a piece of information, i.e. a numerical datum, e.g. how often you
do something or how common something is. Statistics are generally calculated from sample data (e.g. a
sample mean) and are often used to estimate the corresponding population parameter.
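To make the difference between a parameter and a statistic concrete, here is a minimal Python sketch. The ten test scores are made-up illustrative data, not taken from these notes. The population mean is a parameter (it uses every member), while the sample mean is a statistic (it uses only the sample) and usually changes each time a new sample is drawn.

```python
import random
import statistics

# Hypothetical population: test scores of 10 students (illustrative data only).
population = [55, 62, 70, 48, 81, 77, 66, 59, 90, 72]

# Parameter: calculated from every member of the population.
population_mean = statistics.mean(population)

# Statistic: calculated from a random sample of 4 students, used to estimate the parameter.
sample = random.sample(population, 4)
sample_mean = statistics.mean(sample)

print("Population mean (parameter):", population_mean)
print("Sample mean (statistic):", sample_mean)
```

Running it several times gives the same population mean (68) but a different sample mean on most runs.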
SAMPLE FRAMES
When it has been determined exactly which group you are going to study (the target population) and how,
a comprehensive list of the members of that population must be created. This list is called the sample
frame. Each member of the list is given an identifying number, which allows members to be referred to
individually. The list should be organised in a logical and systematic way, so that members are easy to
locate and the list can be subdivided into more manageable pieces if necessary. The information that will
be used to find or contact each sample unit is included (telephone numbers, addresses, form, etc.). If
certain types of information are difficult to acquire, you may instead list cluster groups to gather
information from.
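As a rough sketch of the steps above, the Python code below stores a sample frame as a numbered list in which each entry keeps the details needed to contact that member. The names, forms and e-mail addresses are hypothetical, and ordering by form and then name is just one assumed "logical and systematic" order. The helper `subdivide` (also hypothetical) shows how such a list can be split into smaller pieces.

```python
from dataclasses import dataclass

@dataclass
class FrameEntry:
    """One numbered member of a sample frame, with the details needed to contact them."""
    number: int
    name: str
    form: str
    contact: str

def build_sample_frame(members):
    """Sort members in a logical order (here by form, then name) and number them from 1."""
    ordered = sorted(members, key=lambda m: (m[1], m[0]))
    return [FrameEntry(i, name, form, contact)
            for i, (name, form, contact) in enumerate(ordered, start=1)]

def subdivide(frame):
    """Split the frame into smaller pieces, one per form."""
    groups = {}
    for entry in frame:
        groups.setdefault(entry.form, []).append(entry)
    return groups

# Hypothetical members: (name, form, contact details)
members = [
    ("Dana Clarke", "L6B", "dana@example.com"),
    ("Ari Best", "L6A", "ari@example.com"),
    ("Cole Greaves", "L6A", "cole@example.com"),
]
frame = build_sample_frame(members)
for entry in frame:
    print(entry.number, entry.form, entry.name, entry.contact)
```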
Examples
8. Population: Students taking Applied Maths Unit 1 at Today’s Secondary
Sample Frame: 1. Band C – Appiana Holmes, L6Picasso
2. Band C – Dapple Athlete, L6Einstein
3. Band D – Lionel Mathemati, L6Suzuki
…
9. Population: Birds in Barbados
Sample Frame: 1. Doves 2. Pigeons 3. Herons 4. Egrets 5. Hummingbirds
6. Finches 7. Blackbirds 8. Sparrows ….
10. Population: Students at Queen’s College
Sample Frame: Nominal Roll
11. Population: Homeowners in St. James
Target Population: Homeowners in St. James listed in the directory (telephone survey)
Sample Frame: List of the St. James homeowners listed in the directory
12. Population: Eligible voters
Sample Frame: Electoral register
13. Population: People who live in Bridgetown
Sample Frame: Map of Bridgetown with a list of streets to be surveyed (cluster groups)
SAMPLING
Why is sampling necessary? Often a population is far too large to feasibly do a census. The resources
required are too great to realistically tackle such a project: too many people to survey, too much time
needed to reach them (you need the information soon), and too large a workforce needed to engage
everyone (you don’t have the money to pay them, the plant to house them, the computers for them to
input the data into, …). With this in mind, you can pick a subset which you hope will tell you what you
want to learn about the population. You usually try not to bias which elements of the population you put
into your sample, so that you can feel confident that you have not skewed your analysis towards one or
two sections of the population. Thus you want to give each element an equal chance of being chosen for
your sample. This produces what is known as a random sample.
RANDOM VS NON-RANDOM SAMPLING
Any sample in which an effort is made to give each sample unit an equal opportunity to be chosen is
random. When this is not done, a non-random sample results. For example, you may decide on a sample
of convenience, such as polling all of the persons on your street; this gives persons living outside your
area no chance of being chosen.
SAMPLING METHODS
Simple Random – Once you have a sample frame and each of the sample units is numbered, you can
randomly choose a number to pick a sample unit to be included in your sample. The methods used to
choose the random numbers will be explored more in the next section.
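A minimal sketch of simple random sampling, assuming the sample frame is numbered 1 to N and that no unit may be chosen twice. Python's `random.sample` stands in here for whichever random-number method is actually used; it gives every unit the same chance of selection.

```python
import random

def simple_random_sample(frame_size, n, rng=random):
    """Choose n distinct sample-unit numbers from a frame numbered 1..frame_size.

    Every unit has the same chance of being chosen, and no unit is chosen twice.
    """
    if not 0 <= n <= frame_size:
        raise ValueError("sample size must be between 0 and the frame size")
    return sorted(rng.sample(range(1, frame_size + 1), n))

# e.g. pick 10 of the 500 numbered units in a frame
print(simple_random_sample(500, 10))
```

Passing `random.Random(seed)` as `rng` makes the selection repeatable, which is useful when checking work.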
Stratified Random – The problem with simple random sampling is that although an effort has been
made to choose the members of the sample randomly, all of the members may still end up coming from
the same section of the population. E.g. a survey of the students of this school may end up with a
majority of students from 2nd form and lower 6th. If the survey is interested in the students’ opinions
about their timetable, then this will clearly produce a biased viewpoint. To avoid this problem,
especially in cases where it is deemed important to get the viewpoints/measurements of all segments of
the population, the population is first stratified (i.e. divided into groups with seemingly similar
characteristics) and then individual members are chosen randomly out of each group. E.g. a sample of
student opinions at this school may first stratify students by form level and/or gender before picking
students to be surveyed.
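The Python sketch below illustrates stratified random sampling. The notes only say that members are chosen randomly out of each group; the choice of proportional allocation (each stratum gets a share of the sample equal to its share of the population) is an assumption of this example, as are the hypothetical form sizes.

```python
import random

def stratified_sample(strata, n, rng=random):
    """Stratified random sample with proportional allocation (an assumption).

    strata maps a stratum name (e.g. a form level) to the list of its members.
    Each stratum gets round(n * stratum_size / population_size) places, and those
    members are chosen by simple random sampling within the stratum.
    Note: rounding can make the total differ slightly from n.
    """
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(n * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical school of 600 students stratified by form level.
strata = {
    "1st form": [f"1-{i}" for i in range(1, 121)],    # 120 students
    "2nd form": [f"2-{i}" for i in range(1, 181)],    # 180 students
    "Lower 6th": [f"L6-{i}" for i in range(1, 301)],  # 300 students
}
result = stratified_sample(strata, 60, random.Random(0))
for name, chosen in result.items():
    print(name, len(chosen))
```

With a sample of 60 from 600 students, the strata receive 12, 18 and 30 places, so no form level can be left out by chance.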
Systematic Random – in this method every kth element of the sample frame is chosen to be included in
the sample, where k is the population size divided by the sample size. E.g. if your population has 1000
elements and you would like a sample of 100 elements, then k = 1000 ÷ 100 = 10, so you would choose
every 10th element. The first element is chosen randomly out of the first 10 elements, and then every
10th element after that is chosen. For example, if the first element chosen randomly was the 3rd one,
then the other positions chosen would be the 13th, 23rd, 33rd, 43rd, 53rd, etc.
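The worked example above can be sketched in Python as follows. The sampling interval is taken as k = N // n; when N is not an exact multiple of n, this sketch simply keeps the first n positions, which is one of several possible conventions and an assumption here.

```python
import random

def systematic_sample(population_size, n, start=None, rng=random):
    """Return the positions (numbered from 1) chosen by systematic random sampling.

    k = population_size // n is the sampling interval. The starting position is
    chosen at random from 1..k (unless given), then every kth position follows.
    """
    k = population_size // n
    if start is None:
        start = rng.randint(1, k)
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    return list(range(start, population_size + 1, k))[:n]

positions = systematic_sample(1000, 100, start=3)
print(positions[:6])   # [3, 13, 23, 33, 43, 53]
print(len(positions))  # 100
```

Only one random number (the starting position) is needed, which is the main practical attraction of this method.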
Cluster – this method is used when it is difficult to create an exhaustive list of the sample units. If we
were seeking the opinions of churchgoers in Barbados, it may not be practical to list all members of all
church organisations before starting to choose persons for the sample. Instead, you can create clusters
(e.g. church denominations) and then choose clusters using one of the methods discussed above. Either
all of the sample units in each chosen cluster are studied (one-stage), or units are chosen randomly from
within each chosen cluster (two-stage).
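A minimal Python sketch of one-stage and two-stage cluster sampling. The clusters and their members are hypothetical, and simple random sampling is assumed both for choosing the clusters and, in the two-stage case, for choosing members within each chosen cluster.

```python
import random

def cluster_sample(clusters, num_clusters, per_cluster=None, rng=random):
    """Cluster sampling.

    clusters maps a cluster name (e.g. a denomination) to its members.
    First, num_clusters clusters are chosen by simple random sampling.
    One-stage (per_cluster is None): every member of each chosen cluster is studied.
    Two-stage: per_cluster members are chosen at random from each chosen cluster.
    """
    chosen = rng.sample(sorted(clusters), num_clusters)
    sample = {}
    for name in chosen:
        members = clusters[name]
        if per_cluster is None:
            sample[name] = list(members)
        else:
            sample[name] = rng.sample(members, min(per_cluster, len(members)))
    return sample

# Hypothetical clusters (no full list of all churchgoers is needed, only each chosen cluster's).
clusters = {
    "Denomination A": [f"A{i}" for i in range(1, 41)],
    "Denomination B": [f"B{i}" for i in range(1, 26)],
    "Denomination C": [f"C{i}" for i in range(1, 61)],
    "Denomination D": [f"D{i}" for i in range(1, 16)],
}
one_stage = cluster_sample(clusters, 2, rng=random.Random(1))
two_stage = cluster_sample(clusters, 2, per_cluster=5, rng=random.Random(1))
print({name: len(m) for name, m in one_stage.items()})
print(two_stage)
```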
Quota – the population is broken down by characteristics, and each characteristic is expected to appear
in the same proportion in the sample as in the population. E.g. you would need to know what proportion
of the population is male and female, and then, for each of these subgroups, how many persons fall into
the various age categories, ethnic groups, urban/rural areas, etc. A matrix is then created with each of
the groups in their proportions, and you simply have to find enough persons who meet the criteria to fill
each subgroup’s quota. E.g. a survey about life in Barbados may only be able to poll 318 people. If the
population is 48% male and 52% female, then you are aiming to poll 153 males (48% of 318 ≈ 152.6) and
165 females (52% of 318 ≈ 165.4). If the population is also 18.29% aged 0 – 14, 13.35% aged 15 – 24,
44.62% aged 25 – 54, 12.87% aged 55 – 64 and 10.88% aged 65 and over, then applying these
percentages within each gender (e.g. 18.29% of 153 ≈ 28 males aged 0 – 14) gives a matrix like this:
Age group      Males   Females   Total
0 – 14            28        30      58
15 – 24           20        22      42
25 – 54           68        74     142
55 – 64           20        21      41
65 and over       17        18      35
Total            153       165     318
Once the required number of persons in each gender and age category has been polled, the quotas for
this survey are satisfied.
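The quota matrix can be reproduced with a short Python sketch. It assumes each quota is rounded to the nearest whole person: first the gender quotas from the sample size, then the age quotas within each gender. This assumption reproduces every value in the table above, but in general rounding can make row or column totals differ slightly from their targets, so the totals should always be checked.

```python
def quota_matrix(sample_size, gender_pct, age_pct):
    """Build a quota matrix by rounding each quota to the nearest whole person.

    Gender quotas come from the sample size; age quotas are then taken
    within each gender. Rounding can make a total differ slightly from its
    target, so the totals should be checked.
    """
    matrix = {}
    for gender, g_pct in gender_pct.items():
        gender_quota = round(sample_size * g_pct / 100)
        matrix[gender] = {age: round(gender_quota * a_pct / 100)
                          for age, a_pct in age_pct.items()}
    return matrix

gender_pct = {"Males": 48, "Females": 52}
age_pct = {"0-14": 18.29, "15-24": 13.35, "25-54": 44.62,
           "55-64": 12.87, "65 and over": 10.88}

matrix = quota_matrix(318, gender_pct, age_pct)
for age in age_pct:
    males, females = matrix["Males"][age], matrix["Females"][age]
    print(f"{age:<12} {males:>5} {females:>7} {males + females:>6}")
print("Totals:", sum(matrix["Males"].values()), sum(matrix["Females"].values()))
```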
(ii) Electronic Random Number Generators – calculators and computer software (like Excel and
Numbers) have random number functions (Ran# and RanInt on the calculator, or RAND and
RANDBETWEEN on the computer). The Ran# and RAND functions generate a random decimal
between 0 and 1. To turn this into a sample unit number from a frame of N units, multiply it by N,
drop the decimal part and add 1; this gives a whole number from 1 to N. The RanInt and
RANDBETWEEN functions let you enter the two endpoints you are interested in, e.g.
RANDBETWEEN(100, 500) randomly generates a whole number from 100 to 500 inclusive. This is
another unbiased way of picking sample units to include in your survey.
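For comparison, here is a Python sketch of the same two approaches. Python's `random.random()` behaves like RAND (a decimal from 0 up to, but not including, 1) and `random.randint(a, b)` behaves like RANDBETWEEN (a whole number from a to b inclusive); these are analogues, not the calculator's or spreadsheet's own implementation. Skipping repeated numbers until enough different units are found is an assumed convention for sampling without replacement.

```python
import random

def unit_from_rand(u, n):
    """Convert a RAND-style number u (0 <= u < 1) into a sample-unit number from 1 to n."""
    return int(u * n) + 1  # multiply by n, drop the decimal part, add 1

def pick_units(n_units, sample_size, rng=random):
    """Generate RANDBETWEEN-style numbers from 1 to n_units, skipping repeats,
    until sample_size different units have been chosen."""
    if sample_size > n_units:
        raise ValueError("cannot choose more units than the frame contains")
    chosen = []
    while len(chosen) < sample_size:
        unit = rng.randint(1, n_units)  # both endpoints included, like RANDBETWEEN
        if unit not in chosen:
            chosen.append(unit)
    return chosen

print(unit_from_rand(random.random(), 50))  # one unit from a frame of 50
print(pick_units(50, 8))                    # 8 different units from a frame of 50
```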
ADVANTAGES & DISADVANTAGES OF SAMPLING METHODS
Simple Random
Advantages:
- Avoids the bias of the researcher
- Should represent the target population
- Can be used with populations of any size
Disadvantages:
- Requires large amounts of resources: time, effort, money and access to information
- Can lead to poor representation of the target population if the random numbers omit large areas
of the population

Stratified Random
Advantages:
- Guarantees a high degree of representativeness of all segments of the target population
- Because of this high degree of representativeness, we can have confidence that the results can be
generalised to the entire population
Disadvantages:
- Very time-consuming and tedious to set up

Systematic Random
Advantages:
- Ensures that a fairly representative sample is obtained without the need to generate many random
numbers
Disadvantages:
- It is not as random as simple random sampling
- Requires a lot of time, effort and money
- Some areas of the target population may be over- or underrepresented, especially if there is some
kind of pattern occurring in the population

Cluster
Advantages:
- Makes it possible to create a sample when there is no means of obtaining an itemised list of sample
units
- Easy, cheap and convenient to start
Disadvantages:
- Members of selected clusters may be very alike but very different from members of unselected
clusters
- It may not be easy to generalise the findings to the entire population

Quota
Advantages:
- Creates a sample that matches the target population in the characteristics used to set the quotas
- It is easier and faster to carry out, since it does not require a sample frame or specific sample units
Disadvantages:
- Because the sample units are not specified in advance, the researcher’s choice of whom to poll may
introduce bias
- For this reason, it is difficult to know how far the results can be generalised
Do Pg 430 Ex 9a Qs 3, 5, 6, 8
PAST PAPER QUESTIONS
QUESTION 1
QUESTION 2
QUESTION 4
QUESTION 5
QUESTION 6
QUESTION 7
QUESTION 8
QUESTION 9