Classification and Data Collection
1.2.2 Quantitative data
The name of this data type tells us that it refers to quantities. Quantitative data therefore consists
of variables that can be counted or measured and then operated on arithmetically; each value
provides specific information on a numerical scale.
For a sample of items, quantitative variables might be the weight of the objects in the study,
their temperature, their volume, or just about any other measurement or numerical value taken from
them.
For example, if you count the number of people having dinner at a restaurant, this would be
discrete data: first, because you are counting; second, because you cannot have fractions of people,
only whole people. Discrete data comes in the form of whole numbers (integers).
On the other hand, if you measure the time it takes for each table in the restaurant to receive what
it ordered (hopefully within an hour), you will have values containing hours, minutes, seconds, and
even fractions of a second if you want to increase precision. These values would be a set of
continuous quantitative data: first, because you measured them; second, because they can take any
value (including decimals, not just integers) within a reasonable range.
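The restaurant example can be sketched in a few lines of Python. This is only an illustration: the sample values and the helper name is_discrete are invented here, and checking for whole numbers is a rough heuristic, since a measured value can happen to be whole.

```python
# Hypothetical sample values, invented for illustration only.
diners_per_table = [2, 4, 3, 6]            # counted: whole numbers only
wait_minutes = [12.5, 30.25, 18.0, 45.75]  # measured: any value in a range

def is_discrete(values):
    """Rough check: True when every value is a whole number, as counts are."""
    return all(float(v).is_integer() for v in values)

# A sum of counts is still a whole number.
total_diners = sum(diners_per_table)

# An average of measurements is generally fractional.
mean_wait = sum(wait_minutes) / len(wait_minutes)
```

Here is_discrete(diners_per_table) is true and is_discrete(wait_minutes) is false, which matches the counting versus measuring distinction drawn above.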
Notice that from the four examples of quantitative variables listed above, the first two are examples
of discrete variables, while the third and fourth are examples of continuous variables.
The main difference between qualitative and quantitative data is that you can count or measure a
quantitative variable, while you can only describe or define the characteristics of a qualitative
variable. Qualitative and quantitative research therefore yield quite different information:
qualitative data describes the conditions of the study's subject (such as the type of object, its
color, shape, state, etc.), whereas quantitative data provides amounts that have been counted or
measured from the study's subject (such as how many incidences of an
Lecture Notes@simeyoous@yahoo.com
event per unit of time, quantities such as weight, height, length and mass, and, if the subject is
moving, how fast it is moving and whether it is accelerating).
NOTE
Numbers can also be used as a qualitative variable. Take, for example, your driver's license number
or your student ID number. Can you operate with them? No, but they are still numbers, right? So how
are they qualitative?
These are numbers assigned to a particular person or thing (for example, a car's license plate
number), so they actually act like a name. For example, if you receive a student ID number of
012345, it means that your school's system has you as "file 012345", and you could very well say
"I am 012345"; that would serve as another name identifying you to the school system. Thus, a
student ID number is a label (a quality) assigned to you. You cannot measure it, count it or
operate on it, and so it cannot be quantitative.
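A small Python sketch makes the same point. It assumes the ID is stored as text, which is a design choice made for this illustration and not something the notes prescribe.

```python
# A student ID is stored as text because it is a label, not a quantity.
student_id = "012345"  # the example ID from the note above

# Treating the label as a number silently drops the leading zero,
# so the result no longer matches the label on file.
as_number = int(student_id)
round_trip = str(as_number)

# The only meaningful operation on a label is checking identity.
same_student = (student_id == "012345")
```

After the conversion, round_trip is "12345", which is no longer the student's ID. Arithmetic on such values (adding two IDs, averaging them) would run without error but would mean nothing.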
The note above leads to an important concept for distinguishing quantitative from qualitative data:
the four levels of measurement of statistical data.
In order from lowest to highest, the four levels are nominal, ordinal, interval and ratio.
1.3.1 Nominal
The word nominal comes from the Latin for "name"; the nominal level of measurement therefore
refers to names, labels or qualities. The values carry no numerical meaning, so this is the lowest
level of measurement: statistical variables at the nominal level can be sorted into groups, but they
cannot be arranged in any meaningful order.
1.3.2 Ordinal
The term ordinal refers to the order of items in a list or series, and thus, this particular level of
measurement refers to statistical data that can be ranked or ordered in a meaningful manner.
One point about ordinal data is very important: this type of data provides enough information to
arrange the items in a particular order, but it does not provide any information about numerical
values that the items might have. For example, if you were asked to rank a girl, her mother and her
grandmother from oldest to youngest, you could do it without a problem, because this data has an
ordinal level of measurement (we can automatically infer that the grandmother is the oldest and the
girl the youngest, because that is how nature works). However, you have no information about their
ages, so you have no numerical information about them.
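The family example can be sketched in Python with an ordered enumeration. The class name Generation and the rank codes 1 to 3 are assumptions made for this illustration; the codes only encode order and say nothing about actual ages.

```python
from enum import IntEnum

class Generation(IntEnum):
    """Rank codes: only their order is meaningful, not their size."""
    GIRL = 1
    MOTHER = 2
    GRANDMOTHER = 3

family = [Generation.MOTHER, Generation.GIRL, Generation.GRANDMOTHER]

# Ordinal data can be sorted, here from oldest to youngest.
oldest_to_youngest = sorted(family, reverse=True)
names = [member.name for member in oldest_to_youngest]
```

The sort works, but the gap between two codes (for instance 3 minus 1) is not an age difference. That missing numerical information is exactly what separates the ordinal level from the interval and ratio levels.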
Data with a nominal or ordinal level of measurement is qualitative data.
On the other hand, quantitative data can have either an interval or ratio level of measurement.
1.3.3 Interval
The interval level of measurement of a variable refers to a specified space, defined as a scaled
space where the zero is the origin (the reference point of the chosen system). In simple words, a
variable has an interval level of measurement when it belongs to a scaled range, be it a physical
space such as a Euclidean coordinate system, or a conventional scale such as the Celsius
temperature scale.
For statistical data with an interval level of measurement, the zero entry represents a position on
the particular scale, not an inherent value. For example, a substance with a temperature of zero
degrees Celsius is not a substance with no heat; the zero point of the scale was chosen because it
is the freezing point of water.
Notice that data with an interval level of measurement is similar to data with an ordinal level of
measurement in that both can be ordered and ranked. The main difference is that interval data
contains precise numerical information in each of its values.
1.3.4 Ratio
The ratio level of measurement in statistical data is similar to the interval level, with the
essential difference that this kind of data has an inherent zero: the value of zero actually exists
as a quantity, so a variable equal to zero means "no quantity" or simply "none".
The term ratio is used because quantities are expressed as the ratio of the magnitude of a
particular quantity to the established unit of that scale. In simple terms, the ratio level of
measurement can be thought of as "how many" or "how much" of a particular quantitative variable,
where a value of zero truly means none (this is what is called an inherent zero).
Remember that the Celsius temperature scale was used as an example of the interval level of
measurement. The Fahrenheit scale is also at the interval level; the Kelvin scale, on the other
hand, is not. The zero of the Kelvin scale is absolute zero, meaning there is no kinetic energy
(heat) in a body at that temperature (which, by the way, is unobtainable); therefore, the Kelvin
temperature scale has a ratio level of measurement.
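The difference between the two temperature scales can be checked numerically. The sketch below uses two sample temperatures chosen for illustration and the standard conversion K = °C + 273.15; the function name celsius_to_kelvin is introduced here.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin (0 degrees C is 273.15 K)."""
    return celsius + 273.15

cool_c, warm_c = 10.0, 20.0  # sample temperatures, chosen for illustration

# Differences are meaningful on an interval scale and agree on both scales.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios depend on where zero sits, so they are only meaningful in Kelvin.
ratio_c = warm_c / cool_c
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)
```

Both differences come out as 10 degrees, but ratio_c is 2.0 while ratio_k is only about 1.035. A body at 20 °C is therefore not "twice as hot" as one at 10 °C; the factor of 2 is an artefact of the arbitrary Celsius zero.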
Example 1
Determine whether each of the following is quantitative or qualitative data:
1. The marks that students get in a test.
2. The genders of newborn babies.
3. The area codes in phone numbers.
4. The heights of buildings.
Solution
1. The marks that students get in a test.
Answer: Quantitative.
Test marks are numerical values that can be compared and that have an intrinsic value on a
scale; they are not labels. Therefore, this is quantitative data.
2. The genders of newborn babies.
Answer: Qualitative.
Gender is a descriptive variable, so a study gathering this kind of information would be
collecting qualitative data.
3. The area codes in phone numbers.
Answer: Qualitative.
Although area codes are numbers, they are labels assigned to particular geographical areas.
They can be sorted numerically, but nothing can be counted or measured using the numerical
symbols in them; thus, they are qualitative data.
4. The heights of buildings.
Answer: Quantitative.
The heights of buildings are numerical values that can be measured. Moreover, they can take any
value within a reasonable range, so this is continuous quantitative data.
Example 2
Determine whether each of the following is discrete or continuous data:
1. The number of customers visiting a store over a weekend.
2. The amount of water consumed by a country over the past 10 years.
3. The outcomes of rolling a 6-sided die ten times.
4. The heights of trees in a rainforest.
5. Students' shoe sizes in a class.
Solution
1. The number of customers visiting a store over a weekend.
Answer: Discrete.
You can count the number of customers in the store and the resulting quantities are whole
numbers.
2. The amount of water consumed by a country over the past 10 years.
Answer: Continuous
The amount of water can be measured and the resulting value will probably contain decimals,
not just integers, since it requires higher levels of precision than just whole numbers.
3. The outcomes of rolling a 6-sided die ten times.
Answer: Discrete
There are just 6 possible outcomes of rolling a 6-sided die; since the possible outcomes are a
finite number that can only be expressed in whole numbers, this is discrete data.
4. The heights of trees in a rainforest.
Answer: Continuous
Measuring the heights of trees will result in values containing decimals, and these can be any
value within the range of possible heights for a tree species.
5. Students' shoe sizes in a class.
Answer: Discrete
Shoe sizes can take only certain fixed values, with nothing in between, so they are discrete
rather than continuous.
Example 3
Identify the level of measurement used in the following scenarios (nominal, ordinal, interval or
ratio):
1. A study of the causes of death in a country.
2. A study that wants to find out the relationship between the amount of time students spend
preparing for an exam and the marks they get in it.
3. A survey that tries to find out how people rank the importance of safety, price, speed and
comfort when they are buying cars.
4. A study of how humidity in the air changes over the year in a city.
Solution
1. A study of the causes of death in a country.
Answer: Nominal
This data comes from qualitative research that describes what is causing deaths in the
population of a country. Each value is therefore only a labeled category, so the level is
nominal.
2. A study that wants to find out the relationship between the amount of time students spend
preparing for an exam and the marks they get in it.
Answer: Ratio
This data comes from quantitative research, and the values gathered can include an inherent
zero. Simply put, a student could have spent no time studying; therefore, this data has a ratio
level of measurement.
3. A survey that tries to find out how people rank the importance of safety, price, speed and
comfort when they are buying cars.
Answer: Ordinal
This data can be ranked without having numerical values for each item; therefore, its level of
measurement is ordinal.
4. A study of how humidity in the air changes over the year in a city.
Answer: Interval
The key to determining the level of measurement here is to observe that the data is collected as
numerical values (so it is quantitative data) belonging to a particular range of values, an affine
space, where the zero is not included; this data therefore has an interval level of measurement.
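The reasoning used across these examples can be summarised as a short decision function. This is a sketch only: the function name, its three yes/no parameters and the variable names are introduced here for illustration, and deciding the answers to those three questions for real data still requires judgment.

```python
def level_of_measurement(can_be_ordered, has_numeric_values, has_inherent_zero):
    """Return the level of measurement implied by three yes/no properties."""
    if not can_be_ordered:
        return "nominal"   # labels only
    if not has_numeric_values:
        return "ordinal"   # ranked, but no numerical values
    if not has_inherent_zero:
        return "interval"  # numerical, but zero is just a position on the scale
    return "ratio"         # numerical, and zero means "none"

# Cases taken from the text above.
causes_of_death = level_of_measurement(False, False, False)
car_priorities = level_of_measurement(True, False, False)
celsius_temperature = level_of_measurement(True, True, False)
study_time = level_of_measurement(True, True, True)
```

The four calls return "nominal", "ordinal", "interval" and "ratio" respectively, in line with the answers and explanations given in the notes.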
Data collection is an important aspect of any type of research study. Inaccurate data collection can
distort the results of a study and ultimately lead to invalid conclusions.
Data collection methods for impact evaluation vary along a continuum. At one end of this
continuum are quantitative methods, and at the other end are qualitative methods.
Primary data is data collected directly, or observed for the first time, by the researcher for a
specific purpose. It is original in nature.
Primary data is collected through various methods and these may include:
(a) Questionnaire
This is a research instrument consisting of a sequence of questions and other prompts for the
purpose of collecting information from respondents. The questionnaire translates the research
objective into specific questions. The answers to these questions provide relevant data for drawing
inferences. The questions relate to the problem of inquiry directly or indirectly. The questionnaire
should include a note briefly explaining the aims and objectives of the inquiry. The researcher
should guarantee the confidentiality of the information and of identifying details, such as the
respondent's name, where required.
• They usually have standardized answers that make it simple to compile data.
• They do not require as much effort from the questioner as verbal or telephone surveys.
• Very economical in terms of time, energy and money. Widely used when the scope of
inquiry is large.
• Data collected by this method is not affected by the personal bias of the researcher.
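To illustrate why standardized answers are simple to compile, here is a minimal Python sketch that tallies the responses to one closed question. The response list is invented for illustration.

```python
from collections import Counter

# Hypothetical responses to one closed question, invented for illustration.
responses = ["yes", "no", "yes", "yes", "no", "yes"]

# Because every answer is one of a fixed set of options, compiling the data
# reduces to counting how often each option occurs.
tally = Counter(responses)
total = sum(tally.values())
percent_yes = 100 * tally["yes"] / total
```

With these six responses the tally is 4 "yes" and 2 "no". Open-ended answers would first have to be read and coded into categories before any such count is possible.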
• Use clear and comprehensive wording that is understandable at all educational levels.
• Use correct grammar in the wording.
• Use statements that will be interpreted in the same way by respondents from different
sample areas of the same population.
• Do not make assumptions about the respondents.
• Keep questions impersonal and non-aggressive.
• Use positive statements and avoid negative ones.
• Avoid items that contain more than one question, for example, "Do you give your children
fruits and meat?"
• Cover only one aspect of the construct you are interested in per item.
• Anticipate receiving open-ended answers from the respondents; the questions should therefore
be a mix of multiple choice, simple alternative and open ended. In simple alternative
questions, the respondent chooses between alternatives such as "yes" or "no", or "true" or
"false". In multiple-choice questions, the respondent chooses the best alternative from the
ones that are given. In open-ended questions, respondents are given maximum freedom in
answering, for example, "What are the causes of corruption?"
• Use statements to which respondents with different opinions or traits will give
different responses.
(b) Survey
Surveys are often used when information is sought from a large number of people or on a wide
range of topics (where in-depth responses are not necessary). They can contain yes/no, true/false,
multiple choice, scaled, or open-ended questions — or all of the above. The same survey can be
conducted at spaced intervals to measure change over time.
Some of the advantages of surveys are that respondents can answer questions in their own time
and may answer more honestly, since questionnaires provide anonymity (whether real or perceived).
And while the responses may be biased on the part of the participant, they are free from the
collector's bias.
Advantages of statistical survey;
Disadvantages of surveys
• Misinterpretation of data results.
• Inappropriate use of data analysis procedures.
The data is collected through direct personal interviews. With this method, the researcher directly
contacts the respondents, solicits their cooperation and records the data.
(c) Observation
At its simplest, observation involves 'seeing' things (such as objects, processes, relationships
and events) and formally recording the information. There are different types of observation.
Structured or direct observation is a process in which observations are recorded against an agreed
checklist. Expert observation is usually carried out by someone with specific expertise in an area
of work, and involves the expert observing and recording information on a subject. Observation may
also be carried out as a participatory exercise. In this case, the intended beneficiaries of a
project or programme are involved in planning the observation exercise, observing, and discussing
the findings.
(d) Interview
This involves asking individuals for the required information. There are two types of research
interviews:
• One-to-one interviews: these may be either face-to-face or telephone interviews.
• One-to-many interviews.
Disadvantages of interviewing
Inaccurate or false data may be given to the interviewer. This may be due to:
i) misunderstanding the question,
ii) forgetfulness or
iii) deliberate intent to mislead.
If a number of interviewers are employed, they may not record the answers in the same way as the
investigator himself would.
Focus group discussions (FGDs) are facilitated discussions, held with a small group of people who
have specialist knowledge or interest in a particular topic. They are used to find out the perceptions
and attitudes of a defined group of people. FGDs are typically carried out with around 6-12 people,
and are based around a short list of guiding questions, designed to probe for in-depth information.
FGDs are often used to solicit the views of those who would not be willing or able to speak up at
larger group meetings. They may also be used to access the views of minority or disadvantaged
groups, such as women, children or people with disabilities.
Secondary data is processed data that has already been collected and is readily available from
other sources. Information that has been collected by other people or organisations for their own
purposes can be used for planning, monitoring or evaluation. Secondary data might include
government statistics, NGO reports, newspaper or website articles, hospital records, research
studies, evaluations conducted by other agencies, and community records, to name just a few.
Secondary data is often a valuable source of information that can supplement other forms of
data collection.
For secondary data to be reliable, the following requirements must be satisfied:
• Relevance of the data. The data should satisfy the requirements of the problem under
investigation; that is, the concepts used must be the same, the data should not be outdated,
and the units of measurement must be the same.
• Accuracy of the data. To establish how accurate the data is, the following must be
considered: i) the specifications and methodology used; ii) the margin of error; and iii) the
dependability of the source.
• Availability of the data. It must be established whether the kind of data under investigation
is available. If it is not, primary data is used instead.
• Sufficiency of the data. Adequate data must be available.
• It is economical; it saves expense and effort, since it is obtainable from other sources.
• It is time saving, since it is more quickly obtainable than primary data.
• It provides a basis for comparison for the data that is collected by the researcher.
• It helps to make the collection of primary data more specific, since, with the help of
secondary data, one is able to identify the gaps and inefficiencies, so that the additional or
missing information may be collected.
• Government autonomous bodies, such as KEB, KRA, Central Bank, KIPPRA, KBS, etc., collect
and publish statistical data on examination results, taxation, inflation rate, development
and environment, respectively.
• International publications. International agencies publish regular reports of international
importance. These agencies include the International Monetary Fund (IMF), the International
Labour Organization (ILO), the World Meteorological Organization, etc.
• Private publications. Some private sector bodies also publish reports, for example, Rwanda
Exchanges Board, NGOs, etc.
• Newspapers and magazines. Various newspapers and magazines carry statistical information
on social and economic aspects. Some of these include The Standard, Monitor, Kenya Times,
Daily Nation, Taifa Leo, Business Magazine, The Observer, Bridal Magazine, etc.
• Archives. Libraries have historical data of significance.
• Internet websites.
NOTE
A document analysis guide is the tool used to collect secondary data. It contains guidelines on the
data items to be collected. These data items are filled in while working with the source documents
in the field.
Direct personal observation, as discussed under primary data, can also be used to collect secondary
data.