Chapter 2 Methods of Data Collection and Presentation
Set by G.F 1
2.1.1 Sources of data
Depending on the source, data can be classified as Primary or
Secondary data.
1. Primary Data
Data measured or collected by the investigator or the user directly
from the source.
Primary data refers to first-hand data gathered by the
researcher.
The data are gathered for the first time by the researcher for a given purpose
Example:
b) Measuring: there are different options
Focus Group discussion
Telephone Interview
Mail Questionnaires
Door-to-Door Survey
New Product Registration
Personal Interview and
Experiments are some of the sources for collecting the
primary data
2. Secondary Data
Secondary data means data collected by someone else earlier
It is important for analyzing past situations
It is obtained from published and unpublished sources, such as
books, survey reports (journals or periodicals), official records,
newspapers, etc.
Journals and periodicals are becoming very important as far as secondary data
collection is concerned, because they provide up-to-date information that
books may not hold. They give specific information about a topic and are
useful for the researcher.
Journals and periodicals have an advantage over books: while books give
general information about many topics, journals and periodicals cover
specific topics in detail, which is why they are more helpful
(1) Data reliability
The reliability of the data should be assessed by asking questions
such as:
who collected the data,
what were the sources of the collected data,
when were the data collected,
what methods were used to collect them,
what level of accuracy was achieved, and
whether there is any bias on the part of the compiler.
(2) Suitability of the data
The researcher should carefully check the terms (variable types), the
units of collection, and the time at which the data were collected
Check that the data are compatible with the present study problem
Check the nature and classification of the data
(3) Data sufficiency
Check whether the scope of the secondary data is narrower or wider
than that of the current study
Check that there are no biases or misreporting in the published data.
Sources of secondary data
b) Magazines/Journals or Periodicals
Provide more up-to-date data than books
Provide specific information about a topic
b) General websites
• They may not contain reliable information
• Some authentic websites provide citations and a bibliography
for every quote made on the website, for example Wikipedia
c) Weblogs
They are the records kept in the form of a video or audio or written
format.
d) Blogs
• Some blogs are considered authentic while others are much less so
Difference between primary & secondary data
Primary data                            | Secondary data
Directly collected by the investigator  | Previously collected by another person
Less prone to error                     | More prone to error
Correction is possible                  | Cannot be corrected
Takes time, cost and labor              | Does not take much time, cost or labor
2.1.2 Data collection methods
A questionnaire is the main data collection instrument in a
formal survey
Depending on the amount of freedom given to a respondent in
offering responses, there are two basic types of questions:
1. Open-ended questions and
2. Closed-ended questions
The type of questions for use will be determined by
the form of responses wanted,
the nature of the respondents and
their ability to answer the questions
1) Open-ended questions: allow the respondent to answer freely in
his or her own words
Example: What do you think are the reasons for a high drop-out rate of
village health committee members?
Step 1: Content
Decide what questions will be needed to measure/define your
variables and reach your objectives.
When developing the questionnaire, you should reconsider the
variables you have chosen, and, if necessary, add, drop or change
some
Step 2: Formulating Questions
Formulate one or more questions that will provide the information
needed for each variable
Take care that questions are specific and precise enough that
different respondents do not interpret them differently
Step 3: Sequencing of questions
Design the interview schedule/questionnaire to be respondent
friendly
The sequence of questions must be logical and allow natural
discussion, even in more structured interviews
Pose more sensitive questions as late as possible in the
interview
Step 4: Formatting the questionnaire
When you finalize your questionnaire, be sure that
Each questionnaire has a heading and space to insert the number,
date and location of the interview and, if required, the name of the
informant/data collector
Layout is such that questions belonging together appear together
visually
Sufficient space is provided for answers to open-ended questions
Step 5: Translation
If the interview will be conducted in one or more local
languages, the questionnaire should be translated into those languages
After translating it, you should have it retranslated
into the original language
Data collection methods
1) Interview (face-to-face, mailed or telephone)
A meeting between an interviewer and an interviewee.
5) Records and documents: extracting data from existing documents
The documents can be:
internal to an organization (such as emails, sales reports, records of
customer feedback, activity logs, purchase orders, etc.)
external (such as Government reports)
Pros and cons of different data collection methods
1) Interview (via face-to-face or video conferencing tools)
Advantages
1) Accurate: the interviewee is less likely to provide false information
2) The interviewer can capture raw emotions, tone, voice, and word
choices to gain a deeper understanding
3) Interviewers can ask follow-up questions and request additional
information to understand attitudes, motivations, etc.
4) Participants do not need to be able to read and write to respond
Disadvantages
i. High cost, as this method requires a staff of people to perform the
interviews.
ii. The quality of the collected data depends on the ability of the
interviewer to gather data well.
iii. A time-consuming process that involves transcription, organization,
reporting, etc.
iv. Doesn’t give opportunity to probe and explore
Relatively inflexible
Less reliable to assess behavior and attitude of respondents
2) Surveys and Questionnaires
Advantages
ii. Easily accessible and can be deployed via many online channels
like web, mobile, email, etc.
iv. Easy to analyze and present with different data visualization types
Disadvantage
i. Survey fraud. Answers may not be honest as some people answer
online surveys just to receive a promised reward.
ii. Many questions might be left unanswered and participants may
not stay fully engaged to the end.
iii. Participants may have different interpretations of the questions.
iv. Cannot fully capture emotions and feelings.
3) Focus group discussion
Advantages
i. Easy to measure the reaction of customers to your brand, products,
or marketing campaigns.
ii. The moderator can ask questions to gain a deeper understanding
of the respondents’ emotions.
iii. The moderator can observe non-verbal responses, such as body
language or facial expressions.
iv. Provide brainstorming opportunities and participants can create
new ideas.
Disadvantages
i. Participants may not give honest answers on sensitive topics
ii. Requires a strong facilitator
iii. Doesn’t give quantitative information
iv. It is difficult to organize the discussion
4) Observation
Advantage
i. Simple to collect data. Observation does not require technical skills
from the researcher.
ii. Allows for a detailed description of behaviors, intentions, and
events.
iii. Provide accurate information: The observer can view participants in
their natural environment and directly check their behavior.
iv. Doesn’t depend on people’s willingness to report. Some
respondents don’t want to speak about themselves or don’t have time
for that.
Disadvantage
i. Can take a lot of time if the observer has to wait for a particular
event to happen
ii. Cannot study attitudes and opinions
iii. Liable to subjective observational bias. The personal view of the
observer can be an obstacle to drawing valid conclusions.
iv. Expensive method. It requires a high cost, effort, and plenty of
time.
v. Situations of the past cannot be studied
6) Secondary data
Advantage
i. Ease of data collection, as it needs fewer resources (labor and cost)
and less time
ii. No need to search for and motivate respondents to participate in
the study
iii. Allows tracking the history of events or progress. For example, you may
want to find out why there are many negative reviews from your
customers about your products. In this case, you can look at
recorded customer feedback.
Disadvantage
i. Information may be outdated or inapplicable
ii. Time-consuming
iii. No knowledge on the accuracy of data collection
iv. Less likely to give qualitative information
2.2 METHODS OF DATA PRESENTATION
Classification is the process of arranging data into classes or
categories according to similarities
Classification is a preliminary step that prepares the ground for proper
presentation of data.
The main purpose of classification is to divide the data into
homogeneous groups or classes
3. Multi-way (higher-order) table: classifies data based on more than two characteristics
The presentation of data is broadly classified into two categories:
The reasons for constructing a frequency distribution
Exercise 2.1
1. Twelve people were asked which sandwiches they had bought from
a sandwich shop. Their answers were:
Given these answers,
a) Prepare a categorical frequency distribution
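A categorical frequency distribution simply counts how often each category occurs. The sketch below illustrates the idea in Python. The list of answers is hypothetical (the exercise data are not reproduced here), and the function name `categorical_frequency` is chosen only for illustration.

```python
from collections import Counter

# Hypothetical answers from twelve customers (NOT the exercise data).
answers = [
    "cheese", "chicken", "tuna", "cheese", "egg", "chicken",
    "cheese", "tuna", "chicken", "cheese", "egg", "chicken",
]

def categorical_frequency(values):
    """Return rows of (category, frequency, percent), sorted by category name."""
    counts = Counter(values)
    total = len(values)
    return [(cat, n, 100 * n / total) for cat, n in sorted(counts.items())]

table = categorical_frequency(answers)
for category, freq, percent in table:
    # One row per category: name, frequency, relative frequency in percent.
    print(f"{category:<10}{freq:>5}{percent:>8.1f}%")
```

The frequencies always add up to the number of observations, which is a quick check on any hand-made table.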
2. Ungrouped frequency distribution
It is used for a small number of observations
It is constructed by listing all individual values in the dataset in ascending
order along with the number of times each value occurs
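The construction described above can be sketched in a few lines of Python. The data set is hypothetical and the function name `ungrouped_frequency` is an illustrative choice, not part of the original material.

```python
from collections import Counter

# Hypothetical small data set, e.g. a count reported by ten respondents.
data = [2, 0, 1, 3, 1, 2, 1, 0, 2, 1]

def ungrouped_frequency(values):
    """List each distinct value in ascending order with its frequency."""
    counts = Counter(values)
    return [(value, counts[value]) for value in sorted(counts)]

distribution = ungrouped_frequency(data)
for value, freq in distribution:
    print(value, freq)
```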
Exercise 2.2
A class of 30 students were asked how many brothers and sisters they
have. Here are the results.
Exercise 2.3
1. A survey is carried out to test the manufacturer’s claim that there
are ‘about 36 chocolate buttons in each packet’. The number of
buttons in each of 25 packets was counted as follows:
3. Grouped frequency distribution
Class width
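As a rough illustration of how a class width can be chosen, the sketch below assumes a common textbook convention that is not confirmed by the text here: the number of classes k comes from Sturges' rule, k = 1 + 3.322 log10(n), and the class width is the range divided by k, rounded up. The data and the function name `grouped_frequency` are hypothetical.

```python
import math

def grouped_frequency(values, num_classes=None):
    """Group integer data into equal-width classes.

    Assumption: when num_classes is not given, Sturges' rule
    k = 1 + 3.322 * log10(n) is used, rounded to the nearest integer.
    Returns (class width, list of (lower limit, upper limit, frequency)).
    """
    n = len(values)
    if num_classes is None:
        num_classes = round(1 + 3.322 * math.log10(n))
    low, high = min(values), max(values)
    # Class width: range divided by the number of classes, rounded up
    # (at least 1 so the loop below always advances).
    width = max(1, math.ceil((high - low) / num_classes))
    classes = []
    lower = low
    while lower <= high:
        upper = lower + width - 1  # inclusive upper class limit
        freq = sum(lower <= v <= upper for v in values)
        classes.append((lower, upper, freq))
        lower += width
    return width, classes

# Hypothetical ages of 20 people (not the exercise data).
ages = [18, 22, 25, 27, 31, 33, 34, 36, 38, 40,
        41, 43, 45, 47, 50, 52, 55, 58, 61, 65]
width, classes = grouped_frequency(ages)
for lower, upper, freq in classes:
    print(f"{lower}-{upper}: {freq}")
```

When the range is an exact multiple of the width, this loop produces one class more than k so that the largest value is still covered.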
Exercise 2.4
A fitness club carries out a survey to find out the ages of its members.
Here are the results.
2. Diagrammatic and Graphic presentation of data
These are techniques for presenting data in visual displays using
geometric figures and pictures
Importance:
• They have greater attraction.
• They facilitate comparison.
• They are easily understandable.
The three most commonly used diagrammatic presentations for
discrete as well as qualitative data are pie charts, pictograms and bar
charts
1. Pie chart
A Pie Chart is a circular chart divided into sectors, illustrating relative
magnitudes or frequencies of classes of a given variable.
Pie chart usually represents categorical data but it is also possible to
use it for discrete quantitative data.
The angle of each sector has to be proportional to the relative
frequency of a given class, which is:
$$\text{Angle} = \frac{\text{value of the part}}{\text{value of the whole quantity}} \times 360^\circ$$
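The sector-angle rule (part divided by whole, times 360 degrees) can be checked with a short Python sketch. The class frequencies and the function name `pie_angles` are hypothetical and serve only to illustrate the arithmetic.

```python
def pie_angles(frequencies):
    """Map each class to its sector angle in degrees: part / whole * 360."""
    whole = sum(frequencies.values())
    # Multiply before dividing to keep the arithmetic exact for these integers.
    return {label: value * 360 / whole for label, value in frequencies.items()}

# Hypothetical class frequencies.
frequencies = {"A": 10, "B": 20, "C": 30}
angles = pie_angles(frequencies)
for label, angle in angles.items():
    print(f"{label}: {angle:.1f} degrees")
```

The angles of all sectors should always add up to 360 degrees.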
2. Pictogram
It presents data with the help of pictures
The magnitude of each quantity is shown
using pictures that approximately depict the
variable
Example: The following table shows the orange production in a
plantation for production years 1990-1993. Represent the data by a
pictogram
Solution:
1. Simple Bar Chart
Simple bar charts are used to display data of one categorical variable
The bars are thick lines (narrow rectangles) having the same
breadth.
The magnitude of a quantity is represented by the height
or length of the bar
Example: The following data represent the sales by product, 1957-
1959, of a given company for three products A, B and C.
Solution:
2. Component Bar chart
3. Multiple Bar charts
b) Graphical Presentation of data
The histogram, frequency polygon and cumulative frequency
graph (ogive) are the most commonly applied graphical
representations for continuous data.
Procedures for constructing statistical graphs
Draw and label the X and Y axes.
Choose a suitable scale for the frequencies/cumulative
frequencies and label it on the Y axis.
On the X axis, represent
the class boundaries for the histogram/ogive
the midpoints for the frequency polygon.
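The X-axis values used in these procedures can be computed from the class limits. The sketch below assumes the usual convention (not spelled out in the text here) that a class boundary lies halfway between the upper limit of one class and the lower limit of the next, and that the midpoint is the average of the two class limits. The class limits and the function name `boundaries_and_midpoints` are hypothetical.

```python
def boundaries_and_midpoints(class_limits):
    """For each (lower, upper) class, return (lower boundary, upper boundary, midpoint).

    Assumption: boundaries lie halfway across the gap between consecutive
    classes; with a single class a gap of 1 unit is assumed.
    """
    if len(class_limits) > 1:
        half_gap = (class_limits[1][0] - class_limits[0][1]) / 2
    else:
        half_gap = 0.5
    rows = []
    for lower, upper in class_limits:
        rows.append((lower - half_gap, upper + half_gap, (lower + upper) / 2))
    return rows

# Hypothetical class limits.
limits = [(10, 19), (20, 29), (30, 39)]
rows = boundaries_and_midpoints(limits)
for lower_b, upper_b, mid in rows:
    print(lower_b, upper_b, mid)
```

The boundaries are the edges of the histogram bars, and the midpoints are the x-coordinates of the frequency polygon.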
If we want to draw a histogram for these data, it would look like this:
b) Frequency Polygon
A frequency polygon depicts a frequency distribution for discrete or
continuous numeric data.
It is used to understand the shape of a distribution.
It is drawn by placing the midpoints on the x-axis and the frequencies on the
y-axis
A histogram can easily be changed to a frequency polygon by joining
the midpoints of the tops of the adjacent rectangles of the histogram
with a line.
It is also possible to draw Frequency Polygon without drawing
Histogram.
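Drawing a frequency polygon without a histogram only requires the list of (midpoint, frequency) points. The sketch below also closes the polygon on the X axis by adding a zero-frequency class at each end, which is a common convention assumed here rather than something stated in the text. The class limits, the frequencies and the function name `frequency_polygon_points` are hypothetical.

```python
def frequency_polygon_points(class_limits, frequencies):
    """Return the (x, y) points of a frequency polygon for equal-width classes."""
    # Distance between the lower limits of consecutive classes.
    width = class_limits[1][0] - class_limits[0][0]
    midpoints = [(lower + upper) / 2 for lower, upper in class_limits]
    points = list(zip(midpoints, frequencies))
    # Assumed convention: add a zero-frequency class on each side
    # so that the polygon starts and ends on the X axis.
    return [(midpoints[0] - width, 0)] + points + [(midpoints[-1] + width, 0)]

# Hypothetical grouped data.
limits = [(10, 19), (20, 29), (30, 39)]
freqs = [3, 7, 5]
polygon = frequency_polygon_points(limits, freqs)
print(polygon)
```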
Example: the following Frequency Distribution represents the ages (in
years) of 60 merchants at a psychiatric counseling center.
c) Ogive (cumulative frequency polygon)
Example: These data represent the record high temperatures for each
of the 50 states. Construct a less-than ogive for the data given
below.
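A less-than ogive plots each class boundary against the number of observations below it. The sketch below shows how the points could be computed. The boundaries and frequencies are hypothetical (they are not the temperature data of the example), and the function name `less_than_ogive_points` is illustrative.

```python
from itertools import accumulate

def less_than_ogive_points(boundaries, frequencies):
    """Pair each class boundary with the cumulative frequency below it.

    `boundaries` must contain one more value than `frequencies`:
    the lowest lower boundary followed by every upper class boundary.
    """
    if len(boundaries) != len(frequencies) + 1:
        raise ValueError("need exactly one more boundary than classes")
    # Nothing lies below the first boundary, so the curve starts at 0.
    cumulative = [0] + list(accumulate(frequencies))
    return list(zip(boundaries, cumulative))

# Hypothetical grouped data with 50 observations in total.
boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5]
frequencies = [2, 8, 18, 13, 9]
points = less_than_ogive_points(boundaries, frequencies)
for boundary, cum_freq in points:
    print(f"less than {boundary}: {cum_freq}")
```

The last cumulative frequency must equal the total number of observations.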
Solution:
End of chapter two
Thank you!