Chapter 2 Method of Data Collection and
Wullo S. (MPH)
Reading assignment
Methods of data collection
• Having collected and edited the data, the next important step is to
organize it.
• The process of arranging data into classes or categories according to
similarities is called classification.
• Classification is a preliminary step that prepares the ground for proper
presentation of data.
• The presentation of data is broadly classified into the following two
categories:
• Tabular presentation
• Diagrammatic and Graphic presentation.
7/9/21 Wullo S. 4
Tabular presentation of data
• Frequency distribution: the organization of raw
data in table form using classes and frequencies.
• Frequency: the number of values in a specific class
of the distribution.
• Raw data: information recorded in its original
collected form, whether counts or measurements.
Frequency distribution (F.D.)…
Categorical F.D…
Ungrouped FD for Discrete Variables
Discrete/Ungrouped FD…
No. of children       2   3   4   5   6   7   8   Total
No. of families (f)   5   7   8   4   1   2   3   30
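A minimal Python sketch of how the ungrouped table above could be checked and extended with relative frequencies. The variable names are illustrative and not part of the slides.

```python
# Ungrouped frequency distribution from the table above:
# number of children per family -> number of families (f).
freq = {2: 5, 3: 7, 4: 8, 5: 4, 6: 1, 7: 2, 8: 3}

total = sum(freq.values())  # total number of families

# Relative frequency = class frequency / total frequency.
rel_freq = {x: f / total for x, f in freq.items()}

for x, f in freq.items():
    print(f"{x} children: f = {f}, relative f = {rel_freq[x]:.3f}")
print("Total:", total)
```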
Continuous/grouped F.D
Example: Consider the following FD on wages of 100
workers in a factory.
Wage (CI)   Freq.   CBs
40-44       6       39.5-44.5
45-49       9       44.5-49.5
50-54       15      49.5-54.5
55-59       17      54.5-59.5
60-64       20      59.5-64.5
65-69       13      64.5-69.5
70-74       12      69.5-74.5
75-79       8       74.5-79.5
The lowest and highest values that can be included in a class such …
The lower class limit of the first class should be the smallest observation.
Add the class width to the lower class limit to obtain the lower class limit of the next class.
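To find the upper limit of the first class, subtract U (the difference
between the lower limit of a class and the upper limit of the preceding
class) from the lower limit of the second class. Then keep adding the
class width to this upper limit to find the rest of the upper limits, or use
UCL = LCL + (W – 1) when U = 1.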
4. Determine the Class boundaries
– Let U =LCL of a class – UCL of preceding class. Add half of
this difference (U/2) to all upper class limits to get the upper
class boundaries (UCBs), and subtract (U/2) from all lower
class limits to get the lower class boundaries (LCBs).
– UCBi = UCLi +U/2
– LCBi = LCLi – U/2
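The boundary rule can be sketched in Python. This is only an illustration that applies UCBi = UCLi + U/2 and LCBi = LCLi – U/2 to the class limits of the wage example earlier in this section; the variable names are assumptions.

```python
# Class limits (LCL, UCL) of the wage example.
limits = [(40, 44), (45, 49), (50, 54), (55, 59),
          (60, 64), (65, 69), (70, 74), (75, 79)]

# U = LCL of a class minus UCL of the preceding class.
U = limits[1][0] - limits[0][1]

# LCB = LCL - U/2 and UCB = UCL + U/2 for every class.
boundaries = [(lcl - U / 2, ucl + U / 2) for lcl, ucl in limits]

for (lcl, ucl), (lcb, ucb) in zip(limits, boundaries):
    print(f"{lcl}-{ucl}: {lcb}-{ucb}")
```

The upper boundary of each class coincides with the lower boundary of the next one, which is what removes the gaps between classes.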
44 50 79 63 66 54 56 70 56 63
60 87 60 70 59 60 62 88 71 53
56 65 74 80 51 83 69 77 69 50
58 42 43 85 43 75 55 60 58 49
72 67 55 77 48 45 61 47 44 61
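Solution:
Step 1: Find the highest and the lowest value: H = 88, L = 42.
Step 2: Find the range: R = H – L = 88 – 42 = 46.
Step 3: Find the number of classes using Sturges' formula: k = 1 + 3.322 log 50 = 6.64 ≈ 7.
Step 4: Find the class width: W = R/k = 46/7 = 6.57 ≈ 7 (rounding up).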
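As a rough check of these first steps, the range, Sturges' number of classes and the class width can be computed in Python. This is a sketch only: rounding both k and W upward is an assumption, and the variable names are illustrative.

```python
import math

# The 50 observations of this example.
data = [44, 50, 79, 63, 66, 54, 56, 70, 56, 63,
        60, 87, 60, 70, 59, 60, 62, 88, 71, 53,
        56, 65, 74, 80, 51, 83, 69, 77, 69, 50,
        58, 42, 43, 85, 43, 75, 55, 60, 58, 49,
        72, 67, 55, 77, 48, 45, 61, 47, 44, 61]

H, L = max(data), min(data)   # highest and lowest value
R = H - L                     # range

# Sturges' formula: k = 1 + 3.322 * log10(n), rounded up here.
k = math.ceil(1 + 3.322 * math.log10(len(data)))

# Class width: range divided by number of classes, rounded up.
W = math.ceil(R / k)

print(f"H={H}, L={L}, R={R}, k={k}, W={W}")
```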
Step 5: Select the starting observation as lowest class limit (this is
usually the lowest observation). Add the width to that observation
to get the lower limit of the next class. Keep adding until there are
7 classes.
42, 49, 56, 63, 70, 77, 84 are the lower class limits.
Step 6: Find the upper class limits; e.g. the first upper class limit = 49 –
U = 49 – 1 = 48.
48, 55, 62, 69, 76, 83, 90 are the upper class limits.
So combining step 5 and step 6, one can construct the following
classes.
Class limits
42-48
49-55
56-62
63-69
70-76
77-83
84-90
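Steps 5 and 6 can be sketched in Python as follows. The starting limit 42, width 7, 7 classes and U = 1 are taken from this example; the variable names are assumptions.

```python
lowest = 42   # starting lower class limit (lowest observation)
W = 7         # class width
k = 7         # number of classes
U = 1         # gap between a UCL and the next LCL

# Step 5: keep adding the width to get the lower class limits.
lower_limits = [lowest + i * W for i in range(k)]

# Step 6: each upper limit is the next lower limit minus U,
# i.e. UCL = LCL + W - U.
upper_limits = [lcl + W - U for lcl in lower_limits]

classes = list(zip(lower_limits, upper_limits))
for lcl, ucl in classes:
    print(f"{lcl}-{ucl}")
```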
Step 7: Find the class boundaries by subtracting 0.5 from each lower class limit and
adding 0.5 to each upper class limit, since U = 1:
LCBi = LCLi – U/2 and UCBi = UCLi + U/2
Example: For class 1, LCB1 = 42 – 0.5 = 41.5 and UCB1 = 48 + 0.5 = 48.5.
Then continue adding W to both boundaries to obtain the remaining boundaries. By
doing so one can obtain the following classes.
Class boundary
41.5 – 48.5
48.5 – 55.5
55.5 – 62.5
62.5 – 69.5
69.5 – 76.5
76.5 – 83.5
83.5 – 90.5
Step 8: Tally the data.
Step 9: Write the numeric values for the tallies in the
frequency column.
Step 10: Find cumulative frequency.
Step 11: Find relative frequency and /or relative
cumulative frequency.
The complete frequency distribution follows.
Total: frequency 50, relative frequency 1
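Steps 8 to 11 can be reproduced with a short Python sketch that counts the observations in each class and accumulates the frequencies. It is an illustration built from the data and class limits of this example, not the original table; the names are assumptions.

```python
data = [44, 50, 79, 63, 66, 54, 56, 70, 56, 63,
        60, 87, 60, 70, 59, 60, 62, 88, 71, 53,
        56, 65, 74, 80, 51, 83, 69, 77, 69, 50,
        58, 42, 43, 85, 43, 75, 55, 60, 58, 49,
        72, 67, 55, 77, 48, 45, 61, 47, 44, 61]
classes = [(42, 48), (49, 55), (56, 62), (63, 69),
           (70, 76), (77, 83), (84, 90)]
n = len(data)

rows = []
cum = 0
for lcl, ucl in classes:
    f = sum(lcl <= x <= ucl for x in data)  # Steps 8-9: tally and count
    cum += f                                # Step 10: cumulative frequency
    rows.append((lcl, ucl, f, cum, f / n, cum / n))  # Step 11: relative values

print("Class   f   cf   rf    rcf")
for lcl, ucl, f, cf, rf, rcf in rows:
    print(f"{lcl}-{ucl}  {f:2d}  {cf:2d}  {rf:.2f}  {rcf:.2f}")
print("Total ", sum(r[2] for r in rows))
```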
57, 53, 65, 55, 50, 45, 64, 52, 16, 46,
42, 63, 33, 64, 53, 25, 54, 35, 48, 55,
70, 47, 39, 58, 52, 36, 65, 75, 26, 20,
55, 60, 83, 61, 45, 63, 49, 42, 35, 18,
51, 45, 42, 65, 39, 59, 45, 41, 30, 40.
Solution:
i. Using Sturges' rule, the number of classes is:
k = 1 + 3.322 log 50 = 6.64 ≈ 7.
ii. Range = highest value – lowest value
= 83 – 16 = 67.
iii. Class width: w = Range/k = 67/7 = 9.57 ≈ 10.
iv. Since the smallest value is 16, LCL1 is 16 and
UCL1 is 25.
Here is the FD:
Ages Freq.
16-25 4
26-35 5
36-45 12
46-55 14
56-65 12
66-75 2
76-85 1
Total 50
Example: The class marks and class boundaries of the
above Example are:
CL Freq. CM CB
16-25 4 20.5 15.5-25.5
26-35 5 30.5 25.5-35.5
36-45 12 40.5 35.5-45.5
46-55 14 50.5 45.5-55.5
56-65 12 60.5 55.5-65.5
66-75 2 70.5 65.5-75.5
76-85 1 80.5 75.5-85.5
Total 50
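The CM column can be checked with a few lines of Python. The sketch assumes the usual definition of a class mark as the midpoint of the class limits, which matches the values in the table; the names are illustrative.

```python
# Class limits (LCL, UCL) of the age example.
limits = [(16, 25), (26, 35), (36, 45), (46, 55),
          (56, 65), (66, 75), (76, 85)]

# Class mark = (LCL + UCL) / 2.
class_marks = [(lcl + ucl) / 2 for lcl, ucl in limits]

print(class_marks)
```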
Cumulative frequency distributions
A cumulative frequency distribution (CFD) tells us how many values fall below or above
a given class boundary. There are two types of CFD: the less than type and the more than type.
Example: For the data in the above Example, both
cumulative frequency distributions are given below:
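A minimal Python sketch of both cumulative frequency distributions for the age example, using the class boundaries and frequencies from the table above. The pairing of "less than" with upper boundaries and "more than" with lower boundaries follows the ogive description later in this chapter; the names are assumptions.

```python
from itertools import accumulate

boundaries = [15.5, 25.5, 35.5, 45.5, 55.5, 65.5, 75.5, 85.5]
freq = [4, 5, 12, 14, 12, 2, 1]

# "Less than" type: number of values below each upper class boundary.
less_than = list(accumulate(freq))

# "More than" type: number of values above each lower class boundary.
more_than = list(accumulate(reversed(freq)))[::-1]

for ucb, cf in zip(boundaries[1:], less_than):
    print(f"less than {ucb}: {cf}")
for lcb, cf in zip(boundaries[:-1], more_than):
    print(f"more than {lcb}: {cf}")
```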
Diagrammatic and Graphical Methods of Data
Presentation
A F.D can be presented graphically or diagrammatically.
Advantages
• To understand the information easily.
• To make the data attractive.
• To make comparisons of items easy.
• To draw attention of the observer.
The purpose of graphs and diagrams is not to provide exact and
detailed information, but to allow simple comparisons. Any further
information should be obtained from the original data.
2.2.2 Diagrammatic Presentation of Data
• The box shows the distance between the first and the third
quartiles.
Illustration of a box-plot using the ages of 15 patients (the plot marks Min, Q1, Q2, Q3 and Max):
35 35 36 37 37 38 42 43 43 44 45 48 48 51 55
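The five values a box-plot marks can be computed for these 15 ages with Python's standard library. This is a sketch under one assumption: quartiles are located at position (n + 1)p, which is what `method="exclusive"` does; the slide does not state which quartile convention it uses, and other conventions can give slightly different Q1 and Q3.

```python
import statistics

ages = [35, 35, 36, 37, 37, 38, 42, 43, 43, 44, 45, 48, 48, 51, 55]

# method="exclusive" places the p-th quantile at position (n + 1) * p
# (an assumed convention; the slide does not specify one).
q1, q2, q3 = statistics.quantiles(ages, n=4, method="exclusive")

five_number_summary = (min(ages), q1, q2, q3, max(ages))
iqr = q3 - q1  # length of the box (distance between Q1 and Q3)

print(five_number_summary, iqr)
```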
A box-plot indicating the distribution of blood lead level of
individuals by sex
Histogram
44 50 79 63 66 54 56 70 56 63
60 87 60 70 59 60 62 88 71 53
56 65 74 80 51 83 69 77 69 50
58 42 43 85 43 75 55 60 58 49
72 67 55 77 48 45 61 47 44 61
Example*
[Figure: histogram of Example*. Horizontal axis: Blood Glucose Level, class boundaries 41.5 – 48.5 through 83.5 – 90.5; vertical axis: frequency.]
Frequency Polygon
A line graph of class frequencies plotted against class marks.
To draw a frequency polygon, we connect the midpoints of the tops of
the histogram bars (the class marks) with straight lines.
[Figure: frequency polygon of Example*. Horizontal axis: Class Marks (Blood Glucose Level), 38 to 94 in steps of 7; vertical axis: Frequency (Number of Patients).]
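The points of the frequency polygon for Example* can be sketched in Python. The class limits are those constructed earlier for this data. Closing the polygon with zero-frequency points one class width beyond each end is inferred from the axis labels 38 and 94, so treat it as an assumption; the names are illustrative.

```python
data = [44, 50, 79, 63, 66, 54, 56, 70, 56, 63,
        60, 87, 60, 70, 59, 60, 62, 88, 71, 53,
        56, 65, 74, 80, 51, 83, 69, 77, 69, 50,
        58, 42, 43, 85, 43, 75, 55, 60, 58, 49,
        72, 67, 55, 77, 48, 45, 61, 47, 44, 61]
classes = [(42, 48), (49, 55), (56, 62), (63, 69),
           (70, 76), (77, 83), (84, 90)]
W = 7  # class width

freqs = [sum(lcl <= x <= ucl for x in data) for lcl, ucl in classes]
marks = [(lcl + ucl) / 2 for lcl, ucl in classes]  # class marks

# Close the polygon with a zero-frequency class mark on each side.
xs = [marks[0] - W] + marks + [marks[-1] + W]
ys = [0] + freqs + [0]

for x, y in zip(xs, ys):
    print(x, y)
```

Joining consecutive (x, y) points with straight lines gives the polygon.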
Ogive (cumulative frequency polygon)
• A graph showing the cumulative frequency (less than or more than
type) plotted against upper or lower class boundaries respectively.
• That is class boundaries are plotted along the horizontal axis and
the corresponding cumulative frequencies are plotted along the
vertical axis.
• The points are joined by a freehand curve.
• Example: Draw an ogive curve (less than type) for the above data
(Example *).
[Figure: Ogive Graph (Cumulative Less Than Type). Horizontal axis: class boundaries 41.5, 48.5, 55.5, 62.5, 69.5, 76.5, 83.5, 90.5; vertical axis: cumulative frequency.]
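The points of the less than ogive for Example* can be computed with a short Python sketch: for every class boundary, count the observations that fall below it. The names are assumptions, and the plotting itself is left out.

```python
data = [44, 50, 79, 63, 66, 54, 56, 70, 56, 63,
        60, 87, 60, 70, 59, 60, 62, 88, 71, 53,
        56, 65, 74, 80, 51, 83, 69, 77, 69, 50,
        58, 42, 43, 85, 43, 75, 55, 60, 58, 49,
        72, 67, 55, 77, 48, 45, 61, 47, 44, 61]
boundaries = [41.5, 48.5, 55.5, 62.5, 69.5, 76.5, 83.5, 90.5]

# Less than cumulative frequency at each class boundary.
less_than_cf = [sum(x < b for x in data) for b in boundaries]

for b, cf in zip(boundaries, less_than_cf):
    print(f"less than {b}: {cf}")
```

The curve starts at zero at the first lower boundary and ends at the total frequency at the last upper boundary.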