
Medical Statistics and Demography Made Easy®



Devashish Sharma

MSc (Gold Medalist), PhD (Statistics)

Professor, Statistics and Demography


MLN Medical College
Allahabad Central University
Allahabad, India

JAYPEE BROTHERS MEDICAL PUBLISHERS (P) LTD


New Delhi Ahmedabad Bengaluru Chennai Hyderabad
Kochi Kolkata Lucknow Mumbai Nagpur

Published by
Jitendar P Vij
Jaypee Brothers Medical Publishers (P) Ltd
Corporate Office
4838/24 Ansari Road, Daryaganj, New Delhi - 110002, India, Phone: +91-11-43574357
Registered Office
B-3 EMCA House, 23/23B Ansari Road, Daryaganj, New Delhi - 110 002, India
Phones: +91-11-23272143, +91-11-23272703, +91-11-23282021
+91-11-23245672, Rel: +91-11-32558559, Fax: +91-11-23276490, +91-11-23245683
e-mail: jaypee@jaypeebrothers.com, Visit our website: www.jaypeebrothers.com
Branches

2/B, Akruti Society, Jodhpur Gam Road Satellite


Ahmedabad 380 015, Phones: +91-79-26926233, Rel: +91-79-32988717
Fax: +91-79-26927094, e-mail: ahmedabad@jaypeebrothers.com

202 Batavia Chambers, 8 Kumara Krupa Road, Kumara Park East


Bengaluru 560 001, Phones: +91-80-22285971, +91-80-22382956, 91-80-22372664
Rel: +91-80-32714073, Fax: +91-80-22281761 e-mail: bangalore@jaypeebrothers.com

282 IIIrd Floor, Khaleel Shirazi Estate, Fountain Plaza, Pantheon Road
Chennai 600 008, Phones: +91-44-28193265, +91-44-28194897, Rel: +91-44-32972089
Fax: +91-44-28193231 e-mail: chennai@jaypeebrothers.com

4-2-1067/1-3, 1st Floor, Balaji Building, Ramkote Cross Road,


Hyderabad 500 095, Phones: +91-40-66610020, +91-40-24758498
Rel:+91-40-32940929, Fax:+91-40-24758499 e-mail: hyderabad@jaypeebrothers.com

No. 41/3098, B & B1, Kuruvi Building, St. Vincent Road


Kochi 682 018, Kerala, Phones: +91-484-4036109, +91-484-2395739
+91-484-2395740 e-mail: kochi@jaypeebrothers.com

1-A Indian Mirror Street, Wellington Square


Kolkata 700 013, Phones: +91-33-22651926, +91-33-22276404, +91-33-22276415
Rel: +91-33-32901926, Fax: +91-33-22656075 e-mail: kolkata@jaypeebrothers.com

Lekhraj Market III, B-2, Sector-4, Faizabad Road, Indira Nagar


Lucknow 226 016, Phones: +91-522-3040553, +91-522-3040554
e-mail: lucknow@jaypeebrothers.com

106 Amit Industrial Estate, 61 Dr SS Rao Road, Near MGM Hospital, Parel
Mumbai 400 012, Phones: +91-22-24124863, +91-22-24104532,
Rel: +91-22-32926896, Fax: +91-22-24160828 e-mail: mumbai@jaypeebrothers.com

KAMALPUSHPA 38, Reshimbag, Opp. Mohota Science College, Umred Road


Nagpur 440 009 (MS), Phone: Rel: +91-712-3245220, Fax: +91-712-2704275
e-mail: nagpur@jaypeebrothers.com

USA Office
1745, Pheasant Run Drive, Maryland Heights (Missouri), MO 63043, USA
Ph: 001-636-6279734 e-mail: jaypee@jaypeebrothers.com, anjulav@jaypeebrothers.com
Medical Statistics and Demography Made Easy
© 2008, Devashish Sharma
All rights reserved. No part of this publication and CD ROM should be reproduced, stored in a
retrieval system, or transmitted in any form or by any means: electronic, mechanical, photocopying,
recording, or otherwise, without the prior written permission of the author and the publisher.
This book has been published in good faith that the material provided by author is original.
Every effort is made to ensure accuracy of material, but the publisher, printer and author will not
be held responsible for any inadvertent error(s). In case of any dispute, all legal matters are to
be settled under Delhi jurisdiction only.
First Edition: 2008

ISBN 978-81-8448-353-6
Typeset at JPBMP typesetting unit
Printed at Ajanta Offset & Packagins Ltd., New Delhi

This book is dedicated to


My Parents
Late Dr BK Sharma and Mrs Kusum Sharma
for being the constant
source of enlightenment in
the path of my mundane life

My Teacher
Professor MK Singh
for moulding my inner-self
and
outer appearance to make
me what I am

Preface
There are many books on general applied statistics, assuming various levels of mathematical knowledge, but no book is available that is specially designed for medical students at the undergraduate level. The main feature of this book is that it will help medical students at undergraduate and postgraduate levels, as well as those students who are preparing for various PGME examinations.
The present book, which is explicitly directed towards medical applications, has two special aspects. First, its examples are almost entirely related to medical problems, which, I think, will help research workers and students to understand the underlying computational points. Second, the choice of statistical topics reflects the extent of their usage in medical research. Several topics, such as vital statistics, statistical methods in epidemiology and health information, would not normally be included in a general book on applied statistics.
This book is intended to be useful both to medical research workers with very little mathematical expertise and to students who are preparing for various PGME examinations. The emphasis throughout is on the general concepts underlying statistical techniques. Proofs are regarded as of secondary importance and are usually omitted. Though there are many mathematical formulae, these are necessary for the computations and for showing the relationship between various methods. They rarely involve anything other than very simple algebraic manipulations. Some computational steps, such as those involved in probability and significance tests, are perhaps more difficult. I have given some solved examples, clearly mentioning every step involved in the computation.
Nearly 50 unsolved questions, mainly related to medical problems, are included, which will help undergraduate students in their professional examinations. For students preparing for PGME examinations, nearly 300 MCQs related to various topics are included in this book. These include questions asked in various competitive examinations as well as questions which I thought are important for such tests. Going through these questions will help them to solve problems related to Statistics and Demography in their competitive examinations.
I owe thanks to my colleagues, especially in the Departments of Obstetrics and Gynaecology and of Community Medicine. Special thanks to my wife Mrs. Anita Sharma, and to my son Dr. Pulak Sharma, who helped me a lot by suggesting that I frame this work according to the problems which he and his friends are facing.
I express my deep sense of gratitude to my publisher Jaypee Brothers Medical Publishers (P) Ltd for their untiring efforts in bringing out this book in such an elegant form.
Suggestions and criticism for further improvement of this book, as well as errors and misprints, will be most gratefully received and duly acknowledged.
Devashish Sharma

Contents
1. Classification and Tabulation ...................................... 1
2. Measure of Central Tendency .................................... 15
3. Measure of Dispersion ................................................ 31
4. Theoretical Discrete and Continuous Distribution ....... 47
5. Correlation and Regression ........................................ 61
6. Probability ..................................................................... 73
7. Sampling and Design of Experiments ..................... 83
8. Testing of Hypothesis ................................................. 99
9. Non-parametric Tests ................................................ 151
10. Statistical Methods in Epidemiology ..................... 163
11. Vital Statistics (Demography) .................................. 209
12. Health Information .................................................... 239
13. A Report on Census 2001 .......................................... 247
14. National Population Policy ...................................... 287
Unsolved Questions .......................................................... 305
Answers of MCQs and Unsolved Questions ............... 327
Appendix : Statistical Tables ................................................. 335
Index ...................................................................................... 349

Chapter 1

Classification and Tabulation

There are two types of data: (1) primary data and (2) secondary data. Primary data are data originated (collected) by the investigator; secondary data are data which the investigator does not originate but obtains from someone else's records.
Both primary and secondary data are broadly divided into two categories:
1. Attributes (Qualitative data).
2. Variables (Quantitative data).
Attributes: These are qualitative characteristics which are not capable of being described numerically, i.e. data obtained by classifying individuals according to the presence or absence of an attribute, e.g. sex, nationality, colour of eyes, socioeconomic status. They can be further divided into two groups: (a) Nominal (b) Ordinal.
(a) Nominal: Qualities that can be easily differentiated by means of some natural or physical line of demarcation, e.g. physical characteristics such as colour of eyes, sex, physical status of a person, etc.
(b) Ordinal: An ordered set is known as ordinal, i.e. the data are classified according to some criterion which can be given an order, such as socioeconomic status.
Variables: These are quantitative characteristics which can be numerically described. Variables may be discrete or continuous.
Discrete variables: These can take only exact, separate values (usually whole numbers), e.g. number of family members, number of living children, etc.
Continuous variables: A variable which can take any numerical value within a certain range is called a continuous variable, e.g. height in cm, weight in kg, etc.


REPRESENTATION OF DATA
Data may be represented either by means of graphs or diagrams, or by means of tables.
Tables
Tables are of two types: (1) simple or complex tables, depending on whether measurements of a single set or of multiple sets of items are presented, and (2) frequency distribution tables.
There are certain general principles which should be followed while presenting data in tabulated form:
1. A table should be numbered.
2. A title should be given; the title should be brief and self-explanatory.
3. Headings of columns and rows should be clear.
4. Data must be presented according to size and importance.
5. If percentages or averages are to be compared, they should be placed as close to each other as possible.
6. Footnotes may be given where necessary.
Simple Table
Table 1.1: Showing number of patients attending hospital in winter season*

Months        Male              Female
              No.     %         No.     %
November      250     25.00     150     30.00
December      350     35.00     100     20.00
January       100     10.00      70     14.00
February      400     40.00     180     36.00

*Source: Hospital outdoor attendance


Frequency Distribution Table
In a frequency distribution table, the data are first split up into convenient groups (class intervals), and the number of items (frequencies) which occur in each group is shown in an adjacent column.
Following are the ages of 23 cases admitted to a hospital:
20, 35, 46, 10, 5, 25, 48, 33, 37, 41, 26, 29, 15, 6, 29, 56, 69, 66, 64, 25, 26, 56, 42.

Age group     Tally marks     Frequencies
0-10          ||              2
10-20         ||              2
20-30         ||||/ ||        7
30-40         |||             3
40-50         ||||            4
50-60         ||              2
60-70         |||             3

Table 1.2: Age distribution of admitted cases

Age group       Cases admitted
(in years)      No.      %
0-10             2        8.70
10-20            2        8.70
20-30            7       30.43
30-40            3       13.04
40-50            4       17.39
50-60            2        8.70
60-70            3       13.04
Total           23      100.00
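The grouping above can be reproduced with a short Python sketch. The tally appears to count a value on a class boundary, such as 10 or 20, in the higher class, and the sketch assumes the same rule. The helper name `frequency_table` is illustrative and does not come from any library.

```python
# Ages (in years) of the 23 admitted cases listed above.
ages = [20, 35, 46, 10, 5, 25, 48, 33, 37, 41, 26, 29, 15, 6, 29,
        56, 69, 66, 64, 25, 26, 56, 42]


def frequency_table(values, start, width, n_classes):
    """Count values in the classes [start, start + width), [start + width, ...), ...

    Assumption: a value equal to a class boundary (e.g. 10) is counted in
    the higher class (10-20), which matches the tallies shown above.
    """
    counts = [0] * n_classes
    for v in values:
        k = (v - start) // width  # index of the class containing v
        if 0 <= k < n_classes:
            counts[k] += 1
    return counts


counts = frequency_table(ages, start=0, width=10, n_classes=7)
total = sum(counts)
for k, c in enumerate(counts):
    lower = k * 10
    print(f"{lower}-{lower + 10}: {c} cases, {100 * c / total:.2f}%")
```

If you change `width` or `n_classes`, you can see how the choice of class interval changes the resulting distribution.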


In constructing a frequency distribution table, the question that arises is: into how many groups should the data be split? As a rule of thumb, a maximum of 20 groups may conveniently be taken when there is a large amount of data, and a minimum of 5 groups when there is not much data.
As far as possible, class intervals should be equal.
GRAPHS OR DIAGRAMS
Bar chart: This is a simple way of representing data. In a bar diagram the length of each bar is proportional to the magnitude to be represented. Bar charts are of three types: (a) simple bar chart, (b) multiple bar chart, (c) component bar chart.

Figure 1.1: (a) Simple bar diagram, (b) Multiple bar diagram, (c) Component bar diagram


Pie chart: In a pie chart the area of each segment of the circle represents a frequency. The total frequency corresponds to 360°. The area of each segment depends upon the angle corresponding to the frequency of its group, i.e. angle = (frequency/total frequency) × 360°. A pie diagram is particularly useful when the data are expressed as percentages; in such cases 1% corresponds to 3.6°.

Figure 1.2
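Here is a Python sketch of the angle calculation. It computes the sector angles for the female attendance in Table 1.1. The name `pie_angles` is a hypothetical helper used only for illustration.

```python
# Female attendance figures from Table 1.1 (November to February).
female = {"November": 150, "December": 100, "January": 70, "February": 180}


def pie_angles(freqs):
    """Sector angle of each group in degrees: 360 * f / (total frequency)."""
    total = sum(freqs.values())
    return {group: 360 * f / total for group, f in freqs.items()}


angles = pie_angles(female)
for month, angle in angles.items():
    print(f"{month:<9} {angle:6.1f} degrees")
```

The female percentages are 30, 20, 14 and 36. Multiplying each by 3.6° gives the same angles: 108°, 72°, 50.4° and 129.6°.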

Pictogram: Small pictures or symbols are used to present data.

Figure 1.3


Cumulative Frequency Curve or Ogive: Cumulative frequencies are obtained by successively adding the frequency of each value of the variable to the total of all the preceding frequencies. The cumulative frequency table is obtained as follows:

Age in years    Frequencies    Cumulative frequency
20               5             5
21               3             5 + 3 = 8
23               7             8 + 7 = 15
35              10             15 + 10 = 25
36               3             25 + 3 = 28
45               5             28 + 5 = 33
67               8             33 + 8 = 41
Total           41

Less than Cumulative Frequency Curve: The less than cumulative frequency table is expressed as:

Age in years    Frequencies    Cumulative frequency
20               5             Less than or equal to 20 = 5
21               3             Less than or equal to 21 = 8
23               7             Less than or equal to 23 = 15
35              10             Less than or equal to 35 = 25
36               3             Less than or equal to 36 = 28
45               5             Less than or equal to 45 = 33
67               8             Less than or equal to 67 = 41
Total           41


Figure 1.4

More than Cumulative Frequency Curve: The more than cumulative frequency table is expressed as:

Age in years    Frequencies    Cumulative frequency
20               5             More than or equal to 20 = 41
21               3             More than or equal to 21 = 36
23               7             More than or equal to 23 = 33
35              10             More than or equal to 35 = 26
36               3             More than or equal to 36 = 16
45               5             More than or equal to 45 = 13
67               8             More than or equal to 67 = 8
Total           41


Figure 1.5
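Both cumulative frequency columns are running totals: one runs from the top of the table and the other from the bottom. Below is a minimal Python sketch that applies the standard library's `itertools.accumulate` to the age table above. The variable names are illustrative.

```python
from itertools import accumulate

ages = [20, 21, 23, 35, 36, 45, 67]  # age in years
freqs = [5, 3, 7, 10, 3, 5, 8]       # frequencies

# "Less than or equal to" cumulative frequencies: running total from the top.
less_than = list(accumulate(freqs))
# "More than or equal to" cumulative frequencies: running total from the bottom.
more_than = list(accumulate(reversed(freqs)))[::-1]

for age, lt, mt in zip(ages, less_than, more_than):
    print(f"age {age}: <= {lt}, >= {mt}")
```

The two columns always meet the same total: the last less than value and the first more than value both equal 41.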

Line Diagram: Line diagrams are used to show trends with the passage of time. Time, the independent variable, is represented on the X-axis and the dependent variable on the Y-axis. It is essential to show the zero point on the Y-axis.

Figure 1.6

Histogram: A histogram is used to represent a continuous frequency distribution. It is essentially an area chart in which the area of each bar represents the frequency associated with the corresponding class interval. It is not essential to show the zero point on the X-axis (horizontal axis), but it is necessary to show it on the vertical axis.

Figure 1.7

Frequency Polygon: It is obtained by joining the mid-points of the upper sides of the histogram blocks by straight lines.
Frequency Curve: It is obtained by joining the mid-points of the upper sides of the histogram blocks by a smooth curve.

Figures 1.8A and B
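To make the construction concrete, this Python sketch computes the vertices of the frequency polygon for the age data of Table 1.2. It only produces the points; a plotting library would then join them. The variable names are illustrative. All classes here have the same width (10 years), so the heights of the histogram blocks equal the frequencies.

```python
# Class limits (age in years) and frequencies of Table 1.2.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
freqs = [2, 2, 7, 3, 4, 2, 3]

# Each vertex is (mid-point of the class, frequency): the mid-point of the
# top side of the corresponding histogram block.
vertices = [((lower + upper) / 2, f) for (lower, upper), f in zip(classes, freqs)]

for x, y in vertices:
    print(x, y)
```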


Scatter Diagram: A scatter diagram is used to represent two variables simultaneously. Each point represents one individual.

Figure 1.9

Comparison between Bar diagram and Histogram:
1. A bar diagram is used to represent frequencies mainly of qualitative variables and discrete variables, while a histogram is used to represent frequencies of a continuous variable.
2. In a bar diagram the length of the bar represents the frequency, while in a histogram the area of the bar represents the frequency.

MULTIPLE CHOICE QUESTIONS


1. A scatter diagram shows:
(a) Trend of events with the passage of time
(b) Frequency distribution of a continuous variable
(c) The relation between maximum and minimum
values
(d) Relation between two variables
(AI,90)


2. Sex composition can be demonstrated in which of the following:
(a) Age pyramid
(b) Pie chart
(c) Component bar chart
(d) Multiple bar chart
(JIPMER, 91)
3. Quantitative data can be best represented by:
(a) Pie chart
(b) Pictogram
(c) Histogram
(d) Bar diagram
(PGI, 80; AMC, 83, 87)
4. Percentage of data can be shown in:
(a) Graph presentation
(b) Pie chart
(c) Bar diagram
(d) Histogram
(PGI, 79; Delhi, 87)
5. Graph showing relation between 2 variables is a:
(a) Scatter diagram
(b) Frequency polygon
(c) Picture chart
(d) Histogram
(AI, 96)
6. Weight in kg is a:
(a) Discrete variable
(b) Continuous variable
(c) Nominal scale
(d) None of the above
(AI, 96)

7. All are examples of nominal scale except:


(a) Age
(b) Sex
(c) Body weight
(d) Socioeconomic status
(AI, 96)
8. The average birth weights in a hospital are to be demonstrated by statistical representation. This is best done by:
(a) Bar chart
(b) Histogram
(c) Pie chart
(d) Frequency polygon
(AIIMS 95)


9. All are included in the nominal scale except:


(a) Colour of eye
(b) Sex
(c) Socioeconomic status
(d) Occupation
(MP, 98)
10. Age and sex distribution is best represented by:
(a) Histogram
(b) Pie chart
(c) Bar diagram
(d) Age pyramid
(DNB, 2001)
11. Continuous quantitative variables are expressed by:
(a) Bar chart
(b) Histogram
(c) Frequency polygon
(d) Ogive
(e) Pie chart
(PGI, 2002)
12. Cumulative frequencies are represented by:
(a) Histogram
(b) Line diagram
(c) Pictogram
(d) Ogive
13. In which type of graphical representation are frequencies represented by the area of a rectangle:
(a) Bar diagram
(b) Component bar diagram
(c) Age pyramid
(d) Histogram
14. Two variables can be plotted together by:
(a) Pie chart
(b) Histogram
(c) Frequency polygon
(d) Scatter diagram (AI,95)
15. Which of the following statement is false:
(a) Primary data is originated by the investigator
(b) Primary data originated by an investigator may be
used as secondary data by other investigator
(c) Data obtained from records of Hospitals are
secondary data
(d) None of the above statements are true


16. Best way to study relationship between two variables


is:
(a) Scatter diagram
(b) Histogram
(c) Bar chart
(d) Pie chart
(AI,92)
17. All are examples of nominal scale except:
(a) Race
(b) Sex
(c) Iris colour
(d) Socioeconomic status
(AI,96)
18. Low birth weight statistics of a hospital is best shown
by:
(a) Bar charts
(b) Histogram
(c) Pictogram
(d) Frequency polygon
(AIIMS, Dec 95)
19. Categorical values are:
(a) Age
(b) Weight
(c) Gender
(Manipal, 2002)

20. If the grading of diabetes is classified as mild,


moderate and severe the scale of measurement
used is:
(a) Interval
(b) Nominal
(c) Ordinal
(d) Ratio
21. The best method to show the association between
height and weight of children in a class is by:
(a) Bar chart
(b) Line diagram
(c) Scatter diagram
(d) Histogram (AI, 2002)
22. Mean and standard deviation can be worked out only
if data is on:
(a) Interval/Ratio scale
(b) Dichotomous scale
(c) Nominal scale
(d) Ordinal scale
(AIIMS, 2005)

Chapter 2

Measure of Central Tendency

Measures of central tendency are statistical constants which give us an idea about the concentration of values in the central part of a distribution.
The following are five measures of central tendency:
1. Arithmetic Mean or simply Mean.
2. Median.
3. Mode.
4. Geometric Mean.
5. Harmonic Mean.
Arithmetic Mean: The arithmetic mean (A.M.) of a set of observations is their sum divided by the number of observations.
The arithmetic mean $\bar{X}$ of n observations $X_1, X_2, \ldots, X_n$ is
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

In case of a frequency distribution where the variable and frequencies are:

Variable       x1   x2   x3   x4   ...   xn
Frequencies    f1   f2   f3   f4   ...   fn

the arithmetic mean is
$$\bar{X} = \frac{\sum_{i=1}^{n} f_i x_i}{N}, \quad \text{where } i = 1, 2, 3, 4, \ldots, n \text{ and } N = \sum_{i=1}^{n} f_i$$
Short Cut Method: Let $u_i = x_i - A$, where A is any arbitrary constant. Then
$$\bar{X} = A + \frac{\sum_{i=1}^{n} f_i u_i}{N}$$

In case of a continuous variable grouped into a frequency distribution, the $x_i$ are taken as the mid-values of the class intervals, i.e. $x_i = (\text{Lower limit} + \text{Upper limit})/2$, and the mean is then calculated.
In the short cut method we generate a variable $u_i = (x_i - A)/h$, where h is the length of the class interval or class width, and the mean of the variable x will be
$$\bar{X} = A + h\,\frac{\sum_{i=1}^{n} f_i u_i}{N}$$


Properties of arithmetic mean:
1. The sum of the deviations of a set of values from their arithmetic mean is zero.
2. The sum of the squares of the deviations of a set of values is minimum when the deviations are taken about the mean.
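The two properties can be checked numerically. The Python sketch below uses the 23 ages from the Chapter 1 frequency distribution example as data. The name `sum_of_squares` is a hypothetical helper.

```python
# The 23 ages from the frequency distribution example in Chapter 1.
ages = [20, 35, 46, 10, 5, 25, 48, 33, 37, 41, 26, 29, 15, 6, 29,
        56, 69, 66, 64, 25, 26, 56, 42]
mean = sum(ages) / len(ages)


def sum_of_squares(values, a):
    """Sum of squared deviations of the values about the point a."""
    return sum((x - a) ** 2 for x in values)


# Property 1: deviations from the mean add up to zero (apart from rounding error).
print(sum(x - mean for x in ages))

# Property 2: the sum of squares is smallest when taken about the mean.
for a in (mean - 1, mean, mean + 1):
    print(round(a, 2), round(sum_of_squares(ages, a), 2))
```

The first printed value is zero apart from floating-point rounding. Among the three points, the middle one (the mean) gives the smallest sum of squares.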
Merits and Demerits of Arithmetic Mean
Merits
1. It is based on all observations.
2. Of all averages, the arithmetic mean is affected least by fluctuations of sampling, i.e. the arithmetic mean is a stable average.
3. If $\bar{X}_1$ is the mean of $n_1$ observations and $\bar{X}_2$ is the mean of $n_2$ observations, then the combined mean of the two series is
$$\bar{X} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2}$$
Demerits
1. The AM cannot be used when we are dealing with qualitative data.
2. The AM cannot be obtained if a single observation is missing.
3. The AM is affected very much by extreme values.
4. The AM cannot be calculated if an extreme class is open, e.g. "below 10" or "above 90".
5. In an extremely asymmetrical (skewed) distribution, the AM is usually not a suitable measure of location.
Median: The median of a distribution is the value of the variable which divides it into two equal parts.
If there are n observations, arrange the values in either ascending or descending order. If n is odd, the $\left(\frac{n+1}{2}\right)$th value is the median; if n is even, the median is the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th observations.
For example, if there are 9 (i.e. an odd number of) values, arrange them in either ascending or descending order; the median is the $\left(\frac{9+1}{2}\right)$th, i.e. the 5th value. If the number of observations is even, say 10, the median lies between the 5th and 6th values and is taken as their average.
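The odd/even rule can be written as a short Python function. This is a sketch for illustration. For comparison, it also calls the standard library's `statistics.median`, which should give the same result.

```python
import statistics


def median(values):
    """Median of ungrouped data using the (n + 1)/2 rule described above."""
    data = sorted(values)  # arrange in ascending order
    n = len(data)
    if n % 2 == 1:
        # ((n + 1)/2)th value; Python lists are indexed from 0
        return data[(n + 1) // 2 - 1]
    # even n: average of the (n/2)th and (n/2 + 1)th values
    return (data[n // 2 - 1] + data[n // 2]) / 2


# The 23 ages from Chapter 1 (odd n): the median is the 12th value.
ages = [20, 35, 46, 10, 5, 25, 48, 33, 37, 41, 26, 29, 15, 6, 29,
        56, 69, 66, 64, 25, 26, 56, 42]
print(median(ages), statistics.median(ages))
```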
In case of a discrete frequency distribution, the median is calculated by forming a cumulative frequency table. The steps for calculating the median are:
(i) Find $\frac{N}{2}$, where $N = \sum f_i$.
(ii) See the cumulative frequency just greater than $\frac{N}{2}$.
(iii) The value of x corresponding to the cumulative frequency just greater than $\frac{N}{2}$ is the median.

In case of a continuous frequency distribution, the class corresponding to the cumulative frequency just greater than $\frac{N}{2}$ (or, in rare cases, exactly equal to $\frac{N}{2}$) is called the median class, and the value of the median is obtained by the following formula:
$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - C\right)$$
where l is the lower limit of the median class, h is the class width, $N = \sum f_i$, C is the cumulative frequency preceding the median class and f is the frequency of the median class.

The median can also be obtained from the less than and greater than cumulative frequency curves (ogives): the value of the variable at the point where the two curves intersect is the median.

Figure 2.1

Merits and Demerits of Median
Merits
1. It is not at all affected by extreme values.
2. It can be calculated for distributions with open-end classes.
3. The median is the only average that can be used while dealing with qualitative data which cannot be measured quantitatively but can still be arranged in ascending or descending order.
Demerits
1. In case of an even number of observations, the median cannot be determined exactly; it is taken as the average of the two middle values.
2. It is not based on all observations.

Mode: Mode is the value which occurs most frequently in a set of observations.
In the following set of 10 observations: 5, 20, 16, 10, 20, 5, 16, 16, 18, 14, the value 16 occurs most frequently (three times); therefore 16 is the mode of the set of observations.
In case of a discrete frequency distribution, the mode is the value of x corresponding to the maximum frequency.
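Counting frequencies is all you need to find the mode of ungrouped data. The minimal Python sketch below applies `collections.Counter` to the 10 observations above. It returns every value tied for the highest frequency, so a repeated maximum shows up in the output instead of being silently resolved.

```python
from collections import Counter

observations = [5, 20, 16, 10, 20, 5, 16, 16, 18, 14]
counts = Counter(observations)

# All values sharing the highest frequency; more than one value means the
# mode is not clearly defined (the method of grouping is then used).
top = max(counts.values())
modes = [value for value, c in counts.items() if c == top]
print(modes, top)
```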
The mode is determined by the method of grouping if:
(i) the maximum frequency is repeated, or
(ii) the maximum frequency occurs at the very beginning or at the end of the distribution.
In case of a continuous distribution, the mode can be determined by the following formula:
$$\text{Mode} = l + h\,\frac{f_1 - f_0}{2f_1 - f_0 - f_2}$$
where $f_1$ is the maximum frequency (the group corresponding to the maximum frequency is called the modal group), l is the lower limit of the modal group, h is the class width, and $f_0$ and $f_2$ are the frequencies of the groups preceding and following the modal group.
Mode can also be obtained from the histogram:

Figure 2.2

Merits and Demerits of Mode
Merits
1. Mode is not affected by extreme values.
Demerits
1. Mode is ill-defined. It is not always possible to find a clearly defined mode. In some cases a distribution has two modes, and it is then called bimodal.
2. It is not based on all observations.
3. As compared to the mean, the mode is affected to a great extent by fluctuations of sampling.
Relationship between Mean, Median and Mode:
If a distribution is moderately asymmetrical, then
Mode = 3 Median − 2 Mean
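As a quick illustration of this empirical relation, the sketch below applies it to the mean (48.2) and median (50) worked out for Table 2.2 later in this chapter. The name `empirical_mode` is illustrative.

```python
def empirical_mode(mean, median):
    """Approximate mode of a moderately asymmetrical distribution: 3 Median - 2 Mean."""
    return 3 * median - 2 * mean


# Mean 48.2 and median 50 are the values worked out for Table 2.2.
print(round(empirical_mode(mean=48.2, median=50), 2))
```

The result, about 53.6, is close to the value of about 52.9 that the grouping formula gives for the same data, but not equal to it. This is what you would expect from an approximate relation.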
EXAMPLE FOR CALCULATING MEAN, MEDIAN AND MODE
In case of a discrete distribution:
Table 2.1

Variable   Frequency   Cumulative   ui = xi - A   ui.fi
(xi)       (fi)        frequency    (A = 47)
25          5           5           -22           -110
28          7          12           -19           -133
34         10          22           -13           -130
47         12          34             0              0
52          6          40             5             30
55          4          44             8             32
60          6          50            13             78
Total      50                                     -233

N = Σfi = 50


Mean
Mean = [(25×5) + (28×7) + (34×10) + (47×12) + (52×6) + (55×4) + (60×6)]/50
= 2117/50 = 42.34
Short Cut Method
Let $u_i = x_i - A$, where A = 47. Then
$$\bar{X} = A + \bar{U} = A + \frac{\sum f_i u_i}{N} = 47 + \frac{-233}{50} = 47 - 4.66 = 42.34$$
Median
In this example the total frequency N = 50, therefore $\frac{N}{2} = 25$.
The cumulative frequency just greater than 25 is 34. The value of $x_i$ corresponding to 34 is 47. Therefore the median of this set of data is 47.
Mode
The maximum frequency in the above table is 12. The value of $x_i$ corresponding to the maximum frequency is also 47. Therefore the mode of this set of data is 47.
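You can check the whole of Table 2.1 with a short Python sketch; the variable names are illustrative. It follows the steps above:
1. Compute the mean by the direct method and by the short cut method.
2. Take the median as the value whose cumulative frequency is the first to exceed N/2.
3. Take the mode as the value with the maximum frequency.

```python
from itertools import accumulate

x = [25, 28, 34, 47, 52, 55, 60]  # values of the variable (Table 2.1)
f = [5, 7, 10, 12, 6, 4, 6]       # frequencies
N = sum(f)

# Direct method: sum(f * x) / N
mean_direct = sum(fi * xi for fi, xi in zip(f, x)) / N

# Short cut method: u = x - A with assumed mean A = 47
A = 47
mean_short = A + sum(fi * (xi - A) for fi, xi in zip(f, x)) / N

# Median: value whose cumulative frequency is the first to exceed N/2
cum_f = list(accumulate(f))
median = next(xi for xi, cf in zip(x, cum_f) if cf > N / 2)

# Mode: value with the maximum frequency
mode = x[f.index(max(f))]

print(mean_direct, mean_short, median, mode)
```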
In case of a continuous frequency distribution:
Table 2.2

Groups   fi   Cumulative   xi =        xi.fi   ui =         ui.fi
              frequency    (U + L)/2           (xi - A)/h
10-20     5    5           15            75    -3           -15
20-30     3    8           25            75    -2            -6
30-40     7   15           35           245    -1            -7
40-50    10   25           45           450     0             0
50-60    12   37           55           660     1            12
60-70     7   44           65           455     2            14
70-80     6   50           75           450     3            18
Total    50                            2410                  16


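A = 45, h = 10, N = 50, U = upper limit of class interval, L = lower limit of class interval.
Mean
$$\text{Mean} = \frac{\sum f_i x_i}{N} = \frac{2410}{50} = 48.2$$
Short Cut Method:
Mean of $u_i$ is
$$\bar{U} = \frac{\sum f_i u_i}{N} = \frac{16}{50} = 0.32$$
Mean of $x_i$ is
$$\bar{X} = A + h\bar{U} = 45 + 10 \times 0.32 = 45 + 3.2 = 48.2$$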
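Here is a Python sketch of both methods for Table 2.2. It uses the same A = 45 and h = 10 as the worked example. The names are illustrative.

```python
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
f = [5, 3, 7, 10, 12, 7, 6]
N = sum(f)

# Mid-values x_i = (L + U) / 2
mid = [(lower + upper) / 2 for lower, upper in classes]

# Direct method
mean_direct = sum(fi * xi for fi, xi in zip(f, mid)) / N

# Short cut (step deviation) method with A = 45 and h = 10
A, h = 45, 10
u = [(xi - A) / h for xi in mid]
mean_short = A + h * sum(fi * ui for fi, ui in zip(f, u)) / N

print(mean_direct, mean_short)
```

Both methods give 48.2. The short cut method works with the small numbers -3 to 3 rather than the mid-values, which is its advantage in hand computation.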


Median
In this example N = 50, therefore $\frac{N}{2} = 25$. The cumulative frequency 25 lies in the group 40-50 (this is a rare case where the C.F. of a group is exactly equal to $\frac{N}{2}$); therefore 40-50 is the median group.
The lower limit of the median group is 40, i.e. l = 40; the frequency of the median group is 10, i.e. f = 10; the cumulative frequency preceding the median group is 15, i.e. C = 15; and the class width is 10, i.e. h = 10.
Then the median is calculated by the formula
$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - C\right) = 40 + \frac{10}{10}(25 - 15) = 40 + 10 = 50$$
Therefore, the median of this set of data is 50.0.
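The median formula can be sketched in Python as follows. The name `grouped_median` is a hypothetical helper. It picks the first class whose cumulative frequency reaches N/2.

```python
from itertools import accumulate


def grouped_median(classes, f):
    """Median = l + (h / f) * (N/2 - C) for a continuous frequency distribution."""
    N = sum(f)
    cum_f = list(accumulate(f))
    # Median class: first class whose cumulative frequency reaches N/2
    # (">=" covers the rare case, as in Table 2.2, where it equals N/2).
    k = next(i for i, cf in enumerate(cum_f) if cf >= N / 2)
    l, upper = classes[k]
    h = upper - l
    C = cum_f[k - 1] if k > 0 else 0  # cumulative frequency before the median class
    return l + h * (N / 2 - C) / f[k]


classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
f = [5, 3, 7, 10, 12, 7, 6]
print(grouped_median(classes, f))
```

Suppose the next class (50-60) were taken as the median class instead. The formula then gives 50 + 10(25 − 25)/12 = 50 as well, so the rare case does not change the answer here.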
Mode
The maximum frequency in the above table is 12; therefore the modal group is 50-60. The formula for calculating the mode in a grouped frequency distribution is:
$$\text{Mode} = l + h\,\frac{f_1 - f_0}{2f_1 - f_0 - f_2}$$
Therefore, in this example, the lower limit of the modal group is l = 50, the frequency of the modal group is $f_1 = 12$, the width of the class interval is h = 10, and the frequencies preceding and following the modal group are 10 and 7 respectively, i.e. $f_0 = 10$ and $f_2 = 7$. Then the mode is calculated as
$$\text{Mode} = 50 + 10 \times \frac{12 - 10}{24 - 10 - 7} = 50 + \frac{20}{7} = 50 + 2.86 = 52.86$$
Thus the mode of the data represented in Table 2.2 is 52.86.
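Likewise, here is a hypothetical `grouped_mode` helper in Python. It refuses to apply the formula in the situations where the text recommends the method of grouping.

```python
def grouped_mode(classes, f):
    """Mode = l + h * (f1 - f0) / (2*f1 - f0 - f2) for the modal class."""
    f1 = max(f)
    k = f.index(f1)  # modal class
    # As stated in the text, this formula is not used when the maximum
    # frequency is repeated or falls in the first or last class.
    if f.count(f1) > 1 or k in (0, len(f) - 1):
        raise ValueError("use the method of grouping for this distribution")
    l, upper = classes[k]
    h = upper - l
    f0, f2 = f[k - 1], f[k + 1]  # frequencies before and after the modal class
    return l + h * (f1 - f0) / (2 * f1 - f0 - f2)


classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
f = [5, 3, 7, 10, 12, 7, 6]
print(round(grouped_mode(classes, f), 2))
```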

Geometric Mean: The geometric mean G of n observations $x_i$, i = 1, 2, ..., n, is the nth root of their product:
$$G = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n}$$
Properties of geometric mean:
1. If any observation is zero, the geometric mean becomes zero.
2. If any observation is negative, the geometric mean may become imaginary (for example, when the product of the observations is negative and n is even); hence the geometric mean is not used for such data.
3. The geometric mean is used to find the average rate of population growth.
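Here is a Python sketch of the geometric mean. It computes the mean through logarithms, the usual way to avoid overflow when many values are multiplied. The growth figures are hypothetical and only illustrate property 3.

```python
import math


def geometric_mean(values):
    """nth root of the product, computed via logarithms to avoid overflow."""
    if any(v < 0 for v in values):
        raise ValueError("negative observations: geometric mean not used")
    if any(v == 0 for v in values):
        return 0.0  # property 1 above: one zero makes the product zero
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical illustration: a population grows by 10% in one year and by
# 20% in the next, i.e. growth factors 1.10 and 1.20.
factors = [1.10, 1.20]
average_rate = geometric_mean(factors) - 1
print(f"average annual growth = {100 * average_rate:.2f}%")
```

The average growth rate is about 14.89% a year. Growing at this rate for two years gives the same final population as 10% followed by 20% (1.1489² ≈ 1.32 = 1.10 × 1.20). The arithmetic mean of 15% would overstate the growth.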


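Harmonic Mean: The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the observations:
$$\text{HM} = \frac{1}{\frac{1}{n}\sum_{i=1}^{n} \frac{1}{x_i}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}, \quad i = 1, 2, 3, \ldots, n$$
Relationship between Arithmetic, Geometric and Harmonic Mean:
For positive observations, HM ≤ GM ≤ AM, with equality only when all the observations are equal. For two positive observations, GM² = AM × HM.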
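You can check the harmonic mean and the relationship with a short Python sketch; the function names are illustrative. For the two values 2 and 8, AM = 5, GM = 4 and HM = 3.2, so GM² = AM × HM = 16. For the three values 1, 2, 3 the inequality still holds, but GM² (about 3.30) differs from AM × HM (about 3.27).

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    # nth root of the product, via logarithms (positive values only)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals: n / sum(1/x)."""
    return len(values) / sum(1 / v for v in values)


for data in ([2, 8], [1, 2, 3]):
    am, gm, hm = arithmetic_mean(data), geometric_mean(data), harmonic_mean(data)
    print(data, round(hm, 4), round(gm, 4), round(am, 4))  # HM <= GM <= AM
    print("  GM^2 =", round(gm ** 2, 4), " AM x HM =", round(am * hm, 4))
```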

MULTIPLE CHOICE QUESTIONS


1. What is the mode in statistics:
(a) Value of middle observation
(b) Arithmetic average
(c) Most commonly occurring value
(d) Difference between the highest and lowest value
(AI, 88; AIIMS, 86)
2. The frequently occurring value in a data is:
(a) Median
(b) Mode
(c) Standard deviation
(d) Mean

(TN, 91)

3. Mean incubation period of leprosy is calculated by:


(a) Median
(b) Harmonic mean
(c) Mode
(d) Geometric mean
(PGI, 81, AMC, 86, 87)
4. Calculate the mode of 70, 71, 72, 70, 70:
(a) 70
(b) 71
(c) 71.5
(d) 72
(PGI 79, AMC 85,88)


5. Arranging the values in serial order is needed to determine:


(a) Mean
(b) Mode
(c) Median
(d) Range
(AIIMS, 94)
6. Determination of which statistical parameter requires
quantities to be arranged in ascending or descending
order:
(a) Mean
(b) Median
(c) Mode
(d) SD
(AIIMS, Dec 95)
7. 10 babies were born in a hospital, 5 were less than 2.5
kg and 5 were greater than 2.5 kg, the average is:
(a) Arithmetic mean
(b) Geometric mean
(c) Median
(d) Mode average
(AIIMS, 97)
8. The mean of 10 observations is 25, but later on it was
found that an observation 24 was wrongly written as
14. What will be the mean of correct sample:
(a) 24.5
(b) 25.5
(c) 26
(d) 26.5
9. Mean height of 10 female students of a class is 150 cm,
and the mean height of 20 male students is 175 cm.
What will be the mean height of all the 30 students of
the class:
(a) 166
(b) 166.6
(c) 168
(d) 166.8
10. If mean of a series is 10 and median is 15, what will be
the mode of the series:
(a) 20
(b) 25
(c) 30
(d) 35


11. Which of the following measures of central tendency


will be calculated when the class interval is not closed:
(a) Mean
(b) Median
(c) Mode
(d) Geometric mean
12. Which measure of central tendency is most suitable to
determine the rate of population growth:
(a) Arithmetic mean
(b) Geometric mean
(c) Harmonic mean
(d) Median
13. Relation between arithmetic mean, geometric mean and
harmonic mean is:
(a) GM < HM< AM
(b) HM< GM < AM
(c) AM < GM< HM
(d) GM< AM< HM
14. Complete the following relation:
Mode − Median = ? (Median − Mean)
(a) 2
(b) 3
(c) 1
(d) 1.5

15. Which of the following measure of central tendency is


extensively used in microbiological research:
(a) Harmonic mean
(b) Arithmetic mean
(c) Geometric mean
(d) None of the above
16. The most suitable average to be used while dealing
with socioeconomic status is:
(a) Arithmetic mean
(b) Median
(c) Geometric mean
(d) Harmonic mean
17. The geometric mean of the following set of data is:
Data: 15, 23, 45, 0, 34, 10, 9
(a) 19.4
(b) 0
(c) 45
(d) 17


18. The mean and median of 100 items are 50 and 52 respectively. The value of the largest item is 100. It was later found that it is actually 110. Therefore, the true mean is ____ and the true median is ____.
(a) 50 and 52
(b) 50.10 and 52.5
(c) 50.10 and 52
(d) 50 and 52.5
19. The point of intersection of the less than and greater than ogives corresponds to:
(a) The mean
(b) The median
(c) The geometric mean
(d) None of these
20. Which measure of central tendency can be calculated
from a frequency distribution with open end interval:
(a) Mean
(b) Geometric mean
(c) Harmonic mean
(d) Median
21. The relationship between AM, GM, and HM is:
(a) GM² = AM × HM
(b) HM² = AM × GM
(c) AM = (GM HM) (d) None of the above
22. Which measure of central tendency is not influenced by extreme values:
(a) Mode
(b) Mean
(c) Median
(d) Harmonic mean
23. Values are arranged in ascending and descending
order to calculate:
(a) Mode
(b) Mean
(c) Median
(d) Standard deviation
(AI,98)


24. Number of cases of malaria detected in 10 years are


100, 160, 190, 250, 300, 300, 320, 320, 550, 380. How to
calculate the average number of cases per year:
(a) Arithmetic mean
(b) Geometric mean
(c) Mode
(d) Median
(AIIMS, June 2000)
25. Calculate the median from the following values;
1.9, 1.9, 1.9, 1.9, 2.1, 2.4, 2.5, 2.5, 2.5, 2.9
(a) 1.2
(b) 1.9
(c) 2.25
(d) 2.5 (AIIMS, Nov 2000)
26. Malaria incidence in a village in the year 2000 is 430, 500, 410, 160, 270, 210, 300, 350, 4000, 430, 480, 540. Which of the following is the best indicator for assessment of malaria incidence in that village by the epidemiologist:
(a) Arithmetic mean
(b) Geometric mean
(c) Median
(d) Mode
(AIIMS, May 2001)
27. The median of values 2,5,7,10,10,13,25 is:
(a) 10
(b) 13
(c) 25
(d) 5
(AIIMS,Nov 2001)
28. The incidence of malaria in an area over the past 7 years is: 250, 300, 320, 300, 5000, 200, 350. The best value to give an idea of the incidence is:
(a) Median
(b) Mode
(c) Arithmetic mean
(d) Geometric mean
(AIIMS, Nov 2001)


29. Which of the following statements is/are correct regarding mean, median and mode:
(a) Mode nominal value
(b) Mean is sensitive to extreme values
(c) Median is not sensitive to extreme values
(Manipal, 2002)
30. For a negatively skewed data mean will be:
(a) Less than median
(b) More than median
(c) Equal to median
(d) One
(AIIMS, 2005)

Chapter 3

Measure of Dispersion


DISPERSION
Dispersion means scatteredness. It shows whether a distribution is homogeneous (observations closely grouped, i.e. less dispersed) or heterogeneous (observations widely scattered).
Measures of Dispersion
Range: The range is the difference between the two extreme observations. If A and B are the greatest and smallest observations respectively, then
Range = A − B
Range is a simple but crude measure of dispersion, because it depends only on the two extreme values.
Quartile Deviation or Semi-Interquartile Range: Quartiles divide the total frequency into four equal parts.

Figure 3.1

Q1 = First quartile (25% of the total frequency lies below Q1).
Q2 = Second quartile (50% of the total frequency lies below Q2).
Q3 = Third quartile (75% of the total frequency lies below Q3).


$$\text{Quartile deviation} = \frac{Q_3 - Q_1}{2}$$
Quartile deviation is a better index than range because it uses the middle 50% of the observations instead of only the two extremes.
In case of a continuous frequency distribution, the ith quartile is calculated by the following formula:
$$Q_i = l + \frac{h}{f}\left(\frac{iN}{4} - C\right)$$
where l is the lower limit of the quartile class, h is the class width, $N = \sum f_i$, C is the cumulative frequency preceding the quartile class and f is the frequency of the quartile class. For the first quartile i = 1, for the second quartile i = 2 and for the third quartile i = 3.
It is to be noted that the second quartile is equal to the median.
Decile: Deciles divide the total frequency into 10 equal parts. The formula for calculating the ith decile is
$$D_i = l + \frac{h}{f}\left(\frac{iN}{10} - C\right)$$
where l is the lower limit of the decile class, h is the class width, $N = \sum f_i$, C is the cumulative frequency preceding the decile class and f is the frequency of the decile class. For the first decile i = 1, for the second decile i = 2, and so on up to i = 9 for the ninth decile.
Percentile: Percentiles divide the total frequency into 100 equal parts. The formula for calculating the ith percentile is:
$$P_i = l + \frac{h}{f}\left(\frac{iN}{100} - C\right)$$
where l is the lower limit of the percentile class, h is the class width, $N = \sum f_i$, C is the cumulative frequency preceding the percentile class and f is the frequency of the percentile class. For the first percentile i = 1, for the second percentile i = 2, and so on up to i = 99 for the 99th percentile.
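The quartile, decile and percentile formulas above differ only in the divisor (4, 10 or 100). The Python sketch below applies that common formula. The function name `grouped_quantile` and the list-based input are our own choices, and the sketch assumes equal class widths. The data are the age-group frequencies (5, 22, 20, 10, 3 in classes 20-30 to 60-70) from the worked example later in this chapter. The sketch picks the first class whose cumulative frequency reaches iN/parts. When that value falls strictly inside a class, this is the same as the text's "just above" rule.

```python
def grouped_quantile(lower_limits, width, freqs, i, parts):
    """i-th quantile of a grouped (continuous) distribution.

    Implements l + (h / f) * (i*N/parts - C), where parts = 4 for
    quartiles, 10 for deciles and 100 for percentiles.
    """
    n_total = sum(freqs)
    target = i * n_total / parts   # position iN/parts in the cumulative frequency
    cum = 0                        # C: cumulative frequency preceding the class
    for l, f in zip(lower_limits, freqs):
        if cum + f >= target:      # first class whose cumulative frequency reaches the target
            return l + width * (target - cum) / f
        cum += f
    raise ValueError("i*N/parts exceeds the total frequency")


lower = [20, 30, 40, 50, 60]   # lower class limits of age groups 20-30 ... 60-70
freqs = [5, 22, 20, 10, 3]
h = 10

q1 = grouped_quantile(lower, h, freqs, 1, 4)      # first quartile
median = grouped_quantile(lower, h, freqs, 2, 4)  # second quartile = median
q3 = grouped_quantile(lower, h, freqs, 3, 4)      # third quartile
d5 = grouped_quantile(lower, h, freqs, 5, 10)     # fifth decile, also the median
p90 = grouped_quantile(lower, h, freqs, 90, 100)  # 90th percentile
print(round(q1, 2), median, q3, d5, p90, round((q3 - q1) / 2, 2))
```

The last printed value is the quartile deviation (Q3 − Q1)/2 for this table.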
Mean Deviation: If $x_i$ with frequencies $f_i$ ($i = 1, 2, \ldots, n$) is a frequency distribution, then the mean deviation from an average A (usually the mean, median or mode) is given by:
$$\text{Mean deviation} = \frac{1}{N}\sum_{i=1}^{n} f_i\,|x_i - A|, \quad N = \sum f_i$$
Mean deviation is least when taken from the median.
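The following small Python sketch shows the formula and the "least about the median" property in action. The function name is ours. It uses the seven ungrouped observations of the discrete example later in this chapter, each with frequency 1.

```python
import statistics

def mean_deviation(values, freqs, a):
    """Mean deviation about A: (1/N) * sum(f_i * |x_i - A|), N = sum(f_i)."""
    n_total = sum(freqs)
    return sum(f * abs(x - a) for x, f in zip(values, freqs)) / n_total

x = [18, 45, 34, 22, 35, 39, 17]
f = [1] * len(x)   # ungrouped data: every value has frequency 1

md_mean = mean_deviation(x, f, statistics.mean(x))      # about the mean (30)
md_median = mean_deviation(x, f, statistics.median(x))  # about the median (34)
print(round(md_mean, 3), round(md_median, 3))           # the median gives the smaller value
```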
Standard Deviation and Root Mean Square Deviation:
Standard deviation (σ) is the positive square root of the arithmetic mean of the squares of the deviations of the given values from their arithmetic mean:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}$$
where $N = \sum f_i$ and $\bar{x}$ is the mean.
The square of the standard deviation is known as the variance.
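Root Mean Square Deviation: The root mean square deviation S about a point A is given by:
$$S = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^2}$$
where $N = \sum f_i$ and A is any arbitrary number.
Relation between σ and S: Writing $d = \bar{x} - A$, we have $S^2 = \sigma^2 + d^2$. Hence the standard deviation is the minimum value of the root mean square deviation, attained when A is the mean:
$$\sigma \le S$$
Relation between Mean Deviation from Mean and SD:
Mean deviation from mean ≤ SD (equality holds only when all the absolute deviations $|x_i - \bar{x}|$ are equal).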
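The relation $S^2 = \sigma^2 + (\bar{x} - A)^2$ can be checked numerically. Below is a minimal Python sketch on the seven observations of the discrete example in this chapter. The helper names are ours, and both helpers use divisor n, as these formulas do.

```python
import math

def sd(values):
    """Standard deviation with divisor n: root mean square deviation about the mean."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def rms_deviation(values, a):
    """Root mean square deviation S about an arbitrary point A."""
    return math.sqrt(sum((v - a) ** 2 for v in values) / len(values))

x = [18, 45, 34, 22, 35, 39, 17]
mean = sum(x) / len(x)   # 30.0

for a in (25, 30, 35, 40):
    s = rms_deviation(x, a)
    # S^2 = sigma^2 + (mean - A)^2, so S is smallest (and equals sigma) at A = mean
    print(a, round(s, 3), round(math.sqrt(sd(x) ** 2 + (mean - a) ** 2), 3))
```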


Coefficient of Dispersion
When we want to compare the variability of two series that differ widely in their averages, or that are measured in different units, we calculate a coefficient of dispersion. This is a pure number, independent of units.
The coefficients of dispersion based on the different measures of dispersion are:
Based on Range:
$$\text{CD} = \frac{A - B}{A + B}$$
where A and B are the maximum and minimum values.
Based on Quartile Deviation:
$$\text{CD} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
where Q1 and Q3 are the first and third quartiles respectively.
Based on Standard Deviation:
$$\text{CD} = \frac{\text{SD}}{\text{Mean}}$$
Coefficient of Variation
The coefficient of dispersion based on standard deviation, multiplied by 100, is called the coefficient of variation:
$$\text{CV} = \frac{\text{SD}}{\text{Mean}} \times 100$$
The series having the greater CV is said to be more variable than the series having the smaller CV. In other words, a series is more homogeneous if its CV is smaller.
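The three coefficients of dispersion and the CV are one-line formulas. Below is a minimal Python sketch; the function names are ours. It is checked against figures that appear in this chapter: a maximum of 60 and a minimum of 40 (question 18), the quartiles Q1 ≈ 34.55 and Q3 = 49 of the age-group example, and a mean weight of 12 kg with SD 3 (question 6).

```python
def cd_range(a_max, b_min):
    """Coefficient of dispersion based on range: (A - B) / (A + B)."""
    return (a_max - b_min) / (a_max + b_min)

def cd_quartile(q1, q3):
    """Coefficient of dispersion based on quartiles: (Q3 - Q1) / (Q3 + Q1)."""
    return (q3 - q1) / (q3 + q1)

def cd_sd(sd, mean):
    """Coefficient of dispersion based on SD: SD / Mean."""
    return sd / mean

def cv(sd, mean):
    """Coefficient of variation: 100 * SD / Mean, expressed as a percentage."""
    return 100 * cd_sd(sd, mean)

q1 = 30 + 10 * (15 - 5) / 22   # first quartile of the age-group example
print(cd_range(60, 40), round(cd_quartile(q1, 49), 3), cd_sd(3, 12), cv(3, 12))
```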
Examples for Calculating Standard Deviation, Quartiles, Coefficient of Dispersion and Coefficient of Variation:
In case of Discrete Data:
Simple Method


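Variable xi    xi − x̄    (xi − x̄)²
    18           −12        144
    45            15        225
    34             4         16
    22            −8         64
    35             5         25
    39             9         81
    17           −13        169
Total 210          0        724

No. of cases n = 7; Mean $\bar{x} = 210/7 = 30$
$$\text{SD} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} = \sqrt{\frac{724}{7}} = \sqrt{103.43} \approx 10.17$$
Range: Max (A) = 45; Min (B) = 17; Range = A − B = 28
Coefficient of Dispersion (Based on Range) $= \frac{A - B}{A + B} = \frac{28}{62} \approx 0.45$
Coefficient of Dispersion (Based on SD) $= \frac{\text{SD}}{\text{Mean}} = \frac{10.17}{30} \approx 0.339$
Coefficient of Variation $= \frac{\text{SD}}{\text{Mean}} \times 100 \approx 33.9$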
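The direct-method arithmetic above can be reproduced in a few lines of Python. This is a sketch, and the variable names are ours. Like the text, it uses divisor n, not the n − 1 used for sample estimates.

```python
import math

x = [18, 45, 34, 22, 35, 39, 17]
n = len(x)

mean = sum(x) / n                             # 210 / 7 = 30
ss = sum((v - mean) ** 2 for v in x)          # sum of squared deviations = 724
sd = math.sqrt(ss / n)                        # divisor n, as in the text
value_range = max(x) - min(x)                 # 45 - 17 = 28
cd_range = value_range / (max(x) + min(x))    # 28 / 62
cd_sd = sd / mean                             # SD / Mean
cv = 100 * cd_sd                              # coefficient of variation (%)

print(mean, ss, round(sd, 2), round(cd_range, 2), round(cd_sd, 3), round(cv, 1))
```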
Mean


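Short-cut Method:
Let A = 35 (an arbitrary origin) and ui = xi − A.

Variable xi    ui = xi − A    ui²
    18             −17         289
    45              10         100
    34              −1           1
    22             −13         169
    35               0           0
    39               4          16
    17             −18         324
Total              −35         899

No. of cases n = 7; A = 35
$$\bar{u} = \frac{\sum u_i}{n} = \frac{-35}{7} = -5, \quad \text{therefore Mean } \bar{x} = A + \bar{u} = 35 - 5 = 30$$
$$\text{SD} = \sqrt{\frac{\sum u_i^2}{n} - \bar{u}^2} = \sqrt{\frac{899}{7} - 25} = \sqrt{128.43 - 25} = \sqrt{103.43} \approx 10.17$$
(In this case we simply change the origin, and SD is independent of change of origin.)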
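The short-cut method only shifts the origin, so it must give the same mean and SD as the direct method. Here is a minimal Python check with the text's choice A = 35; the variable names are ours.

```python
import math

x = [18, 45, 34, 22, 35, 39, 17]
A = 35                       # arbitrary origin, as chosen in the text
u = [v - A for v in x]       # change of origin only: u_i = x_i - A
n = len(u)

u_bar = sum(u) / n           # -35 / 7 = -5
mean_x = A + u_bar           # 35 + (-5) = 30
sd_u = math.sqrt(sum(d * d for d in u) / n - u_bar ** 2)  # sqrt(899/7 - 25)

# SD(x) = SD(u) because SD does not depend on the origin
print(mean_x, round(sd_u, 2))
```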
In case of a continuous frequency distribution:

Age group   fi   Cum. freq.   xi = (U+L)/2   fi·xi   xi²    fi·xi²
20-30        5        5            25          125     625     3125
30-40       22       27            35          770    1225    26950
40-50       20       47            45          900    2025    40500
50-60       10       57            55          550    3025    30250
60-70        3       60            65          195    4225    12675
Total   N = 60                                2540            113500

U = Upper limit of class interval; L = Lower limit of class interval

Mean $\bar{x} = \frac{\sum f_i x_i}{N} = \frac{2540}{60} = 42.33$

Standard Deviation
$$\sigma = \sqrt{\frac{\sum f_i x_i^2}{N} - \bar{x}^2} = \sqrt{\frac{113500}{60} - \left(\frac{2540}{60}\right)^2} = \sqrt{1891.67 - 1792.11} = \sqrt{99.56} \approx 9.98$$

Quartiles
$$Q_i = l + \frac{h}{f}\left(\frac{iN}{4} - C\right), \quad i = 1, 2, 3$$

First Quartile (Q1): N = 60; for the first quartile i = 1, so $\frac{iN}{4} = \frac{60}{4} = 15$.
The first cumulative frequency exceeding 15 is 27; therefore 30-40 is the first quartile class.
Thus in the above formula: l = 30, h = 10, C = 5, f = 22 and i = 1:
$$Q_1 = 30 + \frac{10}{22}(15 - 5) = 30 + 4.55 = 34.55$$

Second Quartile or Median (Q2): N = 60; for the second quartile i = 2, so $\frac{iN}{4} = \frac{120}{4} = 30$.
The first cumulative frequency exceeding 30 is 47; therefore 40-50 is the second quartile class.
Thus in the formula: l = 40, h = 10, C = 27, f = 20 and i = 2:
$$Q_2 = 40 + \frac{10}{20}(30 - 27) = 40 + 1.5 = 41.5$$

Third Quartile (Q3): N = 60; for the third quartile i = 3, so $\frac{iN}{4} = \frac{180}{4} = 45$.
The first cumulative frequency exceeding 45 is 47; therefore 40-50 is the third quartile class.
Thus in the formula: l = 40, h = 10, C = 27, f = 20 and i = 3:
$$Q_3 = 40 + \frac{10}{20}(45 - 27) = 40 + 9 = 49$$

Coefficient of Dispersion (Based on Quartiles)
$$\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{49 - 34.55}{49 + 34.55} = \frac{14.45}{83.55} \approx 0.173$$

Coefficient of Dispersion (Based on Standard Deviation)
$$\frac{\text{SD}}{\text{Mean}} = \frac{9.98}{42.33} \approx 0.236$$

Coefficient of Variation
$$\text{CV} = 0.236 \times 100 = 23.6$$
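Below is a Python sketch of the grouped-data computation for this table; the names are ours. It works from the class mid-points and frequencies and recomputes Σ fi xi, Σ fi xi², the mean, the SD and the SD-based coefficient of variation.

```python
import math

midpoints = [25, 35, 45, 55, 65]   # (U + L) / 2 for classes 20-30, ..., 60-70
freqs = [5, 22, 20, 10, 3]

N = sum(freqs)                                                # 60
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))         # 2540
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))    # 113500
mean = sum_fx / N
sd = math.sqrt(sum_fx2 / N - mean ** 2)    # divisor N, as in the text
cd_sd = sd / mean

print(N, sum_fx, sum_fx2, round(mean, 2), round(sd, 2), round(100 * cd_sd, 1))
```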
Short Cut Method:

Age group   fi   xi = (U+L)/2   ui = (xi − A)/h   fi·ui   ui²   fi·ui²
20-30        5       25               −2            −10     4      20
30-40       22       35               −1            −22     1      22
40-50       20       45                0              0     0       0
50-60       10       55                1             10     1      10
60-70        3       65                2              6     4      12
Total       60                                      −16            64

U = Upper limit of class interval; L = Lower limit of class interval
A (arbitrary constant) = 45; h (class width) = 10
$$\bar{u} = \frac{\sum f_i u_i}{N} = \frac{-16}{60} = -0.267, \qquad \bar{x} = A + h\bar{u} = 45 + 10(-0.267) = 45 - 2.67 = 42.33$$
$$\text{SD}(u) = \sqrt{\frac{\sum f_i u_i^2}{N} - \bar{u}^2} = \sqrt{\frac{64}{60} - (0.267)^2} = \sqrt{1.067 - 0.071} = \sqrt{0.996} \approx 0.998$$
$$\text{SD}(x) = h \times \text{SD}(u) = 10 \times 0.998 \approx 9.98$$
(In this case we change the origin as well as the scale while creating the new variable ui; therefore we have to multiply the SD of ui by h to obtain the standard deviation of xi.)
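The step-deviation method can be checked the same way. Transform with u = (x − A)/h, compute the mean and SD of u, then undo the transformation. Here is a Python sketch with the text's A = 45 and h = 10; the variable names are ours.

```python
import math

midpoints = [25, 35, 45, 55, 65]   # class mid-points (U + L) / 2
freqs = [5, 22, 20, 10, 3]
A, h = 45, 10                      # arbitrary origin and class width, as in the text

u = [(x - A) / h for x in midpoints]                   # step deviations: -2, -1, 0, 1, 2
N = sum(freqs)
sum_fu = sum(f * ui for f, ui in zip(freqs, u))        # -16
sum_fu2 = sum(f * ui * ui for f, ui in zip(freqs, u))  # 64
u_bar = sum_fu / N
sd_u = math.sqrt(sum_fu2 / N - u_bar ** 2)

mean_x = A + h * u_bar   # undo the change of origin and scale
sd_x = h * sd_u          # SD ignores the origin but scales with h
print(round(mean_x, 2), round(sd_u, 3), round(sd_x, 2))
```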

SKEWNESS
Skewness means lack of symmetry. A distribution is said to be skewed if the mean, median and mode do not coincide, i.e.
Mean ≠ Median ≠ Mode
Measure of Skewness
Skewness of a distribution can be measured by the following formulae:
1. Sk = Mean − Median
2. Sk = Mean − Mode
For comparing two series, we calculate a coefficient of skewness.
Karl Pearson's Coefficient of Skewness:
$$S_k = \frac{\text{Mean} - \text{Mode}}{\sigma}$$
If the mode is ill defined, then
$$S_k = \frac{3(\text{Mean} - \text{Median})}{\sigma}$$
The limits for Karl Pearson's coefficient of skewness are ±3. In practice, these limits are rarely attained.
Skewness is positive if Mean > Mode or Mean > Median, and negative if Mean (M) < Mode (Mo) or M < Md.

Figure 3.2

Figure 3.3
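Below is a minimal Python sketch of Karl Pearson's two coefficients; the function names are ours. The discrete example data of this chapter have no repeated value, so the mode is ill defined and the median-based form is the one to use. The SD uses divisor n (`statistics.pstdev`).

```python
import statistics

def pearson_sk_mode(mean, mode, sd):
    """Karl Pearson's coefficient: (Mean - Mode) / SD."""
    return (mean - mode) / sd

def pearson_sk_median(mean, median, sd):
    """Form used when the mode is ill defined: 3 (Mean - Median) / SD."""
    return 3 * (mean - median) / sd

x = [18, 45, 34, 22, 35, 39, 17]
mean = statistics.mean(x)       # 30
median = statistics.median(x)   # 34
sd = statistics.pstdev(x)       # divisor n, about 10.17

sk = pearson_sk_median(mean, median, sd)
print(round(sk, 2))             # negative, because mean < median
```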

KURTOSIS
Kurtosis (curvature of a curve) gives an idea of the flatness or peakedness of a frequency curve. It is measured by the coefficient $\beta_2 = \mu_4 / \mu_2^2$, where $\mu_2$ and $\mu_4$ are the second and fourth central moments.

Figure 3.4

A is called the normal or Mesokurtic curve ($\beta_2 = 3$).
B, which is flatter than the normal curve, is called a Platykurtic curve ($\beta_2 < 3$).
C, which is more peaked than the normal curve, is called a Leptokurtic curve ($\beta_2 > 3$).
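The kurtosis coefficient can be computed from central moments. This sketch assumes the usual definition β2 = μ4/μ2², under which the normal curve has β2 = 3. The function names are ours, and the two small data sets are made up purely for illustration.

```python
def beta2(values):
    """Kurtosis coefficient beta2 = mu4 / mu2**2 (central moments, divisor n)."""
    n = len(values)
    m = sum(values) / n
    mu2 = sum((v - m) ** 2 for v in values) / n
    mu4 = sum((v - m) ** 4 for v in values) / n
    return mu4 / mu2 ** 2

def kurtosis_type(b2, tol=1e-9):
    """Classify a curve by comparing beta2 with 3, the value for the normal curve."""
    if abs(b2 - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if b2 > 3 else "platykurtic"

flat = [1, 2, 3, 4, 5]          # evenly spread values: beta2 = 1.7
peaked = [0] * 8 + [10, -10]    # mostly at the centre plus two outliers: beta2 = 5.0
for data in (flat, peaked):
    b2 = beta2(data)
    print(round(b2, 2), kurtosis_type(b2))
```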

MULTIPLE CHOICE QUESTIONS


1. In statistics, the spread or dispersion of data is described by the:
(a) Median
(b) Mode
(c) Standard deviation
(d) Mean
(Kerala, 88)
2. In statistical analysis, what is used to describe the dispersion of data:
(a) Mode
(b) Range
(c) Standard error of mean
(d) Geometric mean
(PGI, 81, AMC 87, 92)
3. Measure of dispersion is:
(a) Mean
(b) Mode
(c) Standard deviation
(d) Median
(Kerala, 94)

4. Among the measures of dispersion, which is the most frequently used:
(a) Range
(b) Mean
(c) Median
(d) Standard deviation
(Karn, 94)
5. Best index to detect deviation is:
(a) Variation
(b) Range
(c) Mean deviation
(d) Standard deviation

(AIIMS, 96)


6. Mean weight of 100 children was 12 kg. The standard deviation was 3. Calculate the percent coefficient of variation:
(a) 25%
(b) 35%
(c) 45%
(d) 55% (AIIMS, Nov 2000)
7. Mean square deviation will be minimum when taken from ____.
(a) Mean
(b) Median
(c) Arbitrary constant
(d) Mode
8. Sum of absolute deviation about median is:
(a) Least
(b) Greatest
(c) Zero
(d) Equal
9. If the mean and mode of a distribution are equal, then its coefficient of skewness is ____.
(a) 3
(b) Zero
(c) 1
(d) None of the above
10. The least value of the root mean square deviation is:
(a) Mean deviation from median
(b) Mean deviation
(c) Standard deviation
(d) Mean deviation from arbitrary constant
11. If the mean of a distribution is 40 and the median is 50, find the mode and the nature of the distribution:
(a) 70 and positively skewed
(b) 70 and negatively skewed
(c) 60 and negatively skewed
(d) 60 and positively skewed
12. If each of a set of observations of a variable is multiplied
by a constant (non-zero), the standard deviation of the
resultant variable:


(a) Is unaltered
(b) Increases
(c) Decreases
(d) Is unknown

13. Mean, SD and Variance have the same units:


(a) True
(b) False
14. Which quartile divides the total frequency in the ratio 3 : 1:
(a) First quartile
(b) Second quartile
(c) Third quartile
(d) Inter quartile range
(AI, 2003)
15. If 25% of the items are less than 10 and 25% are more than 40, the quartile deviation is:
(a) 20
(b) 15
(c) 10
(d) 40
16. If, in a frequency curve of scores, the mode was found to be lower than the mean, the distribution is:
(a) Symmetric
(b) Negatively skewed
(c) Positively skewed
(d) Normal
17. In any discrete distribution (when all the values are not the same), the relation between mean deviation (MD) and standard deviation (SD) is:
(a) MD = SD
(b) MD > SD
(c) MD < SD
(d) None of these
18. If the maximum value of a distribution is 60 and the minimum value is 40, the coefficient of dispersion is:
(a) 0.5
(b) 0.3
(c) 0.25
(d) 0.2
19. In a perfectly symmetrical distribution, 50% of the items are above 60 and 75% of the items are below 75. Therefore the quartile deviation and the coefficient of skewness are:
(a) 15 and 0.5
(b) 15 and 0.25
(c) 30 and 0.5
(d) 30 and 0.25


20. Match the following:
(1) Range                         (a)
(2) Quartile deviation            (b)
(3) Coefficient of variation      (c) Xmax − Xmin
(4) Mean deviation                (d) Σ fi |xi − x̄| / N
(a) 1-A, 2-B, 3-C, 4-D
(b) 1-C, 2-A, 3-B, 4-D
(c) 1-C, 2-B, 3-A, 4-D
(d) 1-C, 2-D, 3-A, 4-B

21. Root mean square deviation is:


(a) Standard deviation
(b) Standard error
(c) Standard variation
(d) Standard error of proportion

(AI,97)

22. A right-sided skewed distribution causes:
(a) Median is more than mean
(b) SD more than variance
(c) Tail to the right
(d) Not affected at all
(AI, 98)

23. In a hospital, 10 babies were born on the same day. All of them had a birth weight of 2.8 kg. The standard deviation will be:
(a) Zero
(b) One
(c) −1
(d) 0.28
(AI, 2001)
24. Median incubation period means:
(a) Time for 50% cases to occur
(b) Time between primary case and secondary cases
(c) Time between onset of infection and period of
maximum infectivity
(JIPMER, 2003)


25. If the systolic blood pressure in a population has a mean of 130 mm Hg and a median of 140 mm Hg, the distribution is said to be:
(a) Symmetrical
(b) Positively skewed
(c) Negatively skewed
(d) Either positively or negatively skewed depending
on the standard deviation
26. If each value of a given group of observations is
multiplied by 10, the standard deviation of the resulting
observations is:
(a) Original std. deviation 10
(b) Original std. deviation/10
(c) Original std. deviation 10
(d) Original std. deviation itself

Chapter 4

Theoretical Discrete and Continuous Distribution


THEORETICAL DISCRETE DISTRIBUTION


Binomial Distribution
Let a random experiment be performed repeatedly, and let the occurrence of an event in a trial be called a success and its non-occurrence a failure. Consider a set of n independent trials (n being finite), in which the probability p of success is the same for each trial. Then q = 1 − p is the probability of failure in any trial.
If there are x successes in n trials, then the number of failures will be (n − x).
But x successes in n trials can occur in ${}^nC_x$ ways, and the probability for each of these ways is $p^x q^{n-x}$. Hence, the probability of x successes in n trials, in any order whatsoever, is given by the expression:
$$P(X = x) = {}^nC_x\, p^x q^{n-x}, \quad x = 0, 1, 2, \ldots, n$$
The probability distribution of the number of successes so obtained is called the binomial probability distribution.
A random variable is said to follow the binomial distribution if it assumes only the non-negative integer values 0, 1, …, n with the probabilities given above.
The two independent constants n and p in the distribution are known as its parameters. n is also sometimes known as the degree of the binomial distribution.
Physical Conditions for Binomial Distribution
We get the binomial distribution under the following experimental conditions:
1. Each trial results in two mutually exclusive outcomes, termed success and failure.


2. The number of trials n is finite.


3. The trials are independent of each other.
4. The probability of success p is constant for each trial.
Mean and Standard Deviation of Binomial Distribution
If a random variable X follows a binomial distribution with parameters n and p, then its mean is np and its variance is npq:
Mean = np
Variance = npq, and hence Standard deviation = √(npq)
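Here is a short Python check of the binomial probability expression and of mean = np, variance = npq. `math.comb` (Python 3.8+) gives nCx. The values n = 10 and p = 0.3 are arbitrary illustrations, not taken from the text.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p**x * q**(n - x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

n, p = 10, 0.3   # illustrative values only
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(probs)                                              # should be 1
mean = sum(x * px for x, px in enumerate(probs))                # should be np = 3
var = sum((x - mean) ** 2 * px for x, px in enumerate(probs))   # should be npq = 2.1
print(round(total, 6), round(mean, 6), round(var, 6))
```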
POISSON DISTRIBUTION
Poisson distribution is a limiting case of the binomial distribution under the following conditions:
1. n, the number of trials, is indefinitely large, i.e. n → ∞.
2. p, the constant probability of success for each trial, is indefinitely small, i.e. p → 0.
3. np = λ (say) is finite. Thus p = λ/n and q = 1 − λ/n, where λ is a positive real number.
A random variable X is said to follow a Poisson distribution if it assumes only non-negative integer values and its probability mass function is given by:
$$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$$
and P(X = x) = 0 otherwise.
Here λ is known as the parameter of the distribution.
Remarks
Poisson distribution occurs when events do not occur as outcomes of a definite number of trials of an experiment (unlike the binomial distribution). Instead, they occur at random points of time and space, and our interest lies only in the number of occurrences of the event, not in its non-occurrence.
For example: the number of deaths from a disease (not in the form of an epidemic) such as heart attack or cancer, or due to snake bite.
Mean and Variance of Poisson Distribution
For the Poisson distribution, the mean and the variance are both equal to the parameter λ.
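The sketch below checks that the Poisson mean and variance both equal λ. It also shows that a binomial with large n, small p and np = λ gives nearly the same probabilities (the limiting case described above). λ = 2 and n = 10,000 are arbitrary illustrative values. The infinite sum is truncated at x = 59, where the remaining tail is negligible for λ = 2.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = exp(-lam) * lam**x / x!  for x = 0, 1, 2, ..."""
    return exp(-lam) * lam ** x / factorial(x)

lam = 2.0                 # illustrative value of the parameter
xs = range(60)            # truncation of the infinite sum
probs = [poisson_pmf(x, lam) for x in xs]
mean = sum(x * px for x, px in zip(xs, probs))
var = sum((x - mean) ** 2 * px for x, px in zip(xs, probs))

# Limiting case: binomial with n large, p small and np = lam
n = 10_000
p = lam / n
binom_3 = comb(n, 3) * p ** 3 * (1 - p) ** (n - 3)

print(round(mean, 6), round(var, 6))                      # both close to lam
print(round(binom_3, 4), round(poisson_pmf(3, lam), 4))   # nearly equal probabilities
```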
THEORETICAL CONTINUOUS DISTRIBUTION
Normal (or Gaussian) Distribution
The binomial and Poisson distributions both relate to discrete random variables. The most important continuous distribution is the Gaussian (C. F. Gauss, 1777-1855) or, as it is more frequently called, the normal distribution.
Chief Characteristics of the Normal Distribution
The normal probability curve with mean μ and standard deviation σ is given by the equation
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty,\ \sigma > 0$$
1. The curve is bell shaped and symmetrical about the line x = μ.
2. Mean, median and mode of the distribution coincide.
3. As x moves away from μ in either direction, f(x) decreases rapidly. The maximum probability density occurs at the point x = μ and is given by $f(\mu) = \frac{1}{\sigma\sqrt{2\pi}}$.


4.
5. Since f(x), being a probability density, can never be negative, no portion of the curve lies below the x-axis.
6. The x-axis is an asymptote to the curve.
7. The points of inflexion, where the curve changes from concave to convex, are given by x = μ ± σ.
8. The relation between quartile deviation, mean deviation and standard deviation is given by:
$$\text{QD} \approx \tfrac{2}{3}\sigma, \quad \text{MD} \approx \tfrac{4}{5}\sigma, \quad \text{i.e. QD} : \text{MD} : \text{SD} \approx \tfrac{2}{3} : \tfrac{4}{5} : 1$$
9. The total area under the normal probability curve is unity.


Shape of Curve

Figure 4.1

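A variable X is said to be a normal variate if it follows a normal probability distribution with mean μ and variance σ². This is represented as $X \sim N(\mu, \sigma^2)$.
If $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$, and X and Y are independent, then
$$X + Y \sim N(\mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2)$$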


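The sum as well as the difference of two independent normal variates is also a normal variate. For the difference X − Y, the mean is the difference of the means and the variance is the sum of the variances.
If $X \sim N(\mu, \sigma^2)$, then kX is distributed normally with mean kμ and variance k²σ², i.e. $kX \sim N(k\mu, k^2\sigma^2)$. Also, X + a is distributed normally with mean μ + a and variance σ², i.e. $X + a \sim N(\mu + a, \sigma^2)$.
STANDARD NORMAL VARIATE
If $X \sim N(\mu, \sigma^2)$, then
$$Z = \frac{X - \mu}{\sigma}$$
is a standard normal variate with mean 0 and variance 1.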


Area Properties
Figure 4.2 (scale: standardized variable z)

The above curve of the normal distribution shows the scale of the original variable. Points are marked at distances of ±σ, ±2σ and ±3σ from μ. From the figure, it is clear that only a relatively small proportion of the area under the curve lies outside the pair of values x = μ + 2σ and x = μ − 2σ. In fact, the probability that x lies within μ ± 2σ is very nearly 0.95. The probability that it lies outside this range is correspondingly about 0.05.
If X and Y are two independent standard normal variates, then U = X + Y and V = X − Y are also independently distributed as normal variates with mean 0 and variance 2.
The following table gives the area under the normal probability curve for some important distances from the mean.

Distance from mean ordinate     Area under
(in terms of ±σ)                normal curve
μ ± 1σ                          68.3%
μ ± 1.96σ                       95%
μ ± 2σ                          95.4%
μ ± 2.58σ                       99%
μ ± 3σ                          99.7%
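The tabulated areas follow from the standard normal distribution. For any normal variate, P(μ − kσ < X < μ + kσ) = P(−k < Z < k) = erf(k/√2). Below is a Python sketch using only `math.erf` from the standard library. The function names and the values in the z-score example (130, 120, 10) are illustrative assumptions.

```python
from math import erf, sqrt

def z_score(x, mu, sigma):
    """Standard normal variate Z = (X - mu) / sigma."""
    return (x - mu) / sigma

def area_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) = P(-k < Z < k) = erf(k / sqrt(2))."""
    return erf(k / sqrt(2))

for k in (1, 1.96, 2, 2.58, 3):
    print(f"mu ± {k} sigma: {100 * area_within(k):.1f}%")

# A value one SD above the mean has z = 1
print(z_score(130, 120, 10))
```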

Importance of Normal Distribution


1. Many distributions occurring in practice, e.g. the binomial (for large n) and the Poisson (for large λ), can be approximated by the normal distribution.
2. The sampling distributions of many statistics tend to normality for large samples, so they can be studied with the help of the normal distribution.
3. The entire theory of small-sample tests, viz. the t, F and χ² tests, is based on the fundamental assumption that the parent population from which the sample is drawn follows a normal distribution.


MULTIPLE CHOICE QUESTIONS


1. In a standard normal curve the area between one
standard deviation on either side will be:
(a) 68%
(b) 85%
(c) 99.7%
(d) None of the above
(AI, 88, AIIMS, 86)
2. Normal distribution curve depends on:
(a) Mean and sample
(b) Mean and median
(c) Median and standard deviation
(d) Mean and standard deviation

(AI, 90)

3. The area under a normal distribution curve within mean ± 2 SD is:
(a) 68%
(b) 95%
(c) 97.5%
(d) 100%
(AI, 93)
4. Mean ± 1.96 SD includes the following percentage of values in a distribution:
(a) 68%
(b) 99.5%
(c) 88.7%
(d) 95%
(AI, 96)
5. Shape of normal curve is:
(a) Symmetrical
(b) Curvilinear
(c) Linear
(d) Parabolic (Assam, 95)
6. For limits of mean ± 1.96 SD, the confidence level is:
(a) 63.6%
(b) 66.6%
(c) 95%
(d) 99%
7. 95% confidence limits lie between:
(a) ±1 SD
(b) ±2 SD
(c) ±3 SD
(d) ±4 SD
[Hint: 1.96 is approximately equal to 2]
(AI, 98) (AI, 99)


8. All are true regarding the standard normal distribution curve except:
(a) One standard deviation includes 95% of the values
(b) Median is the midpoint
(c) Mode is the most frequently occurring value
(d) Mean and mode coincide
(AI, 2000)
9. The relation between mean deviation about mean and
quartile deviation is:
(a) Mean deviation is less than quartile deviation
(b) Mean deviation is more than quartile deviation
(c) Mean deviation is equal to quartile deviation
(d) They are not related to each other
10. The points of inflexion of the normal curve are at:
(a) Mean ± SD
(b) Mean ± 2SD
(c) Mean ± 3SD
(d) Mean ± 2/3 SD
11. If X and Y are two independent normal variates, then X − Y is also a normal variate:
(a) True
(b) False
12. The mean and variance of a normal distribution:
(a) Are same
(b) Cannot be same
(c) Are sometimes equal
(d) Are equal in the limiting case, as n → ∞
13. For a normal distribution:
(a) Mean> Median > Mode
(b) Mean < Median < Mode
(c) Mean > Median < Mode
(d) Mean = Mode = Median
14. The standard normal distribution is represented by:
(a) N (0,0)
(b) N (0,1)
(c) N (1,0)
(d) N (1,1)


15. If in a normal distribution the standard deviation is equal to 45, then the mean deviation from the mean is equal to:
(a) 45
(b) 40
(c) 36
(d) 30
16. In a normal distribution, a number of observations less than that divided by the mean (50%) is included in the range:
(a) Mean ± 3 SD
(b) Mean ± 1 SD
(c) Mean ± 2 SD
(d) Mean ± 0.67 SD
[Hint: The mean divides the total area into two equal parts (i.e. 50% of the observations lie below the mean and 50% lie above it). The first and third quartiles of a normal distribution lie at Mean − 0.6745 SD and Mean + 0.6745 SD, so these limits include 50% of the observations. Therefore the number of observations included within the limits Mean ± 0.67 SD will be less than that divided by the mean.]
17. Normal distribution is:
(a) Very flat
(b) Very peaked
(c) Smooth
(d) Bell shaped symmetrical distribution about mean
18. X and Y are two independent normal variates with X ~ N(6, 3) and Y ~ N(3, 6). Then the distribution of X − Y is:
(a) N (3,3)
(b) N (3,6)
(c) N (3, 9)
(d) N (3,9)
19. Total area under the normal probability curve is:
(a) 100
(b) 10
(c) 1
(d) 0.05


20. Binomial distribution tends to normal distribution if:
(a) n → ∞ and neither p nor q is very small
(b) n → ∞ and p → 0
(c) n → ∞ and q → 0
(d) None of the above
21. Normal distribution is symmetrical only for some
specified values of X:
(a) True
(b) False
22. For a normal distribution, quartile deviation, mean
deviation and standard deviation are in the ratio:
(a) 4/5 : 2/3: 1
(b) 2/3: 4/5: 1
(c) 1: 4/5 : 2/3
(d) 4/5: 1: 2/3
23. The mean deviation about mean of a normal
distribution is:
(a)

(b)

(c)

(d)

[Hint:

is approximately equal to

24. If X is distributed normally with mean μ and variance σ², then a linear function of X, i.e. aX + b, will also be a normal variate with:
(a) Mean aμ and variance a²σ²
(b) Mean aμ + b and variance a²σ²
(c) Mean μ + b and variance b²σ²
(d) Mean bμ + a and variance b²σ²
25. In the estimation of standard probability, Z Score is
applicable to:


(a) Normal distribution
(b) Skewed distribution
(c) Binomial distribution
(d) Poisson distribution
(UPSC, 2001)

26. A non-symmetric frequency distribution is known as:


(a) Normal distribution
(b) Skewed distribution
(c) Cumulative frequency distribution
(d) None of the above
(Orissa, 99)
27. The area between one standard deviation on either
side of mean in a normal distribution is:
(a) 62%
(b) 68%
(c) 90%
(d) 99% (AIIMS, May 95)
28. True about normal distribution curve is all except:
(a) Mean, median and mode coincides
(b) Total area of the curve is one
(c) Standard deviation is one
(d) Mean of the curve is hundred
(AIIMS, Dec 97)
[SD of standard normal curve is 1]
29. Which statement is true about standard normal
distribution curve:
(a) Mean 1 and standard deviation 0
(b) Mean 0 and standard deviation 1
(c) Curve skews towards left
(d) Curve skews towards right
(AIIMS, Nov 99)
30. In a normal distribution curve, True statement is:
(a) Mean = SD
(b) Median = SD
(c) Mean = 2 Median
(d) Mean = Mode
(AIIMS, May 2001)
31. Systolic BP of a group of persons follows a normal distribution curve. The mean BP is 120. The values above 120 are:
(a) 25%
(b) 75%
(c) 50%
(d) 100%
(AIIMS, Nov 2001)

32. All are true in normal distribution curve except:


(a) Is bell shaped, symmetrical and on the x-axis
(b) Occurs only in normal people
(c) Median=mode=mean
(Manipal, 2002)
33. A population study showed a mean glucose of 86 mg/dL. In a sample of 100 showing a normal curve distribution, what percentage of people have glucose above 86?
(a) 65
(b) 50
(c) 75
(d) 60
(AI, 2002)
34. The standard normal distribution:
(a) Is skewed to the left
(b) Has mean = 1.0
(c) Has standard deviation = 0.0
(d) Has variance = 1

(AI, 2002)

Chapter 5

Correlation and Regression


ASSOCIATION AND CORRELATION


Association
Association may be defined as the concurrence of two random variables when they occur together more frequently than one would expect by chance.
Correlation
Correlation indicates the degree of association between two random variables.
CORRELATION
When each member of a series is measured on two (or more) variables, we get a bivariate (or multivariate) distribution. For example, if we measure the heights and weights of a certain group of persons, we get a bivariate distribution.
If the two variables deviate in the same direction, the correlation is said to be positive. If they deviate in opposite directions, the correlation is said to be negative.
A scatter diagram is the simplest way to represent a bivariate distribution.
Karl Pearson's Coefficient of Correlation
The correlation coefficient between two random variables x and y, usually denoted by $r_{xy}$, is a numerical measure of the linear relationship between them:
$$r_{xy} = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y} = \frac{\frac{1}{n}\sum x_i y_i - \bar{x}\,\bar{y}}{\sigma_x \sigma_y}$$
Graphical representation (scatter diagrams) of data for different values of r.


Figure 5.1

Properties of Correlation Coefficient


1. The correlation coefficient r lies between −1 and +1.
2. The correlation coefficient is independent of change of origin and scale (provided the scale factors applied to x and y have the same sign).
3. Two independent variables are uncorrelated: if x and y are independent variables, then $r_{xy} = 0$.
4. But two uncorrelated variables may or may not be independent; $r_{xy} = 0$ merely implies the absence of any linear relationship.
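Below is a sketch of Karl Pearson's formula in Python; the function name is ours. It illustrates properties 2 and 4 above. A change of origin and (positive) scale leaves r unchanged. An exact but non-linear relation (y = x² on values symmetric about 0) gives r = 0. The height and weight lists are the data of the solved example later in this chapter.

```python
import math

def pearson_r(x, y):
    """r = Cov(x, y) / (sd_x * sd_y), all computed with divisor n."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - mx * my
    sx = math.sqrt(sum(a * a for a in x) / n - mx ** 2)
    sy = math.sqrt(sum(b * b for b in y) / n - my ** 2)
    return cov / (sx * sy)

height = [175, 166, 182, 167, 176, 169, 182, 190, 187, 151]
weight = [65, 56, 78, 66, 72, 69, 81, 87, 84, 60]
r = pearson_r(height, weight)

# Property 2: change of origin and (positive) scale leaves r unchanged
r_shifted = pearson_r([(v - 170) / 5 for v in height], [2 * (w - 70) for w in weight])

# Property 4: y = x**2 is an exact relation, yet it has no linear component here
xs = [-2, -1, 0, 1, 2]
r_quad = pearson_r(xs, [v * v for v in xs])

print(round(r, 3), round(r_shifted, 3), round(r_quad, 3))
```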
Standard Error of Correlation Coefficient
If r is the correlation coefficient in a sample of n pairs of observations, then its standard error is given by:
$$\text{SE}(r) = \frac{1 - r^2}{\sqrt{n}}$$


REGRESSION
Regression Analysis
Regression analysis is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data.
The line of regression is obtained by the principle of least squares.
Let us suppose that in a bivariate distribution $(x_i, y_i)$, $i = 1, 2, \ldots, n$, y is the dependent variable and x is the independent variable. Let the line of regression of y on x be given by:
$$y = a + bx$$
where a and b are constants estimated by the method of least squares, and b is the slope of the regression line of y on x.
The regression line of y on x is given by
$$y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x})$$
The line of regression of x on y is given by:
$$x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y})$$
The two regression coefficients always have the same sign.
The correlation coefficient can also be calculated from the regression coefficients:
$$r = \pm\sqrt{b_{yx}\, b_{xy}}$$
where
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} \quad \text{and} \quad b_{xy} = r\,\frac{\sigma_x}{\sigma_y}, \quad \text{hence} \quad b_{yx}\, b_{xy} = r^2$$


It may be noted that the sign of the correlation coefficient is the same as that of the regression coefficients, since the sign of each depends upon the covariance term. Thus if the regression coefficients are positive, r is positive, and if the regression coefficients are negative, r is negative.
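Solved Example
Find the correlation coefficient and the lines of regression between the height and weight of 10 individuals:

Case no.   1    2    3    4    5    6    7    8    9    10
Height    175  166  182  167  176  169  182  190  187  151
Weight     65   56   78   66   72   69   81   87   84   60

Correlation Coefficient
Taking ui = xi − 170 and vi = yi − 70:

Height (xi)  Weight (yi)   ui    vi    ui²    vi²    ui·vi
   175           65        +5    −5     25     25     −25
   166           56        −4   −14     16    196     +56
   182           78       +12    +8    144     64     +96
   167           66        −3    −4      9     16     +12
   176           72        +6    +2     36      4     +12
   169           69        −1    −1      1      1      +1
   182           81       +12   +11    144    121    +132
   190           87       +20   +17    400    289    +340
   187           84       +17   +14    289    196    +238
   151           60       −19   −10    361    100    +190
Total (N = 10)            +45   +18   1425   1012    1052

$\bar{u} = 45/10 = 4.5$, $\bar{v} = 18/10 = 1.8$
$$\text{SD}(u) = \sqrt{\frac{1425}{10} - (4.5)^2} = \sqrt{142.5 - 20.25} = \sqrt{122.25} \approx 11.06$$
$$\text{SD}(v) = \sqrt{\frac{1012}{10} - (1.8)^2} = \sqrt{101.2 - 3.24} = \sqrt{97.96} \approx 9.90$$
$$r = \frac{\sum u_i v_i / N - \bar{u}\,\bar{v}}{\text{SD}(u)\,\text{SD}(v)} = \frac{105.2 - 8.1}{11.06 \times 9.90} = \frac{97.1}{109.49} \approx 0.887$$
Since the correlation coefficient is independent of change of origin, $r_{xy} = r_{uv} \approx 0.887$.

Mean of x = 170 + 4.5 = 174.5; Mean of y = 70 + 1.8 = 71.8
SD(x) = SD(u) ≈ 11.06 and SD(y) = SD(v) ≈ 9.90

Regression of x on y:
$$x - 174.5 = r\,\frac{\sigma_x}{\sigma_y}\,(y - 71.8) = 0.887 \times \frac{11.06}{9.90}\,(y - 71.8) \approx 0.991\,(y - 71.8)$$
i.e. x ≈ 0.991y + 103.3

Similarly, regression of y on x:
$$y - 71.8 = r\,\frac{\sigma_y}{\sigma_x}\,(x - 174.5) = 0.887 \times \frac{9.90}{11.06}\,(x - 174.5) \approx 0.794\,(x - 174.5)$$
i.e. y ≈ 0.794x − 66.8
Thus, by putting the value of one variable in the regression equation, we can predict the value of the other variable.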
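The solved example can be reproduced directly from the raw data. This Python sketch (variable names ours) computes the covariance, both regression coefficients, r = ±√(b_yx · b_xy) with the sign of the covariance, and SE(r) = (1 − r²)/√n. Small differences from hand-rounded figures are expected.

```python
import math

height = [175, 166, 182, 167, 176, 169, 182, 190, 187, 151]   # x
weight = [65, 56, 78, 66, 72, 69, 81, 87, 84, 60]             # y
n = len(height)

mx, my = sum(height) / n, sum(weight) / n                      # 174.5, 71.8
cov = sum(x * y for x, y in zip(height, weight)) / n - mx * my # 97.1
var_x = sum(x * x for x in height) / n - mx ** 2               # 122.25
var_y = sum(y * y for y in weight) / n - my ** 2               # 97.96

b_yx = cov / var_x   # regression coefficient of y on x (= r * sd_y / sd_x)
b_xy = cov / var_y   # regression coefficient of x on y (= r * sd_x / sd_y)
r = math.copysign(math.sqrt(b_yx * b_xy), cov)   # r takes the sign of the covariance
se_r = (1 - r ** 2) / math.sqrt(n)

print(f"r = {r:.3f}, SE(r) = {se_r:.3f}")
print(f"y on x: y = {b_yx:.3f} x + ({my - b_yx * mx:.2f})")
print(f"x on y: x = {b_xy:.3f} y + ({mx - b_xy * my:.2f})")
```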


MULTIPLE CHOICE QUESTIONS


1. Correlation between two variables is a numerical
measure of:
(a) Relationship between them
(b) Linear relationship between them
(c) Quadratic relationship between them
(d) All the above
2. If the correlation coefficient between two variables is zero, then:
(a) Two variables are independent
(b) Two variables are linearly related
(c) There is a perfect correlation between the two
variables
(d) There may be a non-linear relation between the two
variables
3. The correlation coefficient between X and Y will have
positive sign when:
(a) X is increasing and Y is decreasing
(b) Both X and Y are increasing
(c) X is decreasing and Y is increasing
(d) There is no change in X and Y
4. The coefficient of correlation:
(a) Can take any value between −1 and +1
(b) Is always less than −1
(c) Is always greater than +1
(d) Cannot be zero
5. The coefficient of correlation between X and Y is +0.24. Their covariance is 3.5 and the variance of X is 16. The SD of Y is:


(a) 0.24/(4 × 3.5)
(b) 16/(3.5 × 0.24)
(c)
(d) 3.5/(0.24 × 4)

6. The coefficient of correlation is independent of:


(a) Change of scale only
(b) Change of origin only
(c) Both change of origin and scale
(d) Neither change of origin nor change of scale
7. Probable error of r is:
(a)
(c) 0.6745

(1 r 2 )
n

(b) 0.6745

(1 r 2 )
n

(d) 0.6745

(1 r 2 )
n

8. If one of the regression coefficients is greater than unity, then the other will be:
(a) Also greater than unity
(b) Less than unity
(c) Equal to 1
(d) All the above
9. If two variables are uncorrelated, then the two lines of regression, i.e. X on Y and Y on X, will be:
(a) Coincident
(b) Perpendicular
(c) At an angle of 45° to each other
(d) Parallel to each other
10. If one of the regression coefficients is positive, then the other will be:
(a) Also positive
(b) Negative
(c) May or may not be positive
(d) Does not depend on the sign of the regression coefficient
11. The correlation coefficient between two variables X and Y is 0.63. If all the values of X and Y are multiplied by the non-zero constant 6, the correlation between the new variables will be:
(a) More than 0.63
(b) Less than 0.63
(c) 0.63
(d) Cannot be calculated
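Question 11 relies on the fact that r is unchanged by a change of origin, or by a change of scale by a positive constant, applied to the variables. The check below uses made-up data and a hand-written Pearson formula; the helper `pearson_r` is hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: cross-deviations over root of the sums of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [2, 4, 5, 7, 9]          # illustrative data
y = [1, 3, 2, 6, 8]

r_original = pearson_r(x, y)
r_scaled = pearson_r([6 * a for a in x], [6 * b for b in y])    # change of scale
r_shifted = pearson_r([a - 5 for a in x], [b + 10 for b in y])  # change of origin
print(round(r_original, 4), round(r_scaled, 4), round(r_shifted, 4))
```

Multiplying only one of the variables by a negative constant would reverse the sign of r but leave its magnitude unchanged.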
12. Regression coefficient is independent of:
(a) Change of scale only
(b) Change of origin only
(c) Change of origin as well as scale
(d) Neither change of origin nor scale
13. If the two lines of regression, X on Y and Y on X, coincide,
then the correlation will be:
(a) r = ±1
(b) r = 0
(c) r = +0.5
(d) −1 < r < 1
14. If the lines of regression are given as x + 2y − 5 = 0 and
2x + 3y = 8, then the means of x and y respectively are:
(a) 1, 2
(b) 1, 2
(c) 2, 5
(d) 2, 3
[Hint: The lines of regression pass through the point of means
$(\bar{x}, \bar{y})$; therefore at this point the lines of regression become
$\bar{x} + 2\bar{y} - 5 = 0$ and $2\bar{x} + 3\bar{y} = 8$. By solving these two
equations we can calculate the means of x and y.]
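Both regression lines pass through the point of means. The means in question 14 are therefore the solution of the two line equations. The sketch below solves them with Cramer's rule using exact fractions; the helper `solve_2x2` is illustrative.

```python
from fractions import Fraction

def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("lines are parallel or coincident")
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y

# x + 2y = 5 and 2x + 3y = 8 both pass through (mean of x, mean of y)
mean_x, mean_y = solve_2x2(1, 2, 5, 2, 3, 8)
print(mean_x, mean_y)  # 1 2
```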


15. The following statistic is used to measure the linear
association between two characteristics in the same
individuals:
(a) Coefficient of variation
(b) Coefficient of correlation
(c) Chi-square
(d) Standard error
(Karnat, 96)
16. All are features of the correlation coefficient except:
(a) Cause effect association cannot be shown
(b) Risk association can be revealed
(c) Correlation risk to disease
(d) Indicates linear relationship
(AIIMS, 97)
17. When height and weight are perfectly correlated, the
coefficient of correlation is:
(a) +1
(b) −1
(c) 0
(d) More than 1
(AIIMS, 2000)
18. Height to weight is a/an:
(a) Association
(b) Correlation
(c) Proportion
(d) Index
(AIIMS, 96)
[Hint: Association is the relationship between two random
variables and correlation coefficient shows the degree of
association].
19. Correlation coefficient tends to lie between:
(a) Zero to 1.0
(b) −1.0 to +1.0
(c) +1.0 to zero
(d) +2.0 to −2.0
(AIIMS, June 97)
20. If the correlation between height and weight is 2.6, which
is true?
(a) Positive correlation
(b) No association
(c) Negative correlation
(d) Calculation of coefficient is wrong
(AIIMS, June 2000)
21. A regression between height and age follows y = a + bx.
The curve is:
(a) Hyperbola
(b) Sigmoid
(c) Straight line
(d) Parabola
(AIIMS, Nov 2001)
22. The correlation between IMR and socioeconomic
status is best depicted by:
(a) Correlation (+1)
(b) Correlation (+0.5)
(c) Correlation (−1)
(d) Correlation (−0.8)
(AIIMS, Nov 2001)
[Hint: The IMR decreases with increase in socioeconomic
status, but the two are not perfectly correlated.]
23. The correlation between variables A and B in a study
was found to be 1.1. This indicates:
(a) Very strong correlation
(b) Moderately strong correlation
(c) Weak correlation
(d) Computational mistake in calculating correlation
(AI, 2002)
24. A cardiologist found a highly significant correlation
coefficient (r = 0.90, p = 0.01) between the systolic blood
pressure values and serum cholesterol values of the
patients attending his clinic. Which of the following
statements is a wrong interpretation of the correlation?
(a) Since there is a high correlation, the magnitudes of
both the measurements are likely to be close to each
other.
(b) A patient with a high level of systolic BP is also
likely to have a high level of serum cholesterol.
(c) A patient with a low level of systolic BP is also likely
to have a low level of serum cholesterol.
(d) About 80% of the variation in systolic blood pressure
among his patients can be explained by their serum
cholesterol values and vice versa.
(AI, 2005)
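Option (d) of question 24 can be checked directly. For a linear relation, the proportion of variation in one variable explained by the other is r², the coefficient of determination. A tiny sketch:

```python
r = 0.90
r_squared = r ** 2   # coefficient of determination
print(f"r = {r}, r^2 = {r_squared:.2f} ({r_squared:.0%} of the variation explained)")
```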
25. Total Cholesterol level = a + b (calorific intake) + c
(physical activity) + d (body mass index); is an example
of:
(a) Simple linear regression
(b) Simple curvilinear regression
(c) Multiple linear regression
(d) Multiple logistic regression
(AI, 2005)

Chapter 6

Probability


Random Series: If a coin is tossed a very large number of times
and the result of each toss is written down, the result may be
something like the following (H standing for heads and T for
tails):
H, H, T, T, T, H, T, H, H, H, T, T, H, H, T, H, ...
Such a sequence is called a Random Sequence or Random
Series.
Trial and Events: Each toss in the above series is called a
Trial and each result is called an Outcome or Event.
In the above series, the outcome of the first trial is a head.
Exhaustive Events: The total number of possible outcomes
in a trial is known as Exhaustive Events or Exhaustive
Cases. Thus in tossing a coin there are only two exhaustive events,
head and tail, and in throwing a die there are six exhaustive
cases, since one of the six faces 1, 2, 3, ..., 6 will come
uppermost.
Mutually Exclusive Events: Events are said to be mutually
exclusive if the happening of one precludes the happening of
all the others. For example, in throwing a die, all six faces 1 to
6 are mutually exclusive, since if one of these faces comes up,
the possibility of all the other faces in the same trial is ruled
out.
Equally Likely Events: Events are equally likely if each has an
equal chance of taking place, i.e. there is no reason to expect one
in preference to the others. For example, in throwing an
unbiased die, all six faces are equally likely to come up.
Independent Events: Several events are said to be
independent if the happening of an event is not affected by
supplementary knowledge concerning the occurrence of any
number of the remaining events. For example, in tossing an
unbiased coin, the event of getting a head in the first toss is
independent of getting a head in the second, third and
subsequent tosses.
MATHEMATICAL OR CLASSICAL PROBABILITY
If a trial results in n exhaustive, mutually exclusive
and equally likely cases, and m of them are favourable to
the happening of an event E, then the probability of
happening of E is:
$p = P(E) = \frac{m}{n}$
and the probability of non-occurrence of the event E is:
$q = P(\bar{E}) = \frac{n - m}{n} = 1 - \frac{m}{n} = 1 - p$
Thus, p + q = 1.
Obviously, p and q are non-negative and cannot exceed
1, i.e. $0 \le p \le 1$.
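The classical definition p = m/n can be illustrated with an unbiased die and the event E = "an even face". The sketch below uses exact fractions; the helper name `classical_probability` is hypothetical.

```python
from fractions import Fraction

def classical_probability(favourable: int, exhaustive: int) -> Fraction:
    """p = m / n for n exhaustive, mutually exclusive, equally likely cases."""
    if not 0 <= favourable <= exhaustive or exhaustive == 0:
        raise ValueError("need 0 <= m <= n and n > 0")
    return Fraction(favourable, exhaustive)

faces = range(1, 7)                            # six exhaustive cases of a die
m = sum(1 for face in faces if face % 2 == 0)  # cases favourable to E = "even face"
p = classical_probability(m, len(faces))
q = 1 - p                                      # probability of non-occurrence
print(p, q, p + q)  # 1/2 1/2 1
```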

Sure Event: If the probability of occurrence of an event E is
1, i.e. p = P(E) = 1, then E is called a Sure Event.
Impossible Event: If the probability of occurrence of
an event E is zero, i.e. p = P(E) = 0, then E is called an Impossible
Event.
ADDITIVE AND MULTIPLICATIVE PROPERTY OF
PROBABILITY
Here we consider the two basic laws of probability, i.e.
the addition and multiplication rules of probability.
Addition Rule
Suppose that in a population of doctors the probability that a doctor
is male is 0.8 and the probability that a doctor is a surgeon is 0.4.
If A is the event that a doctor is male, the probability of occurrence
of A is P(A) = 0.8; similarly, if B is the event that the doctor is a
surgeon, the probability of occurrence of B is P(B) = 0.4.
If the two separate probabilities are added, the result
is 0.8 + 0.4 = 1.2, which is wrong because the probability of
occurrence of an event cannot exceed 1. The error arises because a
doctor who is both male and a surgeon (the double event) is counted
twice: once when calculating the probability of a male doctor and
again as part of the probability of a surgeon. The probability
of the double event must therefore be subtracted.
This can be made clear by the following diagrams:

Figure 6.1

Figure 6.2

In Figure 6.1 the shaded portion is included in circle A as
well as in circle B, i.e. while calculating the probability of
male doctors the surgeons who are male are included, and
while calculating the probability of surgeons the same male
surgeons are included again.

Therefore, in the addition law the probability of the double event
is subtracted, as shown in Figure 6.2.
The additive property of probability states that, if A and B are
two events, the combined probability of the two events is given by:
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
i.e. Prob (A or B or both) = Prob (A) + Prob (B) − Prob (A and B)
In the case of mutually exclusive events, P(A and B) = 0, so:
P (A or B) = Prob (A) + Prob (B)
In Figure 6.3 the events "male surgeon" and "female surgeon" are
mutually exclusive: no surgeon can be both, so there is no double
event to subtract.

Figure 6.3

Thus, if in a population of doctors the probability of a male surgeon
is P(A) = 0.3 and the probability of a female surgeon is
P(B) = 0.1, then the probability of a surgeon in the population
of doctors is:
P (A or B) = P (A) + P (B) = 0.3 + 0.1 = 0.4
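The addition rule can be verified by counting on a small sample space. The sketch below uses an unbiased die with A = "even face" and B = "face greater than 3". This is an illustration of its own, since the overlap in the doctors example is not given. It then repeats the mutually exclusive surgeon example.

```python
from fractions import Fraction

sample_space = set(range(1, 7))                # faces of an unbiased die
A = {f for f in sample_space if f % 2 == 0}    # even face: {2, 4, 6}
B = {f for f in sample_space if f > 3}         # face greater than 3: {4, 5, 6}

def prob(event):
    return Fraction(len(event), len(sample_space))

direct = prob(A | B)                           # count "A or B or both" directly
by_rule = prob(A) + prob(B) - prob(A & B)      # addition rule, double event subtracted
print(direct, by_rule)  # 2/3 2/3

# Mutually exclusive events: the intersection is empty, nothing to subtract
male_surgeon, female_surgeon = Fraction(3, 10), Fraction(1, 10)
print(male_surgeon + female_surgeon)  # 2/5, i.e. 0.4
```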
Multiplication Rule
In the general case (events not necessarily independent):

Figure 6.4

Suppose that in Figure 6.4 there are n points in the square,
m1 points in circle A, m2 points in circle B and m3 points
common to both A and B (assume m1 > 0 and m2 > 0).
Then the probability that both the events A and B occur
is given by:
P (A and B) = P (A) P (B given A)
or
P (A and B) = P (B) P (A given B)
P (B given A) is known as the conditional probability of
occurrence of B given that A has already occurred, and
P (A given B) is the conditional probability of occurrence
of A given that B has already occurred.
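In the above example,
$P(A) = \frac{m_1}{n}$; $P(B) = \frac{m_2}{n}$; $P(B \text{ given } A) = \frac{m_3}{m_1}$; $P(A \text{ given } B) = \frac{m_3}{m_2}$
Thus,
$P(A \text{ and } B) = \frac{m_1}{n} \cdot \frac{m_3}{m_1} = \frac{m_3}{n}$
or
$P(A \text{ and } B) = \frac{m_2}{n} \cdot \frac{m_3}{m_2} = \frac{m_3}{n}$
which is the ratio of the number of points common to both A
and B to the total number of points, n.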
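With hypothetical counts for Figure 6.4, both forms of the multiplication rule give m3/n. The counts used are n = 100, m1 = 40, m2 = 25 and m3 = 10, chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical counts for Figure 6.4 (not from the text)
n, m1, m2, m3 = 100, 40, 25, 10

p_a, p_b = Fraction(m1, n), Fraction(m2, n)
p_b_given_a = Fraction(m3, m1)   # of the m1 points in A, m3 also lie in B
p_a_given_b = Fraction(m3, m2)   # of the m2 points in B, m3 also lie in A

route_1 = p_a * p_b_given_a      # P(A) P(B given A)
route_2 = p_b * p_a_given_b      # P(B) P(A given B)
print(route_1, route_2, Fraction(m3, n))  # 1/10 1/10 1/10
```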

In the case of independent events, the multiplication rule is:
P (A and B) = P (A) . P (B)
Suppose that two random sequences of trials are
proceeding simultaneously; for example, at each stage a coin
may be tossed and a die thrown. What is the probability of
a particular combination of results, for example a head (H) on
the coin and a 5 on the die? The result is given by the simple
multiplication rule:
P (H and 5) = P (H) P (5)
In this example, the probability of a 5 on the die is not
affected by whether or not H occurred on the coin; in other
words the two events are independent, and by the
multiplication rule the probability of H and 5 is:
$P(H \text{ and } 5) = P(H) \cdot P(5) = \frac{1}{2} \cdot \frac{1}{6} = \frac{1}{12}$
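The coin-and-die example can be confirmed by listing all 2 × 6 = 12 equally likely joint outcomes:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", range(1, 7)))    # 12 equally likely (coin, die) pairs
favourable = [o for o in outcomes if o == ("H", 5)]

p_joint = Fraction(len(favourable), len(outcomes))
p_h, p_5 = Fraction(1, 2), Fraction(1, 6)
print(p_joint, p_h * p_5)  # 1/12 1/12
```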
MULTIPLE CHOICE QUESTIONS
1. The probability of a sure event is:
(a) 0
(b) 0.5
(c) 1
(d) + 1
2. Out of 1000 individuals surveyed, it was observed that
260 were suffering from respiratory disorders and 470
from diabetes, and 170 were suffering from both
diabetes and respiratory disorders. The
probability of persons suffering from respiratory
problems is:

(a) 0.26
(b) 0.43
(c) 0.17
(d) 0.47
[Hint: The total number of persons suffering from respiratory
disorders also includes those suffering from both respiratory
disorders and diabetes.]
3. In the above problem the probability of individuals
who are suffering from diabetes alone is:
(a) 0.47
(b) 0.17
(c) 0.26
(d) 0.43
4. Find the probability of persons suffering from
respiratory disorders, diabetes as well as both diabetes
and respiratory disorders:
(a) 1.07
(b) 0.17
(c) 0.90
(d) 0.69
5. Find the probability of persons suffering from diabetes
as well as respiratory disorders:
(a) 0.90
(b) 0.17
(c) 1.17
(d) 0.47
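The hint to question 2 suggests reading 260 and 470 as persons with only one of the two conditions. Under that reading, which is an interpretation and is not stated explicitly in the question, the answers to questions 2 to 5 follow from simple counting:

```python
N = 1000
resp_only, diab_only, both = 260, 470, 170   # counts read as in the hint

p_resp = (resp_only + both) / N              # Q2: everyone with respiratory disorders
p_diab_alone = diab_only / N                 # Q3: diabetes alone
p_any = (resp_only + diab_only + both) / N   # Q4: respiratory, diabetes or both
p_both = both / N                            # Q5: both conditions
print(p_resp, p_diab_alone, p_any, p_both)
```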
6. The probability of an event in any case does not
exceed:
(a) 0.5
(b) 0.9
(c) 1
(d) 1
7. The probability of any event lies between:
(a) −1 < P < 1
(b) 0 < p < 1
(c) 0 < P < 1
(d) −1 < P < 0
8. In a population, the incidence of ocular deficiency is 20% in
males and 25% in females. What is the probability
of ocular disease in the population?
(a) 0.05
(b) 0.25
(c) 0.45
(d) None of the above

9. In question no. (8), what is the probability of diabetes
in the population?
(a) 0
(b) 0.25
(c) 0.20
(d) None of the above
10. The events A and B are mutually exclusive, so:
(a) Prob. (A or B) = Prob (A) + Prob (B)
(b) Prob (A and B) = Prob (A) . Prob (B)
(c) Prob (A) = Prob (B)
(d) Prob (A) + Prob (B) = 1
(AI, 2005)

Chapter 7

Sampling and Design of Experiments

POPULATION
The group of individuals under study is called the population or
universe. The population may be finite or infinite.
SAMPLE
A finite subset of individuals in a population is called a
sample, and the number of individuals in a sample is called the
sample size.
Sample characteristics are used to estimate the corresponding
population characteristics. The error involved in such
estimation is known as sampling error, which is
inherent and unavoidable in any and every sampling scheme.
Types of Sampling
Some of the commonly known and frequently used sampling
techniques are:
1. Random sampling
2. Stratified sampling
3. Systematic sampling
4. Cluster sampling
Random Sampling
In this case the sampling units are selected at random. A
random sample is one in which each unit of the population has
an equal chance of being included in the sample.
Suppose we take a sample of size n from a finite population
of size N. Then there are $^{N}C_{n}$ possible samples. A sampling
technique in which each of the $^{N}C_{n}$ samples has an equal chance of
being selected is known as Random Sampling, and the sample
obtained by this technique is termed a random sample.
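The number of possible samples, and the two ways of drawing a simple random sample, can be sketched with Python's standard library (`math.comb`, `random.sample`, `random.choice`). The values N = 10 and n = 3 are illustrative.

```python
import math
import random

N, n = 10, 3
print(math.comb(N, n))  # 120 possible samples of size 3

population = list(range(1, N + 1))   # units numbered 1..N
rng = random.Random(2024)            # fixed seed only so the run is reproducible
without_replacement = rng.sample(population, n)                 # every 3-unit subset equally likely
with_replacement = [rng.choice(population) for _ in range(n)]   # each draw independent
print(without_replacement, with_replacement)
```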
In simple random sampling each unit of the population
has an equal chance of being included in the sample, and
this probability is independent of the previous drawings. To
ensure this when the population is finite, sampling must be done
with replacement. In the case of an infinite population, however,
replacement is not necessary.
Stratified Sampling
If the population is not homogeneous, the entire
heterogeneous population is divided into a number of
homogeneous groups, usually called strata. Units are
sampled at random from each stratum, the sample
size in each stratum varying according to the relative importance
of the stratum in the population.
The sample which is the aggregate of the sampled units
of all the strata is termed a stratified sample.
Such a sample is a good representative of the population
when the population considered is heterogeneous.
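The text leaves the allocation rule open ("according to the relative importance of the stratum"). One common choice is proportional allocation, n_h = n × N_h/N. The sketch below assumes that rule and uses hypothetical strata.

```python
import random

def proportional_allocation(stratum_sizes: dict, n: int) -> dict:
    """Share n among strata in proportion to stratum size (largest remainder)."""
    N = sum(stratum_sizes.values())
    raw = {s: n * size / N for s, size in stratum_sizes.items()}
    alloc = {s: int(v) for s, v in raw.items()}
    # hand any leftover units to the strata with the largest fractional parts
    leftover = n - sum(alloc.values())
    for s in sorted(raw, key=lambda s: raw[s] - alloc[s], reverse=True)[:leftover]:
        alloc[s] += 1
    return alloc

def stratified_sample(strata: dict, n: int, rng: random.Random) -> dict:
    """Draw a simple random sample of the allocated size within each stratum."""
    alloc = proportional_allocation({s: len(u) for s, u in strata.items()}, n)
    return {s: rng.sample(units, alloc[s]) for s, units in strata.items()}

# Hypothetical population of 1000 split into three age strata
strata = {
    "under 15": [f"c{i}" for i in range(300)],
    "15-59": [f"a{i}" for i in range(600)],
    "60+": [f"e{i}" for i in range(100)],
}
sample = stratified_sample(strata, 50, random.Random(1))
print({s: len(units) for s, units in sample.items()})
```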
Systematic Sampling
In systematic sampling the number of units in the population should
be a multiple of the number of units in the sample (i.e. the sample
size). Suppose there are N units in the population, numbered in
some order, and we want to draw a sample of n units from this
population. Then there should be a constant k which, when
multiplied by the sample size n, equals the population size N,
i.e. n . k = N or k = N/n. We divide the N population units into
n groups of k units each as follows:

Group 1:   1              2              3              4              ...   i              ...   k
Group 2:   k + 1          k + 2          k + 3          k + 4          ...   i + k          ...   2k
Group 3:   2k + 1         2k + 2         2k + 3         2k + 4         ...   i + 2k         ...   3k
...
Group n:   (n − 1)k + 1   (n − 1)k + 2   (n − 1)k + 3   (n − 1)k + 4   ...   i + (n − 1)k   ...   (n − 1)k + k = nk = N

In systematic sampling, to select a sample of n units with k =
N/n, every kth unit is selected, commencing with a
randomly chosen number between 1 and k. Hence, the
selection of the first unit determines the whole sample. If
the ith unit is selected at random from the first k units, the
sample will consist of the ith, (i + k)th, (i + 2k)th, ..., [i + (n − 1)k]th units
of the population.
In systematic sampling the first unit is drawn at random
and the remaining units follow a systematic pattern.
Example: Suppose from a population of size N = 5,000 we
want to draw a sample of size 250 (i.e. n = 250); then
$k = \frac{N}{n} = \frac{5000}{250} = 20$. Therefore, in systematic sampling the first unit of
the sample is selected at random from the first 20 units of the
population. Suppose the 6th unit is drawn from the first 20 units. Then the
first unit of the sample will be the 6th unit of the population, the
second unit of the sample will be the 26th unit of the population, the
next unit will be the 46th unit of the population, and so on. In this
way we can draw a sample of size 250.
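The systematic selection rule takes only a few lines: k = N/n, a random start i in 1..k, then every kth unit. The call below reproduces the worked example with start 6; the function name is illustrative.

```python
import random

def systematic_sample(N, n, start=None, rng=random):
    """Unit numbers i, i + k, ..., i + (n - 1)k with k = N / n (1-based)."""
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n
    i = rng.randint(1, k) if start is None else start   # random start in 1..k
    if not 1 <= i <= k:
        raise ValueError("start must lie between 1 and k")
    return [i + j * k for j in range(n)]

units = systematic_sample(5000, 250, start=6)   # the worked example: k = 20
print(units[:3], units[-1], len(units))  # [6, 26, 46] 4986 250
```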

Advantages of Systematic Sampling
1. It is easy to draw, and without mistakes.
2. It is more precise than simple random sampling, as the sample
is more evenly spread over the population.
Disadvantages of Systematic Sampling
1. If the list has a periodic arrangement, it can fare very
badly.
Cluster Sampling
Unlike simple random sampling and stratified sampling,
where single subjects are selected from the population,
in cluster sampling the subjects are selected in
groups or clusters.
Cluster sampling is used when natural groupings are
evident in the population. The total population is divided
into groups or clusters. Elements within a cluster should be
as heterogeneous as possible, but there should be
homogeneity between clusters. The clusters must be mutually
exclusive and collectively exhaustive. A random sampling
technique is then used to choose which clusters to include
in the study.
In single-stage cluster sampling, all the elements from
each of the selected clusters are used. In two-stage cluster
sampling a random sampling technique is applied to the
elements from each of the selected clusters.
One version of cluster sampling is area sampling or
geographical cluster sampling. Clusters consist of
geographical areas. A geographically dispersed population
can be expensive to survey. Greater economy than simple
random sampling can be achieved by treating several
respondents within a local area as a cluster.
Example: Suppose we want to conduct interviews with hotel
managers in a major city about their training needs. We could decide
that each hotel in the city represents one cluster, and then randomly
select a small number of them, say 10. Then we can contact the managers
of these 10 hotels for interview. When all the managers of the selected
10 hotels are interviewed then this is referred to as one-stage
cluster sampling.
If the subjects to be interviewed are selected randomly within
the selected clusters, it is called two-stage cluster sampling.
This technique might be more appropriate if the number of subjects
within a unit is very large (e.g. instead of interviewing managers,
we want to interview employees).
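One-stage and two-stage cluster sampling differ only in what happens inside the chosen clusters. The sketch below uses a hypothetical city of 50 hotels; all names and sizes are made up.

```python
import random

rng = random.Random(7)
# Hypothetical city: 50 hotels (clusters), each with 20-60 employee IDs
hotels = {f"hotel_{h}": [f"h{h}_e{e}" for e in range(rng.randint(20, 60))]
          for h in range(50)}

# Stage 1: choose 10 hotels at random
chosen = rng.sample(sorted(hotels), 10)

# One-stage cluster sample: every employee in each chosen hotel
one_stage = [emp for h in chosen for emp in hotels[h]]

# Two-stage cluster sample: 5 employees drawn at random within each chosen hotel
two_stage = [emp for h in chosen for emp in rng.sample(hotels[h], 5)]
print(len(one_stage), len(two_stage))
```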


Advantages of Cluster Sampling
1. The main objective of cluster sampling is to reduce
costs; in particular, cluster sampling reduces field costs.
2. It is applicable where no complete list of units is available
(special lists need be formed only for the selected clusters).
Disadvantages of Cluster Sampling
1. Clusters may not be representative of the whole population;
the units within a cluster may be too alike.
2. Analysis is more complicated than for simple random
sampling.
Difference between Cluster Sampling and Random Sampling
1. In simple random sampling single subjects are selected
from the population, while in cluster sampling the
subjects are selected in groups or clusters.
2. As compared to simple random sampling, a cluster sample
is less evenly spread over the population, since it is
concentrated in the selected clusters.
Difference between Stratified and Cluster Sampling
1. In cluster sampling the clusters are thought of as
being typical of the population, rather than as subsections,
as in stratified sampling, in which we divide the
heterogeneous population into homogeneous subsections
(strata).
2. In stratified sampling subjects are selected randomly
within strata, while in cluster sampling all units of the
selected clusters are interviewed (one-stage cluster
sampling).
3. In stratified sampling the strata should be homogeneous,
i.e. there should be maximum homogeneity within strata.
In cluster sampling, however, the clusters should be as
heterogeneous as possible; each cluster should be a small-scale
version of the population. In other words, there
should be maximum heterogeneity within clusters and
minimum heterogeneity between clusters.
Multistage Sampling
We can also combine cluster sampling with stratified
sampling. For example, suppose we want to interview employees in
randomly selected clusters of hotels (as in the above example of
cluster sampling). We might stratify the employees based on
some characteristic (e.g. seniority, job function, etc.) and then
randomly select employees from each of these strata. This
type of sampling is referred to as Multistage Sampling.
Parameter and Statistic
In order to avoid verbal confusion with the statistical constants
of the population, viz. mean ($\mu$), standard deviation ($\sigma$), etc.,
which are usually referred to as parameters, statistical
measures computed from the sample observations alone, e.g.
mean ($\bar{x}$) and standard deviation (s), etc., have been termed
statistics.
Sampling Distribution
If we draw a sample of size n from a population of size N,
then the total number of possible samples will be $^{N}C_{n}$ = k
(say). If for each of these k samples we compute the mean and
standard deviation, there will be k values of the mean as
well as of the standard deviation. The set of values of a statistic
so obtained, one for each sample, is called its sampling distribution.
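A sampling distribution and its standard error can be built by brute force for a tiny population. The sketch assumes the hypothetical population {2, 4, 6, 8, 10}, with samples of size 2 drawn without replacement.

```python
import itertools
import statistics

population = [2, 4, 6, 8, 10]          # hypothetical population, N = 5
n = 2
samples = list(itertools.combinations(population, n))   # all 5C2 = 10 samples
means = [statistics.mean(s) for s in samples]           # sampling distribution of the mean

se_of_mean = statistics.pstdev(means)  # SD of the sampling distribution = standard error
print(len(samples), statistics.mean(means), round(se_of_mean, 4))
```

The result, √3, agrees with the finite-population formula σ/√n · √((N − n)/(N − 1)) = √(8/2 × 3/4), where σ² = 8 is the population variance.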
Standard Error
The standard deviation of sampling distribution is known as
its standard error (SE).

The standard errors of some well-known statistics, for large
samples, are given below, where n is the sample size, $\sigma$ the
population standard deviation ($\sigma_1$, $\sigma_2$ for two populations),
P the population proportion and Q = 1 − P; n1 and n2 represent the
sizes of two independent random samples drawn from the population(s).

Statistic                                                      Standard error
Sample mean $\bar{x}$                                           $\sigma/\sqrt{n}$
Sample proportion p                                            $\sqrt{PQ/n}$
Difference between two sample means ($\bar{x}_1 - \bar{x}_2$)   $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$
Difference between two sample proportions ($p_1 - p_2$)        $\sqrt{P_1Q_1/n_1 + P_2Q_2/n_2}$

Utility of Standard Error
Standard error plays a very important role in large sample
theory and forms the basis of testing of hypotheses.
The magnitude of the standard error gives an index of the
precision of the estimate of the parameter. The reciprocal of
the standard error is taken as the measure of reliability or
precision of the statistic.
Since the standard error is proportional to $1/\sqrt{n}$, doubling the
precision, which amounts to reducing the standard error to half,
requires the sample size to be increased four times.
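The large-sample standard-error formulas listed earlier, together with the "four times the sample size" rule, can be sketched as plain functions. The function names and the example numbers are illustrative.

```python
import math

def se_mean(sigma, n):
    return sigma / math.sqrt(n)

def se_proportion(P, n):
    return math.sqrt(P * (1 - P) / n)

def se_diff_means(sigma1, n1, sigma2, n2):
    return math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)

def se_diff_proportions(P1, n1, P2, n2):
    return math.sqrt(P1 * (1 - P1) / n1 + P2 * (1 - P2) / n2)

# Quadrupling n halves the SE, i.e. doubles the precision 1/SE
for n in (25, 100, 400):
    print(n, se_mean(10, n), 1 / se_mean(10, n))
```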

SE enables us to determine the probable limits within which the
population parameters may be expected to lie. The probable
limits for the population proportion P are given by:
$p \pm 3\sqrt{\frac{pq}{n}}$

Confidence Limits based on Mean and Standard Error
95% confidence limits:   Mean ± 2 SE (more precisely, ± 1.96 SE)
99% confidence limits:   Mean ± 3 SE (more precisely, ± 2.58 SE)
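The limits above amount to mean ± multiplier × SE, and probable limits for a proportion are p ± 3√(pq/n). The sketch below uses mean 230 and SE 10 (the figures of MCQ 1 in this chapter) and a hypothetical proportion p = 0.2 from n = 400.

```python
import math

def confidence_limits(mean, se, multiplier=2):
    """Return (lower, upper) = mean -/+ multiplier * SE."""
    return mean - multiplier * se, mean + multiplier * se

def probable_limits_proportion(p, n):
    """Probable limits p -/+ 3 * sqrt(p * q / n) for a population proportion."""
    half_width = 3 * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

print(confidence_limits(230, 10))                 # (210, 250)
print(confidence_limits(230, 10, multiplier=3))   # (200, 260)
print(probable_limits_proportion(0.2, 400))       # roughly (0.14, 0.26)
```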

Size of a Statistical Investigation


One of the questions most commonly asked about the planning of a
statistical study is: how many observations should be made?
In any review of this problem at the planning stage, it is likely to
be important to relate the sample size to a specified degree of
precision.
Suppose we want to compare the means $\mu_1$ and $\mu_2$ of two
populations, assuming that they have the same known
standard deviation $\sigma$, and that two equal samples of size n are
to be taken. If the standard deviations are known to be different,
the present result may be thought of as an approximation
(taking $\sigma$ to be the mean of the two values). If the comparison is of
two proportions, $\pi_1$ and $\pi_2$, $\sigma$ may be taken approximately to
be $\sqrt{\pi(1 - \pi)}$, where $\pi$ is the pooled value.
The standard error of the difference between the two sample means is then:
$SE(\bar{x}_1 - \bar{x}_2) = \sigma\sqrt{\frac{1}{n} + \frac{1}{n}} = \sigma\sqrt{\frac{2}{n}}$

We now consider two ways in which the precision may
be specified.

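Given Standard Error
Suppose it is required that the standard error of the difference
between the observed means $\bar{x}_1$ and $\bar{x}_2$ is less than $\varepsilon$;
equivalently, the width of the 95% confidence interval might
be specified to be not wider than $\pm 2\varepsilon$. This implies
$\sigma\sqrt{\frac{2}{n}} < \varepsilon$, i.e. $n > \frac{2\sigma^2}{\varepsilon^2}$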

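Given Difference to be Significant
We might require that if $\bar{x}_1 - \bar{x}_2$ is greater in absolute value
than some value $d_0$, then it shall be significant at some
specified level (say at the two-sided $2\alpha$ level). Denote by $u_{2\alpha}$
the standardized normal deviate exceeded in absolute value with
probability $2\alpha$ (for $2\alpha = 0.05$, $u_{2\alpha} = 1.96$). Then
$d_0 \ge u_{2\alpha}\,\sigma\sqrt{\frac{2}{n}}$, i.e. $n \ge \frac{2u_{2\alpha}^2\sigma^2}{d_0^2}$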
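Both ways of specifying precision follow from SE(x̄1 − x̄2) = σ√(2/n). A given SE ε needs n > 2σ²/ε². A difference d0 significant at the two-sided 2α level needs n ≥ 2u²σ²/d0². The sketch below computes the smallest whole n per group for each. The values σ = 10, ε = 2 and d0 = 5 are hypothetical.

```python
import math
from statistics import NormalDist

def n_given_se(sigma, eps):
    """Smallest n per group with sigma * sqrt(2 / n) <= eps."""
    return math.ceil(2 * sigma ** 2 / eps ** 2)

def n_given_significant_difference(sigma, d0, two_sided_level=0.05):
    """Smallest n per group so that a difference of d0 reaches the two-sided level."""
    u = NormalDist().inv_cdf(1 - two_sided_level / 2)   # u_{2 alpha}, about 1.96 for 0.05
    return math.ceil(2 * u ** 2 * sigma ** 2 / d0 ** 2)

print(n_given_se(sigma=10, eps=2))                     # 50
print(n_given_significant_difference(sigma=10, d0=5))  # 31
```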

DESIGN OF EXPERIMENTS
When planning a clinical experiment to compare the effects
of various treatments on some type of experimental units,
the problem is how the treatments should be allotted to
these units.
The allotment of treatments to experimental units should
be such that disparities between the characteristics of units
receiving different treatments are eliminated. These
cannot be eliminated completely, but they can be reduced if the
groups of experimental units to which the treatments are
applied are made alike in the relevant respects.
The three basic principles for doing this are:
1. Randomization
2. Replication
3. Local Control.
Randomization
In its simplest form, randomization means that the choice of
treatment for each unit is made by an independent
act of randomization (by the toss of a coin or by using a random
number table).
In clinical trials the total number of patients is often not
known in advance, since many patients may become available
for inclusion in the trial some time after it has started. The simplest
method is then to allocate treatment by an independent
random choice for each patient.
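Below is a minimal sketch of simple randomization for patients who arrive one at a time. Each allocation is an independent "coin toss", so no patient count is needed in advance. The treatment labels A and B, the 20 patients and the seed are hypothetical.

```python
import random

def allocate(rng):
    """One independent 'coin toss' per patient: A or B with probability 1/2 each."""
    return "A" if rng.random() < 0.5 else "B"

rng = random.Random(42)    # seeded only so the run is reproducible
allocations = [allocate(rng) for _ in range(20)]   # 20 hypothetical patients in arrival order
print(allocations.count("A"), "on A;", allocations.count("B"), "on B")
```

Because each toss is independent, the two groups need not end up the same size; this is a known property of simple randomization.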
Replication
An important principle of experimental design is Replication,
the use of more than one experimental unit for each treatment.
Various purposes are served by replication:
(a) An appropriate amount of replication ensures that the
comparisons between treatments are sufficiently precise;
the sampling error of the difference between two means
decreases as the amount of replication in each group increases.
(b) The effect of sampling variation can be estimated only if
there is an adequate degree of replication. For example,
in comparing the means of two groups, if both sample sizes
were as low as 2, the degrees of freedom for a t test would be
only 2; the critical points of t at 2 degrees of freedom are
very high, and the test therefore loses a great deal in
effectiveness merely because of the inadequacy of the
estimate of within-group variation.
(c) Replication may be useful in enabling observations to be
spread over a wide variety of experimental conditions.

Local Control
The third basic principle, Local Control, concerns the reduction of
random variation between experimental units. The formula for the
standard error of a mean, $\sigma/\sqrt{n}$, shows that the effect of random
error can be reduced either by increasing n (the number of
replications) or by decreasing $\sigma$. This suggests that experimental
units should be as homogeneous as possible in their response to treatment.
In clinical trials, for example, a precise comparison could be
effected by restricting age, sex, clinical condition and other
features of the patients, but these restrictions may make it too
difficult to generalize the results. A useful solution to this
dilemma is to subdivide the units into relatively homogeneous
groups called blocks. Treatments can then be allocated randomly
within blocks so that each block provides a small experiment. The
precision of the overall comparison between treatments is
then determined by random variability within blocks rather
than between different blocks. This is called a Randomized
Block Design.
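In the simplest randomized block layout, each block contains one unit per treatment, and the treatments are shuffled independently within every block. The sketch below works under that assumption. The patient IDs, age bands and treatment labels are hypothetical.

```python
import random

def randomize_within_blocks(blocks, treatments, rng):
    """Shuffle the treatment list independently inside each block."""
    plan = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"block {block!r} needs {len(treatments)} units")
        shuffled = list(treatments)   # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        plan[block] = dict(zip(units, shuffled))
    return plan

# Hypothetical blocks of three patients each, grouped by age band
blocks = {
    "20-39": ["p1", "p2", "p3"],
    "40-59": ["p4", "p5", "p6"],
    "60+": ["p7", "p8", "p9"],
}
plan = randomize_within_blocks(blocks, ["A", "B", "C"], random.Random(3))
for block, allocation in plan.items():
    print(block, allocation)
```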
There are more complex designs that allow more than one set of
treatments to be compared simultaneously, but they are beyond
the scope of this book.

MULTIPLE CHOICE QUESTIONS


1. If the mean is 230 and the standard error is 10, the 95%
confidence limits would be:
(a) 210 to 250
(b) 220 to 240
(c) 225 to 235
(d) 230 to 210
(AI, 89)

2. All of the following are examples of random sampling
methods except:
(a) Stratified sampling
(b) Quota sampling
(c) Systematic sampling
(d) Simple random sampling
(AI, 96, AIIMS, 2000)
3. The area under ±2 SD of the normal curve is:
(a) 66%
(b) 95%
(c) 97%
(d) 99%
(AI, 93)
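The figure behind question 3 can be checked with the standard normal distribution in Python's standard library (`statistics.NormalDist`, available since Python 3.8):

```python
from statistics import NormalDist

z = NormalDist()                      # standard normal curve: mean 0, SD 1
area_2sd = z.cdf(2) - z.cdf(-2)       # area between -2 SD and +2 SD
print(round(area_2sd, 4))  # 0.9545
```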

4. True regarding a double-blind study:
(a) Participant is not aware whether he or she is in the study
or control group
(b) Neither the doctor nor the participants are aware of
the group allocation and the treatment received
(c) The participants, the investigator and the person
analyzing the data are all blind
(d) All the above
(AI, 96)
5. Sampling error is:
(a)

(b)

(c)
(d) None
(AI, 2001)
[There are only two types of error in testing a hypothesis: $\alpha$ error
is type-I error and $\beta$ error is type-II error. Sampling error is
inherent in a sample when estimating population parameters
on the basis of samples drawn; proper sampling will reduce
the sampling error.]
6. Which is true in cluster sampling:
(a) Every nth case is chosen for study
(b) Natural group is taken as sampling unit
(c) Stratification of the population is done
(d) Involves use of random number
[In cluster sampling, clusters are selected by natural demarcation,
and every unit of a selected cluster is included as a sampling unit.]
(AIIMS, 92)

7. The sampling method adopted for the VIP coverage
evaluation survey of a district is:
(a) Random sampling
(b) Cluster sampling
(c) Stratified sampling
(d) Multistage sampling
(JIPMER, 80, Orissa 91)
8. If, in a survey of a village, you divide the
population into lanes and rows, select 5 lanes at random
and survey all houses in those lanes, this is a type of:
(a) Simple random sampling
(b) Stratified sampling
(c) Systematic sampling
(d) Cluster sampling
[Hint: In cluster sampling we divide the population into
clusters according to geographical criteria and then take all
units of the selected clusters, at least in one-stage cluster sampling.]
9. Simple random sampling. True is:
(a) Adjacent number is considered while taking sample
(b) Each unit has an equal chance of being drawn in
the sample
(c) Each portion of sample represents a corresponding
strata of universe
(d) None of the above
(AIIMS, 2001)
10. For a survey, a village is divided into 5 lanes then each
lane is sampled randomly. It is an example of:
(a) Simple random sample
(b) Stratified random sampling
(c) Systematic random sampling
(d) All of the above
(AIIMS, 96)
11. True about simple random sampling is:
(a) All persons have an equal right to be selected
(b) Only selected persons have a right to be selected
(c) The technique provides the least number of possible
samples
(d) Every fixed unit is taken for sampling
(AIIMS, June 98)
12. If sample size is bigger in random sampling, which of
the following is/are true:
(a) It approaches maximum samples
(b) Reduces non-sampling error
(c) Increases the precision of the result
(d) Decreases the standard error
[Hint: Precision is inversely proportional to standard error,
to double the precision we have to reduce the standard error to
half, thus increasing the sample size four times].
(AIIMS, June 99)
13. In a random sample the chance of being picked up is:
(a) Same and known
(b) Not same and not known
(c) Same and not known
(d) Not same but known
[Hint: If a sample of size n is drawn from a population of size
N, the probability of selecting any particular unit at a given draw is 1/N.]
(AIIMS, Nov 99)
14. While calculating the incubation period for measles in
a group of 25 children, the standard deviation is 2 and
mean incubation period is 8 days. Calculate standard
error:
(a) 0.4
(b) 1
(c) 2
(d) 0.5
15. In a population of pregnant women, Hb is estimated in
100 women with a standard deviation of 1 g. The
standard error is:
(a) 1
(b) 0.1
(c) 0.01
(d) 10
(AIIMS, Nov 2001)

16. In a controlled trial to compare two treatments, the main
purpose of randomization is to ensure that:
(a) Two groups will be similar in prognostic factors
(b) The clinician does not know which treatment the
subjects will receive
(c) The sample may be referred to a known population
(d) The clinician can predict in advance which
treatment the subjects will receive
(AIIMS, 2002)
17. Mean hemoglobin of a sample of 100 pregnant women
was found to be 10 mg% with a standard deviation
1.0mg%. The standard error of the estimate would be:
(a) 0.01
(b) 0.1
(c) 1.0
(d) 10.0
(AIIMS, 2004)
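Questions 14, 15 and 17 all use SE = SD/√n. A short check:

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)

se_measles = standard_error(2, 25)    # Q14: 2 / 5
se_hb = standard_error(1.0, 100)      # Q15 and Q17: 1 / 10
print(se_measles, se_hb)  # 0.4 0.1
```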
18. Which sampling method is used in assessing
immunization status of children under an
immunization programme:
(a) Quota sampling
(b) Multistage sampling
(c) Stratified random sampling
(d) Cluster sampling
[Hint: In cluster sampling we divide the population into small
clusters, which are representative of the population. Cluster
sampling involves less time and cost.]
(AIIMS, 2004)

Chapter 8

Testing of Hypothesis


Statistical Hypothesis
A statement about a population which we want to verify on the basis of the information available from a sample.
Test of a Statistical Hypothesis
It is a two-action decision problem: after the experimental sample values have been obtained, the two actions are acceptance or rejection of the hypothesis under consideration.
Null Hypothesis
The null hypothesis is the hypothesis of no difference, and is usually denoted by H0.
Alternative Hypothesis
A statistical hypothesis is tested to decide whether the null hypothesis is to be accepted or rejected, and this is meaningful only when it is tested against a rival hypothesis. This rival hypothesis is called the alternative hypothesis and is denoted by H1.
Wrongly rejecting a true null hypothesis is generally regarded as a more serious error than wrongly accepting a false one.
Critical Region
Let $x_1, x_2, \ldots, x_n$ be the sample observations, denoted by O. All possible values of O constitute a space called the sample space S, and we regard $(x_1, x_2, \ldots, x_n)$ as a point in this n-dimensional sample space.
We divide the sample space into two disjoint parts, $w$ and $\bar{w} = S - w$. We reject the null hypothesis H0 if the observed sample point falls in $w$. The region $w$ is known as the critical region (rejection region), and $\bar{w}$ as the acceptance region.


Figure 8.1

Types of Errors
The decisions that can be taken from a sample, and whether each is correct, are summarized below.

                 Decision from sample
True state       Accept H0                Reject H0
H0 true          Correct                  Wrong (Type-I error)
H0 false         Wrong (Type-II error)    Correct

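The probabilities of Type-I and Type-II errors are denoted by α and β respectively:
$\alpha = P(\text{reject } H_0 \mid H_0 \text{ true})$, the probability of a Type-I error, i.e. the probability of rejecting H0 when it is true;
$\beta = P(\text{accept } H_0 \mid H_0 \text{ false})$, the probability of a Type-II error, i.e. the probability of accepting H0 when H0 is false.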

Level of Significance
α, the probability of a Type-I error, is known as the level of significance. It is also called the size of the critical region.


Power of Test
$1 - \beta$ is called the power of the test of the hypothesis H0 against the alternative hypothesis H1; it is the probability of rejecting H0 when H1 is true.
Since a Type-I error is deemed to be more serious than a Type-II error, the usual practice is to control the Type-I error at a predetermined level α and to choose a test which minimizes β (i.e. maximizes the power).
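To make α, β and power concrete, here is a minimal sketch for a right-tailed Z test of H0: μ = μ0 against a specific alternative μ = μ1 with σ known. The power formula 1 − Φ(z_α − (μ1 − μ0)√n/σ) is the standard one for this case; the function name and the numbers μ0 = 100, μ1 = 105, σ = 15 are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and SD 1


def power_right_tailed_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power (1 - beta) of a right-tailed Z test of H0: mu = mu0
    against the specific alternative mu = mu1, with sigma known."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)      # critical value for level alpha
    shift = (mu1 - mu0) * n ** 0.5 / sigma       # mean of Z when mu = mu1
    beta = STD_NORMAL.cdf(z_alpha - shift)       # P(accept H0 | mu = mu1)
    return 1 - beta


# Hypothetical numbers, for illustration only
for n in (10, 25, 50):
    print(n, round(power_right_tailed_z(mu0=100, mu1=105, sigma=15, n=n), 3))
```

The loop shows power rising with sample size at a fixed α; lowering α (a stricter control of Type-I error) lowers the power for the same n.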
Steps in Solving a Testing of Hypothesis Problem
1. Obtain explicit knowledge about the nature of the population about which the hypotheses are set up.
2. Set up the null and alternative hypotheses.
3. Choose a suitable statistic, called the test statistic, whose sampling distribution under H0 is known, so that its observed value indicates whether the sample supports H0 or H1.
4. On the basis of the test statistic, reject or accept the null hypothesis.
Test of Significance
A very important aspect of sampling theory is the study of tests of significance, which enable us to decide, on the basis of sample results, whether
(i) the deviation between the observed sample statistic and the hypothetical parameter value, or
(ii) the deviation between two independent sample statistics
is significant, or might be attributed to chance, i.e. to fluctuations of sampling.
One Tailed and Two Tailed Tests
In any test, the critical region is represented by a portion of
the area under the probability curve of the sampling
distribution of the test statistic.


A test of a statistical hypothesis where the alternative hypothesis is one-tailed (right-tailed or left-tailed) is called a one-tailed test. For example, a test of the mean of a population, $H_0: \mu = \mu_0$, against the alternative $H_1: \mu > \mu_0$ (right-tailed) or $H_1: \mu < \mu_0$ (left-tailed) is called a one-tailed test.
A test where the alternative hypothesis is two-tailed, such as $H_0: \mu = \mu_0$ against the alternative $H_1: \mu \neq \mu_0$, is called a two-tailed test.
Critical Values or Significant Values
The value of the test statistic which separates the critical region (rejection region) from the acceptance region is called the critical value or significant value.
It depends upon:
(i) The level of significance used.
(ii) The alternative hypothesis, whether it is two-tailed or single-tailed.
Suppose that the critical value of the test statistic at level of significance α for a two-tailed test is $z_\alpha$. The value $z_\alpha$ is such that the area to the left of $-z_\alpha$ is α/2 and the area to the right of $z_\alpha$ is also α/2:
$P(Z < -z_\alpha) = P(Z > z_\alpha) = \alpha/2$, i.e. $P(|Z| > z_\alpha) = \alpha$.
Thus, the total area α is divided into two equal parts, one in each tail.

Two Tailed Test (Level of Significance α)

Figure 8.2

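In the case of a single-tailed test, the critical value $z_\alpha$ is determined so that the total area to the right of it (for a right-tailed test) is α, and for a left-tailed test the total area to the left of $-z_\alpha$ is α:
$P(Z > z_\alpha) = \alpha$ (right-tailed), $P(Z < -z_\alpha) = \alpha$ (left-tailed).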

Figure 8.3

Figure 8.4

Thus, the critical value of Z for a single-tailed test (left or right) at level α is the same as the critical value of Z for a two-tailed test at level of significance 2α.
Critical values ($z_\alpha$) of Z

                      Level of significance (α)
Critical values       1%            5%            10%
Two-tailed test       |Z| = 2.58    |Z| = 1.96    |Z| = 1.645
Right-tailed test     Z = 2.33      Z = 1.645     Z = 1.28
Left-tailed test      Z = −2.33     Z = −1.645    Z = −1.28
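The values in this table are quantiles of the standard normal distribution, so they can be reproduced rather than memorized. A minimal sketch using Python's standard `statistics.NormalDist`; the dictionary layout is our own choice.

```python
from statistics import NormalDist

quantile = NormalDist().inv_cdf  # inverse CDF of N(0, 1)

critical = {}
for alpha in (0.01, 0.05, 0.10):
    critical[alpha] = {
        "two-tailed": quantile(1 - alpha / 2),  # alpha/2 in each tail
        "right-tailed": quantile(1 - alpha),    # area alpha to the right
        "left-tailed": quantile(alpha),         # area alpha to the left
    }
    row = ", ".join(f"{k} {v:+.3f}" for k, v in critical[alpha].items())
    print(f"alpha = {alpha:.0%}: {row}")
```

The right-tailed value at 5% equals the two-tailed value at 10%, which is the "α versus 2α" relation stated above.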

TEST OF SIGNIFICANCE FOR LARGE SAMPLES
For large values of n, almost all distributions are very closely approximated by the normal distribution. Thus we can apply the normal test, which is based upon the fundamental (area) property of the normal probability curve.
1. Compute the test statistic Z under H0.
2. If |Z| > 3, H0 is always rejected.
3. If |Z| ≤ 3, we test its significance at a certain level of significance, usually at 5% and sometimes at 1%.
Thus, for a two-tailed test, if |Z| > 1.96, H0 is rejected at the 5% level of significance. Similarly, if |Z| > 2.58, H0 is rejected at the 1% level of significance.
For practical purposes, a sample may be regarded as large if n > 30.

Sampling of Attributes
Here we consider sampling from a population which is divided into two mutually exclusive classes, one class possessing a particular attribute, say A, and the other class not possessing that attribute. The presence of the attribute in a sampling unit may be termed a success and its absence a failure.
Test for Single Proportion
If x is the number of successes in n independent trials with a constant probability P of success, then the observed proportion of success is $p = x/n$, with $E(p) = P$ and standard error
$SE(p) = \sqrt{PQ/n}$, where $Q = 1 - P$.
Then the test statistic is
$Z = \dfrac{p - P}{\sqrt{PQ/n}} \sim N(0, 1)$ for large n,
under the null hypothesis that the sample proportion is equal to the population proportion, i.e. the sample is drawn from a population with proportion of success P.
The probable limits for the observed proportion of success are:
$P \pm 3\,SE(p)$, i.e. $P \pm 3\sqrt{PQ/n}$.
If P is not known, we take p (the sample proportion) as an estimate of P. The probable limits for the proportion in the population are then
$p \pm 3\sqrt{pq/n}$, where $q = 1 - p$.
In particular, 95% confidence limits for P are $p \pm 1.96\sqrt{pq/n}$ and 99% confidence limits for P are $p \pm 2.58\sqrt{pq/n}$.

TEST OF SIGNIFICANCE FOR DIFFERENCE OF PROPORTIONS
Let x1 and x2 be the numbers of persons possessing a certain characteristic (attribute), say A, in random samples of sizes n1 and n2 from two populations respectively. Then the sample proportions are given by
$p_1 = x_1/n_1$ and $p_2 = x_2/n_2$.
If P1 and P2 are the population proportions, then under the null hypothesis $H_0: P_1 = P_2 = P$, the test statistic for the difference of proportions is
$Z = \dfrac{p_1 - p_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim N(0, 1)$, where $Q = 1 - P$.
Generally we do not have any information about the proportion of A in the population. In such circumstances the common population proportion under the null hypothesis $H_0: P_1 = P_2 = P$ (say) is estimated by pooling the two samples:
$\hat{P} = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$ and $\hat{Q} = 1 - \hat{P}$.
Then the test statistic is
$Z = \dfrac{p_1 - p_2}{\sqrt{\hat{P}\hat{Q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim N(0, 1)$.

Solved Examples
Test for Single Proportion
QUESTION: Thirty people were attacked by a viral disease in a village and only 28 survived. The survival rate of this viral infection is reported to be 85%. Test whether the survival rate from this infection in this village is more than the reported survival rate, at 5% level of significance.
SOLUTION:
Setting of Hypothesis
Null hypothesis: The survival rate in this village is equal to the reported survival rate, i.e. H0 : P = 0.85.
Alternative hypothesis: The survival rate in this village is more than 85%, i.e. H1 : P > 0.85 (one-tailed test).
Total number of persons attacked by the infection, n = 30
Total number of persons survived, x = 28
Proportion of persons survived, p = x/n = 28/30 = 0.9333
The reported survival rate = 85%, i.e. P = 0.85; therefore Q = 1 − 0.85 = 0.15.
The Test Statistic:
$Z = \dfrac{p - P}{\sqrt{PQ/n}} \sim N(0, 1)$
$Z = \dfrac{0.9333 - 0.85}{\sqrt{0.85 \times 0.15 / 30}} = \dfrac{0.0833}{\sqrt{0.00425}} = \dfrac{0.0833}{0.0652} \approx 1.28$
Tabulated value of Z at α = 0.05 (i.e. critical value) = 1.645 (for a one-tailed test).
Because Zcal < Ztab, the null hypothesis is accepted.
Conclusion: The survival rate in the village is not significantly more than the reported survival rate.
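A short Python check of the single-proportion calculation above, using only the standard library; variable names are ours.

```python
import math
from statistics import NormalDist

x, n = 28, 30        # survivors, persons attacked
P = 0.85             # reported survival rate under H0
Q = 1 - P

p = x / n                                # observed proportion, about 0.933
z = (p - P) / math.sqrt(P * Q / n)       # approximately N(0, 1) under H0
z_crit = NormalDist().inv_cdf(0.95)      # right-tailed test at the 5% level

print(f"p = {p:.4f}, Z = {z:.2f}, critical value = {z_crit:.3f}")
print("reject H0" if z > z_crit else "accept H0")
```

Note that nQ = 30 × 0.15 = 4.5 here, so the large-sample normal approximation is only rough; the sketch simply reproduces the textbook arithmetic.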
Test of Significance of Difference of Proportions
(When the population proportion is not known):
QUESTION: In a survey conducted by a health agency, it was found that in Town A, out of 876 births, 45% were male, while in Town B, out of 690 births, 473 were male. Is there any significant difference in the proportion of male children in the two towns?
SOLUTION:
Proportion of male children in Town A, p1 = 0.45; therefore q1 = 1 − p1 = 1 − 0.45 = 0.55.
Total number of births in Town A is 876, i.e. n1 = 876.
In Town B, out of 690 births 473 were male; therefore p2 = 473/690 = 0.686, q2 = 1 − 0.686 = 0.314 and n2 = 690.
Setting of Hypothesis
Null hypothesis: There is no significant difference between the proportions of male children in the two towns, i.e. H0 : P1 = P2.
Alternative hypothesis: H1 : P1 ≠ P2 (two-tailed test).
Because the population proportion is not known, we have to estimate it from the sample proportions:
$\hat{P} = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \dfrac{394 + 473}{876 + 690} = \dfrac{867}{1566} = 0.55$ (since $876 \times 0.45 \approx 394$)
therefore, $\hat{Q} = 1 - 0.55 = 0.45$.
Test statistic:
$Z = \dfrac{p_1 - p_2}{\sqrt{\hat{P}\hat{Q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \dfrac{0.45 - 0.686}{\sqrt{0.55 \times 0.45\left(\frac{1}{876} + \frac{1}{690}\right)}} = \dfrac{-0.236}{\sqrt{0.2475 \times 0.00259}} = \dfrac{-0.236}{0.0253} \approx -9.3$
Critical value of Z at 5% level of significance (for a two-tailed test) = 1.96, which is less than |Zcal| = 9.3. Thus the null hypothesis is rejected.
Conclusion: There is a significant difference between the proportions of male births in the two towns.
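The pooled two-proportion Z statistic can be checked in a few lines of Python, using the counts x1 = 394 (45% of 876, rounded) and x2 = 473; variable names are ours.

```python
import math

n1, x1 = 876, 394     # Town A: births, males (45% of 876, rounded)
n2, x2 = 690, 473     # Town B: births, males

p1, p2 = x1 / n1, x2 / n2
P = (x1 + x2) / (n1 + n2)     # pooled estimate (n1 p1 + n2 p2) / (n1 + n2)
Q = 1 - P
se = math.sqrt(P * Q * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

print(f"p1 = {p1:.3f}, p2 = {p2:.3f}, pooled P = {P:.3f}")
print(f"SE = {se:.4f}, Z = {z:.2f}")   # |Z| is far above 1.96
```

Without intermediate rounding the statistic is about −9.32; the conclusion is unchanged.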
Test of Significance for Single Mean
If $x_1, x_2, \ldots, x_n$ is a random sample from a normal population with mean μ and SD σ, then for large samples the statistic
$Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
under the null hypothesis H0 that the sample is drawn from the population with mean μ.
If the population standard deviation σ is unknown, then we use the sample standard deviation s as an estimate of σ.
Confidence limits for μ:
95% confidence limits for μ are $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$
and 99% confidence limits for μ are $\bar{x} \pm 2.58\,\sigma/\sqrt{n}$.

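Test of Significance for Difference of Means
Let $\bar{x}_1$ be the mean of a random sample of size n1 from a population with mean μ1 and SD σ1, and let $\bar{x}_2$ be the mean of an independent random sample of size n2 from another population with mean μ2 and SD σ2.
Under the null hypothesis $H_0: \mu_1 = \mu_2$, the test statistic becomes (for large samples)
$Z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0, 1)$.
Remarks:
1. If $\sigma_1^2 = \sigma_2^2 = \sigma^2$, i.e. the samples have been drawn from populations with a common SD σ, then under $H_0: \mu_1 = \mu_2$
$Z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim N(0, 1)$.
2. If σ is not known, then its estimate based on the sample variances is used. The unbiased estimate of σ² is given by
$\hat{\sigma}^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$
where $S_1^2$ and $S_2^2$ are the sample variances computed with divisors $(n_1 - 1)$ and $(n_2 - 1)$.
3. If $\sigma_1^2 \neq \sigma_2^2$ and σ1 and σ2 are not known, then they can be estimated on the basis of the samples. This results in some error, which will be very small and can be ignored if the samples are large. These estimates for large samples are given by $\hat{\sigma}_1^2 = S_1^2$ and $\hat{\sigma}_2^2 = S_2^2$. In this case the test statistic is:
$Z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} \sim N(0, 1)$.
However, if the sample sizes are small, then a small-sample test, the t-test for difference of means, should be used.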
Solved Example
Test of Significance for Single Mean
QUESTION: A sample of 900 individuals has a mean haemoglobin of 12.7 g%. Is the sample drawn from a population with mean 13.6 g% and SD 2.70 g%?
SOLUTION:
Setting of Hypothesis
Null hypothesis: The sample is drawn from the population with mean 13.6, i.e. H0 : μ = 13.6.
Alternative hypothesis: H1 : μ ≠ 13.6 (two-tailed test).
The Test Statistic:
$Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{12.7 - 13.6}{2.70/\sqrt{900}} = \dfrac{-0.9}{2.70/30} = \dfrac{-0.9}{0.09} = -10$, so |Z| = 10.
Critical value of Z at 5% level of significance (for a two-tailed test) = 1.96, i.e. Ztab = 1.96, which is much less than the calculated value |Z| = 10 (indeed |Z| > 3). Hence we reject the null hypothesis.
Conclusion: The sample cannot be regarded as drawn from a population with mean haemoglobin level 13.6 g% and SD 2.70 g%.
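The key step here is SE = σ/√n = 2.70/30 = 0.09. A minimal Python check, which also computes the 95% confidence limits x̄ ± 1.96 σ/√n given in the text; variable names are ours.

```python
import math

x_bar, mu0 = 12.7, 13.6   # sample mean and hypothesised population mean (g%)
sigma, n = 2.70, 900      # population SD and sample size

se = sigma / math.sqrt(n)                           # 2.70 / 30 = 0.09
z = (x_bar - mu0) / se                              # -0.9 / 0.09 = -10
low, high = x_bar - 1.96 * se, x_bar + 1.96 * se    # 95% limits for the mean

print(f"SE = {se:.2f}, Z = {z:.1f}")
print(f"95% limits: {low:.2f} to {high:.2f}")
```

The 95% limits (about 12.52 to 12.88) exclude 13.6, in agreement with the rejection of H0.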
Test of Significance for Difference of Means
QUESTION: Random samples were drawn from two hospitals, and the following data on the blood pressure of adult male hospital workers were obtained:

                        Hospital A     Hospital B
Mean blood pressure     127.56 mmHg    140.78 mmHg
Standard deviation      10.37 mmHg     13.77 mmHg
No. of cases            700            360

Is the blood pressure of male workers of Hospital B significantly higher than that of those working in Hospital A?
SOLUTION:
Setting of Hypothesis
Null hypothesis: There is no significant difference between the blood pressure of workers working in the two hospitals, i.e. H0 : μ1 = μ2.
Alternative hypothesis: H1 : μ2 > μ1 (one-tailed test).
Test statistic:
$Z = \dfrac{\bar{x}_2 - \bar{x}_1}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} \sim N(0, 1)$
In this example
$\bar{x}_1 = 127.56$; S1 = 10.37; n1 = 700
$\bar{x}_2 = 140.78$; S2 = 13.77; n2 = 360
Putting these values in the test statistic:
$Z = \dfrac{140.78 - 127.56}{\sqrt{\dfrac{10.37^2}{700} + \dfrac{13.77^2}{360}}} = \dfrac{13.22}{\sqrt{0.154 + 0.527}} = \dfrac{13.22}{0.825} \approx 16.0$
The calculated value of Z is much higher than the tabulated value of Z (1.645 for a one-tailed test at 5% level). Thus we reject the null hypothesis.
Conclusion: The difference in the mean values of blood pressure of workers of the two hospitals is highly significant. Thus we can say that the mean blood pressure of workers in Hospital B is significantly higher than that of those working in Hospital A.
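The large-sample statistic with separately estimated variances (Remark 3 above) can be checked in Python; variable names are ours.

```python
import math

x1, s1, n1 = 127.56, 10.37, 700   # Hospital A: mean, SD, number of cases
x2, s2, n2 = 140.78, 13.77, 360   # Hospital B

se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # large-sample SE of the difference
z = (x2 - x1) / se

print(f"SE = {se:.3f}, Z = {z:.2f}")
```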
EXACT SAMPLING DISTRIBUTION
Chi-Square Distribution (χ² Distribution)
The square of a standard normal variate is known as a Chi-Square variate with 1 degree of freedom.
If $x \sim N(\mu, \sigma^2)$, then $Z = \dfrac{x - \mu}{\sigma}$ is a standard normal variate, and $Z^2 = \left(\dfrac{x - \mu}{\sigma}\right)^2$ is a Chi-Square variate with 1 degree of freedom.
In general, if $x_i$ (i = 1, 2, …, n) are n independent normal variates with means $\mu_i$ and variances $\sigma_i^2$ (i = 1, 2, …, n), then
$\chi^2 = \sum_{i=1}^{n} \left(\dfrac{x_i - \mu_i}{\sigma_i}\right)^2$
is a Chi-Square variate with n degrees of freedom.
Remarks:
1. For n = 1, the χ² distribution is the distribution of the square of a standard normal variate.
2. The χ² distribution tends to the normal distribution for large degrees of freedom. In practice, for n > 30 the approximation to the normal distribution is fairly good.

Degree of Freedom
The number of independent variates which make up the statistic (e.g. χ²) is known as the degrees of freedom and is usually denoted by ν (nu).
In general, the number of degrees of freedom is the total number of observations less the number of independent constraints.
In a set of n observations, the degrees of freedom (df) for χ² are usually (n − 1), because of the linear constraint $\sum O_i = \sum E_i$ on the frequencies.
Mean and Standard Deviation of χ² Distribution
The mean and SD of the χ²-distribution with n degrees of freedom are n and $\sqrt{2n}$ respectively.
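The definition of χ² as a sum of n squared standard normal variates, and its mean n and SD √(2n), can be checked by simulation. In this sketch the seed, n = 5 and the number of draws are arbitrary choices of ours; the results are simulation estimates, not exact values.

```python
import math
import random
import statistics


def chi_square_draw(n, rng):
    """One chi-square variate with n df: the sum of n squared standard normal variates."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))


rng = random.Random(12345)   # fixed seed so the run is reproducible
n = 5                        # degrees of freedom
draws = [chi_square_draw(n, rng) for _ in range(100_000)]

mean_hat = statistics.fmean(draws)
sd_hat = statistics.pstdev(draws)
print(f"simulated mean {mean_hat:.3f} (theory {n})")
print(f"simulated SD   {sd_hat:.3f} (theory {math.sqrt(2 * n):.3f})")
```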

Mode and Skewness of χ² Distribution
The mode of the χ² distribution with n degrees of freedom is (n − 2) (for n ≥ 2).
Skewness $= \dfrac{\text{Mean} - \text{Mode}}{\text{SD}} = \dfrac{n - (n - 2)}{\sqrt{2n}} = \dfrac{2}{\sqrt{2n}} = \sqrt{\dfrac{2}{n}}$
Skewness is greater than zero, thus the χ² distribution is positively skewed.
Further, since the skewness is inversely proportional to the square root of the df, the distribution rapidly tends to symmetry as the df (n) increases.

Figure 8.5

For n = 2, the curve meets the vertical f(x) axis at x = 0, where f(0) = 0.5.
For n = 1, it is an inverted J-shaped curve.
Conditions for the Validity of χ² Test
For the validity of the Chi-Square test of goodness of fit between theory and experiment, the following conditions must be satisfied:
1. Sample observations should be independent.
2. N, the total frequency, should be reasonably large, say greater than 50.
3. No theoretical cell frequency should be less than 5.


Critical Values

Figure 8.6

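The value $\chi^2_\alpha(n)$, known as the upper (right-tailed) α point, or critical value, is defined by $P\left(\chi^2 > \chi^2_\alpha(n)\right) = \alpha$ and can be read from the χ² table for different values of n and α.
The value of $\chi^2_\alpha(n)$ increases as n (df) increases and as the level of significance α decreases.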
Application of χ² Distribution
The χ² distribution has a large number of applications, some of which are: (1) to test the goodness of fit and (2) to test the independence of attributes.
1. Goodness of fit: A very powerful test for testing the significance of the discrepancy between theory and experiment. It enables us to find whether the deviation of the experiment from theory is just due to chance or is really due to the inadequacy of the theory to fit the observed data.
If $O_i$ (i = 1, 2, …, n) is the set of observed (experimental) frequencies and $E_i$ (i = 1, 2, …, n) is the corresponding set of expected (theoretical or hypothetical) frequencies, then Chi-Square is given by
$\chi^2 = \sum_{i=1}^{n} \dfrac{(O_i - E_i)^2}{E_i}$
which follows a χ² distribution with (n − 1) degrees of freedom.


2. Independence of attributes:
Four-fold classification
Comparison of two proportions (2 × 2 contingency table): An alternative method of representing the proportions is a 2 × 2 contingency table or fourfold classification. The total frequency or grand total is split into different dichotomies represented by two horizontal rows and two vertical columns. There are four (2 × 2) combinations of row and column categories, and the corresponding frequencies occupy the four inner cells of the body of the table. The comparison can be done by applying χ² significance tests (discussed below for comparing several proportions).
The 2 × 2 contingency table is described as:

            Group 1     Group 2     Group 1 + Group 2
Positive    r1          r2          R (= r1 + r2)
Negative    n1 − r1     n2 − r2     N − R
Total       n1          n2          N (= n1 + n2)

Manifold Classification
Comparison of several proportions (2 × k contingency table): The comparison of two proportions was considered from two points of view: the sampling error of the difference of proportions, and the χ² significance test.
When more than two proportions are compared, the calculation of standard errors between pairs of proportions requires several comparisons, and an undue number of significant differences may arise by chance. χ² provides a method by which we can compare several proportions simultaneously.
Suppose there are k groups of observations and that in the ith group $n_i$ individuals have been observed, of whom $r_i$ show a certain characteristic (say, being positive). The proportion of positives, $r_i/n_i$, is denoted by $p_i$. The data may be described as follows:

Group                  1          2          …    i          …    k          All groups
Positive               r1         r2         …    ri         …    rk         R
Negative               n1 − r1    n2 − r2    …    ni − ri    …    nk − rk    N − R
Total                  n1         n2         …    ni         …    nk         N
Proportion positive    p1         p2         …    pi         …    pk         P = R/N

The frequencies form a 2 × k contingency table (there being 2 rows and k columns). The χ² test requires, for each observed frequency $O_i$, an expected frequency $E_i$, which is calculated by the formula:
$E_i = \dfrac{\text{row total} \times \text{column total}}{N}$
The quantity $\dfrac{(O_i - E_i)^2}{E_i}$ is calculated for each cell, and finally
$\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i}$
The summation is over the 2k cells in the table.
On the null hypothesis that all k samples are drawn randomly from populations with the same proportion of positives, χ² is distributed approximately as χ² with (k − 1)(2 − 1) = (k − 1) df.
General Contingency Table (r × s)
Let us consider two attributes A and B. A is divided into r classes A1, A2, …, Ar and B is divided into s classes B1, B2, …, Bs.
The cell frequencies can be expressed as an (r × s) manifold contingency table:

        B1        B2        B3        …    Bs
A1      (A1B1)    (A1B2)    (A1B3)    …    (A1Bs)
A2      (A2B1)    (A2B2)    (A2B3)    …    (A2Bs)
A3      (A3B1)    (A3B2)    (A3B3)    …    (A3Bs)
…
Ar      (ArB1)    (ArB2)    (ArB3)    …    (ArBs)

(AiBj) is the number of persons possessing both the attributes Ai and Bj [i = 1, 2, …, r; j = 1, 2, …, s].
Also, writing (Ai) and (Bj) for the total numbers of persons possessing Ai and Bj respectively, $\sum_i (A_i) = \sum_j (B_j) = N$, the total frequency, and
$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{s} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$, where $E_{ij} = \dfrac{(A_i)(B_j)}{N}$
($O_{ij}$ is the observed frequency of the cell in row Ai and column Bj, and $E_{ij}$ is the corresponding expected frequency).
Under the null hypothesis that the attributes are independent, the χ² test statistic is distributed as a χ² variate with (r − 1)(s − 1) degrees of freedom.

SOLVED EXAMPLE
Fourfold Contingency Table
Comparison of Two Proportions (2 × 2 Contingency Table)
The same question considered earlier under the difference of proportions can also be expressed as follows:

                Town A    Town B    Total
Male            394       473       867
Female          482       217       699
Total births    876       690       1566

Two proportions can also be compared by applying the χ² test.
Setting of Hypothesis
Null hypothesis: There is no significant difference between the proportions of male births in the two towns.

The test statistic is:
$\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i}$
where $O_i$ are the observed values and $E_i$ are the expected values.
In this example there are four observed values: two for males corresponding to Towns A and B, and two for females for Towns A and B (i.e. 394, 473, 482 and 217 respectively). The expected value for each of these observed values is calculated as (row total × column total)/grand total:
Expected value for 394, i.e. E(394) = (867 × 876)/1566 = 484.98
Similarly:
E(473) = (867 × 690)/1566 = 382.01
E(482) = (699 × 876)/1566 = 391.01
E(217) = (699 × 690)/1566 = 307.98
$\chi^2 = \dfrac{(394 - 484.98)^2}{484.98} + \dfrac{(473 - 382.01)^2}{382.01} + \dfrac{(482 - 391.01)^2}{391.01} + \dfrac{(217 - 307.98)^2}{307.98}$
$\chi^2 = 17.07 + 21.67 + 21.17 + 26.88 \approx 86.8$
The calculated value of χ² is much more than the tabulated value of χ² (3.84) at (2 − 1)(2 − 1) = 1 degree of freedom and 5% level of significance. Hence we reject the null hypothesis.
Conclusion: The proportions of male births in the two towns are not the same. In Town B the proportion of male births is much higher than in Town A.
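For a 2 × 2 table, the four-term sum above can be computed in one step with the standard shortcut χ² = N(ad − bc)²/[(a + b)(c + d)(a + c)(b + d)], which is not given in the text. The sketch below, an illustration rather than the book's method, applies it to the town data and checks that the result equals the square of the pooled Z statistic from the difference-of-proportions test on the same data.

```python
import math

# Fourfold table: rows male/female, columns Town A/Town B
a, b = 394, 473   # males in Town A, Town B
c, d = 482, 217   # females in Town A, Town B
N = a + b + c + d

# Shortcut formula for a 2 x 2 table (no continuity correction)
chi2 = N * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Same data as a difference of two proportions with pooled P
n1, n2 = a + c, b + d          # total births in Town A and Town B
p1, p2 = a / n1, b / n2        # proportions of male births
P = (a + b) / N                # pooled proportion of males
z = (p1 - p2) / math.sqrt(P * (1 - P) * (1 / n1 + 1 / n2))

print(f"chi-square = {chi2:.2f}, Z = {z:.2f}, Z^2 = {z ** 2:.2f}")
```

The two approaches agree exactly (χ² ≈ 86.80 = Z²), so for a fourfold table the χ² test with 1 df and the two-tailed Z test for difference of proportions lead to the same decision.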
Manifold Contingency Table
Comparison of Several Proportions: The 2 × k Contingency Table
QUESTION: The following table shows the persons suffering from respiratory illness in different groups:

                                  Children   Adolescents   Adults   Elderly people   Total
Presence of respiratory illness   76         47            65       79               267
Absence                           54         67            89       46               256
Total                             130        114           154      125              523

Find out whether the proportion of persons suffering from respiratory illness is the same in the different categories.
SOLUTION: In the above table there are eight observed values corresponding to four columns and two rows. Therefore this is a (2 × 4) contingency table.
The expected values corresponding to each observed value are calculated as follows:

E(76) = (267 × 130)/523 = 66.37;   E(47) = (267 × 114)/523 = 58.20
E(65) = (267 × 154)/523 = 78.62;   E(79) = (267 × 125)/523 = 63.81
E(54) = (256 × 130)/523 = 63.63;   E(67) = (256 × 114)/523 = 55.80
E(89) = (256 × 154)/523 = 75.38;   E(46) = (256 × 125)/523 = 61.19
$\chi^2 = \dfrac{(76 - 66.37)^2}{66.37} + \dfrac{(47 - 58.20)^2}{58.20} + \dfrac{(65 - 78.62)^2}{78.62} + \dfrac{(79 - 63.81)^2}{63.81} + \dfrac{(54 - 63.63)^2}{63.63} + \dfrac{(67 - 55.80)^2}{55.80} + \dfrac{(89 - 75.38)^2}{75.38} + \dfrac{(46 - 61.19)^2}{61.19}$
$\chi^2 = 1.40 + 2.15 + 2.36 + 3.61 + 1.46 + 2.25 + 2.46 + 3.77 = 19.46$
The critical value of χ² at (2 − 1)(4 − 1) = 3 degrees of freedom and 5% level of significance is χ²tab = 7.81. Since χ²tab is less than the calculated value of χ², we reject the null hypothesis.
Conclusion: The incidence of respiratory illness in the different groups is not the same.
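The expected-frequency rule E = (row total × column total)/N applies to any r × s table. A minimal sketch of the general χ² statistic with (r − 1)(s − 1) df; the function name is ours, and the critical value 7.81 is the tabulated 5% point for 3 df rather than something the code computes.

```python
def chi_square_statistic(table):
    """Pearson chi-square for an r x s table of observed counts.

    Expected count for cell (i, j) = row total i * column total j / N.
    Returns the statistic and its degrees of freedom (r - 1)(s - 1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    N = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / N
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Columns: children, adolescents, adults, elderly people
illness = [
    [76, 47, 65, 79],   # respiratory illness present
    [54, 67, 89, 46],   # respiratory illness absent
]
chi2, df = chi_square_statistic(illness)
CHI2_5PCT_3DF = 7.81    # tabulated 5% point of chi-square with 3 df

print(f"chi-square = {chi2:.2f} with {df} df")
print("reject H0" if chi2 > CHI2_5PCT_3DF else "accept H0")
```

The same function applied to the fourfold town table, `chi_square_statistic([[394, 473], [482, 217]])`, gives about 86.8 with 1 df.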
Exact Sampling Distribution
The sampling theory discussed so far was based on the application of the normal test. However, if the sample size n is small, the normal test cannot be applied. For such cases exact sample tests were developed. Some of these tests are:
1. t-test;
2. F-test;
3. Fisher's z transformation.
The exact sample tests can, however, be applied to large samples also, though the converse is not true.
In all the exact sample tests, the basic assumption is that the population(s) from which the sample(s) are drawn is (are) normal.
Student's t distribution: Let $x_i$ (i = 1, 2, …, n) be a random sample of size n from a normal population with mean μ and variance σ². Then Student's t is defined by the statistic
$t = \dfrac{\bar{x} - \mu}{S/\sqrt{n}}$
where $\bar{x} = \dfrac{\sum x_i}{n}$ and $S^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$ is the unbiased estimate of the population variance σ². This statistic follows Student's t distribution with (n − 1) degrees of freedom.

Application of t-Distribution
The t-distribution has a wide range of applications, some of which are:
1. To test whether the sample mean $\bar{x}$ differs significantly from the hypothetical value μ of the population mean.
2. To test the significance of the difference between two sample means.
3. To test the significance of an observed sample correlation coefficient.
Assumptions for Student's t Test
1. The parent population from which the sample is drawn is normal.
2. The sample observations are independent, i.e. the sample is random.
3. The population SD σ is unknown.

t-Test for Single Mean
If $x_1, x_2, \ldots, x_n$ is a random sample drawn from a normal population with a specified mean μ0, then under the null hypothesis $H_0: \mu = \mu_0$ the statistic
$t = \dfrac{\bar{x} - \mu_0}{S/\sqrt{n}}$, where $S = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$,
follows a t distribution with (n − 1) degrees of freedom.
If the calculated |t| > tabulated t, the null hypothesis is rejected at the level of significance adopted.
t-Test for Difference of Means
Suppose we want to test whether
(a) two independent samples $x_i$ (i = 1, 2, …, n1) and $y_j$ (j = 1, 2, …, n2) have been drawn from populations with the same mean, or
(b) the two sample means $\bar{x}$ and $\bar{y}$ differ significantly or not.
Under the null hypothesis that
(a) the samples have been drawn from populations with the same mean, i.e. $\mu_x = \mu_y$, or
(b) the sample means $\bar{x}$ and $\bar{y}$ do not differ significantly,
the test statistic
$t = \dfrac{\bar{x} - \bar{y}}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, where $S^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$
($S_1^2$ and $S_2^2$ being the unbiased sample variances), follows a t distribution with (n1 + n2 − 2) degrees of freedom.

Assumptions of t-Test for Difference of Means
1. The parent populations from which the samples have been drawn are normally distributed.
2. The population variances are equal and unknown, i.e. $\sigma_x^2 = \sigma_y^2 = \sigma^2$.
3. The two samples are random and independent of each other.
Paired t-Test for Difference of Means
The paired t-test is applied when
(i) the sample sizes are equal, and
(ii) the two samples are not independent but the sample observations are paired together, the pair of observations $(x_i, y_i)$, i = 1, 2, …, n, corresponding to the ith unit of the sample.
Here, instead of applying the test for difference of means, we consider the increments $d_i = x_i - y_i$.
Under the null hypothesis $H_0: \mu_d = 0$, i.e. the increments are due to fluctuations of sampling, the test statistic is
$t = \dfrac{\bar{d}}{S/\sqrt{n}}$
where $\bar{d} = \dfrac{\sum d_i}{n}$ and $S^2 = \dfrac{\sum (d_i - \bar{d})^2}{n - 1}$. It follows a t distribution with (n − 1) degrees of freedom.

t-Test for Testing Significance of Correlation Coefficient
If r is the observed correlation coefficient in a sample of n pairs of observations from a bivariate normal population, then under the null hypothesis that the population correlation coefficient ρ is zero, the test statistic
$t = \dfrac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$
follows Student's t distribution with (n − 2) degrees of freedom. If t comes out to be significant, we reject H0.
SOLVED EXAMPLES
Test for Significance of Single Mean (For Small Sample)
QUESTION: A random sample of 10 students has the following IQs:
67, 110, 115, 75, 63, 117, 120, 115, 100 and 97.
Do these data support the hypothesis that the sample is drawn from a population of medical students with mean IQ = 100?
SOLUTION:
Setting of hypothesis
Null hypothesis: The sample is drawn from a population of medical students with mean IQ = 100, i.e. H0 : μ = 100.
Alternative hypothesis: H1 : μ ≠ 100 (two-tailed test).
The Test Statistic is:
$t = \dfrac{\bar{x} - 100}{S/\sqrt{n}}$, where $S = \sqrt{\dfrac{\sum x_i^2 - n\bar{x}^2}{n - 1}}$ is an unbiased estimate of σ.
From the above data we can calculate the mean $\bar{x}$ and SD S, which are:
$\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{979}{10} = 97.9$
$S = \sqrt{\dfrac{100131 - 10(97.9)^2}{10 - 1}} = \sqrt{\dfrac{100131 - 95844.1}{9}} = \sqrt{476.32} = 21.82$
By putting these values in the test statistic we can calculate the value of t:
$t = \dfrac{97.9 - 100}{21.82/\sqrt{10}} = \dfrac{-2.1}{21.82/3.16} = \dfrac{-2.1}{6.90} = -0.30$, i.e. |t| = 0.30
The tabulated value of t at (n − 1) = 9 degrees of freedom at 5% level of significance (two-tailed) is 2.26.
The tabulated value of t is more than the calculated value; hence we accept the null hypothesis.
Conclusion: The data are consistent with the sample having been drawn from a population of medical students with mean IQ = 100.
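The figures above can be checked with Python's standard `statistics` module, whose `stdev` uses the n − 1 divisor and so matches the unbiased estimate S. The tabulated value 2.26 is taken from t tables, not computed; variable names are ours.

```python
import math
import statistics

iq = [67, 110, 115, 75, 63, 117, 120, 115, 100, 97]
mu0 = 100                      # hypothesised population mean IQ
n = len(iq)

x_bar = statistics.fmean(iq)   # 97.9
s = statistics.stdev(iq)       # divisor n - 1, i.e. unbiased variance estimate
t = (x_bar - mu0) / (s / math.sqrt(n))
T_TAB_9DF = 2.26               # two-tailed 5% point of t with 9 df (from tables)

print(f"mean = {x_bar:.1f}, S = {s:.2f}, t = {t:.2f}")
print("reject H0" if abs(t) > T_TAB_9DF else "accept H0")
```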
t Test for Difference of Means between Two Independent Groups
QUESTION: Two groups of rats were placed on diets with high and low protein contents, and the gain in weight was recorded after 2 months. The results of gain in weight are as follows:
Group A (high protein diet): 140, 146, 117, 160, 107, 102, 123, 114, 145, 121, 127, 132, 107, 153
Group B (low protein diet): 97, 120, 63, 110, 115, 120, 120, 150, 96, 74, 86
Find out whether there is any significant difference between the weight gain of rats in the two groups.
SOLUTION:
Setting of hypothesis
Null hypothesis: H0 : μ1 = μ2; and
Alternative hypothesis: H1 : μ1 ≠ μ2 (two-tailed test).
The mean and SD of the two groups can be calculated, which are equal to:
Group A: n1 = 14; $\bar{x}_1$ = 128.14 and S1 = 18.34
Group B: n2 = 11; $\bar{x}_2$ = 104.64 and S2 = 24.61
The Test Statistic
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
where S² is the pooled estimate of variance and is equal to
$S^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$
In this problem S² = 453.32 (by putting the values of n1, n2, S1 and S2 in the above formula). Thus $S = \sqrt{453.32} = 21.29$.
The test statistic will be equal to:
$t = \dfrac{128.14 - 104.64}{21.29\sqrt{\dfrac{1}{14} + \dfrac{1}{11}}} = \dfrac{23.50}{21.29\sqrt{0.071 + 0.091}} = \dfrac{23.50}{8.58} = 2.74$
The tabulated value of t at (n1 + n2 − 2), i.e. 23 degrees of freedom, at 5% level of significance is 2.07, which is less than the calculated value of t. Hence we reject the null hypothesis.
Conclusion: The weight gain of rats in Group A (high protein diet) is significantly more than that of rats on the low protein diet.
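A Python sketch of the pooled two-sample t test, using the weight-gain values as listed in the question (14 and 11 rats). `statistics.variance` uses the n − 1 divisor; the tabulated value 2.07 for 23 df is taken from t tables, and variable names are ours.

```python
import math
import statistics

# Weight gain after 2 months, as listed in the question
high = [140, 146, 117, 160, 107, 102, 123, 114, 145, 121, 127, 132, 107, 153]  # Group A
low = [97, 120, 63, 110, 115, 120, 120, 150, 96, 74, 86]                        # Group B

n1, n2 = len(high), len(low)
m1, m2 = statistics.fmean(high), statistics.fmean(low)
v1, v2 = statistics.variance(high), statistics.variance(low)   # divisor n - 1

# Pooled estimate of the common variance, (n1 + n2 - 2) df
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
df = n1 + n2 - 2
T_TAB_23DF = 2.07   # two-tailed 5% point of t with 23 df (from tables)

print(f"means {m1:.2f}, {m2:.2f}; SDs {math.sqrt(v1):.2f}, {math.sqrt(v2):.2f}")
print(f"pooled S^2 = {pooled_var:.2f}, t = {t:.2f} with {df} df")
print("reject H0" if abs(t) > T_TAB_23DF else "accept H0")
```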
Paired t Test for Difference of Means
QUESTION: In a clinical trial, the anxiety scores of 10 patients were recorded (baseline values). A new tranquillizer was given to each patient for one month, after which the anxiety scores were recorded again. The scores are as follows:

Case number             1    2    3    4    5    6    7    8    9    10
Baseline values (xi)    23   21   24   19   17   26   22   17   12   15
After one month (yi)    15   20   26   17   17   21   16   12   12   11

Find out whether the new tranquillizer is effective in psychoneurotic patients.
SOLUTION:
Setting of hypothesis
Null hypothesis: There is no difference in mean anxiety score, i.e. H0 : μ1 = μ2 (equivalently μd = 0).
Alternative hypothesis: H1 : μ1 ≠ μ2 (two-tailed test).
The Test Statistic
$t = \dfrac{\bar{d}}{S/\sqrt{n}}$, where $d_i = x_i - y_i$,
$\bar{d}$ is the mean of the $d_i$ and S is the standard deviation of the $d_i$.

The mean and SD of di are calculated as follows:

Case No.   Baseline values (xi)   After one month (yi)   di = xi − yi   di²
1          23                     15                     8              64
2          21                     20                     1              1
3          24                     26                     −2             4
4          19                     17                     2              4
5          17                     17                     0              0
6          26                     21                     5              25
7          22                     16                     6              36
8          17                     12                     5              25
9          12                     12                     0              0
10         15                     11                     4              16
Total                                                    31 − 2 = 29    175

$\bar{d} = \dfrac{29}{10} = 2.9$ and $S = \sqrt{\dfrac{\sum d_i^2 - n\bar{d}^2}{n - 1}} = \sqrt{\dfrac{175 - 84.1}{9}} = 3.17$
Putting these values in the test statistic we get the value of t:
$t = \dfrac{2.9}{3.17/\sqrt{10}} = \dfrac{2.9}{1.003} = 2.89$
The tabulated value of t at (n − 1), i.e. 9 degrees of freedom (5% level of significance), is 2.26, which is less than the calculated value t = 2.89. Hence we reject the null hypothesis.
Conclusion: We can conclude that the new tranquillizer is effective on psychoneurotic patients.
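The paired calculation works on the increments alone, which a few lines of Python make explicit; variable names are ours and the tabulated value 2.26 comes from t tables.

```python
import math
import statistics

baseline = [23, 21, 24, 19, 17, 26, 22, 17, 12, 15]   # x_i
after = [15, 20, 26, 17, 17, 21, 16, 12, 12, 11]      # y_i

d = [x - y for x, y in zip(baseline, after)]   # increments d_i = x_i - y_i
n = len(d)
d_bar = statistics.fmean(d)                    # 2.9
s_d = statistics.stdev(d)                      # SD of d_i, divisor n - 1
t = d_bar / (s_d / math.sqrt(n))
T_TAB_9DF = 2.26                               # two-tailed 5% point, 9 df (from tables)

print(f"sum d = {sum(d)}, sum d^2 = {sum(v * v for v in d)}")
print(f"mean d = {d_bar:.2f}, S = {s_d:.2f}, t = {t:.2f}")
```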

t Test for Significance of Correlation Coefficient
QUESTION: In a sample of 30 individuals, the correlation coefficient between height and weight is r = +0.46. Find out whether this correlation coefficient is significant in the population.
SOLUTION:
Setting of hypothesis
Null hypothesis: H0 : ρ = 0, where ρ is the population correlation coefficient, i.e. the observed sample correlation is not indicative of any correlation in the population.
Alternative hypothesis: H1 : ρ ≠ 0 (two-tailed test).
The Test Statistic
$t = \dfrac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$ is distributed as a t distribution with (n − 2) degrees of freedom.
In this problem r = +0.46 and n = 30; putting these values in the formula we get
$t = \dfrac{0.46\sqrt{30 - 2}}{\sqrt{1 - 0.46^2}} = \dfrac{0.46 \times 5.29}{\sqrt{0.7884}} = \dfrac{2.43}{0.888} = 2.74$
The tabulated value of t at 28 degrees of freedom and 5% level of significance is 2.048, which is less than the calculated value of t. Thus we reject the null hypothesis.
Conclusion: On the basis of this sample we can say that there is a significant positive correlation between height and weight of individuals.
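A Python check of this t value. Rearranging the same formula gives r² = t²/(t² + n − 2), so the sketch also shows the smallest |r| that would be significant for n = 30 at the 5% level; the tabulated 2.048 comes from t tables and the variable names are ours.

```python
import math

r, n = 0.46, 30
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)   # t with n - 2 df under H0: rho = 0
T_TAB_28DF = 2.048   # two-tailed 5% point of t with 28 df (from tables)

# Smallest |r| that reaches significance for this n, from r^2 = t^2 / (t^2 + n - 2)
r_crit = T_TAB_28DF / math.sqrt(T_TAB_28DF ** 2 + n - 2)

print(f"t = {t:.2f}; |r| must exceed {r_crit:.3f} for significance at n = {n}")
```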

F-Statistic
If X and Y are two independent Chi-Square variates with ν1 and ν2 degrees of freedom respectively, then the F-statistic is defined by:
$F = \dfrac{X/\nu_1}{Y/\nu_2}$
Thus F is defined as the ratio of two independent Chi-Square variates, each divided by its respective degrees of freedom, and it follows an F-distribution with (ν1, ν2) degrees of freedom.
Mode of F-Distribution
1. Since F > 0, the mode exists (is positive) if and only if ν1 > 2.
2. The mode of the F-distribution is always < 1.
Skewness of F-Distribution
The coefficient of skewness is given by:
Skewness $= \dfrac{\text{Mean} - \text{Mode}}{\text{SD}}$
Since the mean, $\nu_2/(\nu_2 - 2)$ (for ν2 > 2), is greater than 1 and the mode is less than 1, the F-distribution is highly positively skewed.
Critical Values of F-Distribution

Figure 8.7

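Application of F - Distribution
F-test for Equality of Population Variances
Suppose we want to test
(i) whether two independent samples $x_i$ (i = 1, 2, ..., n1) and $y_j$ (j = 1, 2, ..., n2) have been drawn from normal populations with the same variance $\sigma^2$, or
(ii) whether two independent estimates of the population variance are homogeneous or not.
Under the null hypothesis $H_0: \sigma_x^2 = \sigma_y^2 = \sigma^2$, the statistic
$$F = \frac{S_x^2}{S_y^2}$$
where
$$S_x^2 = \frac{\sum_i (x_i - \bar{x})^2}{n_1 - 1} \quad \text{and} \quad S_y^2 = \frac{\sum_j (y_j - \bar{y})^2}{n_2 - 1}$$
follows the F-distribution with $(\nu_1, \nu_2)$ degrees of freedom, where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$.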
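A minimal Python sketch of the variance-ratio test follows. The two samples are hypothetical, and putting the larger estimate in the numerator (so that F ≥ 1 and only the upper tail of the F table is needed) is a common convention assumed here, not something stated in the text. The resulting F would then be compared with the tabulated F on the reported degrees of freedom.

```python
from statistics import variance  # sample variance with divisor n - 1


def variance_ratio(x, y):
    """Return (F, df_numerator, df_denominator) with the larger variance estimate on top."""
    sx2, sy2 = variance(x), variance(y)
    if sx2 >= sy2:
        return sx2 / sy2, len(x) - 1, len(y) - 1
    return sy2 / sx2, len(y) - 1, len(x) - 1


# Hypothetical measurements, for illustration only
x = [12.1, 14.3, 11.8, 15.2, 13.0]
y = [13.5, 13.9, 14.1, 13.2, 13.8, 14.0]
F, nu1, nu2 = variance_ratio(x, y)
print(f"F = {F:.2f} on ({nu1}, {nu2}) degrees of freedom")
```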

F-test for Equality of Several Means
The F-test can be used for testing the equality of several means using the technique of Analysis of Variance (ANOVA).
COMPARISON OF SEVERAL GROUPS
One-way Analysis of Variance
The technique of analysis of variance is a powerful method of analyzing the way in which the mean values of a variable are affected by classifications of the data of various sorts. Despite its name, this technique is concerned with the comparison of means rather than variances.

The t distribution has been used for the comparison of the means of two groups of data, distinguishing between the paired and unpaired cases. The analysis of variance is a generalization of the unpaired t test, appropriate for any number of groups. It is entirely equivalent to the unpaired t test when there are just two groups.
Some examples of a one-way classification of data into
several groups are as follows:
(a) The reduction in blood sugar recorded for groups of
individuals given different doses.
(b) The values of a certain lung function test recorded for men
of the same age group in a number of different
occupational categories.
Suppose there are k groups of observations on a variable
y, and that the ith group contains $n_i$ observations. The
numbering of the groups from 1 to k is quite arbitrary, although
if there is a simple ordering of groups it will be natural to use
this in the numbering.
Groups              1      2      ....   i      ....   k      All groups combined
Number of cases     n1     n2     ....   ni     ....   nk     N = Σ ni
Mean of y           ȳ1     ȳ2     ....   ȳi     ....   ȳk     ȳ = T/N
Sum of y            T1     T2     ....   Ti     ....   Tk     T = Σ Ti
Sum of y²           S1     S2     ....   Si     ....   Sk     S = Σ Si

Note that the entries N, T and S in the final column are the sums along the corresponding rows, but $\bar{y}$ is not the sum of the $\bar{y}_i$. ($\bar{y}$ will be the mean of the $\bar{y}_i$ if all the $n_i$ are equal; otherwise it is their weighted mean, $\bar{y} = \sum_i n_i \bar{y}_i / N$.)


In one-way analysis of variance, the total sum of squares about the mean of the N values of y can be partitioned into two parts:
(1) the sum of squares of each reading about its own group mean, and
(2) the sum of squares of the deviations of each group mean about the grand mean.
$$\sum_{ij}(y_{ij} - \bar{y})^2 = \sum_{ij}(y_{ij} - \bar{y}_i)^2 + \sum_{ij}(\bar{y}_i - \bar{y})^2$$
We can write this result as:
Total SSq = Within-groups SSq + Between-groups SSq
where SSq stands for sum of squares.
Now, if there are very large differences between group means, as compared with the within-group variation, the between-groups SSq is likely to be relatively large. If, on the other hand, all the group means are nearly equal, the between-groups SSq will be small compared with the within-groups SSq. The relative sizes of the between- and within-groups SSq should therefore provide an opportunity to assess the variation between group means in comparison with that within groups.
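The total sum of squares, as well as the sums of squares between and within groups, can be obtained by the following formulae:
Total Sum of Squares:
$$\sum_{ij}(y_{ij} - \bar{y})^2 = S - \frac{T^2}{N}$$
Within Sum of Squares:
For the ith group,
$$\sum_j (y_{ij} - \bar{y}_i)^2 = S_i - \frac{T_i^2}{n_i}$$
Summing over the k groups, therefore:
$$\sum_{ij}(y_{ij} - \bar{y}_i)^2 = \left(S_1 - \frac{T_1^2}{n_1}\right) + \left(S_2 - \frac{T_2^2}{n_2}\right) + \cdots + \left(S_k - \frac{T_k^2}{n_k}\right) = S - \sum_i \frac{T_i^2}{n_i}$$
Between Sum of Squares:
$$\sum_{ij}(\bar{y}_i - \bar{y})^2 = \text{Total SSq} - \text{Within-groups SSq} = \left(S - \frac{T^2}{N}\right) - \left(S - \sum_i \frac{T_i^2}{n_i}\right) = \sum_i \frac{T_i^2}{n_i} - \frac{T^2}{N}$$
Summarizing the results, we have the following formulae for partitioning the total sum of squares:
Between groups: $\sum_i \frac{T_i^2}{n_i} - \frac{T^2}{N}$
Within groups: $S - \sum_i \frac{T_i^2}{n_i}$
Total: $S - \frac{T^2}{N}$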
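The shortcut formulae can be checked against the definitional sums of squared deviations in a few lines of Python. The three groups below are hypothetical; the sketch only shows that both routes give the same between, within and total sums of squares, and that Total = Between + Within.

```python
def ss_definitional(groups):
    """Sums of squares from deviations about the group means and the grand mean."""
    all_y = [y for g in groups for y in g]
    grand = sum(all_y) / len(all_y)
    means = [sum(g) / len(g) for g in groups]
    total = sum((y - grand) ** 2 for y in all_y)
    within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    return between, within, total


def ss_shortcut(groups):
    """Same quantities from T_i, n_i, S, T and N (the summary formulae)."""
    N = sum(len(g) for g in groups)
    T = sum(sum(g) for g in groups)
    S = sum(y * y for g in groups for y in g)
    sum_ti2_over_ni = sum(sum(g) ** 2 / len(g) for g in groups)
    between = sum_ti2_over_ni - T * T / N
    within = S - sum_ti2_over_ni
    total = S - T * T / N
    return between, within, total


groups = [[4.0, 6.0, 5.0], [8.0, 7.0, 9.0, 8.0], [5.0, 6.0]]  # hypothetical data
print(ss_definitional(groups))
print(ss_shortcut(groups))
```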

Testing for difference between means of more than two groups (i.e. k > 2):
Suppose that the $n_i$ observations in the ith group form a random sample from a population with mean $\mu_i$ and variance $\sigma^2$. As in the two-sample t test, we assume that $\sigma^2$ is the same for all groups. To examine the evidence for differences between the $\mu_i$, we shall test the null hypothesis that the $\mu_i$ do not vary, being equal to some common value $\mu$. Under this hypothesis there are three ways of estimating $\sigma^2$. These are as follows:
From total sum of squares: The whole collection of N observations may be regarded as a random sample of size N, and consequently
$$s_T^2 = \frac{\sum_{ij}(y_{ij} - \bar{y})^2}{N - 1}$$
is an estimate of $\sigma^2$.
From within-groups SSq: Separate unbiased estimates may be obtained for each group in turn:
$$s_i^2 = \frac{\sum_j (y_{ij} - \bar{y}_i)^2}{n_i - 1}$$
A combined estimate based purely on variation within groups may be derived by adding the numerators and denominators of these ratios to give the within-groups mean sum of squares (MSSq):
$$s_W^2 = \frac{\text{Within-groups SSq}}{\sum_i (n_i - 1)} = \frac{\text{Within-groups SSq}}{N - k}$$
From between-groups SSq: Since both $s_T^2$ and $s_W^2$ are unbiased estimates of $\sigma^2$ under the null hypothesis, subtracting the numerator and denominator of $s_W^2$ from those of $s_T^2$ gives a third unbiased estimate, the between-groups mean square:
$$s_B^2 = \frac{\text{Total SSq} - \text{Within-groups SSq}}{(N - 1) - (N - k)} = \frac{\text{Between-groups SSq}}{k - 1}$$

Thus we can form the analysis of variance table:

Source            df       Sum of squares               Mean sum of squares       F-ratio
Between groups    k − 1    B = Σi Ti²/ni − T²/N         s_B² = B/(k − 1)          F = s_B²/s_W²
Within groups     N − k    A − B = S − Σi Ti²/ni        s_W² = (A − B)/(N − k)
Total             N − 1    A = S − T²/N

The significance of the difference between the means can thus be judged by the F-test in the analysis of variance, at $\nu_1 = k - 1$ and $\nu_2 = N - k$ degrees of freedom.
If k = 2, the situation considered above is precisely that for which the unpaired (or two-sample) t test is used. The variance ratio F will have 1 and N − 2 degrees of freedom, and t will have n1 + n2 − 2, i.e. (N − 2), degrees of freedom.
The value of F is equal to the square of the value of t. The distribution of F on 1 and N − 2 degrees of freedom is precisely the same as the distribution of the square of a variable following the t distribution on N − 2 degrees of freedom.
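The equivalence F = t² for k = 2 is easy to confirm numerically. This Python sketch, on hypothetical data, computes the pooled (unpaired) t statistic and the one-way ANOVA variance ratio for the same two groups; both use n1 + n2 − 2 = N − k degrees of freedom for the variance estimate.

```python
import math


def pooled_t(a, b):
    """Unpaired t with pooled variance on n1 + n2 - 2 degrees of freedom."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss = sum((v - m1) ** 2 for v in a) + sum((v - m2) ** 2 for v in b)
    s2 = ss / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(s2 * (1 / n1 + 1 / n2))


def anova_f(groups):
    """One-way ANOVA variance ratio s_B^2 / s_W^2 on (k - 1, N - k) df."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (N - k))


a = [12.0, 15.0, 11.0, 14.0, 13.0]  # hypothetical group 1
b = [16.0, 18.0, 15.0, 17.0]        # hypothetical group 2
t = pooled_t(a, b)
print(f"t = {t:.4f}, t^2 = {t * t:.4f}, F = {anova_f([a, b]):.4f}")
```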

If k > 2, we may examine the difference between a particular pair of means, chosen because the contrast between these particular groups is of logical interest.
The standard error of the difference between two means, say $\bar{y}_i$ and $\bar{y}_j$, may be estimated by:
$$SE(\bar{y}_i - \bar{y}_j) = \sqrt{s_W^2\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$
and the difference $\bar{y}_i - \bar{y}_j$ is tested by referring
$$t = \frac{\bar{y}_i - \bar{y}_j}{SE(\bar{y}_i - \bar{y}_j)}$$
to the t distribution with N − k degrees of freedom (since this is the number of degrees of freedom associated with the estimated variance $s_W^2$). Confidence limits for the difference in means may be set in the usual way, using tabulated percentiles of t on N − k degrees of freedom. The only function of the analysis of variance in this particular comparison has been to replace the estimate of variance on n1 + n2 − 2 degrees of freedom (which would be used in the two-sample t test) by $s_W^2$ on N − k degrees of freedom.
Solved Example
Comparison of Several Means (ANOVA)
QUESTION: In a clinical trial, twenty patients undergoing operation were divided into four groups. Four different anaesthetic drugs were tested, the drugs being allotted at random to these groups. The blood pressure was recorded just after induction. The results of this trial were as follows:

Group 1    Group 2    Group 3    Group 4
179        178        172        181
138        175        135        186
134        112        135        180
198        165        182        171
103        186        150        178

Find the effect of different drugs on blood pressure in patients.


SOLUTION:

Setting of hypothesis
Null hypothesis: There is no difference between the mean values of blood pressure in the four groups,
i.e. $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
Alternative hypothesis: $H_1$: the means $\mu_i$ are not all equal.
One way analysis of variance:

                              Group 1      Group 2      Group 3      Group 4      All groups
Total (Ti)                    752          816          774          896          T = 3238
Number of cases (ni)          5            5            5            5            N = 20
Mean (ȳi)                     150.4        163.2        154.8        179.2        ȳ = 161.9
Sum of squares (Si = Σy²)     118,854      136,674      121,658      160,682      S = 537,868
Ti²/ni                        113,100.8    133,171.2    119,815.2    160,563.2    Σ Ti²/ni = 526,650.4

With T²/N = 3238²/20 = 524,232.2,
Sum of squares between groups = $\sum_i \frac{T_i^2}{n_i} - \frac{T^2}{N}$ = 526,650.4 − 524,232.2 = 2418.2

Total sum of squares = $S - \frac{T^2}{N}$ = 537,868 − 524,232.2 = 13,635.8
Analysis of variance table:

Source                   Degrees of freedom    Sum of squares                   Mean sum of squares             F-value
Between groups           4 − 1 = 3             2418.2                           s_B² = 2418.2/3 = 806.07        F = 806.07/701.10 = 1.15
Within groups (error)    19 − 3 = 16           13,635.8 − 2418.2 = 11,217.6     s_W² = 11,217.6/16 = 701.10
Total                    20 − 1 = 19           13,635.8

The critical value of F (from the F table) at 3 and 16 degrees of freedom and 5% level of significance is Ftab = 3.24, which is more than the calculated value of F (1.15, from the analysis of variance table). Hence we accept the null hypothesis, i.e. there is no significant difference between the mean blood pressure values in the four groups.
Conclusion: There is no significant difference between the blood pressure values recorded just after induction with the different drugs. On the basis of this trial, the four drugs cannot be said to differ in their effect on the blood pressure of patients.
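The whole worked example can be reproduced with the shortcut formulae in a short Python sketch. Group 4's fourth reading is entered as 171, the value consistent with the group total 896 and Σy² = 160,682 used in the solution. The critical value 3.24 is the tabulated figure quoted above, typed in rather than computed.

```python
def one_way_anova(groups):
    """One-way ANOVA via S - T^2/N and sum(T_i^2/n_i) - T^2/N."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    T = sum(sum(g) for g in groups)
    S = sum(y * y for g in groups for y in g)
    correction = T * T / N
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_total = S - correction
    ss_within = ss_total - ss_between
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (N - k)
    return {
        "T": T, "S": S, "N": N,
        "ss_between": ss_between, "ss_within": ss_within, "ss_total": ss_total,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within, "df": (k - 1, N - k),
    }


groups = [
    [179, 138, 134, 198, 103],  # Group 1
    [178, 175, 112, 165, 186],  # Group 2
    [172, 135, 135, 182, 150],  # Group 3
    [181, 186, 180, 171, 178],  # Group 4
]
res = one_way_anova(groups)
F_TAB_3_16 = 3.24  # 5% point of F on (3, 16) df, as quoted in the text
for key in ("T", "S", "ss_between", "ss_within", "ss_total", "ms_between", "ms_within", "F"):
    print(key, round(res[key], 2))
print("reject H0:", res["F"] > F_TAB_3_16)
```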

Comparison of mean values of blood pressure in Group 1 and Group 4 on the basis of the analysis of variance table:
Mean blood pressure of patients in Group 1 = 150.4
Mean blood pressure of patients in Group 4 = 179.2
Number of cases in both groups = 5
Standard error $= \sqrt{s_W^2\left(\frac{1}{5} + \frac{1}{5}\right)} = \sqrt{701.10 \times 0.4} = \sqrt{280.44} = 16.75$
$$t = \frac{179.2 - 150.4}{16.75} = \frac{28.8}{16.75} = 1.72$$
The critical value of t at (N − k), i.e. 16, degrees of freedom and 5% level of significance is 2.12, which is more than the calculated value of t. Hence we accept the null hypothesis that there is no significant difference between the blood pressure values of Group 1 and Group 4.
Thus, by use of the analysis of variance table, we can also compare the mean values of two groups.
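A Python sketch of this pairwise comparison: it uses the within-groups mean square s_W² = 11,217.6/16 = 701.10 on N − k = 16 degrees of freedom, and the tabulated two-sided 5% point 2.12 for 16 df is typed in rather than computed. The 95% confidence limits for the difference, set "in the usual way" as the text describes, come out of the same quantities.

```python
import math


def pairwise_t(mean_i, mean_j, n_i, n_j, s2_within):
    """t for the difference of two group means using the ANOVA within-groups variance."""
    se = math.sqrt(s2_within * (1 / n_i + 1 / n_j))
    return (mean_j - mean_i) / se, se


T_TAB_16DF = 2.12  # two-sided 5% point of t on N - k = 16 df
t, se = pairwise_t(150.4, 179.2, 5, 5, 701.10)
diff = 179.2 - 150.4
low, high = diff - T_TAB_16DF * se, diff + T_TAB_16DF * se
print(f"SE = {se:.2f}, t = {t:.2f}, 95% CI for difference: {low:.1f} to {high:.1f}")
```

The interval includes zero, which agrees with accepting the null hypothesis for this pair.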

MULTIPLE CHOICE QUESTIONS


1. $\sqrt{pq/n}$ indicates:
(a) Standard error of proportion
(b) Difference between proportion
(c) Standard error of mean
(d) Standard deviation from the mean

(AI, 93)

2. The number of degrees of freedom in a (4 × 4) table is:
(a) 4
(b) 8
(c) 9
(d) 16
(AI,95)
3. Confidence limits are based on:
(a) Range and standard deviation
(b) Median and standard error
(c) Mean and standard error
(d) Mode and standard deviation

(AI,99)

4. All are true regarding Student's t-test except:
(a) Standard error of mean is not estimated
(b) Standard population is selected
(c) Two samples are compared
(d) Student's t-map (table) is required for calculation
(AI, 2000)
5. A community has a population of 10,000 individuals,
beta carotene was given to 6,000 individuals and the
remaining population was not given beta carotene.
After some time 3 in the first group developed lung
cancer and 2 in the second group also developed lung
cancer. The correct statement is:
(a) Beta carotene and lung cancer have no association
(b) The P-value is not significant
(c) The study is not designed properly
(d) Beta carotene is associated with lung cancer
(AI, 2001)
6. If the mean is 230 and the standard error is 10, the 95%
confidence limits would be:
(a) 210 to 250
(b) 220 to 240
(c) 225 to 235
(d) 230 to 210
(AI, 89)


7. Significant p value is all except:


(a) 0.005
(b) 0.05
(c) 0.01
(d) 0.1
8. The mean BP of a group of persons was determined and, after an interventional trial, the mean BP was estimated again. The test to be applied to determine the significance of the intervention is:
(a) Chi-Square
(b) Paired t test
(c) Correlation coefficient
(d) Mean deviation
(AIIMS, 95)
9. Which of the following is a prerequisite for using the Chi-Square test to compare two samples:
(a) Both samples should be mutually exclusive
(b) Both sample need not be mutually exclusive
(c) Normal distribution
(d) All of the above
(UPSC 2000)
10. In a group of persons taking part in a controlled trial of an anti-hypertensive drug, the blood pressures were measured before and after giving the drug. Which of the following tests will you use for comparison:
(a) Paired t-test
(b) F test
(c) t-test
(d) Chi-Square test
(AIIMS,2000, Dec 97)
11. About a test of significance between two large populations, which one of the following statements is true:
(a) Null hypothesis states that two means are equal
(b) Standard error of difference is the sum of the standard errors of 2 means
(c) Standard errors of means are equal
(d) Standard error of difference between populations is calculated
[Hint: The null hypothesis is usually the hypothesis of no difference, which is tested for possible rejection under the assumption that it is true. The denominator for the test of difference between two populations is the standard error of the difference of means or proportions, not the standard error of difference between populations.]
(AIIMS, Dec 98)
12. True about Chi-Square test is:
(a) Null hypothesis is equal
(b) Doesn't measure the significance
(c) Measures the significant difference between two
proportions
(d) Tests correlation and regression
(AIIMS, June 99)
13. For 95% confidence limits, which is true:
(a) 1.95 of standard error of mean
(b) Reduces 95% of values
(c) 2.95 of standard error of mean
(d) Normal distribution + 2.5 SD
(AIIMS, June 95)
14. Standard error of mean indicates:
(a) Dispersion
(b) Distribution
(c) Variation
(d) Deviation
[Hint: Standard error is merely the standard deviation of some statistic calculated from a sample (in this case, the mean) in an indefinitely long series of repeated samplings.]
(AIIMS, Nov. 99)
15. In a p test, p indicates the probability of:
(a) Accepting null when it is false
(b) Accepting null when it is true
(c) Rejecting null when it is true
(d) Rejecting null when it is false
[Hint: The level of significance is the probability of the test statistic falling in the critical region when the null hypothesis is true.]
(AIIMS,June 2000)

16. In a group of 100 children, the mean weight of a child is 15 kg. The standard error is 1.5 kg. Which one of the following is true:
(a) 95% of all children weigh between 12 and 18 kg
(b) 95% of all children weigh between 13.5 and 16.5 kg
(c) 99% of all children weigh between 12 and 18 kg
(d) 99% of all children weigh between 13.5 and 16.5 kg
(AIIMS,May 2001)
17. A group tested for a drug shows 60% improvement as
against a standard group showing 40% improvement.
The best test to test the significance of result is:
(a) Student's t test
(b) Chi-Square test
(c) Paired t test
(d) Test for variance
(AIIMS, Nov 2001)
18. A test was done to compare serum cholesterol levels in
obese and non-obese women. The test for significance
of difference is:
(a) Paired t test
(b) Student's t test for independent variables
(c) Chi-Square test
(d) Fisher test
(AIIMS, Nov 2001)
19. Which of the following is a parametric test of
significance:
(a) U test
(b) t test
(JIPMER, 2003)
20. For testing the statistical significance of the difference
in heights of school children among three
socioeconomic groups, the most appropriate statistical
test is :
(a) Student's t test
(b) Chi-Square test


(c) Paired t test


(d) One way analysis of variance (one way ANOVA)
(AI, 2002)
21. In a study, variation in cholesterol was seen before and
after giving a drug. The test which would give its
significance is
(a) Unpaired t test
(b) Paired t test
(c) Chi-Square test
(d) Fisher's test
(AI, 2002)
22. An investigator wants to study the association between
maternal intake of iron supplements (Yes/ No) and
birth weights (in gm) of newborn babies. He collects
relevant data from 100 pregnant women and their
newborns. What statistical test of hypothesis would you
advise for the investigator in this situation ?
(a) Chi-Square test
(b) Unpaired or independent t-test
(c) Analysis of variance
(d) Paired t-test
[Hint: The investigator classifies the pregnant women into two groups depending upon intake of iron supplements. Thus there are two independent groups, and the mean birth weights of the babies can be compared.]
(AIIMS, 2003)
23. A randomized trial comparing the efficacy of two drugs showed a difference between the two with a p value of < 0.005. In reality, however, the two drugs do not differ. This is therefore an example of:
(a) Type-I error (α-error) (b) Type-II error (β-error)
(c) 1 − α
(d) 1 − β
[Hint: Rejecting the null hypothesis when it is true is called type-I error.]
(AIIMS, 2002)

24. Rejecting the null hypothesis when it is actually true is known as:
(a) Type I error
(b) Type II error
(c) Power
(d) Specificity
(AIIMS, 2004)
25. A randomized trial comparing the efficacy of two drugs showed a difference between the two (with a p value < 0.05). Assume that in reality, however, the two drugs do not differ. This is therefore an example of:
(a) Type I error (α error)
(b) Type II error (β error)
(c) 1
(d) Power of Test.
(AIIMS, 2004)
26. The Hb level in healthy women is 13.5 g/dl and the standard deviation is 1.5 g/dl. What is the Z score for a woman with an Hb level of 15.0 g/dl:
(a) 9.0
(b) 10.0
(c) 2.0
(d) 1.0
(AIIMS, 2004)

Chapter 9

Non-parametric
Tests

Non-parametric (NP) tests do not depend on the particular form of the basic frequency function from which the samples are drawn.
Non-parametric tests do not make any assumption regarding the form of the population.
Advantages of Non-parametric Tests
1. Non-parametric methods are very simple and easy to
apply.
2. No assumption is made about the form of frequency
function of the parent population from which the sample
is drawn.
3. NP tests can be applied to data which are mere classifications (i.e. which are measured on a nominal scale).
4. NP tests are available to deal with data which are given in ranks, or whose seemingly numerical scores have only the strength of ranks (e.g. scores given as grades such as A−, A, A+, B, B+).
Disadvantages of Non-parametric Tests
1. NP tests are best suited to measurements that are nominal or ordinal. If a parametric test exists for the data, it is more powerful than the NP test.
Remarks
Since no assumption is made about the parent population, non-parametric methods are sometimes referred to as distribution-free methods.
These tests are based on the theory of order statistics. A sample $x_1, x_2, \ldots, x_n$ is an ordered sample if $x_1 < x_2 < x_3 < \cdots < x_n$.
The whole structure of NP methods rests on a simple but fundamental property of order statistics.

Run Test
Suppose $x_1, x_2, \ldots, x_{n_1}$ is an ordered sample from one population and $y_1, y_2, \ldots, y_{n_2}$ is an independent ordered sample from another population. We want to test whether the samples have been drawn from the same population or from different populations.
Let us combine the two samples and arrange the observations in order of magnitude to give the combined ordered sample, for example:
x1, x2 | y1, y2, y3 | x3, x4, x5 | y4, y5 | x6 ...
Run 1 (l = 2) | Run 2 (l = 3) | Run 3 (l = 3) | Run 4 (l = 2) | ...
Run: A run is defined as a sequence of letters of one kind surrounded by letters of the other kind, and the number of elements in a run is usually referred to as the length l of the run.
If both samples came from the same population, there would be a thorough mingling of the $x_i$ and $y_j$ in the combined sample, and the number of runs in the combined sample would be large. On the other hand, if the samples came from two different populations whose ranges do not overlap, there would be only two runs, of the type $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$.
Generally, any difference in mean or variance would tend to reduce the number of runs. Thus the alternative hypothesis entails too few runs.
Procedure: In order to test the null hypothesis that the samples have come from the same population, we count the number of runs U in the combined ordered sample.
When n1 and n2 are large, then under the null hypothesis U is asymptotically normal with
$$\text{Mean}(U) = \frac{2n_1n_2}{n_1 + n_2} + 1 \quad \text{and} \quad \text{Variance}(U) = \frac{2n_1n_2(2n_1n_2 - n_1 - n_2)}{(n_1 + n_2)^2(n_1 + n_2 - 1)}$$
Thus we can use the normal test:
$$Z = \frac{U - \text{Mean}(U)}{\sqrt{\text{Variance}(U)}} \sim N(0, 1)$$
This approximation is fairly good if each of n1 and n2 is greater than 10. Since the alternative hypothesis entails too few runs, the test is ordinarily one-tailed, with only negative values of Z leading to rejection.
OTHER NON-PARAMETRIC TESTS
Median Test
The median test is a statistical procedure for testing whether two independent ordered samples differ in their central tendencies.
Let $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$ be two independent ordered samples, and $z_1, z_2, \ldots, z_{n_1+n_2}$ the combined ordered sample.
Let m1 be the number of x's and m2 the number of y's exceeding the median value of the combined series. The observations are then classified as:

                                 Sample 1     Sample 2     Total
No. of observations > Median     m1           m2           m1 + m2
No. of observations ≤ Median     n1 − m1      n2 − m2      (n1 + n2) − (m1 + m2)
Total                            n1           n2           n1 + n2

If the frequencies are small, we can compute the exact probabilities. However, if the frequencies are large, we may use the χ² test with 1 degree of freedom for testing H0 (the null hypothesis that the samples came from the same population). The approximation is fairly good if both n1 and n2 exceed 10.
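The 2 × 2 classification and its χ² statistic can be sketched in Python. As an illustration only, the sketch applies the median test to the x and y samples used in the solved examples of this chapter. The χ² is computed without a continuity correction (an assumption, since the text does not specify one) and compared with 3.84, the 5% point of χ² on 1 degree of freedom.

```python
from statistics import median


def median_test(x, y):
    """2 x 2 table of counts above / not above the combined median, and its chi-square."""
    med = median(list(x) + list(y))
    m1 = sum(v > med for v in x)  # x's exceeding the combined median
    m2 = sum(v > med for v in y)  # y's exceeding the combined median
    n1, n2 = len(x), len(y)
    a, b, c, d = m1, m2, n1 - m1, n2 - m2
    n = n1 + n2
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return med, (a, b, c, d), chi2


# "01" and "00" from the text are written as 1 and 0 (Python rejects literals like 01)
x = [15, 77, 1, 65, 69, 69, 58, 40, 81, 16, 20, 20, 0, 84, 22]
y = [28, 26, 46, 66, 36, 86, 66, 17, 43, 49, 85, 40, 51, 40, 10]
med, table, chi2 = median_test(x, y)
print(f"median = {med}, table = {table}, chi-square = {chi2:.3f}, significant: {chi2 > 3.84}")
```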
Sign Test
The sign test is used under the following conditions:
(a) Each given pair consists of observations on the two things being compared.
(b) For any pair, each of the two observations is made under similar extraneous conditions.
(c) Different pairs are observed under different conditions.
The third condition (condition c) implies that the differences $d_i = (x_i - y_i)$, i = 1, 2, ..., n, may have different variances, which renders the paired t test invalid; otherwise that test would have been used unless there was obvious non-normality.
The sign test is based on the sign (plus or minus) of the deviation $d_i = (x_i - y_i)$. No assumptions are made regarding the parent population. The only assumptions are:
(1) Measurements are such that the deviations $d_i = (x_i - y_i)$ can be expressed in terms of positive or negative signs.
(2) Variables have continuous distributions.
(3) The $d_i$'s are independent.
Different pairs $(x_i, y_i)$ may be from different populations (say with respect to age, weight, stature, education). The only requirement is that within each pair there is matching with respect to relevant extraneous factors.

Procedure:
Let $(x_i, y_i)$, i = 1, 2, ..., n, be n paired observations drawn from the two populations. Under the null hypothesis that the two populations are equal, find the difference between each pair of observations, i.e. $d_i = x_i - y_i$.
Let us define $U_i$ such that $U_i = 1$ if $x_i > y_i$ (i.e. positive sign) and $U_i = 0$ if $x_i < y_i$ (i.e. negative sign).
Under the null hypothesis each sign is equally likely, so $P(U_i = 1) = 1/2$. Since the $U_i$, i = 1, 2, ..., n, are independent, the number of positive signs
$$U = \sum_{i=1}^{n} U_i$$
follows a binomial distribution with parameters n and 1/2. For large samples (n > 30), we may regard U as asymptotically normal (under the null hypothesis) with mean and variance equal to:
$$\text{Mean}(U) = \frac{n}{2} \quad \text{and} \quad \text{Variance}(U) = \frac{n}{4}$$
Thus,
$$Z = \frac{U - n/2}{\sqrt{n/4}} \sim N(0, 1)$$
and we may use the Normal test.


Mann-Whitney-Wilcoxon U Test
This non-parametric test for two samples is one of the most widely used tests when we do not make any assumption about the parent population.
Let $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$ be two independent ordered samples of sizes n1 and n2.
The Mann-Whitney test is based on the pattern of x's and y's in the combined ordered sample, for example:
x1, x2, y1, y2, y3, x3, x4, x5, y4, y5, x6 ...
Let T denote the sum of the ranks of the y's in the combined sample. Here the ranks of the y's in the combined sample are 3, 4, 5, 8, 9, ..., so that T = 3 + 4 + 5 + 8 + 9 + ... The test statistic is
$$U = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - T$$
If T is significantly large or small, then H0 will be rejected.
It has been established that under the null hypothesis U is asymptotically normally distributed, $N(\mu, \sigma^2)$, where
$$\mu = \frac{n_1 n_2}{2} \quad \text{and} \quad \sigma^2 = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$$
Hence
$$Z = \frac{U - \mu}{\sigma} \sim N(0, 1)$$
A normal test can be used if both n1 and n2 are greater than 8.

Solved Example
Run Test
QUESTION: For the given set of data drawn from two populations, apply the Run test and test the hypothesis that the samples are drawn from populations with the same distribution function:
xi 15 77 01 65 69 69 58 40 81 16 20 20 00 84 22
yj 28 26 46 66 36 86 66 17 43 49 85 40 51 40 10

SOLUTION:
Setting the Hypothesis
Null hypothesis: The two populations have the same distribution function, $H_0: f_1(\cdot) = f_2(\cdot)$
Alternative hypothesis: $H_1: f_1(\cdot) \neq f_2(\cdot)$
The Test Statistic:
$$Z = \frac{U - \text{Mean}(U)}{\sqrt{\text{Variance}(U)}}$$
where
$$\text{Mean}(U) = \frac{2n_1n_2}{n_1 + n_2} + 1 \quad \text{and} \quad \text{Variance}(U) = \frac{2n_1n_2(2n_1n_2 - n_1 - n_2)}{(n_1 + n_2)^2(n_1 + n_2 - 1)}$$

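Calculate the number of runs in the combined ordered series. For this, first arrange xi and yj in ascending order:

S.No.  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
xi     00  01  15  16  20  20  22  40  58  65  69  69  77  81  84
yj     10  17  26  28  36  40  40  43  46  49  51  66  66  85  86

Combining the two series in ordered form in terms of xi and yj, the runs (numbered in brackets) are:
(1) x1, x2  (2) y1  (3) x3, x4  (4) y2  (5) x5, x6, x7  (6) y3, y4, y5  (7) x8  (8) y6, y7, y8, y9, y10, y11  (9) x9, x10  (10) y12, y13  (11) x11, x12, x13, x14, x15  (12) y14, y15
The value 40 occurs once among the x's (x8) and twice among the y's (y6, y7). Since x8 lies between y5 = 36 and y8 = 43, the number of runs is the same wherever x8 is placed among the tied values.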

Thus, we can see that in the combined series there are 12 runs (sequences of one kind of series). Therefore U = 12 (total number of runs).
The mean and variance of U are:
$$\text{Mean}(U) = \frac{2 \times 15 \times 15}{15 + 15} + 1 = 15 + 1 = 16$$
$$\text{Variance}(U) = \frac{2 \times 15 \times 15\,(2 \times 15 \times 15 - 15 - 15)}{(15 + 15)^2\,(15 + 15 - 1)} = \frac{450 \times 420}{900 \times 29} = \frac{189000}{26100} = 7.24$$
Thus the test statistic Z is
$$Z = \frac{12 - 16}{\sqrt{7.24}} = \frac{-4}{2.69} = -1.49$$
The calculated value |Z| = 1.49 is less than the tabulated value of Z (1.645 for a one-tailed test at the 5% level of significance). Hence we accept the null hypothesis that the distribution of the two populations is the same.
Conclusion: On the basis of these samples, the two populations from which the samples are drawn can be regarded as having the same distribution.
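The count of runs and the normal approximation can be reproduced with a short Python sketch of the two-sample (Wald-Wolfowitz) run test. Ties between the samples are broken by placing the x before the y. For these data the tied value 40 sits between two y's, so the run count is 12 however the tie is broken. The one-tailed 5% critical value −1.645 is typed in.

```python
import math


def runs_test(x, y):
    """Two-sample run test: number of runs U and its normal approximation Z."""
    # Label each observation by sample, then sort the combined series by value.
    combined = sorted([(v, "x") for v in x] + [(v, "y") for v in y])
    labels = [lab for _, lab in combined]
    u = 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    n1, n2 = len(x), len(y)
    mean = 2 * n1 * n2 / (n1 + n2) + 1
    var = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    z = (u - mean) / math.sqrt(var)
    return u, mean, var, z


# "01" and "00" from the text are written as 1 and 0
x = [15, 77, 1, 65, 69, 69, 58, 40, 81, 16, 20, 20, 0, 84, 22]
y = [28, 26, 46, 66, 36, 86, 66, 17, 43, 49, 85, 40, 51, 40, 10]
u, mean, var, z = runs_test(x, y)
print(f"U = {u}, mean = {mean}, variance = {var:.2f}, Z = {z:.2f}, reject H0: {z < -1.645}")
```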

Sign Test
QUESTION: In the above example, if (xi, yi) are pairs of observations drawn from the two populations, apply the sign test and find out whether the distributions of the two populations are equal:
xi 15 77 01 65 69 69 58 40 81 16 20 20 00 84 22
yi 28 26 46 66 36 86 66 17 43 49 85 40 51 40 10

SOLUTION:
Setting of Hypothesis
Null hypothesis: The two populations have the same distribution function, $H_0: f_1(\cdot) = f_2(\cdot)$
Alternative hypothesis: $H_1: f_1(\cdot) \neq f_2(\cdot)$
The Test Statistic is
$$Z = \frac{U - n/2}{\sqrt{n/4}}$$

S.No.            1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
xi               15  77  01  65  69  69  58  40  81  16  20  20  00  84  22
yi               28  26  46  66  36  86  66  17  43  49  85  40  51  40  10
di = (xi − yi)   −   +   −   −   +   −   −   +   +   −   −   −   −   +   +

Ui = 1 if xi > yi (i.e. positive sign) and Ui = 0 if xi < yi (i.e. negative sign).
U = Σ Ui = 6 (there are 6 pairs in which xi > yi), with n = 15.
Thus the test statistic Z is:
$$Z = \frac{6 - 7.5}{\sqrt{3.75}} = \frac{-1.5}{1.94} = -0.77$$
The calculated value |Z| = 0.77 is less than the tabulated value of Z (1.96 at the 5% level of significance, two-tailed). Hence we accept the null hypothesis, i.e. the distribution functions of the two populations are the same. (Here n = 15 is below the n > 30 suggested for the normal approximation, so the result should be read as approximate.)
Conclusion: The two samples can be regarded as drawn from the same population.
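A Python sketch of the sign test for these 15 pairs. Besides the normal approximation used in the text, it reports the exact two-sided binomial probability (each sign having probability 1/2 under H0). This is an addition, useful here because n = 15 is well below the n > 30 the text suggests for the normal approximation. Dropping pairs with zero difference is a common convention assumed in the sketch; there are none in these data.

```python
import math


def sign_test(x, y):
    """Sign test on paired data: U = number of positive differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped (assumed convention)
    n = len(diffs)
    u = sum(1 for d in diffs if d > 0)
    z = (u - n / 2) / math.sqrt(n / 4)
    # Exact two-sided binomial p-value under H0: U ~ Binomial(n, 1/2)
    k = min(u, n - u)
    p_exact = min(1.0, 2 * sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n)
    return n, u, z, p_exact


# "01" and "00" from the text are written as 1 and 0
x = [15, 77, 1, 65, 69, 69, 58, 40, 81, 16, 20, 20, 0, 84, 22]
y = [28, 26, 46, 66, 36, 86, 66, 17, 43, 49, 85, 40, 51, 40, 10]
n, u, z, p_exact = sign_test(x, y)
print(f"n = {n}, U = {u}, Z = {z:.2f}, exact two-sided p = {p_exact:.3f}")
```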

Mann-Whitney U Test
QUESTION: For the same set of data, apply the Mann-Whitney U test to compare the distribution functions of the populations.
The combined observations of the two series are arranged in ascending order (as in the Run Test):

Ranks  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
       x1  x2  y1  x3  x4  y2  x5  x6  x7  y3  y4  y5  x8  y6  y7
Ranks  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
       y8  y9  y10 y11 x9  x10 y12 y13 x11 x12 x13 x14 x15 y14 y15

The observations at ranks 13 to 15 are all equal to 40 (x8, y6, y7), so each is given the average rank 14. (Ties within one sample, such as y12 = y13 = 66, do not affect T.)
T (the sum of ranks of the y's in the combined ordered series) is calculated from the above table:
T = 3 + 6 + 10 + 11 + 12 + 14 + 14 + 16 + 17 + 18 + 19 + 22 + 23 + 29 + 30 = 244
$$U = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - T = 15 \times 15 + \frac{15 \times 16}{2} - 244 = 225 + 120 - 244 = 101$$
Mean and variance of U are:
$$\text{Mean}(U) = \frac{n_1 n_2}{2} = \frac{15 \times 15}{2} = 112.5$$
$$\text{Variance}(U) = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12} = \frac{225 \times 31}{12} = 581.25$$
Thus the test statistic Z is
$$Z = \frac{101 - 112.5}{\sqrt{581.25}} = \frac{-11.5}{24.11} = -0.48$$
The tabulated value of Z (1.96 at the 5% level of significance) is more than the calculated value |Z| = 0.48. Hence we accept the null hypothesis, i.e. the two samples are drawn from the same population.
Conclusion: The distribution functions of the populations from which the two samples are drawn can be regarded as the same.
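The rank sum and the normal approximation can be checked with a Python sketch. Tied values receive the average of the ranks they occupy (midranks), the usual convention and the one used in the solution above. The sketch applies the text's variance formula with no tie correction, and the 5% two-sided critical value 1.96 is typed in.

```python
import math


def midranks(values):
    """Ranks 1..N, with tied values sharing the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Return T (rank sum of the y's), U = n1*n2 + n2(n2+1)/2 - T, and Z."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    t = sum(ranks[n1:])  # the y's occupy the last n2 positions of the combined list
    u = n1 * n2 + n2 * (n2 + 1) / 2 - t
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return t, u, (u - mu) / sigma


# "01" and "00" from the text are written as 1 and 0
x = [15, 77, 1, 65, 69, 69, 58, 40, 81, 16, 20, 20, 0, 84, 22]
y = [28, 26, 46, 66, 36, 86, 66, 17, 43, 49, 85, 40, 51, 40, 10]
t, u, z = mann_whitney(x, y)
print(f"T = {t}, U = {u}, Z = {z:.2f}, significant at 5%: {abs(z) > 1.96}")
```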

MULTIPLE CHOICE QUESTIONS


1. Statistical tests that are non-parametric include:
(a) Regression
(b) Correlation
(c) The Student's t test
(d) Rank correlation
(e) Wilcoxon rank sum test
(PGI, 80, AIIMS 80)
2. If the distribution of the population is not known, which of the following tests will be used:
(a) F-test
(b) Students t test
(c) ANOVA
(d) Sign test
3. For large sample sizes, the Mann-Whitney U test statistic U is normally distributed with:
(a) N(μ, 1)
(b) N(μ, σ²)
(c) N(0, σ²)
(d) N(0, 1)

Chapter 10

Statistical Methods
in Epidemiology

Epidemiology is the study of the distribution and determinants of health-related states or events in a specified population.
Epidemiology is by definition concerned with problems affecting groups of individuals rather than single subjects.
In broad terms, epidemiology is concerned with the distribution of disease, chronic as well as communicable, including the diseases which give rise to epidemics of the classical sort.
Some important terms used in epidemiological studies:
Baseline: The health state (disease severity, confounding conditions) of individuals at the beginning of a prospective study. A difference (asymmetry) in the distribution of baseline values between groups can bias the results.
Blinding (Masking): Blinding is a method of reducing bias by preventing observers and/or experimental subjects involved in an analytic study from knowing the hypothesis being investigated, the case-control classification, the assignment of individuals or groups, or the different treatments being provided. Blinding reduces bias by preserving symmetry in the observers' measurements and assessments. This bias is usually not due to deliberate deception but to human nature and prior beliefs about the area of study.
Placebo: A placebo is a dummy treatment given to the control group in place of the actual treatment. If a drug is being evaluated, the placebo is the inactive carrier without the active drug, so that it is as similar as possible in appearance and administration to the active drug. Placebos are used to blind observers and, in human trials, the patients as to which group each patient is allocated.
Case definition: The set of history, clinical signs and laboratory findings that are used to classify an individual as a case or not for an epidemiological study. Case definitions are needed to exclude individuals with other conditions that occur at an endemic background rate in a population, or with other characteristics that would confuse or reduce the precision of a clinical trial.
Cohort: A group of individuals identified on the basis of
a common experience or characteristic that is usually
monitored over time from the point of assembly.
Experimental unit: In an experiment, the experimental units are the units that are randomly selected or allocated to a treatment; they are the units upon which the sample size is calculated and upon which the subsequent data analysis must be based.
Prospective study (Data): Data collection and the events of interest occur after individuals are enrolled (e.g. clinical trials or cohort studies). This prospective collection enables the use of more solid, consistent criteria and avoids the potential biases of retrospective recall. Prospective studies are limited to conditions that occur relatively frequently and to studies with relatively short follow-up periods, so that a sufficient number of eligible individuals can be enrolled and followed within a reasonable period.
Retrospective study (Data): All events of interest have
already occurred and data are generated from historical
records (secondary data) or from recall (which may result in
the presence of significant recall bias). Retrospective data are
relatively inexpensive compared to prospective studies
because of the use of available information and is typically
used in case-control studies. Retrospective studies of rare
conditions are much more efficient than prospective studies.
Basic Measures of Epidemiology
Measurements in epidemiology include the following:
1. Measurement of mortality, morbidity, etc.
2. Measurement of the presence, absence or distribution of the characteristics or attributes of the disease.
3. Measurement of demographic variables.
4. Measurement of the presence, absence or distribution of the environmental and other factors suspected of causing the disease.
Parameters of Measurements
Epidemiologists usually express disease magnitude as a rate, ratio or proportion. These three are the basic parameters of epidemiological measurement.
Rate: A rate measures the occurrence of some particular event (occurrence of death or disease) in a population during a given period of time. The rate is usually expressed per thousand.
For example:
$$\text{Death rate} = \frac{\text{Total number of deaths in a year}}{\text{Mid-year population}} \times 1000$$

The rates can be broadly classified as:
(1) Crude rates.
(2) Specific rates.
(3) Standardized rates.
Ratio: In a ratio, the numerator is not a component of the
denominator. The numerator and denominator may refer to
an interval of time or to a single instant in time.
For example:
In the sex ratio (Male: Female), the numerator is the
number of males in the population during a given period, and
the denominator is the number of females in the population
during the same period. If the number of males = a and the
number of females = b, then

$$\text{Ratio} = \frac{a}{b}$$

Statistical Methods in Epidemiology 167

Thus we can see that the numerator is not a component
of the denominator.
Proportion: A proportion is a ratio that indicates the
magnitude of a part relative to the whole. The numerator is
always included in the denominator. A proportion is usually
expressed as a percentage.
In the above example, the proportion of males in the population is:

$$\text{Proportion of males in the population} = \frac{a}{a+b} \times 100$$

Numerator: The numerator refers to the number of times an event
occurs. The numerator is a component of the denominator in
calculating rates (and proportions) but not in a ratio.
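The difference between a rate, a ratio and a proportion can be sketched in a few lines of Python. This is a minimal illustration, not the book's own method; the function names and the population figures (5200 males, 4800 females, 80 deaths in a year) are hypothetical.

```python
def rate_per(events, population, multiplier=1000):
    """Rate: events in a period per `multiplier` persons (e.g. mid-year population)."""
    return events / population * multiplier


def ratio(numerator, denominator):
    """Ratio: the numerator is NOT part of the denominator (e.g. males : females)."""
    return numerator / denominator


def proportion_percent(part, rest):
    """Proportion: the part IS included in the denominator; expressed as a percentage."""
    return part / (part + rest) * 100


# Hypothetical population figures, for illustration only
males, females = 5200, 4800          # a and b in the text
deaths_in_year = 80

print(f"Sex ratio (M:F) = {ratio(males, females):.3f}")
print(f"Males (%)       = {proportion_percent(males, females):.1f}")
print(f"Death rate      = {rate_per(deaths_in_year, males + females):.1f} per 1000")
```

Because the same a and b feed both the ratio and the proportion, the sketch makes the point of the text concrete: only the proportion puts a into its own denominator.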
Denominator: Literally, the denominator is the number
below the line in a fraction. In epidemiology, three types of
denominator are generally used.
Mid-year population: When calculating rates (death, birth),
the denominator is the mid-year population. Because
population size changes daily due to births, deaths and
migration, the mid-year population is used as the denominator
for calculating rates. The mid-year population refers to the
population estimated as on 1st July.
Population at risk: For calculating morbidity statistics, the
population exposed to risk is used as the denominator. The term
applies to all those to whom an event could have happened,
whether it did or not.
For example: When calculating the general fertility rate,
women of reproductive age (15-49 years) are taken as the
denominator, because women under 15 years and over 49 years
of age generally do not give birth and are therefore not
exposed to risk.

Related to events: In some situations the denominator is
related to the total number of events instead of the total population.
For example: When calculating the maternal mortality rate,
the denominator is the number of live births.
Measurements of Mortality
The measures of mortality are the crude death rate, age-specific
death rates and standardized death rates, which are
discussed in detail under a separate heading.
Measurements of Morbidity
Morbidity is defined as any departure, subjective or objective,
from a state of physiological well-being.
Morbidity can be measured in terms of three units:
(a) Persons who were ill.
(b) The illnesses (periods or spells) that these persons
experienced.
(c) The duration (days, weeks, etc.) of illness.
Disease is frequently measured by incidence and
prevalence rates (although prevalence is referred to as a rate, it is
actually a ratio, more precisely a proportion).
Incidence rate (Persons): The number of new cases occurring
in a defined population during a specified period of time.

$$\text{Incidence rate (persons)} = \frac{\text{No. of new cases of a specified disease during a given period of time}}{\text{Population at risk during that period}} \times 1000$$

Example: If there are 1,000 new cases of illness in a
population of 50,000 in a year, then the incidence rate is:

$$\text{Incidence rate} = \frac{1000}{50{,}000} \times 1000 = 20 \text{ per thousand per year}$$

An incidence rate must always state the unit of time; the
incidence of disease in the above example is 20 per 1000 per year.
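The worked example above can be reproduced with a small helper. This is a minimal sketch; the function name `incidence_rate` and its `per` parameter are hypothetical, and the time unit is left for the caller to report, as the text requires.

```python
def incidence_rate(new_cases, population_at_risk, per=1000):
    """New cases during a period per `per` persons at risk.
    The caller must report the time unit (e.g. 'per year') with the result."""
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    return new_cases / population_at_risk * per


# Figures from the worked example in the text (one year of observation)
rate = incidence_rate(new_cases=1_000, population_at_risk=50_000)
print(f"{rate:.0f} per 1000 per year")  # prints "20 per 1000 per year"
```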
Features of incidence rate:
1. Only new cases.
2. During a given period of time.
3. In a specified population (population at risk).
4. The unit of time should be stated.
Incidence rate (Spells): The number of new spells of illness
in a defined population during a specified time.

$$\text{Incidence rate (spells)} = \frac{\text{No. of spells of sickness starting in a defined period of time}}{\text{Mean number of persons exposed to risk in that period}} \times 1000$$

Incidence measures the rate at which new cases are
occurring in a population. It is not influenced by the duration
of disease.
Use of incidence rate: Incidence rates are useful in
determining the causes of diseases.
The incidence rate is useful for taking action:
(a) to control disease;
(b) to study the distribution of disease and the efficacy of
preventive and therapeutic measures.
If the incidence rate is increasing, it may indicate failure
or ineffectiveness of the current control programme and the
need for a new disease control programme.
Prevalence: The total number of individuals who have
an attribute or disease at a particular time (or during a
particular period), divided by the population at risk of having
the attribute or disease at that point of time or midway through
the period.

Prevalence refers specifically to all current cases (old and
new) existing at a given point of time, or over a given period
of time, in a given population.
Prevalence is of two types:
(1) Point prevalence.
(2) Period prevalence.
Point prevalence: Point prevalence of a disease is a
measure of all cases (old and new) of the disease at one point of
time in relation to a defined population.

$$\text{Point prevalence} = \frac{\text{No. of all current cases (old and new) of a specified disease existing at a given point of time}}{\text{Estimated population at the same point of time}} \times 100$$

In point prevalence, the "point" may be a day, several days or even
a few weeks, depending on the time it takes to examine the
population.
Period prevalence: It includes cases arising before but
extending into or through the year, as well as cases arising
during the year.
Period prevalence is thus a combination of point prevalence and
incidence.

$$\text{Period prevalence} = \frac{\text{No. of all current cases (old and new) of a specified disease existing during a given period of time}}{\text{Estimated mid-interval population at risk}} \times 100$$

Incidence and Prevalence can best explained by following


Figure

Statistical Methods in Epidemiology 171

Figure 10.1

From the above figure, the number of new cases in the given
period (January 2000 to December 2000) is 3 (cases 2, 5 and 8).
Therefore, for incidence, the number of new cases is 3.
For point prevalence in January 2000, three cases are
included (cases 3, 6 and 7), while for point prevalence in
December 2000, two cases are included (cases 5 and 8).
For period prevalence (the period from January 2000
to December 2000), six cases are included (cases 2, 3, 5, 6,
7 and 8; cases 2, 5 and 8 are new cases and 3, 6 and 7 are old
cases). Cases 1 and 4 are excluded because they fall outside
the given period.
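The counting rules behind the figure can be written as three small filters over illness episodes. This is a sketch under loudly stated assumptions: the onset and end months below are hypothetical values chosen only so that the counts match those given in the text (they are not read from Figure 10.1), month 0 is taken as January 2000 and month 11 as December 2000, and an episode touching a boundary month counts as inside.

```python
# HYPOTHETICAL illness episodes: case number -> (onset_month, end_month).
# Month 0 = January 2000, month 11 = December 2000 (assumed coding).
# Values chosen only to reproduce the counts stated in the text.
episodes = {
    1: (-10, -3), 2: (3, 6), 3: (-5, 4), 4: (14, 18),
    5: (5, 13), 6: (-2, 2), 7: (-8, 8), 8: (9, 15),
}
START, END = 0, 11  # study period: January 2000 to December 2000


def new_cases(eps, start, end):
    """Cases whose onset falls inside the period (numerator of incidence)."""
    return sorted(k for k, (onset, _) in eps.items() if start <= onset <= end)


def point_prevalent(eps, t):
    """All cases, old and new, existing at time t (numerator of point prevalence)."""
    return sorted(k for k, (onset, stop) in eps.items() if onset <= t <= stop)


def period_prevalent(eps, start, end):
    """All cases existing at any time during the period (numerator of period prevalence)."""
    return sorted(k for k, (onset, stop) in eps.items() if onset <= end and stop >= start)


print("New cases:", new_cases(episodes, START, END))                 # [2, 5, 8]
print("Point prevalence, Jan:", point_prevalent(episodes, START))    # [3, 6, 7]
print("Point prevalence, Dec:", point_prevalent(episodes, END))      # [5, 8]
print("Period prevalence:", period_prevalent(episodes, START, END))  # [2, 3, 5, 6, 7, 8]
```

The period-prevalence filter keeps any episode that overlaps the period, which is why it equals the old cases at the start plus the new cases arising during the period.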
Use of Prevalence
Prevalence helps to estimate the magnitude of health/disease
problems in the community and to identify potential high-risk
populations.
Prevalence data indicate the extent of a condition and may
have implications for the provision of services needed in a
community.

Prevalence is especially useful for administrative and
planning purposes.
Both measures of prevalence are proportions; as such,
they are dimensionless and should not be described as rates
(Friis and Sellers, 1999).
Friis RH, Sellers TA. Epidemiology for Public Health Practice.
2nd ed. Aspen Publishers, Inc.; 1999.
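$$\text{Incidence} = \frac{\text{No. of new cases}^{*}}{\text{Population at risk}^{*}}$$

* During a specified time period.

$$\text{Prevalence} = \frac{\text{No. of all existing cases (old and new)}}{\text{Population at risk}} \quad \text{(at a given point or period of time)}$$

Remember, incidence means NEW. Prevalence means ALL.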


Relation between Incidence and Prevalence
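If the population is stable and incidence and duration are
unchanging, then:

$$\text{Prevalence} = \text{Incidence} \times \text{Duration}$$

or

$$\text{Incidence} = \frac{\text{Prevalence}}{\text{Duration}} \qquad \text{and} \qquad \text{Duration} = \frac{\text{Prevalence}}{\text{Incidence}}$$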

From the above relation we can say that the longer the
duration of a disease, the higher its prevalence in relation
to incidence.
If the illness is acute and of short duration (either because
of rapid recovery or death), the prevalence will be relatively
low compared with incidence.
A decrease in prevalence may result not only from a
decrease in incidence but also from a decrease in the duration of
illness, through either more rapid recovery or more rapid death.
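The relation P = I × D and its rearrangement D = P / I can be checked numerically. This is a minimal sketch; the function names and the figures (an incidence of 20 per 1000 per year, durations of 5 years and about 2 weeks) are hypothetical.

```python
def prevalence_from_incidence(incidence, mean_duration):
    """Steady-state relation from the text: P = I x D.
    incidence: new cases per 1000 per year; mean_duration: years."""
    return incidence * mean_duration


def duration_from(prevalence, incidence):
    """Rearranged relation: D = P / I."""
    return prevalence / incidence


# Hypothetical: the same incidence (20 per 1000 per year) for two kinds of illness
chronic = prevalence_from_incidence(20, 5)       # lasts about 5 years
acute = prevalence_from_incidence(20, 2 / 52)    # lasts about 2 weeks
print(f"chronic: {chronic:.0f} per 1000, acute: {acute:.2f} per 1000")
print(f"duration recovered: {duration_from(chronic, 20):.1f} years")
```

With identical incidence, the long-lasting illness has a prevalence over a hundred times that of the acute one, as the text argues. P = I × D works well when prevalence is low; the exact steady-state form uses the prevalence odds, P/(1 − P) = I × D.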
Epidemiological Studies
Epidemiological studies can be classified as observational
studies and experimental studies.
Observational studies are further divided into
descriptive studies and analytical studies, while
experimental studies are divided into randomized
controlled trials, field trials and community trials.
Observational Studies
In observational studies the allocation or assignment of factors
is not under the control of the investigator. In an observational study,
the combinations of factors are self-selected or are experiments of
nature. Observational studies provide weaker empirical
evidence because large confounding biases may be present
when there is an unknown association between a factor and
the outcome.
The greatest value of these types of studies is that they
provide preliminary evidence that can be used as the basis
for hypotheses tested in stronger experimental studies.
Descriptive studies: The objective of descriptive studies
is to describe the distribution of variables in a group. Statistics
serve only to describe the precision of those measurements or
to make statistical inferences about the values in the
population from which the sample is drawn. Such studies
ask questions about:
(a) When is the disease occurring? (time distribution)
(b) Where is it occurring? (place distribution)
(c) Who is getting the disease? (person distribution)
Measurement of morbidity in descriptive studies:
Measurement of morbidity has two aspects: incidence and
prevalence. Incidence can be obtained from longitudinal
studies and prevalence from cross-sectional studies. Besides
case series and case reports, descriptive studies may use
cross-sectional and longitudinal studies to obtain estimates
of the health and disease problems of the population.
Case series: A descriptive, observational study of a series
of cases, typically describing the manifestations, clinical
course and prognosis of a condition. A case series provides
weak empirical evidence because of the lack of comparability,
unless the findings are dramatically different from
expectations. Case series are best used as a source of
hypotheses for investigation by stronger study designs.
Unfortunately, the case series is the most commonly used
design in the clinical literature.
Case report: A description of a single case, typically
describing the manifestations, clinical course and prognosis
of that case. Because of the wide range of natural biological
variation in these aspects, a single case report provides little
empirical evidence to the clinician. Case reports do, however,
describe how others diagnosed and treated the condition and
what the clinical outcome was.
Longitudinal studies (Incidence Study): Longitudinal
studies are those studies in which the observations are
repeated in the same population over a prolonged period of
time by means of follow-up examinations. Longitudinal
studies are useful (a) in identifying the risk factors of disease
and (b) in finding out the incidence rate, or rate of occurrence
of new cases of the disease, in the community.
Cross-sectional studies (Prevalence Study): A
descriptive study of the relationship between disease and
other factors, usually at one point of time, in a defined
population. Cross-sectional studies lack information on
the timing of exposure relative to outcome and include
only prevalent cases. Cross-sectional studies are more useful
for chronic than for short-lived diseases. This type of study
tells us about the distribution of a disease in a population rather
than its aetiology.
Analytical studies: In analytical studies, the subject of
interest is the individual within the population. The object is
not to formulate but to test hypotheses. Although individuals
are evaluated in analytical studies, the inference is not to the
individual but to the population from which they are selected.
Measurement of morbidity in analytical studies: Analytical
studies comprise two distinct types of observational study:
(a) cohort studies and (b) case-control studies. From
these studies we can determine (1) whether or not a statistical
association exists between a disease and a suspected factor
and (2) if it exists, the strength of the association.
Cohort study: A prospective, analytical, observational
study based on data, usually primary, from a follow-up
period of a group in which some have had, have or will have
the exposure of interest, used to determine the association
between the exposure and an outcome.
A cohort is defined as a group of people who share a
common characteristic or experience within a defined period.
In a cohort study, a population of individuals is selected,
usually by geographical or occupational criteria rather than
on medical grounds. The population is classified by the factor
or factors of interest and followed prospectively in time, so
that the rates of occurrence of various manifestations of disease
can be observed and related to the classification by aetiological
factors.
Because of their prospective nature, cohort studies are
stronger than case-control studies when well executed but
they are more expensive.
Case control study: A retrospective, analytical,
observational study, often based on secondary data, in which
the proportion of cases with a potential risk factor is
compared with the proportion of controls (individuals without
the disease) with the same risk factor.
The method is appropriate when the classification by
disease is simple (i.e. presence or absence of a specific
condition). A further advantage is that, by means of the
retrospective enquiry, the relevant information can be
obtained comparatively quickly.
A central problem in a case-control study is the method
by which the controls are chosen. Ideally, they should on
average be similar to the cases in all respects except in the medical
condition under study and in the associated aetiological factors.
These studies are commonly used for the initial, inexpensive
evaluation of risk factors with long induction periods.
Unfortunately, because of the potential for many forms of bias
in this study type, case-control studies provide relatively weak
empirical evidence even when properly executed.
Case control studies are often called retrospective studies
while cohort studies are called prospective studies.
Experimental Studies
The hallmark of the experimental study is that the allocation
or assignment of individuals is under the control of the investigator
and thus can be randomized. The key is that the investigator
controls the assignment of the exposure or treatment, while
the balance of potential unknown confounders between groups is
maintained through randomization. Properly executed
experimental studies provide the strongest empirical
evidence. Randomization also provides a better
foundation for statistical procedures than observational
studies do.
The following are some important randomized controlled
designs:
Randomized controlled clinical trial (RCT): A
prospective, analytical, experimental study using primary
data generated in the clinical environment. Individuals who are
similar at the beginning are randomly allocated to two or more
treatment groups, and the outcomes of the groups are compared
after sufficient follow-up time.
Properly executed, the RCT provides the strongest evidence of
the clinical efficacy of preventive and therapeutic procedures
in the clinical setting.
Randomized cross-over clinical trial: A prospective,
analytical, experimental study using primary data generated
in the clinical environment. Individuals with a chronic
condition are randomly allocated to one of two treatment
groups and, after a sufficient treatment period and often a
washout period, are switched to the other treatment for the same
period.
In this type of study design each patient serves as his
own control. The patients are randomly assigned to a study
group and a control group. The study group receives the treatment
under consideration, and the control group receives some
alternative form of active treatment or a placebo. The two groups
are observed over time. The patients in each group are then taken
off their medication or placebo to allow for elimination of the
medication from the body and for any carry-over effects to
wear off. After this period the two groups are switched: those
who received the treatment under study are changed to the
control group therapy or placebo, and vice versa.
Cross-over studies have the advantage that, during the
course of the investigation, every patient receives the new therapy.
However, this design is susceptible to bias if carry-over effects of
the first treatment occur.
Randomized controlled laboratory study: A prospective,
analytical, experimental study using primary data generated
in the laboratory environment. Laboratory studies are very
powerful tools for basic research because all extraneous
factors other than those of interest can be controlled or
accounted for (e.g. age, gender, genetics, nutrition,
environment, etc.). However, this control of other factors is
also the weakness of this type of study.
If any interaction occurs between these factors and the
outcome of interest, which is usually the case, the laboratory
results are not directly applicable to the clinical setting unless
the impact of these interactions is also investigated.
Bias Occurring in Studies
Systematic Error
Almost all studies have bias, but to varying degrees. Bias can
be reduced only by proper study design and execution,
not by increasing the sample size (which increases
precision by reducing the opportunity for a random chance
deviation from the truth). The critical question is whether or
not the results could be due in large part to bias, thus making
the conclusion invalid.

Observational study designs are inherently more
susceptible to bias than experimental study designs.
The following are some biases that can occur in any study:
Confounding bias: Confounding is the distortion of the
effect of one risk factor by the presence of another.
Confounding occurs when another risk factor for a disease is
also associated with the risk factor being studied but acts
separately. Age, gender and breed are often confounding risk
factors. Confounding can be controlled by restriction or by
matching on the confounding variable.
It is a systematic error due to the failure to account for the effect
of one or more variables that are related to both the causal
factor being studied and the outcome, and that are not distributed
in the same manner between the groups being studied.
Confounding can be accounted for if the confounding
variables are measured and included in the statistical
model of the cause-effect relationship.
Ecological (Aggregation) bias: Systematic error that occurs
when an association observed between variables representing
group averages is mistakenly taken to represent the actual
association that exists between these variables for individuals.
This bias occurs when the nature of the association at the
individual level is different from the association observed at
the group level.
Measurement bias: Systematic error that occurs when,
because of a lack of blinding or related reasons such as diagnostic
suspicion, the measurement method (instrument or observer
of the instrument) is consistently different between groups in
the study. Screening bias is one of the most important
measurement biases.
Screening bias: The bias that occurs when the presence
of a disease is detected earlier, during its latent period, by
screening tests, but the course of the disease is not changed
by the earlier intervention. Because survival after screening
detection is longer than survival after detection from clinical
signs, an ineffective intervention appears to be effective unless
it is compared appropriately in clinical trials.
Reader's bias: Systematic errors of interpretation made
during inference by the users or readers of clinical information.
Such biases are due to clinical experience, tradition, prejudice
and human nature. The human tendency is to accept
information that supports preconceived opinions and to reject
information that does not.
Sampling (Selection) bias: Systematic error that occurs
when, because of design and execution errors in sampling,
selection or allocation methods, the study comparisons are
between groups that differ with respect to the outcome of
interest for reasons other than those under study.
Analysis of Epidemiological Studies
Analysis of Cohort Study
The data of a cohort study are analyzed in terms of:
(a) The incidence rate of the outcome among the exposed and the non-exposed.
(b) Estimation of risk.
(a) Incidence Rates
In a cohort study, we can determine incidence directly in those
exposed and those not exposed.
The framework of the cohort study can be represented as
follows:

| Cohort | Disease positive | Disease negative | Total |
|---|---|---|---|
| Exposed | a | b | (a + b) = H1 |
| Non-exposed | c | d | (c + d) = H2 |
| Total | (a + c) = V1 | (b + d) = V2 | |

Then the incidence rates are:

$$\text{Incidence among exposed} = \frac{a}{a+b} = \frac{a}{H_1}$$

$$\text{Incidence among non-exposed} = \frac{c}{c+d} = \frac{c}{H_2}$$
(b) Estimation of Risk
The risk of the outcome (disease or death) in the exposed and non-exposed cohorts is determined by two indices: (a) relative risk
and (b) attributable risk.
Relative Risk
Relative risk is the ratio of the incidence of the disease (or
death) among the exposed to the incidence among the non-exposed.
It is also referred to as the risk ratio.
Estimation of relative risk is important in aetiological
studies. It directly measures the strength of the association
between the suspected cause and the effect.
A relative risk of 1 indicates no association; relative risk
of greater than 1 suggests a positive association between
exposure and disease under study.
The larger the relative risk, the greater the strength of the
association between suspected factor and disease.

$$\text{Relative risk (RR)} = \frac{\text{Incidence among exposed}}{\text{Incidence among non-exposed}} = \frac{a/H_1}{c/H_2}$$

Attributable Risk
Attributable risk (AR) is the difference in incidence rates of
disease (or deaths) between the exposed group and the non-exposed
group. It is also referred to as the risk difference.

$$\text{AR} = \frac{a}{H_1} - \frac{c}{H_2}$$

Attributable risk is often expressed as a percentage.
Attributable risk indicates to what extent the disease
under study can be attributed to the exposure.
Relative Risk vs Attributable Risk
Relative risk is important in aetiological enquiries: the larger the
relative risk, the stronger the association between cause and
effect.
Attributable risk gives a better idea than relative risk of
the impact of a successful preventive or public health
programme.
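The cohort calculations above (incidence in each group, relative risk and attributable risk) can be sketched as follows. The function name and the counts (100 exposed of whom 30 fall ill, 200 non-exposed of whom 20 fall ill) are hypothetical; the cell labels a, b, c, d follow the cohort table in the text.

```python
def cohort_measures(a, b, c, d):
    """2 x 2 cohort table from the text:
    a, b = exposed with / without disease (row total H1);
    c, d = non-exposed with / without disease (row total H2)."""
    h1, h2 = a + b, c + d
    inc_exposed = a / h1
    inc_nonexposed = c / h2
    rr = inc_exposed / inc_nonexposed    # relative risk (risk ratio)
    ar = inc_exposed - inc_nonexposed    # attributable risk (risk difference)
    return inc_exposed, inc_nonexposed, rr, ar


# Hypothetical cohort: 100 exposed (30 fall ill), 200 non-exposed (20 fall ill)
ie, iu, rr, ar = cohort_measures(a=30, b=70, c=20, d=180)
print(f"Incidence: exposed {ie:.2f}, non-exposed {iu:.2f}")
print(f"RR = {rr:.1f}; AR = {ar:.2f}, i.e. {ar * 100:.0f} extra cases per 100 exposed")
```

The two indices answer different questions: RR = 3 says the exposed are three times as likely to fall ill, while AR = 0.20 says how many cases per 100 exposed a successful programme could prevent.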

Analysis of Case Control Study


In a case-control study, the data are analyzed in terms of:
(a) Exposure rates to the suspected factor among cases and
controls.
(b) Estimation of the disease risk associated with exposure
(odds ratio).
Exposure Rates
A case-control study provides a direct estimate of the exposure
rate (frequency of exposure) to a suspected factor in the disease
and non-disease groups.
The framework of a case-control study can be shown in the form of a 2 × 2
contingency table:
| Factor | Case | Control | Total |
|---|---|---|---|
| Exposed | a | b | (a + b) = H1 |
| Non-exposed | c | d | (c + d) = H2 |
| Total | (a + c) = V1 | (b + d) = V2 | |

$$\text{Exposure rate for cases} = \frac{a}{a+c}$$

$$\text{Exposure rate for controls} = \frac{b}{b+d}$$

The exposure rates of cases and controls can be
compared by applying suitable statistical tests (comparing
the proportions of the two groups by the z-test for proportions,
or testing the association between group and factor by the
chi-square test).
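Both tests mentioned above can be sketched with the Python standard library alone. The function names and the data (60 of 100 cases and 40 of 100 controls exposed) are hypothetical; the z-test shown is the usual pooled two-proportion test and the chi-square is Pearson's statistic without continuity correction.

```python
import math


def exposure_rates(a, b, c, d):
    """Case-control table from the text: a, c = exposed / non-exposed cases;
    b, d = exposed / non-exposed controls."""
    return a / (a + c), b / (b + d)


def two_proportion_z(a, b, c, d):
    """Pooled z-test for the difference between the two exposure rates."""
    n_cases, n_controls = a + c, b + d
    p_cases, p_controls = a / n_cases, b / n_controls
    pooled = (a + b) / (n_cases + n_controls)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_cases + 1 / n_controls))
    z = (p_cases - p_controls) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p_value


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2 x 2 table, no continuity correction (1 d.f.)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical data: 60 of 100 cases and 40 of 100 controls were exposed
a, b, c, d = 60, 40, 40, 60
cases_rate, controls_rate = exposure_rates(a, b, c, d)
z, p = two_proportion_z(a, b, c, d)
chi2 = chi_square_2x2(a, b, c, d)
print(f"Exposure rate: cases {cases_rate:.0%}, controls {controls_rate:.0%}")
print(f"z = {z:.3f}, two-sided p = {p:.4f}; chi-square = {chi2:.3f}")
```

For a 2 × 2 table this uncorrected chi-square equals z², so the two tests give the same p-value; a Yates continuity correction, often applied in practice, would make the chi-square slightly smaller.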

Estimation of Risk Associated with Exposure
A typical case-control study does not provide incidence rates
from which a relative risk (RR) can be directly calculated. The
common measure of association for a case-control study is the
odds ratio.
Odds Ratio
Odds ratio is a measure of the strength of association between
a risk factor and the outcome. Cases must be representative of
those with the disease, and controls of those without the disease.
It is the ratio of a/b to c/d; these two quantities can be
thought of as the odds in favour of having the disease among
the exposed and the non-exposed, respectively.

$$\text{Odds ratio} = \frac{a/b}{c/d} = \frac{ad}{bc}$$

The odds ratio is a key parameter in the analysis of a
case-control study.
Important Features of Relative Risk
(Risk Ratio) and Odds Ratio:
(a) The odds ratio is used in the retrospective design called the
case-control study, while the risk ratio is used in the cohort
(prospective) study design.
(b) Both the odds ratio and the relative risk compare the
likelihood of an event between two groups. The odds
ratio compares the relative odds of death (disease) in
each group, while the relative risk (risk ratio) compares
the probability of death (disease) in each group rather
than the odds.
(c) Both the odds ratio and the relative risk are computed by
division and are relative measures.
(d) Both the risk ratio and the odds ratio take values
between zero (0) and infinity (∞). One is the neutral
value, meaning that there is no difference between the
groups compared; values close to zero or to infinity indicate a
large difference. A risk ratio/odds ratio larger than 1
means that group one has a larger proportion than
group two; if the opposite is true, the risk ratio/odds
ratio will be smaller than 1. If we swap the two
proportions, the risk ratio/odds ratio takes on its
inverse (1/RR; 1/OR).
(e) The odds ratio can be compared with the risk ratio. The risk
ratio is easier to interpret than the odds ratio. However, in
practice the odds ratio is used more often, because the
odds ratio is more closely related to frequently used
statistical techniques such as logistic regression.
(f) The risk ratio gives the ratio of the proportions (percentages)
affected in group one and group two, while the
odds ratio gives the ratio of the odds of suffering some
fate. The odds themselves are also ratios.
(g) Both the odds ratio and the risk ratio are non-negative
and lie between 0 and ∞ (0 ≤ OR < ∞; 0 ≤ RR < ∞).
(h) The significance of the odds ratio can be tested by using the
95% confidence interval. If the value 1 is not included
within the 95% CI, the odds ratio is significant at the 5% level
(p < 0.05).
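Point (h) can be illustrated by computing an odds ratio with a 95% confidence interval. The text does not say how the interval is obtained; this sketch assumes Woolf's log method (the standard error of ln OR is the square root of 1/a + 1/b + 1/c + 1/d), which is one common choice. The function name and the data are hypothetical, and the cell layout follows the case-control table above.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with an approximate 95% CI (Woolf's log method, assumed).
    Layout as in the text: a, c = exposed / non-exposed cases;
    b, d = exposed / non-exposed controls. All cells must be > 0."""
    if min(a, b, c, d) <= 0:
        raise ValueError("this simple sketch needs all four cells > 0")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical case-control data: 60 of 100 cases and 40 of 100 controls exposed
or_, lo, hi = odds_ratio_ci(a=60, b=40, c=40, d=60)
significant = not (lo <= 1 <= hi)  # 1 outside the CI -> significant at 5% level
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}, significant: {significant}")
```

Swapping the groups (exposed cases 40, exposed controls 60, and so on) returns 1/OR, as point (d) states.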
Diagnostic Tests
In epidemiological studies much use is made of diagnostic
tests, based either on clinical observations or on laboratory
techniques, by means of which individuals are classified as
healthy or as falling into one of a number of disease categories.
Such tests are, of course, important throughout the whole of
medicine, and in particular they form the basis of screening
programmes for the early diagnosis of disease.
Most such tests are imperfect instruments, in the sense
that healthy individuals will occasionally be wrongly classified
as ill, while some individuals who are really
ill may fail to be detected. How should we measure the ability of a
particular diagnostic test to give the correct diagnosis both
for healthy and for ill subjects?
Properties of diagnostic tests have traditionally been
described using sensitivity, specificity, positive and negative
predictive values. These measures, however, reflect population
characteristics and do not easily translate to individual
patients.
In clinical practice, physicians are often faced with
interpreting the results of diagnostic tests. These results are
not absolute: a negative test does not always rule out disease,
and some positive results can be false.
Clinical epidemiology has long focused on sensitivity
and specificity, as well as positive and negative predictive
values, as a way of measuring diagnostic utility. The test is
compared against a reference (gold) standard, and the results
are tabulated in a 2 × 2 contingency table.
The gold standard is a test that is considered to be the
most accurate among all known tests. All other tests should be
compared with it to show whether they are reliable,
so that less accurate tests are not preferred.
Sensitivity: Sensitivity is the proportion of those with the
disease who test positive. Sensitivity is a measure of how
well the test detects disease when it is really there; a sensitive
test has few false negatives.

Specificity: Specificity is the proportion of those without
disease who test negative. It measures how well the test rules
out disease when it is really absent; a specific test has few
false positives.
Predictive values: Sensitivity and specificity help us choose
a suitable test, but for interpreting a test result the most
important measures are the predictive values. The results of a
test can be positive or negative.
If the test is positive (abnormal), we need to know how
likely it is that the person really has the disease. The positive
predictive value expresses the proportion of those with positive
test results who truly have the disease.
On the other hand, the negative predictive value is the
probability that a person with a negative result is truly
disease-free.
Thus we can summarize these diagnostic tests as:
Sensitivity: is disease-focused, i.e. the percentage of people
with the disease whom the test correctly identifies.
Specificity: is wellbeing- or normal-focused, i.e. the
percentage of normal people whom the test correctly identifies as
normal.
Positive predictive value: focuses on the positive results, i.e.
the percentage of positive results that are correct.
Negative predictive value: focuses on the negative results,
i.e. the percentage of negative results that are correct.
Early Diagnostic and Screening Tests
Defining normality and abnormality:
One of the central concerns in clinical medicine is
differentiating the normal from the abnormal. How does one,
for instance, decide that somebody has hypertension? This
would not be a big problem if the frequency distributions of BP in
hypertensive and non-hypertensive people were completely
different and did not overlap. In reality, they overlap (Figure
10.2), and no matter which cut-off point is used for diagnosis,
some hypertensives will be wrongly labeled as normotensive,
while some normotensives will be diagnosed as hypertensive.
If the cut-off point is moved to the left (lower BP), the number of false
negatives will decrease at the expense of more false positives. If
the cut-off point is shifted to the right, the reverse will happen.

Figure 10.2
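The trade-off described above can be shown with a toy calculation. The blood-pressure readings below are invented for illustration (they are not the data behind Figure 10.2), and the rule "test positive if BP ≥ cut-off" is an assumption of this sketch.

```python
# HYPOTHETICAL systolic BP readings (mmHg), invented for illustration;
# the two groups deliberately overlap, mimicking the situation in Figure 10.2.
hypertensive = [132, 138, 142, 145, 150, 155, 160, 168, 175, 182]
normotensive = [110, 115, 118, 122, 126, 130, 134, 136, 140, 144]


def misclassified(cutoff):
    """Count errors when the test calls 'hypertensive' if BP >= cutoff (assumed rule)."""
    false_neg = sum(bp < cutoff for bp in hypertensive)   # hypertensives missed
    false_pos = sum(bp >= cutoff for bp in normotensive)  # normotensives flagged
    return false_neg, false_pos


for cutoff in (130, 140, 150):  # moving the cut-off from left to right
    fn, fp = misclassified(cutoff)
    print(f"cut-off {cutoff} mmHg: false negatives = {fn}, false positives = {fp}")
```

In this toy data, lowering the cut-off from 150 to 130 mmHg removes all false negatives but raises the false positives from 0 to 5; no cut-off removes both kinds of error while the distributions overlap.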

An ideal test would completely separate the diseased and
the disease-free groups, with no overlap (Fig.
10.3). Such ideal tests are very rare. Some overlap is almost always
seen, and this makes it difficult to validate tests.

Figure 10.3: A test with complete separation of groups results in perfect diagnostic performance; a test with partial separation of groups results in intermediate diagnostic performance; a test with no separation of groups gives no diagnostic information.

Validity of Test
A diagnostic test is valid if it detects most people with the
target disorder and excludes most people without the disorder,
and if a positive test usually indicates that the disorder is
present. This is why tests need to be validated against a gold
standard.
Using a 2 × 2 table, we can compute the sensitivity,
specificity, positive predictive value and negative
predictive value of the test.
It is important that all new tests are validated by
comparison against a test that is established and considered
a gold standard. Diagnostic tests are generally not 100%
accurate; if the sensitivity is very high, the specificity tends to
be low.
Suppose the data are classified as:

                        Gold standard*
Test result       Positive     Negative     Total
Positive             a            b          a+b
Negative             c            d          c+d
Total               a+c          b+d       a+b+c+d

[* The gold standard classifies each individual by the presence or absence of the particular disease.]
a = True positive
b = False positive
c = False negative
d = True negative

Sensitivity: The proportion of persons with the condition who
test positive.
$$\text{Sensitivity} = \frac{a}{a+c}$$


Specificity: The proportion of persons without the condition
who test negative.
$$\text{Specificity} = \frac{d}{b+d}$$
Positive predictive value: The proportion of persons with
a positive test who have the condition.
$$\text{PPV} = \frac{a}{a+b}$$
Negative predictive value: The proportion of persons with
a negative test who do not have the condition.
$$\text{NPV} = \frac{d}{c+d}$$
Diagnostic accuracy: The proportion of all persons tested who are
correctly classified by the test.
$$\text{Diagnostic accuracy} = \frac{a+d}{a+b+c+d}$$
Prevalence: The prevalence of the disease is the ratio of the total
cases positive by the gold standard to the total cases.
$$\text{Prevalence} = \frac{a+c}{a+b+c+d}$$
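The formulas above translate directly into code. This Python sketch takes the four cell counts a, b, c, d of the 2 × 2 table. The function name `validity_measures` is illustrative. As a check, it is run on the counts from MCQ 25 later in this chapter (a = 400, b = 200, c = 100, d = 600).

```python
def validity_measures(a, b, c, d):
    """Validity measures from a 2 x 2 table.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    n = a + b + c + d
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),
        "npv": d / (c + d),
        "accuracy": (a + d) / n,
        "prevalence": (a + c) / n,
    }


result = validity_measures(a=400, b=200, c=100, d=600)
for name, value in result.items():
    print(f"{name}: {value:.1%}")
```

The specificity it reports for these counts, 600/800 = 75%, is the answer to MCQ 25.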
Predictive Value in Relation to Prevalence
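Positive predictive value (PPV) is a function of sensitivity,
specificity and prevalence:
$$\text{PPV} = \frac{\text{Prevalence} \times \text{Sensitivity}}{\text{Prevalence} \times \text{Sensitivity} + (1 - \text{Prevalence}) \times (1 - \text{Specificity})}$$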


The positive predictive value is usually expressed as a percentage.
Because it depends on the prevalence of disease as well as on the
sensitivity and specificity of the screening test, the same test
gives a lower PPV when the disease is rare.
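The dependence of predictive values on prevalence can be sketched by weighting the test's error rates by prevalence (Bayes' theorem). This Python sketch also computes NPV by the analogous formula. The function name is illustrative. The 95%/95% and 99%/99% test figures are those of MCQs 15 and 19 later in this chapter.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """PPV and NPV from prevalence, sensitivity and specificity (as proportions)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    true_neg = (1 - prevalence) * specificity
    false_neg = prevalence * (1 - sensitivity)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# The same test (95% sensitive, 95% specific) at different prevalences.
for prev in (0.005, 0.05, 0.20):
    ppv, npv = predictive_values(prev, 0.95, 0.95)
    print(f"prevalence {prev:.1%}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

At 5% prevalence the PPV is exactly 50% (MCQ 15). With 99% sensitivity and specificity at a prevalence of 5/1000 it is about 33% (MCQ 19).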
SENSITIVITY AND SPECIFICITY IN TERMS OF
TYPE-I AND TYPE-II ERRORS
Table related to decision and hypothesis (types of error):

                          Decision from sample
True statement     Accept H0                Reject H0
H0 true            1 - α                    α (Type-I error)
H0 false           β (Type-II error)        1 - β = power

Table related to diagnostic test:

                        Gold standard*
Test result       Positive     Negative     Total
Positive             a            b          a+b
Negative             c            d          c+d
Total               a+c          b+d       a+b+c+d

From the above tables we can see that β (Type-II error)
corresponds to a false negative and α (Type-I error) to a false positive;
1 - β, the power of the test, corresponds to a true positive and
1 - α to a true negative. Since the sensitivity of a test is
$$\text{Sensitivity} = \frac{a}{a+c} = 1 - \beta,$$
the power of the test is equal to the sensitivity; similarly,
$$\text{Specificity} = \frac{d}{b+d} = 1 - \alpha.$$


Thus there is an analogy with a significance test. If the null
hypothesis is that an individual is free of the disease (a true
negative) and a positive test is regarded as significant, then α is
analogous to the significance level (1 - specificity) and 1 - β to
the power of the test (sensitivity); the alternative hypothesis is that
the individual has the disease (a true positive).
Likelihood Ratio
A fairly new concept in diagnostic testing is that of
likelihood ratios. Likelihood ratios are a more practical way of
making sense of a diagnostic test result and have immediate
clinical relevance. In general, a useful test has a high
positive likelihood ratio and a small negative likelihood ratio.
Likelihood ratios are independent of disease prevalence.
They may be understood using the following analogy. Assume
that the patient tests positive on a diagnostic test; if this were a
perfect test, it would mean that the patient certainly
has the disease (true positive). The only thing that stops us
from making this conclusion is that some patients without the
disease also test positive (false positives). We therefore have
to correct the true positive (TP) rate by the false positive (FP)
rate; this is done mathematically by dividing one by the other.
$$\text{Positive likelihood ratio} = \frac{\text{Probability of a positive test in those with disease}}{\text{Probability of a positive test in those without disease}} = \frac{\text{TP rate}}{\text{FP rate}} = \frac{a/(a+c)}{b/(b+d)} = \frac{\text{Sensitivity}}{1 - \text{Specificity}}$$

Likewise, if a patient tests negative, we are still worried
about the likelihood of this being a false negative (FN) rather
than a true negative (TN). This likelihood is given
mathematically by the probability of a negative test in those
with the disease, compared to the probability of a negative test
in those without the disease.
$$\text{Negative likelihood ratio} = \frac{\text{Probability of a negative test in those with disease}}{\text{Probability of a negative test in those without disease}} = \frac{\text{FN rate}}{\text{TN rate}} = \frac{c/(a+c)}{d/(b+d)} = \frac{1 - \text{Sensitivity}}{\text{Specificity}}$$
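Both ratios follow from the same 2 × 2 counts. This is a minimal Python sketch with an illustrative function name. The example counts are those of MCQ 25 later in this chapter.

```python
def likelihood_ratios(a, b, c, d):
    """Positive and negative likelihood ratios from 2 x 2 counts.

    a = TP, b = FP, c = FN, d = TN (same layout as the 2 x 2 table).
    Raises ZeroDivisionError if specificity is 1 (no false positives),
    since LR+ is then unbounded.
    """
    sensitivity = a / (a + c)                  # TP rate
    specificity = d / (b + d)                  # TN rate
    lr_pos = sensitivity / (1 - specificity)   # TP rate / FP rate
    lr_neg = (1 - sensitivity) / specificity   # FN rate / TN rate
    return lr_pos, lr_neg


lr_pos, lr_neg = likelihood_ratios(a=400, b=200, c=100, d=600)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```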
Likelihood ratios have a number of useful properties:
1. Because they are based on sensitivity and specificity, they do
not vary in different populations or settings.


2. They can be used directly at the individual patient level.
3. They allow the clinician to quantify the probability of
disease for any individual patient.
The interpretation of likelihood ratios is intuitive: The
larger the positive likelihood ratio, the greater the likelihood
of disease; the smaller the negative likelihood ratio, the lesser
the likelihood of disease.
For example, consider a 50-year-old man with a positive exercise stress
test. ST-segment depression of more than 1 mm on exercise
stress testing is known to have a sensitivity of 65% and a specificity of
89% for coronary artery disease when compared
with the reference standard of angiography [Ref: Diamond GA et
al. Analysis of probability as an aid in the clinical diagnosis of
coronary-artery disease. N Engl J Med 1979; 300: 1350-8].
This means that
$$\text{Positive likelihood ratio} = \frac{0.65}{1 - 0.89} = \frac{0.65}{0.11} \approx 5.9$$
Thus the odds of this patient having the disease have increased
approximately six-fold given the positive test result.
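The stress-test figure can be checked in a few lines. The sketch also converts a pre-test probability into a post-test probability with the odds form of Bayes' theorem: post-test odds = pre-test odds × likelihood ratio. The 30% pre-test probability is a hypothetical value chosen only for illustration. It is not a figure from the cited study.

```python
sensitivity, specificity = 0.65, 0.89   # exercise stress test vs angiography

lr_pos = sensitivity / (1 - specificity)
print(f"positive likelihood ratio = {lr_pos:.1f}")


def post_test_probability(pre_test_prob, likelihood_ratio):
    """Update a probability of disease with a likelihood ratio via odds."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical pre-test probability, assumed only for illustration.
print(f"post-test probability = {post_test_probability(0.30, lr_pos):.0%}")
```

It is the odds, not the probability, that are multiplied by about six. Here a hypothetical 30% pre-test probability rises to roughly 72%.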
Likelihood ratios are thus a useful and practical way of
expressing the power of a diagnostic test to increase or decrease
the likelihood of disease. Unlike sensitivity and specificity, which are
population characteristics, likelihood ratios can be used at
the individual patient level.
MULTIPLE CHOICE QUESTIONS
1. Prevalence of disease affects:
(a) Sensitivity
(b) Specificity
(c) Predictive value
(d) Repeatability

(AI, 92)


2. Sensitivity of a test:
(a) True positive/(True positive + False negative)
(b) True negative/(True negative + False positive)
(c) False negative/(True negative + True positive)
(d) False negative/(True positive + False negative)
(AI, 92, 93, 97)
3. Which of the following is not true for a case control study?
(a) Easy to carry out
(b) Inexpensive
(c) Attributable risk can be measured
(d) No attrition problem
(AI, 94)
4. All are true about prevalence except:
(a) Rate
(b) Specifically for old and new cases
(c) Prevalence = Incidence × Duration
(d) Prevalence is of two types

(AI, 96)

5. Case control study provides all except:


(a) Incidence
(b) Relative risk
(c) Odds ratio
(d) Strength of association
(AI, 97)
6. True about prevalence all except:
(a) Rate
(b) Ratio
(c) Duration of disease affects it
(d) Numerator and denominator are separate (AI,98)
7. Incidence rate is measured by:
(a) Case control study
(b) Cohort study
(c) Cross-sectional study
(d) Cross-over study (AI, 98)
8. Predictive value for positive test is defined as:
(a) True positive/(True positive + False negative) × 100
(b) True positive/(True positive + False positive) × 100


(c) False positive/(True positive + False positive) × 100
(d) False positive/(True positive + False negative) × 100
(AI, 99)
9. Specificity of a test means all except:
(a) Identify those without disease
(b) True positive
(c) True negative
(d) An ideal screening test should have 100% specificity
(AI, 2000)
10. ELISA test for HIV was done in a population. What
will be the result of performing double screening
ELISA test:
(a) Increased sensitivity and positive predictive value
(b) Increased sensitivity and negative predictive value
(c) Increased specificity and positive predictive value
(d) Increased specificity and negative predictive value
(AI, 2001)
[Hint: In double (serial) screening a person is labelled positive only
if both tests are positive. False positives therefore decrease, so the
specificity and the positive predictive value increase.]
11. Incidence is calculated by:
(a) Retrospective study (b) Prospective study
(c) Cross-sectional study (d) Random study
(AIIMS, May 95)
12. Prevalence is a:
(a) Rate
(b) Ratio
(c) Proportion
(d) Mean

(AIIMS, Feb 97)

13. Incidence of disease among exposed minus that of nonexposed is equal to:
(a) Relative risk
(b) Attributable risk
(c) Odds ratio
(d) None of the above
(AIIMS, June 97)


14. Specificity is related to:


(a) True positive
(b) True negative
(c) False positive
(d) False negative
(AIIMS, Dec 97)

15. ELISA test has sensitivity of 95% and specificity of 95%.
Prevalence of HIV carriers is 5%. The predictive value
of positive test is:
(a) 95%
(b) 50%
(c) 100%
(d) 75%
[Solution: The positive predictive value is given by
$$\text{PPV} = \frac{\text{Prevalence} \times \text{Sensitivity}}{\text{Prevalence} \times \text{Sensitivity} + (1 - \text{Prevalence}) \times (1 - \text{Specificity})}$$
$$= \frac{0.05 \times 0.95}{0.05 \times 0.95 + (1 - 0.05) \times (1 - 0.95)} = \frac{0.05 \times 0.95}{0.05 \times 0.95 + 0.95 \times 0.05} = \frac{1}{2} = 0.5$$
and is expressed in percentage = 50%]
(AIIMS, June 99)
16. All of the following are true about case control study
except:
(a) Relatively cheap
(b) Relative risk can be calculated
(c) Used for rare cases
(d) Odds ratio can be calculated
(AIIMS, June 2000; AI, 2002)
17. Which of the following is best for calculating the
incidence of a disease:
(a) Case control
(b) Cohort
(c) Cross-sectional study
(d) Longitudinal study
(AIIMS, Nov 2000)


18. Too many false positives in a test are due to which of the
following:
(a) High prevalence
(b) Test with high specificity
(c) Test with high sensitivity
(d) High incidence
(AIIMS, Nov 2000)
19. In a community, the specificity of ELISA test is 99%
and sensitivity is 99%. The prevalence of the disease is
5/1000. Then positive predictive value of the test is:
(a) 33%
(b) 67%
(c) 75%
(d) 99%
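[Solution: The positive predictive value is given by
$$\text{PPV} = \frac{\text{Prevalence} \times \text{Sensitivity}}{\text{Prevalence} \times \text{Sensitivity} + (1 - \text{Prevalence}) \times (1 - \text{Specificity})}$$
Here Prevalence = 5/1000 = 0.005 and Sensitivity = Specificity = 0.99, so
$$\text{PPV} = \frac{0.005 \times 0.99}{0.005 \times 0.99 + (1 - 0.005)(1 - 0.99)} = \frac{0.005 \times 0.99}{0.005 \times 0.99 + (0.995)(0.01)}$$
Taking 0.995 ≈ 0.99,
$$\text{PPV} \approx \frac{0.005 \times 0.99}{0.99 \times (0.005 + 0.01)} = \frac{0.005}{0.015} \approx 0.33$$
and expressed as a percentage = 33%]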
(AIIMS, May 2001)


20. In a village of 1 lakh population, among 20,000 exposed
to smoking 200 developed cancer, and among 40,000
people unexposed 40 developed cancer. The relative
risk of smoking in the development of cancer is:
(a) 20
(b) 10
(c) 5
(d) 15
[Hint: Incidence among smokers = 200/20,000 = 0.01;
incidence among non-smokers = 40/40,000 = 0.001;
Relative risk = 0.01/0.001 = 10]
(AIIMS, May 2001)
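The hint's arithmetic can be checked with a short Python sketch. The function name is illustrative.

```python
def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Incidence among the exposed divided by incidence among the unexposed."""
    incidence_exposed = cases_exposed / n_exposed
    incidence_unexposed = cases_unexposed / n_unexposed
    return incidence_exposed / incidence_unexposed


rr = relative_risk(200, 20_000, 40, 40_000)   # smokers vs non-smokers
print(f"relative risk = {rr:.0f}")
```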

21. A woman exposed to multiple sex partners has 5 times
increased risk for CaCx. The attributable risk is:
(a) 20%
(b) 50%
(c) 80%
(d) 100%
[Solution: Let the incidence rate among the non-exposed be x; the
incidence rate among the exposed is 5 times higher, i.e. 5x.
According to the definition of attributable risk,
$$\text{AR} = \frac{\text{Incidence among exposed} - \text{Incidence among non-exposed}}{\text{Incidence among exposed}} = \frac{5x - x}{5x} = \frac{4}{5} = 0.8$$
and expressed as a percentage = 80%]

(AIIMS,Nov 2001)
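Because x cancels, the attributable risk depends only on the relative risk, as (RR - 1)/RR. This is a minimal Python sketch of that step. The function name is illustrative.

```python
def attributable_risk_percent(relative_risk):
    """(Incidence exposed - incidence unexposed) / incidence exposed, in percent.

    With unexposed incidence x and exposed incidence RR * x,
    x cancels, leaving (RR - 1) / RR.
    """
    return (relative_risk - 1) / relative_risk * 100


print(f"attributable risk = {attributable_risk_percent(5):.0f}%")
```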

22. True about case control study All except:


(a) Less expensive
(b) Those with disease and not diseased compared


(c) Attributable risk is estimated
(d) None of these
(AIIMS, Nov 2001)

23. Which of the following is true about cohort study:


(a) Incidence can be calculated
(b) It is from effect to cause
(c) It is inexpensive
(d) Shorter time than case control
(JIPMER,2003)
24. For the calculation of positive predictive value of a
screening test, the denominator is comprised of:
(a) True positives +False negatives
(b) False positives + True negatives
(c) True positives + False positives
(d) True positives + True negatives
(AI, 2003)
25. The table below shows the screening test results of
disease Z in relation to the true disease status of the
population being tested:

                            Disease
Screening test result    Yes      No      Total
Positive                 400      200      600
Negative                 100      600      700
Total                    500      800     1300

The specificity of the screening test is:
(a) 70%
(b) 75%
(c) 79%
(d) 86%
26. If prevalence of diabetes is 10%, the probability that
three people selected at random from the population
will have diabetes is:


(a) 0.01
(b) 0.03
(c) 0.001
(d) 0.003
[Hint: There are two rules of probability, the addition law and
the multiplication law.
Probability of one person having diabetes is p = 1/10 = 0.1.
The probability of all 3 having diabetes can be calculated using
the multiplication law of probability. It will be
p × p × p = 0.1 × 0.1 × 0.1 = 0.001]

27. The usefulness of a screening test depends upon its:


(a) Sensitivity
(b) Specificity
(c) Reliability
(d) Predictive value
(AI, 2002)
28. In a low prevalence area for Hepatitis B, a double ELISA
test was decided to be performed in place of a single
test which used to be done. This would cause an
increase in the:
(a) Specificity and positive predictive value
(b) Sensitivity and positive predictive value
(c) Sensitivity and negative predictive value
(d) Specificity and negative predictive value (AI, 2002)
29. The association between coronary artery disease and
smoking was found to be as follows:

                 Coronary artery disease    No coronary artery disease
Smokers                    30                           20
Non-smokers                20                           30


The odds ratio can be estimated as
(a) 0.65
(b) 0.8
(c) 1.3
(d) 2.25
[Hint: Odds ratio = (30 × 30)/(20 × 20) = 2.25]
(AI, 2002)
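The odds ratio is the cross-product ratio of the 2 × 2 table. This is a minimal Python sketch with an illustrative function name, applied to the counts above.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Cross-product ratio (a*d)/(b*c) of a 2 x 2 exposure-disease table."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)


# Smokers: 30 with CAD, 20 without; non-smokers: 20 with CAD, 30 without.
print(f"odds ratio = {odds_ratio(30, 20, 20, 30):.2f}")
```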
30. A screening test is used in the same way in two similar
populations; but the proportion of false positive results
among those who test positive in population A is lower
than those who test positive in population B. What is
the likely explanation?
(a) The specificity of the test is lower in population A
(b) The prevalence of the disease is lower in population A
(c) The prevalence of the disease is higher in population A
(d) The specificity of the test is higher in population A
[Hint: If the proportion of false positives among those who test
positive is lower in population A than in B, the PPV is higher in A.
Since the same test (with the same sensitivity and specificity) is
used in both, the PPV formula shows that the prevalence of the
disease must be higher in population A than in B.]
(AIIMS, 2003)
31. Residents of three villages with three different types of
water supply were asked to participate in a study to
identify cholera carriers. Because several cholera deaths
had occurred in the recent past, virtually everyone
present at the time submitted to examination. The
proportion of residents in each village who were carriers
was computed and compared. This study is a:
(a) Cross-sectional study.
(b) Case-control study.


(c) Concurrent cohort study.
(d) Non-concurrent cohort study.

(AIIMS, 2003)

32. A drug company is developing a new pregnancy-test


kit for use on an outpatient basis. The company used
the pregnancy test on 100 women who are known to be
pregnant. Out of 100 women, 99 showed positive test.
Upon using the same test on 100 non-pregnant women,
90 showed negative result. What is the sensitivity of
the test?
(a) 90%
(b) 99%
(c) Average of 90 and 99
(d) Cannot be calculated from the given data
[Hint:
                 Pregnant    Non-pregnant    Total
Test positive       99            10          109
Test negative        1            90           91
Total              100           100          200

Sensitivity = 99/100 = 0.99 (expressed in percentage = 99%)]
(AIIMS, 2003)

33. Which of the following relationship between different


parameters of a performance of a test is correct:
(a) Sensitivity = 1 - specificity
(b) Positive predictive value = 1 - negative predictive
value
(c) Sensitivity is inversely proportional to specificity
(d) Sensitivity = 1 - positive predictive value
[Hint: For a given test, sensitivity and specificity cannot both be
increased simultaneously by changing the cut-off; if one increases,
the other decreases.]
(AIIMS, 2004)


34. Which of the following is not an advantage of a


prospective cohort study:
(a) Precise measurement of exposure is possible
(b) Many disease outcomes can be studied
simultaneously
(c) It usually costs less than a case control study
(d) Recall bias is minimized compared with a case
control study
35. The incidence rate of a disease is five times greater in
women than in men, but the prevalence rate shows no
sex difference. The best explanation is that:
(a) The crude death rate (by all causes) is greater in
women
(b) The case fatality rate for this disease is lower in
women
(c) The case-fatality rate is greater in women
(d) Risk factors for the disease are more common in
women
36. In a study of a disease in which all cases that developed
were ascertained, if the relative risk for the association
between factor and disease is equal to or less than 1
then:
(a) The factors protect against the development of the
disease
(b) There is either no association or a negative
association between the factors and disease
(c) Matching is not done properly
(d) There is a significant positive association between
the factor and the disease
[Hint: A risk ratio of 1 indicates that there is no difference
between the two groups. The risk ratio lies in the range
0 < RR < ∞; values of RR below 1 indicate a negative
association.]


37. A new screening programme was instituted in India.


The programme used a screening test that is effective
in detecting AIDS at an early stage. Assume that there is
no effective treatment for AIDS and that, therefore, the
programme results in no change in the usual course of
AIDS. Assume also that the rates noted are calculated
from all cases of AIDS and that there were no changes
in the quality of death certification of this disease.
Identify the true statement with regard to AIDS in the
country during the first year of this programme:
(a) Both incidence and prevalence will increase
(b) Incidence will increase and prevalence will decrease
(c) Incidence will decrease and prevalence will increase
(d) Both incidence and prevalence will remain same
[Hint: Prevalence = Incidence × Duration. As the screening
programme will detect early cases of AIDS, the incidence will
increase; as there is no effective treatment of AIDS, the
duration of illness and the pattern of mortality will remain the same.
Thus, due to the increase in incidence, the prevalence will also
increase.]
38. A recently discovered treatment for acute leukemia
extends the lifespan but does not prevent the disease
and does not lead to its cure. In this scenario, which of
the following statements, about acute leukemia is true:
(a) Incidence will increase
(b) Prevalence will increase
(c) Incidence will decrease
(d) Prevalence will decrease
39. The diagram shows the findings of a test X for screening
hypertensive patients. After screening, BP was recorded in each
group; the frequency distributions of blood pressure readings in the
normal population and in cases with hypertension are shown below:


Figure

Which of the following statements about establishing a
reference interval for this test and interpreting it
is true?
(a) If the reference interval is 90-115 mmHg, the test has
100% specificity
(b) If the reference interval is 90-115 mmHg, a patient
with a test result of 117 must have hypertension
(c) If the reference interval is 90-120 mmHg, the test has
100% sensitivity
(d) If the reference interval is 90-120 mmHg, a result of
117 mmHg may represent a true negative (TN) or a
false negative (FN)
[Hint: A test has 100% specificity if there are no false positives,
and 100% sensitivity if there are no false negatives.
Choices (a) and (c) are therefore not possible, because the
region where the curves intersect includes false positive as well as
false negative cases. A patient with BP 117 mmHg lies
in the interval 115 to 120 mmHg, which includes both false
negative and false positive cases. (Ref: Fig. 10.2)]

Chapter 11

Vital Statistics
(Demography)


Demography is the scientific study of human population.
It deals with the following aspects of human population:
1. Study of the composition of population at a point of time.
2. Study of the changes that occur during a given period,
i.e. growth and decline of population.
3. The distribution of population in space.
Changes in population are the outcome of events such as births,
deaths, migration, marriages and divorces; these are called vital
events.
The main sources of demographic statistics in India are:
(1) Population census.
(2) National Sample Surveys.
(3) Registration of vital events.
Census and registration of vital events are both methods of
collecting demographic data; the distinction between them is
that the former is a record of persons while the latter is a record of
events.
DEMOGRAPHIC CYCLE
A nation passes through five stages of the demographic cycle.
These are:
First stage (High stationary): This stage is characterized
by a high birth rate and a high death rate.
Second stage (Early expanding): The death rate begins
to decline but the birth rate remains unchanged, i.e. high.
Third stage (Late expanding): The death rate declines
further and the birth rate tends to fall.
Fourth stage (Low stationary): Low death rate and low
birth rate, with the result that the population becomes stationary.
Fifth stage (Declining): The birth rate is lower than the death
rate.

Vital Statistics (Demography) 211

DEMOGRAPHIC TRENDS IN INDIA


In India the demographic transition has been as follows: from the
high birth rate-high death rate pattern of the pre-independence era,
India had reached the stage of high birth rate-low death rate by the
early fifties. The transition to low birth rate-low death rate
has now started, at a very slow pace.
Age and Sex Composition
Age and sex are distributive characteristics of a population
that define some basic potential activities of the social
order, since age and sex shape the behaviour of a society. The age
and sex composition of a population is represented by a
population pyramid (age pyramid).
Age pyramid: The age pyramid (population pyramid) is a
graphic presentation of a population by age and
sex categories. First, obtain data on the population
distribution by age and sex; these data are usually found in the
census report. Second, calculate the percentage of the total
population contained in each age and sex group. The sum
of all percentages should be 100%, and the sum of
all age/sex group frequencies should equal the population
size.
To construct an age pyramid, use standard graph paper.
Calibrate the horizontal axis in percentages, with 0% at the
intercept of the horizontal and vertical axes. Let the male
percentages be plotted to the left and the female percentages
to the right. Start with the age group 0-4 at the bottom of the graph
and plot each age group, building upwards until the
oldest group is reached. The population pyramid shows the age-sex
composition of a given population at a given point of time.
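The percentage step and the left/right layout can be sketched in Python. The census counts are hypothetical, invented for illustration. Only four age groups are used to keep the sketch short. Each '#' stands for roughly one percent of the total population.

```python
# Hypothetical census counts by age group: (males, females). Invented for illustration.
counts = {
    "0-4": (1200, 1150),
    "5-9": (1100, 1060),
    "10-14": (950, 930),
    "15-19": (800, 790),
}

total = sum(m + f for m, f in counts.values())
percent = {age: (100 * m / total, 100 * f / total) for age, (m, f) in counts.items()}

# All age/sex percentages together should add up to 100%.
assert abs(sum(m + f for m, f in percent.values()) - 100) < 1e-9

# Print the oldest group first so that 0-4 ends up at the bottom.
for age in reversed(list(percent)):
    m, f = percent[age]
    left = "#" * round(m)     # males to the left
    right = "#" * round(f)    # females to the right
    print(f"{left:>15} |{age:^7}| {right}")
```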


Figure 11.1: Age pyramid of under-developed countries

An age pyramid assumes a fixed shape when birth and
death rates are constant and no migration occurs over time.
A population whose proportionate age-sex distribution stays
constant in this way is called a stable population.
A stationary population occurs when births equal
deaths, no migration occurs, and age-sex specific death rates and
age-specific birth rates remain constant over time.
Sex Ratio
Sex ratio is defined as the number of females per 1000 males.
This is one of the basic demographic characteristics of the
population.
The sex composition of the population is affected by
differences in the mortality of males and females,
and by the sex ratio at birth.
Dependency Ratio
Persons aged 65 years and above and children
below 15 years of age are considered dependent.
The ratio of the combined age groups 0-14 years and 65 years
and above to the 15-64 year age group is referred to as the total
dependency ratio.
The dependency ratio can be further subdivided into the
young age dependency ratio (0-14 years) and the old age
dependency ratio (65 years and above).
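The sex ratio and the dependency ratios are simple quotients. In this sketch the population figures are hypothetical. The dependency ratios are expressed per 100 persons aged 15-64, which is a common convention and is assumed here, since the text does not fix a multiplier.

```python
def sex_ratio(females, males):
    """Number of females per 1000 males (the definition used in this book)."""
    return females / males * 1000


def dependency_ratios(age_0_14, age_15_64, age_65_plus):
    """Young, old and total dependency ratios per 100 persons aged 15-64."""
    young = age_0_14 / age_15_64 * 100
    old = age_65_plus / age_15_64 * 100
    return young, old, young + old


# Hypothetical population figures, for illustration only.
print(f"sex ratio = {sex_ratio(females=4_820, males=5_180):.0f}")
young, old, total = dependency_ratios(3_000, 6_400, 600)
print(f"young = {young:.1f}, old = {old:.1f}, total = {total:.1f}")
```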
Density of Population
In India, density is defined as the number of persons living
per square kilometer.
Family Size
In demography, family size means the total number of children a
woman has borne at a point of time. The complete family size
indicates the total number of children borne by a woman
during her entire child-bearing period.
The total fertility rate gives the approximate magnitude of the
complete family size.
Life Expectancy
Life expectancy at a given age is the average number of years
which a person of that age may expect to live, according to the
mortality pattern prevalent in that country.
IMPORTANT DEMOGRAPHIC INDICES
The population of a given geographic area at any point of
time may be expressed as:
$$P(t) = P(0) + B(t) - D(t) + I(t) - E(t)$$
where P(t) represents the total population at a given point of
time;
P(0) the total population at a point of time taken as base;
B(t) the total number of births during the given period;
D(t) the total number of deaths during the given period;
I(t) the total number of immigrants during the given period;
E(t) the total number of emigrants during the given period.
Deaths and births are ordinarily the chief determinants
of the change in population.
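The balancing equation above can be written as a one-line function. The figures passed to it are hypothetical, chosen only to show the sign of each component. The function name is illustrative.

```python
def population_at_t(p0, births, deaths, immigrants, emigrants):
    """Balancing equation: P(t) = P(0) + B(t) - D(t) + I(t) - E(t)."""
    return p0 + births - deaths + immigrants - emigrants


# Hypothetical figures for one year in a small district.
p_t = population_at_t(p0=100_000, births=2_500, deaths=900,
                      immigrants=300, emigrants=450)
natural_increase = 2_500 - 900
print(p_t, natural_increase)
```

Here the natural increase (births minus deaths, 1,600) is much larger than net migration (300 - 450 = -150). This is in line with births and deaths being the chief determinants of change.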
MEASURES OF MORTALITY
Crude Death Rate (CDR)
It represents deaths per thousand of the population. The
formula for calculating this rate is:
$$\text{Crude death rate} = \frac{\text{No. of deaths in a population during a given year}}{\text{Mid-year total population during the given year}} \times 1000$$
Crude death rate is widely used as an index of mortality.
It can easily be used for comparisons of the same
area from year to year.
Limitations of CDR: Mortality normally varies with age.
If the age structures of two populations are different, a
comparison of their crude death rates may be misleading.
Age Specific Death Rate
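CDR does not give an exact idea of the death rate in a
particular section of the population.
Mortality varies with age. Therefore, age-specific death
rates are a sound basis for studying the death rates in the various age groups of a population:
$$\text{Age-specific death rate} = \frac{\text{No. of deaths in the age group during the year}}{\text{Mid-year population of that age group}} \times 1000$$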


Infant Mortality Rate (IMR)


It is the number of deaths of infants under one year of age per
1000 live births in a population in one year.
$$\text{Infant mortality rate} = \frac{\text{Total number of deaths under one year of age which occurred among the population in a given year}}{\text{Total number of live births which occurred among the population during the same year}} \times 1000$$
Infant mortality is the largest single age-category of
mortality. Deaths at this age are due to a peculiar set of diseases
and conditions to which the adult population is less exposed.
Limitations:
(1) Babies who die soon after birth may not be
registered at all, either as births or as deaths.
Neonatal mortality rate: Deaths occurring within 4 weeks (28 days)
of birth are called neonatal deaths.
$$\text{Neonatal mortality rate} = \frac{\text{Number of deaths under 28 days which occurred among the population in a given year}}{\text{Total number of live births which occurred among the population during the same year}} \times 1000$$
Neonatal mortality is closely related to birth
weight and gestational age.
Post-natal mortality rate: It is the death rate of infants
dying from 28 days to under one year of age.


$$\text{Post-natal mortality rate} = \frac{\text{Number of deaths of infants aged 28 days to under one year which occurred among the population in a given year}}{\text{Total number of live births which occurred among the population during the same year}} \times 1000$$
Post-natal mortality increases steadily with birth order;
infants born into already large families run a higher
risk of death from infectious diseases.
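The infant, neonatal and post-natal mortality rates share the same denominator, live births in the same year. So the neonatal and post-natal rates add up to the infant mortality rate. This minimal Python sketch shows it with hypothetical counts. The helper name `per_thousand_live_births` is illustrative.

```python
def per_thousand_live_births(deaths, live_births):
    """Deaths expressed per 1000 live births."""
    return deaths / live_births * 1000


# Hypothetical one-year figures for a population.
live_births = 20_000
neonatal_deaths = 500        # deaths under 28 days
post_natal_deaths = 300      # deaths from 28 days to under one year
infant_deaths = neonatal_deaths + post_natal_deaths

imr = per_thousand_live_births(infant_deaths, live_births)
nmr = per_thousand_live_births(neonatal_deaths, live_births)
pnmr = per_thousand_live_births(post_natal_deaths, live_births)
print(f"IMR = {imr:.1f}, NMR = {nmr:.1f}, post-natal MR = {pnmr:.1f}")
```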
Peri-natal mortality rate: It is the mortality occurring
during the period from 28 weeks of pregnancy to under 7
days of post-natal life, per 1000 live births.
According to the revised definition by WHO (1970), the
peri-natal mortality rate should be calculated on the basis of the
following formula:
$$\text{Peri-natal mortality rate} = \frac{\text{Late foetal deaths (28 weeks or more)} + \text{early neonatal deaths (under 1 week)}, \text{ weighing over 1000 g at birth}}{\text{Total number of live births weighing over 1000 g}} \times 1000$$
According to the eighth revision of the International
Classification of Diseases (ICD), the peri-natal period lasts
from 28 weeks of gestation to the seventh day after birth.
The ninth revision (1975) of the ICD stated that peri-natal
statistics should be based on the following criteria:
(1) Babies weighing over 1000 g at birth.
(2) If birth weight is not available, a gestational period of at
least 28 weeks.
(3) When neither condition (1) nor (2) is available, a
body length (crown to heel) of at least 35 cm.
Still birth rate: A still birth is a foetal death occurring after 28
completed weeks of gestation (this is equivalent to a foetus
weighing 1000 g).
$$\text{Still birth rate} = \frac{\text{No. of foetal deaths weighing over 1000 g at birth}}{\text{Total live births} + \text{still births weighing over 1000 g at birth}} \times 1000$$
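The perinatal and still birth formulas can be sketched in Python with hypothetical counts, all restricted to babies weighing over 1000 g. Note the different denominators. The perinatal rate as defined here uses live births only. The still birth rate uses live births plus still births.

```python
# Hypothetical one-year counts, all restricted to babies weighing over 1000 g at birth.
live_births = 10_000
still_births = 150           # late foetal deaths, 28 weeks or more
early_neonatal_deaths = 120  # deaths in the first week after birth

perinatal_rate = (still_births + early_neonatal_deaths) / live_births * 1000
still_birth_rate = still_births / (live_births + still_births) * 1000
print(f"perinatal mortality rate = {perinatal_rate:.1f} per 1000 live births")
print(f"still birth rate = {still_birth_rate:.1f} per 1000 total births")
```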

Case Fatality Rate


Case fatality rate represents the killing power of a disease. It is
simply the ratio of deaths to cases.
$$\text{Case fatality rate} = \frac{\text{Total number of deaths due to a particular disease}}{\text{Total number of cases due to the same disease}} \times 100$$

Proportional Mortality Rate (Ratio)


Proportional mortality rate expresses the number of deaths
due to a particular cause (or in a specific age group) per 100 (or
1000) total deaths.
Proportional mortality rate from a specific disease: The
proportional mortality rate for a specific disease is based on
the number of deaths from that disease relative to total deaths
from all causes:
$$\text{Proportional mortality rate from a specific disease} = \frac{\text{Number of deaths from the specific disease in a year}}{\text{Total number of deaths from all causes in that year}} \times 100$$
Proportional mortality rate under 5 years of age: This is
represented as:
$$\text{Proportional mortality rate under 5 years} = \frac{\text{Number of deaths under 5 years of age in a year}}{\text{Total number of deaths in that year}} \times 100$$
Proportional mortality rate for persons aged 50 years and above:
The proportional mortality rate for 50 years and above is
represented as:
$$\text{Proportional mortality rate for 50 years and above} = \frac{\text{No. of deaths of persons aged 50 years and above in a year}}{\text{Total number of deaths of all age groups in that year}} \times 100$$

Proportional rates are useful indicators, within any
population group, of the relative importance of a specific
disease or disease group as a cause of death.
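Proportional mortality can be sketched as each cause's share of all deaths. The cause-of-death counts are hypothetical. Every cause is expressed against the same total, so the proportions add up to 100.

```python
# Hypothetical deaths in one year, by cause. Invented for illustration.
deaths_by_cause = {
    "tuberculosis": 120,
    "diarrhoeal diseases": 80,
    "cardiovascular disease": 300,
    "other causes": 500,
}

total_deaths = sum(deaths_by_cause.values())
# Deaths from each cause per 100 total deaths.
pmr = {cause: deaths / total_deaths * 100 for cause, deaths in deaths_by_cause.items()}
for cause, value in pmr.items():
    print(f"{cause}: {value:.1f}%")
```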
Standardized Death Rate
Crude death rates cannot be used to compare two
populations, because the two populations may differ in
composition with regard to age and sex.
Age-specific and sex-specific death rates produce a large bulk
of data which does not facilitate comparison.


For the purpose of comparing the death rates in two
geographical areas, it is essential that the age and sex
differences in the composition of the two populations be
eliminated.
There are two methods of doing this:
(1) Direct and
(2) Indirect.
Direct Standardization
Under this method, the mortality rates in each age group in
the two geographical areas are applied to a common standard
population.
The resulting total rates, called standardized rates, show what
the mortality in each of the two areas would be if they had populations
with similar age and sex distributions.
Calculation of crude death rate:

                    District A                        District B
Age group    Pop.     Deaths   Rate/1000      Pop.     Deaths   Rate/1000
0-10         4,000      36        9           3,000      30        10
10-25       12,000      48        4          20,000     100         5
25-60        6,000      66       11           4,000      48        12
60+          8,000     158       19.75        3,000      60        20
Total       30,000     308       10.27       30,000     238         7.93

Crude death rate for district A:
$$\frac{308}{30{,}000} \times 1000 = 10.27 \text{ per } 1000$$
Crude death rate for district B:
$$\frac{238}{30{,}000} \times 1000 = 7.93 \text{ per } 1000$$


Thus, the crude death rate of district B is lower than that of
district A.
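The contrast between age-specific and crude rates can be checked with a short Python sketch. It uses the district figures from the table above. The function names are illustrative.

```python
# District data from the table above: age group -> (population, deaths).
district_a = {"0-10": (4_000, 36), "10-25": (12_000, 48),
              "25-60": (6_000, 66), "60+": (8_000, 158)}
district_b = {"0-10": (3_000, 30), "10-25": (20_000, 100),
              "25-60": (4_000, 48), "60+": (3_000, 60)}


def age_specific_rates(district):
    """Deaths per 1000 population in each age group."""
    return {age: deaths / pop * 1000 for age, (pop, deaths) in district.items()}


def crude_death_rate(district):
    """Total deaths per 1000 total population."""
    pop = sum(p for p, _ in district.values())
    deaths = sum(d for _, d in district.values())
    return deaths / pop * 1000


rates_a, rates_b = age_specific_rates(district_a), age_specific_rates(district_b)
for age in district_a:
    print(f"{age:>6}: A = {rates_a[age]:5.2f}  B = {rates_b[age]:5.2f}")
print(f"crude: A = {crude_death_rate(district_a):.2f}  B = {crude_death_rate(district_b):.2f}")
```

District B has the higher rate in every age group, yet the lower crude rate. This is the situation that standardization is meant to resolve.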
Calculation of standardized death rate
Direct Method
Let there be a standard population; we use this standard
population to calculate the standardized death rates
for comparing the populations of the two districts A and B.

             Standard        District A                    District B
Age group    population   Mortality   Total deaths    Mortality   Total deaths
                          rate/1000                    rate/1000
0-10           1,000          9            9              10           10
10-25          4,000          4           16               5           20
25-60          3,000         11           33              12           36
60+            2,000         19.75        39.50           20           40
Total         10,000                      97.50                       106

Standardized death rate of district A:
$$\frac{97.50}{10{,}000} \times 1000 = 9.75 \text{ per } 1000$$
Standardized death rate of district B:
$$\frac{106}{10{,}000} \times 1000 = 10.6 \text{ per } 1000$$
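Thus, when the age-specific mortality rates of districts A and B are
applied to a common standard population, we observe that the
standardized death rate of district B is higher than that of district A,
while the crude death rate of district B is lower than that of district A.
This happens because district B has the higher death rate in every age
group, but district A has a much larger population aged 60 and above
(8,000 against 3,000), where death rates are highest.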


The crude death rate therefore gives a misleading picture when the age and sex composition of the two populations is different.
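The direct method amounts to a weighted sum, which the following Python sketch makes explicit. It is an illustration, not a library routine: the function name `direct_standardized_rate` is our own, and the inputs are the standard population and district rates of the worked example (district A's 60+ rate taken as 19.75, i.e. 158 deaths among 8,000).

```python
# Standard population for age groups 0-10, 10-25, 25-60, 60+ (from the example)
standard_pop = [1_000, 4_000, 3_000, 2_000]

# Age specific death rates per 1000 observed in each district
rates_a = [9, 4, 11, 19.75]
rates_b = [10, 5, 12, 20]

def direct_standardized_rate(rates_per_1000, standard_pop, per=1000):
    """Direct method: apply a district's age specific rates to one standard population."""
    # Deaths that would occur in each age group of the standard population
    expected = [rate * n / 1000 for rate, n in zip(rates_per_1000, standard_pop)]
    return sum(expected) * per / sum(standard_pop)

sdr_a = direct_standardized_rate(rates_a, standard_pop)
sdr_b = direct_standardized_rate(rates_b, standard_pop)
print(f"District A: {sdr_a:.2f} per 1000")  # District A: 9.75 per 1000
print(f"District B: {sdr_b:.2f} per 1000")  # District B: 10.60 per 1000
```

Because both districts are weighted by the same standard population, any difference in the results reflects differences in age specific mortality only, not in age structure.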
Indirect Standardization
Direct standardization has two drawbacks: first, the result depends on the composition of the standard population chosen; second, it requires age and sex specific death rates, which may not always be available.
In the indirect method we require:
1. The crude death rate of the population in question.
2. The age and sex distribution of the population in question, which can be obtained from the census report.
3. The specific death rates, and the crude death rate, of a standard population, for example some other place in the same country.
Procedure:
1. The specific death rates of the standard population are applied to the age and sex groups of the place in question to obtain the number of deaths expected there. These expected deaths, expressed per 1000 of the population in question, give the Index Death Rate.
2. The Crude Death Rate of the standard population divided by the Index Death Rate gives the Standardizing Factor.
3. The Crude Death Rate of the place in question multiplied by the Standardizing Factor gives the Standardized Death Rate.


Solved Example:
Calculation of standardized death rate (indirect method)

Age group   Standard rate   District A                      District B
            (/1000)         Population   Expected deaths    Population   Expected deaths
<2          64              3,000        192                5,000        320
2-10        7               10,000       70                 12,000       84
10-20       4               10,000       40                 10,000       40
20-60       8               32,000       256                25,000       200
60+         60              9,000        540                8,000        480
Total                       64,000       1,098              60,000       1,124

Expected deaths are the numbers of deaths that would occur in each age group at the standard rate.

CDR of standard population = 15 (obtained from the census report)

Index death rate for district A:
$$\frac{1098}{64{,}000} \times 1000 = 17.16$$
Index death rate for district B:
$$\frac{1124}{60{,}000} \times 1000 = 18.73$$
Standardizing factor for district A:
$$\frac{15}{17.16} = 0.874$$
Standardizing factor for district B:
$$\frac{15}{18.73} = 0.801$$
If the crude death rate of district A is 20 and that of district B is 22, then:
Standardized death rate of district A = 20 × 0.874 = 17.48
Standardized death rate of district B = 22 × 0.801 = 17.62
Conclusion:
After adjustment for age composition, the death rates of the two districts are almost equal: district A (17.48 per 1000) is only marginally healthier than district B (17.62 per 1000), a much smaller difference than the crude rates of 20 and 22 suggest.
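The three steps of the indirect method can be sketched in Python as follows. The function name `indirect_standardized_rate` and the constant names are illustrative assumptions; the standard rates, age-group populations, standard CDR of 15 and district crude rates of 20 and 22 are those of the solved example. The code recomputes every quantity from the age-group populations, so small differences from hand-rounded figures come only from rounding.

```python
# Standard age specific death rates per 1000 for age groups <2, 2-10, 10-20, 20-60, 60+
STANDARD_RATES = [64, 7, 4, 8, 60]
STANDARD_CDR = 15  # crude death rate of the standard population (census report)

def indirect_standardized_rate(population, crude_rate,
                               standard_rates=STANDARD_RATES,
                               standard_cdr=STANDARD_CDR):
    """Return (index death rate, standardizing factor, standardized death rate)."""
    # Step 1: deaths expected if the standard rates applied to this population
    expected = sum(rate * n / 1000 for rate, n in zip(standard_rates, population))
    index_rate = expected * 1000 / sum(population)
    # Step 2: standardizing factor
    factor = standard_cdr / index_rate
    # Step 3: adjust the observed crude death rate
    return index_rate, factor, crude_rate * factor

districts = {
    "A": ([3_000, 10_000, 10_000, 32_000, 9_000], 20),
    "B": ([5_000, 12_000, 10_000, 25_000, 8_000], 22),
}
results = {}
for name, (population, cdr) in districts.items():
    results[name] = indirect_standardized_rate(population, cdr)
    idr, sf, sdr = results[name]
    print(f"District {name}: index {idr:.2f}, factor {sf:.3f}, standardized {sdr:.2f}")
```

Unlike the direct method, only the age distribution of each district is needed here, not its own age specific death rates.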


Maternal Mortality Rate


The maternal mortality rate measures the risk of dying from causes associated with childbirth among women in the reproductive span of life, 15-49 years of age. At these ages, the death rate among women is generally higher than that among men. The formula is:
$$\text{Maternal mortality rate} = \frac{\text{Total number of deaths due to childbirth among the female population of a given area during a given year}}{\text{Total number of live births in the same area during the same year}} \times 1000$$

Ideally, the denominator should include all deliveries and abortions.
The maternal mortality rate is usually expressed per 1000 live births. In developed countries, where maternal mortality has declined substantially, a multiplying factor of 100,000 is used instead of 1000.
Measurement methods: The maternal mortality ratio is expressed as follows:
$$\text{Maternal mortality ratio} = \frac{\text{Maternal deaths (direct and indirect)}}{\text{Live births}} \times k, \qquad k = 1000,\ 10{,}000 \text{ or } 100{,}000$$
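The ratio is a single division, as this Python sketch shows. The function name is illustrative; the figures (population 10,000, birth rate 36 per 1000, 5 maternal deaths) are those of MCQ 14 in the question set at the end of this chapter. Changing k only rescales the result.

```python
def maternal_mortality_ratio(maternal_deaths, live_births, k=1000):
    """Maternal deaths (direct and indirect) per k live births."""
    return maternal_deaths * k / live_births

# Population 10,000 with a birth rate of 36 per 1000 gives the live births
live_births = 10_000 * 36 / 1000          # 360 live births in the year
per_1000 = maternal_mortality_ratio(5, live_births)
per_100k = maternal_mortality_ratio(5, live_births, k=100_000)
print(f"{per_1000:.1f} per 1000 live births")
print(f"{per_100k:.0f} per 100,000 live births")
```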
MEASURES OF FERTILITY
The growth of the population of a given area depends on the number of live births that occur.
The speed at which births add to the population is measured by fertility rates.
Some important fertility rates are:


Crude Birth Rate
It gives the number of live births per 1000 persons in the population of a given area during a given period of time.
$$\text{Crude birth rate} = \frac{\text{Total number of live births among the population of a given area during a given year}}{\text{Mid-year population of the given area during the same year}} \times 1000$$
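A minimal Python version of the formula follows; the function name is illustrative, and the figures (3000 live births in a population of 100,000) are those of MCQ 33 at the end of this chapter.

```python
def crude_birth_rate(live_births, midyear_population, per=1000):
    """Live births in the year per `per` persons of the mid-year population."""
    return live_births * per / midyear_population

cbr = crude_birth_rate(3_000, 100_000)
print(f"CBR = {cbr:.1f} per 1000")  # CBR = 30.0 per 1000
```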

Limitations:
1. Like the crude death rate, it is affected by several factors, such as the age and sex structure of the population.
2. The crude birth rate (CBR) relates the total number of live births to the whole mid-year population, whereas in fact the number of live births depends on the number of women of child-bearing age.
General Fertility Rate
The general fertility rate (GFR) relates the number of live births to the total female population of child-bearing age.
$$\text{General fertility rate} = \frac{\text{Total number of live births in a given population in a particular year}}{\text{Total number of females in the reproductive span of life (15-49 years) at mid-year}} \times 1000$$
The general fertility rate gives an overall view of the fertility of the child-bearing age group as a whole.
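The only change from the crude birth rate is the denominator, as this Python sketch shows. The 3000 births and population of 100,000 follow MCQ 33; the figure of 25,000 women aged 15-49 is hypothetical, chosen only for illustration, as is the function name.

```python
def general_fertility_rate(live_births, women_15_49, per=1000):
    """Live births per `per` women aged 15-49 at mid-year."""
    return live_births * per / women_15_49

live_births = 3_000
population = 100_000
women_15_49 = 25_000   # hypothetical figure, for illustration only

cbr = live_births * 1000 / population                    # births per 1000 population
gfr = general_fertility_rate(live_births, women_15_49)   # births per 1000 women 15-49
print(f"CBR = {cbr:.0f}, GFR = {gfr:.0f}")
```

The same births give a much larger rate once only women who can actually bear children are counted in the denominator.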
Age Specific Fertility Rate
The age specific fertility rate (ASFR) gives the fertility rate for each child-bearing age group.


$$\text{Age specific fertility rate} = \frac{\text{Number of live births to females of a specified age group in a given area during a given period}}{\text{Mid-year female population of the specified age group in the same area during the same period}} \times 1000$$
Age specific fertility rates afford a detailed analysis of fertility in a given population during a given period.
Total Fertility Rate
The total fertility rate is the sum of the age specific fertility rates for each age interval from 15 to 49 years of age; when the rates refer to five-year age groups, the sum is multiplied by 5, the width of each group.
This rate indicates how many children would be born per thousand women (the approximate magnitude of complete family size) reaching child-bearing age, provided none of these women dies before passing the end of the child-bearing age.
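Computing age specific rates and their total takes only a few lines of Python. The birth and population counts below are hypothetical, for illustration only, and the function names are our own. Because each rate refers to a five-year age group, it is counted once for each of the 5 single years in the group.

```python
AGE_GROUPS = ["15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49"]

# Hypothetical counts, for illustration only
births = [500, 2_000, 1_800, 1_200, 600, 200, 50]
women = [10_000] * 7

def age_specific_fertility_rates(births, women, per=1000):
    """ASFR for each age group: live births per `per` women of that group."""
    return [b * per / w for b, w in zip(births, women)]

def total_fertility_rate(asfr_per_1000, group_width=5):
    """Sum of ASFRs; each five-year rate counts for `group_width` single years."""
    return group_width * sum(asfr_per_1000)

asfr = age_specific_fertility_rates(births, women)
tfr = total_fertility_rate(asfr)
for group, rate in zip(AGE_GROUPS, asfr):
    print(f"{group}: {rate:.0f} per 1000 women")
print(f"TFR = {tfr:.0f} per 1000 women, i.e. {tfr / 1000:.3f} children per woman")
```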
Gross Reproduction Rate
The gross reproduction rate (GRR) is the sum of the age specific fertility rates, calculated from female births only, for each single year of child-bearing age.
$$\text{Age specific reproduction rate} = \frac{\text{Number of female live births to mothers of a specified age group in a given area during a given year}}{\text{Total mid-year female population of that age group in the given area during the same year}} \times 1000$$
Summing these age specific reproduction rates over all ages in the reproductive span of life gives the gross reproduction rate.


It indicates the average number of daughters that would be born to a group of 1000 women beginning life together if none of them died before reaching the end of the child-bearing period; it therefore provides an upper limit to the rate of population growth.
If the gross reproductive rate is 1 (expressed per woman), the current generation of females of child-bearing age will just maintain itself at current fertility rates, but only in the absence of mortality.
If the gross reproductive rate is less than 1, the number of females of child-bearing age will decline sooner or later.
Calculation of the gross reproductive rate depends on the availability of data, i.e. the classification of births according to the age of the mother at the time of birth and according to sex.
In the absence of such data, GRR can be found approximately by the formula:
$$\text{GRR} \approx \text{Total fertility rate} \times \frac{\text{Number of female births}}{\text{Total number of births}}$$
The gross reproductive rate has the drawback that it ignores current mortality: some of the females who begin life together may die before reaching the upper limit of the child-bearing age.
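Both the exact definition (female-birth rates summed over the child-bearing ages) and the approximation GRR ≈ TFR × female births / total births can be sketched in Python. All inputs below are hypothetical, for illustration only: seven five-year female-birth rates, a TFR of 3.175 children per woman, and 4,880 girls among 10,000 births. The function names are our own.

```python
# Hypothetical female-birth ASFRs (daughters per 1000 women) for the seven
# five-year age groups 15-19 ... 45-49 (illustration only)
female_asfr = [24, 98, 88, 59, 29, 10, 2]

def gross_reproduction_rate(female_asfr_per_1000, group_width=5):
    """GRR per woman: width-weighted sum of female-birth ASFRs, divided by 1000."""
    return group_width * sum(female_asfr_per_1000) / 1000

def approximate_grr(tfr_per_woman, female_births, total_births):
    """Approximation used when births are not classified by sex and mother's age."""
    return tfr_per_woman * female_births / total_births

grr = gross_reproduction_rate(female_asfr)          # 5 * 310 / 1000
approx = approximate_grr(3.175, 4_880, 10_000)      # hypothetical TFR and sex split
print(f"GRR = {grr:.2f}, approximate GRR = {approx:.2f}")
```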

Net Reproductive Rate


Net reproductive rate indicates the average number of daughters that would be born to a group of women beginning their life together if they were subjected to current fertility and mortality rates throughout their reproductive span of life. It is computed by multiplying the age specific fertility rate for female births of each age or age group by the survival factor of that age or age group; the sum of these products is the net reproductive rate. The survival factor, i.e. the proportion of females surviving to that age, is available from a life table. Thus NRR uses the same specific fertility rates as the gross reproductive rate, but it also takes into account the survival factor taken from a life table.
$$\text{Net reproductive rate} = \sum_{x=15}^{49} b_x L_x$$
where $b_x$ represents the rate of female births at each age $x$, and $L_x$ the number of years lived at each age per woman born to the original group of females. The summation from 15 to 49 covers the reproductive span of life, 15-49 years of age.


The net reproductive rate cannot exceed the gross reproductive rate, because it takes the mortality factor into consideration.
If NRR = 1, then on the basis of current fertility and mortality rates a group of newly born females will exactly replace itself in the next generation, i.e. the population tends to remain constant.
If NRR is greater than one the population tends to increase, and if it is less than one the population tends to decrease.
However, neither the gross reproductive rate nor the net reproductive rate should be used for forecasting future population changes: firstly, because they do not take migration into consideration; secondly, because the future rates of fertility and mortality are unlikely to remain the same as at present.
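With data grouped in five-year age intervals, the sum of $b_x L_x$ can be approximated by weighting each group's female-birth rate by 5 years and by the proportion of women surviving to that group. The Python sketch below does this. The rates and survival proportions are hypothetical (in practice the survival factors come from a life table), and the function names are illustrative. Setting every survival factor to 1 reproduces the gross reproduction rate, which is why NRR can never exceed GRR.

```python
# Hypothetical inputs for the seven five-year age groups 15-19 ... 45-49
female_asfr = [24, 98, 88, 59, 29, 10, 2]                # daughters per 1000 women
survival = [0.95, 0.94, 0.93, 0.92, 0.91, 0.89, 0.87]    # proportion surviving (life table)

def net_reproduction_rate(female_asfr_per_1000, survival, group_width=5):
    """NRR per woman: each female-birth rate weighted by its survival factor."""
    return sum(group_width * f / 1000 * s
               for f, s in zip(female_asfr_per_1000, survival))

def gross_reproduction_rate(female_asfr_per_1000, group_width=5):
    """The same sum with every survival factor set to 1 (no mortality)."""
    return group_width * sum(female_asfr_per_1000) / 1000

nrr = net_reproduction_rate(female_asfr, survival)
grr = gross_reproduction_rate(female_asfr)
print(f"NRR = {nrr:.3f}, GRR = {grr:.3f}")
```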
Life Table
The life table was first developed adequately by the astronomer E Halley (1656-1742). It provides a composite measure of the mortality experience of a community at all ages and permits useful comparison with the experience of other groups.


There are two distinct ways in which a life table may be constructed from mortality data for a large community; the two forms are usually called the current life table and the cohort (or generation) life table.
The current life table describes the survival pattern of a group of individuals subject throughout life to the age specific death rates currently observed in a particular community.
The cohort life table describes the actual survival experience of a group, or cohort, of individuals born at about the same time. It uses the mortality at different ages at the times when the cohort was actually at those ages.
Constructing a life table is a complex procedure. A simplified approach was described by Hill (1966), based on the mortality of males in England and Wales in 1930-32. The main features of a life table can be seen in Table 11.1.
The second column, qx, is the probability that an individual alive at exactly x years will die before his next birthday.
The third column, lx, is the number, out of an arbitrary 1,000 born alive, who would survive to the xth birthday. To survive for this period an individual must survive the first year, then the second, and so on in succession:
$$l_x = l_0\, p_0\, p_1 \cdots p_{x-1}, \qquad \text{where } p_x = 1 - q_x$$
The fourth column, e°x, is the expectation of life at age x. This is the mean length of additional life beyond age x of all the lx people alive at age x. e°x can be calculated approximately as:
$$\mathring{e}_x \approx \frac{1}{l_x}\left(l_{x+1} + l_{x+2} + \cdots\right) + \frac{1}{2}$$
The term in the bracket is the total number of years lived beyond age x by the lx individuals if those dying between ages x and x+1 did so immediately after the xth birthday, and the 1/2 is a correction to allow for the fact that deaths take place throughout each year of age, which very roughly makes the mean survival time in the year of death half a year.

Table 11.1: Current and Cohort life tables for men in England and Wales born around 1931

            Current life table, 1930-32        Cohort life table, 1931 cohort
Age (x)     qx        lx        e°x            lx
0           0.0719    1,000     58.7           1,000
1           0.0153    928.1     62.2           927.8
5           0.0034    900.7     60.1           903.6
10          0.0015    890.2     55.8           894.8
20          0.0032    872.4     46.8           884.2
30          0.0034    844.2     38.2           874.1
40          0.0056    809.4     29.6           -
50          0.0113    747.9     21.6           -
60          0.0242    636.2     14.4           -
70          0.0604    433.6     8.6            -
80          0.1450    162.0     4.7            -

qx = probability of death between ages x and x+1; lx = life table survivors; e°x = expectation of life at age x.

At higher ages the values of lx are greater for the cohort table, because it is based on the mortality rates at the higher ages that were experienced after 1931, and these are lower than the 1931 rates.
Both forms of life table are useful for vital statistical and epidemiological studies.
Current life tables may be used as an alternative method of standardization for comparisons between the mortality patterns of different communities.


Cohort life tables are particularly useful in studies of occupational mortality.
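The two life table relations, the survivorship product for lx and the approximation for the expectation of life, can be sketched in Python. The function names are our own. The first check reproduces the step from l0 = 1,000 to l1 = 928.1 in Table 11.1; the second part uses a hypothetical, deliberately tiny table with single-year qx values ending in qx = 1, because the approximation for e°x needs the lx column to run to the end of life.

```python
def survivors(qx, radix=1000.0):
    """l_x for x = 0, 1, ..., len(qx), using l_{x+1} = l_x * (1 - q_x)."""
    lx = [radix]
    for q in qx:
        lx.append(lx[-1] * (1 - q))
    return lx

def expectation_of_life(lx, x):
    """Approximate complete expectation of life at age x:
    (l_{x+1} + l_{x+2} + ...) / l_x + 1/2.
    Assumes lx runs to the end of life (the last q_x is 1)."""
    return sum(lx[x + 1:]) / lx[x] + 0.5

# First step of Table 11.1: q_0 = 0.0719 gives l_1 = 928.1
print(round(survivors([0.0719])[1], 1))

# Hypothetical closed table with single-year q_x (illustration only):
# l_x = 1000, 900, 720, 360, 0
toy = survivors([0.1, 0.2, 0.5, 1.0])
print([round(v, 1) for v in toy])
print([round(expectation_of_life(toy, x), 2) for x in range(4)])
```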
Growth Rate
India is one of the most populous countries in the world. However, the most recent data indicate a decline in India's population growth rate.
The national health goal was to attain a birth rate of 21 and a death rate of 9 per thousand by 2007. This would have yielded an annual growth rate of 1.2 per cent (21 - 9 = 12 per thousand), which was considered essential for the stabilization of the population of India over the next 50 years or so.
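The conversion from birth and death rates to a growth rate is worth making explicit: the difference of two rates per 1000 is divided by 10 to express it per cent. A minimal Python sketch, with an illustrative function name; like the text's figure, it ignores migration.

```python
def natural_growth_rate_percent(cbr_per_1000, cdr_per_1000):
    """Annual rate of natural increase in per cent; migration is ignored."""
    return (cbr_per_1000 - cdr_per_1000) / 10

goal = natural_growth_rate_percent(21, 9)
print(f"Growth rate = {goal:.1f} per cent per year")  # Growth rate = 1.2 per cent per year
```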

MULTIPLE CHOICE QUESTIONS


1. To calculate the vital statistics the population used is
as on:
(a) January 1
(b) April 1
(c) July 1
(d) October 1
(AI, 90)
2. The following are all-India fertility indicators (1985)
except:
(a) Total fertility rate 4.51
(b) Net fertility rate 1.61
(c) Child women ratio 605
(d) General fertility rate 151
(AI,91)
3. Perinatal mortality is:
(a) Still births
(b) Neonatal deaths
(c) Still birth + early neonatal deaths
(d) Still birth + neonatal deaths

(AI, 92)


4. Denominator for general fertility rate is:


(a) All females
(b) Females of reproductive age group
(c) All females above 15 years of age
(d) Mid year population

(AI, 92)

5. Still birth rate includes babies dead after:


(a) 20 weeks
(b) 24 weeks
(c) 28 weeks
(d) 32 weeks
(AI, 94)
6. In India perinatal mortality rate is related to:
(a) Late foetal death (still birth) + death under 1 week
(b) Late foetal death (still birth) + death under 2 weeks
(c) Late foetal death and early neonatal death weighing
over 1000 gm at birth
(d) Late foetal death and early neonatal death weighing
over 1500 gm at birth
(AI, 96)
7. Denominator of maternal mortality rate is:
(a) 1000 total births
(b) Mid year population
(c) 1000 live births
(d) Total live births
(AI,96,98)
8. Two populations can be compared with:
(a) Proportional death rate
(b) Specific death rate
(c) Standardized death rate
(d) Crude death rate
(AI, 97, 99)
9. Annual growth rate is:
(a) Crude death rate-crude birth rate/mid year
population
(b) Crude birth rate - crude death rate
(c) Demographic gap
(d) Birth rate of that year
(AI, 97)


10. In perinatal mortality rate all are true except:


(a) Death of neonate within one week
(b) Late still birth and death beyond 28 weeks
(c) Deaths of babies weighing more than 1000 gm in numerator
(d) Total number of birth in numerator
[Hint: According to revised definition of perinatal mortality
by WHO, 1970].
(AI, 98)
11. Total fertility rate indicates:
(a) Approximate magnitude of complete family size
(b) Women of child bearing age (15-49) is numerator
(c) All live birth in numerator
(d) Married women (15-49) of child bearing age in
numerator
(AI, 98; AIIMS, Dec, 97)
12. All are indicators of the physical quality of life index except:
(a) Infant mortality rate
(b) Life expectancy at age one
(c) Literacy
(d) Per capita gross national product
(AI, 99)
13. All are true regarding direct standardization except:
(a) Age specific death rate is required for comparison
(b) Age composition of the population is required
(c) Vital statistics required
(d) Without knowledge of the composition of the population, two samples are compared
(AI, 2000)
14. In a population of 10,000, birth rate is 36 per thousand.
There are 5 maternal deaths. The maternal mortality
rate is:
(a) 36.5
(b) 13.8
(c) 20
(d) 5
(AI, 2001)


15. Census in India was done:


(a) Every year
(b) Every 5 years
(c) Every 10 years
(d) As and when noted
(JIPMER, 81; UPSC, 85)
16. Basic events recorded by vital statistics:
(a) Death
(b) Birth
(c) Divorces
(d) All of the above
(e) Only b and c
(AIIMS, 80; UPSC, 86)
17. Sample registration system is done once in:
(a) 6 months
(b) 1 year
(c) 2 years
(d) 5 years
(PGI, 95)
18. In family welfare programme, score of 1 is given to:
(a) Birth rate
(b) Net reproductive rate
(c) Achievement of goal
(d) Total implementation of programme
(AIIMS, May 95)
19. Denominator of birth rate is:
(a) Mid year population
(b) Total number of deaths
(c) Women of child bearing age group
(d) Total number of eligible couples
(AIIMS, Dec 95)
20. Most significant indicator of fertility is:
(a) Net reproductive rate
(b) Family size
(c) Gross reproductive rate
(d) General fertility rate

(AIIMS, 96)


21. Apart from standardized death rates, by which method can we compare the mortality pattern of two communities:
(a) Cohort life table
(b) Crude death rate
(c) Current life table
(d) No other suitable method
22. Denominator of crude death rate:
(a) 1000 live births
(b) Mid year population
(c) Total number of deaths in the community
(d) Total number of case population in community
(AIIMS, June 97; Nov 2002)
23. Denominator of maternal mortality rate:
(a) Per 1000 live births
(b) per 1000 births
(c) Mid year population
(d) Total number of females in the population
(AIIMS, Feb. June, 97)
24. India having crude birth rate from 93 to 80 and death
rate from 21 to 10, the country is in which stage of
demographic cycle:
(a) Late expanding
(b) Early expanding
(c) High stationary
(d) Low stationary
(AIIMS,Dec 97)
25. Infant mortality rate is considered:
(a) Below 1 month
(b) Below 1 year
(c) Upto 1 year
(d) 28 days
(AIIMS, Dec 97; Nov 2000)
26. Denominator of general fertility rate is:
(a) All women between 15-45 years
(b) All married women between 15-45
(c) Total number of live births
(d) Total number of all births
(AIIMS, Dec 97)


27. A community survey shows that the crude birth rate is 23 and the crude death rate is 6. The demographic stage of the population is:
(a) High stationary
(b) Early expanding
(c) Late expanding
(d) Low stationary
(AIIMS, Nov 99)
28. In standardization of a population all are true except:
(a) Two population are compared
(b) One standard population is compared with other
population
(c) Age specific rates are taken
(d) Number of cases are detected
(AIIMS, June 2000)
29. General fertility rate is a better measure of fertility than
the crude fertility rate because the denominator
includes:
(a) 15-45 years of age female
(b) Mid year population
(c) Total women population
(d) Married women population
(AIIMS, Nov 2001)
30. In Crude death rate the population is taken as on:
(a) 1st March
(b) 1st July
(c) 1st April
(d) 15th August
(AIIMS, Nov 2001)
31. The socioeconomic status of community is best
indicated by:
(a) IMR
(b) Under 5 mortality rate
(c) Maternal mortality rate
(d) Perinatal mortality rate
(AIIMS, Nov. 2001)


32. The death rates of two countries can best be compared by:
(a) Standardized death rate
(b) Age adjusted death rate
(c) IMR
(d) CDR
(AIIMS, Nov 2001)
33. Calculate IMR if in a population of 100,000 there are
3000 live births in a year and 150 infant deaths in the
same year:
(a) 75
(b) 18
(c) 5
(d) 50 (AIIMS, Nov 2001)
34. Which of the following rates is not only an indicator of
mortality but also of the living standard of a community:
(a) IMR
(b) PNMR
(c) MMR
(JIPMER, 2003)
35. About direct standardization all are true except:
(a) Age specific death rate is not needed
(b) A standard population is needed
(c) Population should be comparable
(d) Two populations are compared
(AI, 2002)
36. In what stage of demographic cycle is India today:
(a) Low stationary
(b) High stationary
(c) Early expanding
(d) Late expanding
(AIIMS, 87; JIPMER, 86; UPSC,88; Kerala 87)
37. Age adjusted death rates are used to:
(a) Correct death rates for error in the statement of age
(b) Determine the actual number of deaths that occurred
in specific age groups of population
(c) Compare death rates of persons of the same age
(d) Eliminate the effect of difference in the age
distribution of population in comparing death rates
38. Infant mortality does not include:
(a) Early neonatal mortality
(b) Pernatal mortality
(c) Post neonatal mortality
(d) Late neonatal mortality

(AI, 2005)

39. The age and sex structure of a population may be described by a:
(a) Life Table
(b) Correlation Coefficient
(c) Population pyramid
(d) Bar Chart
(AIIMS, 2005)
40. A one day census of patients in a mental hospital could:
(a) Give good information about patient in that hospital
at that time
(b) Give reliable estimates of seasonal factors in
admission
(c) Enable us to draw conclusions about the mental hospitals of India
(d) Enable us to estimate the distribution of different
diagnosis in mental illness in the local area
(AIIMS, 2005)

Chapter 12

Health Information


According to WHO (1973), a Health Information System is defined as:
"A mechanism for the collection, processing, analysis and transmission of information required for organizing and operating health services, and also for research and training."
The primary objective of a health information system is to provide reliable and complete health information at all levels to those who are concerned with the health system of the country.
Distinction between Data and Information
Data consist of discrete observations of attributes or events. These data can be transformed into information by summarizing them and adjusting them for variations. Data that are not transformed into information are of little use.
Sources of Health Information
Health information is collected in various ways. A comprehensive health information system requires information on demography and vital events, and on health status, which can be assessed through mortality, morbidity, quality of life, etc. The following are the major sources of data on demographic and social statistics:
Population Census
The census is an important source of health information. A census is defined as the total process of collecting, compiling and publishing demographic, economic and social data pertaining to all persons in a specified area at a specified time; in other words, a population census is a complete enumeration, at a specified time, of the individuals inhabiting a specified area.
In India, the population census is undertaken at an interval of ten years. In the population census, particulars are collected about the age, sex, and social, economic, ethnic and familial characteristics of individuals.
A population census directly supplies the data on vital
events, e.g. births and deaths that have occurred.
The first regular census was taken in 1881. The last census, conducted from February 9 to 28, 2001, reported that the total population on 1st March 2001 was 1.03 billion (1,027,015,247). The decennial census figures reflect the demographic position as on 1st March of the first year of every decade.
Although the primary function of the census is to provide demographic information, such as the total count of the population and its breakup by sex and age, the census also contains a great deal of information not only on the demographic profile of the country but also on the social and economic characteristics of the people.
The population census of a country of the size of India is
a gigantic exercise and requires enormous efforts from the
Central and State Governments. Since its inception in 1872,
the conduct of census has greatly improved in terms of
content, coverage, quality and speedy release of census data.
Registration of Vital Events
In 1873 the Government of India passed the Births, Deaths and Marriages Registration Act. The civil registration system envisages the recording of each and every vital event for legal purposes, and in the process it captures a great deal of information on various characteristics of these events, such as age of the mother, religion, cause of death and age at death, which helps in the compilation of a continuous series of vital statistics.
Timely, accurate and complete registration of births and deaths is crucial for understanding population dynamics at the local and policy levels and for planning effective health and development programmes.


In India, the Registration of Births and Deaths (RBD) Act, 1969, which replaced all the laws that previously existed on the subject, together with the Model Rules framed under the Act, introduced uniform legislation to overcome the problem of the multiplicity of acts and rules in the country. However, registration of marriages and divorces is not compulsory in India.
The Registration of Births and Deaths (RBD) Act, 1969:
The Act came into force on 1st April, 1970. It provides for the compulsory registration of births and deaths throughout the country, and for the compilation of vital statistics in the states, so as to ensure uniformity and comparability of data.
The time limit for registering births is 14 days and that for deaths is 7 days. In case of default, a fine of up to Rs. 50 can be imposed.
Sample Registration System
The Sample Registration System (SRS) is a large-scale demographic survey conducted in India to provide reliable annual estimates of the birth rate, death rate and other fertility and mortality indicators at the national and sub-national levels.
The field investigation consists of continuous enumeration of births and deaths by a resident part-time enumerator, generally a teacher, followed by an independent survey every six months by an official. The data obtained through these two operations are matched. Unmatched and partially matched events are re-verified in the field, and thereafter an unduplicated count of births and deaths is obtained.
The SRS was initiated by the Office of the Registrar General, India, on a pilot basis in a few selected states in 1964-65. It became fully operational during 1969-70, covering about 3700 sample units. Thereafter the sample size has been increased periodically. The sampling frame was recently updated on the basis of 1991 Census data.
The sample unit in rural areas is a village, or a segment of it if the village has a population of 1500 or more. In urban areas the sample unit is a census enumeration block with a population ranging from 750 to 1000. At present the SRS covers 6671 sample units (4436 rural and 2235 urban) in all the states and union territories of India, covering 1.1 million households and a population of about 6 million.
The survey is conducted every six months by a competent supervisor, and the results of the sample survey, indicating the vital rates, are published in June and October every year.
Hospital Records
Records of in-patients and out-patients maintained by various hospitals and health centers, containing information such as age, sex, nature of illness, type of treatment and outcome, serve as a good source of health-related information in the country.
Sample Surveys
India has one of the largest sample survey organizations in the world, namely the National Sample Survey Organization (NSSO). This organization was set up in 1950 and has conducted 59 rounds of various kinds of surveys. The NSSO, under the Ministry of Statistics and Programme Implementation, has taken up a number of countrywide household-based surveys, mainly collecting data on vital events. In addition, the International Institute for Population Sciences, another specialized organization in this area, has conducted two National Family Health Surveys (NFHS), one during 1992-93 and another during 1998-99. These surveys collected information on the health status of the population, family planning, fertility and mortality, as well as the growth and development of children, levels of anemia among women and children, and related parameters at the state and national levels.

MULTIPLE CHOICE QUESTIONS


1. Continuous recording of births and deaths in a community, combined with a survey every 6 months, is known as:
(a) Sample registration system
(b) Data linkage
(c) Hospital records
(d) Notification system
(AIIMS, 96)
2. In India, births and deaths are to be registered within ____ days.
(a) 3 days
(b) 7 days
(c) 14 days
(d) 30 days
[Hint: Though deaths are to be registered within 7 days and births within 14 days, if a single answer is to be given, both are registered within 14 days].
(UPSC, 86; AMC, 92)
3. In India, a death is to be registered within ____ days.
(a) 3 days
(b) 7 days
(c) 14 days
(d) 30 days
(UPSC, 87; Delhi 93)
4. Basic events recorded by vital statistics:
(a) Deaths
(b) Births
(c) Divorce
(d) All the above
(e) Only b & c
(AIIMS 80, UPSC 86)


5. Sample registration system is done once in:


(a) 6 months
(b) 1 year
(c) 2 years
(d) 5 years
(PGI, 95)
6. Registration of births and deaths with a 6 monthly
survey is done in:
(a) National sample survey
(b) Vital statistical system
(c) Census
(d) Sample registration system
(AIIMS, 96)
7. Sample registration system was started to acquire
information on which of the following:
(a) Birth and death rate for the states and the country
(b) Migration status
(c) Death rate from rural area
(d) Morbidity rate of various disease
(Rajasthan, 97)
8. Which of the following is the national level system that
provides annual national as well as state level reliable
estimates of fertility and mortality:
(a) Civil Registration System
(b) Census
(c) Adhoc Survey
(d) Sample Registration System
(AIIMS, 2005)
9. National Family Health Survey has successfully
completed:
(a) One round
(b) Two rounds
(c) Three rounds
(d) Four rounds
(AIIMS, 2005)

Chapter 13

A Report on
Census 2001


HISTORY OF CENSUS
There is evidence that censuses were conducted in several countries in ancient times. However, since the purpose of a census in those days was recruitment to the army or taxation, it was not as exhaustive and elaborate as in modern times. There are indications that censuses were conducted as early as 3000 BC in parts of Babylon, China and Egypt. The importance of the census is mentioned in the Arthashastra by Chanakya and in the Bible.
The modern system of census taking started in the 18th century. The first such census was conducted in Sweden in 1749. Decennial censuses started in 1790 in the United States of America and in 1801 in England.
India is one of the few countries with an unbroken series of decennial censuses spanning over a hundred years. India's history of census taking dates back to 1865-75, when a systematic census was taken. The first synchronous census in India was taken in 1881, and since then a census has been taken every ten years without a break.
After Independence, the census in India has been conducted under the Census Act of 1948. The census of 2001 is the fourteenth in the regular series and the sixth after Independence. It has the distinction of being the first census of the new millennium and of the twenty-first century. It is also the first census to be held after India crossed the one billion mark according to population projections.
ADMINISTRATIVE PROCEDURE FOR CONDUCTING
CENSUS IN INDIA
Census is a joint effort by the Union and State Governments
in India. Under the Census Act, 1948, the Central Government

A Report on Census 2001 249

notifies the intention of taking a census and appoints the
Census Commissioner for the country and the Directors of
Census Operations for the States and Union Territories.
Thereafter, each State Government notifies the appointment
of census officers to conduct the census in their respective
jurisdictions, under the guidance of the Director of Census
Operations, who is notified as the Chief Principal Census
Officer of the State. Accordingly, for the Census of 2001 the
District Collectors were appointed as the Principal Census
Officers of their respective districts, Deputy Collectors
(General) were notified as District Census Officers, and
Revenue Divisional Officers (RDOs) as Sub-Divisional Census
Officers. Tahsildars and the Secretaries of Corporations and
Municipalities were notified as Charge Officers and City/Town
Census Officers for their respective jurisdictions. Divisional
Forest Officers are the Divisional Forest Census Officers for
forest areas.
Preparation for the Census of India 2001 started well
before the actual enumeration, with the collection of details of
changes in the jurisdiction of administrative units since the last
census. This was required for preparing an updated list of
administrative units. To enable the publication of panchayat-
ward-wise and village-wise data after the census, details of the
panchayat wards falling in each village were collected, along
with notional maps showing these wards. After the list of
villages had been compiled, rural and urban areas were
demarcated.
Census operations are conducted in two phases: house
numbering and houselisting in the first phase, and the actual
population enumeration in the second phase. During the
houselisting operations, all houses are numbered, a layout
map of the houses is prepared and a questionnaire called the
houselist schedule is filled in.

250 Medical Statistics and Demography Made Easy

CENSUS 2001
The last census was carried out in 2001; enumeration took
place from February 9 to 28. The provisional report was released
by the Registrar General of India and Census Commissioner on
March 26, 2001. It gives provisional demographic data and vital
statistics as on 1st March 2001.
CENSUS TO COLLECT DATA ON
DISABLED FOR FIRST TIME
For the first time, a clear picture of disabled people in the
country was expected to emerge, with the government collecting
data on them during Census 2001 in February.
Though there were no official census statistics on the number of
disabled persons in the country, the National Sample Survey
of 1991 showed that 1.9 per cent of the population was disabled.
According to the UN, however, about 10 per cent of the population
of any developing country is disabled.
When our figure is compared with the percentage of people with
disabilities in other Asian countries, we tend to take false pride
in it. The truth is that we have never carried out a comprehensive
collection of data on disabled people in the country that could
give a true picture.
Limited information about disabled persons was collected in the
censuses from 1872 to 1931. This was then discontinued, and no
attempt was made in 1941, 1951, 1961 or 1971 to collect data on
the disabled. The only exception was the 1981 Census, which
coincided with the International Year of Disabled Persons, and
even that was done in a half-hearted manner. As the authorities
found it cumbersome, disability was dropped again in 1991.
Finally, disability was included as a criterion for data
collection in Census 2001.

The census offered an opportunity to collect data from every
nook and corner of the country in a systematic and scientific
manner. Such data are essential for policy planning, for
allocating funds across regions and for the well-being of the
70 million disabled citizens of the country.
FERTILITY TRENDS (DEMOGRAPHIC TRANSITION)
From the high birth rate-high death rate pattern of the pre-independence
era, India had reached the stage of high birth rate-low death
rate by the early fifties, which led to a high rate of population
growth. This continued until the beginning of the eighties, with
decadal growth rising to 21.51% in the fifties from 13.31% in the
previous decade, followed by 24.80% in the sixties and 24.6% in
the seventies.
The transition to the phase of low birth rate-low death
rate has started at a very slow pace, with decadal population
growth sliding to 23.85% in the eighties and 21.34% in the
nineties.
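The decadal figures above can be turned into an average annual growth rate. Assuming growth is exponential within each decade (an assumption made here, not stated in the text), the annual rate is r = ln(1 + g/100)/10, where g is the decadal growth in per cent. A minimal Python sketch; the helper name `annual_exponential_rate` is illustrative:

```python
import math

def annual_exponential_rate(decadal_growth_pct: float, years: int = 10) -> float:
    """Average annual exponential growth rate (%) implied by growth over `years` years."""
    # P_t = P_0 * exp(r * t)  =>  r = ln(P_t / P_0) / t
    return 100 * math.log(1 + decadal_growth_pct / 100) / years

# Decadal growth rates (%) quoted in this section
decades = {
    "1951-61": 21.51,
    "1961-71": 24.80,
    "1971-81": 24.6,
    "1981-91": 23.85,
    "1991-2001": 21.34,
}

for period, g in decades.items():
    print(f"{period}: decadal {g:.2f}% -> about {annual_exponential_rate(g):.2f}% per year")
```

For 1991-2001 this gives roughly 1.93% a year. Simply dividing the decadal figure by ten (2.13%) would overstate the annual rate, because it ignores compounding.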
The changing demographic profile of the country
indicates a few important trends:
1. Although the percentage of the child population below the
age of 14 has declined from 37.6% to 34.33% in 2001, the
absolute number of children is still rising. The
population of children in the age group 0-6 increased
from 150.42 million in 1991 to 157.86 million in 2001.
Likewise, the population in the age group 7-14 also showed
an increasing trend. Together these two groups constitute
31.9% of the total population. Adolescents in the age
group 15-19 constitute nearly another 10% of the
population.
2. The second trend is the declining sex ratio. The
overall sex ratio has declined almost consistently over
the years, from 972 females per 1000 males in 1901 to 933
in 2001. The decline has been particularly striking in the
age group 0-6 years in recent years.
3. While the sex ratio among children is declining, it is
increasing among the aged because of the higher life
expectancy of females, leading to a marked gender
asymmetry in the age pyramid of the country.
4. The age pyramid of India's population will swell in the
middle in the years to come. At present 58% of Indians
are in the age group 15-59 years; this is expected to
increase to nearly 64% during the next 10 years.
DENSITY
Crowding has worsened: India's population density according to
Census 2001 was 324 persons per square kilometre, 57 more than
in 1991. The highest population density, 9,294 persons per square
kilometre, was recorded in Delhi.
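Population density is simply the number of persons divided by land area. As a check on the Delhi figure, the sketch below uses Delhi's provisional 2001 population from the tables later in this chapter (13,782,976) and assumes a land area of 1,483 sq km; that area is not given in this text and is an assumption of the example:

```python
def density(population: int, area_sq_km: float) -> float:
    """Population density in persons per square kilometre."""
    return population / area_sq_km

DELHI_POPULATION_2001 = 13_782_976  # provisional Census 2001 total for Delhi
DELHI_AREA_SQ_KM = 1_483            # ASSUMED land area; not given in this chapter

print(round(density(DELHI_POPULATION_2001, DELHI_AREA_SQ_KM)))  # 9294

# 324 in 2001 was "57 more than in 1991", so the 1991 density was about:
print(324 - 57)  # 267
```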
LITERACY
Literacy is among the most promising aspects of the latest
Census. India's literacy rate increased by 13 percentage points,
from 52% in 1991 to 65% in 2001. Seventy-six percent of males
and 54% of females are now literate, compared with 64% and
39% respectively in 1991.
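A change measured in percentage points is not the same as a relative (percent) change. The short sketch below applies both definitions to the literacy rates just quoted; the function names are illustrative:

```python
def point_change(old_pct: float, new_pct: float) -> float:
    """Absolute change, in percentage points."""
    return new_pct - old_pct

def relative_change(old_pct: float, new_pct: float) -> float:
    """Relative change, as a percentage of the starting value."""
    return 100 * (new_pct - old_pct) / old_pct

# Literacy rates (%) quoted in the text: (1991, 2001)
rates = {"Persons": (52, 65), "Males": (64, 76), "Females": (39, 54)}

for group, (r1991, r2001) in rates.items():
    print(f"{group}: +{point_change(r1991, r2001)} percentage points, "
          f"+{relative_change(r1991, r2001):.1f}% relative to 1991")
```

For persons, the same gain is 13 percentage points but a 25% relative increase. The text's figure of 13 refers to percentage points.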
GROWTH RATE
While the world's population increased more than threefold
during the last century, India's grew more than fourfold. Still,
India's growth rate over the last 10 years (21.34%) was lower
than in the previous 10-year period (23.85%), marking the
biggest drop in the decadal growth rate since India became
independent in 1947.
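The claim that 1991-2001 saw the largest fall in the growth rate since Independence can be checked against the decadal figures quoted under Fertility Trends. The sketch below uses only those figures and is an illustration, not part of the census methodology:

```python
# Decadal growth rates (%) as quoted under Fertility Trends
growth = [
    ("1951-61", 21.51),
    ("1961-71", 24.80),
    ("1971-81", 24.6),
    ("1981-91", 23.85),
    ("1991-2001", 21.34),
]

# Change in the decadal growth rate from one decade to the next (percentage points)
changes = [
    (period, round(rate - growth[i - 1][1], 2))
    for i, (period, rate) in enumerate(growth)
    if i > 0
]
for period, delta in changes:
    print(f"{period}: {delta:+.2f} points")

largest_drop = min(changes, key=lambda item: item[1])
print("Largest drop:", largest_drop)  # Largest drop: ('1991-2001', -2.51)
```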
The decline in growth rate can be attributed to effective
implementation of government-sponsored programmes
aimed at improving reproductive health services and bringing
the fertility rate down to replacement level.
At the state level, growth rates vary widely. Three
southern states, Kerala, Tamil Nadu and Andhra Pradesh,
had the lowest rates, with Andhra Pradesh registering the
most dramatic decline in its growth rate since the last Census:
down from 24% to 14%. Uttar Pradesh added the most people,
34 million.
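Decadal growth can also be run backwards to recover the earlier population: P(1991) = P(2001) / (1 + g/100). The sketch below uses Uttar Pradesh's provisional 2001 total (166,052,859) and its 1991-2001 variation of 25.80% from the provisional tables later in this chapter. Because g is rounded to two decimals in the source, the result is approximate; the helper name is illustrative:

```python
def population_ten_years_earlier(current: int, decadal_growth_pct: float) -> float:
    """Back-calculate the earlier census total from the current total and decadal growth (%)."""
    return current / (1 + decadal_growth_pct / 100)

UP_2001 = 166_052_859   # Uttar Pradesh, provisional total 2001
UP_GROWTH_PCT = 25.80   # variation 1991-2001 (%), rounded in the source table

up_1991 = population_ten_years_earlier(UP_2001, UP_GROWTH_PCT)
added = UP_2001 - up_1991
print(f"Implied 1991 population: {up_1991 / 1e6:.1f} million")
print(f"Added in 1991-2001:      {added / 1e6:.1f} million")
```

The increase comes out at about 34 million, matching the figure in the text.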
THE STATUS OF WOMEN'S HEALTH
Although the women of India have made major gains in terms of
declining maternal mortality, rising life expectancy, increasing
female literacy and employment, mobilization through self-help
groups and representation in grassroots democracy, India still
accounts for nearly 25% of the world's maternal deaths. Every
year about 1,25,000 Indian women die from pregnancy-related
causes, many of which are preventable. Poor maternal health
results in low birth weight and premature babies. More than 7%
of newborn babies die in infancy every year. Nearly 2.3% of the
babies who survive the first year die before they complete five
years. The figures are higher for female babies.
The mean age at marriage at the national level is 19.5
years, and about 17.4% of girls are married below the age of 18
years, with a marked difference between rural (20.3%) and
urban (7.4%) areas. About 8.3% of fertility in India is
contributed by mothers below 19 years of age, and this is also
linked with pregnancy wastage, ranging from premature deaths,
stillbirths and neonatal deaths to low birth weight babies and
maternal mortality.
A significant correlation between the spread of female literacy
and the decline of fertility has been observed throughout the
country, although there are regions where fertility has declined
despite widespread illiteracy. Considerable gains have been
made in female literacy in recent years, but 45.84% of females
aged seven years and above are still illiterate. The gap
between male and female literacy is still as wide as 22
percentage points.
THE STATUS OF CHILDREN
Nearly one-third of the population of India, i.e. about 328.20
million, are children below 15 years. About 45% of its
population are minors, i.e. below the age of 18 years.
Although significant improvement has taken place in
crucial indicators such as the infant mortality rate, school
enrolment rate, drop-out rate and level of malnutrition,
infant mortality has remained around 72 per 1000 live births,
with no significant improvement in the nineties, unlike in the
eighties. This is far above the average of 6 in developed
countries, 64 in developing countries and the world average
of 59. The underlying reasons for high infant mortality are
early marriage and childbearing, lack of spacing between
births, inadequate maternal nutrition, inadequate antenatal
care and the large proportion of deliveries not supervised by
trained attendants.
The child mortality rate (CMR) of India is also very high:
25 out of every 1000 children who survive the first year die
before they complete 5 years. Malnutrition continues to be a
principal underlying cause of morbidity and mortality among
children under five. This is because intrauterine growth
retardation caused by inadequate nutrition during pregnancy
cannot be corrected later on.
One out of three children in India is born with low birth
weight (< 2500 g). Such children commonly succumb to acute
respiratory infections and diarrhoea.
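The infant and child mortality figures quoted above can be combined into an under-five mortality estimate. Take q0 as the probability of dying in the first year (IMR/1000) and q1_4 as the probability that a child who survives the first year dies before age five (CMR/1000, following the way CMR is defined in this text). Then the under-five probability is 1 - (1 - q0)(1 - q1_4). A sketch using the figures quoted here (72 and 25 per 1000); the function name is illustrative:

```python
def under_five_mortality(imr_per_1000: float, cmr_per_1000: float) -> float:
    """Under-five mortality per 1000 live births.

    imr_per_1000: deaths before age 1 per 1000 live births
    cmr_per_1000: deaths between ages 1 and 5 per 1000 children who survive to age 1
    """
    survive_infancy = 1 - imr_per_1000 / 1000
    survive_childhood = 1 - cmr_per_1000 / 1000
    return 1000 * (1 - survive_infancy * survive_childhood)

print(round(under_five_mortality(72, 25), 1))  # 95.2
```

Simply adding 72 + 25 = 97 slightly overstates the result, because the 25 deaths are counted per 1000 survivors of infancy, not per 1000 births.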
Malnutrition constitutes a major threat to the development
potential of young children. Although surveys conducted by the
National Nutrition Monitoring Bureau, Hyderabad, have
confirmed a declining trend in severe and moderate
malnutrition among children, micronutrient deficiencies
(vitamin A, iron and iodine) continue to affect children to
varying degrees.
ISSUES OF ADOLESCENTS
Adolescents comprise about a fifth of India's population, but
they have so far not been recognized as a target of any
development strategy in the country. They are an important
human resource that can be effectively moulded and channelled
for nation building.
Provisional Population Totals: India
The total population of India as at 0:00 hours on 1st March
2001 stood at 1,027,015,247 persons. With this, India became
only the second country in the world, after China, to cross the
one billion mark. The population of the country rose by 21.34%
between 1991 and 2001. The sex ratio (i.e. number of females
per thousand males) of the population was 933, up from 927
at the 1991 Census. The total literacy rate was returned as
65.38%.


Population:
    Persons    1,027,015,247
    Males        531,277,078
    Females      495,738,169
Sex ratio: 933

Decadal growth 1991-2001:
    Persons    (+)21.34%
    Males      (+)20.93%
    Females    (+)21.79%

Population (0-6 years):
    Persons      157,863,145
    Males         81,911,041
    Females       75,952,104
Sex ratio (0-6 years): 927

Percentage of population (0-6) to total population:
    Persons    15.42%
    Males      15.47%
    Females    15.36%

Number of literates:
    Persons      566,714,995
    Males        339,969,048
    Females      226,745,947

Literacy rate (literates as percentage of population aged 7 years and above):
    Persons    65.38%
    Males      75.85%
    Females    54.16%

(Source: Provisional Population Totals: India. Census of India 2001, Paper 1 of 2001).
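The sex ratio used throughout this chapter is the number of females per 1000 males. A minimal sketch that reproduces the provisional figures from the counts in the table above; the function name is illustrative:

```python
def sex_ratio(females: int, males: int) -> int:
    """Number of females per 1000 males, rounded to a whole number."""
    return round(1000 * females / males)

print(sex_ratio(495_738_169, 531_277_078))  # all ages: 933
print(sex_ratio(75_952_104, 81_911_041))    # age 0-6: 927
```

The same function applied to Kerala's provisional counts (16,369,955 females, 15,468,664 males) gives 1,058, the only state figure above 1,000 in the provisional state table.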
Basic Population Data, Census of India 2001
(India, States and Union Territories)

Code  India/State/UT                  Households     Population  Household size
      India@                         193,579,954  1,028,610,328        5.3
01    Jammu and Kashmir                1,568,519     10,143,700        6.5
02    Himachal Pradesh                 1,221,589      6,077,900        5.0
03    Punjab                           4,348,580     24,358,999        5.6
04    Chandigarh                         206,465        900,635        4.4
05    Uttaranchal                      1,603,242      8,489,349        5.3
06    Haryana                          3,712,319     21,144,564        5.7
07    Delhi                            2,733,383     13,850,507        5.1
08    Rajasthan                        9,317,675     56,507,188        6.1
09    Uttar Pradesh                   25,757,640    166,197,921        6.5
10    Bihar                           13,744,130     82,998,509        6.0
11    Sikkim                             114,223        540,851        4.7
12    Arunachal Pradesh                  215,574      1,097,968        5.1
13    Nagaland                           328,057      1,990,036        6.1
14    Manipur@                           375,095      2,166,788        5.8
15    Mizoram                            176,134        888,573        5.0
16    Tripura                            664,334      3,199,203        4.8
17    Meghalaya                          418,850      2,318,822        5.5
18    Assam                            4,914,823     26,655,528        5.4
19    West Bengal                     15,872,083     80,176,197        5.1
20    Jharkhand                        4,799,081     26,945,829        5.6
21    Orissa                           7,738,065     36,804,660        4.8
22    Chhattisgarh                     4,091,551     20,833,803        5.1
23    Madhya Pradesh                  10,912,025     60,348,023        5.5
24    Gujarat                          9,691,362     50,671,017        5.2
25    Daman and Diu                       35,686        158,204        4.4
26    Dadra and Nagar Haveli              45,586        220,490        4.8
27    Maharashtra                     19,576,736     96,878,627        4.9
28    Andhra Pradesh                  17,004,305     76,210,007        4.5
29    Karnataka                       10,401,918     52,850,562        5.1
30    Goa                                294,812      1,347,668        4.6
31    Lakshadweep                          9,993         60,650        6.1
32    Kerala                           6,726,356     31,841,374        4.7
33    Tamil Nadu                      14,665,983     62,405,679        4.3
34    Pondicherry                        215,538        974,345        4.5
35    Andaman and Nicobar Islands         78,242        356,152        4.6

Source: Primary Census Abstract: Census of India 2001.
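Household size in the table is population divided by the number of households. A sketch that recomputes a few entries, rounded to one decimal as in the table. Only the rows listed in the dictionary are checked; the function name is illustrative:

```python
def household_size(households: int, population: int) -> float:
    """Average number of persons per household."""
    return population / households

# (number of households, population) for a few rows of the table above
rows = {
    "India": (193_579_954, 1_028_610_328),
    "Uttar Pradesh": (25_757_640, 166_197_921),
    "Kerala": (6_726_356, 31_841_374),
    "Tamil Nadu": (14_665_983, 62_405_679),
}

for name, (households, population) in rows.items():
    print(f"{name}: {household_size(households, population):.1f}")
```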



Provisional Population Totals: India - Part 1

S.no India/State/UT*                    Persons        Males      Females  Variation    Sex ratio
(Variation: 1991-2001, in per cent. Sex ratio: females per thousand males. * Union territory.)

    India (1,2)                   1,027,015,247  531,277,078  495,738,169      21.34          933
1   Andaman and Nicobar Is.*            356,265      192,985      163,280      26.94          846
2   Andhra Pradesh                   75,727,541   38,286,811   37,440,730      13.86          978
3   Arunachal Pradesh                 1,091,117      573,951      517,166      26.21          901
4   Assam                            26,638,407   13,787,799   12,850,608      18.85          932
5   Bihar                            82,878,796   43,153,964   39,724,832      28.43          921
6   Chandigarh*                         900,914      508,224      392,690      40.33          773
7   Chhattisgarh                     20,795,956   10,452,426   10,343,530      18.06          990
8   Dadra and Nagar Haveli*             220,451      121,731       98,720      59.20          811
9   Daman and Diu*                      158,059       92,478       65,581      55.59          709
10  Delhi*                           13,782,976    7,570,890    6,212,086      46.31          821
11  Goa                               1,343,998      685,617      658,381      14.89          960
12  Gujarat (5)                      50,596,992   26,344,053   24,252,939      22.48          921
13  Haryana                          21,082,989   11,327,658    9,755,331      28.06          861
14  Himachal Pradesh (4)              6,077,248    3,085,256    2,991,992      17.53          970
15  Jammu and Kashmir (2,3)          10,069,917    5,300,574    4,769,343      29.04          900
16  Jharkhand                        26,909,428   13,861,277   13,048,151      23.19          941
17  Karnataka                        52,733,958   26,856,343   25,877,615      17.25          964
18  Kerala                           31,838,619   15,468,664   16,369,955       9.42        1,058
19  Lakshadweep*                         60,595       31,118       29,477      17.19          947
20  Madhya Pradesh                   60,385,118   31,456,873   28,928,245      24.34          920
21  Maharashtra                      96,752,247   50,334,270   46,417,977      22.57          922
22  Manipur                           2,388,634    1,207,338    1,181,296      30.02          978
23  Meghalaya                         2,306,069    1,167,840    1,138,229      29.94          975
24  Mizoram                             891,058      459,783      431,275      29.18          938
25  Nagaland                          1,988,636    1,041,686      946,950      64.41          909
26  Orissa                           36,706,920   18,612,340   18,094,580      15.94          972
27  Pondicherry*                        973,829      486,705      487,124      20.56        1,001
28  Punjab                           24,289,296   12,963,362   11,325,934      19.76          874
29  Rajasthan                        56,473,122   29,381,657   27,091,465      28.33          922
30  Sikkim                              540,493      288,217      252,276      32.98          875
31  Tamil Nadu                       62,110,839   31,268,654   30,842,185      11.19          986
32  Tripura                           3,191,168    1,636,138    1,555,030      15.74          950
33  Uttar Pradesh                   166,052,859   87,466,301   78,586,558      25.80          898
34  Uttaranchal                       8,479,562    4,316,401    4,163,161      19.20          964
35  West Bengal                      80,221,171   41,487,694   38,733,477      17.84          934

Notes
1. The population of India includes the estimated
population of the entire Kachchh district, the Morvi, Maliya-
Miyana and Wankaner talukas of Rajkot district and the Jodiya
taluka of Jamnagar district of Gujarat State, and the entire
Kinnaur district of Himachal Pradesh, where population
enumeration of Census of India 2001 could not be
conducted due to natural calamity.
2. For working out the density of India, the entire area and
population of those portions of Jammu and Kashmir
which are under illegal occupation of Pakistan and
China have not been taken into account.
3. Figures shown against population in the age group 0-6
and literates do not include the figures of the entire Kachchh
district, the Morvi, Maliya-Miyana and Wankaner talukas
of Rajkot district, the Jodiya taluka of Jamnagar district and
the entire Kinnaur district of Himachal Pradesh, where
population enumeration of Census of India 2001 could
not be conducted due to natural calamity.
4. Figures shown against Himachal Pradesh have been
arrived at after including the estimated figures of the entire
Kinnaur district of Himachal Pradesh, where population
enumeration of Census of India 2001 could not be
conducted due to natural calamity.
5. Figures shown against Gujarat have been arrived at after
including the estimated figures of the entire Kachchh
district, the Morvi, Maliya-Miyana and Wankaner talukas
of Rajkot district and the Jodiya taluka of Jamnagar district of
Gujarat State, where population enumeration of
Census of India 2001 could not be conducted due to
natural calamity.
(Source: Provisional Population Totals: India. Census of
India 2001, Paper 1 of 2001).

Country Health Profile - India
Country Reported Data on Health Indicators
(Each entry: indicator: latest available data; year; source; remarks)

Socioeconomic Situation
    Net national product per capita:
        At current prices: Rs 13,193; 1997-98; source 5
        At 1993-94 prices: Rs 9,660; 1997-98; source 5

Population and Vital Statistics
    Total population (in millions): 1,027; 2001; source 1; census results
    Population density (persons per sq km): 324; 2001; source 1; census results
    Sex ratio (females per 1000 males): 933; 2001; source 1; census results
    Population under 15 years (%): 35.6; 1998; source 2
    Population 65 years and above (%): 4.1; 1998; source 2
    Crude birth rate (per 1000 population): 26.1; 1999; source 3
    Crude death rate (per 1000 population): 8.7; 1999; source 3
    Annual population growth rate (%): 1.74; 1999; source 3; natural growth
    Total fertility rate (per woman): 2.85; 1996-98; source 4
    Urban population (%): 26.13; 1991; source 5; census results
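The "natural growth" remark on the annual growth rate reflects a simple identity. If migration is ignored, the rate of natural increase is the crude birth rate minus the crude death rate. Both rates are per 1000 population, so dividing the difference by 10 gives a percentage. A sketch with the 1999 values in the profile; the function name is illustrative:

```python
def natural_growth_rate_pct(cbr_per_1000: float, cdr_per_1000: float) -> float:
    """Rate of natural increase (%) from crude birth and death rates per 1000."""
    return (cbr_per_1000 - cdr_per_1000) / 10

print(round(natural_growth_rate_pct(26.1, 8.7), 2))  # 1.74
```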

Environment
    Population with safe drinking water (piped or hand pump) available in the home or with reasonable access (%): Total 77.9; Urban 92.6; Rural 72.3; 1998-99; source 4
    Population with adequate excreta disposal facilities (toilet/latrine facility) available (%): Total 36.0; Urban 80.7; Rural 18.9; 1998-99; source 4

    Literacy rate (%), for population aged 7 years and above: Total 65.38; Male 75.85; Female 54.16; 2001; source 1
    Prevalence of low birth weight (weight < 2500 grams at birth) (%): Total 23; Urban 21; Rural 24; 1995-96
    Prevalence of underweight (weight-for-age) in children < 3 years of age (%): 47.0; 1998-99; source 4
    Prevalence of stunting (height-for-age) in children < 3 years of age (%): 45.5; 1998-99; source 4
    Prevalence of wasting (weight-for-height) in children < 3 years of age (%): 15.5; 1998-99; source 4

Human Resources
    Number of physicians: 503,900; 1998; source 5; registered
    Population per physician: 1,916; 1998; source 5; computed value
    Physicians per 10,000 population: 5.2; 1998; source 5
    General nurse midwives: 607,376; 1997; source 5; registered
    Auxiliary nurse midwives/health workers: 301,691; 1997; source 5

Health Resources Facilities
    Number of hospital beds: 665,639; 1998; source 5; for hospitals only
    Population per hospital bed: 1,451; 1998; source 5
    Hospital beds per 10,000 population: 6.9; 1998; source 5
    Number of health centres (1998; source 5; as of 1/1/98):
        (a) Sub-centres: 137,006
        (b) Primary health centres: 23,179
        (c) Community health centres: 2,913

Budgetary Resources
    Total Expenditure on Health (THE) as % of Gross Domestic Product (GDP): 5.1%; 1998
    Public Expenditure on Health (PHE) as % of Total Expenditure on Health (THE): 18.0%; 1998

    Private Expenditure on Health (Pvt HE) as % of Total Expenditure on Health (THE): 82.0%; 1998; source 6
    Public Expenditure on Health (PHE) as % of General Government Expenditure (GGE): 5.6%; 1998; source 6
    Social Security Expenditure on Health (SSHE) as % of Public Expenditure on Health (PHE): no value given; 1998; source 6
    Tax-funded Health Expenditure (Tax FHE) as % of Public Expenditure on Health (PHE): 96.4; 1998; source 6
    External Resources for Health (Ext Res HE) as % of Public Expenditure on Health (PHE): 3.6; 1998; source 6
    Private Insurance for Health Risks (Pvt ins HE) as % of Private Expenditure on Health (Pvt HE): no value given; 1998; source 6
    Out-of-Pocket Spending on Health (OOPS) as % of Private Expenditure on Health (Pvt HE): 97.3%; 1998; source 6
    Per capita Total Expenditure on Health (THE) at official exchange rate (US $): 22; 1998; source 6
    Per capita Public Expenditure on Health (PHE) at official exchange rate (US $): 4; 1998; source 6
    Per capita Total Expenditure on Health (THE) in international dollars (intl $): 110; 1998; source 6
    Per capita Public Expenditure on Health (PHE) in international dollars (intl $): 20; 1998; source 6
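Several budget indicators in the profile are shares of one another, so they can be cross-checked. Public and private shares of total health expenditure should sum to 100%. The ratio of per capita public to per capita total expenditure should also roughly match the public share. The sketch below uses the table values; small differences come from rounding of the per capita figures:

```python
public_share_the = 18.0    # PHE as % of THE
private_share_the = 82.0   # Pvt HE as % of THE
tax_share_phe = 96.4       # tax-funded share of PHE
external_share_phe = 3.6   # external resources share of PHE

# Shares of the same total should add up to 100%
assert abs(public_share_the + private_share_the - 100) < 1e-9
assert abs(tax_share_phe + external_share_phe - 100) < 1e-9

# Per capita public / per capita total, on both currency bases
print(f"US$ basis:   {100 * 4 / 22:.1f}% public")    # 18.2% public
print(f"intl$ basis: {100 * 20 / 110:.1f}% public")  # 18.2% public
```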

Health Services
    Pregnant women attended by trained personnel during pregnancy (%): Total 65.1; Urban 85.6; Rural 59.3; 1995-96
    Deliveries attended by trained personnel (%): Total 42.3; Urban 73.5; Rural 33.5; 1995-96
    Women of child-bearing age using family planning (%): 48.2; 1998-99
    Eligible population (i.e. infants reaching their first birthday) that has been fully immunized according to national immunization policies (%): 34.5; 1998-99. 49.0; 2001; source 7; as of Oct. 2001
    Infants reaching their first birthday that have been fully immunized against diphtheria, tetanus and whooping cough (%): 52.1; 1998-99
    Infants reaching their first birthday that have been fully immunized against poliomyelitis (%): 59.2; 1998-99
    Infants reaching their first birthday that have been fully immunized against measles (%): 41.7; 1998-99; source 4
    Infants reaching their first birthday that have been fully immunized against tuberculosis (%): 66.8; 1995-96
    Women that have been immunized with tetanus toxoid (TT) during pregnancy (%): 69.1; 1998-99

Health Status
    Life expectancy at birth (years): Male 62.36; Female 63.39; 1996-01; source 5; projected values
    Infant mortality rate (per 1000 live births): 68; 1994-98; source 4
    Under-five mortality rate (per 1000 live births): 95; 1994-98; source 4
    Maternal mortality ratio (per 100,000 live births): 407; 1998; source 8

Sources:
1. India, Census of India 2001: Provisional Population Totals, March 2001.
2. India, Sample Registration System, Statistical Report 1998, October 2000.
3. India, Sample Registration System, SRS Bulletin, April 2001.
4. India, National Family Health Survey (NFHS-2), 1998-99, October 2000.
5. India, Health Information of India 1997 and 1998, July 2000.
6. Adapted from WHO Geneva, The World Health Report 2001: Mental Health, New Understanding, New Hope, October 2001.
7. India, Press briefing by the Minister of Health, 23 October 2001.
8. India, Sample Registration System, SRS Bulletin, April 2000.
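Indicators such as "population per physician" and "physicians per 10,000 population" in the Country Health Profile are reciprocals of each other, scaled by 10,000. A sketch showing that the computed values in the profile are mutually consistent; the helper name is illustrative:

```python
def per_10000(population_per_unit: float) -> float:
    """Convert 'population per physician (or bed)' into 'physicians (or beds) per 10,000 population'."""
    return 10_000 / population_per_unit

print(round(per_10000(1_916), 1))  # physicians per 10,000: 5.2
print(round(per_10000(1_451), 1))  # hospital beds per 10,000: 6.9

# Count x 'population per unit' recovers the population base used (in millions)
print(round(503_900 * 1_916 / 1e6))  # 965
print(round(665_639 * 1_451 / 1e6))  # 966
```

Both ratios imply a population base of about 965-966 million, consistent with India's population in 1998.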


Rural-Urban Distribution of Population: India and States/Union Territories, 2001
(T = Total, R = Rural, U = Urban. Percent urban population is shown against T. * Union territory.)

Sl. no.  India/State/Union territory*
        T/R/U    Persons        Males      Females % urban

    INDIA
        T  1,027,015,247  531,277,078  495,738,169   27.78
        R    741,660,293  381,141,184  360,519,109
        U    285,354,954  150,135,894  135,219,060

State/Union Territory*
1   Jammu and Kashmir
        T     10,069,917    5,300,574    4,769,343   24.88
        R      7,564,608    3,925,846    3,638,762
        U      2,505,309    1,374,728    1,130,581
2   Himachal Pradesh
        T      6,077,248    3,085,256    2,991,992    9.79
        R      5,482,367    2,754,251    2,728,116
        U        594,881      331,005      263,876
3   Punjab
        T     24,289,296   12,963,362   11,325,934   33.95
        R     16,043,730    8,500,647    7,543,083
        U      8,245,566    4,462,715    3,782,851
4   Chandigarh*
        T        900,914      508,224      392,690   89.78
        R         92,118       56,837       35,281
        U        808,796      451,387      357,409
5   Uttaranchal
        T      8,479,562    4,316,401    4,163,161   25.59
        R      6,309,317    3,143,380    3,165,937
        U      2,170,245    1,173,021      997,224
6   Haryana
        T     21,082,989   11,327,658    9,755,331   29.00
        R     14,968,850    8,017,622    6,951,228
        U      6,114,139    3,310,036    2,804,103
7   Delhi*
        T     13,782,976    7,570,890    6,212,086   93.01
        R        963,215      533,219      429,996
        U     12,819,761    7,037,671    5,782,090
8   Rajasthan
        T     56,473,122   29,381,657   27,091,465   23.38
        R     43,267,678   22,394,479   20,873,199
        U     13,205,444    6,987,178    6,218,266
9   Uttar Pradesh
        T    166,052,859   87,466,301   78,586,558   20.78
        R    131,540,230   69,096,765   62,443,465
        U     34,512,629   18,369,536   16,143,093
10  Bihar
        T     82,878,796   43,153,964   39,724,832   10.47
        R     74,199,596   38,510,686   35,688,910
        U      8,679,200    4,643,278    4,035,922
11  Sikkim
        T        540,493      288,217      252,276   11.10
        R        480,488      255,386      225,102
        U         60,005       32,831       27,174
12  Arunachal Pradesh
        T      1,091,117      573,951      517,166   20.41
        R        868,429      453,560      414,869
        U        222,688      120,391      102,297
13  Nagaland
        T      1,988,636    1,041,686      946,950   17.74
        R      1,635,815      846,651      789,164
        U        352,821      195,035      157,786
14  Manipur
        T      2,388,634    1,207,338    1,181,296   23.88
        R      1,818,224      923,428      894,796
        U        570,410      283,910      286,500
15  Mizoram
        T        891,058      459,783      431,275   49.50
        R        450,018      233,718      216,300
        U        441,040      226,065      214,975
16  Tripura
        T      3,191,168    1,636,138    1,555,030   17.02
        R      2,648,074    1,359,288    1,288,786
        U        543,094      276,850      266,244
17  Meghalaya
        T      2,306,069    1,167,840    1,138,229   19.63
        R      1,853,457      939,803      913,654
        U        452,612      228,037      224,575
18  Assam
        T     26,638,407   13,787,799   12,850,608   12.72
        R     23,248,994   11,983,157   11,265,837
        U      3,389,413    1,804,642    1,584,771
19  West Bengal
        T     80,221,171   41,487,694   38,733,477   28.03
        R     57,734,690   29,606,028   28,128,662
        U     22,486,481   11,881,666   10,604,815
20  Jharkhand
        T     26,909,428   13,861,277   13,048,151   22.25
        R     20,922,731   10,660,430   10,262,301
        U      5,986,697    3,200,847    2,785,850
21  Orissa
        T     36,706,920   18,612,340   18,094,580   14.97
        R     31,210,602   15,711,853   15,498,749
        U      5,496,318    2,900,487    2,595,831
22  Chhattisgarh
        T     20,795,956   10,452,426   10,343,530   20.08
        R     16,620,627    8,290,983    8,329,644
        U      4,175,329    2,161,443    2,013,886
23  Madhya Pradesh
        T     60,385,118   31,456,873   28,928,245   26.67
        R     44,282,528   22,975,256   21,307,272
        U     16,102,590    8,481,617    7,620,973
24  Gujarat
        T     50,596,992   26,344,053   24,252,939   37.35
        R     31,697,615   16,289,423   15,408,192
        U     18,899,377   10,054,630    8,844,747
25  Daman and Diu*
        T        158,059       92,478       65,581   36.26
        R        100,740       63,576       37,164
        U         57,319       28,902       28,417
26  Dadra and Nagar Haveli*
        T        220,451      121,731       98,720   22.89
        R        169,995       91,887       78,108
        U         50,456       29,844       20,612
27  Maharashtra
        T     96,752,247   50,334,270   46,417,977   42.40
        R     55,732,513   28,443,238   27,289,275
        U     41,019,734   21,891,032   19,128,702
28  Andhra Pradesh
        T     75,727,541   38,286,811   37,440,730   27.08
        R     55,223,944   27,852,179   27,371,765
        U     20,503,597   10,434,632   10,068,965
29  Karnataka
        T     52,733,958   26,856,343   25,877,615   33.98
        R     34,814,100   17,618,593   17,195,507
        U     17,919,858    9,237,750    8,682,108
30  Goa
        T      1,343,998      685,617      658,381   49.77
        R        675,129      339,626      335,503
        U        668,869      345,991      322,878
31  Lakshadweep*
        T         60,595       31,118       29,477   44.47
        R         33,647       17,196       16,451
        U         26,948       13,922       13,026
32  Kerala
        T     31,838,619   15,468,664   16,369,955   25.97
        R     23,571,484   11,450,785   12,120,699
        U      8,267,135    4,017,879    4,249,256
33  Tamil Nadu
        T     62,110,839   31,268,654   30,842,185   43.86
        R     34,869,286   17,508,985   17,360,301
        U     27,241,553   13,759,669   13,481,884
34  Pondicherry*
        T        973,829      486,705      487,124   66.57
        R        325,596      163,586      162,010
        U        648,233      323,119      325,114
35  Andaman and Nicobar Islands*
        T        356,265      192,985      163,280   32.67
        R        239,858      128,837      111,021
        U        116,407       64,148       52,259

Notes:
1. The total, rural and urban population of India includes
the estimated total, rural and urban population of the entire
Kachchh district, the Morvi, Maliya-Miyana and Wankaner
taluks of Rajkot district and the Jodiya taluka of Jamnagar
district of Gujarat State, and the estimated total and rural
population of the entire Kinnaur district of Himachal Pradesh,
where population enumeration of Census of India 2001 could
not be conducted due to natural calamities.
2. The figures of total, rural and urban population of
Himachal Pradesh State have been arrived at after
including the estimated total and rural population of the
entire Kinnaur district, where population enumeration
of Census of India 2001 could not be conducted due to
natural calamity.
3. The figures of total, rural and urban population of
Gujarat State have been arrived at after including the
estimated total, rural and urban population of the entire
Kachchh district, the Morvi, Maliya-Miyana and Wankaner
taluks of Rajkot district and the Jodiya taluka of Jamnagar
district, where population enumeration of Census of
India 2001 could not be conducted due to natural
calamity.
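Percent urban is the urban population as a share of the total, and the rural and urban rows of the table must add up to the total row. A sketch that recomputes the last column and performs that consistency check for a few rows of the rural-urban table; the selection of rows and the helper name are illustrative:

```python
# (total, rural, urban) persons from the rural-urban table
rows = {
    "INDIA": (1_027_015_247, 741_660_293, 285_354_954),
    "Delhi": (13_782_976, 963_215, 12_819_761),
    "Kerala": (31_838_619, 23_571_484, 8_267_135),
    "Bihar": (82_878_796, 74_199_596, 8_679_200),
}

def percent_urban(total: int, urban: int) -> float:
    """Urban population as a percentage of the total population."""
    return 100 * urban / total

for name, (total, rural, urban) in rows.items():
    # Rural + urban must reproduce the total exactly
    assert rural + urban == total, f"{name}: rural + urban != total"
    print(f"{name}: {percent_urban(total, urban):.2f}% urban")
```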
Literacy Rate in India

S.no State/Union territory       Persons   Males Females 1991 rate    Change
(Persons, Males, Females: literacy rate (%), 2001 Census. 1991 rate: literacy rate (%), 1991 Census.
Change: 1991-2001, in percentage points. * Union territory.)

    INDIA (1)                      65.38   75.96   54.28     51.63     13.75
1   Andaman and Nicobar Is.*       81.18   86.07   75.29     73.02      8.17
2   Andhra Pradesh                 61.11   70.85   51.17     44.09     17.02
3   Arunachal Pradesh              54.74   64.07   44.24     41.59     13.15
4   Assam                          64.28   71.93   56.03     52.89     11.52
5   Bihar                          47.53   60.32   33.57     37.49     10.04
6   Chandigarh*                    81.76   85.65   76.65     77.81      3.94
7   Chhattisgarh                   65.18   77.86   52.40     42.91     22.27
8   Dadra and Nagar Haveli*        60.03   73.32   42.99     40.71     19.33
9   Daman and Diu*                 81.09   88.40   70.37     71.20      9.89
10  Delhi*                         81.82   87.37   75.00     75.29      6.53
11  Goa                            82.32   88.88   75.51     75.51      6.81
12  Gujarat                        69.97   80.50   58.60     61.29      8.68
13  Haryana                        68.59   79.25   56.31     55.85     12.74
14  Himachal Pradesh               77.13   86.02   68.08     63.86     13.27
15  Jammu and Kashmir              54.46   65.75   41.82        NA        NA
16  Jharkhand                      54.13   67.94   39.38     41.39     12.74
17  Karnataka                      67.04   76.29   57.45     56.04     11.00
18  Kerala                         90.92   94.20   87.86     89.81      1.11
19  Lakshadweep*                   87.52   93.15   81.56     81.78      5.74
20  Madhya Pradesh                 64.11   76.80   50.28     44.67     19.41
21  Maharashtra                    77.27   86.27   67.51     64.87     12.39
22  Manipur                        68.87   77.87   59.70     59.89      8.97
23  Meghalaya                      63.31   66.14   60.41     49.10     14.21
24  Mizoram                        88.49   90.69   86.13     82.27      6.22
25  Nagaland                       67.11   71.77   61.92     61.65      5.45
26  Orissa                         63.61   75.95   50.97     49.09     14.52
27  Pondicherry*                   81.49   88.89   74.13     74.74      6.74
28  Punjab                         69.95   75.63   63.55     58.51     11.45
29  Rajasthan                      61.03   76.46   44.34     38.55     22.48
30  Sikkim                         69.68   76.73   61.46     56.94     12.61
31  Tamil Nadu                     73.47   82.33   64.55     62.66     10.81
32  Tripura                        73.66   81.47   65.41     60.44     13.22
33  Uttar Pradesh                  57.36   70.23   42.98     40.71     16.65
34  Uttaranchal                    72.28   84.01   60.26     57.75     14.53
35  West Bengal                    69.22   77.58   60.22     57.70     11.52

Notes:
1. The population of India includes the estimated
population of the entire Kachchh district, the Morvi, Maliya-
Miyana and Wankaner talukas of Rajkot district and the Jodiya
taluka of Jamnagar district of Gujarat State, and the entire
Kinnaur district of Himachal Pradesh, where population
enumeration of Census of India 2001 could not be
conducted due to natural calamity.
2. Figures shown against population in the age group 0-6
and literates do not include the figures of the entire Kachchh
district, the Morvi, Maliya-Miyana and Wankaner talukas
of Rajkot district, the Jodiya taluka of Jamnagar district and
the entire Kinnaur district of Himachal Pradesh, where
population enumeration of Census of India 2001 could
not be conducted due to natural calamity.
3. Figures shown against Himachal Pradesh have been
arrived at after including the estimated figures of the entire
Kinnaur district of Himachal Pradesh, where population
enumeration of Census of India 2001 could not be
conducted due to natural calamity.
(Source: Provisional Population Totals: India. Census of
India 2001, Paper 1 of 2001)
State-wise Distribution of Households in Rural/Urban Areas (Census 1991)
State/Union Territory | Total/Rural/Urban | Total population | Males | Females | Number of households

INDIA* | Total | 846,302,688 | 439,230,458 | 407,072,230 | 152,009,467
 | Rural | 628,691,676 | 324,321,614 | 304,370,062 | 111,591,326
 | Urban | 217,611,012 | 114,908,844 | 102,702,168 | 40,418,141

States
1. Andhra Pradesh | Total | 66,508,008 | 33,724,581 | 32,783,427 | 13,937,455
 | Rural | 48,620,882 | 24,591,875 | 24,029,007 | 10,326,962
 | Urban | 17,887,126 | 9,132,706 | 8,754,420 | 3,610,493
2. Arunachal Pradesh | Total | 864,558 | 465,004 | 399,554 | 175,448
 | Rural | 753,930 | 400,966 | 352,964 | 150,131
 | Urban | 110,628 | 64,038 | 46,590 | 25,317
3. Assam | Total | 22,414,322 | 11,657,989 | 10,756,333 | 3,844,370
 | Rural | 19,926,527 | 10,304,161 | 9,622,366 | 3,364,151
 | Urban | 2,487,795 | 1,353,828 | 1,133,967 | 480,219
4. Bihar | Total | 86,374,465 | 45,202,091 | 41,172,374 | 14,012,071
 | Rural | 75,021,453 | 39,045,095 | 35,976,358 | 12,175,277
 | Urban | 11,353,012 | 6,156,996 | 5,196,016 | 1,836,794
5. Goa | Total | 1,169,793 | 594,790 | 575,003 | 234,597
 | Rural | 690,041 | 346,169 | 343,872 | 135,816
 | Urban | 479,752 | 248,621 | 231,131 | 98,781
6. Gujarat | Total | 41,309,582 | 21,355,209 | 19,954,373 | 7,492,603
 | Rural | 27,063,521 | 13,884,299 | 13,179,222 | 4,804,255
 | Urban | 14,246,061 | 7,470,910 | 6,775,151 | 2,688,348
7. Haryana | Total | 16,463,648 | 8,827,474 | 7,636,174 | 2,614,725
 | Rural | 12,408,904 | 6,657,334 | 5,751,570 | 1,882,390
 | Urban | 4,054,744 | 2,170,140 | 1,884,604 | 732,335
8. Himachal Pradesh | Total | 5,170,877 | 2,617,467 | 2,553,410 | 969,018
 | Rural | 4,721,681 | 2,372,193 | 2,349,488 | 861,445
 | Urban | 449,196 | 245,274 | 203,922 | 107,573
9. Jammu and Kashmir @ | Total | 7,718,700 | 4,014,100 | 3,704,600 | N.A.
 | Rural | 5,879,300 | 3,042,209 | 2,837,091 | N.A.
 | Urban | 1,839,400 | 971,891 | 867,509 | N.A.
10. Karnataka | Total | 44,977,201 | 22,951,917 | 22,025,284 | 8,143,879
 | Rural | 31,069,413 | 15,744,942 | 15,324,471 | 5,552,438
 | Urban | 13,907,788 | 7,206,975 | 6,700,813 | 2,591,441
11. Kerala | Total | 29,098,518 | 14,288,995 | 14,809,523 | 5,513,200
 | Rural | 21,418,224 | 10,512,788 | 10,905,436 | 4,102,167
 | Urban | 7,680,294 | 3,776,207 | 3,904,087 | 1,411,033
12. Madhya Pradesh | Total | 66,181,170 | 34,267,293 | 31,913,877 | 11,714,945
 | Rural | 50,842,333 | 26,164,353 | 24,677,980 | 8,945,374
 | Urban | 15,338,837 | 8,102,940 | 7,235,897 | 2,769,571
13. Maharashtra | Total | 78,937,187 | 40,825,618 | 38,111,569 | 15,344,435
 | Rural | 48,395,601 | 24,536,280 | 23,859,321 | 9,259,441
 | Urban | 30,541,586 | 16,289,338 | 14,252,248 | 6,084,994
14. Manipur | Total | 1,837,149 | 938,359 | 898,790 | 296,689
 | Rural | 1,331,504 | 682,395 | 649,109 | 215,790
 | Urban | 505,645 | 255,964 | 249,681 | 80,899
15. Meghalaya | Total | 1,774,778 | 907,687 | 867,091 | 327,371
 | Rural | 1,444,731 | 734,865 | 709,866 | 265,668
 | Urban | 330,047 | 172,822 | 157,225 | 61,703
16. Mizoram | Total | 689,756 | 358,978 | 330,778 | 120,994
 | Rural | 371,810 | 194,414 | 177,396 | 63,699
 | Urban | 317,946 | 164,564 | 153,382 | 57,295
17. Nagaland | Total | 1,209,546 | 641,282 | 568,264 | 216,982
 | Rural | 1,001,323 | 522,235 | 479,088 | 174,695
 | Urban | 208,223 | 119,047 | 89,176 | 42,287
18. Orissa | Total | 31,659,736 | 16,064,146 | 15,595,590 | 5,999,447
 | Rural | 27,424,753 | 13,794,955 | 13,629,798 | 5,168,221
 | Urban | 4,234,983 | 2,269,191 | 1,965,792 | 831,226
19. Punjab | Total | 20,281,969 | 10,778,034 | 9,503,935 | 3,424,666
 | Rural | 14,288,744 | 7,569,423 | 6,719,321 | 2,355,096
 | Urban | 5,993,225 | 3,208,611 | 2,784,614 | 1,069,570
20. Rajasthan | Total | 44,005,990 | 23,042,780 | 20,963,210 | 7,289,839
 | Rural | 33,938,877 | 17,686,463 | 16,252,414 | 5,573,981
 | Urban | 10,067,113 | 5,356,317 | 4,710,796 | 1,715,858
21. Sikkim | Total | 406,457 | 216,427 | 190,030 | 76,329
 | Rural | 369,451 | 195,277 | 174,174 | 69,213
 | Urban | 37,006 | 21,150 | 15,856 | 7,116
22. Tamil Nadu | Total | 55,858,946 | 28,298,975 | 27,559,971 | 12,542,672
 | Rural | 36,781,354 | 18,567,717 | 18,213,637 | 8,433,757
 | Urban | 19,077,592 | 9,731,258 | 9,346,334 | 4,108,915
23. Tripura | Total | 2,757,205 | 1,417,930 | 1,339,275 | 526,659
 | Rural | 2,335,484 | 1,202,529 | 1,132,955 | 440,789
 | Urban | 421,721 | 215,401 | 206,320 | 85,870
24. Uttar Pradesh | Total | 139,112,287 | 74,036,957 | 65,075,330 | 22,377,820
 | Rural | 111,506,372 | 59,197,138 | 52,309,234 | 18,024,435
 | Urban | 27,605,915 | 14,839,819 | 12,766,096 | 4,353,385
25. West Bengal | Total | 68,077,965 | 35,510,633 | 32,567,332 | 12,514,414
 | Rural | 49,370,364 | 25,442,210 | 23,928,154 | 8,909,515
 | Urban | 18,707,601 | 10,068,423 | 8,639,178 | 3,604,899

Union Territories
1. Andaman and Nicobar Islands | Total | 280,661 | 154,369 | 126,292 | 59,113
 | Rural | 205,706 | 111,986 | 93,720 | 42,674
 | Urban | 74,955 | 42,383 | 32,572 | 16,439
2. Chandigarh | Total | 642,015 | 358,614 | 283,401 | 146,521
 | Rural | 66,186 | 40,548 | 25,638 | 18,215
 | Urban | 575,829 | 318,066 | 257,763 | 128,306
3. Dadra and Nagar Haveli | Total | 138,477 | 70,953 | 67,524 | 26,237
 | Rural | 126,752 | 64,499 | 62,253 | 23,766
 | Urban | 11,725 | 6,454 | 5,271 | 2,471
4. Daman and Diu | Total | 101,586 | 51,595 | 49,991 | 19,179
 | Rural | 54,043 | 28,111 | 25,932 | 9,828
 | Urban | 47,543 | 23,484 | 24,059 | 9,351
5. Delhi | Total | 9,420,644 | 5,155,512 | 4,265,132 | 1,877,046
 | Rural | 949,019 | 525,056 | 423,963 | 177,428
 | Urban | 8,471,625 | 4,630,456 | 3,841,169 | 1,699,618
6. Lakshadweep | Total | 51,707 | 26,618 | 25,089 | 8,295
 | Rural | 22,593 | 11,530 | 11,063 | 3,742
 | Urban | 29,114 | 15,088 | 14,026 | 4,553
7. Pondicherry | Total | 807,785 | 408,081 | 399,704 | 162,448
 | Rural | 290,800 | 147,599 | 143,201 | 60,967
 | Urban | 516,985 | 260,482 | 256,503 | 101,481

* The population figures for India include the projected
population figures for Jammu and Kashmir. The number
of households does not include the figures for Jammu and
Kashmir, where the 1991 census could not be conducted
due to disturbed conditions.
@ The population figures are projected figures, as the 1991
census could not be conducted in the state.
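Two indicators that are often asked about, average household size and the sex ratio, can be read directly off this table. The short Python sketch below is our own illustration (the book contains no code, and the function names are hypothetical). One assumption is flagged in the code: because the households column excludes Jammu and Kashmir, J&K's projected population is subtracted from the all-India total before dividing.

```python
def household_size(persons: int, households: int) -> float:
    """Average number of persons per household."""
    return persons / households


def sex_ratio(females: int, males: int) -> float:
    """Females per 1,000 males (the convention of the 2001 Census summary)."""
    return 1000 * females / males


# Assumption: remove J&K's projected population, since its households
# were not enumerated in 1991 and are missing from the denominator.
india_persons = 846_302_688 - 7_718_700
india_households = 152_009_467

print(f"India, persons per household: {household_size(india_persons, india_households):.2f}")
print(f"India, sex ratio: {sex_ratio(407_072_230, 439_230_458):.0f}")
print(f"Kerala, sex ratio: {sex_ratio(14_809_523, 14_288_995):.0f}")
```

The same two functions can be applied row by row, for example to compare the rural and urban household size of any state in the table.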
Crude Birth Rate
The Crude Birth Rate (CBR) is defined as the number of live
births in a year per 1,000 of the mid-year population.

Figure 13.1: Trend of crude birth rate in India (1981-97) (Source: SRS data)


Crude Death Rate
The Crude Death Rate (CDR) is defined as the number of
deaths in a year per 1,000 of the mid-year population.

Figure 13.2: Trend of crude death rate in India (1981-97) (Source: SRS data)

Infant Mortality Rate
Infant Mortality Rate (IMR) is defined as the number of
infant deaths in a year per 1,000 live births during the year.

Figure 13.3: Trend of infant mortality rate in India (1981-97) (Source: SRS data)
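All three definitions follow one pattern: a count of events in the year, divided by a reference population, multiplied by 1,000. What changes is the denominator: the mid-year population for the CBR and CDR, but the live births of the year for the IMR. The Python sketch below makes that explicit. The function names and the district figures are hypothetical; the last line uses the 2002 totals from the Basic Indicators table that follows (25,221 thousand births, population 1,049,549 thousand).

```python
def rate_per_1000(events: float, base: float) -> float:
    """Events per 1,000 of the reference population."""
    if base <= 0:
        raise ValueError("reference population must be positive")
    return 1000 * events / base


def crude_birth_rate(live_births: float, midyear_population: float) -> float:
    return rate_per_1000(live_births, midyear_population)


def crude_death_rate(deaths: float, midyear_population: float) -> float:
    return rate_per_1000(deaths, midyear_population)


def infant_mortality_rate(infant_deaths: float, live_births: float) -> float:
    # Denominator is live births in the year, NOT the mid-year population
    return rate_per_1000(infant_deaths, live_births)


# Hypothetical district, for illustration only
population, births, deaths, infant_deaths = 2_000_000, 50_000, 18_000, 3_350
print(crude_birth_rate(births, population))          # 25.0
print(crude_death_rate(deaths, population))          # 9.0
print(infant_mortality_rate(infant_deaths, births))  # 67.0

# India 2002, both figures in thousands (the units cancel)
print(round(crude_birth_rate(25_221, 1_049_549)))    # 24
```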


Comparative Statistics of Different Indicators

Basic Indicators
Under-5 mortality rank: 53
Under-5 mortality rate: 242 (1960); 93 (2002)
Infant mortality rate (under 1): 146 (1960); 67 (2002)
Total population (thousands), 2002: 1049549
Annual no. of births (thousands), 2002: 25221
Annual no. of under-5 deaths (thousands), 2002: 2346
GNI per capita (US $), 2002: 480
Life expectancy at birth (years), 2002: 64
Total adult literacy rate, 2000: 57
Net primary school enrolment/attendance (%), 1996-2002*: 76
% share of household income, 1990-2000*: lowest 40%: 20; highest 20%: 46

Nutrition
Under-5 mortality rank: 53
% of infants with low birth weight, 1998-2002*: 30
% of children (1995-2002*) who are:
exclusively breastfed (<6 months): 37k
breastfed with complementary food (6-9 months): 44
still breastfeeding (20-23 months): 66
% of under-fives (1995-2002*) suffering from:
underweight, moderate and severe: 47
underweight, severe: 18
wasting, moderate and severe: 16
stunting, moderate and severe: 46
Vitamin A supplementation coverage rate (6-59 months), 2001: 25
% of households consuming iodized salt, 1997-2002*: 50

Health
Under 5 mortality rank
% of population using improved total
drinking water sources 2000
urban
rural
% of population using adequate total
sanitation facilities 2000
urban
rural
% of routine EPI vaccines
total
financed by government 2002
% immunized 2002
1-year-old children TB
1-year-old children DPT 3
1-year-old children polio 3
1-year-old children measles
1-year-old children HepB 3
pregnant women tetanus
(%) under fives with ARI
1998-2002*
(%) under fives with ARI taken
1998-2002*
to health provider
Oral rehydration rate (%)
1994-2002*
Malaria: 1999-2001
% under fives sleeping
under a bednet
% under fives sleeping
under a treated bednet
% under fives with fever
receiving anti-malarial drugs

53
84
95
79
28
61
15
98
81
70
70
67
78.3
19
64
-


HIV/AIDS
Adult prevalence rate (15-49
years), end-2001
Estimated number of people
living with HIV/AIDS,
end-2001
Median HIV prevalence among
pregnant women (15-24 years)
in countries with adult
prevalence over 1%
HIV prevention 1996-2002*
(15-24 years)

% who used condom at last


high-risk sex 1996-2002*
Orphans

0.8
adults and children 3970000
(0-49 years)
children (0-14 years) 170000
Year
all regions [# sites]
capital city [# sites]
other urban [# sites]
rural [# sites]
% who know condom can
prevent HIV male
63y
female
62
% who know a healthy-looking person can
have HIV
male
female
% who have comprehensive
knowledge of HIV female Male (15-24 years)
51y
Female (15-24 years)
40
Children orphaned by
AIDS (0-14 years)
2001 Orphan school attendance
ratio
(1995-2001*) -

Demographic Indicators
Under-5 mortality rank: 53
Population (thousands), 2002: under 18: 413623; under 5: 119524
Population annual growth rate (%): 2.1 (1970-90); 1.8 (1990-2002)
Crude death rate: 17 (1970); 9 (2002)
Crude birth rate: 40 (1970); 24 (2002)
Life expectancy: 49 (1970); 64 (2002)
Total fertility rate, 2002: 3.1
% of population urbanized, 2002: 28
Average annual growth rate of urban population (%), 1970-90: 3.4

Women
Under-5 mortality rank: 53
Life expectancy: females as a % of males, 2002: 102
Adult literacy rate: females as a % of males, 2000: 66
Gross enrolment ratios: females as a % of males, 1997-2000*: primary school 83; secondary school 70
Contraceptive prevalence (%), 1995-2002*: 47
Antenatal care coverage (%), 1995-2002*: 60
Skilled attendant at delivery (%), 1995-2002*: 43
Maternal mortality ratio+: reported, 1985-2002*: 540; adjusted, 2000: 540

The Rate of Progress
Under-5 mortality rank: 53
Under-5 mortality rate: 242 (1960); 123 (1990); 93 (2002)
Average annual rate of reduction of under-5 mortality (%): 2.3 (1960-90); 2.3 (1990-2002)
Reduction since 1990 (%): 24
GDP per capita average annual growth rate (%): 1.7 (1960-90); 4 (1990-2002)
Total fertility rate: 5.9 (1960); 4 (1990); 3.1 (2002)
Average annual rate of reduction of total fertility rate (%): 1.3 (1960-90); 2.1 (1990-2002)

Summary of Census 2001
Source: Census of India, 2001
28 States
7 Union Territories
600 Districts
6.4 lakh villages
Population: 1027 million in 2001
Population growth (1991-2001): 21.34%
Annual population growth (per cent): 1.6
Population density (per sq km): 324 in 2001
Sex ratio (females per 1,000 males): 933
Literacy (total): 65.38; males 75.85; females 54.16
Increase in literacy (1991-2001): 13.75
Source: Estimation (UNPOP)
Crude Birth Rate (per 1000 population): 25 in 1999
Crude Death Rate (per 1000 population): 9 in 1999
Per capita GNP (US $): 440 in 1999
Source: Human Development Report 2003
Total Fertility Rate: 3.0 (2000-2005)
Infant Mortality Rate (per 1000 live births): 67 in 2001
Maternal Mortality Rate (per 100,000 live births): 540 in 2001
Human Development Index ranking: 127 in 2003
People below poverty line (%): 28.6 in 2000
Urban population (%): 27.9 in 2001
Life expectancy: 63.9 in 2000



Population with access to proper sanitation (%) : 28 in 2000
Population with access to improved water sources (%) : 84 in 2000

Health Expenditure-Public (% of GDP) : 0.9 in 2000


Health Expenditure - Private (% of GDP) : 4.0 in 2000
Physicians per 100,000 population : 48 in 1990 - 2002
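The average annual rates of reduction (AARR) listed under "The Rate of Progress" earlier in this section can be reproduced by assuming exponential decline between the two years: AARR = 100 x ln(X1/X2) / (t2 - t1). This appears to be how such tables are computed, but treat the formula as our assumption rather than something the source states. The Python sketch below (function names are ours) returns the printed 2.3, 2.3, 1.3 and 2.1, and shows that the "reduction since 1990" of 24% is simply the total percentage fall in under-5 mortality from 123 to 93.

```python
import math


def aarr(value_start: float, value_end: float, years: float) -> float:
    """Average annual rate of reduction in per cent, assuming exponential decline:
    100 * ln(value_start / value_end) / years."""
    return 100 * math.log(value_start / value_end) / years


def total_reduction(value_start: float, value_end: float) -> float:
    """Overall fall between two observations, as a percentage of the first."""
    return 100 * (value_start - value_end) / value_start


# Under-5 mortality rate: 242 (1960), 123 (1990), 93 (2002)
print(round(aarr(242, 123, 30), 1))     # 1960-90   -> 2.3
print(round(aarr(123, 93, 12), 1))      # 1990-2002 -> 2.3
# Total fertility rate: 5.9 (1960), 4 (1990), 3.1 (2002)
print(round(aarr(5.9, 4, 30), 1))       # 1960-90   -> 1.3
print(round(aarr(4, 3.1, 12), 1))       # 1990-2002 -> 2.1
print(round(total_reduction(123, 93)))  # U5MR reduction since 1990 (%) -> 24
```

A simple arithmetic average ((242 - 123) / 30 per year) would give a different and less comparable figure, because it ignores that the same absolute fall is a larger proportional fall when mortality is already low.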

MULTIPLE CHOICE QUESTIONS


1. Life expectancy at birth for males per 2001 census is:
(a) 60.1 years
(b) 62.39 years
(c) 63.75 years
(d) 58.1 years
2. Sex ratio of India according to 2001 census is:
(a) 929
(b) 938
(c) 913
(d) 933
3. The National Population Policy 2001 aims to achieve
Net Reproduction Rate of 1 by the year:
(a) 2005
(b) 2010
(c) 2015
(d) 2050
(AIIMS, 2002)
4. Census in India is done:
(a) Every year
(b) Every 5 years
(c) Every 10 years
(d) As and when needed
(JIPMER 81, UPSC 86, Orissa 2000)
5. The annual growth rate of India is:
(a) 0.8
(b) 1.26
(c) 1.74
(d) 2.36
6. Population count is taken on:
(a) 1st January
(b) 1st March
(c) 1st July
(d) 1st April

(PGI, 89)


7. Death rate reported in India according to 2001 census is:
(a) 12.5
(b) 7.9
(c) 10.6
(d) 8.7
8. The population density according to 2001 census is ____
per sq km:
(a) 252
(b) 294
(c) 324
(d) 398
9. According to 1991 Census the family size is:
(a) 2.4
(b) 3.9
(c) 4.4
(d) 5.6

(AI, 98)

10. Net reproductive rate of 1 implies a couple protection
rate of:
(a) 50
(b) 60
(c) 70
(d) 80
(JIPMER, 2000)
11. Population growth is said to be explosive when growth
is more than:
(a) 1.5
(b) 2
(c) 2.5
(d) None of the above
(MAHE, 2001)
12. Infant mortality rate in India is:
(a) 63
(b) 65
(c) 67
(d) 69
13. Maternal mortality rate in India is:
(a) 535
(b) 540
(c) 545
(d) 550
14. Population growth in India from 1991 to 2001 is:
(a) 19.76
(b) 21.34
(c) 24
(d) 22.50

Chapter 14

National Population Policy


INDIA, the second most populous country of the world, with
more than a billion persons by 11 May 2000 (according to
preliminary results of the 2001 Census, India counted 1.027
billion people on 1 March 2001), was the first to initiate a
government policy of promoting a family planning programme
in 1952. The programme, which began somewhat hesitantly,
has been a major effort at social engineering in a democratic
polity with an unusually high level of heterogeneity.
At the beginning of the 21st century, it is time to review
the past record on population and the recently announced
National Population Policy 2000 (NPP 2000), and to draw
lessons for both state and public action to help achieve social
goals on this important subject.
1. The Indian concern with population growth and the
need to influence the behaviour of the people with respect
to fertility was entirely indigenous.
2. The Indian family planning programme has certainly
not been a dismal failure. Statements to the contrary are
based on the use of a wrong yardstick, focused solely on
the lack of change in the rate of population growth during
successive decades. If the change in the average number
of children born to women in reproductive ages is used
to assess the performance of the family planning
programme, India has done reasonably well according
to the data provided by the Sample Registration System
(SRS).
3. Despite the promising performance so far, it seems
unlikely that India will attain the goal of a replacement
level of fertility of 2.1 by 2010, as was projected in the
NPP 2000.


CONCEPT OF A POPULATION POLICY


The size of the population, its characteristics, spatial and
rural-urban distribution, rate of growth and its determinants
decide the quantum, pattern and distribution of consumption
and production. It is, therefore, only natural for the state or
the government to be concerned about population.
Unlike in the case of several other developing countries,
the Indian concern about the relatively high level of fertility
(the number of children born to Indian women), rather than
about the rate of population growth, reflected a genuine desire
to improve the living standards of the people.
During the 1920s and 1930s, some pioneers had set up
family planning clinics in Poona and Bangalore. In the 1940s,
the Bhore Committee on Health Survey and Development
(1946) and the subcommittee on population set up by the
National Planning Committee (1940) favoured the
involvement of the government in the promotion of family
planning. Not surprisingly, therefore, the memorandum
submitted by the Family Planning Association of India, set
up in 1949 under the presidentship of Lady Dhanvanti Rama
Rau, elicited a favourable response from the Planning
Commission.
Simultaneously, there was considerable effort to initiate
varied programmes to lower the level of morbidity and
malnutrition and to raise life expectancy at birth from the
then low value of around 32 years. Thus, the early concept of
population policy covered both mortality and fertility and did
not exclusively focus on fertility. There was also a recognition
of the need to improve the quality of life of the people by
lowering the burden of disease or morbidity, promoting
universal primary education and eradicating illiteracy,
exploitation and poverty.


A Working Group on Population Policy, set up by the
Planning Commission, recommended in 1980 an unrealistic
goal for the country: a net reproduction rate (NRR) of 1.0
by 1996 as a national average and by 2001 in all the states.
On the basis of several assumptions about the method-mix of
contraception, the presumed efficacy of different methods, a
rise in the age at marriage, and a lowering of the infant
mortality rate (IMR) to 60 per 1000 live births, a simulation
exercise was done. An effective couple protection rate (ECPR)
of 60% was expected to lead to a birth rate of 21 and a death
rate of 9. These estimates were incorporated in the draft sixth
five year plan as well as the National Health Policy of 1983
and became accepted national goals.
The limitations of the approach adopted by the Working
Group are evident in the fact that even in the state of Punjab,
where the reported ECPR exceeded 68%, the birth rate in 1996
was 23.7 and the total fertility rate (TFR) in 1997 was 2.7,
much above the replacement level of fertility. Of course, Kerala
as well as Tamil Nadu reported a below- or near-replacement
level of fertility, with a TFR of 1.8 and 2.0, respectively, in
1997. In the middle of 2000, we have a death rate of 9.0 but the
IMR has not dropped to 60 and the birth rate of 21 or NRR of
1.0 in the country as a whole is still far away. In fact, in 1992,
the eighth plan had recognised that the NRR of 1.0 was likely
to be attained only during 2011-16; and the ninth plan
accepted the very real possibility that the replacement level of
fertility may be reached only by 2021 in the country as a whole
and in some states much later.
The draft policy underwent several revisions until the
NPP 2000 was finally announced by the government in February
2000, after its approval by the Cabinet.
To place the various population policy statements in
perspective, it is essential to review the current population
scene.


India had a population of 846 million at the time of the
Census conducted in February 1991, with a reference date of
1 March. (If account is taken of the net undercount of
population, including the greater under-count of young
children, the actual population at the time of the census was
864 million, almost 16 million more than the widely used
estimate.) The implicit annual growth rate during the 1980s
was 2.1%, only slightly lower than the 2.2% observed during
the two decades of 1961-81. This has been misinterpreted as
indicating a stagnation in the underlying demographic
processes.
The census results have been interpreted as indicating a
dismal failure of the family planning programme. Nothing
could be further from the truth. The rate of inter-censal growth
reflects trends in both mortality and fertility, that is, in death
rates as well as birth rates.
1901-91, shown in Table 14.1,
Recent evidence on decline in fertility: The recent data
provided by the SRS suggest a clear decline in fertility
throughout the country, including in the large North Indian
states (Bihar, Madhya Pradesh, Uttar Pradesh and Rajasthan),
where since 1971 TFR has declined by 27-28% (Table 14.2).
Mortality trends: The infant mortality rate (IMR) of around
200-225 per 1000 live births at the time of India's
independence in 1947 has declined to about 72 during 1996-98.
Admittedly, even this figure far exceeds the IMR in China,
which has now declined to around 30. Within India, only
Kerala, with about 91% of births in 1991 occurring in
institutions and another 6% attended by trained birth
attendants, has achieved an even lower IMR of 17. Elsewhere,
the IMR ranges from the low 50s in Punjab, Tamil Nadu and
Maharashtra to between 85 and 98 in Uttar Pradesh, Madhya
Pradesh and Orissa. Obviously, there is substantial scope and
need for a further decline in the present high IMR (Table 14.3).
Table 14.1: Key Population Statistics of India, 1901-2001
Census year | Total population (million) | Average annual growth rate (per cent) | Density (persons per sq km) | Sex ratio (males per 1000 females) | Per cent of urban population
1901 | 238.3 | 0.3 | 77 | 1029 | 10.8
1911 | 252.0 | 0.6 | 82 | 1038 | 10.3
1921 | 251.2 | N | 81 | 1047 | 11.2
1931 | 278.9 | 1.1 | 90 | 1053 | 12.0
1941 | 318.5 | 1.3 | 103 | 1058 | 13.9
1951 | 361.0 | 1.3 | 117 | 1057 | 17.3
1961 | 439.1 | 2.0 | 141 | 1063 | 18.0
1971 | 548.2 | 2.2b | 178 | 1075 | 19.9
1981 | 683.3 | 2.2b | 221 | 1071 | 23.3a
1991 | 846.6 | 2.1 | 267 | 1076 | 25.7
2001 | 1027.0 | 1.9 | 324 | 1072 | 27.8

Notes: a Includes only an estimate for Assam; b Growth
rates for 1961-71 and 1971-81 take account of the fact that the
reference date of the 1971 Census was 1 April, whereas that of the
1981 Census (like the 1951 and 1961 Censuses) was 1 March; N:
Negligible.
Sources: Census of India, 1961, Vol. 1, India, Parts II-A(i)
General Population Tables, 1961 and II-C(i), Social and Cultural
Tables, 1964; Census of India, 1971, Series I, India, Parts II-A(i),
General Population Tables, 1975, and II-C(ii), Social and Cultural
Tables, 1977; Census of India, 1981, Series 1, Paper 1 of 1982, Final
Population Tables; Part II Special Report and Tables based on 5%
Sample Data, 1984, Part II-B(i), Primary Census Abstract: General
Population 1983; Census of India, 1991, Series I, India, Paper 2 of
1992, Final Population Totals, Brief Analysis of Primary Census
Abstract; Census of India, 2001, Series I, India, Paper 1 of 2001,
Provisional Population Totals.
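The growth rates in Table 14.1 are consistent with exponential (continuously compounded) growth between censuses, r = 100 x ln(P2/P1) / t. That reading is our inference rather than something the table states, but with t = 10 years the Python sketch below reproduces the printed values for the census years 1961 to 2001. It also converts the table's sex ratio (males per 1,000 females) into the females-per-1,000-males convention used in the Census 2001 summary. Function names are ours.

```python
import math


def annual_growth_rate(p_start: float, p_end: float, years: float) -> float:
    """Average annual exponential growth rate in per cent: 100 * ln(P_end / P_start) / t."""
    return 100 * math.log(p_end / p_start) / years


def females_per_1000_males(males_per_1000_females: float) -> float:
    """Invert the sex-ratio convention used in Table 14.1."""
    return 1_000_000 / males_per_1000_females


# Census totals in millions, copied from Table 14.1
census = {1951: 361.0, 1961: 439.1, 1971: 548.2, 1981: 683.3, 1991: 846.6, 2001: 1027.0}

years = sorted(census)
for y0, y1 in zip(years, years[1:]):
    # Assumes exactly 10 years between censuses (ignores note b's 1971 date shift)
    r = annual_growth_rate(census[y0], census[y1], y1 - y0)
    print(f"{y0}-{y1}: {r:.1f}% per year")

print(round(females_per_1000_males(1072)))  # 2001 sex ratio -> 933
```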

Table 14.2: Vital Rates per 1000 Population, India, 1901-98
Period | Birth rate | Death rate | Rate of natural increase
1901-10 | 49.2 | 42.6 | 6.6
1911-20 | 48.1 | 47.2 | 0.9
1921-30 | 46.2 | 36.3 | 9.9
1931-40 | 45.2 | 31.2 | 14.0
1941-50 | 39.9 | 27.4 | 12.5
1951-60 | 40.9 | 22.8 | 18.1
1961-70 | 40.0 | 17.8 | 22.2
1971-80 | 37.8 | 15.4 | 22.4
1980-82 | 33.8 | 12.3 | 21.5
1988-90 | 30.8 | 10.3 | 20.5
1991-93* | 29.1 | 9.4 | 19.4
1994-96* | 27.4 | 8.9 | 18.5
1996-98* | 27.0 | 9.0 | 18.0

* Excluding Jammu and Kashmir.
Sources: Davis (1951); India, Registrar General (1954); Office
of the Registrar General (1998b).
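The last column of Table 14.2 is the birth rate minus the death rate. Because both are per 1,000 population, dividing the difference by 10 gives the same quantity as an annual percentage, which approximates the population growth rate when migration is negligible (an assumption that broadly holds for India as a whole). The Python sketch below (names are ours) applies this to a few rows of the table.

```python
# (period, birth rate, death rate) per 1000 population, selected rows of Table 14.2
VITAL_RATES = [
    ("1901-10", 49.2, 42.6),
    ("1911-20", 48.1, 47.2),
    ("1961-70", 40.0, 17.8),
    ("1996-98", 27.0, 9.0),
]


def natural_increase(birth_rate: float, death_rate: float) -> float:
    """Rate of natural increase per 1000 population: CBR minus CDR (migration ignored)."""
    return birth_rate - death_rate


for period, cbr, cdr in VITAL_RATES:
    rni = natural_increase(cbr, cdr)
    # A rate per 1000 divided by 10 is the same quantity expressed per cent
    print(f"{period}: {rni:.1f} per 1000, i.e. about {rni / 10:.2f}% growth per year")
```

Running the same subtraction over every row is a quick consistency check on the table; for 1991-93, for instance, 29.1 minus 9.4 gives 19.7 rather than the 19.4 printed.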
Table 14.3: Mortality Indicators for All India, 1971-1998
Year | Crude death rate (All, Rural, Urban) | Infant mortality rate (All, Rural, Urban) | Life expectancy at birth (All, Males, Females)
1971-75 | 15.5, 17.1, 9.8 | 134, 144, 83 | 49.7, 50.5, 49.0
1976-80 | 13.8, 15.0, 8.9 | 124, 134, 74 | 52.3, 52.5, 52.1
1981-85 | 11.0, 11.9, 7.5 | 90, 98, 56 | 55.5, 55.4, 55.7
1986-90 | 10.6, 11.6, 7.3 | 91, 99, 59 | 57.7, 57.7, 58.1
1991-95 | 9.9, 10.4, 6.6 | 76, 83, 50 | 60.0, 59.4, 60.4
1996-98 | 9.0, 9.7, 6.5 | 72, 77, 45 | NA, NA, NA

Note: Estimates for 1998 are provisional. The state of
Jammu and Kashmir is excluded from estimates beginning
1991.
According to the 1991 Census, 65% of Indian villages had
a population of less than 1000 persons and 42% had less than
500 persons each. The size class of population of a village is
an excellent indicator of the size of the rural market, the extent
of diversification of economic activities of the population and
also the level of development.
The NPP 2000 has stressed the need for ending
discrimination against girls during childhood and early
adolescence and against women during the childbearing period
in order to improve their health and nutrition.
The role of targets: A major bane of the Indian family
planning programme since 1966 has been the pressure of
targets on all functionaries. When the government accepted
the goal of reducing the birth rate to 25 in 10 years, the targets
about the expected number of acceptors of different methods
of contraception were worked out on the basis of the desired
level of decline in the birth rate. Every health worker was
assigned the target number of sterilizations, IUD insertions,
and users of condoms (and later also the oral pills) to be
recruited each year. The achievement of these targets at the
state, district and sub-district level was monitored by the
supervisors at successive levels to judge the extent of success
of the grassroots workers in performing the task assigned to
them. While many policy-makers considered such targets as
essential and feared serious consequences of abandoning
them, their assessment of the dependability of the statistical
system of reporting of performance was totally unrealistic.
Therefore, in 1994, the Expert Group recommended the
abandonment of the system of method-specific targets.


NPP 2000 has affirmed the commitment of the government
to voluntary and informed choice and consent of citizens
while availing of reproductive care services, and to the
continuation of the target-free approach in administering
family planning.
Goals of National Population Policy: The NPP 2000 has
distinguished between immediate, medium-term and long-term
policy objectives. The immediate objective is to address
(a) the unmet needs for contraception, (b) health care
infrastructure and health personnel, and (c) the provision of
integrated service delivery for basic reproductive and child
health care. The past history suggests, however, that the
questions of health infrastructure and health personnel can
hardly be addressed in the short run. The resource
requirements are so large that the funds will hardly be
mobilised immediately or even over a period of five years.
The integration of the service delivery is also a longstanding
goal that has been difficult to achieve.
The specification of the fertility reduction goal in terms of
the replacement level of fertility, or a total fertility rate of 2.1,
to be achieved by 2010, is better than the earlier goal of a Net
Reproduction Rate (NRR) of 1.0 noted in the National Health
Policy of 1983 and the sixth five year plan. The NRR is a
function of both fertility and mortality. A rise in mortality,
which clearly cannot be a policy objective, can also lead to an
NRR of 1.0.
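The point that a rise in mortality can also produce an NRR of 1.0 is easiest to see with numbers. The NRR can be written as 5 x (share of births that are female) x the sum over age groups of ASFR x s, where the ASFRs are age-specific fertility rates for the seven five-year groups from 15-19 to 45-49 and s is the proportion of newborn girls expected to survive to that age group. The Python sketch below uses two entirely hypothetical schedules, assumes that 0.488 of births are female, and for simplicity applies one survival proportion to every age group; none of these values comes from Indian data.

```python
AGE_GROUP_WIDTH = 5            # seven groups: 15-19, 20-24, ..., 45-49
FEMALE_SHARE_OF_BIRTHS = 0.488  # assumption: roughly 105 boys per 100 girls at birth


def tfr(asfr: list[float]) -> float:
    """Total fertility rate: sum of age-specific rates times the group width."""
    return AGE_GROUP_WIDTH * sum(asfr)


def nrr(asfr: list[float], survival: list[float],
        female_share: float = FEMALE_SHARE_OF_BIRTHS) -> float:
    """Net reproduction rate: daughters per woman, allowing for female mortality."""
    if len(asfr) != len(survival):
        raise ValueError("need one survival proportion per age group")
    return AGE_GROUP_WIDTH * female_share * sum(f * s for f, s in zip(asfr, survival))


# Hypothetical schedules (births per woman per year in each age group)
low_fert = [0.05, 0.14, 0.12, 0.07, 0.03, 0.01, 0.00]   # TFR 2.1
high_fert = [0.07, 0.18, 0.16, 0.09, 0.04, 0.02, 0.00]  # TFR 2.8
low_mort = [0.97] * 7    # most girls survive through the childbearing ages
high_mort = [0.73] * 7   # much higher female mortality

print(f"TFR {tfr(low_fert):.1f}, low mortality:  NRR = {nrr(low_fert, low_mort):.2f}")
print(f"TFR {tfr(high_fert):.1f}, high mortality: NRR = {nrr(high_fert, high_mort):.2f}")
```

Both scenarios give an NRR close to 1.0 even though women in the second bear about 0.7 more children each, which is why a TFR target says more about fertility behaviour than an NRR target does.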
As suggested by the Expert Group in 1994, the target year
for the achievement of a replacement level of fertility has been
advanced by five years relative to the eighth five year plan.
However, what seemed feasible in 1994 is probably not so in
2000. At that time, the SRS data for 1992 indicated an
acceleration of the decline in fertility. Subsequently, as the
SRS sample units were replaced during 1993 and 1994, the
TFR during 1993-95 remained steady at 3.5; and only during
1996-97, the decline appears to have been resumed.


Admittedly, given the momentum for continued growth
built into the age structure of India's existing large population,
an earlier attainment of the replacement level of fertility will
imply a lower ultimate population. The ninth plan had
conceded that the attainment of replacement level of fertility
may slip beyond 2011-16 to 2021. The diffusion of fertility
decline may well be faster than has been the case so far,
particularly as the continuing decline in the size of land
holdings brings home to the people the gap between their
aspirations and the reality. The rising levels of literacy and
education, partly associated with the attrition of the survivors
of cohorts born when the schooling facilities were limited,
will also help to accelerate fertility decline. Much will depend,
however, on the implementation of programmes to achieve
other socio-demographic goals, incorporated in the NPP 2000.
India's population will continue to grow for about 50 to
60 years after the attainment of a replacement level of fertility.
The only way to attain a stable population by 2045 may be to
aim at a below replacement level of fertility after 2010, which
will lead to a declining population some 40 years later.
Therefore, greater attention needs to be paid to the other
socio-economic goals, which include the attainment by 2010
of:
• Free and compulsory school education up to age 14 and
lowering of the dropout rates at primary and secondary
level to 20% for both boys and girls. (The task is
particularly difficult in rural areas of backward states
and among the scheduled tribes and agricultural or rural
labourers.)
• Lowering IMR to 30 and maternal mortality rates to
below 100 per 100,000 live births.
• Universal immunisation of children against all
vaccine-preventable diseases.
• Promotion of delayed marriage among girls to after age
18, and preferably after 20 years of age. (The rule of law
and the perceptions about the safety of unmarried
women are the critical issues.)
• Raising the institutional deliveries to 80% and those by
trained persons to 100%. (The rural infrastructure is the
main bottleneck here.)
• 100% registration of births, deaths, marriages and
pregnancies. (While the goal is laudable, its attainment is
not likely to be easy even over a 15 to 20 year period.)
• Containment of AIDS and treatment of RTIs and STIs.
• Prevention and control of communicable diseases.
Incentives: The NPP 2000 refers to five schemes that
involve payments. For individuals, these include:
1. The Balika Samridhi Yojana run by the Department of
Women and Child Development to promote survival and
care of the girl child, with a cash incentive of Rs 500
given at the time of birth of a girl child of birth order 1 or 2.
2. The Maternal Benefit Scheme run by the Department of
Rural Development awards an incentive of Rs 500 for
the birth of the first child after 19 years of age and is
limited to the first and second births only. The cash award
is now to be linked to antenatal check up, institutional
delivery by a trained birth attendant, registration of birth
and BCG immunisation.
3. A Family Welfare-linked Health Insurance Plan is to be
established to offer health insurance (for hospitalisation,
not exceeding Rs 5000) to couples (and their children)
below the poverty line, if the couple undergoes
sterilisation with no more than two living children. The
spouse undergoing sterilisation is also to get a personal
accident insurance cover.


4. Couples below the poverty line, who marry after the legal
age at marriage, register the marriage, have their first
child after the mother reaches the age of 21, accept the
small family norm, and adopt a terminal method after
the birth of the second child, are to be rewarded.
5. This scheme provides for group incentives that will
reward panchayats and zila (district) parishads for
exemplary performance in universalising the small family
norm, achieving reductions in infant mortality and birth
rates, and promoting literacy with completion of primary
schooling.
While it would be a mistake to judge these schemes from the
point of view of the small sums of money they provide, the
real costs of proving one's eligibility and actually receiving
the awards far exceed what is recognised in our metropolitan
centres. The proof of age and of the fulfilment of prescribed
conditions is difficult to obtain in most areas.
Also, the group incentives can generate misreporting of
the level of fertility as well as mortality and it would be a
mistake to award them until a system of complete registration
of births and deaths, marriages and pregnancies is actually
established. Overall, it is difficult to believe that the incentive
schemes will make any material difference to the promotion
of fertility decline.
Disincentives: The question of disincentives for a large
family has often been discussed. Neither couples with large
families nor localities that have a high birth rate or a high
level of fertility can be penalised, because more often than
not, on grounds of equity, they need greater support to ensure
the welfare of the future citizens of the country. It is argued,
however, that they have a symbolic role in communicating to
the people what is in the social or national interest.


LANDMARKS IN THE EVOLUTION OF INDIA'S POPULATION POLICY
1940 The subcommittee on Population, appointed by the
National Planning Committee set up by the President of the
Indian National Congress (Pandit Jawaharlal Nehru),
considered family planning and a limitation of children
essential for the interests of social economy, family happiness
and national planning. The committee recommended the
establishment of birth control clinics and other necessary
measures such as raising the age at marriage and a eugenic
sterilization programme.
1946 The Health Survey and Development Committee
(Bhore Committee) reported that the control of disease and
famine and improvement of health would cause a serious
problem of population growth. It considered deliberate
limitation of births desirable.
1951 The draft outline of the First Five Year Plan
recognized population policy as essential to planning and
family planning as a step towards improvement in health
of mothers and children.
1952 The final First Five Year Plan document noted the
urgency of the problems of family planning and population
control and advocated a reduction in the birth rate to stabilize
population at a level consistent with the needs of the economy.
1956 The Second Five Year Plan proposed expansion of
family planning clinics in both rural and urban areas and
recommended a more or less autonomous Central Family
Planning Board, with similar state level boards.
1959 The Government of Madras (now Tamil Nadu) began
to pay small cash grants to poor persons undergoing
sterilization as compensation for lost earnings and transport
costs and also to canvassers and tutors in family planning.

1961 The Third Five Year Plan envisaged the provision
of sterilization facilities in district hospitals, sub-divisional
hospitals and primary health centres as a part of the family
planning programme. Maharashtra state organized
sterilization camps in rural areas.
1963 The Director of Family Planning proposed a shift
from the clinic approach to a community extension approach
to be implemented by auxiliary nurse midwives (one per
10,000 population) located in PHCs. Other proposals
included: (a) a goal of lowering the birth rate from an
estimated 40 to 25 by 1973; and (b) a cafeteria approach to the
provision of contraceptive methods, with an emphasis on
free choice.
1965 The intrauterine device was introduced in the Indian
Family Planning Programme.
1966 A full-fledged Department of Family Planning was
set up in the Ministry of Health. Condoms began to be
distributed through the established channels of leading
distributors of consumer goods.
1972 A liberal law permitting abortions on grounds of
health and humanitarian and eugenic considerations came
into force.
1976 The statement on National Population Policy, made
in the Parliament by the Minister for Health and Family
Planning, assigned top national priority and commitment
to the population problem to bring about a sharp drop in
fertility. The Constitution was amended to freeze the
representation of different states in the lower house of
Parliament according to the size of population in the 1971
Census. The states were permitted to enact legislation
providing for compulsory sterilization.
1977 A revised population policy statement was tabled
in Parliament by a government formed by the former
opposition parties. It emphasized the voluntary nature of the
family planning programme. The term 'Family Welfare'
replaced 'Family Planning'.
1982 The draft Sixth Five Year Plan adopted a long term
goal of attaining a net reproduction rate of 1.0 on the average
by 1996 and in all states by 2001. It adopted the targets for
crude birth and death rates, infant mortality rate and life
expectancy at birth and the couple protection rate, to be
achieved by 2001. (The numbers were based on the illustrative
exercises of a Working Group on Population Policy set up by
the Planning Commission during 1978).
1983 The National Health Policy incorporated the targets
included in the Sixth Five-Year Plan document. While
adopting the Health Policy, the Parliament emphasized the
need for a separate National Population Policy.
1993 A Committee on Population, set up by the National
Development Council in 1991, in the wake of the census
results, proposed the formulation of a National Population
Policy.
1994 The Expert Group, set up by the Ministry of Health
and Family Welfare in 1993, to draft the National Population
Policy recommended the goal of a replacement level of fertility
(a total fertility rate of 2.1) by 2010. Other proposals of the
expert group included (i) removal of method-specific targets
down to the grassroots level; (ii) an emphasis on improving
the quality of services; (iii) a removal of all incentives in cash
or kind; (iv) a National Commission on Population and Social
Development under the chairmanship of the prime minister.
The draft statement was circulated among the members of
Parliament and various ministries at the centre and among
the states for comments.
1997 The cabinet headed by Prime Minister I K Gujral
approved a draft National Population Policy, to be placed
before the Parliament. With the dissolution of the lower house
of Parliament, the action was postponed.
1999 Another draft of National Population Policy, placed
before the cabinet, was remitted to a Group of Ministers (GOM)
headed by the Deputy Chairman of the Planning Commission,
to examine the scope for the inclusion of incentives and
disincentives for its implementation. The GOM consulted
various academic experts and women's representatives and
finalised a draft, which was discussed by the cabinet on 19
November 1999, and which was revised further for resubmission.
2000 The National Population Policy was adopted by the
cabinet and announced in February 2000.

MULTIPLE CHOICE QUESTIONS


1. The family planning programme started in:
(a) 1947
(b) 1950
(c) 1952
(d) 1969
(AIIMS, 79; PGI, 83)
2. Under the national population policy by 2001, family
size should be brought to:
(a) 1
(b) 2.1
(c) 3.2
(d) 4.2
3. National population policy is to bring the couple
protection rate to:
(a) 50%
(b) 60%
(c) 75%
(d) 90%
(PGI, 79; Kerala, 88)

4. Net reproduction rate by 2000 AD:
(a) 1 to 1.2
(b) 2 to 2.5
(c) 2.5 to 3
(d) 3.5 to 4
(PGI 80; AIIMS, 60)
5. Which of the following is not a target for 2000 AD:
(a) Family size 3.2
(b) Death rate 9
(c) Birth rate 21
(d) NRR 1
(UP, 96)
6. A Net Reproduction Rate (NRR) of one by 2000 AD
would help to achieve stabilization of population in
about 50 years. For this purpose, the couple protection
rate should be at least:
(a) 30%
(b) 40%
(c) 50%
(d) 60%
(UPSC, 99)
7. The First Five Year Plan in India was started in:
(a) 1950
(b) 1951
(c) 1952
(d) 1955
8. The National Population Policy of India has set all of the
following goals except:
(a) To bring down the total fertility rate to replacement level
by 2015.
(b) To reduce the infant mortality rate to 30 per 1000 live
births
(c) To reduce the maternal mortality rate to 100 per 1
lakh live births
(d) 100% registration of births, deaths, marriages and
pregnancies.

Unsolved Questions

1. In a study to compare two antihypertensive drugs, A and B, the following decreases in blood pressure were recorded:

          Decrease in blood pressure               Total
Drug A    53   51   52   47   50   52   54         359
Drug B    55   52   53   54   54   50   54         372

Assuming that the two samples of patients are independent, find out which drug is better in treating hypertension.
2. In a clinic, the haemoglobin values of nine anaemic women were recorded (baseline values). After treatment for one month, their haemoglobin values were recorded again, as follows:

Case No.   Baseline value   After one month   |   Case No.   Baseline value   After one month
12   10   11   13

Find out whether the women responded to the treatment.

3. The association between coronary artery disease and smoking was found to be as follows:

               Coronary artery disease   No coronary artery disease   Total
Smokers        30                        20                           50
Non-smokers    20                        30                           50
Total          50                        50                           100

Find the risk of coronary artery disease associated with smoking, and compare the proportion of persons exposed to smoking among those with coronary artery disease (study group) and those without coronary artery disease (controls).
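The two calculations asked for here can be sketched in Python as below. This is an illustrative sketch, not the book's worked solution: the helper names `chi_square_2x2` and `odds_ratio` are hypothetical, the chi-square is computed without Yates' correction, and the table is read as rows = smokers/non-smokers, columns = with/without coronary artery disease.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square (no Yates correction) for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc: odds of exposure in cases vs. controls."""
    return (a * d) / (b * c)


# Rows: smokers, non-smokers; columns: CAD, no CAD
a, b = 30, 20   # smokers
c, d = 20, 30   # non-smokers

chi2 = chi_square_2x2(a, b, c, d)
or_ = odds_ratio(a, b, c, d)
print(f"chi-square = {chi2:.2f}, odds ratio = {or_:.2f}")
```

With 1 d.f., χ² = 4.00 exceeds the 5% critical value of 3.84, in line with the answer key; an odds ratio of 2.25 means the odds of having smoked are 2.25 times as high among the cases as among the controls.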
4. The following numbers give the weights of 55 students of a class. Prepare a suitable frequency table:

42   74   40   60   82  115   54   84   50   67   42
64   68   51   86   66   44   77   50   79   52  103
78   63  100   94   73   53  110   76   69  104   80
79   79   90   84   76   59   81   72   96   64   70
41   61   75   83   65   78   77   56   95   80   71

(a) Draw a histogram and frequency polygon.
(b) Find the mode of the above data using the histogram.
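One way to build the frequency table and read off the grouped mode is sketched below. The class width of 10 starting at 40 is an assumption (the question only asks for a "suitable" table), and `frequency_table`/`grouped_mode` are illustrative names. The mode uses the interpolation formula mode = L + (f1 − f0)/(2f1 − f0 − f2) × h, which is what the usual histogram construction reproduces graphically.

```python
weights = [
    42, 74, 40, 60, 82, 115, 54, 84, 50, 67, 42,
    64, 68, 51, 86, 66, 44, 77, 50, 79, 52, 103,
    78, 63, 100, 94, 73, 53, 110, 76, 69, 104, 80,
    79, 79, 90, 84, 76, 59, 81, 72, 96, 64, 70,
    41, 61, 75, 83, 65, 78, 77, 56, 95, 80, 71,
]


def frequency_table(data, start, width, n_classes):
    """Count values in classes [start, start + width), ...; assumes all values fit."""
    counts = [0] * n_classes
    for x in data:
        counts[(x - start) // width] += 1
    return counts


def grouped_mode(start, width, counts):
    """Interpolated mode: L + (f1 - f0) / (2*f1 - f0 - f2) * h."""
    i = counts.index(max(counts))
    f1 = counts[i]
    f0 = counts[i - 1] if i > 0 else 0
    f2 = counts[i + 1] if i + 1 < len(counts) else 0
    return start + i * width + (f1 - f0) / (2 * f1 - f0 - f2) * width


counts = frequency_table(weights, start=40, width=10, n_classes=8)
for k, f in enumerate(counts):
    lower = 40 + 10 * k
    print(f"{lower}-{lower + 10}: {f}")
print(f"mode = {grouped_mode(40, 10, counts):.2f}")
```

With these classes the modal class is 70-80 (15 students) and the interpolated mode is about 74.17.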

5. The systolic blood pressures of 50 persons are as follows:

Blood pressure (mm Hg)   No. of cases
80-100                   6
100-120                  8
120-140                  18
140-160                  12
160-180                  4
180-200                  2
Total                    50

(a) Calculate the mean, median and mode of the above frequency distribution.
(b) Also comment on the symmetry of the distribution.
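A short sketch of the standard grouped-data formulas, assuming the class limits as printed and, for the mean, that each class is represented by its mid-point. The function names are illustrative.

```python
# Systolic BP classes (mm Hg): lower limits, common width, frequencies
lowers = [80, 100, 120, 140, 160, 180]
width = 20
freqs = [6, 8, 18, 12, 4, 2]


def grouped_mean(lowers, width, freqs):
    """Mean of grouped data using class mid-points."""
    mids = [lo + width / 2 for lo in lowers]
    return sum(f * m for f, m in zip(freqs, mids)) / sum(freqs)


def grouped_median(lowers, width, freqs):
    """Median = L + (N/2 - cf) / f * h for the class containing N/2."""
    half = sum(freqs) / 2
    cum = 0
    for lo, f in zip(lowers, freqs):
        if cum + f >= half:
            return lo + (half - cum) / f * width
        cum += f
    raise ValueError("empty distribution")


def grouped_mode(lowers, width, freqs):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * h for the modal class."""
    i = freqs.index(max(freqs))
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0
    return lowers[i] + (f1 - f0) / (2 * f1 - f0 - f2) * width


mean = grouped_mean(lowers, width, freqs)
median = grouped_median(lowers, width, freqs)
mode = grouped_mode(lowers, width, freqs)
print(f"mean = {mean:.2f}, median = {median:.2f}, mode = {mode:.2f}")
```

The three values (132.40, 132.22 and 132.50) lie within about 0.3 mm Hg of one another, which is why the distribution is described as approximately symmetrical.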
6. Find the value of the correlation coefficient between the age and blood pressure of 8 patients. The values of age and blood pressure are as follows:

Case No.   Age (yrs)   BP    |   Case No.   Age (yrs)   BP
1          56          147   |   5          63          149
2          42          125   |   6          55          150
3          72          160   |   7          49          145
4          36          118   |   8          38          115

7. The following data pertain to systolic blood pressure and cholesterol levels of individuals in a certain study:
Mean blood pressure (x̄) = 150 mmHg
Standard deviation of BP (σx) = 10.5
Mean cholesterol level (ȳ) = 170
Standard deviation of cholesterol level (σy) = 13.5
Correlation between blood pressure and cholesterol level, rxy = +0.75
Draw the two lines of regression. Give the estimated value of cholesterol when the blood pressure of an individual is 160.
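Both regression lines follow from the summary statistics alone: b_yx = r·σy/σx and b_xy = r·σx/σy, and each line passes through (x̄, ȳ). A minimal sketch (the variable names are illustrative):

```python
x_mean, x_sd = 150.0, 10.5   # systolic BP (mmHg)
y_mean, y_sd = 170.0, 13.5   # cholesterol level
r = 0.75

b_yx = r * y_sd / x_sd       # slope of the regression of y on x
b_xy = r * x_sd / y_sd       # slope of the regression of x on y

# Each line passes through the point of means
a_yx = y_mean - b_yx * x_mean
a_xy = x_mean - b_xy * y_mean

print(f"y on x: y = {a_yx:.2f} + {b_yx:.3f} x")
print(f"x on y: x = {a_xy:.2f} + {b_xy:.3f} y")

y_at_160 = a_yx + b_yx * 160
print(f"estimated cholesterol at BP = 160: {y_at_160:.1f}")
```

Carrying unrounded slopes gives y ≈ 25.36 + 0.964x and x ≈ 50.83 + 0.583y; the estimate at 160 mmHg is about 179.6.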
8. The population (P) and deaths (D) in two districts, A and B, together with a standard population C, are as follows:

Age group   PA       DA    PB       DB    PC
0-10        4,000    36    3,000    30    1,000
10-25       12,000   48    20,000   100   4,000
25-60       6,000    66    4,000    48    3,000
> 60        8,000    158   3,000    60    2,000
Total       30,000   308   30,000   238   10,000

Find out which district has the higher death rate, comparing both the crude and the standardized death rates (taking C as the standard population).
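Direct standardization applies each district's age-specific death rates to the standard population C and divides the resulting expected deaths by the size of C. A minimal sketch, with rates per 1000 and illustrative function names:

```python
# Age groups: 0-10, 10-25, 25-60, >60
pop_a, deaths_a = [4000, 12000, 6000, 8000], [36, 48, 66, 158]
pop_b, deaths_b = [3000, 20000, 4000, 3000], [30, 100, 48, 60]
pop_std = [1000, 4000, 3000, 2000]  # population C used as the standard


def crude_rate(pop, deaths):
    """Crude death rate per 1000 population."""
    return 1000 * sum(deaths) / sum(pop)


def direct_standardized_rate(pop, deaths, std_pop):
    """Apply age-specific rates to the standard population; rate per 1000."""
    expected_deaths = sum(d / p * s for p, d, s in zip(pop, deaths, std_pop))
    return 1000 * expected_deaths / sum(std_pop)


for name, pop, deaths in (("A", pop_a, deaths_a), ("B", pop_b, deaths_b)):
    print(f"District {name}: crude = {crude_rate(pop, deaths):.2f}, "
          f"standardized = {direct_standardized_rate(pop, deaths, pop_std):.2f}")
```

This gives crude rates of about 10.27 (A) and 7.93 (B) but standardized rates of 9.75 (A) and 10.60 (B): district A looks worse on crude rates only because a larger share of its population is over 60, while B's age-specific rates are higher in every age group.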


9. The following data give the number of women of child-bearing age and the yearly births, in five-year age groups, for a city. Calculate the general fertility rate and the total fertility rate. If the ratio of males to females at birth is 13:12, what is the gross reproduction rate?

Age group   Female pop.   Births   |   Age group   Female pop.   Births
15-19       16,000        400      |   35-39       12,000        960
20-24       15,000        1710     |   40-44       11,000        330
25-29       14,000        2100     |   45-49       9,000         36
30-34       13,000        1430     |   Total       90,000        6966
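The three fertility measures can be sketched as below, assuming five-year age groups (so TFR = 5 × the sum of the age-specific rates) and that the 13:12 sex ratio applies to births in every age group. The totals are recomputed from the age-group rows, which give 90,000 women and 6966 births.

```python
female_pop = [16000, 15000, 14000, 13000, 12000, 11000, 9000]  # ages 15-19 ... 45-49
births = [400, 1710, 2100, 1430, 960, 330, 36]
AGE_GROUP_WIDTH = 5
MALE, FEMALE = 13, 12  # sex ratio at birth

# General fertility rate: births per 1000 women aged 15-49
gfr = 1000 * sum(births) / sum(female_pop)

# Age-specific fertility rates (per woman), summed over five-year groups
asfr = [b / w for b, w in zip(births, female_pop)]
tfr = AGE_GROUP_WIDTH * sum(asfr)   # children per woman

# Gross reproduction rate: daughters per woman
grr = tfr * FEMALE / (MALE + FEMALE)

print(f"GFR = {gfr:.1f} per 1000, TFR = {tfr:.3f}, GRR = {grr:.3f}")
```

This gives GFR ≈ 77.4 per 1000 women, TFR ≈ 2.565 children per woman (about 2565 per 1000 women) and GRR ≈ 1.23.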

10. A total of 1,000 individuals were surveyed and classified as follows:

               Hypertensive   Normotensive   Total
Smokers        250            250            500
Non-smokers    50             450            500
Total          300            700            1000

(a) Calculate the prevalence of hypertension in the study.
(b) Calculate the smoking rate among hypertensives and among normotensives.
(c) Find out whether smoking is associated with hypertension.
(d) Find out the risk of hypertension associated with smoking.
11. In a comparative evaluation of Ziehl-Neelsen (Z-N) staining and culture on Lowenstein-Jensen (L-J) medium in the diagnosis of pulmonary and extrapulmonary tuberculosis, the following results were obtained:

               L-J culture (gold standard)
Z-N stain      Positive   Negative   Total
Positive       16         0          16
Negative       16         12         28
Total          32         12         44

Find the sensitivity, specificity, positive predictive value, negative predictive value and diagnostic accuracy of the Z-N stain.
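The five screening indices all come from the four cells of the table. A minimal sketch, assuming the culture result is the true disease status; `diagnostic_summary` is an illustrative name:

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Screening-test indices from a 2 x 2 table against a gold standard."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),   # positives detected among the diseased
        "specificity": tn / (tn + fp),   # negatives among the non-diseased
        "ppv": tp / (tp + fp),           # diseased among test positives
        "npv": tn / (tn + fn),           # non-diseased among test negatives
        "accuracy": (tp + tn) / total,   # correctly classified overall
    }


# Z-N stain against L-J culture (gold standard)
stats = diagnostic_summary(tp=16, fp=0, fn=16, tn=12)
for name, value in stats.items():
    print(f"{name}: {100 * value:.2f}%")
```

Here NPV = 12/28 ≈ 42.86% and diagnostic accuracy = 28/44 ≈ 63.64%.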
12. The following are the marks obtained by students in an examination:

Marks   No. of students   |   Marks   No. of students
20-30   25                |   60-70   27
30-40   26                |   70-80   15
40-50   36                |   > 80    10
50-60   42                |

(a) Find the quartile deviation.
(b) Also comment on the skewness of the distribution.
13. Form a frequency distribution table of the following data and calculate the two most suitable measures of central tendency:

32   47   41   51   30   39   18   48   54   32
31   46   15   37   32   56  300   21   45   32
37   41   44   18  650   47  390   42   44   37
56   48   53   42   37   41   51   50   47   48

14. The haemoglobin levels of patients are as follows:

Hb%   No. of cases   |   Hb%   No. of cases
6     14             |   11    110
7     23             |   12    70
8     26             |   13    50
9     30             |   14    12
10    130            |

(a) Find the median of the above distribution using ogives.
(b) Also find the mean using the short-cut method.
15. A random sample of patients selected from the cardiology OPD of a hospital had the following values of blood pressure:

Blood pressure   No. of cases   |   Blood pressure   No. of cases
130-140          14             |   160-170          23
140-150          24             |   170-180          40
150-160          54             |   180-190          32

Calculate the coefficient of dispersion (based on quartiles, and based on the mean and SD).
16. Find the correlation coefficient and the line of regression between the height and weight of 10 individuals:

Case No.   Height   Weight   |   Case No.   Height   Weight
1          175      65       |   6          169      69
2          166      56       |   7          182      81
3          182      78       |   8          190      87
4          167      66       |   9          187      84
5          176      72       |   10         151      60

17. In a survey conducted by a health agency, it was found that in Town A, 46% of 876 births were male, while in Town B, 473 of 690 births were male. Is there any significant difference in the proportion of male births in the two towns? Clearly state the hypothesis to be tested.
18. A sample of 900 individuals has a mean haemoglobin of 12.7 g%. Could the sample have been drawn from a population with mean 13.6 g% and SD 2.70?
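Because n is large and the population SD is given, a one-sample z-test applies: z = (x̄ − μ0)/(σ/√n). A minimal standard-library sketch; the two-sided p-value is computed as erfc(|z|/√2), which equals 2[1 − Φ(|z|)].

```python
import math

n = 900
sample_mean = 12.7   # haemoglobin, sample mean
mu0 = 13.6           # hypothesised population mean
sigma = 2.70         # population standard deviation

standard_error = sigma / math.sqrt(n)
z = (sample_mean - mu0) / standard_error
# Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"SE = {standard_error:.3f}, z = {z:.2f}, p = {p_two_sided:.1e}")
```

|z| = 10 is far beyond 1.96 (and beyond 3.29, the two-sided 0.1% point), so the sample is very unlikely to have come from that population.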
19. Random samples are drawn from two hospitals, and the following data on the blood pressure of adult male hospital workers were obtained:

                      Hospital A     Hospital B
Mean blood pressure   127.56 mmHg    140.78 mmHg
Standard deviation    13.77 mmHg     10.37 mmHg
No. of cases          360            700

Is the blood pressure of male workers in Hospital B significantly higher than that of those working in Hospital A?

20. Two groups of rats were placed on diets with high and low protein contents, and the gain in weight (in g) was recorded after 2 months. The results are as follows:
Group A (high protein diet):
140 117 160 123 145 127 107 146 107 102 114 121 132 153
Group B (low protein diet):
97 63 110 120 96 74 86 120 115 120 150
Find out whether there is any significant difference between the weight gain of rats in the two groups.
21. In a clinical trial, the anxiety scores of 10 patients were recorded (baseline values). A new tranquillizer was given to each patient for one month, after which the anxiety scores were recorded again. The results are as follows:

Case No.   Baseline (xi)   After one month (yi)   |   Case No.   Baseline (xi)   After one month (yi)
1          23              15                     |   6          26              21
2          21              20                     |   7          22              16
3          24              26                     |   8          17              12
4          19              17                     |   9          12              12
5          17              17                     |   10         15              11

Find out whether the new tranquillizer is effective in psychoneurotic patients.
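Since each patient is measured twice, a paired t-test on the differences d = x − y is appropriate, with a one-sided alternative (scores fall after treatment). A minimal standard-library sketch:

```python
import math

baseline = [23, 21, 24, 19, 17, 26, 22, 17, 12, 15]  # x_i
after = [15, 20, 26, 17, 17, 21, 16, 12, 12, 11]     # y_i

d = [x - y for x, y in zip(baseline, after)]
n = len(d)
d_mean = sum(d) / n
d_sd = math.sqrt(sum((di - d_mean) ** 2 for di in d) / (n - 1))
t = d_mean / (d_sd / math.sqrt(n))

print(f"mean d = {d_mean:.2f}, SD(d) = {d_sd:.2f}, t = {t:.2f}, d.f. = {n - 1}")
```

t ≈ 2.89 with 9 d.f. exceeds the one-tailed 5% value of 1.833, so the fall in anxiety scores is significant at the 5% level.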

22. The concentrations of haemoglobin (xi) and bilirubin (yi) for infants with haemolytic disease of the newborn are as follows:

Case No.   (xi)   (yi)   |   Case No.   (xi)   (yi)
1          15.8   1.8    |   5          9.2    5.6
2          12.3   5.6    |   6          8.8    5.6
3          9.5    3.6    |   7          7.6    4.7
4          9.4    3.8    |   8          7.4    6.8

Calculate the correlation coefficient and comment on whether the haemoglobin level is directly proportional to the bilirubin level.
23. The most recent amount smoked by all patients other than those with cancer of the lung, from a retrospective survey, is as follows:

Disease group   Cigarettes daily                         Total
                         1-4    5-14    15-24    > 24
Cancer          236      78     237     110      57      718
RDS             42       33     128     98       34      335
CHD             22       19     64      38       23      166
GI dis.         39       31     143     81       34      328
Others          38       24     91      44       18      215
Total           377      185    663     371      166     1762

Find out whether the various disease groups are associated with daily cigarette smoking. Also state the degrees of freedom for this problem.

24. The following table shows the number of individuals in various age groups who were found in a survey to be positive and negative for Schistosoma mansoni eggs in the stool.

          Age in years
          0-10   10-20   20-30   30-40   > 40   Total
Test +    14     16      14      7       6      57
Test -    87     33      66      34      11     231
Total     101    49      80      41      17     288

Find out whether the presence of Schistosoma mansoni eggs in the stool is related to age.
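For an r × c table the expected count in each cell is (row total × column total)/N, and χ² = Σ(O − E)²/E with (r − 1)(c − 1) d.f. A minimal sketch (the helper name is illustrative; no continuity correction):

```python
def chi_square(table):
    """Pearson chi-square and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


table = [
    [14, 16, 14, 7, 6],     # test positive, ages 0-10 ... > 40
    [87, 33, 66, 34, 11],   # test negative
]
chi2, dof = chi_square(table)
print(f"chi-square = {chi2:.2f}, d.f. = {dof}")
```

This gives χ² ≈ 10.37 with 4 d.f., in line with the key's 10.35 and above the 5% critical value of 9.49. The expected count for positives over 40 years is only about 3.4, so the chi-square approximation is rough for that cell.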
25. The following are the numbers of children who were nasal carriers or non-carriers of Streptococcus pyogenes, classified by size of tonsils:

              Tonsils
              Present but not enlarged   Enlarged   Greatly enlarged   Total
Carrier       19                         29         24                 72
Non-carrier   497                        560        269                1326
Total         516                        589        293                1398

Find out whether nasal carriage is associated with the size of tonsils.
26. Two groups of female rats were placed on diets with high and low protein content, and the gain in weight between the 28th and 84th days of age was measured for each rat. The results were as follows:

High protein diet (n = 12): 134, 146, 104, 119, 124, 161, 107, 83, 113, 129, 97, 123
Low protein diet (n = 8): 70, 118, 101, 85, 107, 132, 94, 115

Find out whether there is any significant increase in the weight of rats given the high protein diet.
27. In a clinical trial to assess the value of a new method of treatment (A) in comparison with the old method (B), patients were divided at random into two groups. Of 257 patients treated by method A, 41 died; of 244 patients treated by method B, 64 died. Find out whether the fatality rate in group A is significantly lower than that in group B.
28. Fill in the blanks:
(a) The statistical hypothesis under test is called ..................
(b) The probability of type I error is given by ...................
(c) The probability of type I error is also called ...................
(d) If β is the probability of type II error, then (1 − β) is called the ................ of the test.
(e) The power function is related to type ............. error.
(f) In any testing problem, type ................... error is considered more serious than type .................. error.
(g) The level of significance of a test is related to type ............... error and is given by .................
(h) The critical region provides a criterion for .................. the null hypothesis.
(i) The choice of a one-tailed or two-tailed test depends on .................
29. Calculate the standard deviation of the following two series:
Series A   25    30    45    60    10    100   70
Series B   100   120   180   240   40    400   280

30. Two random samples of sizes 16 and 25 are drawn from normal populations, and the data on abdominal skinfold thickness are as follows:

Sample   No. of observations   Sum of observations   Sum of squares of observations
1        16                    76                    561
2        25                    105                   680

Find out whether there is any significant difference between the skinfold thickness of the two groups.
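When only n, Σx and Σx² are given, each sample variance is s² = (Σx² − (Σx)²/n)/(n − 1). The sketch below assumes equal population variances and uses the pooled two-sample t-test; the names are illustrative.

```python
import math


def mean_and_variance(n, total, total_sq):
    """Sample mean and variance (divisor n - 1) from n, sum(x) and sum(x**2)."""
    mean = total / n
    variance = (total_sq - total ** 2 / n) / (n - 1)
    return mean, variance


n1, n2 = 16, 25
m1, v1 = mean_and_variance(n1, 76, 561)
m2, v2 = mean_and_variance(n2, 105, 680)

# Pooled variance assumes the two populations share a common variance
pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
dof = n1 + n2 - 2

print(f"means {m1:.2f}, {m2:.2f}; SDs {math.sqrt(v1):.2f}, {math.sqrt(v2):.2f}; "
      f"t = {t:.2f}, d.f. = {dof}")
```

t ≈ 0.51 with 39 d.f. is well below the two-tailed 5% value of about 2.02, so no significant difference is indicated.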
31. Fill in the blanks:
(a) The sum of absolute deviations is minimum when measured from .................
(b) The sum of squares of deviations is least when measured from .....................
(c) If 25% of the items are less than 10 and 25% are more than 40, the coefficient of quartile deviation is .................
(d) In a symmetric distribution, the upper and lower quartiles are equidistant from ..................
(e) If the mean and the mode of a given distribution are equal, then its coefficient of skewness is ..................
(f) In any distribution, the standard deviation is always ..................... the mean deviation from the mean.
32. A clinical researcher postulates that weight-bearing exercise prevents the development of osteoporosis by increasing the secretion of calcitonin, a hormone that inhibits bone resorption. He wishes to test the hypothesis by comparing blood levels of calcitonin in subjects who exercise with those in subjects who do not. The mean calcitonin secretion (g/dl) in the study and control groups of women, along with their respective standard deviations, is given below:

                                Study group   Control group
No. of women (ages 25 to 45)    100           100
Sample mean                     0.60          0.54
Sample SD                       0.20          0.15

Test the desired hypothesis based on the above observations.
33. A community health director observes that exposure to a particular pesticide results in a higher rate of miscarriage. To test the hypothesis regarding exposure and miscarriage, he selects from the hospital records 40 women who experienced a miscarriage and 160 women who had a normal pregnancy. The 200 subjects were interviewed to determine their prior exposure to the pesticide. The results are summarized as:

                   Exposed   Not exposed   Total
Miscarriage        30        10            40
Normal pregnancy   60        100           160

Explain the type of study design and find the odds ratio for exposure to the pesticide.
34. Test whether there is any association between marital status and breast cancer among females:

Breast cancer   Married   Unmarried
Yes             26        9
No              16        49

35. Compute the crude death rates of populations A, B and C from the table, and compare the death rates of populations A and B, taking population C as the standard population.

Age group   PA       DA    PB       DB     PC       DC
< 10        16,000   425   20,000   600    12,000   372
10-20       25,000   560   12,000   240    30,000   660
20-40       45,000   955   50,000   1250   62,000   1612
40-60       21,000   752   30,000   1050   15,000   525
> 60        12,000   600   10,000   550    3,000    180

36. In Allahabad city, 20% of a random sample of 900 school children had defective eyesight, while in Kanpur city 15% of a random sample of 1,600 children had the same defect. Is the difference between the two proportions significant?
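Under H0: P1 = P2 the two samples are pooled to estimate the common proportion, and z = (p1 − p2)/√[p(1 − p)(1/n1 + 1/n2)]. A minimal sketch (the function name is illustrative):

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """z statistic for H0: P1 = P2, using the pooled proportion."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


z = two_proportion_z(0.20, 900, 0.15, 1600)   # Allahabad vs Kanpur
print(f"z = {z:.2f}")
```

z ≈ 3.21 exceeds 2.58, so the difference is significant at the 1% level.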
37. Draw two systematic samples of size 5 from the data given below:
3, 4, 7, 5, 1, 6, 8, 2, 7, 4, 7, 11, 9, 3, 4, 6, 13, 11, 11, 10
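With N = 20 and n = 5 the sampling interval is k = N/n = 4; a systematic sample takes a start r between 1 and k and then every k-th unit. In the sketch below the two starts (1 and 3) are arbitrary choices for illustration; in practice r would be drawn at random, for example with `random.randint(1, k)`.

```python
data = [3, 4, 7, 5, 1, 6, 8, 2, 7, 4, 7, 11, 9, 3, 4, 6, 13, 11, 11, 10]


def systematic_sample(population, n, start):
    """Every k-th unit (k = N // n) from a 1-based start, 1 <= start <= k."""
    k = len(population) // n
    if not 1 <= start <= k:
        raise ValueError(f"start must be between 1 and {k}")
    return population[start - 1::k][:n]


sample_1 = systematic_sample(data, 5, start=1)   # units 1, 5, 9, 13, 17
sample_2 = systematic_sample(data, 5, start=3)   # units 3, 7, 11, 15, 19
print(sample_1, sample_2)
```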

38. A screening test is 90% sensitive and 60% specific. Calculate the positive and negative likelihood ratios of the test.
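The likelihood ratios follow directly from sensitivity and specificity: LR+ = sensitivity/(1 − specificity) and LR− = (1 − sensitivity)/specificity. A minimal sketch (illustrative function name):

```python
def likelihood_ratios(sensitivity, specificity):
    """LR+ = sens / (1 - spec); LR- = (1 - sens) / spec."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


lr_pos, lr_neg = likelihood_ratios(0.90, 0.60)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
```

This gives LR+ = 2.25 and LR− ≈ 0.167.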
39. Two populations of women, those using oral contraceptives and those using no contraceptive device, were followed up for the occurrence of myocardial infarction, and the observations are given below:

             Myocardial infarction   No myocardial infarction
OC users     25                      40
Non-users    35                      100

Explain what type of study design has been adopted, and also find the relative risk of myocardial infarction due to oral contraceptives.
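Because the groups are defined by exposure and followed forward, incidence can be compared directly: RR = [a/(a + b)]/[c/(c + d)]. A minimal sketch with an illustrative function name:

```python
def relative_risk(exposed_cases, exposed_non_cases, unexposed_cases, unexposed_non_cases):
    """Incidence among the exposed divided by incidence among the unexposed."""
    risk_exposed = exposed_cases / (exposed_cases + exposed_non_cases)
    risk_unexposed = unexposed_cases / (unexposed_cases + unexposed_non_cases)
    return risk_exposed / risk_unexposed


# OC users: 25 MI, 40 no MI; non-users: 35 MI, 100 no MI
rr = relative_risk(25, 40, 35, 100)
print(f"relative risk = {rr:.2f}")
```

RR = (25/65)/(35/135) ≈ 1.48.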
40. A two-stage screening programme for detecting diabetes used blood sugar at the first stage and the glucose tolerance test (GTT) at the second stage. Calculate the net sensitivity and net specificity on the basis of the following results.

I stage
           Diabetes (+)   Diabetes (-)   Total
Test (+)   425            1575           2000
Test (-)   125            7875           8000
Total      550            9450           10,000

II stage
           Diabetes (+)   Diabetes (-)   Total
Test (+)   400            175            575
Test (-)   25             1400           1425
Total      425            1575           2000
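In sequential (two-stage) testing only stage-I positives go on to stage II, so a person is finally called positive only if both tests are positive. A minimal sketch of the resulting net figures:

```python
# Stage I screens everyone (blood sugar)
s1_tp, s1_fp, s1_fn, s1_tn = 425, 1575, 125, 7875
# Stage II (GTT) retests only the 2000 stage-I positives
s2_tp, s2_fp, s2_fn, s2_tn = 400, 175, 25, 1400

# Consistency check: stage II sees exactly the stage-I positives
assert s2_tp + s2_fn == s1_tp and s2_fp + s2_tn == s1_fp

diseased = s1_tp + s1_fn        # 550 people with diabetes
non_diseased = s1_fp + s1_tn    # 9450 people without diabetes

# A case is finally detected only if positive at BOTH stages
net_sensitivity = s2_tp / diseased
# A non-case is correctly cleared if negative at EITHER stage
net_specificity = (s1_tn + s2_tn) / non_diseased

print(f"net sensitivity = {100 * net_sensitivity:.2f}%")
print(f"net specificity = {100 * net_specificity:.2f}%")
```

This gives net sensitivity = 400/550 ≈ 72.7% and net specificity = 9275/9450 ≈ 98.1%.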

41. A random sample of 25 patients is taken from the ICCU of a hospital, and the outcome, cured (C) or death (D), was recorded according to the date of admission of the patient, as follows:

C C C D D D C C C C C D D C D D D D C C D C D D C

Apply a run test to test whether the sequence of cures and deaths is random.
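The one-sample runs test compares the observed number of runs R with its expectation under randomness, E(R) = 2n1n2/n + 1, using the normal approximation with variance 2n1n2(2n1n2 − n)/[n²(n − 1)]. A minimal sketch (illustrative function name):

```python
import math


def runs_test(sequence):
    """One-sample runs test for a two-symbol sequence (normal approximation)."""
    runs = 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)
    n1 = sequence.count(sequence[0])
    n2 = len(sequence) - n1
    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    return runs, n1, n2, z


outcomes = "CCCDDDCCCCCDDCDDDDCCDCDDC"   # C = cured, D = death, in admission order
runs, n_c, n_d, z = runs_test(outcomes)
print(f"runs = {runs}, n(C) = {n_c}, n(D) = {n_d}, z = {z:.2f}")
```

There are 11 runs against an expected 13.48, giving |z| ≈ 1.02 < 1.96, so randomness is not rejected at the 5% level.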
42. Two samples are drawn from two populations whose distributions are not known. One group (Group A, n1 = 10) was given a high-calorie diet and the second group (Group B, n2 = 10) was on a normal diet. The weight gain in the two groups was recorded after a month:

Group A   12   10   12   15   10   15   16   18   12   10
Group B

Apply a suitable test to find out whether the weight gain in the two groups is the same.
43. A correlation coefficient of 0.4 is derived from a random sample of 102 pairs of observations. Is the value of r significant?
44. In four families, each containing eight persons, the chest measurements (in cm) of these persons are given below. Find out whether there is any significant difference between the chest measurements of these families.

Family 1   35   53   47   60   85   66   49   55
Family 2   67   39   33   65   69   66   58   42
Family 3   56   47   33   79   90   49   57   62
Family 4   56   78   44   42   39   67   68   86

45. The following table gives the frequency distribution of the pulse rate of 60 normal persons:

Pulse rate   No. of persons   |   Pulse rate   No. of persons
45-50        3                |   60-65        15
50-55        7                |   65-70        9
55-60        20               |   70-75        6

Calculate the upper and lower quartiles and the coefficient of dispersion.
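Quartiles of grouped data use the same interpolation as the median, with qN in place of N/2, and the coefficient of dispersion based on quartiles is (Q3 − Q1)/(Q3 + Q1). A minimal sketch (illustrative function name):

```python
def grouped_quantile(lowers, width, freqs, q):
    """Quantile of grouped data: L + (qN - cf) / f * h (q = 0.25 gives Q1)."""
    target = q * sum(freqs)
    cum = 0
    for lo, f in zip(lowers, freqs):
        if cum + f >= target:
            return lo + (target - cum) / f * width
        cum += f
    raise ValueError("q must lie in (0, 1]")


lowers = [45, 50, 55, 60, 65, 70]   # pulse-rate classes 45-50 ... 70-75
freqs = [3, 7, 20, 15, 9, 6]
q1 = grouped_quantile(lowers, 5, freqs, 0.25)
q3 = grouped_quantile(lowers, 5, freqs, 0.75)
coefficient = (q3 - q1) / (q3 + q1)
print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, coefficient of dispersion = {coefficient:.3f}")
```

This gives Q1 = 56.25, Q3 = 65.00 and a coefficient of dispersion of about 0.07.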
46. The mean and median of 100 observations are 50 and 52, respectively. The value of the largest item is 100. It was found later that the correct value is actually 120. Find the correct values of the mean and median, and also calculate the mode and the second quartile.
47. Two laboratories carry out independent estimates of the progesterone content of a particular brand of oral contraceptive. A sample is taken from each batch, halved, and the separate halves are sent to the two laboratories. The following data are obtained:

No. of samples
Mean value of the difference of estimates     0.8
Standard deviation of the difference          16

Find out whether there is a significant difference between the progesterone content of the oral contraceptive as reported by the two laboratories.
48. Calculate the correlation coefficient for the following heights (in inches) of fathers (x) and their sons (y):

x   65   66   67   67   68   69   70   72
y   67   68   65   68   72   72   69   71
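Karl Pearson's coefficient is r = Σ(x − x̄)(y − ȳ)/√[Σ(x − x̄)² Σ(y − ȳ)²]. A minimal sketch (illustrative function name):

```python
import math


def pearson_r(x, y):
    """Karl Pearson's correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


fathers = [65, 66, 67, 67, 68, 69, 70, 72]  # x, inches
sons = [67, 68, 65, 68, 72, 72, 69, 71]     # y, inches
r = pearson_r(fathers, sons)
print(f"r = {r:.2f}")
```

With x̄ = 68 and ȳ = 69, the deviations give Σdxdy = 24, Σdx² = 36 and Σdy² = 44, so r = 24/√1584 ≈ 0.60.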

49. In an investigation of neonatal blood pressure in relation to maturity, the following results were obtained:

Babies 9 days old       Number   Mean systolic BP   SD
1. Normal               50       75                 8
2. Neonatal asphyxia    15       69                 6

Is the difference in mean systolic BP between the two groups statistically significant?
50. From a field area, 40 females using oral contraceptives and 60 females using other contraceptives were randomly selected, and the number of hypertensive cases in each group was recorded as given below:

Type of contraceptive   Total   No. of hypertensive
Oral                    40      12
Others                  60      18

Find whether there is any significant difference in the proportion of hypertensive women between oral contraceptive users and users of other contraceptives.

Answers of MCQs and Unsolved Questions

Answers of MCQs
Chapter 1: Classification and Tabulation
1. d    2. a    3. c    4. b    5. a    6. b
7. d    8. b    9. c    10. d   11. b   12. d
13. d   14. d   15. d   16. a   17. d   18. b, d
19. c   20. c   21. c   22. a


Chapter 2: Measure of Central Tendency
1. c    2. b    3. d    4. a    5. c    6. b
7. c    8. c    9. b    10. b   11. b   12. b
13. b   14. a   15. c   16. b   17. b   18. c
19. b   20. d   21. a   22. c   23. c   24. a
25. c   26. c   27. a   28. a   29. b, c   30. a

Chapter 3: Measure of Dispersion
1. c    2. b    3. c    4. d    5. d    6. a
7. a    8. a    9. b    10. c   11. b   12. b
13. b*  14. c   15. b   16. c   17. c   18. d
19. b   20. b   21. a   22. c   23. a   24. a
25. c   26. a
* because variance is the square of the standard deviation
Chapter 4: Theoretical Discrete and Continuous Distribution
1. a    2. d    3. b    4. d    5. a    6. c
7. b    8. a    9. b    10. a   11. a   12. b
13. d   14. b   15. c   16. d   17. d   18. d
19. c   20. a   21. b   22. b   23. a   24. b
25. a   26. a   27. b   28. d   29. b   30. d
31. c   32. b   33. b   34. d

Chapter 5: Correlation and Regression
1. b    2. d    3. b    4. a    5. d    6. c
7. a    8. b    9. b    10. a   11. c   12. a
13. a   14. b   15. b   16. b   17. a   18. a
19. b   20. d   21. c   22. d   23. d   24. d
25. c

Chapter 6: Probability
1. d    2. b    3. a    4. c    5. b    6. d
7. c    8. c    9. a    10. a

Chapter 7: Sampling and Design of Experiments
1. a    2. b    3. b    4. b    5. d    6. b
7. b    8. d    9. b    10. b   11. a   12. c, d
13. a   14. a   15. b   16. a   17. b   18. d
Chapter 8: Testing of Hypothesis
1. a    2. c    3. c    4. a    5. a    6. a
7. d    8. b    9. a    10. a   11. a   12. c
13. a   14. a   15. c   16. a   17. b   18. b
19. b   20. d   21. b   22. b   23. a   24. a
25. a   26. d

Chapter 9: Non-parametric Tests
1. e    2. d    3. b
Chapter 10: Statistical Methods in Epidemiology
1. c    2. a    3. c    4. a    5. a    6. a
7. b    8. b    9. a    10. b   11. b   12. b
13. b   14. b   15. b   16. b   17. b   18. c
19. a   20. b   21. c   22. c   23. a   24. c
25. b   26. c   27. a   28. a   29. d   30. c
31. a   32. b   33. c   34. c   35. c   36. b
37. a   38. b   39. d

Chapter 11: Vital Statistics (Demography)
1. c    2. d    3. c    4. b    5. c    6. a
7. d    8. c    9. b    10. d   11. a   12. d
13. d   14. b   15. c   16. d   17. b   18. b
19. a   20. a   21. c   22. b   23. a   24. a
25. b   26. a   27. c   28. d   29. a   30. b
31. a   32. a   33. d   34. a   35. a   36. d
37. d   38. b   39. c   40. a

Chapter 12: Health Information
1. a    2. c    3. b    4. d    5. a    6. d
7. a    8. d    9. b

Chapter 13: A Report on Census 2001
1. b    2. d    3. b    4. c    5. c    6. b
7. d    8. c    9. b    10. b   11. b   12. c
13. b   14. b

Chapter 14: National Population Policy
1. c    2. b    3. b    4. a    5. a    6. d
7. b    8. a


Answers of Unsolved Questions


1. Null hypothesis H0: μA = μB; alternative hypothesis H1: μA ≠ μB; Mean (A) = 51.28, SD (A) = 2.28; Mean (B) = 53.14, SD (B) = 1.67; t = 2.95, d.f. = 12, p < 0.05.
2. H0: μA = μB; H1: μA ≠ μB; Mean (difference) = 2; SD (d) = 2.64; t = 2.27, d.f. = 8, p > 0.05.
3. H0: No association between coronary artery disease and smoking; χ²cal = 4, d.f. = 1; p < 0.05.
4. Hint: Go through Chapter 2.
5. Mean = 132.4; Median = 132.22; Mode = 132.5; approximately symmetrical.
6. Correlation coefficient r = +0.82.
7. Regression line of x on y: x = 51.4 + 0.58y
Regression line of y on x: y = 26 + 0.96x
Estimate of cholesterol for blood pressure x = 160 is 179.6.
8. Crude death rate (A) = 10.26; CDR (B) = 7.93
Standardized death rate (A) = 9.7; SDR (B) = 10.6
9. GFR = 77.4; TFR = 2.56; GRR = 1.23
10. Prevalence = 300/1000; rate of smoking among hypertensives = 83.33%; rate of smoking among normotensives = 35.71%; χ² = 190.46; risk ratio = 5.
11. Sensitivity = 50%, Specificity = 100%, PPV = 100%, NPV = 42.85%, Diagnostic accuracy = 63.64%.
12. Q1 = 37.78, Q3 = 62.50; Coeff. of dispersion = 0.24.

13. Median = 43.33; Mode = 43.33.
14. Median = 11; Mean = 10.52.
15. Coeff. of dispersion (based on SD) = 0.09
Coeff. of dispersion (based on quartiles) = 0.07
16. Correlation coefficient r = 0.79; the regression line between height (Ht) and weight (Wt) is Ht = 111.32 + 0.88 Wt.
17. H0: P1 = P2; H1: P1 ≠ P2; Z = 9.16; p < 0.001.
18. H0: μ = 13.6; H1: μ ≠ 13.6; |Z| = 10, p < 0.001.
19. H0: μA = μB; H1: μA < μB; Z = 15.94; p < 0.001.
20. H0: μA = μB; H1: μA ≠ μB; Mean (A) = 128.14, SD (A) = 18.33; Mean (B) = 104.63, SD (B) = 24.60; t = 2.27, d.f. = 23; p < 0.05.
21. H0: μx = μy; H1: μx > μy; Mean (difference) = 2.9; SD (d) = 3.17; t = 2.89, d.f. = 9; p < 0.05.
22. Correlation coefficient r = -0.58; inversely proportional.
23. H0: No association between disease groups and cigarette smoking; χ² = 27.18, d.f. = 16; p < 0.05.
24. H0: No relation between age and presence of Schistosoma mansoni eggs; χ² = 10.35, d.f. = 4; p < 0.05.
25. H0: Nasal carriage is not associated with size of tonsils; χ² = 7.85, d.f. = 2; p < 0.05.
26. H0: μ1 = μ2; H1: μ1 ≠ μ2; t = 1.84, d.f. = 18, p > 0.05.
27. H0: PA = PB; H1: PA < PB; Z = 2.77, p < 0.01.
28. (a) Null hypothesis; (b) α; (c) Level of significance; (d) Power; (e) Type II; (f) Type I, Type II; (g) Type I, α; (h) Rejecting; (i) Alternative hypothesis.
29. SD (A) = 30.64; SD (B) = 122.56.

30. H0: μ1 = μ2; H1: μ1 ≠ μ2; Mean (1) = 4.75, SD (1) = 3.65; Mean
(2) = 4.20, SD (2) = 3.15; t = 0.51; d.f. = 39; p > 0.05.
31. (a) Median; (b) Mean; (c) 15; (d) Mean; (e) zero; (f) less.
32. H0: μ1 = μ2; H1: μ1 ≠ μ2; Z = 2.5, p < 0.05.
33. Retrospective study; Odds ratio = 5.
34. H0: No association between marital status and breast
cancer; χ² = 20.02, d.f. = 1; p < 0.001.
35. CDR (A) = 27.66; CDR (B) = 30.24; CDR (C) = 27.45
Standardized death rate (A) = 24.53; SDR (B) = 26.26.
36. H0: P1 = P2; H1: P1 ≠ P2; Z = 3.21; p < 0.001.
37. Hint: Systematic sampling; 20 = 5 k; k = 20/5 = 4.
38. Positive likelihood ratio = 2.25;
Negative likelihood ratio = 0.16
39. Prospective study; Risk ratio = 1.48.
40. Sensitivity = 72.2%, Specificity = 98.14%
41. H0: The sequence of cure and death in this series is random;
No. of runs = 11, Z = 1.02; p > 0.05 (i.e., accept H0).
42. H0: μ1 = μ2; H1: μ1 ≠ μ2; Mann-Whitney U-test, Z =
0.01; p > 0.05.
43. t = 4.39; d.f. = 100, p < 0.001.
44. H0: μ1 = μ2 = μ3 = μ4; H1: not all four means are equal; Analysis of
variance, F = 0.14; d.f. = (3, 28); p > 0.05.
45. Q1 = 56.25; Q3 = 65.00, Coeff. of dispersion = 0.07.
46. Mean = 50.20; Median = 52, Mode = 55.6.
47. H0: μd = 0; H1: μd ≠ 0; t = 0.15, d.f. = 8, p > 0.05.
48. Correlation coefficient r = 0.60.
49. H0: μ1 = μ2; H1: μ1 ≠ μ2; t = 2.65, d.f. = 63; p < 0.05.
50. H0: P1 = P2; H1: P1 ≠ P2; Z = 0, p > 0.05.
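Several of the numerical answers above follow from short formulas, and a few lines of Python are enough to re-derive them. The sketch below is illustrative: the function names are hypothetical, the mode relation is Karl Pearson's empirical formula (Mode = 3 Median − 2 Mean, which reproduces answer 46 exactly and so appears to be the method used there), 65.00 in answer 45 is taken as the upper quartile, and the 2 × 2 counts used for answer 11 are invented for illustration because the original data are not reproduced here. Those counts (TP = 4, FP = 0, FN = 4, TN = 3) reproduce the sensitivity, specificity, PPV and NPV of answer 11 and give a diagnostic accuracy of 7/11.

```python
def empirical_mode(mean, median):
    """Karl Pearson's empirical relation for moderately skewed data."""
    return 3 * median - 2 * mean


def quartile_dispersion(q1, q3):
    """Coefficient of quartile dispersion: (Q3 - Q1) / (Q3 + Q1)."""
    return (q3 - q1) / (q3 + q1)


def screening_measures(tp, fp, fn, tn):
    """Screening-test measures from a 2 x 2 table of counts, as proportions."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


print(round(empirical_mode(mean=50.20, median=52), 2))    # answer 46: 55.6
print(round(quartile_dispersion(q1=56.25, q3=65.00), 2))  # answer 45: 0.07
print(round(26 + 0.96 * 160, 1))                          # answer 7: y = 26 + 0.96x at x = 160
# Hypothetical counts TP=4, FP=0, FN=4, TN=3, shown as percentages
print({k: round(100 * v, 2) for k, v in screening_measures(4, 0, 4, 3).items()})
```

The `screening_measures` helper can be applied to the real table from question 11 to check the quoted percentages directly.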

Appendix

Statistical Tables



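Table 1: Areas under normal curve

Normal probability curve is given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right], \quad -\infty < x < \infty$$

and standard normal probability curve is given by

$$\phi(z) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}z^{2}\right), \quad -\infty < z < \infty$$

Figure A-1

The following table gives the shaded area in the diagram, viz. P(0 < Z < z), for different values of z.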


Tables of Areas

z    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0  .0000  .0040  .0080  .0120  .0160  .0199  .0239  .0279  .0319  .0359
0.1  .0398  .0438  .0478  .0517  .0557  .0596  .0636  .0675  .0714  .0753
0.2  .0793  .0832  .0871  .0910  .0948  .0987  .1026  .1064  .1103  .1141
0.3  .1179  .1217  .1255  .1293  .1331  .1368  .1406  .1443  .1480  .1517
0.4  .1554  .1591  .1628  .1664  .1700  .1736  .1772  .1808  .1844  .1879
0.5  .1915  .1950  .1985  .2019  .2054  .2088  .2123  .2157  .2190  .2224
0.6  .2257  .2291  .2324  .2357  .2389  .2422  .2454  .2486  .2517  .2549
0.7  .2580  .2611  .2642  .2673  .2703  .2734  .2764  .2794  .2823  .2852
0.8  .2881  .2910  .2939  .2967  .2995  .3023  .3051  .3078  .3106  .3133
0.9  .3159  .3186  .3212  .3238  .3264  .3289  .3315  .3340  .3365  .3389
1.0  .3413  .3438  .3461  .3485  .3508  .3531  .3554  .3577  .3599  .3621
1.1  .3643  .3665  .3686  .3708  .3729  .3749  .3770  .3790  .3810  .3830
1.2  .3849  .3869  .3888  .3907  .3925  .3944  .3962  .3980  .3997  .4015
1.3  .4032  .4049  .4066  .4082  .4099  .4115  .4131  .4147  .4162  .4177
1.4  .4192  .4207  .4222  .4236  .4251  .4265  .4279  .4292  .4306  .4319
1.5  .4332  .4345  .4357  .4370  .4382  .4394  .4406  .4418  .4429  .4441
1.6  .4452  .4463  .4474  .4484  .4495  .4505  .4515  .4525  .4535  .4545
1.7  .4554  .4564  .4573  .4582  .4591  .4599  .4608  .4616  .4625  .4633
1.8  .4641  .4649  .4656  .4664  .4671  .4678  .4686  .4693  .4699  .4706
1.9  .4713  .4719  .4726  .4732  .4738  .4744  .4750  .4756  .4761  .4767
2.0  .4772  .4778  .4783  .4788  .4793  .4798  .4803  .4808  .4812  .4817
2.1  .4821  .4826  .4830  .4834  .4838  .4842  .4846  .4850  .4854  .4857
2.2  .4861  .4864  .4868  .4871  .4875  .4878  .4881  .4884  .4887  .4890
2.3  .4893  .4896  .4898  .4901  .4904  .4906  .4909  .4911  .4913  .4916
2.4  .4918  .4920  .4922  .4925  .4927  .4929  .4931  .4932  .4934  .4936
2.5  .4938  .4940  .4941  .4943  .4945  .4946  .4948  .4949  .4951  .4952
2.6  .4953  .4955  .4956  .4957  .4959  .4960  .4961  .4962  .4963  .4964
2.7  .4965  .4966  .4967  .4968  .4969  .4970  .4971  .4972  .4973  .4974
2.8  .4974  .4975  .4976  .4977  .4977  .4978  .4979  .4979  .4980  .4981
2.9  .4981  .4982  .4982  .4983  .4984  .4984  .4985  .4985  .4986  .4986
3.0  .4987  .4987  .4987  .4988  .4988  .4989  .4989  .4989  .4990  .4990
3.1  .4990  .4991  .4991  .4991  .4992  .4992  .4992  .4992  .4993  .4993
3.2  .4993  .4993  .4994  .4994  .4994  .4994  .4994  .4995  .4995  .4995
3.3  .4995  .4995  .4995  .4996  .4996  .4996  .4996  .4996  .4996  .4997
3.4  .4997  .4997  .4997  .4997  .4997  .4997  .4997  .4997  .4997  .4998
3.5  .4998  .4998  .4998  .4998  .4998  .4998  .4998  .4998  .4998  .4998
3.6  .4998  .4998  .4999  .4999  .4999  .4999  .4999  .4999  .4999  .4999
3.7  .4999  .4999  .4999  .4999  .4999  .4999  .4999  .4999  .4999  .4999
3.9  .5000  .5000  .5000  .5000  .5000  .5000  .5000  .5000  .5000  .5000
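Every entry of this table can be regenerated from the error function, because P(0 < Z < z) = Φ(z) − 0.5 = erf(z/√2)/2. The sketch below uses only Python's standard `math.erf`; the helper names `area_0_to_z` and `table_row` are hypothetical, and values are rounded to four decimals to match the printed table.

```python
import math


def area_0_to_z(z):
    """Area under the standard normal curve between 0 and z, i.e. P(0 < Z < z).

    Uses Phi(z) - 0.5 = erf(z / sqrt(2)) / 2. For negative z the result is
    negative; by symmetry, read the table at abs(z) instead.
    """
    return 0.5 * math.erf(z / math.sqrt(2))


def table_row(z_row):
    """One row of the table: z_row + .00, .01, ..., .09, rounded to 4 places."""
    return [round(area_0_to_z(z_row + c / 100), 4) for c in range(10)]


print(table_row(1.9))                # the z = 1.9 row
print(round(area_0_to_z(1.96), 4))   # 0.475, i.e. 2.5% left in each tail
```

Recomputing a row this way is a quick check on any entry that looks out of sequence.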

Table 2: Ordinates of the normal probability curve

The following table gives the ordinates of the standard normal probability curve, i.e., it gives the value of

$$\phi(z) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}z^{2}\right), \quad -\infty < z < \infty$$

for different values of z, where

$$z = \frac{X - E(X)}{\sigma_X} \sim N(0, 1).$$

Obviously, φ(−z) = φ(z).
z    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0  .3989  .3989  .3989  .3988  .3986  .3984  .3982  .3980  .3977  .3973
0.1  .3970  .3965  .3961  .3956  .3951  .3945  .3939  .3932  .3925  .3918
0.2  .3910  .3902  .3894  .3885  .3876  .3867  .3857  .3847  .3836  .3825
0.3  .3814  .3802  .3790  .3778  .3765  .3752  .3739  .3725  .3712  .3697
0.4  .3683  .3668  .3653  .3637  .3621  .3605  .3589  .3572  .3555  .3538
0.5  .3521  .3503  .3485  .3467  .3448  .3429  .3410  .3391  .3372  .3352
0.6  .3332  .3312  .3292  .3271  .3251  .3230  .3209  .3187  .3166  .3144
0.7  .3123  .3101  .3079  .3056  .3034  .3011  .2989  .2966  .2943  .2920
0.8  .2897  .2874  .2850  .2827  .2803  .2780  .2756  .2732  .2709  .2685
0.9  .2661  .2637  .2613  .2589  .2565  .2541  .2516  .2492  .2468  .2444
1.0  .2420  .2396  .2371  .2347  .2323  .2299  .2275  .2251  .2227  .2203
1.1  .2179  .2155  .2131  .2107  .2083  .2059  .2036  .2012  .1989  .1965
1.2  .1942  .1919  .1895  .1872  .1849  .1826  .1804  .1781  .1758  .1736
1.3  .1714  .1691  .1669  .1647  .1626  .1604  .1582  .1561  .1539  .1518
1.4  .1497  .1476  .1456  .1435  .1415  .1394  .1374  .1354  .1334  .1315
1.5  .1295  .1276  .1257  .1238  .1219  .1200  .1182  .1163  .1145  .1127
1.6  .1109  .1092  .1074  .1057  .1040  .1023  .1006  .0989  .0973  .0957
1.7  .0940  .0925  .0909  .0893  .0878  .0863  .0848  .0833  .0818  .0804
1.8  .0790  .0775  .0761  .0748  .0734  .0721  .0707  .0694  .0681  .0669
1.9  .0656  .0644  .0632  .0620  .0608  .0596  .0584  .0573  .0562  .0551
2.0  .0540  .0529  .0519  .0508  .0498  .0488  .0478  .0468  .0459  .0449
2.1  .0440  .0431  .0422  .0413  .0404  .0396  .0387  .0379  .0371  .0363
2.2  .0355  .0347  .0339  .0332  .0325  .0317  .0310  .0303  .0297  .0290
2.3  .0283  .0277  .0270  .0264  .0258  .0252  .0246  .0241  .0235  .0229
2.4  .0224  .0219  .0213  .0208  .0203  .0198  .0194  .0189  .0184  .0180
2.5  .0175  .0171  .0167  .0163  .0158  .0154  .0151  .0147  .0143  .0139
2.6  .0136  .0132  .0129  .0126  .0122  .0119  .0116  .0113  .0110  .0107
2.7  .0104  .0101  .0099  .0096  .0093  .0091  .0088  .0086  .0084  .0081
2.8  .0079  .0077  .0075  .0073  .0071  .0069  .0067  .0065  .0063  .0061
2.9  .0060  .0058  .0056  .0055  .0053  .0051  .0050  .0048  .0047  .0046
3.0  .0044  .0043  .0042  .0040  .0039  .0038  .0037  .0036  .0035  .0034
3.1  .0033  .0032  .0031  .0030  .0029  .0028  .0027  .0026  .0025  .0025
3.2  .0024  .0023  .0022  .0022  .0021  .0020  .0020  .0019  .0018  .0018
3.3  .0017  .0017  .0016  .0016  .0015  .0015  .0014  .0014  .0013  .0013
3.4  .0012  .0012  .0012  .0011  .0011  .0010  .0010  .0010  .0009  .0009
3.5  .0009  .0008  .0008  .0008  .0008  .0007  .0007  .0007  .0007  .0006
3.6  .0006  .0006  .0006  .0005  .0005  .0005  .0005  .0005  .0005  .0004
3.7  .0004  .0004  .0004  .0004  .0004  .0004  .0003  .0003  .0003  .0003
3.8  .0003  .0003  .0003  .0003  .0003  .0002  .0002  .0002  .0002  .0002
3.9  .0002  .0002  .0002  .0002  .0002  .0002  .0002  .0002  .0001  .0001
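The ordinates follow directly from the formula for φ(z), and the ordinate of any normal curve N(μ, σ²) is obtained by standardizing: f(x) = φ((x − μ)/σ)/σ. A minimal Python sketch using only the standard library; the helper names and the μ = 120, σ = 10 values in the example are hypothetical.

```python
import math


def normal_ordinate(z):
    """Height of the standard normal curve: phi(z) = exp(-z**2 / 2) / sqrt(2*pi)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def normal_density(x, mu, sigma):
    """Ordinate of N(mu, sigma**2) at x, via z = (x - mu) / sigma."""
    return normal_ordinate((x - mu) / sigma) / sigma


print(round(normal_ordinate(0.6), 4))                  # row 0.6, column .00: 0.3332
print(normal_ordinate(-1.3) == normal_ordinate(1.3))   # symmetry phi(-z) = phi(z): True
print(normal_density(130, mu=120, sigma=10))           # hypothetical mu and sigma
```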



Table 3: Significant values of t-distribution (two-tail areas)

          Probability (level of significance)
d.f. (ν)  0.50    0.10    0.05    0.02    0.01    0.001
1         1.00    6.31    12.71   31.82   63.66   636.62
2         0.82    2.92    4.30    6.96    9.92    31.60
3         0.77    2.35    3.18    4.54    5.84    12.94
4         0.74    2.13    2.78    3.75    4.60    8.61
5         0.73    2.02    2.57    3.37    4.03    6.86
6         0.72    1.94    2.45    3.14    3.71    5.96
7         0.71    1.90    2.37    3.00    3.50    5.41
8         0.71    1.86    2.31    2.90    3.36    5.04
9         0.70    1.83    2.26    2.82    3.25    4.78
10        0.70    1.81    2.23    2.76    3.17    4.59
11        0.70    1.80    2.20    2.72    3.11    4.44
12        0.70    1.78    2.18    2.68    3.06    4.32
13        0.69    1.77    2.16    2.65    3.01    4.22
14        0.69    1.76    2.15    2.62    2.98    4.14
15        0.69    1.75    2.13    2.60    2.95    4.07
16        0.69    1.75    2.12    2.58    2.92    4.02
17        0.69    1.74    2.11    2.57    2.90    3.97
18        0.69    1.73    2.10    2.55    2.88    3.92
19        0.69    1.73    2.09    2.54    2.86    3.88
20        0.69    1.73    2.09    2.53    2.85    3.85
21        0.69    1.72    2.08    2.52    2.83    3.83
22        0.69    1.72    2.07    2.51    2.82    3.79
23        0.69    1.71    2.07    2.50    2.81    3.77
24        0.69    1.71    2.06    2.49    2.80    3.75
25        0.68    1.71    2.06    2.49    2.79    3.73
26        0.68    1.71    2.06    2.48    2.78    3.71
27        0.68    1.70    2.05    2.47    2.77    3.69
28        0.68    1.70    2.05    2.47    2.76    3.67
29        0.68    1.70    2.05    2.46    2.76    3.66
30        0.68    1.70    2.04    2.46    2.75    3.65
∞         0.67    1.65    1.96    2.33    2.58    3.29
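Instead of bracketing a calculated t between two columns of the table, the two-tailed probability can be computed exactly when the degrees of freedom are an integer, using the classical finite series in θ = arctan(|t|/√ν) (see, e.g., Abramowitz and Stegun, chapter 26). The Python sketch below is one way to do it with only the standard library; the function name is hypothetical, and the examples reuse t and d.f. values quoted in answers 26 and 49.

```python
import math


def t_two_tailed_p(t, df):
    """Two-tailed P(|T| > |t|) for Student's t with an integer d.f. >= 1.

    The series gives A = P(|T| < |t|) with theta = atan(|t| / sqrt(df));
    the function returns 1 - A.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    theta = math.atan(abs(t) / math.sqrt(df))
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    c2 = cos_t * cos_t
    if df % 2 == 1:
        # Odd df: A = (2/pi)[theta + sin(theta)(cos + (2/3)cos^3 + (2*4)/(3*5)cos^5 + ...)]
        series = 0.0
        if df > 1:
            term = series = cos_t
            for j in range(2, df - 1, 2):
                term *= c2 * j / (j + 1)
                series += term
        a = (2 / math.pi) * (theta + sin_t * series)
    else:
        # Even df: A = sin(theta)[1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ...]
        term = series = 1.0
        for j in range(1, df - 2, 2):
            term *= c2 * j / (j + 1)
            series += term
        a = sin_t * series
    return 1.0 - a


print(t_two_tailed_p(2.262, 9))   # close to 0.05, matching the tabulated 2.26
print(t_two_tailed_p(1.84, 18))   # answer 26: between 0.05 and 0.10, so p > 0.05
print(t_two_tailed_p(2.65, 63))   # answer 49: p < 0.05
```

For a one-tailed alternative, halve the returned value.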

Table 4: Significant values of chi-square distribution
(Right-tail areas for given probability α, where α = P(χ² > χ²α), and ν is degrees of freedom (d.f.))

          Probability (α)
d.f. (ν)  .99       .95       .50       .10       .05       .02       .01
1         .000157   .00393    .455      2.706     3.841     5.412     6.635
2         .0201     .103      1.386     4.605     5.991     7.824     9.210
3         .115      .352      2.366     6.251     7.815     9.837     11.341
4         .297      .711      3.357     7.779     9.488     11.668    13.277
5         .554      1.145     4.351     9.236     11.070    13.388    15.086
6         .872      1.635     5.348     10.645    12.592    15.033    16.812
7         1.239     2.167     6.346     12.017    14.067    16.622    18.475
8         1.646     2.733     7.344     13.362    15.507    18.168    20.090
9         2.088     3.325     8.343     14.684    16.919    19.679    21.666
10        2.558     3.940     9.342     15.987    18.307    21.161    23.209
11        3.053     4.575     10.341    17.275    19.675    22.618    24.725
12        3.571     5.226     11.340    18.549    21.026    24.054    26.217
13        4.107     5.892     12.340    19.812    22.362    25.472    27.688
14        4.660     6.571     13.339    21.064    23.685    26.873    29.141
15        5.229     7.261     14.339    22.307    24.996    28.259    30.578
16        5.812     7.962     15.338    23.542    26.296    29.633    32.000
17        6.408     8.672     16.338    24.769    27.587    30.995    33.409
18        7.015     9.390     17.338    25.989    28.869    32.346    34.805
19        7.633     10.117    18.338    27.204    30.144    33.687    36.191
20        8.260     10.851    19.337    28.412    31.410    35.020    37.566
21        8.897     11.591    20.337    29.615    32.671    36.343    38.932
22        9.542     12.338    21.337    30.813    33.924    37.659    40.289
23        10.196    13.091    22.337    32.007    35.172    38.968    41.638
24        10.856    13.848    23.337    33.196    36.415    40.270    42.980
25        11.524    14.611    24.337    34.382    37.652    41.566    44.314
26        12.198    15.379    25.336    35.563    38.885    42.856    45.642
27        12.879    16.151    26.336    36.741    40.113    44.140    46.963
28        13.565    16.928    27.336    37.916    41.337    45.419    48.278
29        14.256    17.708    28.336    39.087    42.557    46.693    49.588
30        14.953    18.493    29.336    40.256    43.773    47.962    50.892

Note: For degrees of freedom ν greater than 30, the quantity $\sqrt{2\chi^{2}} - \sqrt{2\nu - 1}$ may be used as a normal variate with unit variance.
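Two calculations sit behind this table and its note: the exact right-tail probability P(χ² > x) for an integer number of degrees of freedom, and the large-ν approximation in which √(2χ²) − √(2ν − 1) is treated as a standard normal variate, giving χ²α ≈ (zα + √(2ν − 1))²/2. The Python sketch below implements both with the standard library only. The function names are hypothetical, and the exact branch uses the classical finite series for integer d.f. (see, e.g., Abramowitz and Stegun, chapter 26).

```python
import math
from statistics import NormalDist


def chi2_upper_tail(x, df):
    """P(chi-square > x) for an integer d.f. >= 1, by the exact finite series."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: 2Q(sqrt x) + 2 phi(sqrt x) * sum_{r=1}^{(df-1)/2} x^(r - 1/2) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    two_tail = math.erfc(root / math.sqrt(2))        # equals 2 * Q(sqrt x)
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # normal ordinate at sqrt x
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return two_tail + 2 * phi * total


def chi2_critical_fisher(alpha, df):
    """Large-d.f. approximation: sqrt(2*chi2) - sqrt(2*df - 1) ~ N(0, 1)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return (z_alpha + math.sqrt(2 * df - 1)) ** 2 / 2


print(chi2_upper_tail(4, 1))            # answer 3: chi-square = 4 with 1 d.f., p < 0.05
print(chi2_critical_fisher(0.05, 30))   # about 43.49, against the tabulated 43.773
```

At ν = 30 the approximation is off by roughly 0.3, which is why the note reserves it for degrees of freedom beyond the table.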

Table 5: Significant values of the variance ratio F-distribution (right-tail areas, 5 percent points)
ν1 = degrees of freedom of the numerator (columns); ν2 = degrees of freedom of the denominator (rows)

ν2 \ ν1  1      2      3      4      5      6      8      12     24     ∞
1        161.4  199.5  215.7  224.6  230.2  234.0  238.9  243.9  249.0  254.3
2        18.51  19.00  19.16  19.25  19.30  19.33  19.37  19.41  19.45  19.50
3        10.13  9.55   9.28   9.12   9.01   8.94   8.84   8.74   8.64   8.53
4        7.71   6.94   6.59   6.39   6.26   6.16   6.04   5.91   5.77   5.63
5        6.61   5.79   5.41   5.19   5.05   4.95   4.82   4.68   4.53   4.36
6        5.99   5.14   4.76   4.53   4.39   4.28   4.15   4.00   3.84   3.67
7        5.59   4.74   4.35   4.12   3.97   3.87   3.73   3.57   3.41   3.23
8        5.32   4.46   4.07   3.84   3.69   3.58   3.44   3.28   3.12   2.93
9        5.12   4.26   3.86   3.63   3.48   3.37   3.23   3.07   2.90   2.71
10       4.96   4.10   3.71   3.48   3.33   3.22   3.07   2.91   2.74   2.54
11       4.84   3.98   3.59   3.36   3.20   3.09   2.95   2.79   2.61   2.40
12       4.75   3.88   3.49   3.26   3.11   3.00   2.85   2.69   2.50   2.30
13       4.67   3.80   3.41   3.18   3.02   2.92   2.77   2.60   2.42   2.21
14       4.60   3.74   3.34   3.11   2.96   2.85   2.70   2.53   2.35   2.13
15       4.54   3.68   3.29   3.06   2.90   2.79   2.64   2.48   2.29   2.07
16       4.49   3.63   3.24   3.01   2.85   2.74   2.59   2.42   2.24   2.01
17       4.45   3.59   3.20   2.96   2.81   2.70   2.55   2.38   2.19   1.96
18       4.41   3.55   3.16   2.93   2.77   2.66   2.51   2.34   2.15   1.92
19       4.38   3.52   3.13   2.90   2.74   2.63   2.48   2.31   2.11   1.88
20       4.35   3.49   3.10   2.87   2.71   2.60   2.45   2.28   2.08   1.84
21       4.32   3.47   3.07   2.84   2.68   2.57   2.42   2.25   2.05   1.81
22       4.30   3.44   3.05   2.82   2.66   2.55   2.40   2.23   2.03   1.78
23       4.28   3.42   3.03   2.80   2.64   2.53   2.38   2.20   2.00   1.76
24       4.26   3.40   3.01   2.78   2.62   2.51   2.36   2.18   1.98   1.73
25       4.24   3.38   2.99   2.76   2.60   2.49   2.34   2.16   1.96   1.71
26       4.22   3.37   2.98   2.74   2.59   2.47   2.32   2.15   1.95   1.69
27       4.21   3.35   2.96   2.73   2.57   2.46   2.30   2.13   1.93   1.67
28       4.20   3.34   2.95   2.71   2.56   2.44   2.29   2.12   1.91   1.65
29       4.18   3.33   2.93   2.70   2.54   2.43   2.28   2.10   1.90   1.64
30       4.17   3.32   2.92   2.69   2.53   2.42   2.27   2.09   1.89   1.62
40       4.08   3.23   2.84   2.61   2.45   2.34   2.18   2.00   1.79   1.51
60       4.00   3.15   2.76   2.52   2.37   2.25   2.10   1.92   1.70   1.39
120      3.92   3.07   2.68   2.45   2.29   2.17   2.02   1.83   1.61   1.25
∞        3.84   2.99   2.60   2.37   2.21   2.09   1.94   1.75   1.52   1.00
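A typical use of this table is the variance-ratio test of H0: σ1² = σ2² against the one-sided alternative H1: σ1² > σ2², where F = s1²/s2² is compared with the 5% point for (ν1, ν2) = (n1 − 1, n2 − 1). The Python sketch below copies only four cells of the table into a dictionary; the function name and the two samples are hypothetical. A two-sided test at the 5% level would need the 2.5% points, which this table does not give.

```python
from statistics import variance

# A few 5% points copied from Table 5, keyed by (nu1, nu2):
# nu1 = numerator d.f. (column), nu2 = denominator d.f. (row).
F_05 = {(5, 5): 5.05, (4, 9): 3.63, (8, 12): 2.85, (12, 24): 2.18}


def f_test_greater_variance(sample_1, sample_2, table=F_05):
    """One-sided F-test at 5%: H0 sigma1^2 = sigma2^2 vs H1 sigma1^2 > sigma2^2."""
    f = variance(sample_1) / variance(sample_2)    # sample variances (divisor n - 1)
    dfs = (len(sample_1) - 1, len(sample_2) - 1)   # (nu1, nu2)
    critical = table[dfs]                          # KeyError if the cell is not copied above
    return f, dfs, critical, f > critical


# Hypothetical data, for illustration only
group_a = [12, 15, 11, 18, 14, 16]
group_b = [13, 14, 14, 15, 13, 15]
f, dfs, critical, reject = f_test_greater_variance(group_a, group_b)
print(f"F = {f:.2f} with d.f. {dfs}; 5% point = {critical}; reject H0: {reject}")
# F = 8.33 with d.f. (5, 5); 5% point = 5.05; reject H0: True
```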

47
74
76
35
59
22
42
01
21
60
18
62
36
85
29
62
49
08
16

03
97
16
12
55
16
84
63
33
57
18
26
52
37
70
56
99
16
31

17
18
37
15
93

07
38
28
94

77
17
63
12
86

43
24
62
85
56

12
37
22
04
32

92
97
19
35

94
53
78
34
32

73
67
27
99
35

13
35
77
72
43

46
75
95
12

39
31
59
29
44

86
62
66
26
64

40
96
88
33
50

44
84
50
83

49
57
16
78
09

36
42
56
96
38

33
83
42
27
27

17
16
92
39

54
24
95
64
47

96
81
50
96
34

20
50
95
14
89

16
07
26
50

43
55
55
56
27

47
14
26
68
82

38
87
45
34
87

58
44
11
08

54
06
67
07
96

36
57
71
27
46

26
755
72
09
19

09
99
37
30

82
88
19
82
54

61
20
07
31
22

13
97
16
45
20

79
83
00
2

17
77
98
52
49

46
42
32
05
31

89
12
64
59
15

83
11
53
34

37
04
10
42
17

98
53
90
03
62

51
25
36
34
37

86
46
76
07

93
74
50
07
46

63
32
79
72
43

06
93
16
09
00

19
32
31
96

23
47
71
44
09

71
37
78
93
09

74
47
00
45
49

62
24
38
88

78
67
75
38
62

62
32
53
15
90

17
70
04
59
52

06
20
80
54

87
21
12
15
90

33
27
13
57
06

76
33
43
34
85

76
14
22
42

35
76
86
51
52

26
07
55
12
18

37
24
18
68
66

50
85
02
06

20
33
73
00
84

16
36
38
10
44

Table 6: Random sampling numbers

13
03
66
49
60

06
88
53
87

96
50
58
13
77

80
07
58
14
32

04
54
79
12
44

10
45
53
98

43
25
07
42
27

45
51
59
21
53

07
97
94
72
38

55
10
86
35

84
83
44
99
08

60
24
88
88
23

74
77
77
07
68

23
93
60
85

26
92
39
66
02

11
51
97
26
83

21
46
24
34
88

64
85
42
29

34
12
52
02
73

14
79
54
49
01

30
80
90
99
80

05
10
53
39

64
76
79
54
28

95
73
10
76
30


19
44
21
45
11

05
79
04
48

91
06
38
79
43

10
89
14
81
30


34
57
42
39
94
90
27
24
23
96
67
90
05
46
19
26
97
71
99
95

68
74
27
00
29
16
11
35
38
31
66
14
68
20
67
05
07
68
26
14


93
10
56
61
52

40
84
51
78
58

82
94
10
16
25

30
25
37
68
98

70
88
85
65
98

47
45
18
73
97

66
75
16
86
91

13
65
86
29
94

60
23
85
53
75

14
11
00
90
79

59
06
20
38
47

70
76
53
61
24

22
09
54
58
87

64
75
33
97
15

83
06
33
42
96

55
59
48
66
68

35
98
87
37
59

05
73
96
51
06

62
09
32
38
44

74
29
55
37
49

85
42
66
78
36

71
88
02
40
15

64
19
51
97
33

30
97
90
32
69

15
99
47
80
22

95
05
75
14
93

11
74
26
01
49

77
68
65
20
10

13
64
54
70
41

86
90
19
02
20

12
66
38
50
13

40
60
72
30
82

92
61
73
42
26

11
52
07
04
01

67
02
49
87
34

44
71
96
77
53

03
71
32
10
78

05
27
60
02
90

19
94
78
75
86

22
91
57
84
75

51
62
08
50
63

65
40
62
33
10

00
37
45
66
82

78
38
69
57
91

59
99
11
67
06

09
14
93
31
75

71
34
04
81
53

84
67
36
03
93

77
15
12
42
55

38
86
55
08
06

74
02
91
41
91

26
54
10
29
30

59
06
44
32
13

76
22
59
39
40

60
76
16
40
00

04
13
96
10
34

56
51
95
17
08

83
98
33
51
78

47
70
92
01
52

33
58
46
45
25

78
29
92
55
27

20
12
82
16
78

21
90
53
74
43

46
18
92
65
20

06
16
63
85
01

37
22
43
49
89

29
30
56
91
48

09
24
42
04
57

83
93
16
47
50

90
08
90
36
62

68
86
16
62
85

52
76
45
23
27

52
58
29
94
15

57
07
49
47
02

02
38
02
48
27

68
15
97
11
40

91
05
56
44
29

16
52
37
95
67

02
45
75
51
55

07
54
60
04
48

05
77
24
67
39

00
74
38
93
74

37
94
50
84
26

97
55
49
96
73

74
51
48
94
43

66
80
59
30
33

31
38
98
32
62

57
52
91
24
92


70
09
29
16
39

11
95
44
3
17

03
30
95
08
89

06
95
04
67
51


53
26
23
20
25
50
22
79
75
96
74
38
30
43
25
63
55
07
54
85

17
90
41
60
91
34
85
09
88
90
55
63
35
63
98
02
64
85
58
34


21
22
26
16
27

23
06
58
36
37

57
04
13
82
23

77
59
582
50
38

17
21
13
24
84

99
86
21
82
55

74
39
77
18
70

58
21
55
81
05

69
82
89
15
87

67
51
46
69
26

37
43
48
14
00

71
19
99
69
90

71
48
01
51
61

61
99
06
65
01

98
73
73
22
39

71
23
31
31
94

50
22
10
54
48

32
00
72
51
91

80
81
82
95
00

41
52
04
99
58

80
28
07
44
64

28
65
17
18
82

33
53
97
75
03

61
23
49
73
28

89
06
82
82
56

69
26
10
37
81

00
94
22
42
06

50
33
69
68
41

36
00
04
00
26

84
94
94
88
46

91
79
21
49
90

72
12
96
68
36

38
61
59
62
90

94
02
25
61
74

09
33
05
39
55

12
96
10
35
45

15
54
63
61
18

62
82
21
38
71

77
62
03
32
85

41
93
47
81
37

70
13
69
65
48

67
90
61
44
12

93
46
27
82
78

94
02
48
33
59

11
43
36
04
13

86
23
75
12
94

19
86
24
22
38

96
18
45
03
03

48
91
03
69
26

24
07
96
42
97

82
28
83
49
36

26
39
88
76
09

43
82
69
38
37

98
79
49
32
24

47
08
72
02
94

44
07
13
24
90

40
78
11
18
70

33
62
28
92
02

94
31
89
48
37

95
02
41
30
35

45
12
15
65
15

41
67
24
85
71

80
54
44
07
30

27
18
43
12
57

86
23
83
18
42

19
80
00
88
37

04
46
05
70
69

36
39
89
48
29

98
29
80
97
57

95
60
49
65
07

04
31
60
37
32

99
07
20
60
12

00
06
13
85
65

47
75
55
54
03

45
53
35
16
90

02
25
97
18
82

83
66
29
72
65

53
91
65
34
92

07
94
80
04
89

96
99
17
99
62

26
24
54
13
80

53
12
79
81
18

31
13
39
61
00

74
32
14
10
54

03
27
28
21
07

09
19
07
35
75

49
47
88
87
33

83
23
17
34
60


91
12
19
49
39

38
81
78
85
66

66
38
94
67
76

30
70
49
72
65


92
95
45
08
85
84
78
17
76
31
44
66
24
73
60
37
67
28
15
19

03
62
08
07
01
72
88
45
96
43
50
22
96
31
78
84
36
07
10
55


90
10
59
83
68

66
22
40
91
73

71
28
75
28
67

18
30
93
55
89

61
08
07
87
97

44
15
14
61
99

14
16
65
12
72

27
27
15
18
95

56
23
48
60
65

21
86
51
19
84

35
84
57
54
30

46
59
22
40
66

70
98
89
79
03

66
26
23
60
43

19
13
28
22
24

57
37
60
45
51

10
93
64
24
73

06
63
22
20
89

11
52
40
01
02

99
75
21
44
10

23
35
58
31
52

38
75
30
72
94

58
53
19
11
94

16
41
75
75
19

98
08
89
66
16

05
41
88
93
36

49
94
72
94
08

96
66
46
13
34

05
86
75
56
56

92
99
57
48
475

26
53
12
25
63

56
48
91
90
88

85
99
83
21
00

68
58
95
98
56

50
75
25
71
38

30
86
98
24
15

11
29
85
48
53

156
42
67
57
69

11
45
12
96
32

33
97
77
94
84

34
76
62
24
55

54
36
47
07
47

17
96
74
16
36

72
80
27
96
97

76
29
27
06
90

35
72
29
23
07

17
30
75
16
66

85
61
85
61
19

60
81
89
93
27

02
24
83
69
40

76
96
67
88
02

22
45
42
02
75

76
33
30
91
33

42
58
94
65
90

86
73
60
68
69

84
23
28
57
12

48
34
14
98
42

35
37
69
95
22

31
89
40
64
36

64
53
88
55
76

45
91
78
94
29

48
52
40
39
91

57
62
60
36
38

38
04
61
66
39

34
58
56
05
38

96
18
06
69
07

20
70
81
74
25

56
01
08
83
43

60
93
27
49
87

32
51
07
58
12

18
31
19
45
39

98
63
84
15
78

01
63
86
01
22

14
03
14
56
78

95
99
24
19
48

99
45
69
73
64

64
14
63
47
13

50
37
16
80
35

60
17
62
59
03

01
76
62
42
63

18
52
59
59
88

41
18
36
30
34

78
43
01
50
45

30
08
03
37
91

96
52
02
00
34

48
11
86
44
72

75
76
16
92
22

64
27
73
61
25


39
32
80
38
83

52
39
78
19
08

46
48
61
88
15

98
64
42
11
08


81
86
91
71
66
96
83
60
17
69
93
30
29
31
01
33
84
40
31
59

53
51
35
37
93
02
49
84
18
79
75
38
51
21
29
95
90
46
20
71


95
60
62
89
73

36
92
50
38
23

08
43
71
30
10

29
32
70
67
13

22
79
98
03
05

87
29
10
86
84

45
478
62
88
61

13
68
29
95
83

00
80
82
43
50

83
03
34
24
88

65
35
46
71
78

39
92
13
13
27

18
24
54
38
08

56
06
31
37
58

13
82
40
44
71

35
33
80
20
92

47
36
97
46
22

20
28
57
79
02

025
88
80
91
32

01
98
03
02
79

72
59
20
82
23

14
81
75
81
39

00
33
81
14
76

20
75
54
44
64

00
87
56
68
71

82
39
95
53
37

41
69
30
88
95

71
66
07
95
64

18
38
95
72
77

11
38
820
74
67

84
96
37
47
62

34
99
27
94
72

38
82
15
32
91

74
62
51
73
42

93
72
34
89
87

62
40
96
64
28

79
07
74
14
01

21
25
94
24
10

07
36
39
23
00

33
14
94
85
54

58
53
80
82
93

97
06
02
16
14

51
04
23
30
22

74
71
78
04
96

69
89
08
99
20

90
84
74
10
20

72
19
05
63
58

82
94
32
05
53

32
35
32
70
49

65
63
77
33
92

59
76
38
15
40

14
58
66
72
84

81
96
16
80
82

96
61
76
52
16

21
47
25
56
92

53
45
50
01
48

76
35
46
60
96

42
29
15
83
55

45
45
15
34
54

73
94
95
32
14

80
23
70
47
59

68
08
48
90
23

57
15
35
20
01

19
19
52
90
52

26
79
50
18
26

63
93
49
94
42

09
18
71
47
75

09
38
74
76
98

92
18
80
97
94

86
67
44
76
45

77
60
30
89
25

03
81
33
14
94

82
05
67
63
66

74
04
18
70
54

19
82
88
99
43

56
14
13
53
56

80
98
72
49
39

54
32
55
47
96

48
11
12
82
11

54
44
80
89
07

84
90
16
30
67

13
92
63
14
09

56
08
57
93
71

29
99
55
74
93

25
07
42
21
98

26
08
77
54
11

27
95
21
24
99

56
81
62
60
89

39
35
79
30
60

4
09
09
36
06

44
97
77
98
31

93
07
54
41
30
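A table of random numbers is typically used by reading two-digit numbers in order, discarding those outside 1 to N and any repeats, until n distinct units are chosen (with 00 read as 100). Systematic sampling instead takes every k-th unit, k = N/n, after a random start between 1 and k, as in answer 37 (N = 20, n = 5, k = 4). The Python sketch below illustrates both procedures; the function names are hypothetical, and the number stream is illustrative rather than a verified reading of Table 6, whose original row and column order is not recoverable from this copy.

```python
def simple_random_sample(two_digit_numbers, population_size, n):
    """Select n distinct units from 1..population_size (at most 100) by reading
    two-digit random numbers in order. '00' stands for 100; numbers out of
    range or already chosen are skipped."""
    if not 1 <= population_size <= 100:
        raise ValueError("two-digit numbers cover populations of at most 100 units")
    chosen = []
    for number in two_digit_numbers:
        unit = 100 if number == 0 else number
        if unit <= population_size and unit not in chosen:
            chosen.append(unit)
            if len(chosen) == n:
                return chosen
    raise ValueError("not enough random numbers to complete the sample")


def systematic_sample(population_size, n, start):
    """Every k-th unit from a random start between 1 and k, where k = N // n."""
    k = population_size // n
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    return [start + i * k for i in range(n)]


stream = [47, 74, 76, 35, 59, 22, 42, 1, 21, 60]   # illustrative two-digit numbers
print(simple_random_sample(stream, population_size=50, n=5))   # [47, 35, 22, 42, 1]
print(systematic_sample(population_size=20, n=5, start=3))     # [3, 7, 11, 15, 19]
```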


Index
A
Addition rule of probability 75
Age and sex composition 211
Age pyramid 211
Age specific fertility rate 224
Alternative hypothesis 100
Analysis of variance table 140
Analytical studies 175
Application of t distribution 125
Arithmetic mean 16
Association 62
Assumption for Student's t test 125
Attributable risk 182
Attributes 2

B
Bar chart 5
Base line 164
Basic population data 256
Binomial distribution 48
Blinding (Masking) 164

C
Case control study 176
Case definition 164
Case report 174
Case series 174
Census 2001 250

Chi square distribution 114


Classical probability 75
Cluster sampling 86
Coefficient of dispersion 35
Coefficient of variation 35
Cohort 165
Cohort study 175
Comparative statistics of different indicators 279
Comparison of several proportions (2 × k contingency table) 118
Comparison of two proportions by Chi square 118
Concept of population policy 289
Conditional probability 78
Confidence limits 107
Confounding bias 179
Contingency table (2 × 2 table) 121
Continuous variable 2
Correlation 62
Country health profile 261
Critical region 100, 103
Critical value 103
Cross-sectional studies 175
Crude birth rate 224, 277
Crude death rate 214, 278
Cumulative frequency curve 7


D
Decile 33
Degree of freedom 115
Demographic cycle 210
Denominator 167
Density 252
Density of population 213
Dependency ratio 212
Descriptive studies 173
Design of experiments 92
Diagnostic accuracy 191
Direct standardization 219
Discrete variable 2
Dispersion 32

E
Ecological bias 179
Equally likely events 74
Exact sampling distribution 114
Exhaustive events 74
Experimental studies 176
Experimental unit 165
Exposure rates 183

F
Failure 106
Family size 213
Fertility trends 251
First quartile 32
Fourfold classification 118
Frequency curve 10
Frequency distribution table 4
Frequency polygon 10

F-statistic 134
F-test for equality of population variance 135
F-test for equality of several means 135

G
General contingency table (r × s) 120
General fertility rate 224
Geometric mean 24
Goals of national population policy 295
Goodness of fit 117
Gross reproductive rate 225
Growth rate 230, 252

H
Harmonic mean 25
Histogram 10
History of census 248
Hospital records 243

I
Impossible event 75
Incidence rate (person) 168
Incidence rate (spell) 169
Incidence rates 180
Independence of attributes 118
Independent events 74
Indirect standardization 221
Infant mortality rate 215, 278
Issue of the adolescents 255


K
Key population statistics of India 1901-2001 292
Kurtosis 41

L
Landmarks in the evolution of India's national population policy 299
Level of significance 101
Life expectancy 213
Life table 227
Likelihood ratio 193
Line diagram 9
Literacy 252
Literacy rate in India 271
Local control 94
Longitudinal studies 174

M
Manifold classification 118
Mann-Whitney U test 156
Maternal mortality rate 223
Mean deviation 34
Measurement bias 179
Measurement of morbidity 168
Measurement of mortality 168
Median 17
Median test 154
Mid year population 167
Mode 20
Mode of F-distribution 134
Mortality indicators for all India, 1971-1998 293
Mortality trends 291
Multiplication rule of probability 77
Multistage sampling 89
Mutually exclusive events 74

N
Negative predictive value 187
Neonatal mortality rate 215
Net reproductive rate 226
Nominal 2
Non parametric tests 152
Normal distribution 50
Null hypothesis 100
Numerator 167

O
Observational studies 173
Odds ratio 184
One tailed test 102
One way analysis of variance 135
Ordinal 2

P
Paired t test 127
Parameter 89
Percentile 33
Perinatal mortality rate 216
Period prevalence 170
Pictogram 6
Pie chart 6
Placebo 164
Point prevalence 170
Poisson distribution 49



Population 84
Population at risk 167
Population census 240
Positive predictive value 187
Postnatal mortality rate 215
Power of test 102
Prevalence 169, 191
Primary data 2
Proportion 167
Proportional mortality rate 217
Prospective study 165
Provisional population totals: India - part I 258
Provisional population totals: India 255

Q
Quartile deviation 32

R
Random sampling 84
Random series 74
Randomization 93
Randomized controlled laboratory study 178
Randomized controlled clinical trials 177
Randomized cross-over clinical trials 177
Range 32
Rate 166
Ratio 166
Reader's bias 180
Region of acceptance 103
Region of rejection 103
Registration of Births and Deaths Act, 1969 242
Registration of vital events 241
Regression 64
Regression coefficient 64
Relative risk 181
Replication 93
Retrospective study 165
Role of targets 294
Root mean square deviation 34
Run test 153
Rural-urban distribution of population 267

S
Sample 84
Sample registration system 242
Sample size 84
Sample surveys 243
Sampling bias 180
Sampling distribution 89
Sampling of attribute 106
Scatter diagram 11
Screening bias 179
Second quartile 32
Secondary data 2
Sensitivity 186
Sex ratio 212
Sign test 155
Significant value 103
Skewness 40
Skewness of F-distribution 134
Sources of health information 240
Specificity 187
Stable population 212
Standard deviation 34
Standard error 89
Standard normal variate 52
Standardized death rate 218
State wise distribution of households 273
Stationary population 212
Statistic 89
Statistical hypothesis 100
Statistical methods in epidemiology 163
Status of children 254
Status of women's health 253
Still birth rate 217
Stratified sampling 85
Success 106
Summary of census 2001 283
Sure event 75
Systematic error 178
Systematic sampling 85

T
t-test for difference of mean 126
t-test for significance of correlation coefficient 128
t-test for single mean 126
Tables 3
Test of significance for difference of mean 111
Test of significance for difference of proportion 107
Test of significance for large samples 105
Test of significance for single mean 111
Test for single proportion 106
Test of significance 102
Third quartile 32
Total fertility rate 225
Trials and events 74
Two tailed test 102
Type-I error 101
Type-II error 101

V
Variable 2
Vital rates per 1000 population, India 1901-1990 293
