
Concise Biostatistics Manual


Concise Biostatistics Manual Prashant Rao, Sarika Rao


Concise
Biostatistics
Manual
By

Dr Prashant R Rao
MBBS, MS, DNB, MNAMS, FMAS, FIAGES
DNB (Surgical Gastroenterology)
Assistant Professor in Surgical Gastroenterology
LTMMC & LTMGH, Sion, Mumbai

&

Dr Sarika P Rao
MBBS, MS, MCh, DNB (Plastic & Reconstructive Surgery)
Fellowship in Microvascular & Aesthetic Surgery
Assistant Professor in Plastic & Reconstructive Surgery
LTMMC & LTMGH, Sion, Mumbai


Dedicated to
Our Parents


Concise Biostatistics Manual

© Leelavathi Publications
First Edition: 2019
All rights reserved.
The authors have taken special care to ensure that the information provided in
the text is correct to the best of their abilities. However, mistakes are inevitable.
Hence readers are requested to check and confirm the information provided
in the book in case of any doubt. The authors are not liable to anyone for any loss
or damage caused by errors.

PUBLISHED BY: LEELAVATI PUBLICATIONS

Cost: free. I just hope this book reaches whoever needs it and helps you in
any way possible to pass your theory exams.

Help us make this book better by providing your valuable feedback, positive
criticisms and suggestions to us, via email on concisecancermanual@gmail.com

Also like and follow us on our Facebook page


“Concise Cancer Manual- Preparatory Manual for Surgery Exams”
For snippets from the book with the same title and the latest updates in GI
Surgery


Concise Biostatistics Manual


About the Book

This book has been prepared by compiling and editing “DATA” from various study
materials and notes provided to us by our seniors and colleague friends, along
with some very good articles and books.

Multiple topics in the subject of Biostatistics which are frequently asked in the
medical examinations have been covered in this book.

The highlights of this book are:


➢ Designed keeping in mind the examination patterns of our universities
➢ Important topics frequently asked in exams are all compiled in one place
➢ Simplified language for easy understanding
➢ Point wise standardized description for better grasping and answer writing

Further reading
If one has the time and patience please try and go through the following:

➢ Ghoshal UC, Tripathi S, Chourasia D (2007) Principle of statistical analysis in
clinical research: a primer. In: Mehta R (ed) Clinical gastroenterology. Paras
Publishing, Hyderabad, pp 372–386

➢ “High-Yield Biostatistics, Epidemiology & Public Health” by Anthony N Glaser


➢ “Methods in Biostatistics for Medical Students and Research Workers” by B K
Mahajan


Also try: Concise Cancer Manual, Available Online


About Concise Cancer Manual


Concise Biostatistics Manual: Topics covered:

➢ Types of Research Studies


➢ Case Control Study
➢ Cohort study
➢ Case Control Study vs Cohort Study
➢ Randomized Control Trial
➢ Concept of Randomization in RCT
➢ Concept of blinding in RCT
➢ Concept of Allocation concealment in RCT
➢ Meta-analysis
➢ Bias in Clinical Research
➢ Types of Data in Statistics
➢ Measures of Central tendency
➢ Measures of Dispersion of Data
➢ Concept of hypothesis testing
➢ P value
➢ Types of Error in Statistics
➢ Concept of power of a study
➢ Sample size
➢ Statistical Tests and Choosing a statistical test
➢ Concept of Univariate and Multivariate Analysis
➢ Correlation and Regression
➢ Incidence and Prevalence
➢ Screening
➢ Evaluation of an investigative test
➢ Kaplan Meier plots
➢ Forest plots
➢ Receiver Operating Characteristic curve
➢ Evidence based medicine
➢ Levels of evidence and Grades of Recommendation
➢ Ethics in Research and Informed consent
➢ Clavien-Dindo classification of surgical complications


Types of research studies

• Descriptive studies

• Analytical studies

o Ecological studies

o Case control studies

o Cohort studies

o Cross sectional studies

• Experimental studies

o Randomised control trial

o Uncontrolled trial

• Integrative studies

o Systematic review

o Meta-analysis


Case Control Study

• Definition: it is a type of observational study comparing characteristics of
individuals with the disease of interest with a suitable control group of
individuals without the disease

• Since it is an observational study no intervention is attempted or no
attempt is made to alter the course of disease

• Case control studies are commonly retrospective in nature

• It is also known as: backward looking study, effect to cause study, disease
to risk factor study, outcome to exposure study

• It provides Odds ratio, which is an estimate of relative risk

• Distinguishing features:

o Both exposure and disease have occurred before the start of study

o Study proceeds backward from effect to cause

o It uses a control/ comparison group to support or refute an inference

o It provides Odds ratio which is a measure of the strength of
association between the risk factor and outcome

• It is based on 3 assumptions:

o Cases must be representative of those with the disease

o Controls must be representative of those without the disease

o The disease being investigated must be relatively rare


• Basic steps:

o Selection of cases and controls

o Matching

o Assessment of exposure

o Analysis and interpretation

• Study design:

Exposure    Diseased/ cases    Non diseased/ controls
Yes         a                  b
No          c                  d

o Exposure rate in cases: a/(a+c)

o Exposure rate in controls: b/(b+d)

o Odds ratio = (a/b) / (c/d) = ad/bc


• Advantages of case control study:

o Short duration

o Rare diseases can be studied

o Multiple risk factors can be simultaneously looked at

o No follow up required

o Inexpensive and Rapid

o No extra manpower required, less administrative problems

o No ethical problems

o No Hawthorne effect

o One can calculate odds ratio

o No risk to subjects

• Disadvantages

o Retrospective study so data may be of poor quality

o Incidence, relative risk and attributable risk cannot be calculated; it only
yields an estimate of relative risk, that is, the odds ratio

o Interviewer bias, confounding bias, recall bias and selection bias may be
involved

o Selection of appropriate controls is difficult

o It does not differentiate between causes and associated factors


Odds ratio
• It is a measure of the strength of association between the risk factor and
outcome

               Case    Control
Exposed        a       b
Non exposed    c       d

• Odds ratio (OR) = ad/bc

• Interpretation:

o OR = 1: exposure to risk factor is identical in both case and control
group

o OR < 1: exposure to risk factor is lower in cases than in control

o OR > 1: exposure to risk factor is greater in cases than in control
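
The exposure rates and the odds ratio described above can be sketched in a few lines of Python. This is only an illustration: the function name and the counts are hypothetical, and the cells a, b, c and d follow the 2 x 2 table given in the text.

```python
def exposure_rates_and_odds_ratio(a, b, c, d):
    """Case control 2 x 2 table.

    a = exposed cases, b = exposed controls,
    c = non-exposed cases, d = non-exposed controls.
    """
    exposure_rate_cases = a / (a + c)
    exposure_rate_controls = b / (b + d)
    odds_ratio = (a * d) / (b * c)
    return exposure_rate_cases, exposure_rate_controls, odds_ratio


# Hypothetical counts, chosen only for illustration
cases_rate, controls_rate, odds_ratio = exposure_rates_and_odds_ratio(
    a=30, b=10, c=70, d=90
)
print(cases_rate, controls_rate, round(odds_ratio, 2))
```

With these made-up counts the odds ratio is greater than 1, so exposure is more frequent among cases than among controls.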


Cohort study

• Definition: it is a type of analytical study which is undertaken to obtain
evidence to support or refute existence of association between suspected
cause and disease

• Also known as: prospective study, forward-looking study, incidence study,
cause-effect study, longitudinal study, exposure to outcome study

• Since it is an observational study no intervention is attempted or no
attempt is made to alter the course of disease

• Analysis may be both retrospective and prospective

• Definition of cohort: defined as a group of people who share a common
characteristic or experience within a defined time period

o Example: all those born in 2019 form birth cohort of 2019

• Distinguishing features:

o The two cohorts are identified prior to appearance of the disease
under investigation

o The study groups are observed over a period of time to determine
the frequency of disease among them

o Study proceeds forward from cause to effect

o The exposure has occurred but disease has not


• Indication:

o Good evidence of an association between exposure and disease

o When the exposure is rare, but the incidence of disease is higher
among exposed

o When attrition in study population can be minimized

• Types of cohort studies

o Prospective cohort studies

o Retrospective cohort studies

o Combination of retrospective and prospective studies

o Prospective study is also known as current cohort study and
retrospective cohort study is also known as historical cohort study

• Elements or steps of performing cohort study

o Selection of study subjects

o Obtaining data on exposure

o Selection of suitable comparison group who are unexposed

o Follow up


• Cohort study design

• Analysis

Exposure    Diseased    Non diseased
Yes         a           b
No          c           d

o Incidence in exposed: a/(a+b)
o Incidence in non-exposed: c/(c+d)
o Interpretation:

▪ If incidence in exposed is more than incidence in non exposed:
risk is present

▪ If incidence in exposed is equal to incidence in non exposed:
there is no risk

▪ If the incidence in exposed is less than that in non exposed: the
exposure is protective


o Relative risk:

▪ Is the ratio of incidence of disease among exposed to incidence
of disease among non-exposed

▪ Relative risk is a direct measure of strength of association
between suspected cause and effect

▪ Interpretation of relative risk:

• If RR > 1: risk is present

• If RR = 1: there is no risk

• If RR < 1: exposure is protective

• Larger the RR greater the strength of association
between suspected factor and disease

• Example: relative risk of 2 indicates that the incidence
rate of disease is twice as high in exposed subjects
as in non exposed ones

o Attributable risk

▪ Defined as the difference in incidence rate of disease in
exposed group and unexposed group

▪ Expressed as percentage

▪ AR = (incidence in exposed - incidence in unexposed) / incidence in exposed x 100


▪ Uses: It indicates the extent to which the disease under study
can be attributed to the exposure

o Population attributable risk =

(Incidence rate in total sample - incidence in non-exposed) / Incidence rate in total sample x 100

▪ Use: it provides an estimate of the amount by which the
disease could be reduced in that population if the suspected
factor was eliminated or modified
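
As a rough illustration, the incidence in each group, the relative risk and the attributable risk defined above can be computed from the cohort 2 x 2 table as follows. The function name and the counts are hypothetical and serve only as a sketch.

```python
def cohort_measures(a, b, c, d):
    """Cohort 2 x 2 table.

    a = exposed and diseased, b = exposed and not diseased,
    c = non-exposed and diseased, d = non-exposed and not diseased.
    """
    incidence_exposed = a / (a + b)
    incidence_unexposed = c / (c + d)
    relative_risk = incidence_exposed / incidence_unexposed
    # Attributable risk expressed as a percentage of incidence in exposed
    attributable_risk_pct = (
        (incidence_exposed - incidence_unexposed) / incidence_exposed * 100
    )
    return incidence_exposed, incidence_unexposed, relative_risk, attributable_risk_pct


# Hypothetical counts, chosen only for illustration
inc_exp, inc_unexp, rr, ar_pct = cohort_measures(a=20, b=80, c=10, d=90)
print(inc_exp, inc_unexp, rr, ar_pct)
```

Here the relative risk is greater than 1, which by the interpretation given above means that risk is present.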

• Advantages of cohort study:

o In case of prospective study, data is of better quality

o It yields incidence rate, relative risk and attributable risk

o Multiple diseases resulting from an etiological factor under study can
be simultaneously looked at / several possible outcomes related to
exposure can be studied simultaneously

o No recall bias as in case control study


• Disadvantages:

o Long duration of study

o Unsuitable for study of rare diseases

o Attrition is a problem

o It is expensive

o It can be difficult to find a suitable cohort group

o Administrative and ethical problems

o Hawthorne effect: change in behaviour of study subjects

o If retrospective, there will be problem with data quality


Case Control Study vs Cohort Study

Case control study | Cohort study
Proceeds from effect to cause, that is, retrograde | Proceeds from cause to effect, that is, antegrade
Relatively inexpensive | Expensive
Tests whether the suspected cause occurs more frequently in those with the disease than among those without the disease | Tests whether the disease occurs more frequently in those exposed than in unexposed individuals
It is retrospective | It may be retrospective or prospective
Involves fewer subjects | Involves a larger number of subjects
Provides quick results | Usually involves a long follow-up period
Suitable for study of rare diseases | Unsuitable for study of rare diseases
Generally yields only an estimate of relative risk, that is, the odds ratio | Yields incidence rate, relative risk and attributable risk
Usually the first approach to the testing of a hypothesis | Reserved for testing of a precisely formulated hypothesis


Randomized Control Trial

• It’s a type of experimental study design which aims to reduce bias when
testing a new treatment/ intervention

• Basic steps:

o Drawing up a protocol

▪ One of the essential features of RCT is that this study is
conducted under a strict protocol

▪ The protocol specifies the aims and objectives of the study,
questions to be answered, inclusion and exclusion criteria for
selection of study and control group subjects, sample size,
treatment or intervention to be performed

o Selecting a reference and experimental population

▪ Reference population: it is the population to which the findings
of the trial if found successful will be applicable
▪ Study/ experimental population:
• Derived from reference population
• Should be randomly chosen
• Should be representative of reference population
• Should be eligible for the trial and should give consent
for the same
• Is also called the experimental population


o Randomization

▪ It is the heart of randomized control trial

▪ Definition: randomization is a statistical procedure by which
the participants are allocated into either study or control
groups, to receive or not receive the intervention under study

▪ Randomization is done in an attempt to eliminate bias and
allow for comparability between the study and control group

▪ It ensures that the investigator has no control over the
allocation process and this helps eliminate selection bias

▪ It means that every individual study subject has an equal
chance of being allocated to either study or control group

▪ It is best done by using a table of Random numbers

o Intervention/ manipulation: Deliberate application of treatment/
intervention to be tested or withdrawal/ reduction of suspected
causal factors

o Follow up: examination of the experimental and control group
subjects at defined interval of time to look for desired study outcome
▪ Attrition: it is loss to follow up which may happen due to either
death, migration, loss of interest or withdrawal of consent.
Every effort should be made to minimize this. Losses are
inevitable

o Assessment of data to derive results


• Basic study design:

Protocol
↓
Select a suitable population (reference/ target population)
↓
Select a suitable sample (study population)
↓
Make necessary exclusions (those not eligible, those not willing)
↓
Randomization
↓
Experimental group and control group
↓
Manipulation and follow up
↓
Assessment


• Classification of RCTs:
o On basis of hypothesis:
▪ Superiority trial
▪ Non inferiority trial
▪ Equivalence trial
o On basis of outcome of interest:
▪ Explanatory
▪ Pragmatic
o Study designs in randomized control trial:
▪ Concurrent parallel study design
▪ Crossover type study design
o Types of RCT:
▪ Animal experiment
▪ Human clinical trial
▪ Preventive trial
▪ Risk factor trial
▪ Cessation experiment


• Advantages of RCT
o Considered most reliable form of scientific evidence (level 1
evidence)
o Used to find cause and effect relationship
o No selection bias
o Multiple outcome variables can be measured in a single study
• Disadvantages of RCT
o Expensive
o Longer study duration
o Ethical restrictions and administrative issues
o Participant and observer bias
o Noncompliance of controls threatens the validity of study


Concept of Randomization in RCT

• It is the heart of randomized control trial

• Definition: randomization is a statistical procedure by which the participants
are allocated into either study or control groups, to receive or not receive the
intervention under study

• Randomization is done in an attempt to eliminate bias and allow for
comparability between the study and control group

• It ensures that the investigator has no control over the allocation process and
this helps eliminate selection bias

• It means that every individual study subject has an equal chance of being
allocated to either study or control group

• It is best done by using a table of Random numbers

• Methods:
o Simple randomization (easiest method)
o Systematic randomization
o Block randomization
o Stratified randomization
• Advantages:
o Eliminates bias, especially selection bias and confounding bias
o Allows for comparability between study and control group
o Facilitates the concept of ‘blinding’
o It permits the use of probability theory to express the likelihood that any
difference in outcome between treatment groups merely indicates
chance
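
To make two of the methods listed above concrete, here is a minimal Python sketch of simple randomization and block randomization. It uses the computer's pseudo-random number generator in place of a printed table of random numbers; that substitution, the function names, the group labels and the block size of 4 are all assumptions made for illustration.

```python
import random


def simple_randomization(n_subjects, seed=None):
    """Each subject independently has an equal chance of either group."""
    rng = random.Random(seed)
    return [rng.choice(["study", "control"]) for _ in range(n_subjects)]


def block_randomization(n_subjects, block_size=4, seed=None):
    """Shuffle within blocks so that group sizes stay balanced."""
    if block_size % 2 != 0:
        raise ValueError("block_size must be even")
    rng = random.Random(seed)
    allocations = []
    while len(allocations) < n_subjects:
        half = block_size // 2
        block = ["study"] * half + ["control"] * half
        rng.shuffle(block)
        allocations.extend(block)
    return allocations[:n_subjects]


simple = simple_randomization(12, seed=1)
blocked = block_randomization(12, block_size=4, seed=1)
print(simple)
print(blocked)
```

Simple randomization can leave the two groups unequal in size by chance, whereas block randomization keeps them balanced after every completed block.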


Concept of blinding in RCT

• It refers to the concealment of group allocation from one or more
individuals involved in the research study
• It is also called ‘masking’
• Classification:
o Single Blind: Participant is not aware whether he belongs to study or
control group.
o Double Blind: Neither the participant nor the investigator is aware of
the group allocation and treatment or manipulation received.
o Triple Blind: Participant, investigator as well as the person analyzing
the data are unaware of the group allocation and treatment or
manipulation received.
• Purpose:
o Randomization minimizes the differences between the treatment
and control groups at baseline; blinding reduces the bias that can
arise after allocation
o Since the participant is unaware about which treatment group they
are in, their beliefs about the treatment are less likely to influence
the outcome.
o Also, since the researcher is unaware of which subjects are receiving
the tested treatment, they are less likely to influence the outcomes.


Concept of ‘Allocation concealment’ in RCT

• It refers to the stringent precautions taken to ensure that the group
assignment of study subjects is not revealed prior to definitively allocating
them to their respective groups.
• It means that the person randomizing the subjects does not know what the
next treatment allocation would be
• There is a possibility that the person randomizing the patients to different
groups may selectively allocate a subject to a specific group if he knows
what the next allocation would be, thus introducing selection bias
• To prevent this, one should be unaware of the future group allocation
• This is called ‘allocation concealment’
• Example: if a health care provider knows what the next group allocation is,
he may try and allocate it to a particular subject. This will lead to the
introduction of selection bias. Allocation concealment can help eliminate
this bias.
• Methods of allocation concealment
o SNOSE: sequentially numbered opaque sealed envelope
o Numbered/ coded container
o Secured computer method
o Pharmacy controlled method
o Centralized service: where researcher calls trial office to know
allocation sequence. This is the best method


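• Importance of allocation concealment

o Trials with inadequate allocation concealment tend to yield larger
(exaggerated) estimates of effect
o Such trials also tend to yield greater heterogeneity in results
o Adequate allocation concealment helps to reduce selection bias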


Meta-analysis

• First done by Karl Pearson in 1904; the term was coined by Gene V Glass in 1976

• Meta-analysis is a statistical analysis that systematically combines the
results of several studies which address a specific research question

• Steps:

o Define research question and specify hypothesis

o Define criteria for inclusion and exclusion of studies

o Literature search

o Selection of studies on specified subject

o Aggregate finding across different studies

o Selection of meta regression model; three types are:

▪ Simple regression model

▪ Fixed effects meta regression model

▪ Random effect regression model

o Combine study results using different approaches; example: inverse
variance method, Mantel Haenszel method, Peto method.

o Report results
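
As an illustration of the step of combining study results, the sketch below pools study estimates with the inverse variance method under a fixed effect assumption. Each study is weighted by 1 / (standard error squared). The function name and the three log odds ratios with their standard errors are hypothetical values used only for demonstration.

```python
import math


def inverse_variance_pooled(estimates, standard_errors):
    """Fixed effect pooled estimate: weight each study by 1 / SE^2."""
    weights = [1 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return pooled, pooled_se


# Hypothetical log odds ratios and standard errors from three studies
log_odds_ratios = [0.2, 0.5, 0.3]
standard_errors = [0.1, 0.2, 0.1]

pooled, pooled_se = inverse_variance_pooled(log_odds_ratios, standard_errors)
ci_low = pooled - 1.96 * pooled_se   # approximate 95% confidence limits
ci_high = pooled + 1.96 * pooled_se
print(round(pooled, 3), round(ci_low, 3), round(ci_high, 3))
```

More precise studies (smaller standard error) receive more weight, which is why the pooled value sits closer to the two studies with a standard error of 0.1.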


• Meta-analysis flowchart:

Define Research question
↓
Perform literature search
↓
Select studies
↓
Extract data
↓
Analyze data
↓
Statistical analysis
↓
Report results


Bias in Clinical Research

• It is a systematic error which is defined as disproportionate weight in favor
of or against one thing, person or group compared with another

• It is in a way considered to be unfair

• Types of bias

o Selection bias:

▪ Berksonian bias is one well-known form of selection bias

▪ It involves some individuals being more likely to be selected for study
than others

o Funding bias: It refers to bias which has been introduced to derive
outcomes which favor the study’s financial sponsor.

o Reporting bias: It refers to bias which is introduced when reporting
observations; such that observations of a certain kind are more likely
to be reported than others

o Analytical bias: It refers to bias which is introduced while analyzing
the study results

o Exclusion bias: It refers to bias which is introduced due to systematic
exclusion of certain individuals from the study to alter study results

o Attrition bias: It refers to bias which is introduced due to loss of
participants/ attrition; example: death or loss to follow up


o Recall bias: It refers to bias which arises as a result of inability of the
participant to recollect past events accurately

o Observer bias: It refers to bias which is introduced by the researcher/
observer. It is the subconscious cognitive bias of judgement by the
researcher

o Confounding bias:

▪ It refers to a situation in which association between exposure
and outcome is distorted by presence of a confounding factor/
variable

▪ Diagram: the confounding variable is related to both the exposure
and the outcome

▪ Types:

• Positive confounding: observed association is biased
away from null

• Negative confounding: observed association is biased
towards null

• Various methods to eliminate bias in RCT

o Randomization
o Blinding
o Allocation concealment


Types of Data in Statistics

Data refers to observed values of a variable

Types of data

• Qualitative or categorical or nonparametric data: it refers to data which can
be separated into different categories

o Ordinal data

▪ It refers to data which can be arranged in an ascending or
descending order

▪ Example: stages of cancer (1, 2, 3, 4)

o Nominal data

▪ It refers to data which can't be arranged in ascending or
descending order

▪ Example: sex, eye color

• Quantitative or numerical or parametric data: it refers to data which can be
measured

o Interval data

▪ It refers to data which is measured along a scale in which each
data point is equidistant from one another

▪ Example: level of pain rated from 1 to 10 on scale


o Ratio

▪ It refers to data which can be measured as multiples of one
another

▪ That is data which can be multiplied or divided


Measures of Central tendency

✓ Mean

✓ Median

✓ Mode

• Mean

o It is derived by adding all the individual observations and then
dividing the sum by the total number of observations

o Example:

If individual observations are a, b, c and d; the

mean = (a + b + c + d)/4

o Advantages

▪ Easy to calculate

▪ Easy to understand

▪ All values in the distribution are included in its calculation

▪ It is the most commonly used statistical measure of central
tendency

o Disadvantage

▪ It is affected by the extreme values which may result in skewed
results
▪ The value may at times be unrealistic for the variable measured (for
example, a fractional number of people)

• Median

o It is derived by first arranging the data in ascending or descending
order; the value of the observation in the middle of the set is the
median

o It is a better indicator of central tendency as compared to mean,
when the lowest and highest observation are wide apart or when
they are unevenly distributed

o Advantage

▪ It is easy to calculate

▪ Easy to understand

▪ Not affected by sampling variations

o Disadvantage

▪ It does not consider all values in the distribution

o Example

▪ If individual observations are 1, 2, 3, 4, 5, then the median is 3

▪ If individual observations are 1, 2, 3, 4, then the median is
(2 + 3)/2 = 2.5


• Mode

o It refers to the most commonly occurring value in a distribution of
data

o It is the most frequent item or the most fashionable value in the
series of observations

o Types

▪ Unimodal: single mode

▪ Bimodal: distribution having 2 modes

▪ Multimodal: distribution having more than 2 modes

o Advantage

▪ Easy to calculate

▪ Easy-to-understand

▪ Not affected by sample variation

o Disadvantage

▪ Exact location is often uncertain and not clearly defined
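
The three measures of central tendency described above can be computed with Python's standard `statistics` module, as in the sketch below. The observations are hypothetical and include one extreme value (40) to show how the mean is pulled upward while the median is not.

```python
import statistics

# Hypothetical observations with one extreme value
observations = [2, 3, 3, 4, 5, 5, 40]

mean_value = statistics.mean(observations)      # sum divided by count
median_value = statistics.median(observations)  # middle value when sorted
modes = statistics.multimode(observations)      # all most frequent values

print(mean_value, median_value, modes)
```

In this made-up data set two values occur equally often, so the distribution is bimodal and `multimode` returns both.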


Measures of Dispersion of Data


• Also known as variability of data
• Measures of dispersion or variability of a data give an idea of the extent to
which the values are clustered or spread out.
• In other words, it gives an idea of homogeneity and heterogeneity of data.
• Two sets of data can have similar measures of central tendency but
different measures of dispersion
• Therefore, measures of central tendency should be reported along with
measures of dispersion.
• Measures of dispersion include:
o Range:
▪ It is the simplest measure of dispersion.
▪ It can be represented as the difference between maximum and
minimum value or simply as maximum and minimum value.
▪ Range is given with median.
o Mean deviation: It is the average of the absolute deviations from the arithmetic mean
o Standard deviation:
▪ it is always given with mean.
▪ It denotes the extent of variation of values from the mean.
▪ Example: if the standard deviation is 10, then the values tend
to be about 10 units above and below the mean.
▪ Higher values of standard deviation represent higher variability
in the data and vice versa.
▪ Zero represents no variability
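
A short Python sketch of the measures of dispersion listed above follows. The values are hypothetical. Two assumptions are made here: the mean deviation is computed from absolute deviations, and the standard deviation shown is the sample standard deviation (divisor n - 1), which is what `statistics.stdev` returns.

```python
import statistics

# Hypothetical values, chosen only for illustration
values = [4, 6, 8, 10, 12]

data_range = max(values) - min(values)
mean_value = statistics.mean(values)
# Average of the absolute deviations from the arithmetic mean
mean_deviation = sum(abs(x - mean_value) for x in values) / len(values)
# Sample standard deviation (divides by n - 1)
sample_sd = statistics.stdev(values)

print(data_range, mean_value, mean_deviation, round(sample_sd, 3))
```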


Concept of hypothesis testing


• Hypothesis testing or significance testing is a method of quantifying the
strength of evidence against a particular hypothesis.
• For example, in a clinical trial for testing a new drug against the current
one,
o Null hypothesis (H0): assumes no effect of the given drug (i.e. the
new drug is no better, than the current drug or there is no difference
between the two drugs).
o Alternative hypothesis (H1): holds that the null hypothesis is not true.
The alternative hypothesis is what we wish to prove (i.e. the new
drug has a significantly different effect, on average, compared to that
of the current drug).
• A P value of less than 0.05 means that, if the null hypothesis (H0) were
true, the probability of obtaining a result at least as extreme as the one
observed is less than 5% (less than 5 out of 100 means less than 0.05
out of 1). So if P < 0.05 then H0 is rejected and H1 is accepted
[P value is discussed in detail in the next chapter]


P value

• P value = Probability value

• Definition:

o It is the probability of obtaining a result at least as extreme as the one
observed, if the null hypothesis is true (null hypothesis assumes that
there is no significant difference between specified populations)

o It is often loosely described as the probability of occurrence of an event
by chance; it is not the probability of the null hypothesis being true

o It is compared with alpha, the accepted probability of type 1 error

• For example, if you toss a fair coin the probability of getting a head is 50%,
which is written as a probability of 0.5

• Lesser the P value, lesser is the probability of the event occurring by chance

• As the confidence level increases (for example, from 95% to 99%), a
previously significant value may become non significant

• Interpretation of P value

o P = 0.5 means probability of occurrence of an event by chance is 50
in 100 or 50%

o P = 0.05 means the probability of occurrence of an event by chance is
5 in 100 or 5%

o P = 0.01 means the probability of the occurrence of an event by
chance is 1 in 100 or 1%


o P = 0.001 means the probability of occurrence of an event by chance
is 1 in 1000 cases

• The cut-off P value for rejecting the null hypothesis is usually kept at 0.05,
meaning that a result this extreme would be expected in less than 5% of cases
if the null hypothesis were true. So, if P < 0.05, the null hypothesis is
rejected, that is, alternative hypothesis is accepted and the difference is
statistically significant

• Significance:

o P < 0.05 is considered statistically significant

o Lesser the P value, lesser is the probability of occurrence of the event
by chance

o Higher the P value lesser the significance

o As the confidence level increases (for example, from 95% to 99%),
previously significant values may become insignificant
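
To show how a P value is actually obtained, the sketch below computes an exact two-sided P value for a coin-tossing experiment, where the null hypothesis is that the coin is fair. The P value here is the probability, under the null hypothesis, of a result at least as unlikely as the one observed. The function name and the example of 10 tosses are hypothetical.

```python
from math import comb


def two_sided_binomial_p_value(n_tosses, n_heads):
    """Exact two-sided P value for H0: the coin is fair (p = 0.5).

    Under H0 every sequence is equally likely, so outcomes are compared
    by the number of sequences that produce them.
    """
    observed_ways = comb(n_tosses, n_heads)
    ways_as_or_more_extreme = sum(
        comb(n_tosses, k)
        for k in range(n_tosses + 1)
        if comb(n_tosses, k) <= observed_ways
    )
    return ways_as_or_more_extreme / 2 ** n_tosses


p_ten_heads = two_sided_binomial_p_value(10, 10)
p_eight_heads = two_sided_binomial_p_value(10, 8)
p_five_heads = two_sided_binomial_p_value(10, 5)
print(p_ten_heads, p_eight_heads, p_five_heads)
```

With a cut-off of 0.05, ten heads in ten tosses would lead to rejection of the null hypothesis, whereas eight heads in ten tosses would not.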


Types of Error in Statistics

• There are four possible outcomes at the end of statistical analysis in a
research study

o True positive

o True negative

o False positive

o False negative

• An error occurs in research results when false positive or false negative
outcomes are accepted

• Type I error: To reject the null hypothesis when it is true or

o False positive error or

o Alpha error.

o Example: type I error would mean that the effects of two drugs
studied were found to be different by statistical analysis, when in fact
there was no difference between them.


• Type II error: To accept the null hypothesis when it is false or

o False negative error or

o Beta error.

o Example: type II error would mean that the effects of two drugs
studied were not found different by statistical analysis, when in fact
there was difference

• One has to increase sample size to reduce error

Status of null hypothesis | Null hypothesis is accepted (based on statistical analysis) | Null hypothesis is rejected (based on statistical analysis)
Null hypothesis is true | Correct decision (true negative) | False positive or type I error
Null hypothesis is false | False negative or type II error | Correct decision (true positive)


Concept of power of a study


• The power of a statistical hypothesis test measures the test's ability to
reject the null hypothesis when it is actually false - that is, to make the
correct decision.
• Statistical power = rejection of null hypothesis when alternate hypothesis is true
= making a correct decision
= 1 - β (where β is the probability of type II error)
• The maximum power a test can have is 1 and the minimum is 0.
• Ideally, we want a test to have high power, close to 1.
• Increasing the sample size is the best way to increase the power of a
statistical test.
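
The relationship between sample size and power can be illustrated with a simple case: a two-sided one-sample z-test with known standard deviation. This specific test, the function name and the effect size used are assumptions chosen only to keep the sketch short; other tests need other formulas.

```python
from statistics import NormalDist


def z_test_power(effect, sd, n, alpha=0.05):
    """Power (1 - beta) of a two-sided one-sample z-test.

    effect = true difference from the null value, sd = known standard
    deviation, n = sample size, alpha = accepted type I error.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect * n ** 0.5 / sd
    return z.cdf(-z_crit + shift) + z.cdf(-z_crit - shift)


# Hypothetical effect of 0.5 standard deviations at increasing sample sizes
for n in (10, 20, 40, 80):
    print(n, round(z_test_power(effect=0.5, sd=1.0, n=n), 3))
```

The printed power rises towards 1 as the sample size increases, and when the true effect is zero the function returns alpha, the type I error rate.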


Sample size

• Sample size is defined as the number of subjects that are included in a
given research study

• It is impossible to study whole population, so a sample is selected from the
population in a random manner

• The sample size should be just large enough to be able to detect a
difference if it exists

• This number is usually represented by the term ‘n’

• Importance of sample size

o Calculation of sample size helps in planning study

o Calculation of sample size helps to estimate the resources that would


be required, that is: manpower, money, material and time that
would be required to complete the research

o Sample size calculation helps to ensure scientific and ethical integrity


of the research study

• Factors to consider while calculating sample size

o Type of study?

o What is the primary outcome variable of the study?

o What is estimated value of primary outcome variable and acceptable


precision?


o What is the acceptable type I and type II error for hypothesis testing?

o What is the desired effect size?

• Factors affecting sample size

o Feasibility: it refers to what is possible with the resources available to
the researcher

o Nonresponse rates and dropout rates

▪ Not everyone who is selected in the sample will respond. Some will be
non-responders and some will drop out

▪ Nonresponse and dropout reduce the sample size actually studied, so the
power of the study also decreases

• Small vs large sample size

o Small sample size

▪ May not be possible to detect a significant difference even if it exists

▪ May result in estimation of false results

▪ Objectives of study may not be achieved


o Larger sample size

▪ May result in wastage of resources (manpower, time, effort, money)

▪ Very small clinically insignificant differences may be detected, which may
not be helpful in clinical practice

o Larger sample size can minimise the sampling error. That is, larger
samples tend to be associated with smaller margin of error.
However, beyond a certain point each further increase in sample size
reduces the sampling error only slightly: this is known as the law of
diminishing returns

• Calculation of sample size

o Methods: software method and formula method

o For calculation one needs to know:

▪ Population size

▪ Expected frequency of disease in population

▪ Confidence limit
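
The three inputs listed above can be combined as in the following sketch of the formula method. The manual does not state which formula is meant, so the sketch assumes one commonly used choice: the formula for estimating a single proportion, n0 = Z² p (1 - p) / d², followed by a finite population correction. The function name and the `margin_of_error` parameter (the acceptable precision, d) are hypothetical additions for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(population_size, expected_frequency,
                               confidence=0.95, margin_of_error=0.05):
    """Sample size needed to estimate a proportion (assumed formula).

    population_size    : size of the population being sampled
    expected_frequency : expected frequency of disease, as a proportion (0-1)
    confidence         : confidence level, e.g. 0.95 for 95% confidence limits
    margin_of_error    : acceptable precision (absolute), e.g. 0.05 for +/- 5%
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = expected_frequency
    # Sample size for a very large (effectively infinite) population.
    n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2
    # Finite population correction: smaller populations need fewer subjects.
    n = n0 / (1 + (n0 - 1) / population_size)
    return ceil(n)


# Hypothetical inputs: expected frequency 50%, 95% confidence, +/- 5% precision.
print(sample_size_for_proportion(10_000, 0.5))
print(sample_size_for_proportion(1_000_000_000, 0.5))
```

Comparative studies (for example, two drugs) need a different formula that also uses the accepted type I and type II errors and the effect size; this sketch covers only the descriptive case.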

• Significance:

o Sample size influences 2 statistical parameters

▪ Precision of the study


▪ Power of the study to draw conclusions: power is defined as the
probability of finding a statistically significant result when a true
difference exists

Statistical Tests

• Non parametric tests:

o Statistical test used in the case of non parametric data

▪ Chi square test

▪ Mann-Whitney U test

▪ Fisher's exact test

▪ Wilcoxon test

• Parametric tests:

o Statistical test used in the case of parametric data

▪ Large sample size: Z test

▪ Small sample size:

• Paired T test

• Unpaired T test

• One-way ANOVA

• Two-way ANOVA


Choosing a statistical test


• Once data has been collected and tabulated into the spreadsheet, we need
to decide which statistical test should be performed for each variable.
• Choice of test depends on the following factors:
o Whether the variable is categorical or continuous
o Whether the data in that variable is normally distributed (parametric)
or not normally distributed (nonparametric)
o Whether the data is paired or unpaired


Source:

Ghoshal UC, Tripathi S, Chourasia D (2007) Principle of statistical analysis in
clinical research: a primer. In: Mehta R (ed) Clinical gastroenterology. Paras
Publishing, Hyderabad, pp 372–386


Source: https://cyfar.org/types-statistical-tests


Concept of Univariate and Multivariate Analysis


• Univariate analysis: it tests whether a single variable is associated with
the outcome
o That is, only one variable is analyzed at a time
o Such an association does not necessarily mean causation.
• Multivariate analysis: it simultaneously tests the effect of multiple factors
on an outcome
o That is, several variables are analyzed at the same time
o It helps in inferring which factors are independently associated with the
outcome


Correlation
• Correlation is the relationship between two sets of continuous data
o Example: relationship between height and body weight; relationship
between fasting blood sugar and body weight
• Correlation statistics is used to determine the extent to which two
variables are related and yields a number called the coefficient of
correlation.
• Correlation coefficient may be positive or negative and may vary from -1 to
+1
• Positive correlation means that values of two different variables increase
and decrease together (direct relationship).
o For example, speed of running and pulse rate correlate positively.
• Negative correlation means that if value of one variable decreases then
value of the other variable increases (inverse relationship).
o For example, age and number of scalp hair may correlate negatively.
• The strength of a correlation is determined by the absolute value of the
correlation coefficient
• The closer the absolute value is to 1, the stronger the correlation; a value
near 0 indicates little or no linear relationship.
• Correlation between two variables is shown by scatter plot
• P value in correlation statistics indicates how likely it is that a
correlation at least as strong as the one observed would arise by chance
alone if the two variables were truly unrelated.
• Correlation analysis is important because it indicates whether values of
one variable can be predicted on the basis of values of the other variable.


• A correlation does not mean causation but it also does not mean absence
of causation, that is, if two variables exhibit strong correlation then one of
the variables may cause the other.
• Correlation data is therefore not sufficient evidence for causation.
• Pearson correlation is applied for parametric data while Spearman
correlation is applied for nonparametric data.
• Combined effect of a group of variables upon a variable not included in the
group is called multiple correlation.
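
The two coefficients named above can be computed by hand as in this minimal sketch. It assumes the usual textbook definitions: Pearson's r from the raw values, and Spearman's rho as Pearson's r applied to ranks, with tied values sharing the average rank. The function names and the height and weight figures are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical data: height (cm) and body weight (kg) of six people.
height = [150, 155, 160, 165, 170, 180]
weight = [48, 54, 57, 63, 66, 80]
print(round(pearson_r(height, weight), 3), round(spearman_rho(height, weight), 3))
```

The sketch gives only the coefficient; the accompanying P value would normally be taken from statistical software.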

Fig: Scatter plots


Regression

• Regression analysis is used to predict the values of a quantitative
dependent variable based on the values of one or more independent
variables.
• In simple regression analysis, there is one quantitative dependent variable
and one independent variable.
o For example, one may derive a formula to predict liver span
(dependent variable) from the height of a person (independent
variable).
• In multiple regression analysis, there is one quantitative dependent variable
and two or more independent variables.
• Linear regression statistics finds the best-fit line (line of regression) that
predicts dependent variable from independent variable
• Linear regression statistics is applied to data where the dependent (outcome)
variable is continuous.
• If the dependent variable is categorical (e.g. present vs. absent) then
logistic regression is used.
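
The best-fit line of simple linear regression can be found as in the sketch below. It assumes the ordinary least squares method, which is the usual way of fitting the line but is not named in this manual. The function name and the height and liver span values are hypothetical and are not clinical reference data.

```python
def fit_line(x, y):
    """Least squares line y = intercept + slope * x for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: height (independent) and liver span (dependent), in cm.
height_cm = [150, 160, 170, 180]
liver_span_cm = [11.0, 12.1, 12.9, 14.0]

slope, intercept = fit_line(height_cm, liver_span_cm)
# Predict the liver span of a person who is 165 cm tall.
predicted = intercept + slope * 165
print(round(slope, 3), round(intercept, 3), round(predicted, 2))
```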


Incidence

• Incidence rate is defined as the number of new cases occurring in a defined
population during a specified period of time

IR = (number of new cases of specific disease diagnosed during a given time period
/ population at risk during this time period) x 1000

• Thus, incidence rates refer to:

o Only new cases

o Diagnosed during a given time period

o In a specified / at risk population

• Uses

o It measures the rate at which new cases are occurring in the population

o Helps describe the magnitude of the illness

o It acts as a health status indicator

o Is useful for taking action to control the disease

o For planning research to identify etiology, pathogenesis and distribution
of disease

o To determine the efficacy of vaccination by calculating secondary attack
rate in vaccinated and unvaccinated groups


o To evaluate effectiveness of disease control measures such as isolation,
immunization, disinfection

• Special incidence rates

o Attack rate

o Secondary attack rate

o Hospital admission rate

• Attack rate:

o Attack rate is equal to the number of new cases of a specified disease
during a specified time interval divided by the total population at risk
during the same time interval, multiplied by 100

o Usually expressed as a percentage

o Used during epidemics

• Secondary attack rate:

o It is defined as the number of exposed persons developing the disease
within the range of the incubation period following exposure to a primary
case

o It is equal to the number of exposed persons developing the disease within
the range of the incubation period divided by the total number of exposed/
susceptible contacts, multiplied by 100


o Denominator includes only susceptible individuals; individuals who are
vaccinated or have previously suffered from the disease are excluded

o The index case is excluded from both numerator and denominator

o Uses

▪ It is a measure of communicability of a disease

▪ Helps determine efficacy of vaccination by calculating secondary attack
rate in unvaccinated and vaccinated group

o Limitation

▪ Secondary attack rate cannot be measured for diseases with subclinical
manifestation
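
The incidence rate, attack rate and secondary attack rate defined in this section are simple ratios, as the following sketch shows. The function names and all case counts are hypothetical, chosen only to make the arithmetic visible.

```python
def incidence_rate(new_cases, population_at_risk, per=1000):
    """New cases in a period divided by the population at risk, per 1000."""
    return per * new_cases / population_at_risk


def attack_rate(new_cases, population_at_risk):
    """Incidence during an epidemic, expressed as a percentage."""
    return 100 * new_cases / population_at_risk


def secondary_attack_rate(secondary_cases, susceptible_contacts):
    """Percentage of susceptible contacts who develop the disease within the
    incubation period. The index case is left out of both counts."""
    return 100 * secondary_cases / susceptible_contacts


# Hypothetical example: 50 new cases in one year among 20,000 people at risk.
print(incidence_rate(50, 20_000))       # cases per 1000 population per year

# Hypothetical outbreak: 30 of 200 people who ate at a function fell ill.
print(attack_rate(30, 200))             # percent

# Hypothetical household: 1 index case, 6 other members, 1 of them immune,
# 2 of the remaining 5 susceptible contacts develop the disease.
print(secondary_attack_rate(2, 6 - 1))  # percent
```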


Prevalence

• Definition: it is the total number of all individuals who have the disease in
a particular time period divided by the population at risk of having the
disease in this time period

• Prevalence is a ratio

• The term refers to all current cases (new and old existing cases) at a given
time period in a given population

• There are two types:

o Point prevalence

o Period prevalence

• Point prevalence

o Point prevalence is more commonly used than period prevalence

o The term prevalence when used alone refers to point prevalence

o It is defined as the total number of all current cases (old and new) of
a disease at one point of time in a defined population

o It is equal to the number of all current cases (old and new) of a disease
at one point of time divided by the estimated population at the same point
of time, multiplied by 100.

o It can be made specific for age, sex and other relevant factors


• Period prevalence:

o It measures the frequency of all current cases (old and new) existing
during a defined period of time in a defined population

o It includes cases arising before but extending into the defined period
as well as those arising during the defined time period

o It is equal to the number of existing cases (old and new) of a specified
disease during a given period of time divided by the estimated mid-interval
population, multiplied by 100

• Uses

o To estimate the magnitude of health problem in community

o Identify potential high-risk populations

o Useful for administrative and planning purposes; example: allocation of
hospital beds, manpower and rehabilitation facilities


• Relationship between incidence and prevalence

o Prevalence depends on incidence and duration of illness

o If the population is stable and the incidence and duration of illness are
unchanging, then prevalence = incidence x mean duration of illness
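
A short worked sketch of point prevalence and of the incidence, duration and prevalence relationship follows. The function names and the numbers are hypothetical, and the second function holds only under the stable-population assumption stated above.

```python
def point_prevalence(current_cases, population, per=100):
    """All current cases (old and new) at one point of time, per 100 population."""
    return per * current_cases / population


def steady_state_prevalence(incidence, mean_duration):
    """Prevalence = incidence x mean duration of illness (stable population).

    The result carries the same multiplier as the incidence, for example
    'per 1000' if the incidence is given per 1000 per year.
    """
    return incidence * mean_duration


# Hypothetical survey: 120 existing cases found among 4,000 people examined.
print(point_prevalence(120, 4_000))     # percent

# Hypothetical disease: 10 new cases per 1000 per year, lasting 5 years on average.
print(steady_state_prevalence(10, 5))   # cases per 1000 population
```

The same relationship explains why a long-lasting disease can have a high prevalence despite a low incidence.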


Screening Test

• It is defined as the search for unrecognized disease or conditions by means
of rapidly applied tests, examinations or other procedures in apparently
healthy individuals

Screening test | Diagnostic test
Applied to healthy individuals | Applied to a diseased individual
Applied to groups of individuals | Applied to a specific individual and not to a group
Based on a cut off point | Based on evaluation of a number of symptoms, signs and test findings
Less accurate | More accurate
Less expensive | More expensive
Not a basis for initiation of treatment; it is a basis for further evaluation with diagnostic tests | Used as a basis for initiation of treatment
Example: newborn screening, screening of anemia in pregnant women | Example: HbA1c in diabetics


• Uses of screening tests

o Case detection: also known as prescriptive screening

▪ People are screened primarily for their own benefit

▪ Example: neonatal screening

o Control of diseases: also known as prospective screening

▪ People are screened for the benefit of others

▪ Example: screening of immigrants for detection of infectious diseases
such as yellow fever

o Research purpose: screening may sometimes be performed for research
purposes; example: in chronic diseases whose natural history is not fully
known

o Educational purposes

• Types of screening

o Mass screening: screening of whole population or sizable subgroup of
population

o High risk screening: screening of a particular subgroup of population which
is deemed to be at high risk for the particular disease. It is more
productive than mass screening

o Multiphasic screening: application of two or more screening tests


• Criteria to consider diseases for screening:

o Important public health problem with high prevalence

o Recognizable by screening test in asymptomatic phase

o Natural history is well understood

o Test should be able to detect the disease prior to the onset of signs
and symptoms

o Facilities to confirm the diagnosis should be available

o Effective treatment should be available

o Good evidence is available that early detection and treatment of the said
disease reduces morbidity and mortality of the disease

o Expected benefits exceed risk and cost of test

• Characteristics of screening test

o Acceptability: it should be acceptable to the individuals in whom it has to
be done

o Repeatability

▪ The test must give consistent results when repeated more than
once on the same individual under the same conditions

▪ Factors which affect repeatability of the test

• Observer variation: it may be either intra-observer variation (one
observer finding different values in the same patient) or inter-observer
variation (two different observers finding different values in the same
patient)

• Biological variation

o Validity: also known as accuracy

▪ It is defined as the ability of the test to separate or distinguish those
who have the disease from those who do not

▪ It has two components: sensitivity and specificity

▪ Both are expressed as a percentage


Evaluation of an investigative test:

• Sensitivity

• Specificity

• Positive predictive value

• Negative predictive value

Screening test results | Diseased | Non diseased | Total
Positive | a (true positive) | b (false positive) | a + b
Negative | c (false negative) | d (true negative) | c + d
Total | a + c | b + d | a + b + c + d

• Sensitivity

o Defined as the ability of a test to correctly identify all those who have
the disease, that is, true positives

o Sensitivity = [a / (a + c)] x 100

• Specificity

o Defined as the ability of the test to correctly identify those who do not
have the disease, that is, true negatives

o Specificity = [d / (b + d)] x 100


• Positive predictive value: Proportion of individuals with a positive test
result who have the disease.

o Positive predictive value = [a / (a + b)] x 100

• Negative predictive value: Proportion of individuals with a negative test
result who do not have the disease

o Negative predictive value = [d / (c + d)] x 100

• Predictive value reflects diagnostic power of the test

• Diagnostic accuracy (DA): Accuracy is the proportion of all test results
(positive and negative) that are correct.

o Diagnostic accuracy = (a + d) / (a + b + c + d)

• An ideal screening test should be 100% sensitive and 100% specific

• However, in practice there is a trade-off: when the cut off point of a test
is shifted to increase sensitivity, specificity decreases, and vice versa

• In general

o Screening test should have high sensitivity and

o Diagnostic test should have high specificity
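
The measures in this section can all be read off the 2 x 2 table, as the sketch below shows. The cell names a, b, c and d follow the table above (a = true positive, b = false positive, c = false negative, d = true negative). The function name and the counts are hypothetical, and every measure, including diagnostic accuracy, is returned as a percentage for convenience.

```python
def evaluate_test(a, b, c, d):
    """Evaluate a test from the cells of a 2 x 2 table.

    a: true positives, b: false positives,
    c: false negatives, d: true negatives.
    All results are percentages.
    """
    return {
        "sensitivity": 100 * a / (a + c),  # diseased who test positive
        "specificity": 100 * d / (b + d),  # non diseased who test negative
        "ppv": 100 * a / (a + b),          # test positives who have the disease
        "npv": 100 * d / (c + d),          # test negatives who do not have it
        "accuracy": 100 * (a + d) / (a + b + c + d),
    }


# Hypothetical screening of 1000 people, 100 of whom have the disease.
results = evaluate_test(a=90, b=30, c=10, d=870)
for name, value in results.items():
    print(name, round(value, 1))
```

In this made-up example the test is 90% sensitive, yet only three quarters of those who test positive actually have the disease, which is why a positive screening result is followed by a diagnostic test.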


Kaplan Meier plots

• It is a non parametric statistical analysis used to estimate the survival
function from lifetime data

• In medical research it is often used to measure the fraction of patients
living for a certain amount of time after treatment, that is, survival

• Named after Edward L Kaplan and Paul Meier

• Basic concepts

o A Kaplan Meier plot is a series of declining horizontal steps which,
with a large enough sample size, approaches the true survival
function for that specific population

o In order to generate Kaplan Meier plots at least two pieces of data
are required for each patient

▪ The status at the last observation (event occurred or censored)

▪ Time to event (or time to censoring)

o Length of horizontal lines along the x-axis represents survival duration
for that interval

o Vertical axis represents estimated probability of survival


• Example:

o The survival of patients in surgery plus adjuvant chemotherapy group
(group 1) is better than patients in surgery only group (group 2)

• Advantage

o The advantage of KM plots is the possibility of inclusion of censored data,
which means that the information about patients who are lost at any point
of time, for any reason, can be used for the analysis

o With the use of these plots professionals have an improved understanding
of the disease processes

o Kaplan Meier plots can be used in clinical practice for counselling while
dealing with patients with different types of diseases
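
How the steps of a Kaplan Meier plot are obtained from the two pieces of data per patient can be sketched as follows. The sketch assumes the standard product-limit calculation: at each time with at least one event, the running survival probability is multiplied by (number at risk - number of events) / number at risk. The function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan Meier survival estimate.

    times  : follow-up time of each patient
    events : 1 if the event (e.g. death) occurred at that time,
             0 if the patient was censored at that time
    Returns a list of (time, estimated survival probability) steps.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # The curve steps down only when an event occurs.
            survival *= (at_risk - deaths) / at_risk
            curve.append((t, survival))
        # Censored patients leave the risk set without causing a step.
        at_risk -= leaving
    return curve


# Hypothetical follow-up in months for 7 patients; 0 marks a censored patient.
months = [2, 3, 3, 5, 8, 8, 10]
status = [1, 1, 0, 1, 1, 0, 0]
for t, s in kaplan_meier(months, status):
    print(t, round(s, 3))
```

Censored patients contribute to the number at risk up to the time they are lost, which is how their information is still used in the analysis.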


Forest plots

• Forest plot is a graphical manner of presenting the measures of effect and
confidence intervals of studies, so that they can be easily reviewed and
compared

• It is a convenient and easily understandable manner of presenting study
results in a systematic review/ meta-analysis

• It is used mainly to present results of individual studies in a systematic
review/ meta-analysis as well as the pooled result of the systematic review/
meta-analysis itself

• It shows effect size of all studies and results of meta analysis

• Example of a forest plot

• Parts of a forest plot:

o Left side: it enumerates the names of various studies which have been
included in the meta-analysis in a chronological order

o Right side: it represents the measure of effect of the studies included


Fig: understanding the components of a forest plot

o The square in diagram is a measure of effect of the study (example: mean,
odds ratio, relative risk) and the horizontal line represents the
95% confidence interval of the study

o Size of each square or the area of a square is proportional to the weight of
the study in the meta-analysis

o The diamond in the plot represents the overall measure of effect of the
meta-analysis

o The vertical line represents the line of no effect/ null hypothesis


• Advantage

o Easy to understand

o Simple and convenient method of representing result of individual studies
and net result of meta-analysis
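
The study weights (squares) and the overall effect (diamond) of a forest plot can be illustrated numerically. The manual does not say how the weights are derived, so this sketch assumes the fixed-effect inverse-variance method, in which each study is weighted by 1 / (standard error)². The function name and the three studies are hypothetical. For ratio measures such as odds ratio or relative risk, the estimates would be entered on the log scale.

```python
from math import sqrt


def pooled_effect(estimates, standard_errors):
    """Fixed-effect inverse-variance pooling (an assumed weighting scheme).

    Returns the pooled estimate, its 95% confidence interval and the
    percentage weight of each study.
    """
    weights = [1 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    pooled_se = sqrt(1 / total)
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    percent_weights = [100 * w / total for w in weights]
    return pooled, ci, percent_weights


# Hypothetical mean differences and standard errors from three studies.
effects = [-1.2, -0.4, -0.8]
errors = [0.6, 0.3, 0.4]

pooled, ci, weights = pooled_effect(effects, errors)
print(round(pooled, 2), [round(x, 2) for x in ci], [round(w, 1) for w in weights])
```

The most precise study (smallest standard error) receives the largest weight and would be drawn as the largest square; the pooled value and its interval give the centre and width of the diamond.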


Receiver Operating Characteristic curve


• Receiver operating characteristic is a plot of sensitivity vs 1 - Specificity or
plot of true positive rate against the false positive rate, for the different
possible cut offs (threshold values) of a diagnostic test.
• It shows the relationship between sensitivity and specificity (any increase in
sensitivity is accompanied by a decrease in specificity).
• ROC curve gives an idea of accuracy of a test (efficiency of the test to
discriminate between those who have the disease and those who do not).
• The area under the curve gives the measure of test accuracy.
• Area under the ROC curve is calculated by complex mathematical models but
can be obtained easily by various computer programs.
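
How the points of an ROC curve and the area under it are obtained can be sketched in a few lines. The sketch assumes that a higher test value indicates disease and that the area is computed by the trapezoidal rule, which is one simple approach and not necessarily what a given statistical program uses. The function names and the test values are hypothetical.

```python
def roc_points(scores, labels):
    """(1 - specificity, sensitivity) at every possible cut off.

    scores : test values, higher meaning more likely diseased (assumed)
    labels : 1 = diseased, 0 = not diseased
    A subject is called test positive when the score is >= the cut off.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # cut off above every score: nobody tests positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= cutoff and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= cutoff and l == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical test values for 4 diseased and 4 non diseased subjects.
values = [9.1, 8.4, 7.7, 6.0, 6.5, 5.2, 4.8, 3.9]
disease = [1, 1, 1, 1, 0, 0, 0, 0]
print(round(area_under_curve(roc_points(values, disease)), 3))
```

An area of 1 means the test separates the two groups perfectly, while an area of 0.5 means it does no better than chance.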


Evidence based medicine

• It is defined as the process of turning clinical problems into questions, then
answering them by systematically locating, appraising and using research
findings to finally help with clinical decision making

• It is also known as evidence based clinical practice

• Steps in Evidence based medicine

o Evaluate your patient in form of: history, clinical examination and
laboratory investigations

o Ask appropriate clinical questions

▪ The question should include all components of PICO, that is:

• P: description of patient population

• I: intervention

• C: comparison group

• O: outcome


o Acquire the best evidence in the form of research available

o Appraise the evidence

o Apply the evidence to patient care

o Self evaluation

• Rules of evidence based medicine

o Not all evidence is equivalent

o Evidence alone cannot help make clinical decisions

• Advantages

o It helps upgrade the knowledge base of the clinician

o It helps improve the understanding of the clinician in aspects of research
and its methods

o It improves the confidence of the clinician in managing clinical situations

o It improves the computer literacy and data searching skills

o It allows group problem solving and teaching

o It improves our reading habits

o Wasteful practices can be abandoned

o It helps with more effective use of resources

o Helps keep the clinician up to date


o Helps in decision making process

• Disadvantage

o It takes time to learn the methods and to put them into clinical
practice

o Research is costly

Cochrane collaboration

• One of the international agencies which has taken up the task of building
evidence-based medicine is the Cochrane collaboration

• Goal of the collaboration is

o To produce high quality systematic reviews

o To ensure that these systematic reviews are subjected to very high-quality
peer review

o To disseminate these systematic reviews electronically via the Internet


Levels of evidence
From the Centre for Evidence-Based Medicine, http://www.cebm.net.

Level Type of evidence

1A Systematic review (with homogeneity) of RCTs

1B Individual RCT (with narrow confidence intervals)

1C All or none study

2A Systematic review (with homogeneity) of cohort studies

2B Individual Cohort study (including low quality RCT, e.g. <80% follow-up)

2C “Outcomes” research; Ecological studies

3A Systematic review (with homogeneity) of case-control studies

3B Individual Case-control study

4 Case series and poor quality cohort and case-control study

5 Expert opinion without explicit critical appraisal or based on physiology,
bench research or “first principles”

Grades of Recommendation

A based on level 1 studies

B based on level 2 or 3 studies or extrapolations from level 1 studies

C based on level 4 studies or extrapolations from level 2 or 3 studies

D based on level 5 evidence or troublingly inconsistent or inconclusive studies
of any level

“Extrapolations” are where data is used in a situation that has potentially clinically
important differences than the original study situation.

Ethics in Research

• Ethics is defined as the philosophy of morality

• Principles of ethics:

o Respect for autonomy

o Beneficence

o Non maleficence

o Justice

• Respect for autonomy:

o It is the obligation to respect the decision-making capacity of the
individual

o You need to consult research participants and obtain their agreement before
you start your work

o Informed consent is obligatory

o This principle gives the individual subject the right to gather as much
information as possible so that they can make their informed choice
whether to go forward with the intervention or not

o A participant need not give any reason for withdrawal


• Beneficence (means: to do good) & non maleficence (means: first do no harm)

o This principle confers on the researcher the responsibility of protecting
the physical, mental and social well-being of the research participant
during the research.

o The researcher must consider the principles of beneficence & non maleficence
together and aim at producing net benefit over harm

• Justice

o Justice precludes exposing one group of individuals to risks of research for
benefit of another group


Informed consent

• It is defined as consent given by a competent individual who has received
the necessary information in the language they best understand, has
adequately understood the information and after considering the
information has arrived at the decision without having been subjected to
coercion, undue influence, inducement or intimidation

• If an individual is willing to participate in a research study, the
investigator should take the participant's informed consent

• In case of a minor or other vulnerable participant, parents/ guardians are the
legally authorized representatives who can give consent on their behalf

o Vulnerable participants: any individual who lacks the ability to fully
consent to participate in a study; example: minor, illiterate person

o However, in case of minors above 7 years of age, their consent (also known
as assent) should be obtained to the extent of the child's capabilities

o For children below 7 years of age, consent has to be obtained from parents/
guardians alone

• Elements of informed consent

o Volunteerism

o Information disclosure

o Decision making capacity


o Investigator should give information about the following to the participant
before taking consent

▪ Purpose of the study

▪ Expectation from the participant

▪ Responsibilities of the investigator

▪ Risk and benefits of the intervention under study

▪ Alternatives available

▪ Option to withdraw from the study

▪ In case of complication whom to contact


The Clavien-Dindo classification of surgical complications

