Introduction To Biostatistics
Why do we need to know about statistics?
• How to properly present and describe
information
• How to draw conclusions about a large
population based only on information
obtained from samples
• How to improve processes
• How to obtain reliable forecasts
Who Uses Statistics?
Statistical techniques are used extensively by
marketing, accounting, quality control, consumers,
hospital administrators, educators, politicians,
physicians, and many others.
Sources of Statistical Data
● Researching problems usually requires published data. Statistics on these
problems can be found in published articles, journals, and magazines.
● Published data is not always available on a given subject. In such cases,
information will have to be collected and analyzed.
● One way of collecting data is via questionnaires.
Descriptive Statistics:
Methods of organizing, summarizing, and presenting data
in an informative way.
Inferential Statistics:
A decision, estimate, prediction, or generalization about a
population, based on a sample.
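To make the distinction concrete, here is a minimal Python sketch. The ten blood-pressure readings are hypothetical, not data from these notes. The mean and standard deviation only describe the sample (descriptive statistics); the confidence interval is a statement about the whole population (inferential statistics). Using the normal value of about 1.96 instead of a t value is a simplifying assumption.

```python
import statistics
from statistics import NormalDist

# Hypothetical sample of systolic blood pressures (mmHg); not data from these notes.
sample = [118, 125, 130, 122, 135, 128, 119, 141, 126, 131]

# Descriptive statistics: summarize the sample itself.
n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
print(f"n={n}, mean={mean:.1f}, sd={sd:.2f}")

# Inferential statistics: generalize from the sample to the population.
# Approximate 95% confidence interval for the population mean, using the
# normal critical value z (about 1.96); a t value would be more exact for n = 10.
z = NormalDist().inv_cdf(0.975)
margin = z * sd / n ** 0.5
print(f"95% CI for population mean: {mean - margin:.1f} to {mean + margin:.1f}")
```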
Types of Statistics
[Figure: examples of inferential statistics]
Types of Variables
DATA
• Qualitative or attribute (e.g., type of car owned)
• Quantitative or numerical
  ◦ Discrete (e.g., number of children)
  ◦ Continuous (e.g., time taken for an exam)
For a qualitative or attribute variable, the
characteristic being studied is nonnumeric.
Gender, religious affiliation, type of automobile
owned, state of birth, and eye color are examples.
Levels of Measurement
1) Nominal level:
Data that is classified into categories and cannot be
arranged in any particular order.
Examples: eye color, gender, religious affiliation
Mutually exclusive:
An individual, object, or measurement is
included in only one category.
Exhaustive:
Each individual, object, or measurement must
appear in one of the categories.
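A small Python sketch of what can be done with nominal data, assuming a hypothetical list of eye colors: check that the categories are exhaustive, then count frequencies and find the mode. Ordering or averaging the categories would not be meaningful.

```python
from collections import Counter

# Hypothetical nominal data: eye color of 10 people.
categories = {"brown", "blue", "green", "hazel"}
eye_colors = ["brown", "blue", "brown", "green", "brown",
              "hazel", "blue", "brown", "green", "blue"]

# Exhaustive: every observation must fall in one of the listed categories.
# (Mutually exclusive holds here because each person has exactly one label.)
assert all(color in categories for color in eye_colors)

# With nominal data we can only count: frequencies and the mode.
counts = Counter(eye_colors)
mode, mode_count = counts.most_common(1)[0]
print(counts)
print(f"mode = {mode} ({mode_count} of {len(eye_colors)})")  # mode = brown (4 of 10)
```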
2) Ordinal level:
Data arranged in some order, but the
differences between data values cannot be determined
or are meaningless.
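The rating scale below (poor, fair, good, excellent) is a hypothetical ordinal example, not one from these notes. The sketch shows that sorting and the median make sense for ordinal data, while the numeric codes should not be averaged.

```python
import statistics

# Hypothetical ordinal data: patient satisfaction ratings.
# The order of the scale is meaningful; the spacing between levels is not.
scale = ["poor", "fair", "good", "excellent"]
rank = {label: i for i, label in enumerate(scale)}  # poor=0 ... excellent=3

ratings = ["good", "fair", "excellent", "good", "poor", "good", "excellent"]

# Ranking (sorting) is meaningful for ordinal data.
ordered = sorted(ratings, key=lambda r: rank[r])

# The median is meaningful: take the middle rank and map it back to a label.
# median_low always returns an observed rank, never an average of two ranks.
median_label = scale[statistics.median_low(rank[r] for r in ratings)]
print(ordered)
print("median rating:", median_label)  # median rating: good

# An arithmetic mean of these codes would assume equal spacing between
# levels, which ordinal data does not justify.
```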
3) Interval level:
Similar to the ordinal level, with the additional property
that meaningful amounts of differences between data
values can be determined. There is no natural zero point.
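Temperature in degrees Celsius is a standard interval-level example, assumed here because the notes do not give one. The sketch shows that differences carry over between Celsius and Fahrenheit, but ratios do not, because the zero point of each scale is arbitrary.

```python
# Temperature in degrees Celsius (assumed example): 0 C is an arbitrary
# point on the scale, not "no temperature".
def c_to_f(c: float) -> float:
    return c * 9 / 5 + 32

morning_c, noon_c = 10.0, 20.0

# Differences are meaningful: a 10 C rise is always an 18 F rise.
diff_c = noon_c - morning_c
diff_f = c_to_f(noon_c) - c_to_f(morning_c)
print(diff_c, diff_f)  # 10.0 18.0

# Ratios are not meaningful: "twice as hot" in Celsius is not twice in Fahrenheit.
ratio_c = noon_c / morning_c
ratio_f = c_to_f(noon_c) / c_to_f(morning_c)
print(ratio_c, round(ratio_f, 2))  # 2.0 1.36
```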
4) Ratio level:
The interval level with an inherent zero starting point.
Differences and ratios are meaningful for this level of
measurement.
Examples: monthly income of surgeons, or distance traveled per month
by manufacturer’s representatives.
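By contrast, for a ratio-level variable such as distance traveled per month, the ratio between two values does not depend on the unit of measurement. A minimal sketch with two hypothetical distances:

```python
# Hypothetical monthly distances (km) traveled by two representatives.
KM_PER_MILE = 1.609344

rep_a_km, rep_b_km = 1200.0, 600.0

# Ratio level: 0 km means no distance traveled, so ratios are meaningful
# and stay the same when the unit changes from km to miles.
ratio_km = rep_a_km / rep_b_km
ratio_miles = (rep_a_km / KM_PER_MILE) / (rep_b_km / KM_PER_MILE)
print(ratio_km, round(ratio_miles, 10))  # 2.0 2.0
```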
Level of data
• Nominal: data may only be classified
• Ordinal: data are ranked
• Interval: meaningful difference between values
• Ratio: meaningful 0 point and ratio between values
Exercise 1
For each of the following random variables,
determine whether the variable is categorical or
numerical.
If the variable is numerical, determine whether the
variable of interest is discrete or continuous.
In addition, determine the level of measurement.
Variable    Qualitative/Quantitative    Discrete/Continuous    Level of Data
Solution 1
b) Mobile phone service provider
c) Number of text messages sent per month
e) Length (in minutes) of longest call made during month
f) Colour of mobile phone
g) Monthly charge (in dollars and cents) for calls made
h) Ownership of a car charge kit
i) Number of calls made per month
j) Whether there is a telephone line connected to a
computer modem in the household
k) Whether there is a fax machine in the household
SAMPLING TECHNIQUES
Sample
• A sample is a part of the population. It is used as a tool
to infer something about the population from which it
was selected.
Why Sampling?
1. The physical impossibility of checking all items in the population.
3. Determine if a probability or non-probability sampling method will be chosen.
5. Determine sample size.
Non-Response Error
Sampling Frame Error
• Cluster Sampling
Non-probability sampling
⚫ Special type
4. Snowball Sampling
Probability Sampling
1. Simple Random Sampling
• A sample selected so that each item or person in the population
has the same chance of being included.
• The major advantage of simple random sampling is
its simplicity.
• As the sample size increases, the sample tends to become more
representative of the population.
• Sampling is generally without replacement.
• Problem: it can be very costly if the population is large. Choices
come from a list; who makes the list?
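Simple random sampling without replacement can be sketched with Python's standard `random.sample`, which draws `k` distinct items, each with the same chance of selection. The list of 500 patient IDs and the sample size of 30 are hypothetical.

```python
import random

# Hypothetical sampling frame: IDs of 500 patients on a clinic list.
population = [f"patient_{i:03d}" for i in range(1, 501)]

rng = random.Random(42)  # fixed seed only so the example is reproducible

# random.sample draws without replacement: every patient has the same
# chance of selection and no patient can be chosen twice.
sample = rng.sample(population, k=30)
print(sample[:5])
```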
2. Systematic Sampling
In this technique, an initial starting point is selected by a random
process, after which every nth item on the list is selected to form
part of the sample.
▪ Example: From a list of 1500 name entries, a name on the list is randomly
selected and then (say) every 25th name thereafter. The sampling interval in this
case would equal 25.
▪ For systematic sampling to work best, the list should be random in nature and should
not have an underlying systematic pattern.
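The systematic procedure in the example can be sketched as follows, assuming the random start is chosen among the first k entries (here k = 25). The function name `systematic_sample` and the generated names are hypothetical.

```python
import random

def systematic_sample(frame, k, rng=random):
    """Random start among the first k items, then every kth item after it."""
    start = rng.randrange(k)  # random index 0 .. k-1
    return frame[start::k]

# Hypothetical list of 1500 name entries, matching the example's list size.
names = [f"name_{i:04d}" for i in range(1, 1501)]

sample = systematic_sample(names, k=25, rng=random.Random(7))
print(len(sample), sample[:3])  # 1500 / 25 = 60 names are selected
```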
3. Stratified Sampling
In this technique, simple random subsamples are drawn from within different
strata, each made up of members that share some common characteristic.
• A population is first divided into subgroups (strata), and a
sample is selected from each subgroup.
Example:
Suppose it is necessary to study the advertising expenditures
of the 352 largest companies in Sri Lanka. We are asked to use a
sample of 50 companies. The objective of the study is to determine
whether the companies with higher profits spent more
on promotional activities.
Subgroup   Profitability   Number of firms   Percent of total   Number sampled
1          Over 30%                      8                  2                1
2          20 up to 30%                 35                 10                5
3          10 up to 20%                189                 54               27
4          0 up to 10%                 115                 33               16
5          Deficit                       5                  1                1
Total                                  352                100               50
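The "Number sampled" column follows proportional allocation: each subgroup gets a share of the 50 firms equal to its share of the 352 firms. A Python sketch that reproduces that allocation from the stratum sizes and then draws a simple random sample inside each stratum (the firm labels are hypothetical placeholders):

```python
import random

# Stratum sizes taken from the table (number of firms in each profit group).
strata = {
    "Over 30%": 8,
    "20 up to 30%": 35,
    "10 up to 20%": 189,
    "0 up to 10%": 115,
    "Deficit": 5,
}
N = sum(strata.values())  # 352 firms
n = 50                    # total sample size

# Proportional allocation: n_h = n * N_h / N, rounded to whole firms.
# Rounding happens to total 50 here; in general the rounded counts
# may need a small adjustment so that they sum to n.
allocation = {g: round(n * size / N) for g, size in strata.items()}
print(allocation)
# {'Over 30%': 1, '20 up to 30%': 5, '10 up to 20%': 27, '0 up to 10%': 16, 'Deficit': 1}

# Then draw a simple random sample within each stratum (hypothetical firm labels).
rng = random.Random(1)
sample = {
    g: rng.sample([f"{g} firm {i}" for i in range(1, size + 1)], allocation[g])
    for g, size in strata.items()
}
print(sum(len(firms) for firms in sample.values()))  # 50
```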
4. Cluster Sampling
A population is first divided into primary units (clusters),
and then a sample of these primary units is selected. All
observations in the selected clusters are included in
the sample.
[Figure: a population divided into clusters numbered 1 to 6]
• Example:
Let’s say that a researcher is studying the
academic performance of high school students in
Sri Lanka and wants to choose a cluster sample
based on geography. First, the researcher would
divide the entire population of Sri Lanka into
clusters (for example, provinces or districts).
Then, the researcher would select either
a simple random sample or a systematic random
sample of those clusters. Let’s say he or
she chose a random sample of 15 districts and
wanted a final sample of 5,000 students.
The researcher would then select those 5,000
high school students from those 15 districts either
through simple or systematic random sampling.
Because students are then sampled within the chosen
clusters rather than all being included, this would be
an example of a two-stage cluster sample.
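A Python sketch of both designs, assuming 25 hypothetical districts with made-up numbers of students: the one-stage version keeps every student in the 15 chosen districts, as in the definition of cluster sampling, and the two-stage version then draws 5,000 students from them by simple random sampling, as in the example.

```python
import random

rng = random.Random(3)

# Hypothetical clusters: 25 districts, each with 400 to 800 student IDs.
# District labels and sizes are made up for illustration.
clusters = {
    f"district_{d:02d}": [f"d{d:02d}_s{s:03d}" for s in range(rng.randint(400, 800))]
    for d in range(1, 26)
}

# Stage 1: simple random sample of 15 clusters.
chosen = rng.sample(sorted(clusters), k=15)

# One-stage cluster sample: every student in the chosen districts is included.
one_stage = [s for d in chosen for s in clusters[d]]

# Two-stage variant: simple random sample of 5,000 students from those districts.
two_stage = rng.sample(one_stage, k=5000)
print(len(chosen), len(one_stage), len(two_stage))
```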
• Cluster Sampling: Advantages and Disadvantages
❖ The cost per sample point is lower for cluster sampling than for other
sampling methods. Given a fixed budget, the researcher may be able
to use a bigger sample with cluster sampling than with the other
methods.