
Eudoxia Research University USA

Eudoxia Research Center India

International Webinar

Quantitative Data Coding & Analysis

Dr Rajesh G Konnur
Director / Pro V C
ERC / ERU
In this webinar:
- Concept & definition of Data Coding
- Process of Data Analysis
- Parametric and Non-parametric Tests
- Tables & Graphs
- Summary & Conclusion
Quantitative Data Coding:

● Quantitative data coding refers to the process of assigning numerical values or codes to the information collected in a research study for further analysis.

● Lockyer, Sharon: "a systematic way in which to condense extensive data sets into smaller analyzable units through the creation of categories & concepts derived from the data."
Coding Process:
1. Variable Identification: Before coding, identify the variables you want to study. These are the characteristics or attributes that will be measured or observed.

2. Codebook Development: A codebook is created, which is essentially a guide that provides definitions and instructions for assigning numerical codes to different values or categories within each variable. It helps maintain consistency and standardization in coding.

3. Numeric Assignment: Numerical codes are then assigned to the responses or observations based on the established codebook. These codes represent the different categories, levels, or values of the variables.
Cont…
4. Data Entry: The coded data is entered into a database or statistical software for analysis. This could involve manual data entry or the use of automated tools.

5. Data Cleaning: After entering the data, researchers often conduct data cleaning to identify and rectify errors or inconsistencies. This step is crucial for ensuring the accuracy and reliability of the data.

6. Statistical Analysis: Once the data is coded and cleaned, researchers can perform statistical analyses to identify patterns, relationships, and trends within the data set. Common statistical methods include descriptive statistics, inferential statistics, regression analysis, and more.
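The codebook and numeric-assignment steps can be sketched in a few lines of Python. This is a minimal illustration only; the survey variables and code values below are hypothetical, not taken from the webinar.

```python
# A codebook mapping each variable's categories to numeric codes
# (hypothetical variables and codes, for illustration only).
codebook = {
    "gender": {"male": 1, "female": 2},
    "satisfaction": {"very dissatisfied": 1, "dissatisfied": 2,
                     "neutral": 3, "satisfied": 4, "very satisfied": 5},
}

def code_response(response):
    """Numeric assignment: look up each answer in the codebook."""
    return {var: codebook[var][value] for var, value in response.items()}

raw = {"gender": "female", "satisfaction": "satisfied"}
coded = code_response(raw)
print(coded)  # {'gender': 2, 'satisfaction': 4}
```

In practice the codebook would be documented alongside the dataset, so every coder assigns the same numbers to the same categories.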
Data Types:
- In quantitative research, data types refer to the different kinds of information that researchers collect and analyze. Quantitative data are generally numeric and can be subjected to statistical analysis.

1. Continuous Data:

• Continuous data can take any value within a given range and can be measured with great precision.

• Examples: Age, height, weight, temperature, income, and time are typical examples of continuous data.

• Measurement: Continuous data is often measured on a scale and can be subdivided into smaller units. For example, age can be measured in years, months, days, etc.
2. Discrete Data:
• Discrete data are countable and take on distinct, separate values.

• Examples: The number of students in a class, the number of cars in a parking lot, the number of books on a shelf, and the number of goals scored in a game are examples of discrete data.

• Measurement: Discrete data are typically whole numbers and cannot be subdivided further. For instance, you cannot have half a student or three-quarters of a car.
3. Nominal Data:
• Nominal data represent categories or labels without any inherent order or ranking.

• Examples: Gender (male, female), eye color, ethnicity, and types of cars are examples of nominal data.

• Measurement: Nominal data are often used for classification purposes, and the categories have no numerical significance.

4. Ordinal Data:

• Ordinal data have categories with a meaningful order or ranking, but the intervals between the categories are not consistent or measurable.

• Examples: Educational levels (e.g., high school, college, graduate school), socioeconomic status, and customer satisfaction ratings (e.g., very dissatisfied, dissatisfied, neutral, satisfied, very satisfied) are examples of ordinal data.

• Measurement: The order matters, but the differences between the categories are not standardized or quantifiable.
Interval Measurement:

In interval measurement, the intervals between consecutive values are equal and meaningful, but there is no true zero point.

• Characteristics:
• The zero point is arbitrary and does not represent the absence of the measured quantity.
• Differences between values are meaningful and can be compared.
• Ratios between values are not meaningful.

• Ex. Temperature measured in Celsius or Fahrenheit is an interval scale. The difference between 20°C and 30°C is the same as the difference between 30°C and 40°C, but a temperature of 0°C does not mean the absence of heat.
Ratio Measurement:
• Ratio measurement has all the characteristics of interval measurement, but it also has a true zero point, which represents the absence of the measured quantity.

• Characteristics:
• Zero represents a complete absence of the measured attribute.
• Equal intervals and meaningful differences, as in interval measurement.
• Meaningful ratios between values. A value of 10 is twice as much as a value of 5.

• Ex: Height, weight, age, and income are often measured on a ratio scale. For example, a person with a height of 180 cm is twice as tall as a person with a height of 90 cm.
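The interval-versus-ratio distinction can be shown numerically. A small sketch, reusing the temperature and height examples above (the Kelvin conversion is added here to illustrate why Celsius ratios are not meaningful):

```python
# Interval scale: Celsius has an arbitrary zero, so ratios are meaningless.
c1, c2 = 20.0, 40.0
naive_ratio = c2 / c1          # 2.0, but 40 C is not "twice as hot" as 20 C

# Converting to Kelvin (a ratio scale with a true zero) shows why:
k1, k2 = c1 + 273.15, c2 + 273.15
true_ratio = k2 / k1           # about 1.068, not 2

# Ratio scale: height has a true zero, so ratios are meaningful.
height_ratio = 180 / 90        # 2.0: 180 cm really is twice 90 cm
print(naive_ratio, round(true_ratio, 3), height_ratio)
```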
Data Analysis

Introduction:
• Data analysis is the process of cleaning, changing & processing raw data & extracting actionable, relevant information that helps researchers make informed decisions.
• Statistical analysis is the organisation and analysis of quantitative or qualitative data using statistical procedures, including both descriptive and inferential statistics.
• It's the science of collecting, exploring and presenting large amounts of data to discover underlying patterns and trends.
Types of Data Analysis:
1) Descriptive Analysis:
- Summarizes the data for a better understanding of it
- Used to describe the variables

2) Exploratory Analysis:
- Used to discover patterns & relationships in data
- To identify correlations or create predictive models

3) Predictive Analysis:
- Used to make predictions about future outcomes

4) Prescriptive Analysis:
- Used to suggest the best courses of action
Data Analysis Process:
 Data Requirement Gathering: Decide what type of data you want to use & what data you plan to analyze.
 Data Collection: Collecting data from varied sources, e.g. case studies, surveys, interviews, questionnaires, observation.
 Data Cleaning: Not all of the data you collect will be useful, so it's time to clean it up. Remove white spaces, duplicate records & basic errors.
 Data Analysis: Use of software to interpret & understand the data & arrive at conclusions, e.g. Excel, Python, R, Looker, RapidMiner, Chartio, Metabase, Redash & Microsoft Power BI.
 Data Interpretation: After obtaining results, interpret them & come up with the best courses of action based on the findings.
 Data Visualization: Graphically showing information by use of charts, graphs, maps, bullet points or a host of other methods. It derives valuable insights by helping to compare datasets & observe relationships.
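The cleaning step above can be illustrated with a short sketch. The records are invented for the example; the same idea (strip whitespace, drop blanks, drop duplicates) scales up in tools such as Excel, Python, or R.

```python
# Hypothetical raw survey answers with stray whitespace, duplicates, and a blank.
raw_records = ["  yes", "no ", "yes", "no", "yes", ""]

cleaned = []
seen = set()
for record in raw_records:
    value = record.strip()    # remove white spaces
    if not value:             # drop empty / basic-error entries
        continue
    if value in seen:         # remove duplicate records
        continue
    seen.add(value)
    cleaned.append(value)

print(cleaned)  # ['yes', 'no']
```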
Elements of Statistical Analysis:

 Understand the complex relationship among the correlates of the disease under study.
 The analysis should start with a simple comparison of proportions and means.
 Interpretation of results should be guided by clinical and biological considerations.
Important items to consider in choosing a particular analysis
 The problem or the specific objective:
- If the problem requires the data to be summarized and described: Descriptive Statistics
- If the problem requires an inference to be made: Inferential Statistics
- If the problem requires data to be classified or a pattern determined: Exploratory Statistics
STATISTICAL MEASURES

 Mean
 Mode
 Median
 Interquartile Range
 Standard Deviation
MEAN

• The mean is the average of all numbers: x̄ = Σx / n

Example of Mean:

• Mean of 10, 20, 30, 40 = (10 + 20 + 30 + 40) / 4 = 25
MEDIAN
• When all the observations are arranged in ascending or descending order of magnitude, the middle one is the median.
• For raw data, if n is the total number of observations, the value of the [(n + 1)/2]th item is the median.
• If n is an even number, the mean of the (n/2)th item and the (n/2 + 1)th item is the median.

Example: Median of the data 10, 20, 30 is 20
MODE
• The mode is the value of a series which appears more frequently than any other.
• For grouped data:

Mode, M0 = L0 + [Δ1 / (Δ1 + Δ2)] × c

where L0 is the lower limit of the modal class,
c is the class interval,
Δ1 is the difference between the modal frequency and the frequency of the preceding class,
Δ2 is the difference between the modal frequency and the frequency of the following class.

Example: The mode of the data 80, 90, 86, 80, 72, 80, 96 is 80
INTERQUARTILE RANGE

• The interquartile range (IQR) is a measure of statistical dispersion, equal to the difference between the 75th and 25th percentiles, or between the upper and lower quartiles.

• IQR = Q3 − Q1
Example

Interquartile range of the data 30, 20, 40, 60, 50 (sorted: 20, 30, 40, 50, 60):

• Q1 = [(n + 1)/4]th item = 1.5th item = 20 + 0.5 × (30 − 20) = 25
• Q3 = [3(n + 1)/4]th item = 4.5th item = 50 + 0.5 × (60 − 50) = 55
• IQR = 55 − 25 = 30
STANDARD DEVIATION

• Used to measure the amount of variation or dispersion in a set of values. It is a fundamental tool for understanding the extent to which individual data points in a data set differ from the mean (average) of the data.

• The standard deviation is defined as the positive square root of the arithmetic mean of the squared deviations of the given observations from their arithmetic mean.

• The standard deviation is denoted by 'σ'. For a sample of n observations, σ = √[ Σ(x − x̄)² / (n − 1) ], as used in the example below.
EXAMPLE:

• Standard deviation of the data 10, 20, 30, 40, 50, where n = 5 and x̄ = 30:

• σ = √(1000/4) = √250 = 15.811
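The measures above can be checked with Python's standard library. This sketch reuses the numbers from the slides; note that `statistics.quantiles` with its default (exclusive) method matches the [(n + 1)/4] quartile rule used in the IQR example, and `statistics.stdev` divides by n − 1 as in the standard deviation example.

```python
import statistics

print(statistics.mean([10, 20, 30, 40]))              # 25
print(statistics.median([10, 20, 30]))                # 20
print(statistics.mode([80, 90, 86, 80, 72, 80, 96]))  # 80

# Quartiles of the IQR example data (default n=4, method='exclusive'):
q1, q2, q3 = statistics.quantiles([30, 20, 40, 60, 50])
print(q3 - q1)                                        # IQR = 55 - 25 = 30

sd = statistics.stdev([10, 20, 30, 40, 50])           # sample SD, divides by n - 1
print(round(sd, 3))                                   # 15.811
```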
STANDARD NORMAL DISTRIBUTION CURVE AND MEAN, MEDIAN, INTERQUARTILE RANGE AND STANDARD DEVIATION

TYPES

• PARAMETRIC STATISTICAL ANALYSIS
• NONPARAMETRIC STATISTICAL ANALYSIS
PARAMETRIC STATISTICAL ANALYSIS

• Most commonly used type of statistical analysis.

• This analysis is referred to as parametric statistical analysis because the findings are inferred to the parameters of a normally distributed population.

• Numerical data (quantitative variables) that are normally distributed are analysed with parametric tests.
ASSUMPTIONS:

• The assumption of normality, which specifies that the means of the sample groups are normally distributed.

• The assumption of equal variance, which specifies that the variances of the samples and of their corresponding populations are equal.

• The data can be treated as random samples.
NONPARAMETRIC STATISTICAL ANALYSIS

- Non-parametric analysis is used in statistical testing when certain assumptions of the parametric tests are not met, or when the data is not normally distributed.

Scenarios:
1. Data not normally distributed: When the data does not follow a normal distribution, non-parametric tests are more appropriate. Parametric tests, such as t-tests or ANOVA, assume normality, and violating this assumption can lead to inaccurate results.

2. Ordinal or ranked data: Non-parametric tests are suitable for analyzing data that can only be ranked, such as Likert scale data or data involving ratings. Parametric tests require interval or ratio data.
Cont…

3. Small sample size: When the sample size is small, non-parametric tests can be more robust. Parametric tests may not be reliable with small sample sizes.

4. Homogeneity of variance assumption violation: Non-parametric tests do not require the assumption of homogeneity of variance, which is required for some parametric tests.

5. Outliers: Non-parametric tests can be more robust to the presence of outliers compared to parametric tests.

6. Categorical data: Non-parametric tests are also used when the data is inherently categorical, such as in the case of frequency counts or proportions.
EXPLORATORY DATA ANALYSIS AND CONFIRMATORY DATA ANALYSIS
• John Tukey pioneered this distinction.

• Exploratory data analysis is used to obtain a preliminary indication of the nature of the data and to search the data for hidden structure or models.

• Confirmatory data analysis involves traditional inferential statistics, which you can use to make an inference about a population or a process based on evidence from the study sample.
STATISTICAL ANALYSIS DECISION MAKING

Two-group comparison:
- Mean: independent 2-sample t test (parametric); Mann-Whitney U test (non-parametric)
- Percentage: chi-square test

One-group comparison:
- Single mean: one-sample t test
- Mean difference: paired t test (parametric); Wilcoxon signed-rank test (non-parametric)

More-than-2-group comparison:
- Mean: ANOVA (parametric); Kruskal-Wallis test (non-parametric)
- Percentage: chi-square test
PARAMETRIC STATISTICAL ANALYSIS

 Student's t-test
 Z test
 Analysis of variance (ANOVA)
Student's t-test:

• Developed by Prof. W. S. Gosset.

• Student's t-test is used to test the null hypothesis that there is no difference between the means of the two groups.

• Variants: the one-sample t-test, the independent two-sample t-test (the unpaired t-test) and the paired t-test.


One-sample t-test

• To test if a sample mean (as an estimate of a population mean) differs significantly from a given population mean.
• The mean of one sample is compared with the population mean:

t = (x̄ − μ) / (S / √n)

where x̄ = sample mean, μ = population mean, S = standard deviation and n = sample size.
Example

A random sample of size 20 from a normal population gives a sample mean of 40 and a standard deviation of 6. Test the hypothesis that the population mean is 44, i.e. check whether there is any difference between the means.

• H0: There is no significant difference between the sample mean and the population mean.
• H1: There is a significant difference between the sample mean and the population mean.

mean = 40, μ = 44, n = 20 and S = 6

Cont….

• t calculated = |40 − 44| / (6/√20) = 2.981
• t table value (df = 19, α = 0.05) = 2.093
• t calculated > t table value; reject H0.
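The one-sample example above can be reproduced with a short standard-library sketch; the critical value is taken from a t table (df = 19, two-tailed, 5% level) rather than computed.

```python
import math

# One-sample t-test from summary statistics (the slide example):
x_bar, mu, s, n = 40, 44, 6, 20

t = (x_bar - mu) / (s / math.sqrt(n))
print(round(abs(t), 3))    # 2.981

t_crit = 2.093             # from a t table, df = 19, alpha = 0.05, two-tailed
print(abs(t) > t_crit)     # True -> reject H0
```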
Independent Two-Sample T Test (the unpaired t-test)

• To test if the population means estimated by two independent samples differ significantly.

• Two different samples with the same mean at the initial point are taken, and the means are compared at the end.

t = (x̄1 − x̄2) / √{ [((n1 − 1)S1² + (n2 − 1)S2²) / (n1 + n2 − 2)] × (1/n1 + 1/n2) }

where x̄1 − x̄2 is the difference between the means of the two groups and S denotes the standard deviation.
Example

Hb levels of 5 males are 10, 11, 12.5, 10.5, 12 and of 5 females are 10, 17.5, 14.2, 15 and 14.1. Test whether there is any significant difference between the Hb values.
• H0: There is no significant difference between the Hb levels.
• H1: There is a significant difference between the Hb levels.

X1   | X2   | X1 − x̄1 | X2 − x̄2 | (X1 − x̄1)² | (X2 − x̄2)²
10   | 10   | −1.2    | −4.16   | 1.44       | 17.306
11   | 17.5 | −0.2    | 3.34    | 0.04       | 11.156
12.5 | 14.2 | 1.3     | 0.04    | 1.69       | 0.002
10.5 | 15   | −0.7    | 0.84    | 0.49       | 0.706
12   | 14.1 | 0.8     | −0.06   | 0.64       | 0.004
Σ = 56 | 70.8 |       |         | 4.3        | 29.172

• x̄1 = 11.2, x̄2 = 14.16, S1² = 1.075, S2² = 7.293
• t calculated = 2.287, t table value (df = 8) = 2.306
• t calculated < t table value; accept H0.
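The pooled two-sample statistic for the Hb data can be computed with the standard library; `statistics.variance` gives the sample variances (dividing by n − 1), matching the hand calculation.

```python
import math
import statistics

male = [10, 11, 12.5, 10.5, 12]
female = [10, 17.5, 14.2, 15, 14.1]

n1, n2 = len(male), len(female)
v1 = statistics.variance(male)     # S1^2 = 1.075
v2 = statistics.variance(female)   # S2^2 = 7.293

pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t = abs(statistics.mean(male) - statistics.mean(female)) \
    / math.sqrt(pooled * (1 / n1 + 1 / n2))
print(round(t, 3))   # 2.288 (the slide rounds to 2.287)
```

With t table value 2.306 at df = 8, the computed statistic falls just short of significance.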
The paired t-test

• To test if the population means estimated by two dependent samples differ significantly.
• A usual setting for a paired t-test is when measurements are made on the same subjects before and after a treatment.

t = d̄ / (Sd / √n)

where d̄ is the mean difference and Sd denotes the standard deviation of the differences.
Example

Systolic BP of 5 patients before and after a drug therapy is
Before: 160, 150, 170, 130, 140
After: 140, 110, 120, 140, 130
Test whether there is any significant difference between the BP levels.

• H0: There is no significant difference between the BP levels before and after the drug.
• H1: There is a significant difference between the BP levels before and after the drug.

Before | After | d   | d − d̄ | (d − d̄)²
160    | 140   | 20  | −2    | 4
150    | 110   | 40  | 18    | 324
170    | 120   | 50  | 28    | 784
130    | 140   | −10 | −32   | 1024
140    | 130   | 10  | −12   | 144
Σ      |       | 110 |       | 2280

• d̄ = 22, Sd = 23.875
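Finishing the paired example in code (the slide stops at d̄ and Sd): with df = 4 the two-tailed 5% critical value is 2.776, so the statistic below would not reach significance.

```python
import math
import statistics

before = [160, 150, 170, 130, 140]
after = [140, 110, 120, 140, 130]

d = [b - a for b, a in zip(before, after)]   # differences: 20, 40, 50, -10, 10
d_bar = statistics.mean(d)                   # 22
s_d = statistics.stdev(d)                    # about 23.875

t = d_bar / (s_d / math.sqrt(len(d)))
print(round(t, 2))   # 2.06
```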
Z test

Generally, z-tests are used when we have large sample sizes (n > 30), whereas t-tests are most helpful with a smaller sample size (n < 30).

Both methods assume a normal distribution of the data, but z-tests are most useful when the standard deviation is known.

z = (x̄ − μ) / (σ / √n)
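A quick sketch of the z formula, with invented numbers (a sample of 49 with mean 52, tested against a population mean of 50 with known σ = 7):

```python
import math

# Hypothetical numbers for illustration only.
x_bar, mu, sigma, n = 52, 50, 7, 49

z = (x_bar - mu) / (sigma / math.sqrt(n))
print(z)              # 2.0
print(abs(z) > 1.96)  # True -> significant at the 5% level
```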
ANALYSIS OF VARIANCE (ANOVA)

• R. A. Fisher.
• The Student's t-test cannot be used for comparison of three or more groups.
• The purpose of ANOVA is to test if there is any significant difference between the means of two or more groups.
• The analysis of variance is the systematic algebraic procedure of decomposing the overall variation in the responses observed in an experiment into components of variation.
• Two variances: (a) between-group variability and (b) within-group variability, that is, variation existing between the samples and variation existing within the samples.
• The within-group variability (error variance) is the variation that cannot be accounted for in the study design.
• The between-group variability (or effect variance) is the result of the treatment.

• A simplified formula for the F statistic is F = MST / MSE, where MST is the mean squares between the groups and MSE is the mean squares within groups.
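The F = MST/MSE decomposition can be sketched for a small invented data set (three hypothetical groups of three observations, chosen only to keep the arithmetic visible):

```python
import statistics

# Hypothetical data: three groups of three observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

k = len(groups)
n = sum(len(g) for g in groups)
grand_mean = statistics.mean(x for g in groups for x in g)

# Between-group (effect) and within-group (error) sums of squares:
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

mst = ss_between / (k - 1)   # mean squares between groups
mse = ss_within / (n - k)    # mean squares within groups
f = mst / mse
print(round(f, 1))           # 13.0
```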
NONPARAMETRIC STATISTICAL ANALYSIS

 CHI-SQUARE TEST
 THE WILCOXON'S SIGNED RANK TEST
 MANN-WHITNEY U TEST
 KRUSKAL-WALLIS TEST
CHI-SQUARE TEST

• Used to analyse categorical data.

• The chi-square test is a widely used test in statistical decision making.

• The test was first used by Karl Pearson in 1900.

• The chi-square test compares the frequencies and tests whether the observed data differ significantly from the expected data.

It is calculated as the sum of the squared differences between the observed (O) and the expected (E) data (or the deviation, d), divided by the expected data:

χ² = Σ [(O − E)² / E]
Example

• Attack rates among those vaccinated and not vaccinated against measles are given in the following table. Test the association between vaccination and attack of measles.

Groups         | Attacked | Not attacked
Vaccinated     | 10       | 90
Not vaccinated | 26       | 74

• H0: There is no significant association between vaccination and attack of measles.
• H1: There is a significant association between vaccination and attack of measles.

Oi | Ei | Oi − Ei | (Oi − Ei)² | (Oi − Ei)²/Ei
10 | 18 | −8      | 64         | 3.556
90 | 82 | 8       | 64         | 0.780
26 | 18 | 8       | 64         | 3.556
74 | 82 | −8      | 64         | 0.780
Σ = 8.672

• Chi-square table value = 3.841, chi-square calculated value = 8.672
• χ² calculated > χ² table value; reject H0.
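The expected counts and the χ² statistic for the 2×2 table above can be computed directly from the row and column totals:

```python
# Observed counts from the slide's table (rows: vaccinated / not vaccinated).
observed = [[10, 90], [26, 74]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total   # expected count
        chi_sq += (o - e) ** 2 / e

print(round(chi_sq, 3))   # 8.672
```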
THE WILCOXON'S SIGNED RANK TEST:

• Wilcoxon's rank sum test ranks all data points in order, calculates the rank sum of each sample and compares the difference in the rank sums.

• For testing whether the differences observed in the values of the quantitative variable between two correlated samples (before-and-after design) are statistically different or not.

• This test corresponds to the paired t test.
Method:

• H0: There is no difference in the paired values, on average, between the two groups.
• H1: There is a difference in the paired values, on average, between the two groups.
• Compute the difference between each pair of values in the two groups.
• Rank the differences from the smallest, without considering the sign of the difference.
• After giving ranks, the corresponding sign should be attached.
• T+ (sum of ranks with positive sign) and T− (sum of ranks with negative sign) are computed. Wstat is taken as the smaller of T+ and T−.
• Find the W critical value from the Wilcoxon signed-rank table.
• If Wstat < W critical value, reject H0.
EXAMPLE:

• IQ values of 8 malnourished children of 4 years of age, before and after giving a nutritious diet for 3 months, are given below.

Before: 40 60 55 65 43 70 80 60
After: 50 80 50 70 40 60 90 85

• H0: There is no difference in the paired values.
• H1: There is a difference in the paired values.

Before              | 40  | 60  | 55  | 65  | 43 | 70 | 80  | 60
After               | 50  | 80  | 50  | 70  | 40 | 60 | 90  | 85
Difference          | −10 | −20 | 5   | −5  | 3  | 10 | −10 | −15
Absolute difference | 10  | 20  | 5   | 5   | 3  | 10 | 10  | 15
Rank                | 5   | 8   | 2.5 | 2.5 | 1  | 5  | 5   | 7

• T+ = 8.5, T− = 27.5, T = 8.5
• Wstat = 8.5, Wcritical = 3
• Wstat > W critical value; accept H0.
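The signed-rank sums for this example can be computed in a few lines; tied absolute differences receive their average rank, as in the table above.

```python
before = [40, 60, 55, 65, 43, 70, 80, 60]
after = [50, 80, 50, 70, 40, 60, 90, 85]

diffs = [b - a for b, a in zip(before, after) if b != a]  # drop zero differences

# Rank absolute differences, giving tied values their average rank.
abs_sorted = sorted(abs(d) for d in diffs)

def avg_rank(value):
    positions = [i + 1 for i, v in enumerate(abs_sorted) if v == value]
    return sum(positions) / len(positions)

t_plus = sum(avg_rank(abs(d)) for d in diffs if d > 0)
t_minus = sum(avg_rank(abs(d)) for d in diffs if d < 0)
w_stat = min(t_plus, t_minus)
print(t_plus, t_minus, w_stat)   # 8.5 27.5 8.5
```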
• Assuming a normal distribution for the differences, the test statistic is

Z = (|T − m| − 0.5) / SD

where T = smaller of T+ and T−, m = mean sum of ranks = n(n + 1)/4 and

SD = √[ n(n + 1)(2n + 1) / 24 ]

• If Z is less than 1.96, H0 is accepted, and if Z > 1.96, H0 is rejected.
MANN-WHITNEY U TEST

• For testing whether two independent samples with respect to a quantitative variable come from the same population or not.
• Also known as the Wilcoxon rank-sum test.
• It is used to test the null hypothesis that two samples have the same median or, alternatively, whether observations in one sample tend to be larger than observations in the other.
• This test is an alternative to the t test for two independent samples.
Method of calculation:

• H0: The average values in the two groups are the same.
• H1: The average values in the two groups are different.
• Let n1 be the sample size of one group and n2 the sample size of the second group. Rank all the values in the two groups taken together. Tied values should be given the same (average) rank.
• The rank sum of each group is taken and Ustat is calculated using Ustat = Rank Sum − n(n + 1)/2.
• Both U1 and U2 are calculated and the smaller value is taken as Ustat. The U critical value is obtained from the Mann-Whitney U test table.
• If Ustat < U critical value, reject H0.
Example

Treatment A: 3, 4, 2, 6, 2, 5
Treatment B: 9, 7, 5, 10, 6, 8

• H0: The average values in the 2 treatments are the same.
• H1: The average values in the 2 treatments are different.

Ustat = Rank Sum − n(n + 1)/2

Values | 2   | 2   | 3 | 4 | 5   | 5   | 6   | 6   | 7 | 8  | 9  | 10
Rank   | 1.5 | 1.5 | 3 | 4 | 5.5 | 5.5 | 7.5 | 7.5 | 9 | 10 | 11 | 12

• UA = 23 − 21 = 2, UB = 55 − 21 = 34, so Ustat = 2 (the lowest value)
• Ucritical = 5
• Ustat < U critical value; reject H0.
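The rank sums and U statistics for the two treatments can be verified with a short sketch, again using average ranks for ties:

```python
a = [3, 4, 2, 6, 2, 5]
b = [9, 7, 5, 10, 6, 8]

pooled = sorted(a + b)

def avg_rank(value):
    positions = [i + 1 for i, v in enumerate(pooled) if v == value]
    return sum(positions) / len(positions)

rank_sum_a = sum(avg_rank(v) for v in a)   # 23.0
rank_sum_b = sum(avg_rank(v) for v in b)   # 55.0

n_a, n_b = len(a), len(b)
u_a = rank_sum_a - n_a * (n_a + 1) / 2     # 2.0
u_b = rank_sum_b - n_b * (n_b + 1) / 2     # 34.0
u_stat = min(u_a, u_b)
print(u_stat)   # 2.0
```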
• Assuming that the ranks are randomly distributed in the two groups, the test statistic is

Z = (|m − T| − 0.5) / SD

where T = smaller of T1 and T2,
T1 = sum of the ranks of the smaller group, T2 = [(n1 + n2)(n1 + n2 + 1) / 2] − T1,
m = mean sum of ranks = n1(n1 + n2 + 1)/2 and

SD = √[ n1 × n2 × (n1 + n2 + 1) / 12 ]

• If Z is less than 1.96, H0 is accepted.
• If Z > 1.96, H0 is rejected at the 5% level of significance.


KRUSKAL-WALLIS TEST

• The Kruskal-Wallis test is a non-parametric test to analyse the variance.

• It is for the comparison among several independent samples.

• For testing whether several independent samples of a quantitative variable come from the same population or not.

• It corresponds to one-way analysis of variance in parametric methods.

• It analyses if there is any difference in the median values of three or more independent samples.
• The data values are ranked in increasing order, and the rank sums are calculated, followed by calculation of the test statistic:

H = [12 / (n(n + 1))] × Σ (Ri² / ni) − 3(n + 1)

where n is the total of the sample sizes in all the groups, ni is the size of the ith group and Ri is the sum of the ranks in the ith group.
Method:

H0: The average values in the different groups are the same.
H1: The average values in the different groups are different.
• Rank all the values, taking all the groups together.
• The chi-square table is used to get the table value at the 5% level of significance.
• If Hstat > the table value, reject H0.
Example

Sample 1: 8, 10, 9, 12, 11, 13
Sample 2: 10, 9, 13, 14, 9, 16
Sample 3: 13, 8, 9, 13, 17, 15

• H0: The average values in the three groups are the same.
• H1: The average values in the three groups are different.

Sorted values: 8, 8, 9, 9, 9, 9, 10, 10, 11, 12, 13, 13, 13, 13, 14, 15, 16, 17
Tied ranks: 1.5, 1.5, 4.5, 4.5, 4.5, 4.5, 7.5, 7.5, 9, 10, 12.5, 12.5, 12.5, 12.5, 15, 16, 17, 18

Sample 1 ranks: 1.5, 7.5, 4.5, 10, 9, 12.5; R1 = 45
Sample 2 ranks: 7.5, 4.5, 12.5, 15, 4.5, 17; R2 = 61
Sample 3 ranks: 12.5, 1.5, 4.5, 12.5, 18, 16; R3 = 65

• H = [12 / (18 × 19)] × [(45²/6) + (61²/6) + (65²/6)] − 3 × 19 = 58.31 − 57 = 1.31
• H calculated = 1.31, χ² table value (df = 2, α = 0.05) = 5.99
• H < χ² table value; accept H0.
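A sketch of the Kruskal-Wallis computation for the three samples, with tied values given their average rank. This version omits the tie correction, which would scale H up slightly; library routines such as SciPy's `kruskal` apply it.

```python
samples = [
    [8, 10, 9, 12, 11, 13],
    [10, 9, 13, 14, 9, 16],
    [13, 8, 9, 13, 17, 15],
]

pooled = sorted(v for s in samples for v in s)

def avg_rank(value):
    positions = [i + 1 for i, v in enumerate(pooled) if v == value]
    return sum(positions) / len(positions)

n = len(pooled)
rank_sums = [sum(avg_rank(v) for v in s) for s in samples]   # [45.0, 61.0, 65.0]

h = 12 / (n * (n + 1)) * sum(r ** 2 / len(s)
                             for r, s in zip(rank_sums, samples)) - 3 * (n + 1)
print(round(h, 2))   # about 1.31, well below the df = 2 table value of 5.99
```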
WAYS TO RULE OUT ALTERNATIVE EXPLANATIONS FOR OUTCOMES BY USING STATISTICAL ANALYSIS

• Testing the null hypothesis.
• Determining the probability of Type I and Type II error.
• Calculating and reporting tests of effect size.
• Ensuring data meet the fundamental assumptions of the statistical test.

TESTING NULL HYPOTHESIS
• When attempting to determine if an outcome is related to a cause, it is necessary to know if the outcomes or results could have occurred by chance alone.

• This cannot be done with certainty, but researchers can determine the probability that the hypothesis is true.

• Accepting a null hypothesis is a statement that there are no differences in the outcomes based on the intervention or observation (that is, there is no cause-and-effect relationship).

• Using a null hypothesis enables the researcher to quantify and report the probability that the outcome was due to random error.
DETERMINING THE PROBABILITY OF TYPE I AND TYPE II ERROR
• Before accepting the results as evidence for practice, however, the probability that an error was made should be evaluated.

• This, coupled with the results of the hypothesis test, enables the researcher to quantify the role of error in the outcome.

• The relationship between Type I and Type II error is paradoxical: as one is controlled, the risk of the other increases.

• Both types of error should be avoided.
CALCULATING AND REPORTING TESTS OF EFFECT SIZE
• Effect size refers to how much impact the intervention or variable is expected to have on the outcome.

• Large effect sizes enhance confidence in the findings. When a treatment exerts a dramatic effect, the validity of the findings is not called into question.

• On the other hand, when effect sizes are very small, the potential for effects from extraneous variables is more likely and the results may have less credibility.
Ensuring data meet the fundamental assumptions of the statistical test
• Data analysis is based on many assumptions about the nature of the data, the statistical procedures that are used to conduct the analysis and the match between the data and the procedure.

• If an assumption is violated, the result can be an inaccurate estimate of the real relationship. Inaccurate conclusions lead to an error, which in turn affects the validity of a study.
RESOURCES FOR STATISTICAL ANALYSIS PROGRAMS

• Packaged computer programs can perform the data analysis and provide the results of the analysis on a computer printout.
• SPSS, SAS and Biomedical Data Processing (BMDP).
• If the analyses selected are inappropriate for the data, the computer program is often unable to detect that error and proceeds to perform the analysis.

e-Statistical Tool (open source)
Pitfalls of Statistics:

• Statistics can be used, intentionally or unintentionally, to reach faulty conclusions. Misleading information is unfortunately the norm in advertising. Drug companies, for example, are well known to indulge in misleading information.
• Data dredging
• Survey questions

It is therefore important to understand not just the numbers but the meaning behind the numbers. Statistics is a tool, not a substitute for in-depth reasoning and analysis.
Charts and Diagrams

- Charts and diagrams are useful methods of presenting simple data.
- They have a powerful impact on the imagination of people and give information at a glance.
- Diagrams are better retained in memory than statistical tables.
- However, graphs cannot be substituted for statistical tables, because graphs cannot be given mathematical treatment whereas tables can.
- Whenever graphs are compared, any difference in scale should be noted.
Common diagrams

• Pie chart
• Simple bar diagram
• Multiple bar diagram
• Component bar diagram or subdivided bar diagram
• Histogram
• Frequency polygon
• Frequency curve
• Ogive curve
• Scatter diagram
• Line diagram
• Pictogram
• Statistical maps
Bar charts

[Bar chart: year-wise enrollment of students in a Government school, classes One to Nine, with counts ranging from 70 to 300 students.]
Multiple Bar Charts
• Also called compound bar charts.
• More than one sub-attribute of a variable can be expressed.

[Multiple bar chart: percentage of world total population and land area for Asia, Europe, Africa, Latin America, USSR, North America and Oceania.]
Component bar charts

• When there are many categories on the X-axis (more than 5) and they have further subcategories, then to accommodate the categories, the bars may be divided into parts, each part representing a certain item and proportional to the magnitude of that particular item.

[Component bar chart: India, growth of population in millions across the census decades 1901-2001.]
Cont…

[Component bar chart: male and female components for Pakistan, USA and Sweden.]
Histogram

• Used for quantitative, continuous variables.
• It is used to present variables which have no gaps, e.g. age, weight, height, blood pressure, blood sugar etc.
• It consists of a series of blocks. The class intervals are given along the horizontal axis and the frequency along the vertical axis.

[Histogram: grouped frequency distribution of serum cholesterol levels (mg/dl) in 200 men, class intervals 161-170 through 251-260.]
Frequency polygon

- A frequency polygon is an area diagram of a frequency distribution drawn over a histogram.
- It is a linear representation of a frequency table and histogram, obtained by joining the mid points of the histogram blocks.
- Frequency is plotted at the central point of a group.

[Frequency polygon: percentage total frequency over class intervals 59-69 through 129-139.]
Line diagram

Line diagrams are used to show the trend of events with the passage of time.

[Line diagram: malaria cases reported throughout the world, excluding the African region, during 1972-78.]
Pie charts

• Most common way of presenting data.

• The value of each category is divided by the total of all values and then multiplied by 360, and each category is allocated the respective angle to represent its proportion.

• It is often necessary to indicate percentages in the segments, as it may not always be easy, visually, to compare the areas of segments.
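The angle calculation described above is a one-liner per category. This sketch uses the world-population split from the following slide:

```python
# Pie-chart angles: each category's value divided by the total, times 360.
world = {"Developing Countries": 74, "Developed Countries": 26}

total = sum(world.values())
angles = {name: value * 360 / total for name, value in world.items()}
print(angles)   # Developing Countries -> 266.4 degrees, Developed -> 93.6
```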
Cont…

[Pie chart: world population, Developing Countries 74%, Developed Countries 26%.]
Pictogram

• Popular method of presenting data to those who cannot understand orthodox charts.
• Small pictures or symbols are used to present the data, e.g. a picture of a doctor to represent the physician population.
• A fraction of the picture can be used to represent numbers smaller than the value of the whole symbol.
Statistical maps

• When statistical data refer to geographic or administrative areas, they are presented either as a statistical map or a dot map.

• Shaded maps are used to present data of varying size. The areas are shaded with different colours, or different intensities of the same colour, which is indicated in the key.
Scatter diagram

• The scatter diagram graphs pairs of numerical data, with one variable on each axis, to look for a relationship between them. If the variables are correlated, the points will fall along a line or curve.

• The scatter diagram is used to find the correlation between two variables.

• This diagram shows you how closely the two variables are related.

• After determining the correlation between the variables, you can easily predict the behavior of the other variable.

• This chart is very useful when one variable is easy to measure and the other is not.
When to Use a Scatter Diagram?

 When you have paired numerical data.
 When your dependent variable may have multiple values for each value of your independent variable.
 When trying to determine whether the two variables are related, such as:
• When trying to identify potential root causes of problems.
• When testing for autocorrelation before constructing a control chart.

Limitations of a Scatter Diagram

The following are a few limitations of a scatter diagram:

• A scatter diagram is unable to give you the exact extent of correlation.
• A scatter diagram does not show you the quantitative measure of the relationship between the variables; it only gives a qualitative expression of how they change together.
• This chart does not show you the relationship for more than two variables.
Advantages of Scatter Diagram:

The following are a few advantages of a scatter diagram:

• It shows the relationship between two variables.
• It is the best method to show a non-linear pattern.
• The range of data flow, i.e. the maximum and minimum values, can be easily determined.
• Observation and reading are straightforward.
• Plotting the diagram is relatively simple.
Example:

By plotting two variables, test cases and employees by designation, in a scatter diagram, it is easy to determine the best productivity of employees.

Test Cases | Employees
90 | Manager
60 | Test lead
35 | Senior Test Engineer
40 | Junior Test Engineer
20 | Trainee

[Scatter diagram: number of test cases (20-100) against employee designation (Trainee, JTE, STE, TL, Manager).]
Summary & Conclusion:

● Data analysis is a process of inspecting, cleansing, transforming & modeling data with the goal of discovering useful information, informing conclusions & supporting decision-making.

● It includes data gathering, cleaning, analysis, interpretation & visualization.
Thanks