
Eudoxia Research University USA

Eudoxia Research Center India

International Webinar

Quantitative Data Coding & Analysis

Dr Rajesh G Konnur
Director / Pro V C
ERC / ERU
In this webinar:
- Concept & definition of Data Coding
- Process of Data Analysis
- Parametric and Non-parametric Tests
- Tables & Graphs
- Summary & Conclusion
Quantitative Data Coding:

● Quantitative data coding refers to the process of assigning numerical values or codes to the information collected in a research study for further analysis.

● Lockyer, Sharon: "a systematic way in which to condense extensive data sets into smaller analyzable units through the creation of categories & concepts derived from the data."
Coding Process:
1. Variable Identification: Before coding, identify the variables you want to study. These are the characteristics or attributes that will be measured or observed.

2. Codebook Development: A codebook is created, which is essentially a guide that provides definitions and instructions for assigning numerical codes to different values or categories within each variable. It helps maintain consistency and standardization in coding.

3. Numeric Assignment: Numerical codes are then assigned to the responses or observations based on the established codebook. These codes represent the different categories, levels, or values of the variables.
Cont…
4. Data Entry: The coded data is entered into a database or statistical software for analysis. This could involve manual data entry or the use of automated tools.

5. Data Cleaning: After entering the data, researchers often conduct data cleaning to identify and rectify errors or inconsistencies. This step is crucial for ensuring the accuracy and reliability of the data.

6. Statistical Analysis: Once the data is coded and cleaned, researchers can perform statistical analyses to identify patterns, relationships, and trends within the data set. Common statistical methods include descriptive statistics, inferential statistics, regression analysis, and more.
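The codebook and numeric-assignment steps can be sketched in a few lines of Python. This is a minimal illustration only; the survey variables and code values below are hypothetical, not taken from the webinar.

```python
# A codebook mapping each variable's categories to numeric codes
# (hypothetical variables and codes, for illustration only).
codebook = {
    "gender": {"male": 1, "female": 2},
    "satisfaction": {"very dissatisfied": 1, "dissatisfied": 2,
                     "neutral": 3, "satisfied": 4, "very satisfied": 5},
}

def code_response(response):
    """Numeric assignment: look up each answer in the codebook."""
    return {var: codebook[var][value] for var, value in response.items()}

raw = {"gender": "female", "satisfaction": "satisfied"}
coded = code_response(raw)
print(coded)  # {'gender': 2, 'satisfaction': 4}
```

In practice the codebook would be documented alongside the dataset, so every coder assigns the same numbers to the same categories.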
Data Types:
- In quantitative research, data types refer to the different kinds of information that researchers collect and analyze. Quantitative data are generally numeric and can be subjected to statistical analysis.

1. Continuous Data:

• Continuous data can take any value within a given range and can be measured with great precision.

• Examples: Age, height, weight, temperature, income, and time are typical examples of continuous data.

• Measurement: Continuous data is often measured on a scale and can be subdivided into smaller units. For example, age can be measured in years, months, days, etc.
2. Discrete Data:
• Discrete data are countable and take on distinct, separate values.

• Examples: The number of students in a class, the number of cars in a parking lot, the number of books on a shelf, and the number of goals scored in a game are examples of discrete data.

• Measurement: Discrete data are typically whole numbers and cannot be subdivided further. For instance, you cannot have half a student or three-quarters of a car.
3. Nominal Data:
• Nominal data represent categories or labels without any inherent order or ranking.

• Examples: Gender (male, female), eye color, ethnicity, and types of cars are examples of nominal data.

• Measurement: Nominal data are often used for classification purposes, and the categories have no numerical significance.

4. Ordinal Data:

• Ordinal data have categories with a meaningful order or ranking, but the intervals between the categories are not consistent or measurable.

• Examples: Educational levels (e.g., high school, college, graduate school), socioeconomic status, and customer satisfaction ratings (e.g., very dissatisfied, dissatisfied, neutral, satisfied, very satisfied) are examples of ordinal data.

• Measurement: The order matters, but the differences between the categories are not standardized or quantifiable.
Interval Measurement:

In interval measurement, the intervals between consecutive values are equal and meaningful, but there is no true zero point.

• Characteristics:
• The zero point is arbitrary and does not represent the absence of the measured quantity.
• Differences between values are meaningful and can be compared.
• Ratios between values are not meaningful.

• Ex. Temperature measured in Celsius or Fahrenheit is an interval scale. The difference between 20°C and 30°C is the same as the difference between 30°C and 40°C, but a temperature of 0°C does not mean the absence of heat.
Ratio Measurement:
• Ratio measurement has all the characteristics of interval measurement, but it also has a true zero point, which represents the absence of the measured quantity.

• Characteristics:
• Zero represents a complete absence of the measured attribute.
• Equal intervals and meaningful differences, as in interval measurement.
• Meaningful ratios between values. A value of 10 is twice as much as a value of 5.

• Ex: Height, weight, age, and income are often measured on a ratio scale. For example, a person with a height of 180 cm is twice as tall as a person with a height of 90 cm.
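The interval-versus-ratio distinction can be shown numerically. A small sketch, reusing the temperature and height examples above (the Kelvin conversion is added here to illustrate why Celsius ratios are not meaningful):

```python
# Interval scale: Celsius has an arbitrary zero, so ratios are meaningless.
c1, c2 = 20.0, 40.0
naive_ratio = c2 / c1          # 2.0, but 40 C is not "twice as hot" as 20 C

# Converting to Kelvin (a ratio scale with a true zero) shows why:
k1, k2 = c1 + 273.15, c2 + 273.15
true_ratio = k2 / k1           # about 1.068, not 2

# Ratio scale: height has a true zero, so ratios are meaningful.
height_ratio = 180 / 90        # 2.0: 180 cm really is twice 90 cm
print(naive_ratio, round(true_ratio, 3), height_ratio)
```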
Data Analysis

Introduction:
• Data analysis is the process of cleaning, changing & processing raw data & extracting actionable, relevant information that helps researchers make informed decisions.
• Statistical analysis is the organisation and analysis of quantitative or qualitative data using statistical procedures, including both descriptive and inferential statistics.
• It's the science of collecting, exploring and presenting large amounts of data to discover underlying patterns and trends.
Types of Data Analysis:
1) Descriptive Analysis:
- Summarizes the data for a better understanding of it
- Used to describe the variables

2) Exploratory Analysis:
- Used to discover patterns & relationships in data
- To identify correlations or create predictive models

3) Predictive Analysis:
- Used to make predictions about future outcomes

4) Prescriptive Analysis:
- Used to suggest the best courses of action
Data Analysis Process:
 Data Requirement Gathering: Decide what type of data you want to use & what data you plan to analyze.
 Data Collection: Collecting data from varied sources, e.g. case studies, surveys, interviews, questionnaires, observation.
 Data Cleaning: Not all of the data you collect will be useful, so it's time to clean it up. Remove white spaces, duplicate records & basic errors.
 Data Analysis: Use of software to interpret & understand the data & arrive at conclusions, e.g. Excel, Python, R, Looker, RapidMiner, Chartio, Metabase, Redash & Microsoft Power BI.
 Data Interpretation: After obtaining results, interpret them & come up with the best courses of action based on the findings.
 Data Visualization: Graphically showing information by use of charts, graphs, maps, bullet points or a host of other methods. It derives valuable insights by helping to compare datasets & observe relationships.
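The cleaning step above can be illustrated with a short sketch. The records are invented for the example; the same idea (strip whitespace, drop blanks, drop duplicates) scales up in tools such as Excel, Python, or R.

```python
# Hypothetical raw survey answers with stray whitespace, duplicates, and a blank.
raw_records = ["  yes", "no ", "yes", "no", "yes", ""]

cleaned = []
seen = set()
for record in raw_records:
    value = record.strip()    # remove white spaces
    if not value:             # drop empty / basic-error entries
        continue
    if value in seen:         # remove duplicate records
        continue
    seen.add(value)
    cleaned.append(value)

print(cleaned)  # ['yes', 'no']
```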
Elements of Statistical Analysis:

 Understand the complex relationship among the correlates of the disease under study.
 The analysis should start with a simple comparison of proportions and means.
 Interpretation of results should be guided by clinical and biological considerations.
Important items to consider in choosing a particular analysis
 The problem or the specific objective:
- If the problem requires the data to be summarized and described: Descriptive Statistics
- If the problem requires an inference to be made: Inferential Statistics
- If the problem requires data to be classified or a pattern determined: Exploratory Statistics
STATISTICAL MEASURES

 Mean
 Mode
 Median
 Interquartile Range
 Standard Deviation
MEAN

• The mean is the average of all numbers: x̄ = Σx / n

Example of Mean:

• Mean of 10, 20, 30, 40 = (10 + 20 + 30 + 40) / 4 = 25
MEDIAN
• When all the observations are arranged in ascending or descending order of magnitude, the middle one is the median.
• For raw data, if n is the total number of observations, the value of the [(n + 1)/2]th item is the median.
• If n is an even number, the mean of the (n/2)th item and the (n/2 + 1)th item is the median.

Example: Median of the data 10, 20, 30 is 20
MODE
• The mode is the value of a series which appears more frequently than any other.
• For grouped data:

Mode, M0 = L0 + [Δ1 / (Δ1 + Δ2)] × c

where L0 is the lower limit of the modal class,
c is the class interval,
Δ1 is the difference between the modal frequency and the frequency of the preceding class,
Δ2 is the difference between the modal frequency and the frequency of the following class.

Example: The mode of the data 80, 90, 86, 80, 72, 80, 96 is 80
INTERQUARTILE RANGE

• The interquartile range (IQR) is a measure of statistical dispersion, equal to the difference between the 75th and 25th percentiles, or between the upper and lower quartiles.

• IQR = Q3 − Q1
Example

Interquartile range of the data 30, 20, 40, 60, 50 (sorted: 20, 30, 40, 50, 60):

• Q1 = [(n + 1)/4]th item = 1.5th item = 20 + 0.5 × (30 − 20) = 25
• Q3 = [3(n + 1)/4]th item = 4.5th item = 50 + 0.5 × (60 − 50) = 55
• IQR = 55 − 25 = 30
STANDARD DEVIATION

• Used to measure the amount of variation or dispersion in a set of values. It is a fundamental tool for understanding the extent to which individual data points in a data set differ from the mean (average) of the data.

• The standard deviation is defined as the positive square root of the arithmetic mean of the squared deviations of the given observations from their arithmetic mean.

• The standard deviation is denoted by 'σ'. For a sample of n observations, σ = √[ Σ(x − x̄)² / (n − 1) ], as used in the example below.
EXAMPLE:

• Standard deviation of the data 10, 20, 30, 40, 50, where n = 5 and x̄ = 30:

• σ = √(1000/4) = √250 = 15.811
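The measures above can be checked with Python's standard library. This sketch reuses the numbers from the slides; note that `statistics.quantiles` with its default (exclusive) method matches the [(n + 1)/4] quartile rule used in the IQR example, and `statistics.stdev` divides by n − 1 as in the standard deviation example.

```python
import statistics

print(statistics.mean([10, 20, 30, 40]))              # 25
print(statistics.median([10, 20, 30]))                # 20
print(statistics.mode([80, 90, 86, 80, 72, 80, 96]))  # 80

# Quartiles of the IQR example data (default n=4, method='exclusive'):
q1, q2, q3 = statistics.quantiles([30, 20, 40, 60, 50])
print(q3 - q1)                                        # IQR = 55 - 25 = 30

sd = statistics.stdev([10, 20, 30, 40, 50])           # sample SD, divides by n - 1
print(round(sd, 3))                                   # 15.811
```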
STANDARD NORMAL DISTRIBUTION CURVE AND MEAN, MEDIAN, INTERQUARTILE RANGE AND STANDARD DEVIATION

TYPES

• PARAMETRIC STATISTICAL ANALYSIS
• NONPARAMETRIC STATISTICAL ANALYSIS
PARAMETRIC STATISTICAL ANALYSIS

• Most commonly used type of statistical analysis.

• This analysis is referred to as parametric statistical analysis because the findings are inferred to the parameters of a normally distributed population.

• Numerical data (quantitative variables) that are normally distributed are analysed with parametric tests.
ASSUMPTIONS:

• The assumption of normality, which specifies that the means of the sample groups are normally distributed.

• The assumption of equal variance, which specifies that the variances of the samples and of their corresponding populations are equal.

• The data can be treated as random samples.
NONPARAMETRIC STATISTICAL ANALYSIS

- Non-parametric analysis is used in statistical testing when certain assumptions of the parametric tests are not met, or when the data is not normally distributed.

Scenarios:
1. Data not normally distributed: When the data does not follow a normal distribution, non-parametric tests are more appropriate. Parametric tests, such as t-tests or ANOVA, assume normality, and violating this assumption can lead to inaccurate results.

2. Ordinal or ranked data: Non-parametric tests are suitable for analyzing data that can only be ranked, such as Likert scale data or data involving ratings. Parametric tests require interval or ratio data.
Cont…

3. Small sample size: When the sample size is small, non-parametric tests can be more robust. Parametric tests may not be reliable with small sample sizes.

4. Homogeneity of variance assumption violation: Non-parametric tests do not require the assumption of homogeneity of variance, which is required for some parametric tests.

5. Outliers: Non-parametric tests can be more robust to the presence of outliers compared to parametric tests.

6. Categorical data: Non-parametric tests are also used when the data is inherently categorical, such as in the case of frequency counts or proportions.
EXPLORATORY DATA ANALYSIS AND CONFIRMATORY DATA ANALYSIS
• John Tukey pioneered this distinction.

• Exploratory data analysis is used to obtain a preliminary indication of the nature of the data and to search the data for hidden structure or models.

• Confirmatory data analysis involves traditional inferential statistics, which you can use to make an inference about a population or a process based on evidence from the study sample.
STATISTICAL ANALYSIS DECISION MAKING

Two-group comparison:
- Mean: independent 2-sample t test (parametric); Mann-Whitney U test (non-parametric)
- Percentage: chi-square test

One-group comparison:
- Single mean: one-sample t test
- Mean difference: paired t test (parametric); Wilcoxon signed-rank test (non-parametric)

More-than-2-group comparison:
- Mean: ANOVA (parametric); Kruskal-Wallis test (non-parametric)
- Percentage: chi-square test
PARAMETRIC STATISTICAL ANALYSIS

 Student's t-test
 Z test
 Analysis of variance (ANOVA)
Student's t-test:

• Developed by Prof. W. S. Gosset.

• Student's t-test is used to test the null hypothesis that there is no difference between the means of the two groups.

• Variants: the one-sample t-test, the independent two-sample t-test (the unpaired t-test) and the paired t-test.


One-sample t-test

• To test if a sample mean (as an estimate of a population mean) differs significantly from a given population mean.
• The mean of one sample is compared with the population mean:

t = (x̄ − μ) / (S / √n)

where x̄ = sample mean, μ = population mean, S = standard deviation and n = sample size.
Example

A random sample of size 20 from a normal population gives a sample mean of 40 and a standard deviation of 6. Test the hypothesis that the population mean is 44, i.e. check whether there is any difference between the means.

• H0: There is no significant difference between the sample mean and the population mean.
• H1: There is a significant difference between the sample mean and the population mean.

mean = 40, μ = 44, n = 20 and S = 6

Cont….

• t calculated = |40 − 44| / (6/√20) = 2.981
• t table value (df = 19, α = 0.05) = 2.093
• t calculated > t table value; reject H0.
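The one-sample example above can be reproduced with a short standard-library sketch; the critical value is taken from a t table (df = 19, two-tailed, 5% level) rather than computed.

```python
import math

# One-sample t-test from summary statistics (the slide example):
x_bar, mu, s, n = 40, 44, 6, 20

t = (x_bar - mu) / (s / math.sqrt(n))
print(round(abs(t), 3))    # 2.981

t_crit = 2.093             # from a t table, df = 19, alpha = 0.05, two-tailed
print(abs(t) > t_crit)     # True -> reject H0
```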
Independent Two-Sample T Test (the unpaired t-test)

• To test if the population means estimated by two independent samples differ significantly.

• Two different samples with the same mean at the initial point are taken, and the means are compared at the end.

t = (x̄1 − x̄2) / √{ [((n1 − 1)S1² + (n2 − 1)S2²) / (n1 + n2 − 2)] × (1/n1 + 1/n2) }

where x̄1 − x̄2 is the difference between the means of the two groups and S denotes the standard deviation.
Example

Hb levels of 5 males are 10, 11, 12.5, 10.5, 12 and of 5 females are 10, 17.5, 14.2, 15 and 14.1. Test whether there is any significant difference between the Hb values.
• H0: There is no significant difference between the Hb levels.
• H1: There is a significant difference between the Hb levels.

X1   | X2   | X1 − x̄1 | X2 − x̄2 | (X1 − x̄1)² | (X2 − x̄2)²
10   | 10   | −1.2    | −4.16   | 1.44       | 17.306
11   | 17.5 | −0.2    | 3.34    | 0.04       | 11.156
12.5 | 14.2 | 1.3     | 0.04    | 1.69       | 0.002
10.5 | 15   | −0.7    | 0.84    | 0.49       | 0.706
12   | 14.1 | 0.8     | −0.06   | 0.64       | 0.004
Σ = 56 | 70.8 |       |         | 4.3        | 29.172

• x̄1 = 11.2, x̄2 = 14.16, S1² = 1.075, S2² = 7.293
• t calculated = 2.287, t table value (df = 8) = 2.306
• t calculated < t table value; accept H0.
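The pooled two-sample statistic for the Hb data can be computed with the standard library; `statistics.variance` gives the sample variances (dividing by n − 1), matching the hand calculation.

```python
import math
import statistics

male = [10, 11, 12.5, 10.5, 12]
female = [10, 17.5, 14.2, 15, 14.1]

n1, n2 = len(male), len(female)
v1 = statistics.variance(male)     # S1^2 = 1.075
v2 = statistics.variance(female)   # S2^2 = 7.293

pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t = abs(statistics.mean(male) - statistics.mean(female)) \
    / math.sqrt(pooled * (1 / n1 + 1 / n2))
print(round(t, 3))   # 2.288 (the slide rounds to 2.287)
```

With t table value 2.306 at df = 8, the computed statistic falls just short of significance.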
The paired t-test

• To test if the population means estimated by two dependent samples differ significantly.
• A usual setting for a paired t-test is when measurements are made on the same subjects before and after a treatment.

t = d̄ / (Sd / √n)

where d̄ is the mean difference and Sd denotes the standard deviation of the differences.
Example

Systolic BP of 5 patients before and after a drug therapy is
Before: 160, 150, 170, 130, 140
After: 140, 110, 120, 140, 130
Test whether there is any significant difference between the BP levels.

• H0: There is no significant difference between the BP levels before and after the drug.
• H1: There is a significant difference between the BP levels before and after the drug.

Before | After | d   | d − d̄ | (d − d̄)²
160    | 140   | 20  | −2    | 4
150    | 110   | 40  | 18    | 324
170    | 120   | 50  | 28    | 784
130    | 140   | −10 | −32   | 1024
140    | 130   | 10  | −12   | 144
Σ      |       | 110 |       | 2280

• d̄ = 22, Sd = 23.875
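Finishing the paired example in code (the slide stops at d̄ and Sd): with df = 4 the two-tailed 5% critical value is 2.776, so the statistic below would not reach significance.

```python
import math
import statistics

before = [160, 150, 170, 130, 140]
after = [140, 110, 120, 140, 130]

d = [b - a for b, a in zip(before, after)]   # differences: 20, 40, 50, -10, 10
d_bar = statistics.mean(d)                   # 22
s_d = statistics.stdev(d)                    # about 23.875

t = d_bar / (s_d / math.sqrt(len(d)))
print(round(t, 2))   # 2.06
```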
Z test

Generally, z-tests are used when we have large sample sizes (n > 30), whereas t-tests are most helpful with a smaller sample size (n < 30).

Both methods assume a normal distribution of the data, but z-tests are most useful when the standard deviation is known.

z = (x̄ − μ) / (σ / √n)
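A quick sketch of the z formula, with invented numbers (a sample of 49 with mean 52, tested against a population mean of 50 with known σ = 7):

```python
import math

# Hypothetical numbers for illustration only.
x_bar, mu, sigma, n = 52, 50, 7, 49

z = (x_bar - mu) / (sigma / math.sqrt(n))
print(z)              # 2.0
print(abs(z) > 1.96)  # True -> significant at the 5% level
```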
ANALYSIS OF VARIANCE (ANOVA)

• R. A. Fisher.
• The Student's t-test cannot be used for comparison of three or more groups.
• The purpose of ANOVA is to test if there is any significant difference between the means of two or more groups.
• The analysis of variance is the systematic algebraic procedure of decomposing the overall variation in the responses observed in an experiment into components of variation.
• Two variances: (a) between-group variability and (b) within-group variability, that is, variation existing between the samples and variation existing within the samples.
• The within-group variability (error variance) is the variation that cannot be accounted for in the study design.
• The between-group variability (or effect variance) is the result of the treatment.

• A simplified formula for the F statistic is F = MST / MSE, where MST is the mean squares between the groups and MSE is the mean squares within groups.
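The F = MST/MSE decomposition can be sketched for a small invented data set (three hypothetical groups of three observations, chosen only to keep the arithmetic visible):

```python
import statistics

# Hypothetical data: three groups of three observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

k = len(groups)
n = sum(len(g) for g in groups)
grand_mean = statistics.mean(x for g in groups for x in g)

# Between-group (effect) and within-group (error) sums of squares:
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

mst = ss_between / (k - 1)   # mean squares between groups
mse = ss_within / (n - k)    # mean squares within groups
f = mst / mse
print(round(f, 1))           # 13.0
```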
NONPARAMETRIC STATISTICAL ANALYSIS

 CHI-SQUARE TEST
 THE WILCOXON'S SIGNED RANK TEST
 MANN-WHITNEY U TEST
 KRUSKAL-WALLIS TEST
CHI-SQUARE TEST

• Used to analyse categorical data.

• The chi-square test is a widely used test in statistical decision making.

• The test was first used by Karl Pearson in 1900.

• The chi-square test compares the frequencies and tests whether the observed data differ significantly from the expected data.

It is calculated as the sum of the squared differences between the observed (O) and the expected (E) data (or the deviation, d), divided by the expected data:

χ² = Σ [(O − E)² / E]
Example

• Attack rates among those vaccinated and not vaccinated against measles are given in the following table. Test the association between vaccination and attack of measles.

Groups         | Attacked | Not attacked
Vaccinated     | 10       | 90
Not vaccinated | 26       | 74

• H0: There is no significant association between vaccination and attack of measles.
• H1: There is a significant association between vaccination and attack of measles.

Oi | Ei | Oi − Ei | (Oi − Ei)² | (Oi − Ei)²/Ei
10 | 18 | −8      | 64         | 3.556
90 | 82 | 8       | 64         | 0.780
26 | 18 | 8       | 64         | 3.556
74 | 82 | −8      | 64         | 0.780
Σ = 8.672

• Chi-square table value = 3.841, chi-square calculated value = 8.672
• χ² calculated > χ² table value; reject H0.
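The expected counts and the χ² statistic for the 2×2 table above can be computed directly from the row and column totals:

```python
# Observed counts from the slide's table (rows: vaccinated / not vaccinated).
observed = [[10, 90], [26, 74]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total   # expected count
        chi_sq += (o - e) ** 2 / e

print(round(chi_sq, 3))   # 8.672
```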
THE WILCOXON'S SIGNED RANK TEST:

• Wilcoxon's rank sum test ranks all data points in order, calculates the rank sum of each sample and compares the difference in the rank sums.

• For testing whether the differences observed in the values of the quantitative variable between two correlated samples (before-and-after design) are statistically different or not.

• This test corresponds to the paired t test.
Method:

• H0: There is no difference in the paired values, on average, between the two groups.
• H1: There is a difference in the paired values, on average, between the two groups.
• Compute the difference between each pair of values in the two groups.
• Rank the differences from the smallest, without considering the sign of the difference.
• After giving ranks, the corresponding sign should be attached.
• T+ (sum of ranks with positive sign) and T− (sum of ranks with negative sign) are computed. Wstat is taken as the smaller of T+ and T−.
• Find the W critical value from the Wilcoxon signed-rank table.
• If Wstat < W critical value, reject H0.
EXAMPLE:

• IQ values of 8 malnourished children of 4 years of age, before and after giving a nutritious diet for 3 months, are given below.

Before: 40 60 55 65 43 70 80 60
After: 50 80 50 70 40 60 90 85

• H0: There is no difference in the paired values.
• H1: There is a difference in the paired values.

Before              | 40  | 60  | 55  | 65  | 43 | 70 | 80  | 60
After               | 50  | 80  | 50  | 70  | 40 | 60 | 90  | 85
Difference          | −10 | −20 | 5   | −5  | 3  | 10 | −10 | −15
Absolute difference | 10  | 20  | 5   | 5   | 3  | 10 | 10  | 15
Rank                | 5   | 8   | 2.5 | 2.5 | 1  | 5  | 5   | 7

• T+ = 8.5, T− = 27.5, T = 8.5
• Wstat = 8.5, Wcritical = 3
• Wstat > W critical value; accept H0.
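The signed-rank sums for this example can be computed in a few lines; tied absolute differences receive their average rank, as in the table above.

```python
before = [40, 60, 55, 65, 43, 70, 80, 60]
after = [50, 80, 50, 70, 40, 60, 90, 85]

diffs = [b - a for b, a in zip(before, after) if b != a]  # drop zero differences

# Rank absolute differences, giving tied values their average rank.
abs_sorted = sorted(abs(d) for d in diffs)

def avg_rank(value):
    positions = [i + 1 for i, v in enumerate(abs_sorted) if v == value]
    return sum(positions) / len(positions)

t_plus = sum(avg_rank(abs(d)) for d in diffs if d > 0)
t_minus = sum(avg_rank(abs(d)) for d in diffs if d < 0)
w_stat = min(t_plus, t_minus)
print(t_plus, t_minus, w_stat)   # 8.5 27.5 8.5
```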
• Assuming a normal distribution for the differences, the test statistic is

Z = (|T − m| − 0.5) / SD

where T = smaller of T+ and T−, m = mean sum of ranks = n(n + 1)/4 and

SD = √[ n(n + 1)(2n + 1) / 24 ]

• If Z is less than 1.96, H0 is accepted, and if Z > 1.96, H0 is rejected.
MANN-WHITNEY U TEST

• For testing whether two independent samples with respect to a quantitative variable come from the same population or not.
• Also known as the Wilcoxon rank-sum test.
• It is used to test the null hypothesis that two samples have the same median or, alternatively, whether observations in one sample tend to be larger than observations in the other.
• This test is an alternative to the t test for two independent samples.
Method of calculation:

• H0: The average values in the two groups are the same.
• H1: The average values in the two groups are different.
• Let n1 be the sample size of one group and n2 the sample size of the second group. Rank all the values in the two groups taken together. Tied values should be given the same (average) rank.
• The rank sum of each group is taken and Ustat is calculated using Ustat = Rank Sum − n(n + 1)/2.
• Both U1 and U2 are calculated and the smaller value is taken as Ustat. The U critical value is obtained from the Mann-Whitney U test table.
• If Ustat < U critical value, reject H0.
Example

Treatment A: 3, 4, 2, 6, 2, 5
Treatment B: 9, 7, 5, 10, 6, 8

• H0: The average values in the 2 treatments are the same.
• H1: The average values in the 2 treatments are different.

Ustat = Rank Sum − n(n + 1)/2

Values | 2   | 2   | 3 | 4 | 5   | 5   | 6   | 6   | 7 | 8  | 9  | 10
Rank   | 1.5 | 1.5 | 3 | 4 | 5.5 | 5.5 | 7.5 | 7.5 | 9 | 10 | 11 | 12

• UA = 23 − 21 = 2, UB = 55 − 21 = 34, so Ustat = 2 (the lowest value)
• Ucritical = 5
• Ustat < U critical value; reject H0.
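The rank sums and U statistics for the two treatments can be verified with a short sketch, again using average ranks for ties:

```python
a = [3, 4, 2, 6, 2, 5]
b = [9, 7, 5, 10, 6, 8]

pooled = sorted(a + b)

def avg_rank(value):
    positions = [i + 1 for i, v in enumerate(pooled) if v == value]
    return sum(positions) / len(positions)

rank_sum_a = sum(avg_rank(v) for v in a)   # 23.0
rank_sum_b = sum(avg_rank(v) for v in b)   # 55.0

n_a, n_b = len(a), len(b)
u_a = rank_sum_a - n_a * (n_a + 1) / 2     # 2.0
u_b = rank_sum_b - n_b * (n_b + 1) / 2     # 34.0
u_stat = min(u_a, u_b)
print(u_stat)   # 2.0
```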
• Assuming that the ranks are randomly distributed in the two groups, the test statistic is

Z = (|m − T| − 0.5) / SD

where T = smaller of T1 and T2,
T1 = sum of the ranks of the smaller group, T2 = [(n1 + n2)(n1 + n2 + 1) / 2] − T1,
m = mean sum of ranks = n1(n1 + n2 + 1)/2 and

SD = √[ n1 × n2 × (n1 + n2 + 1) / 12 ]

• If Z is less than 1.96, H0 is accepted.
• If Z > 1.96, H0 is rejected at the 5% level of significance.


KRUSKAL-WALLIS TEST

• The Kruskal-Wallis test is a non-parametric test to analyse the variance.

• It is for the comparison among several independent samples.

• For testing whether several independent samples of a quantitative variable come from the same population or not.

• It corresponds to one-way analysis of variance in parametric methods.

• It analyses if there is any difference in the median values of three or more independent samples.
• The data values are ranked in increasing order, and the rank sums are calculated, followed by calculation of the test statistic:

H = [12 / (n(n + 1))] × Σ (Ri² / ni) − 3(n + 1)

where n is the total of the sample sizes in all the groups, ni is the size of the ith group and Ri is the sum of the ranks in the ith group.
Method:

H0: The average values in the different groups are the same.
H1: The average values in the different groups are different.
• Rank all the values, taking all the groups together.
• The chi-square table is used to get the table value at the 5% level of significance.
• If Hstat > the table value, reject H0.
Example

Sample 1: 8, 10, 9, 12, 11, 13
Sample 2: 10, 9, 13, 14, 9, 16
Sample 3: 13, 8, 9, 13, 17, 15

• H0: The average values in the three groups are the same.
• H1: The average values in the three groups are different.

Sorted values: 8, 8, 9, 9, 9, 9, 10, 10, 11, 12, 13, 13, 13, 13, 14, 15, 16, 17
Tied ranks: 1.5, 1.5, 4.5, 4.5, 4.5, 4.5, 7.5, 7.5, 9, 10, 12.5, 12.5, 12.5, 12.5, 15, 16, 17, 18

Sample 1 ranks: 1.5, 7.5, 4.5, 10, 9, 12.5; R1 = 45
Sample 2 ranks: 7.5, 4.5, 12.5, 15, 4.5, 17; R2 = 61
Sample 3 ranks: 12.5, 1.5, 4.5, 12.5, 18, 16; R3 = 65

• H = [12 / (18 × 19)] × [(45²/6) + (61²/6) + (65²/6)] − 3 × 19 = 58.31 − 57 = 1.31
• H calculated = 1.31, χ² table value (df = 2, α = 0.05) = 5.99
• H < χ² table value; accept H0.
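A sketch of the Kruskal-Wallis computation for the three samples, with tied values given their average rank. This version omits the tie correction, which would scale H up slightly; library routines such as SciPy's `kruskal` apply it.

```python
samples = [
    [8, 10, 9, 12, 11, 13],
    [10, 9, 13, 14, 9, 16],
    [13, 8, 9, 13, 17, 15],
]

pooled = sorted(v for s in samples for v in s)

def avg_rank(value):
    positions = [i + 1 for i, v in enumerate(pooled) if v == value]
    return sum(positions) / len(positions)

n = len(pooled)
rank_sums = [sum(avg_rank(v) for v in s) for s in samples]   # [45.0, 61.0, 65.0]

h = 12 / (n * (n + 1)) * sum(r ** 2 / len(s)
                             for r, s in zip(rank_sums, samples)) - 3 * (n + 1)
print(round(h, 2))   # about 1.31, well below the df = 2 table value of 5.99
```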
WAYS TO RULE OUT ALTERNATIVE EXPLANATIONS FOR OUTCOMES BY USING STATISTICAL ANALYSIS

• Testing the null hypothesis.
• Determining the probability of Type I and Type II error.
• Calculating and reporting tests of effect size.
• Ensuring data meet the fundamental assumptions of the statistical test.

TESTING NULL HYPOTHESIS
• When attempting to determine if an outcome is related to a cause, it is necessary to know if the outcomes or results could have occurred by chance alone.

• This cannot be done with certainty, but researchers can determine the probability that the hypothesis is true.

• Accepting a null hypothesis is a statement that there are no differences in the outcomes based on the intervention or observation (that is, there is no cause-and-effect relationship).

• Using a null hypothesis enables the researcher to quantify and report the probability that the outcome was due to random error.
DETERMINING THE PROBABILITY OF TYPE I AND TYPE II ERROR
• Before accepting the results as evidence for practice, however, the probability that an error was made should be evaluated.

• This, coupled with the results of the hypothesis test, enables the researcher to quantify the role of error in the outcome.

• The relationship between Type I and Type II error is paradoxical: as one is controlled, the risk of the other increases.

• Both types of error should be avoided.
CALCULATING AND REPORTING TESTS OF EFFECT SIZE
• Effect size refers to how much impact the intervention or variable is expected to have on the outcome.

• Large effect sizes enhance confidence in the findings. When a treatment exerts a dramatic effect, the validity of the findings is not called into question.

• On the other hand, when effect sizes are very small, the potential for effects from extraneous variables is more likely and the results may have less credibility.
Ensuring data meet the fundamental assumptions of the statistical test
• Data analysis is based on many assumptions about the nature of the data, the statistical procedures that are used to conduct the analysis and the match between the data and the procedure.

• If an assumption is violated, the result can be an inaccurate estimate of the real relationship. Inaccurate conclusions lead to an error, which in turn affects the validity of a study.
RESOURCES FOR STATISTICAL ANALYSIS PROGRAMS

• Packaged computer programs can perform the data analysis and provide the results of the analysis on a computer printout.
• SPSS, SAS and Biomedical Data Processing (BMDP).
• If the analyses selected are inappropriate for the data, the computer program is often unable to detect that error and proceeds to perform the analysis.

e-Statistical Tool (open source)
Pitfalls of Statistics:

• Statistics can be used, intentionally or unintentionally, to reach faulty conclusions. Misleading information is unfortunately the norm in advertising. Drug companies, for example, are well known to indulge in misleading information.
• Data dredging
• Survey questions

It is therefore important to understand not just the numbers but the meaning behind the numbers. Statistics is a tool, not a substitute for in-depth reasoning and analysis.
Charts and Diagrams

- Charts and diagrams are useful methods of presenting simple data.
- They have a powerful impact on the imagination of people and give information at a glance.
- Diagrams are better retained in memory than statistical tables.
- However, graphs cannot be substituted for statistical tables, because graphs cannot be given mathematical treatment whereas tables can.
- Whenever graphs are compared, any difference in scale should be noted.
Common diagrams

• Pie chart
• Simple bar diagram
• Multiple bar diagram
• Component bar diagram or subdivided bar diagram
• Histogram
• Frequency polygon
• Frequency curve
• Ogive curve
• Scatter diagram
• Line diagram
• Pictogram
• Statistical maps
Bar charts

[Bar chart: year-wise enrollment of students in a Government school, classes One to Nine, with counts ranging from 70 to 300 students.]
Multiple Bar Charts
• Also called compound bar charts.
• More than one sub-attribute of a variable can be expressed.

[Multiple bar chart: percentage of world total population and land area for Asia, Europe, Africa, Latin America, USSR, North America and Oceania.]
Component bar charts

• When there are many categories on the X-axis (more than 5) and they have further subcategories, then to accommodate the categories, the bars may be divided into parts, each part representing a certain item and proportional to the magnitude of that particular item.

[Component bar chart: India, growth of population in millions across the census decades 1901-2001.]
Cont…

[Component bar chart: male and female components for Pakistan, USA and Sweden.]
Histogram

• Used for quantitative, continuous variables.
• It is used to present variables which have no gaps, e.g. age, weight, height, blood pressure, blood sugar etc.
• It consists of a series of blocks. The class intervals are given along the horizontal axis and the frequency along the vertical axis.

[Histogram: grouped frequency distribution of serum cholesterol levels (mg/dl) in 200 men, class intervals 161-170 through 251-260.]
Frequency polygon

- A frequency polygon is an area diagram of a frequency distribution drawn over a histogram.
- It is a linear representation of a frequency table and histogram, obtained by joining the mid points of the histogram blocks.
- Frequency is plotted at the central point of a group.

[Frequency polygon: percentage total frequency over class intervals 59-69 through 129-139.]
Line diagram

Line diagrams are used to show the trend of events with the passage of time.

[Line diagram: malaria cases reported throughout the world, excluding the African region, during 1972-78.]
Pie charts

• Most common way of presenting data.

• The value of each category is divided by the total of all values and then multiplied by 360, and each category is allocated the respective angle to represent its proportion.

• It is often necessary to indicate percentages in the segments, as it may not always be easy, visually, to compare the areas of segments.
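The angle calculation described above is a one-liner per category. This sketch uses the world-population split from the following slide:

```python
# Pie-chart angles: each category's value divided by the total, times 360.
world = {"Developing Countries": 74, "Developed Countries": 26}

total = sum(world.values())
angles = {name: value * 360 / total for name, value in world.items()}
print(angles)   # Developing Countries -> 266.4 degrees, Developed -> 93.6
```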
Cont…

[Pie chart: world population, Developing Countries 74%, Developed Countries 26%.]
Pictogram

• Popular method of presenting data to those who cannot understand orthodox charts.
• Small pictures or symbols are used to present the data, e.g. a picture of a doctor to represent the physician population.
• A fraction of the picture can be used to represent numbers smaller than the value of the whole symbol.
Statistical maps

• When statistical data refer to geographic or administrative areas, they are presented either as a statistical map or a dot map.

• Shaded maps are used to present data of varying size. The areas are shaded with different colours, or different intensities of the same colour, which is indicated in the key.
Scatter diagram

• The scatter diagram graphs pairs of numerical data, with one variable on each axis, to look for a relationship between them. If the variables are correlated, the points will fall along a line or curve.

• The scatter diagram is used to find the correlation between two variables.

• This diagram shows you how closely the two variables are related.

• After determining the correlation between the variables, you can easily predict the behavior of the other variable.

• This chart is very useful when one variable is easy to measure and the other is not.
When to Use a Scatter Diagram?

 When you have paired numerical data.
 When your dependent variable may have multiple values for each value of your independent variable.
 When trying to determine whether the two variables are related, such as:
• When trying to identify potential root causes of problems.
• When testing for autocorrelation before constructing a control chart.

Limitations of a Scatter Diagram

The following are a few limitations of a scatter diagram:

• A scatter diagram is unable to give you the exact extent of correlation.
• A scatter diagram does not show you the quantitative measure of the relationship between the variables; it only gives a qualitative expression of how they change together.
• This chart does not show you the relationship for more than two variables.
Advantages of Scatter Diagram:

The following are a few advantages of a scatter diagram:

• It shows the relationship between two variables.
• It is the best method to show a non-linear pattern.
• The range of data flow, i.e. the maximum and minimum values, can be easily determined.
• Observation and reading are straightforward.
• Plotting the diagram is relatively simple.
Example:

By plotting two variables, test cases and employees by designation, in a scatter diagram, it is easy to determine the best productivity of employees.

Test Cases | Employees
90 | Manager
60 | Test lead
35 | Senior Test Engineer
40 | Junior Test Engineer
20 | Trainee

[Scatter diagram: number of test cases (20-100) against employee designation (Trainee, JTE, STE, TL, Manager).]
Summary & Conclusion:

● Data analysis is a process of inspecting, cleansing, transforming & modeling data with the goal of discovering useful information, informing conclusions & supporting decision-making.

● It includes data gathering, cleaning, analysis, interpretation & visualization.
Thanks