
Analysis of dependence

Descriptive statistics, lecture 4


The subject of analysis of dependence
• Analysis of dependence examines relationships between two or more
statistical features.
Methods of analysis of dependence
• graphical (scatterplot),
• analytical:
• correlation analysis:
• Tschuprow's T coefficient (Txy = Tyx),
• Spearman’s rank correlation coefficient (Rxy = Ryx),
• Correlation ratios (exy, eyx),
• Pearson product-moment correlation coefficient (rxy = ryx).
• regression analysis:
• empirical regression lines,
• theoretical regression lines.
Methods of data presentation
• Correlation series,
• Contingency table.
Correlation series
Values of variable x (xi)    Values of variable y (yi)
x1                           y1
x2                           y2
⁝                            ⁝
xn                           yn
Contingency table
Variants of        Variants of variable y
variable x         y1     y2     …     yl        ni.
x1                 n11    n12    …     n1l       n1.
x2                 n21    n22    …     n2l       n2.
⁝                  ⁝      ⁝      ⋱     ⁝         ⁝
xk                 nk1    nk2    …     nkl       nk.
n.j                n.1    n.2    …     n.l       n
Tschuprow's T coefficient
T_{xy} = T_{yx} = \sqrt{\frac{\chi^2}{n \cdot \sqrt{(k-1) \cdot (l-1)}}},

where:

\chi^2 = \sum_{i=1}^{k} \sum_{j=1}^{l} \frac{\left(n_{ij} - \hat{n}_{ij}\right)^2}{\hat{n}_{ij}}, \qquad \hat{n}_{ij} = \frac{n_{i.} \cdot n_{.j}}{n},

k – number of variants of the variable x (number of rows in the contingency table),
l – number of variants of the variable y (number of columns in the contingency table),
n_ij – empirical numbers,
n̂_ij – theoretical numbers.
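A minimal computational sketch of the formulas above (Python with NumPy; the 2×3 table of counts is an assumed example, not data from the lecture):

import numpy as np

# Assumed example 2x3 contingency table of empirical counts n_ij
counts = np.array([[20, 30, 10],
                   [15, 10, 15]], dtype=float)

n = counts.sum()
row_totals = counts.sum(axis=1, keepdims=True)    # n_i.
col_totals = counts.sum(axis=0, keepdims=True)    # n_.j
expected = row_totals @ col_totals / n            # theoretical counts n-hat_ij

chi2 = ((counts - expected) ** 2 / expected).sum()
k, l = counts.shape                               # rows and columns of the table
T = np.sqrt(chi2 / (n * np.sqrt((k - 1) * (l - 1))))
print(T)                                          # a value in [0, 1]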
Tschuprow's T coefficient – properties
• It is symmetric (Txy = Tyx).
• It can be calculated only for a contingency table.
• Both variables can be nominal (qualitative, non-measurable) – it is
calculated on the basis of numbers, not the variants of variables.
• It takes values from the interval [0, 1] – it measures only the
correlation strength, not the direction.
• If its value is 0 – then there is no correlation between the two features,
if it is equal to 1 – then the correlation is functional. The closer to 1 it
is, the stronger the correlation is.
• Each empirical (n_ij) or theoretical (n̂_ij) count must be at least 5.
Correlation ratios
e_{yx} = \sqrt{\frac{S^2(\bar{y}_i)}{S^2(y)}}; \qquad e_{xy} = \sqrt{\frac{S^2(\bar{x}_j)}{S^2(x)}};

where:

S^2(\bar{y}_i) = \frac{\sum_{i=1}^{k} \left(\bar{y}_i - \bar{y}\right)^2 \cdot n_{i.}}{n}; \qquad S^2(\bar{x}_j) = \frac{\sum_{j=1}^{l} \left(\bar{x}_j - \bar{x}\right)^2 \cdot n_{.j}}{n};

\bar{y}_i = \frac{\sum_{j=1}^{l} y_j \cdot n_{ij}}{n_{i.}}; \qquad \bar{x}_j = \frac{\sum_{i=1}^{k} x_i \cdot n_{ij}}{n_{.j}};

ȳ_i – conditional means of the variable y,
x̄_j – conditional means of the variable x.
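A sketch of the computation for a contingency table whose variants are numerical (Python; the arrays x_vals, y_vals, and counts are assumed example data):

import numpy as np

x_vals = np.array([1.0, 2.0, 3.0])          # variants of x (rows), assumed example
y_vals = np.array([10.0, 20.0])             # variants of y (columns), assumed example
counts = np.array([[5, 15],
                   [10, 10],
                   [15, 5]], dtype=float)   # empirical counts n_ij

n = counts.sum()
n_i = counts.sum(axis=1)                     # n_i.
n_j = counts.sum(axis=0)                     # n_.j

y_bar_i = counts @ y_vals / n_i              # conditional means of y
x_bar_j = counts.T @ x_vals / n_j            # conditional means of x

x_bar = (x_vals * n_i).sum() / n             # overall means
y_bar = (y_vals * n_j).sum() / n

S2_y = ((y_vals - y_bar) ** 2 * n_j).sum() / n       # total variances
S2_x = ((x_vals - x_bar) ** 2 * n_i).sum() / n
S2_ybar = ((y_bar_i - y_bar) ** 2 * n_i).sum() / n   # variances of conditional means
S2_xbar = ((x_bar_j - x_bar) ** 2 * n_j).sum() / n

e_yx = np.sqrt(S2_ybar / S2_y)
e_xy = np.sqrt(S2_xbar / S2_x)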
Correlation ratios – properties
• They are not symmetric (exy ≠ eyx).
• They can be calculated only for a contingency table.
• At least one variable – the dependent one (y for the coefficient eyx and x for
the coefficient exy) – must be numerical (measurable).
• They take values from the interval [0, 1] – they measure only the
correlation strength, not the direction.
• If its value is 0 – then there is no correlation between the two features,
if it is equal to 1 – then the correlation is functional. The closer to 1 it
is, the stronger the correlation is.
Pearson product-moment correlation
coefficient
r_{xy} = r_{yx} = \frac{\mathrm{cov}(x, y)}{S(x) \cdot S(y)};

where:

\mathrm{cov}(x, y) = \overline{x \cdot y} - \bar{x} \cdot \bar{y};

\overline{x \cdot y} = \frac{\sum_{i=1}^{n} x_i \cdot y_i}{n} – for the correlation series,

\overline{x \cdot y} = \frac{\sum_{i=1}^{k} \sum_{j=1}^{l} x_i \cdot y_j \cdot n_{ij}}{n} – for the contingency table.
r_{xy}^2 \cdot 100\% = r_{yx}^2 \cdot 100\% – coefficient of linear determination. It tells
what percentage of the changes of one variable is determined by changes
of the other one.
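A sketch for the correlation-series case (Python; the two series are assumed example data; np.std uses ddof=0, matching the descriptive S(x)):

import numpy as np

# Assumed example correlation series (paired observations)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

xy_mean = (x * y).mean()                 # mean of products, correlation-series case
cov_xy = xy_mean - x.mean() * y.mean()
r = cov_xy / (x.std() * y.std())         # Pearson coefficient
print(r, r ** 2 * 100)                   # coefficient and linear determination in %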
Pearson product-moment correlation
coefficient – properties
• It is symmetric (rxy = ryx).
• It can be calculated for both the correlation series and the contingency table.
• Both variables must be strictly numerical.
• The relation between the variables must be linear – if it is not, the value of the
coefficient will be underestimated.
• It takes the values from the interval [-1, 1] – it measures both the correlation strength, and
the direction.
• If the correlation is negative, then if one variable increases, the other decreases and vice
versa.
• If the correlation is positive, then if one variable increases, the other also increases and
vice versa.
• If its value is 0 – then there is no correlation between the two features, if it is equal to -1
or 1 – then the correlation is functional. The closer to -1 or 1 it is, the stronger the
correlation is.
Estimation of the degree of nonlinearity (only
for the contingency table)
As the correlation ratios (exy and eyx) are always at least equal to
|rxy| = |ryx|, the formulas:

m_{xy} = \sqrt{e_{xy}^2 - r_{xy}^2}, \qquad m_{yx} = \sqrt{e_{yx}^2 - r_{yx}^2}

measure the degree of nonlinearity of the relationship (mxy – x is dependent
on y, and myx – y is dependent on x).
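A one-step sketch of the computation (Python; the values of e_yx and r are assumed for illustration and satisfy e_yx ≥ |r|):

import numpy as np

e_yx, r = 0.80, 0.70                     # assumed illustrative values
m_yx = np.sqrt(e_yx ** 2 - r ** 2)       # degree of nonlinearity of y's dependence on x
print(m_yx)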
Spearman’s rank coefficient
• We use this coefficient when:
• variables are numerical, but conditions for the Pearson product-moment
correlation coefficient (linearity of relationship and normality of variables) are
not satisfied,
• when at least one variable is measured on the ordinal scale.
• In the first step we assign ranks to the variables:
• We set the values of both variables in the ascending or descending order (but
we must be consistent – both variables must be set in the same order).
• We assign ranks to the subsequent values.
• If two or more units have the same value, we assign each of them the mean of
the subsequent ranks that would be assigned to them (see the sketch after the
tie-corrected formula below).
Spearman’s rank coefficient
R_{xy} = R_{yx} = \frac{\mathrm{cov}(r_x, r_y)}{S(r_x) \cdot S(r_y)},

where:
r_x – ranks of the variable x,
r_y – ranks of the variable y.
Spearman’s rank coefficient
When all ranks are distinct, then the formula simplifies to:
R_{xy} = R_{yx} = 1 - \frac{6 \cdot \sum_{i=1}^{n} d_i^2}{n \cdot \left(n^2 - 1\right)},

where:
d_i = \mathrm{rank}(x_i) - \mathrm{rank}(y_i).
Spearman’s rank coefficient
In case of tied ranks, we obtain:
R_{xy} = R_{yx} = \frac{\frac{n^3 - n}{6} - \sum_{i=1}^{n} d_i^2 - T_x - T_y}{\sqrt{\left(\frac{n^3 - n}{6} - 2 \cdot T_x\right) \cdot \left(\frac{n^3 - n}{6} - 2 \cdot T_y\right)}},

where:

T_x = \frac{1}{12} \sum_j \left(t_j^3 - t_j\right), \qquad T_y = \frac{1}{12} \sum_k \left(u_k^3 - u_k\right),

t_j – number of observations having the same, j-th tied rank value of the variable x,
u_k – number of observations having the same, k-th tied rank value of the variable y.
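A sketch of both steps – assigning mean ranks to ties and applying the tie-corrected formula (Python; the two short series are assumed example data). Note that with no ties T_x = T_y = 0 and the expression reduces to the simplified formula above:

import numpy as np
from collections import Counter

def mean_ranks(values):
    # Tied values receive the mean of the subsequent ranks they occupy.
    order = np.argsort(values)
    sorted_vals = np.asarray(values)[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1   # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks

def tie_term(ranks):
    # T = (1/12) * sum(t^3 - t) over groups of equal ranks
    return sum(t ** 3 - t for t in Counter(ranks.tolist()).values()) / 12

# Assumed example data with ties
x = np.array([3.0, 1.0, 4.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 3.0, 5.0, 5.0])

rx, ry = mean_ranks(x), mean_ranks(y)
n = len(x)
d2 = ((rx - ry) ** 2).sum()
Tx, Ty = tie_term(rx), tie_term(ry)

num = (n ** 3 - n) / 6 - d2 - Tx - Ty
den = np.sqrt(((n ** 3 - n) / 6 - 2 * Tx) * ((n ** 3 - n) / 6 - 2 * Ty))
R = num / den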
Spearman’s rank coefficient – properties
• It is symmetric (Rxy = Ryx).
• It can be calculated only for the correlation series.
• Both variables must be at least on the ordinal scale.
• It takes the values from the interval [-1, 1] – it measures both the correlation
strength, and the direction.
• If the correlation is negative, then if one variable increases, the other
decreases and vice versa.
• If the correlation is positive, then if one variable increases, the other also
increases and vice versa.
• If its value is 0 – then there is no correlation between the two features, if it
is equal to -1 or 1 – then the correlation is functional. The closer to -1 or 1 it
is, the stronger the correlation is.
Regression analysis – empirical regression
lines
• They are drawn on the basis of the contingency table.
• They are based on the conditional means.
• Both variables must be strictly numerical (measurable).
• We draw two lines joining the following points:
xi     ȳi          x̄j     yj
x1     ȳ1          x̄1     y1
x2     ȳ2          x̄2     y2
⁝      ⁝           ⁝      ⁝
xk     ȳk          x̄l     yl
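A sketch of drawing the two lines from the conditional means (Python with matplotlib; x_vals, y_vals, and counts are assumed example data, as in the correlation-ratio sketch above):

import numpy as np
import matplotlib.pyplot as plt

x_vals = np.array([1.0, 2.0, 3.0])           # assumed example variants of x
y_vals = np.array([10.0, 20.0])              # assumed example variants of y
counts = np.array([[5, 15],
                   [10, 10],
                   [15, 5]], dtype=float)    # empirical counts n_ij

y_bar_i = counts @ y_vals / counts.sum(axis=1)    # conditional means of y
x_bar_j = counts.T @ x_vals / counts.sum(axis=0)  # conditional means of x

plt.plot(x_vals, y_bar_i, marker="o", label="points (xi, ȳi)")
plt.plot(x_bar_j, y_vals, marker="s", label="points (x̄j, yj)")
plt.legend()
plt.show()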
Empirical regression lines – properties
• The empirical regression lines cross each other at one point.
• The smaller the angle between them, the stronger the dependence.
• If the empirical regression lines coincide, the dependence is functional.
• If the angle between them is 90 degrees, there is no dependence between
the variables.
• If one empirical line is ascending, the other is also ascending and the
dependence between the variables is positive, and vice versa.
Theoretical regression lines
• By theoretical regression lines we mean the fitted mathematical
function that describes dependence between both variables.
• Let us assume the linear regression between analysed variables:
𝑦 = 𝑎𝑦 ∙ 𝑥 + 𝑏𝑦 – variable y is the dependent one and x – independent
𝑥 = 𝑎𝑥 ∙ 𝑦 + 𝑏𝑥 – variable x is the dependent one and y – independent
ay, ax – slope parameters,
by, bx – intercepts.
Parameters estimation
• Parameters are estimated by means of the Ordinary Least Squares method (OLS).
• Parameters estimates:
a_y = \frac{\mathrm{cov}(x, y)}{S^2(x)} = \frac{r_{yx} \cdot S(y)}{S(x)}; \qquad b_y = \bar{y} - a_y \cdot \bar{x};

a_x = \frac{\mathrm{cov}(x, y)}{S^2(y)} = \frac{r_{xy} \cdot S(x)}{S(y)}; \qquad b_x = \bar{x} - a_x \cdot \bar{y}.
ay – tells by how much the variable y will change if the variable x increases by one
unit.
ax – tells by how much the variable x will change if the variable y increases by one
unit.
by, bx – generally have no economic interpretation.
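A sketch of the OLS estimates for a correlation series (Python; the series is an assumed example; np.var uses ddof=0, matching the descriptive S²(x)):

import numpy as np

# Assumed example correlation series
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

cov_xy = (x * y).mean() - x.mean() * y.mean()
a_y = cov_xy / x.var()            # slope of y = a_y * x + b_y
b_y = y.mean() - a_y * x.mean()
a_x = cov_xy / y.var()            # slope of x = a_x * y + b_x
b_x = x.mean() - a_x * y.mean()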
Correlation between more than two variables
• multiple correlation – the total influence of all independent variables
on the dependent one;
• partial correlation – the correlation between two variables, with the
omission of the influence of remaining ones.
Multiple correlation
Multiple correlation coefficient is calculated by means of the following formula:
R_{y.x_1, x_2, \ldots, x_k} = R_w = \sqrt{1 - \frac{\det R_n}{\det R_m}},

where:
R_n – correlation matrix,
R_m – correlation matrix after removing the row and the column that refer to the
dependent variable.
For three variables, the formula can be rewritten as follows:

R_{y.x_1, x_2} = R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2 \cdot r_{12} \cdot r_{13} \cdot r_{23}}{1 - r_{23}^2}}.
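A sketch of both variants of the formula (Python; the correlation matrix R_n is an assumed example with the dependent variable in the first row and column):

import numpy as np

# Assumed example correlation matrix R_n; variable 1 = y, variables 2, 3 = x1, x2
Rn = np.array([[1.0, 0.6, 0.5],
               [0.6, 1.0, 0.3],
               [0.5, 0.3, 1.0]])

Rm = Rn[1:, 1:]                     # remove the dependent variable's row and column
Rw = np.sqrt(1 - np.linalg.det(Rn) / np.linalg.det(Rm))

# Three-variable closed form, for comparison (gives the same value)
r12, r13, r23 = Rn[0, 1], Rn[0, 2], Rn[1, 2]
R123 = np.sqrt((r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2))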
Multiple correlation coefficient – properties
• It takes values from the interval [0, 1] – it measures only the
correlation strength, not the direction.
• If its value is 0 – then there is no correlation, if it is equal to 1 – then
the correlation is functional. The closer to 1 it is, the stronger the
correlation is.
• The squared multiple correlation coefficient gives the coefficient of
linear determination, which tells what percentage of the changes of the
dependent variable is explained by changes of the independent ones.
Partial correlation
Partial correlation coefficient is calculated by means of the following
formula:
r_{12.3} = \frac{-R_{12}}{\sqrt{R_{11} \cdot R_{22}}} = \frac{r_{12} - r_{13} \cdot r_{23}}{\sqrt{\left(1 - r_{13}^2\right) \cdot \left(1 - r_{23}^2\right)}},

where:
R_ij – the cofactor of the element of the matrix R_n standing in the i-th row
and the j-th column:

R_{ij} = (-1)^{i+j} \cdot M_{ij},

where:
M_ij – the minor, i.e. the determinant of the submatrix obtained by removing
the i-th row and the j-th column from the matrix R_n.
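A sketch of the partial correlation in both its three-variable and cofactor forms (Python; the pairwise correlations are assumed example values):

import numpy as np

# Assumed example pairwise correlations
r12, r13, r23 = 0.6, 0.5, 0.3

r12_3 = (r12 - r13 * r23) / np.sqrt((1 - r13**2) * (1 - r23**2))

# Equivalent cofactor form, built from the same correlation matrix R_n
Rn = np.array([[1.0, r12, r13],
               [r12, 1.0, r23],
               [r13, r23, 1.0]])

def cofactor(m, i, j):
    # (-1)^(i+j) times the minor obtained by deleting row i and column j
    sub = np.delete(np.delete(m, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(sub)

r12_3_alt = -cofactor(Rn, 0, 1) / np.sqrt(cofactor(Rn, 0, 0) * cofactor(Rn, 1, 1))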
Properties of the partial correlation
coefficient
• It takes the values from the interval [-1, 1] – it measures both the
correlation strength, and the direction.
• If the correlation is negative, then if one variable increases, the other
decreases and vice versa.
• If the correlation is positive, then if one variable increases, the other
also increases and vice versa.
• If its value is 0 – then there is no correlation between the two features,
if it is equal to -1 or 1 – then the correlation is functional. The closer to
-1 or 1 it is, the stronger the correlation is.
