The document discusses the analysis of dependence, focusing on methods to analyze relationships between statistical features, including correlation and regression analyses. It details various coefficients such as Tschuprow's T, Pearson's product-moment, and Spearman's rank correlation, along with their properties and applications. Additionally, it covers data presentation methods like correlation series and contingency tables, and introduces concepts of multiple and partial correlation for analyzing more than two variables.
The subject of analysis of dependence

• Analysis of dependence examines the relations between two or more statistical features.

Methods of analysis of dependence

• graphical (scatterplot),
• analytical:
  • correlation analysis:
    • Tschuprow's T coefficient (Txy = Tyx),
    • Spearman's rank correlation coefficient (Rxy = Ryx),
    • correlation ratios (exy, eyx),
    • Pearson product-moment correlation coefficient (rxy = ryx);
  • regression analysis:
    • empirical regression lines,
    • theoretical regression lines.

Methods of data presentation

• correlation series,
• contingency table.

Correlation series

  Values of variable x    Values of variable y
  x1                      y1
  x2                      y2
  ⁝                       ⁝
  xn                      yn

Contingency table

  Variants of    Variants of variable y
  variable x     y1     y2     …    yl     ni.
  x1             n11    n12    …    n1l    n1.
  x2             n21    n22    …    n2l    n2.
  ⁝              ⁝      ⁝      ⋱    ⁝      ⁝
  xk             nk1    nk2    …    nkl    nk.
  n.j            n.1    n.2    …    n.l    n
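Before the coefficients themselves, here is a minimal pandas sketch of how a contingency table can be tallied from a correlation series. All names and data are hypothetical, chosen only to illustrate the two presentation formats; `pd.crosstab` produces the cell counts n_ij together with the marginal sums n_i., n._j and n.

```python
import pandas as pd

# A hypothetical correlation series: one (x, y) pair per statistical unit.
data = pd.DataFrame({
    "x": ["primary", "secondary", "secondary", "higher", "higher", "higher"],
    "y": ["low", "low", "high", "low", "high", "high"],
})

# Cross-tabulating the pairs yields the contingency table n_ij;
# margins=True appends the row sums n_i., column sums n_.j and the total n.
table = pd.crosstab(data["x"], data["y"], margins=True, margins_name="n")
print(table)
```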
Tschuprow's T coefficient

$$T_{xy} = T_{yx} = \sqrt{\frac{\chi^2}{n \cdot \sqrt{(k-1) \cdot (l-1)}}},$$

where:

$$\chi^2 = \sum_{i=1}^{k} \sum_{j=1}^{l} \frac{\left(n_{ij} - \hat{n}_{ij}\right)^2}{\hat{n}_{ij}}, \qquad \hat{n}_{ij} = \frac{n_{i.} \cdot n_{.j}}{n},$$

k – number of variants of the variable x (number of rows in the contingency table),
l – number of variants of the variable y (number of columns in the contingency table),
n_ij – empirical numbers,
n̂_ij – theoretical numbers.

Tschuprow's T coefficient – properties

• It is symmetric (Txy = Tyx).
• It can be calculated only from a contingency table.
• Both variables can be nominal (qualitative, non-measurable) – the coefficient is calculated from the cell counts, not from the variants of the variables.
• It takes values from the interval [0, 1] – it measures only the strength of the correlation, not its direction.
• If its value is 0, there is no correlation between the two features; if it equals 1, the correlation is functional. The closer it is to 1, the stronger the correlation.
• Every empirical (n_ij) or theoretical (n̂_ij) number must be at least 5.

Correlation ratios

$$e_{yx} = \sqrt{\frac{S^2(\bar{y}_i)}{S^2(y)}}; \qquad e_{xy} = \sqrt{\frac{S^2(\bar{x}_j)}{S^2(x)}};$$

where:

$$S^2(\bar{y}_i) = \frac{\sum_{i=1}^{k} \left(\bar{y}_i - \bar{y}\right)^2 \cdot n_{i.}}{n}; \qquad S^2(\bar{x}_j) = \frac{\sum_{j=1}^{l} \left(\bar{x}_j - \bar{x}\right)^2 \cdot n_{.j}}{n};$$

$$\bar{y}_i = \frac{\sum_{j=1}^{l} y_j \cdot n_{ij}}{n_{i.}}; \qquad \bar{x}_j = \frac{\sum_{i=1}^{k} x_i \cdot n_{ij}}{n_{.j}};$$

ȳ_i – conditional means of variable y,
x̄_j – conditional means of variable x.

Correlation ratios – properties

• They are not symmetric (exy ≠ eyx).
• They can be calculated only from a contingency table.
• At least the dependent variable (y for the coefficient eyx, x for the coefficient exy) must be numerical (measurable).
• They take values from the interval [0, 1] – they measure only the strength of the correlation, not its direction.
• If the value is 0, there is no correlation between the two features; if it equals 1, the correlation is functional. The closer it is to 1, the stronger the correlation.

Pearson product-moment correlation coefficient

$$r_{xy} = r_{yx} = \frac{\operatorname{cov}(x, y)}{S(x) \cdot S(y)};$$

where:

$$\operatorname{cov}(x, y) = \overline{x \cdot y} - \bar{x} \cdot \bar{y};$$

$$\overline{x \cdot y} = \frac{\sum_{i=1}^{n} x_i \cdot y_i}{n} \quad \text{– for the correlation series,}$$

$$\overline{x \cdot y} = \frac{\sum_{i=1}^{k} \sum_{j=1}^{l} x_i \cdot y_j \cdot n_{ij}}{n} \quad \text{– for the contingency table.}$$

r²_xy · 100% = r²_yx · 100% – coefficient of linear determination. It says what percentage of the changes of one variable is determined by changes of the other one.

Pearson product-moment correlation coefficient – properties

• It is symmetric (rxy = ryx).
• It can be calculated for both the correlation series and the contingency table.
• Both variables must be strictly numerical.
• The relation between the variables must be linear – if it is not, the value of the coefficient will be underestimated.
• It takes values from the interval [-1, 1] – it measures both the strength and the direction of the correlation.
• If the correlation is negative, then as one variable increases, the other decreases, and vice versa.
• If the correlation is positive, then as one variable increases, the other also increases, and vice versa.
• If its value is 0, there is no correlation between the two features; if it equals -1 or 1, the correlation is functional. The closer it is to -1 or 1, the stronger the correlation.

Estimation of the degree of nonlinearity (only for the contingency table)

As the correlation ratios (exy and eyx) are always at least equal to |rxy| = |ryx|, the quantities

$$m_{xy} = e_{xy}^2 - r_{xy}^2, \qquad m_{yx} = e_{yx}^2 - r_{yx}^2$$

measure the degree of nonlinearity of the relationship (mxy – x is dependent on y, and myx – y is dependent on x).
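To make these formulas concrete, here is a minimal NumPy sketch (all counts and variant values are hypothetical) that computes Tschuprow's T, both correlation ratios, Pearson's r and the nonlinearity measures from a single contingency table:

```python
import numpy as np

# Hypothetical contingency table n_ij: rows = variants of x, columns = variants of y.
n_ij = np.array([[10.0, 6.0],
                 [5.0, 12.0],
                 [7.0, 9.0]])
x_vals = np.array([1.0, 2.0, 3.0])   # numerical variants of x, one per row
y_vals = np.array([10.0, 20.0])      # numerical variants of y, one per column

n_i = n_ij.sum(axis=1)               # row sums n_i.
n_j = n_ij.sum(axis=0)               # column sums n_.j
n = n_ij.sum()
k, l = n_ij.shape

# Tschuprow's T: chi-squared distance from the theoretical (independence) counts.
n_hat = np.outer(n_i, n_j) / n
chi2 = ((n_ij - n_hat) ** 2 / n_hat).sum()
T = np.sqrt(chi2 / (n * np.sqrt((k - 1) * (l - 1))))

# Overall means and conditional means, weighted by the cell counts.
x_bar = (x_vals * n_i).sum() / n
y_bar = (y_vals * n_j).sum() / n
y_bar_i = n_ij @ y_vals / n_i        # conditional means of y (one per row)
x_bar_j = x_vals @ n_ij / n_j        # conditional means of x (one per column)

# Correlation ratios: variance of the conditional means over the total variance.
S2_x = ((x_vals - x_bar) ** 2 * n_i).sum() / n
S2_y = ((y_vals - y_bar) ** 2 * n_j).sum() / n
e_yx = np.sqrt(((y_bar_i - y_bar) ** 2 * n_i).sum() / n / S2_y)
e_xy = np.sqrt(((x_bar_j - x_bar) ** 2 * n_j).sum() / n / S2_x)

# Pearson's r for the contingency table: cov(x, y) = mean(x*y) - mean(x)*mean(y).
xy_bar = (np.outer(x_vals, y_vals) * n_ij).sum() / n
r = (xy_bar - x_bar * y_bar) / np.sqrt(S2_x * S2_y)

# Degrees of nonlinearity (non-negative, since e is at least |r|).
m_yx = e_yx ** 2 - r ** 2
m_xy = e_xy ** 2 - r ** 2

print(f"T = {T:.3f}, e_yx = {e_yx:.3f}, e_xy = {e_xy:.3f}, r = {r:.3f}")
print(f"m_yx = {m_yx:.3f}, m_xy = {m_xy:.3f}")
```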
Spearman's rank coefficient

• We use this coefficient when:
  • the variables are numerical, but the conditions for the Pearson product-moment correlation coefficient (linearity of the relationship and normality of the variables) are not satisfied,
  • at least one variable is measured on the ordinal scale.
• In the first step we assign ranks to the variables:
  • We sort the values of both variables in ascending or descending order (we must be consistent – both variables must be sorted in the same order).
  • We assign ranks to the subsequent values.
  • If two or more units have the same value, we assign each of them the mean of the subsequent ranks that would otherwise be assigned to them.

Spearman's rank coefficient

$$R_{xy} = R_{yx} = \frac{\operatorname{cov}(r_x, r_y)}{S(r_x) \cdot S(r_y)}$$

where:
r_x – ranks of variable x,
r_y – ranks of variable y.

When all ranks are distinct, the formula simplifies to:

$$R_{xy} = R_{yx} = 1 - \frac{6 \cdot \sum_{i=1}^{n} d_i^2}{n \cdot (n^2 - 1)}$$

where:

$$d_i = \operatorname{rank}(x_i) - \operatorname{rank}(y_i).$$

In case of tied ranks, we obtain:

$$R_{xy} = R_{yx} = \frac{\frac{n^3 - n}{6} - \sum_{i=1}^{n} d_i^2 - T_X - T_Y}{\sqrt{\left(\frac{n^3 - n}{6} - 2 \cdot T_X\right) \cdot \left(\frac{n^3 - n}{6} - 2 \cdot T_Y\right)}}$$

where:

$$T_X = \frac{1}{12} \sum_j \left(t_j^3 - t_j\right), \qquad T_Y = \frac{1}{12} \sum_k \left(u_k^3 - u_k\right),$$

t_j – number of observations sharing the j-th tied rank value of variable x,
u_k – number of observations sharing the k-th tied rank value of variable y.

Spearman's rank coefficient – properties

• It is symmetric (Rxy = Ryx).
• It can be calculated only for the correlation series.
• Both variables must be measured at least on the ordinal scale.
• It takes values from the interval [-1, 1] – it measures both the strength and the direction of the correlation.
• If the correlation is negative, then as one variable increases, the other decreases, and vice versa.
• If the correlation is positive, then as one variable increases, the other also increases, and vice versa.
• If its value is 0, there is no correlation between the two features; if it equals -1 or 1, the correlation is functional. The closer it is to -1 or 1, the stronger the correlation.
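The ranking rule and both formulas can be put together in a short sketch. The data are hypothetical, and the helper names `average_ranks` and `tie_correction` are ours, not from the source:

```python
import numpy as np

def average_ranks(values):
    """Rank in ascending order; tied values receive the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    ranks = np.empty(len(values))
    ranks[np.argsort(values)] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()   # mean of the subsequent ranks
    return ranks

def tie_correction(ranks):
    """T = (1/12) * sum(t^3 - t) over groups of tied ranks (0 if no ties)."""
    _, t = np.unique(ranks, return_counts=True)
    return (t.astype(float) ** 3 - t).sum() / 12.0

def spearman(x, y):
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    d2 = ((rx - ry) ** 2).sum()
    tx, ty = tie_correction(rx), tie_correction(ry)
    if tx == 0 and ty == 0:                      # all ranks distinct
        return 1 - 6 * d2 / (n * (n ** 2 - 1))
    a = (n ** 3 - n) / 6                         # tie-corrected formula
    return (a - d2 - tx - ty) / np.sqrt((a - 2 * tx) * (a - 2 * ty))

# Hypothetical data; the two 7s in x share ranks 3 and 4, so each gets 3.5.
x = [2, 5, 7, 7, 10]
y = [1, 3, 4, 6, 8]
print(f"R_xy = {spearman(x, y):.3f}")
```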
Regression analysis – empirical regression lines

• They are drawn on the basis of the contingency table.
• They are based on the conditional means.
• Both variables must be strictly numerical (measurable).
• We draw two lines joining the following points:

  xi    ȳi          x̄j    yj
  x1    ȳ1          x̄1    y1
  x2    ȳ2          x̄2    y2
  ⁝     ⁝           ⁝     ⁝
  xk    ȳk          x̄l    yl

Empirical regression lines – properties

• The empirical regression lines cross each other at one point.
• The smaller the angle between them, the stronger the dependence.
• If the empirical regression lines exactly cover each other, the dependence is functional.
• If the angle between them is 90 degrees, there is no dependence between the variables.
• If one empirical line is ascending, the other is also ascending and the dependence between the variables is positive, and vice versa.

Theoretical regression lines

• By theoretical regression lines we mean fitted mathematical functions that describe the dependence between the variables.
• Let us assume a linear regression between the analysed variables:

$$y = a_y \cdot x + b_y \quad \text{– variable } y \text{ is the dependent one and } x \text{ – independent,}$$

$$x = a_x \cdot y + b_x \quad \text{– variable } x \text{ is the dependent one and } y \text{ – independent,}$$

a_y, a_x – slope parameters,
b_y, b_x – intercepts.

Parameter estimation

• The parameters are estimated by means of the Ordinary Least Squares method (OLS).
• Parameter estimates:

$$a_y = \frac{\operatorname{cov}(x, y)}{S^2(x)} = \frac{r_{yx} \cdot S(y)}{S(x)}; \qquad b_y = \bar{y} - a_y \cdot \bar{x};$$

$$a_x = \frac{\operatorname{cov}(x, y)}{S^2(y)} = \frac{r_{xy} \cdot S(x)}{S(y)}; \qquad b_x = \bar{x} - a_x \cdot \bar{y}.$$

a_y – says by how much the variable y will change if the variable x increases by one unit.
a_x – says by how much the variable x will change if the variable y increases by one unit.
b_y, b_x – generally have no economic interpretation.

Correlation between more than two variables

• multiple correlation – the total influence of all independent variables on the dependent one;
• partial correlation – the correlation between two variables, with the influence of the remaining ones removed.

Multiple correlation

The multiple correlation coefficient is calculated by means of the following formula:

$$R_{y.x_1, x_2, \ldots, x_k} = R_w = \sqrt{1 - \frac{\det R_n}{\det R_m}},$$

where:
R_n – correlation matrix,
R_m – the correlation matrix after removing the row and the column that refer to the dependent variable.

For three variables, the formula can be rewritten as follows:

$$R_{y.x_1, x_2} = R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2 \cdot r_{12} \cdot r_{13} \cdot r_{23}}{1 - r_{23}^2}}.$$

Multiple correlation coefficient – properties

• It takes values from the interval [0, 1] – it measures only the strength of the correlation, not its direction.
• If its value is 0, there is no correlation; if it equals 1, the correlation is functional. The closer it is to 1, the stronger the correlation.
• The squared multiple correlation coefficient gives the coefficient of linear determination, which says what percentage of the changes of the dependent variable is explained by changes of the independent ones.

Partial correlation

The partial correlation coefficient is calculated by means of the following formula:

$$r_{12.3} = \frac{-R_{12}}{\sqrt{R_{11} \cdot R_{22}}} = \frac{r_{12} - r_{13} \cdot r_{23}}{\sqrt{\left(1 - r_{13}^2\right) \cdot \left(1 - r_{23}^2\right)}},$$

where:
R_ij – the cofactor of the element of the matrix R_n standing in the i-th row and the j-th column:

$$R_{ij} = (-1)^{i+j} M_{ij},$$

M_ij – the minor, i.e. the determinant of the submatrix obtained by removing the i-th row and the j-th column from the matrix R_n.

Properties of the partial correlation coefficient

• It takes values from the interval [-1, 1] – it measures both the strength and the direction of the correlation.
• If the correlation is negative, then as one variable increases, the other decreases, and vice versa.
• If the correlation is positive, then as one variable increases, the other also increases, and vice versa.
• If its value is 0, there is no correlation between the two features; if it equals -1 or 1, the correlation is functional. The closer it is to -1 or 1, the stronger the correlation.
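The determinant-based formulas can be checked against the three-variable closed forms with a short NumPy sketch. The correlation matrix `R_n` below is hypothetical, ordered so that the dependent variable occupies the first row and column; note that the code uses 0-based indices, so `cofactor(R_n, 0, 1)` plays the role of R_12 from the text.

```python
import numpy as np

# Hypothetical correlation matrix R_n for (y, x1, x2):
# r12 = r(y, x1), r13 = r(y, x2), r23 = r(x1, x2).
R_n = np.array([[1.0, 0.6, 0.5],
                [0.6, 1.0, 0.3],
                [0.5, 0.3, 1.0]])

# Multiple correlation: R_w = sqrt(1 - det(R_n) / det(R_m)), where R_m
# is R_n with the dependent variable's row and column removed.
R_m = np.delete(np.delete(R_n, 0, axis=0), 0, axis=1)
R_w = np.sqrt(1 - np.linalg.det(R_n) / np.linalg.det(R_m))

# Three-variable closed form – should agree with the determinant version.
r12, r13, r23 = R_n[0, 1], R_n[0, 2], R_n[1, 2]
R_w_check = np.sqrt((r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2))

def cofactor(A, i, j):
    """Cofactor R_ij = (-1)**(i+j) * M_ij, the signed minor (0-based i, j)."""
    M = np.delete(np.delete(A, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(M)

# Partial correlation r_12.3 from the cofactors of R_n ...
r12_3 = -cofactor(R_n, 0, 1) / np.sqrt(cofactor(R_n, 0, 0) * cofactor(R_n, 1, 1))

# ... and from the three-variable closed form.
r12_3_check = (r12 - r13 * r23) / np.sqrt((1 - r13**2) * (1 - r23**2))

print(f"R_w    = {R_w:.4f}  (closed form: {R_w_check:.4f})")
print(f"r_12.3 = {r12_3:.4f} (closed form: {r12_3_check:.4f})")
```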