

1. Data

Data given:

X1 X2 X3 X4 X5 X6 X7
478 184 40 74 11 31 20
494 213 32 72 11 43 18
643 347 57 70 18 16 16
341 565 31 71 11 25 19
773 327 67 72 9 29 24
603 260 25 68 8 32 15
484 325 34 68 12 24 14
546 102 33 62 13 28 11
424 38 36 69 7 25 12
548 226 31 66 9 58 15
506 137 35 60 13 21 9
819 369 30 81 4 77 36
541 109 44 66 9 37 12
491 809 32 67 11 37 16
514 29 30 65 12 35 11
371 245 16 64 10 42 14
457 118 29 64 12 21 10
437 148 36 62 7 81 27
570 387 30 59 15 31 16
432 98 23 56 15 50 15
619 608 33 46 22 24 8
357 218 35 54 14 27 13
623 254 38 54 20 22 11
547 697 44 45 26 18 8
792 827 28 57 12 23 11
799 693 35 57 9 60 18
439 448 31 61 19 14 12
867 942 39 52 17 31 10
912 1017 27 44 21 24 9
462 216 36 43 18 23 8
859 673 38 48 19 22 10
805 989 46 57 14 25 12
652 630 29 47 19 25 9
776 404 32 50 19 21 9
919 692 39 48 16 32 11
732 1517 44 49 13 31 14
657 879 33 72 13 13 22
1419 631 43 59 14 21 13
989 1375 22 49 9 46 13
821 1139 30 54 13 27 12
1740 3545 86 62 22 18 15
815 706 30 47 17 39 11
760 451 32 45 34 15 10
936 433 43 48 26 23 12
863 601 20 69 23 7 12
783 1024 55 42 23 23 11
715 457 44 49 18 30 12
1504 1441 37 57 15 35 13
1324 1022 82 72 22 15 16
940 1244 66 67 26 18 16
Overview of the variables (summary statistics):

Variable Obs Mean Std. Dev. Min Max

x1 50 717.96 293.9388 341 1740


x2 50 616.18 573.7392 29 3545
x3 50 37.76 13.82036 16 86
x4 50 58.8 9.965246 42 81
x5 50 15.4 6.023762 4 34

x6 50 29.9 14.80106 7 81
x7 50 13.82 5.157479 8 36

- X1 ranges from 341 to 1740
- X2 ranges from 29 to 3545
- X3 ranges from 16 to 86
- X4 ranges from 42 to 81
- X5 ranges from 4 to 34
- X6 ranges from 7 to 81
- X7 ranges from 8 to 36
Suppose we have the regression model:
X1 = β1 + β2X2 + β3X3 + β4X4 + β5X5 + β6X6 + β7X7
To reduce the impact of large data points on the model, in this study we apply
the natural-logarithm transformation:
Ln(X1) = β1 + β2ln(X2) + β3ln(X3) + β4ln(X4) + β5ln(X5) + β6ln(X6) + β7ln(X7) (1)
Estimating regression model (1) in Stata, we obtain the following result:
Table 2: regression model Ln(X1) = β1 + β2ln(X2) + β3ln(X3) + β4ln(X4)
+ β5ln(X5) + β6ln(X6) + β7ln(X7)

Dependent variable: Ln(X1)

Method: Least Squares

Date: 19/11/2024

Time: 11:54 AM

Included observations: 50
Source SS df MS Number of obs = 50
F( 6, 43) = 8.84
Model 3.69591933 6 .615986555 Prob > F = 0.0000
Residual 2.99679948 43 .069693011 R-squared = 0.5522
Adj R-squared = 0.4898
Total 6.69271881 49 .136586098 Root MSE = .26399

ln_x1 Coef. Std. Err. t P>|t| [95% Conf. Interval]

ln_x2 .2603337 .0510629 5.10 0.000 .1573555 .3633118


ln_x3 .3287415 .1357469 2.42 0.020 .0549817 .6025012
ln_x4 .3108223 .4737788 0.66 0.515 -.6446437 1.266288
ln_x5 .0152426 .1959793 0.08 0.938 -.3799872 .4104725
ln_x6 .0955764 .1566919 0.61 0.545 -.2204229 .4115758
ln_x7 -.2479215 .2394942 -1.04 0.306 -.7309076 .2350645
_cons 2.776674 2.370395 1.17 0.248 -2.003684 7.557032

From the above estimation results we obtain:

(PRF): E(ln(X1)|X2,X3,X4,X5,X6,X7) = β1 + β2ln(X2) + β3ln(X3) + β4ln(X4) +
β5ln(X5) + β6ln(X6) + β7ln(X7)
(SRF): Ln(X1) = 2.77 + 0.26ln(X2) + 0.32ln(X3) + 0.31ln(X4) + 0.01ln(X5) +
0.09ln(X6) – 0.24ln(X7)
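As a cross-check, model (1) can be estimated by ordinary least squares outside Stata. The sketch below uses plain NumPy and, for brevity, only the first 10 rows of the data table, so the coefficients will not match Table 2 (which uses all 50 observations); it only illustrates the mechanics of the log-log fit.

```python
import numpy as np

# First 10 rows of the data table (X1..X7); an illustrative subset,
# not the full 50-observation sample used in the report.
data = np.array([
    [478, 184, 40, 74, 11, 31, 20],
    [494, 213, 32, 72, 11, 43, 18],
    [643, 347, 57, 70, 18, 16, 16],
    [341, 565, 31, 71, 11, 25, 19],
    [773, 327, 67, 72,  9, 29, 24],
    [603, 260, 25, 68,  8, 32, 15],
    [484, 325, 34, 68, 12, 24, 14],
    [546, 102, 33, 62, 13, 28, 11],
    [424,  38, 36, 69,  7, 25, 12],
    [548, 226, 31, 66,  9, 58, 15],
], dtype=float)

logs = np.log(data)                                   # log-transform every variable
y = logs[:, 0]                                        # ln(X1), the dependent variable
X = np.column_stack([np.ones(len(y)), logs[:, 1:]])   # intercept + ln(X2)..ln(X7)

# OLS: beta_hat minimizes ||y - X b||^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat
```

With the full 50-row array in place of this subset, beta_hat reproduces the Table 2 coefficients, since OLS on the logged variables is exactly what Stata computes here.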

2. Analyze regression results


2.1. Economic significance of regression coefficients
- B1 = 2.77 > 0
The intercept: when all the ln terms equal zero, ln(X1) is 2.77.
- B2 = 0.26 > 0
A direct relationship: each 1-unit increase in ln(X2) increases ln(X1) by
0.26 units.
- B3 = 0.32 > 0
A direct relationship: each 1-unit increase in ln(X3) increases ln(X1) by
0.32 units.
- B4 = 0.31 > 0
A direct relationship: each 1-unit increase in ln(X4) increases ln(X1) by
0.31 units.
- B5 = 0.01 > 0
A direct relationship: each 1-unit increase in ln(X5) increases ln(X1) by
0.01 units.
- B6 = 0.09 > 0
A direct relationship: each 1-unit increase in ln(X6) increases ln(X1) by
0.09 units.
- B7 = -0.24 < 0
An inverse relationship: each 1-unit increase in ln(X7) decreases ln(X1) by
0.24 units.
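Because both sides of the model are in logs, each slope coefficient is an elasticity, so the interpretations above can be restated in percentage terms. A quick arithmetic check using the rounded B2 = 0.26 from the SRF:

```python
# In a log-log model, d ln(X1) = b2 * d ln(X2): the slope is an elasticity.
b2 = 0.26                          # estimated coefficient on ln(X2)
x1_pct_change = b2 * 1.0           # approx % change in X1 for a 1% rise in X2
exact = (1.01 ** b2 - 1) * 100     # exact multiplicative effect of a 1% rise
print(f"approx {x1_pct_change:.2f}%, exact {exact:.3f}%")
```

So a 1% increase in X2 raises X1 by roughly 0.26%; the approximation and the exact multiplicative effect agree to three decimals for changes this small.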

2.2. Statistical significance of regression coefficients


Test the pair of hypotheses:

H0: βj = 0
H1: βj ≠ 0
(j = 2, ..., 7)

Test statistic: T = (β̂j − βj) / Se(β̂j) ~ T(n − k)

Rejection region: Wα = {T : |T| > t0.025 = 1.96} (normal approximation)

Tqs2 = 5.10 > 1.96 → Reject H0, accept H1 → B2 is statistically significant
Tqs3 = 2.42 > 1.96 → Reject H0, accept H1 → B3 is statistically significant
Tqs4 = 0.66 < 1.96 → Fail to reject H0 → B4 is not statistically significant
Tqs5 = 0.08 < 1.96 → Fail to reject H0 → B5 is not statistically significant
Tqs6 = 0.61 < 1.96 → Fail to reject H0 → B6 is not statistically significant
Tqs7 = 1.04 < 1.96 → Fail to reject H0 → B7 is not statistically significant
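Each t statistic is simply the coefficient divided by its standard error. The sketch below recomputes them from the Table 2 values and compares them against the exact Student-t cutoff (the text uses the normal approximation 1.96; with 43 residual degrees of freedom the exact cutoff is about 2.02, which does not change any conclusion):

```python
from scipy import stats

# Slope estimates and standard errors copied from Table 2.
coefs = {
    "ln_x2": (0.2603337, 0.0510629),
    "ln_x3": (0.3287415, 0.1357469),
    "ln_x4": (0.3108223, 0.4737788),
    "ln_x5": (0.0152426, 0.1959793),
    "ln_x6": (0.0955764, 0.1566919),
    "ln_x7": (-0.2479215, 0.2394942),
}

df = 50 - 7                          # n - k = 43 residual degrees of freedom
t_crit = stats.t.ppf(0.975, df)      # exact two-sided 5% cutoff, ~2.02

for name, (b, se) in coefs.items():
    t = b / se                       # T statistic under H0: beta_j = 0
    verdict = "reject H0" if abs(t) > t_crit else "fail to reject H0"
    print(f"{name}: t = {t:6.2f} -> {verdict}")
```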
Fix:
Eliminate X4, X5, X6 and X7; the regression is then re-estimated on the
remaining variables X2 and X3:
Ln(X1) = β1 + β2ln(X2) + β3ln(X3)
Table 3: regression model Ln(X1) = β1 + β2ln(X2) + β3ln(X3)

Dependent variable: Ln(X1)

Method: Least Squares

Date: 19/11/2024

Time: 1:00 PM

Included observations: 50
Source SS df MS Number of obs = 50
F( 2, 47) = 27.61
Model 3.61561629 2 1.80780814 Prob > F = 0.0000
Residual 3.07710252 47 .065470266 R-squared = 0.5402
Adj R-squared = 0.5207
Total 6.69271881 49 .136586098 Root MSE = .25587

ln_x1 Coef. Std. Err. t P>|t| [95% Conf. Interval]

ln_x2 .2383047 .0400974 5.94 0.000 .1576392 .3189702


ln_x3 .2994823 .1194239 2.51 0.016 .0592324 .5397322
_cons 3.99405 .4259659 9.38 0.000 3.137117 4.850983

(PRF): E(ln(X1)|X2,X3) = β1 + β2ln(X2) + β3ln(X3)

(SRF): Ln(X1) = 3.99 + 0.238ln(X2) + 0.299ln(X3)
Test the pair of hypotheses:

H0: βj = 0
H1: βj ≠ 0
(j = 2, 3)

Test statistic: T = (β̂j − βj) / Se(β̂j) ~ T(n − k)

Rejection region: Wα = {T : |T| > t0.025 = 1.96} (normal approximation)

Tqs2 = 5.94 > 1.96 → Reject H0, accept H1
→ B2 is statistically significant
Tqs3 = 2.51 > 1.96 → Reject H0, accept H1
→ B3 is statistically significant

3. Confidence interval for regression coefficients


The confidence interval for the regression coefficients is given by the
following formula:

β̂i − t(n−k)α/2 · Se(β̂i) < βi < β̂i + t(n−k)α/2 · Se(β̂i)

The confidence interval for the intercept is calculated as:

β̂1 − t(47)0.025 · Se(β̂1) < β1 < β̂1 + t(47)0.025 · Se(β̂1)

3.99 − 1.96 × 0.426 < β1 < 3.99 + 1.96 × 0.426

3.16 < β1 < 4.83

Holding reported violence and police funding constant, the intercept of the
crime-rate equation lies in the range (3.16; 4.83).

Similarly we have:

β̂2 − t(47)0.025 · Se(β̂2) < β2 < β̂2 + t(47)0.025 · Se(β̂2)

0.16 < β2 < 0.32

With annual police funding held constant, the elasticity of the crime rate
with respect to reported violence lies within the range (0.16; 0.32).

0.07 < β3 < 0.53

With the reported violent crime rate held constant, the elasticity of the
crime rate with respect to annual police funding lies in the range (0.07; 0.53).
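These intervals can be recomputed directly from the Table 3 estimates. The sketch below uses SciPy's Student-t quantile with 47 degrees of freedom rather than the 1.96 normal approximation, which reproduces the 95% bounds Stata prints in Table 3:

```python
from scipy import stats

n, k = 50, 3                          # 50 observations, 3 estimated coefficients
t_crit = stats.t.ppf(0.975, n - k)    # ~2.01 for 47 degrees of freedom

# Point estimates and standard errors from Table 3.
params = {
    "_cons": (3.99405, 0.4259659),
    "ln_x2": (0.2383047, 0.0400974),
    "ln_x3": (0.2994823, 0.1194239),
}

for name, (b, se) in params.items():
    lo, hi = b - t_crit * se, b + t_crit * se
    print(f"{name}: 95% CI = ({lo:.3f}; {hi:.3f})")
```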
4. Test the appropriateness of the model
Test the pair of hypotheses:

H0: R2 = 0

H1: R2 ≠ 0

F = (R2/(k−1)) / ((1−R2)/(n−k)) ~ F(2; 47)

Rejection region: Wα = {F : F > F0.05(2; 47) = 3.15}

We have: F = 27.61 ∈ Wα

→ Reject H0, accept H1

→ The model is appropriate

R2 = 0.5402 shows that the independent variables explain 54.02% of the
variation in the dependent variable.
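The overall F statistic follows directly from R² and the degrees of freedom; a minimal recomputation matching the F(2, 47) = 27.61 reported in Table 3:

```python
from scipy import stats

R2, n, k = 0.5402, 50, 3                   # values from Table 3
F = (R2 / (k - 1)) / ((1 - R2) / (n - k))  # overall-significance F statistic
F_crit = stats.f.ppf(0.95, k - 1, n - k)   # F0.05(2; 47), ~3.20
print(f"F = {F:.2f}, critical value = {F_crit:.2f}, reject H0: {F > F_crit}")
```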

5. Check the model for defects


5.1 Multicollinearity
5.1.1 Testing for multicollinearity
Assuming that X2 and X3 are linearly related, we run the auxiliary
regression:
Ln(X2) = β1 + β2ln(X3)

Dependent variable: Ln(X2)

Method: Least Squares


Date: 19/11/2024

Time: 1:20 PM

Included observations: 50

Source SS df MS Number of obs = 50


F( 1, 48) = 4.58
Model 3.88906193 1 3.88906193 Prob > F = 0.0374
Residual 40.720402 48 .848341709 R-squared = 0.0872
Adj R-squared = 0.0682
Total 44.6094639 49 .910397223 Root MSE = .92105

ln_x2 Coef. Std. Err. t P>|t| [95% Conf. Interval]

ln_x3 .8793954 .4107213 2.14 0.037 .053585 1.705206


_cons 2.899484 1.475121 1.97 0.055 -.0664458 5.865414

(SRF): Ln(X2) = 2.89 + 0.88ln(X3)


Test the pair of hypotheses:

H0: R2 = 0

H1: R2 ≠ 0

F = (R2/1) / ((1−R2)/48) ~ F(1; 48)

Rejection region: Wα = {F : F > F0.05(1; 48) = 4.08}

We have: F = 4.58 ∈ Wα

→ Multicollinearity occurs
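An equivalent way to gauge severity is the variance inflation factor implied by the auxiliary R². A quick check (note that while the auxiliary F test is significant at the 5% level, the implied VIF is far below the usual rule-of-thumb cutoff of 10, so the detected collinearity is mild):

```python
# Variance inflation factor from the auxiliary regression ln(X2) on ln(X3).
# R^2_aux = 0.0872 is taken from the table above.
r2_aux = 0.0872
vif = 1.0 / (1.0 - r2_aux)     # VIF = 1 / (1 - R^2_aux)
print(f"VIF = {vif:.2f}")      # a common rule of thumb flags VIF > 10
```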

Fix

Remove X3 and re-estimate the regression model Ln(X1) = β1 + β2ln(X2)


Dependent variable: Ln(X1)

Method: Least Squares

Date: 19/11/2024

Time: 1:45 PM

Included observations: 50

Source SS df MS Number of obs = 50


F( 1, 48) = 44.08
Model 3.20389451 1 3.20389451 Prob > F = 0.0000
Residual 3.4888243 48 .07268384 R-squared = 0.4787
Adj R-squared = 0.4679
Total 6.69271881 49 .136586098 Root MSE = .2696

ln_x1 Coef. Std. Err. t P>|t| [95% Conf. Interval]

ln_x2 .2679943 .040365 6.64 0.000 .186835 .3491537


_cons 4.885961 .2469886 19.78 0.000 4.389357 5.382565

(SRF): Ln(X1) = 4.88+ 0.26ln(X2)


Test the pair of hypotheses:

H0: R2 = 0

H1: R2 ≠ 0

F = (R2/1) / ((1−R2)/48) ~ F(1; 48)

Rejection region: Wα = {F : F > F0.05(1; 48) = 4.08}

We have: F = 44.08 ∈ Wα

→ Reject H0, accept H1 → The model is appropriate

R2 = 0.4787 shows that the independent variable explains 47.87% of the
variation in the dependent variable.
