Data Final Regression
1 Panel OLS
[32]: import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
data = pd.read_csv('/Users/isaacbougherab/Desktop/nour1_1.csv')
data_reset = data.reset_index()
print(model.summary())
                          coef   std err        t   P>|t|     [0.025     0.975]
C(company_id)[T.3]   1696.7247   228.659    7.420   0.000   1246.616   2146.833
C(company_id)[T.4]    959.7391   243.509    3.941   0.000    480.398   1439.080
C(company_id)[T.5]    682.1991   321.635    2.121   0.035     49.069   1315.330
C(company_id)[T.6]   1784.4294   200.859    8.884   0.000   1389.044   2179.815
C(company_id)[T.7]    579.2368   199.973    2.897   0.004    185.595    972.878
C(company_id)[T.8]    726.8245   211.744    3.433   0.001    310.012   1143.637
C(company_id)[T.9]    431.4618   208.545    2.069   0.039     20.946    841.977
C(company_id)[T.10]   980.8839   323.823    3.029   0.003    343.448   1618.320
C(company_id)[T.11]   747.9023   409.535    1.826   0.069    -58.256   1554.061
C(company_id)[T.12]    71.7132   449.902    0.159   0.873   -813.907    957.333
C(company_id)[T.13]   526.5121   242.000    2.176   0.030     50.143   1002.882
C(company_id)[T.14]   252.8241   192.842    1.311   0.191   -126.781    632.429
C(company_id)[T.15]   792.2757   384.911    2.058   0.040     34.589   1549.962
C(company_id)[T.16]  1265.7232   583.550    2.169   0.031    117.021   2414.425
C(company_id)[T.17]   802.2170   420.656    1.907   0.058    -25.833   1630.267
C(company_id)[T.18]  1032.9546   207.091    4.988   0.000    625.303   1440.607
C(company_id)[T.19]   501.1618   278.328    1.801   0.073    -46.719   1049.042
C(company_id)[T.20]   895.4253   218.496    4.098   0.000    465.322   1325.529
C(company_id)[T.21]   559.2011   186.371    3.000   0.003    192.334    926.068
C(company_id)[T.22]   539.3729   192.305    2.805   0.005    160.826    917.919
C(company_id)[T.23]   581.4894   232.680    2.499   0.013    123.466   1039.513
C(company_id)[T.24]  1280.7350   308.707    4.149   0.000    673.054   1888.416
C(company_id)[T.25]   217.5453   200.044    1.087   0.278   -176.236    611.327
C(company_id)[T.26]    96.4090   185.117    0.521   0.603   -267.989    460.807
C(company_id)[T.27]   392.1751   184.590    2.125   0.034     28.815    755.535
C(company_id)[T.28]  3252.6314   231.511   14.050   0.000   2796.908   3708.355
C(company_id)[T.29]  1101.5438   306.117    3.598   0.000    498.962   1704.126
C(company_id)[T.30]   177.4431   171.402    1.035   0.301   -159.957    514.843
C(company_id)[T.31]   857.0456   379.330    2.259   0.025    110.345   1603.746
C(company_id)[T.32]   -31.4485   320.476   -0.098   0.922   -662.297    599.400
C(company_id)[T.33]   517.0795   272.877    1.895   0.059    -20.072   1054.231
C(year_id)[T.2014]    -35.3306    76.691   -0.461   0.645   -186.294    115.633
C(year_id)[T.2015]      4.8012    76.507    0.063   0.950   -145.800    155.403
C(year_id)[T.2016]     42.2214    76.462    0.552   0.581   -108.291    192.734
C(year_id)[T.2017]    129.3250    76.690    1.686   0.093    -21.637    280.287
C(year_id)[T.2018]    151.9537    78.648    1.932   0.054     -2.863    306.771
C(year_id)[T.2019]    159.2207    78.128    2.038   0.042      5.428    313.013
C(year_id)[T.2020]    181.6642    80.725    2.250   0.025     22.759    340.569
C(year_id)[T.2021]    192.7232    82.584    2.334   0.020     30.159    355.288
C(year_id)[T.2022]    180.6225    85.513    2.112   0.036     12.293    348.952
ROA                    -1.6138     1.592   -1.014   0.312     -4.747      1.520
Leverage               -0.8246     0.162   -5.082   0.000     -1.144     -0.505
div                   161.1431    92.554    1.741   0.083    -21.047    343.333
independecy            -1.5112     1.548   -0.976   0.330     -4.559      1.537
gender_composition     14.7329     3.572    4.125   0.000      7.702     21.763
size                  315.0929   115.371    2.731   0.007     87.989    542.197
Q("board size")       -28.5152    24.609   -1.159   0.248    -76.957     19.926
==============================================================================
Omnibus: 160.132 Durbin-Watson: 0.637
Prob(Omnibus): 0.000 Jarque-Bera (JB): 8650.525
Skew: -1.185 Prob(JB): 0.00
Kurtosis: 28.008 Cond. No. 3.85e+04
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly
specified.
[2] The condition number is large, 3.85e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
import numpy as np
import matplotlib.pyplot as plt

# specified_variables and significance_labels are defined elsewhere in the notebook.
p_values = model.pvalues
filtered_p_values = p_values[specified_variables]

# Sort by position so each label stays aligned with its p-value.
sorted_indices = np.argsort(filtered_p_values.to_numpy())
sorted_p_values = filtered_p_values.iloc[sorted_indices]
sorted_names = sorted_p_values.index
sorted_significance = [significance_labels[i] for i in sorted_indices]

plt.figure(figsize=(8, 6))
bars = plt.bar(sorted_names, sorted_p_values, color='skyblue')
plt.xticks(rotation=45)
plt.xlabel('Variables')
plt.ylabel('p-values')
plt.title('P-values and Significance Levels')
# Dashed red line marks the 5% significance threshold.
plt.axhline(y=0.05, color='r', linestyle='--', linewidth=0.7)
# Annotate each bar with its significance label.
for bar, label in zip(bars, sorted_significance):
    yval = bar.get_height()
    plt.text(bar.get_x() + bar.get_width() / 2.0, yval, label, ha='center',
             va='bottom', fontsize=8)
plt.tight_layout()
plt.show()
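from statsmodels.stats.outliers_influence import variance_inflation_factor

# Add an intercept column, then compute one VIF per column of X.
X = sm.add_constant(X)
vifdata = pd.DataFrame()
vifdata["Variable"] = X.columns
vifdata["VIF"] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
latex_safe_print(vifdata)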
             Variable        VIF
0               const  47.315123
1                 ROA   1.069206
2            Leverage   1.226521
3                 div   1.381452
4         independecy   1.351410
5  gender_composition   1.151373
6                size   2.252770
7          board size   2.021347
const: The VIF for the constant is very high (47.315123), but the VIF of the intercept is not normally read as a multicollinearity diagnostic.
ROA: With a VIF of 1.069206, this variable shows no sign of multicollinearity.
Leverage: A VIF of 1.226521 indicates no multicollinearity problem.
div: The VIF is 1.381452, well below the common thresholds of 5 or 10, so multicollinearity is not a concern.
independecy: With a VIF of 1.351410, this variable does not suffer from multicollinearity.
gender_composition: The VIF of 1.151373 is low, indicating no multicollinearity issue.
size: A VIF of 2.252770 is the highest among the regressors but remains well below the thresholds of concern.
board size: The VIF of 2.021347 is likewise well below the thresholds, so this variable does not suffer from significant multicollinearity.
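from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson, jarque_bera

residuals = model.resid
exog = model.model.exog
# Breusch-Pagan test for heteroskedasticity.
bp_test = het_breuschpagan(residuals, exog)
# Label names are assumed; the order matches het_breuschpagan's return value.
labels = ['LM Statistic', 'LM-Test p-value', 'F-Statistic', 'F-Test p-value']
bp_test_rounded = dict(zip(labels, [round(val, 2) for val in bp_test]))
print(bp_test_rounded)
# Jarque-Bera test for normality of the residuals.
jb_test = jarque_bera(residuals)
# Durbin-Watson statistic for first-order autocorrelation.
dw_statistic = durbin_watson(residuals)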
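The Durbin-Watson statistic reported in the summary (0.637) is far below 2. As a rough guide, DW is approximately 2(1 - rho), where rho is the first-order autocorrelation of the residuals, so values near 2 suggest no autocorrelation and values near 0 suggest strong positive autocorrelation. The sketch below computes the statistic by hand on synthetic series; the AR(1) coefficient of 0.7 and the name manual_dw are assumptions chosen for illustration only.

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
n = 2000
white = rng.normal(size=n)  # uncorrelated "residuals"

# AR(1) series with positive autocorrelation (coefficient 0.7 is an assumption).
ar1 = np.zeros(n)
for i in range(1, n):
    ar1[i] = 0.7 * ar1[i - 1] + white[i]


def manual_dw(e):
    """DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)


dw_white = manual_dw(white)
dw_ar1 = manual_dw(ar1)
print(f"white noise: {dw_white:.3f}, AR(1): {dw_ar1:.3f}")
```

In a pooled panel the statistic also depends on how the rows are ordered, so it is best read as a warning sign rather than a precise estimate of serial correlation.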
2 OLS demonstration
[29]: import pandas as pd
import statsmodels.api as sm
import numpy as np
from scipy.stats import t
company_dummies = company_dummies.astype(int)
year_dummies = year_dummies.astype(int)
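# X'X
XTX = np.dot(X.T, X)
# (X'X)^-1
XTX_inv = np.linalg.inv(XTX)
# X'Y
XTY = np.dot(X.T, Y)
# beta = (X'X)^-1 X'Y
beta = np.dot(XTX_inv, XTY)
n, k = X.shape
y_pred = np.dot(X, beta)
residuals = Y - y_pred
# Residual variance estimate with n - k degrees of freedom
mse = np.sum(residuals**2) / (n - k)
se = np.sqrt(np.diagonal(mse * XTX_inv))
t_values = beta / se
# Two-sided p-values from the t distribution with n - k degrees of freedom
p_values = 2 * (1 - t.cdf(np.abs(t_values), df=n - k))
# R²
ss_total = np.sum((Y - np.mean(Y))**2)
ss_res = np.sum(residuals**2)
r_squared = 1 - ss_res / ss_total
# Adjusted R²
r_squared_adj = 1 - (1 - r_squared) * (n - 1) / (n - k)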
results_df = pd.DataFrame({
    'Coefficient': beta,
    'Std. Error': se,
    't-value': t_values,
    'P>|t|': p_values
}, index=['const'] + list(X_original.columns) + list(company_dummies.columns)
         + list(year_dummies.columns))
additional_stats = {
    'R-squared': r_squared,
    'Adj. R-squared': r_squared_adj,
    'No. Observations': n
}
results_df['P>|t|'] = results_df['P>|t|'].round(4)
print(results_df)
print("\nAdditional Statistics:")
for stat, value in additional_stats.items():
    print(f"{stat}: {value:.3f}")
company_12 71.713155 449.902047 0.159397 0.8735
company_13 526.512143 241.999528 2.175674 0.0304
company_14 252.824070 192.842361 1.311040 0.1909
company_15 792.275716 384.910750 2.058336 0.0405
company_16 1265.723225 583.549915 2.169006 0.0309
company_17 802.217004 420.655878 1.907062 0.0575
company_18 1032.954570 207.090541 4.987937 0.0000
company_19 501.161788 278.327716 1.800618 0.0728
company_20 895.425312 218.496163 4.098128 0.0001
company_21 559.201149 186.371414 3.000466 0.0029
company_22 539.372896 192.304671 2.804783 0.0054
company_23 581.489429 232.679506 2.499100 0.0130
company_24 1280.735002 308.706780 4.148710 0.0000
company_25 217.545348 200.044300 1.087486 0.2778
company_26 96.408979 185.117340 0.520799 0.6029
company_27 392.175069 184.589913 2.124575 0.0345
company_28 3252.631383 231.511421 14.049550 0.0000
company_29 1101.543797 306.116541 3.598446 0.0004
company_30 177.443145 171.401973 1.035246 0.3014
company_31 857.045569 379.329949 2.259367 0.0246
company_32 -31.448507 320.475929 -0.098131 0.9219
company_33 517.079466 272.877189 1.894916 0.0591
year_2014 -35.330639 76.690696 -0.460690 0.6454
year_2015 4.801188 76.506748 0.062755 0.9500
year_2016 42.221361 76.461543 0.552191 0.5813
year_2017 129.324965 76.689925 1.686336 0.0928
year_2018 151.953732 78.648230 1.932068 0.0544
year_2019 159.220694 78.127940 2.037948 0.0425
year_2020 181.664185 80.724961 2.250409 0.0252
year_2021 192.723196 82.584161 2.333658 0.0203
year_2022 180.622451 85.512668 2.112230 0.0356
Additional Statistics:
R-squared: 0.836
Adj. R-squared: 0.808
No. Observations: 329.000