Econometrics Project
Life expectancy in Europe
Professor: Daniela erban
Student: Brlic Denisa Andreea
Group 136, Series B
Academy of Economic Studies
Faculty of Business Administration
TABLE OF CONTENTS
INTRODUCTION
SHORT THEORY
HYPOTHESIS TESTING
Problem 1
Problem 2
SIMPLE LINEAR REGRESSION
MULTIPLE LINEAR REGRESSION
CONCLUSION
REFERENCES
ANNEX
This project applies the concepts studied in the Econometrics course to real-life data samples, exploring
and analyzing the relationships between variables. The study is divided into three parts, each
revolving around one concept: the first part presents hypothesis testing, the second analyzes a simple
regression, and the third analyzes a multiple regression. The simple regression is built on a database
describing the relationship between Life expectancy and Total health expenditure per capita in the
European countries with a very high Human Development Index. For the multiple regression, a further
variable, Physicians density per 1000 population, enters the equation.
Introduction
This project analyzes both a simple and a multiple regression on a database describing the relation
between life expectancy, total health expenditure per capita and physicians density per 1000
population in those European countries with a very high Human Development Index (> 0.8), and it also
performs hypothesis tests. For this purpose I used a database from the World Health Organization,
which covers 30 European countries.
The project has the following structure: first two hypothesis tests are proposed, and then the simple
and multiple regressions are analyzed.
For a better understanding of the three variables used in the analysis a short description is needed.
Life expectancy is the average number of years the members of a population live in a given period. It is
usually reported separately for males and females, and is influenced by factors such as the quality of
medicine, hygiene, wars, etc., although today it usually refers only to persons who die a non-violent
death. Life expectancy at birth is an estimate of the average number of years to be lived by a group of
people born in the same year, if the mortality rates of the evaluated region remained constant. It is one
of the most common indicators of the quality of life, although it is difficult to measure.
Physicians density per 1000 population represents the number of physicians (medical doctors) per 1000
population. One of the most important challenges for a country's health system is preparing the health
workforce to work towards the attainment of its health objectives. Methodologically, there is no gold
standard for determining a health workforce sufficient to address the health care needs of a given
population.
Total health expenditure per capita is the sum of public and private health expenditure divided by
population. It covers the provision of health services (preventive and curative), family planning activities,
nutrition activities, and emergency aid designated for health but does not include provision of water and
sanitation.
Short theory
Methods used: hypothesis testing and two regression models (simple and multiple).
A statistical hypothesis test is a method for judging whether an assumption about a statistical
population is consistent with the data observed in a sample drawn from that population. A hypothesis is
an assumption. We have an initial hypothesis (the null hypothesis), a second hypothesis (the alternative
hypothesis; note that alternative does not always mean opposite) and an event. We want to decide
whether the null hypothesis can be rejected in favor of the alternative hypothesis. Two types of error
are possible: if we reject the null hypothesis when it is true, we commit a Type I error; if we fail to
reject the null hypothesis when it is in fact false, we commit a Type II error.
Linear regression is a statistical tool used in many scientific studies, and regression analysis is a
method for measuring the relationship between two or more phenomena. For example, if we want to
know the connection between the area of a house in square meters and its price, or between the time
students spend on their French homework and the grades they receive, regression analysis can quantify
these links. However, correlation does not imply causation; careful studies are needed to establish
meaningful cause-effect relations. There are two types of regression model: simple and multiple. The
simple linear regression model explains the relationship between two variables, one dependent (the
regressand) and one independent (the regressor), using a first-degree function. The multiple linear
regression model analyzes the impact of several simultaneous influences by fitting a linear equation to
the observed data. It involves more than two variables: one dependent (the regressand) and two or
more independent (the regressors).
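To make the idea of fitting a first-degree function concrete, here is a minimal sketch of the least squares formulas for a simple regression. The helper name `fit_simple_ols` and the house area and price figures are hypothetical illustrations, not data from this project.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: house area (square meters) and price (thousands).
areas = [50, 70, 90, 110]
prices = [100, 140, 180, 220]  # built to lie exactly on price = 2 * area

intercept, slope = fit_simple_ols(areas, prices)
print(intercept, slope)
```

Because the toy data lie exactly on a line, the fitted slope recovers the price per square meter; with real data the points scatter around the line and the residuals measure the part left unexplained.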
Hypothesis testing
1. According to the World Bank database, the average total health expenditure per capita in
Europe is 3.36 with a standard deviation of 1.09 (n=40 countries). According to a database from
the World Health Organization, the average total health expenditure per capita in Europe is 3.37
with a standard deviation of 1.06 (n=40 countries). Apply a relevant test in order to find out if
there is a significant difference between the two sources of data.
Steps:
Define the hypothesis
H0 (null hypothesis): There is no difference between the two sources of data
H1 (alternative hypothesis): There is a significant difference between the two sources of data
Formalize the hypothesis
$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 - \mu_2 \neq 0$
Establish the type of test
Two-sided test on two means
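Choose $\alpha = 5\%$
Establish critical values and rejection region
Cut-off values = ±1.96
Rejection Region = $(-\infty; -1.96) \cup (1.96; +\infty)$
Compute $Z_{calc}$
$Z_{calc} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{3.36 - 3.37}{\sqrt{\frac{1.09^2}{40} + \frac{1.06^2}{40}}} = -0.04$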
Interpretation of result
-1.96 < -0.04 < 1.96
$Z_{calc}$ does not belong to the rejection region, therefore we cannot reject H0 at the 5% significance
level.
Conclusion
There is no significant difference between the two sources of information.
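The steps of Problem 1 can be checked numerically. The following is a minimal sketch using the figures quoted in the problem; the function name `two_sample_z` is a hypothetical helper, and the 1.96 cut-off is the one used in the text for a two-sided test at the 5% level.

```python
import math


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Z statistic for the difference between two independent sample means."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# World Bank figures first, World Health Organization figures second.
z_calc = two_sample_z(3.36, 1.09, 40, 3.37, 1.06, 40)

CRITICAL_VALUE = 1.96  # two-sided cut-off at the 5% significance level
reject_h0 = abs(z_calc) > CRITICAL_VALUE

print(round(z_calc, 2), reject_h0)  # -0.04 False
```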
2. The World Health Organization believes that 20% of European countries have a Life expectancy
below 76 years. To test this assumption, a World Health Organization database with a sample of
40 countries was used. It was found that 11 countries out of 40 have a Life expectancy below
76 years.
$\pi_0 = 20\%$, $n = 40$, $p = 11/40 \times 100 = 27.5\%$
Steps:
Define the hypothesis
H0 (null hypothesis): The percentage of countries with a Life expectancy below 76 years is 20%
H1 (alternative hypothesis): The percentage of countries with a Life expectancy below 76 years is
different from 20%
Formalize the hypothesis
$H_0: \pi = 20\%$
$H_1: \pi \neq 20\%$
Establish the type of test
Two-sided test
Choose $\alpha = 5\%$
Establish critical values and rejection region
Cut-off values = ±1.96
Rejection Region = $(-\infty; -1.96) \cup (1.96; +\infty)$
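Compute $Z_{calc}$
$Z_{calc} = \frac{p - \pi_0}{\sqrt{\frac{\pi_0 (1 - \pi_0)}{n}}} = \frac{0.275 - 0.20}{\sqrt{\frac{0.20 \times 0.80}{40}}} \approx 1.19$
Interpretation of result
-1.96 < 1.19 < 1.96
$Z_{calc}$ falls into the non-rejection region. At the 5% significance level we do not have enough
sample evidence to reject the null hypothesis.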
Conclusion
The sample evidence is compatible with the World Health Organization's belief.
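The proportion test of Problem 2 can be sketched the same way. The function name `one_proportion_z` is a hypothetical helper; the standard error uses the hypothesized proportion of 20%, which is the usual convention for this test and is assumed here.

```python
import math


def one_proportion_z(successes, n, p0):
    """Z statistic for a sample proportion against a hypothesized proportion p0."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error


# 11 of the 40 sampled countries have a life expectancy below 76 years.
z_calc = one_proportion_z(11, 40, 0.20)

CRITICAL_VALUE = 1.96  # two-sided cut-off at the 5% significance level
reject_h0 = abs(z_calc) > CRITICAL_VALUE

print(round(z_calc, 3), reject_h0)
```

The statistic stays inside the interval (-1.96; 1.96), so the decision not to reject the null hypothesis follows.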
Simple linear regression model
Step 1: Identifying and specifying the variables
In this case we will have the following variables recorded for a sample of 30 countries:
Dependent variable (the regressand) Y: Life expectancy
Independent variable (the regressor) X: Total health expenditure per capita
Step 2: Characterize the correlation between the dependent and independent variables
Regression Statistics
Multiple R 0.79469895
R Square 0.631546422
Adjusted R Square 0.618387365
Standard Error 1.887251421
Observations 30
Multiple R is the correlation coefficient between the actual (observed) and predicted values. It ranges
from 0 to 1. Here its value is 0.79469895, which is close to 1 and therefore indicates a fairly strong
correlation between Life expectancy and Total health expenditure per capita: the regressor has a
strong influence on the dependent variable.
R Squared ($R^2$) is the proportion of the variance in the dependent variable explained by the
independent variables. In this case $R^2$ is 0.631546422, which is greater than 0.5 and indicates a
relatively strong relationship between the variables: 63% of the variation in Life expectancy is
explained by the regressor.
Adjusted R Square measures the share of the variation in Life expectancy explained by the Total health
expenditure per capita, adjusted for the number of regressors and the sample size. Adjusted $R^2$ is
0.618387365, which means that about 62% of the variation is explained by the regressor.
The standard error of the regression is 1.8872: on average, the actual Life expectancy values differ
from the values predicted by the linear function by about 1.89 years.
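The regression statistics above are linked by simple formulas, which can be verified from the reported R Square alone. This is a minimal sketch, assuming the standard definitions of Multiple R (square root of R Square) and Adjusted R Square with n = 30 observations and k = 1 regressor.

```python
import math

r_squared = 0.631546422  # R Square reported in the regression statistics
n = 30                   # observations
k = 1                    # regressors in the simple model

# Multiple R is the square root of R Square.
multiple_r = math.sqrt(r_squared)

# Adjusted R Square penalizes R Square for the number of regressors.
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(round(multiple_r, 6), round(adjusted_r_squared, 6))
```

Both values agree with the reported output (0.79469895 and 0.618387365) up to rounding.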
Step 3: Type of the regression model
The life expectancy is described by the total health expenditure per capita. Based on the sample outputs
we can write the least squares regression equation which has the following general form:
$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where:
$Y_i$ = estimated (or predicted) y value for observation i;
$\beta_0$ = estimate of the regression intercept;
$\beta_1$ = estimate of the regression slope;
$x_i$ = value of x for observation i;
$\varepsilon_i$ = error.
The linear function obtained from a sample of 30 countries is:
Life expectancy = 74.00609 + 0.0017 × Total health expenditure per capita + ε
The intercept is 74.00609, which means that if the total health expenditure per capita were 0, the
predicted life expectancy would be 74.00609 years.
Because the slope is positive, Total health expenditure per capita has a positive influence on Life
expectancy: an increase of 1$ in the total health expenditure per capita leads to an increase of
0.0017 years in the Life expectancy.
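To see what the fitted line implies in practice, the following sketch evaluates it at a chosen spending level. The coefficients are the unrounded values from the regression output; the function name and the 3000$ input are hypothetical choices for illustration.

```python
INTERCEPT = 74.00609191  # estimated intercept (years)
SLOPE = 0.001708782      # estimated slope (years per 1$ of expenditure)


def predict_life_expectancy(health_expenditure):
    """Predicted life expectancy (years) for a given health expenditure per capita ($)."""
    return INTERCEPT + SLOPE * health_expenditure


# Hypothetical country spending 3000$ per capita on health.
predicted = predict_life_expectancy(3000)

# Effect of spending 1000$ more per capita, according to the fitted slope.
gain_per_1000 = predict_life_expectancy(4000) - predict_life_expectancy(3000)

print(round(predicted, 2), round(gain_per_1000, 2))
```

Expressing the slope per 1000$ (about 1.71 years) is easier to interpret than the per-dollar value of 0.0017 years.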
Step 4: Is the model valid for the sample of 30 countries?
Step 1: We have the following hypotheses:
H0: The model is not statistically valid, meaning that the predicted Life expectancy values all have the
same level.
$\hat{y}_1 = \hat{y}_2 = \dots = \hat{y}_{30}$
H1: The model is valid, meaning we can identify at least two predicted Life expectancy values which are
different, therefore the slope is not zero.
$\hat{y}_i \neq \hat{y}_j$ for at least one pair $i \neq j$
Step 2: Overall variations
We chose the significance level $\alpha = 5\%$ (the chosen probability of committing a Type I error).
SS Residual (residual sum of squares) is 99.728 and represents the sum of the squared errors from the
regression.
Step 3: Average variation:
MS regression = $\frac{\text{SS regression}}{k}$ = 170.938
MS residual = $\frac{\text{SS residual}}{n - k - 1}$ = 3.561, where:
n = sample size
k = number of regressors
                                     Coefficients  Standard Error  t Stat    P-value   Lower 95%    Upper 95%  Lower 95.0%  Upper 95.0%
Intercept                            74.00609191   0.842642385     87.82622  9.82E-36  72.28001725  75.732167  72.2800173   75.7321666
Total health expenditure per capita  0.001708782   0.000246659     6.927719  1.57E-07  0.001203525  0.002214   0.00120352   0.00221404
Step 4: Fisher Test:
$F_{calc} = \frac{\text{MS regression}}{\text{MS residual}} = \frac{170.938}{3.5617} \approx 47.9932$
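The Fisher statistic can be recomputed from the sums of squares. This is a minimal sketch: SS regression is taken as 170.938, which follows from MS regression with k = 1, and the t statistic of the slope is read from the coefficients table. In a simple regression F equals the square of that t statistic, which gives an independent check.

```python
ss_regression = 170.938  # equals MS regression because k = 1
ss_residual = 99.728     # residual sum of squares from the text
n = 30                   # sample size
k = 1                    # number of regressors

ms_regression = ss_regression / k
ms_residual = ss_residual / (n - k - 1)
f_calc = ms_regression / ms_residual

# Cross-check: with a single regressor, F is the squared t statistic of the slope.
t_stat_slope = 6.927719
f_from_t = t_stat_slope ** 2

print(round(ms_residual, 4), round(f_calc, 2), round(f_from_t, 2))
```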
Step 5: Significance F:
The significance F is 1.56872E-07, which in Excel's scientific notation means that the
significance F has the value $1.56872 \times 10^{-7}$.
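Multiple linear regression model
$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$, where:
$Y_i$ = estimated (or predicted) y value for observation i;
$\beta_0$ = estimate of the regression intercept;
$\beta_1$ = estimate of the regression slope caused by $x_1$;
$\beta_2$ = estimate of the regression slope caused by $x_2$;
$x_{1i}$, $x_{2i}$ = values of $x_1$ and $x_2$ for observation i;
$\varepsilon_i$ = error.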
The linear function obtained from a sample of 30 countries is:
Life expectancy = 72.62 + 0.4423 × Physicians density + 0.0016 × Total health expenditure per capita + ε
The intercept is 72.62, which means that if both physicians density/1000 population and total health
expenditure per capita were 0, the predicted life expectancy would be 72.62 years.
Because both slopes are positive, both regressors have a positive influence on Life expectancy. Holding
the other regressor constant, an increase of 1$ in the total health expenditure per capita leads to an
increase of 0.0016 years in the Life expectancy, while one additional physician per 1000 population
leads to an increase of 0.44 years in the Life expectancy.
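The two-regressor equation can be evaluated in the same way as the simple one. This sketch uses the unrounded coefficients from the regression output; the function name and the input values (3.5 physicians per 1000 population, 3000$ per capita) are hypothetical.

```python
INTERCEPT = 72.62001037     # estimated intercept (years)
B_PHYSICIANS = 0.442391061  # years per additional physician per 1000 population
B_EXPENDITURE = 0.001655014 # years per additional 1$ of health expenditure per capita


def predict_life_expectancy(physicians_density, health_expenditure):
    """Predicted life expectancy (years) from the two regressors."""
    return (INTERCEPT
            + B_PHYSICIANS * physicians_density
            + B_EXPENDITURE * health_expenditure)


# Hypothetical country: 3.5 physicians per 1000 population, 3000$ per capita.
predicted = predict_life_expectancy(3.5, 3000)

# Holding expenditure fixed, one more physician per 1000 adds the physician slope.
physician_effect = predict_life_expectancy(4.5, 3000) - predicted

print(round(predicted, 2), round(physician_effect, 2))
```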
Step 4: Is the model valid for the sample of 30 countries?
Step 1: We have the following hypotheses:
H0: The model is not statistically valid, meaning that the predicted Life expectancy values all have the
same level.
$\hat{y}_1 = \hat{y}_2 = \dots = \hat{y}_{30}$
H1: The model is valid, meaning we can identify at least two predicted Life expectancy values which are
different, therefore at least one slope is not zero.
$\hat{y}_i \neq \hat{y}_j$ for at least one pair $i \neq j$
                                                  Coefficients  Standard Error  t Stat    P-value   Lower 95%     Upper 95%  Lower 95.0%  Upper 95.0%
Intercept                                         72.62001037   1.725478392     42.08688  3.7E-26   69.07962119   76.1604    69.0796212   76.16039955
Physicians density (physicians/1,000 population)  0.442391061   0.480170904     0.92132   0.365043  -0.542838241  1.4276204  -0.5428382   1.427620363
Total health expenditure per capita               0.001655014   0.000254119     6.512744  5.54E-07  0.001133604   0.0021764  0.0011336    0.002176424
Step 2: Overall variations
We chose the significance level $\alpha = 5\%$ (the chosen probability of committing a Type I error).
SS Residual (residual sum of squares) is 96.688 and represents the sum of the squared errors from the
regression.
Step 3: Average variation:
MS regression = $\frac{\text{SS regression}}{k}$ = 86.989
MS residual = $\frac{\text{SS residual}}{n - k - 1}$ = 3.581, where:
n = sample size
k = number of regressors
Step 4: Fisher Test:
$F_{calc} = \frac{\text{MS regression}}{\text{MS residual}} = \frac{86.989}{3.581} \approx 24.2915$
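Since both models explain the same dependent variable on the same 30 countries, their total sums of squares should coincide. The sketch below checks this from the reported figures and recomputes F for the multiple model. The helper name `sums_of_squares` is hypothetical, and the R Square printed at the end is derived here from the sums of squares rather than quoted from the regression output.

```python
def sums_of_squares(ms_regression, ss_residual, k):
    """Recover SS regression and SS total from MS regression, SS residual and k."""
    ss_regression = ms_regression * k
    ss_total = ss_regression + ss_residual
    return ss_regression, ss_total


n = 30

# Simple model (k = 1) and multiple model (k = 2), figures from the text.
ss_reg_simple, ss_total_simple = sums_of_squares(170.938, 99.728, 1)
ss_reg_multiple, ss_total_multiple = sums_of_squares(86.989, 96.688, 2)

# Fisher statistic of the multiple model.
ms_residual_multiple = 96.688 / (n - 2 - 1)
f_multiple = 86.989 / ms_residual_multiple

# Derived (not quoted) share of variation explained by the multiple model.
r_squared_multiple = ss_reg_multiple / ss_total_multiple

print(round(ss_total_simple, 3), round(ss_total_multiple, 3))
print(round(f_multiple, 2), round(r_squared_multiple, 4))
```

Both totals come out at about 270.666, which confirms that the two ANOVA decompositions are consistent with each other.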
Step 5: Significance F:
The significance F is 1.12E-11, which in Excel's scientific notation means that the significance F
has the value $1.12 \times 10^{-11}$. Because this is far below the 5% significance level, we have
strong statistical support to reject the null hypothesis stating that the slopes are 0, and we can
extend the inference from the sample to the entire population at the 95% confidence level.
The confidence interval for the slope coefficient B1 (Physicians density) is (-0.542838, 1.42762) at the
95% confidence level. Because this interval contains 0, the slope in the population may be 0: we cannot
reject the null hypothesis that the slope is 0, and the effect of Physicians density on Life expectancy
is not statistically significant.
The confidence interval for the slope coefficient B2 (Total health expenditure per capita) is
(0.001133, 0.002176) at the 95% confidence level and does not contain the value 0. Consequently, we
can extend the results of the sample to the entire population.
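The two confidence intervals can be rebuilt from the coefficients and standard errors in the table as coefficient ± t × standard error. This is a sketch under one stated assumption: the critical value 2.0518 is the standard two-sided 95% Student's t value for n - k - 1 = 27 degrees of freedom, taken from a t table and not from the original output. The helper names are hypothetical.

```python
# Assumption: two-sided 95% critical value of Student's t with 27 degrees of freedom.
T_CRITICAL = 2.0518


def confidence_interval(coefficient, standard_error, t_critical=T_CRITICAL):
    """Return (lower, upper) bounds of the confidence interval for a coefficient."""
    margin = t_critical * standard_error
    return coefficient - margin, coefficient + margin


def contains_zero(interval):
    """True when 0 lies inside the interval, i.e. the slope is not significant."""
    lower, upper = interval
    return lower <= 0.0 <= upper


ci_physicians = confidence_interval(0.442391061, 0.480170904)
ci_expenditure = confidence_interval(0.001655014, 0.000254119)

print(ci_physicians, contains_zero(ci_physicians))
print(ci_expenditure, contains_zero(ci_expenditure))
```

The interval for Physicians density straddles 0 while the one for Total health expenditure per capita does not, which matches the interpretation given above.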
From the Total health expenditure Line Fit Plot we can see that the line does not fit the scattered
points very well. Because the points are not tightly clustered around the line, the relationship is of
medium intensity.
From the Physicians density/1000 population Line Fit Plot we can see that the variances are constant,
that is, the points are homoscedastic.
[Figure: Physicians density (physicians/1,000 population) Line Fit Plot. x-axis: Physicians density (physicians/1,000 population); y-axis: Life expectancy; series: Life expectancy, Predicted Life expectancy]
[Figure: Total health expenditure per capita Line Fit Plot. x-axis: Total health expenditure per capita; y-axis: Life expectancy; series: Life expectancy, Predicted Life expectancy]
Conclusion
In conclusion, the analysis of both the simple and the multiple regression indicates a positive
relationship between Life expectancy and Total health expenditure per capita for the 30 European
countries that have a very high Human Development Index. We also found that the effect of Physicians
density per 1000 population on Life expectancy is not statistically significant, since its confidence
interval contains the value 0. There is a high correlation between Life expectancy and Total health
expenditure per capita, and also between the two regressors taken together and Life expectancy, for
these 30 countries.
The sample evidence is compatible with the assumption that Life expectancy is below 76 years in 20% of
European countries. This is not necessarily a small value, and it shows that the health of the world's
population still needs to be improved. Measures must be taken to ensure the well-being of us all.
All in all, we should take better care of ourselves so we won't become a statistic.
References
World Health Organization database, http://www.who.int/research/en/ (retrieved 8 January 2014)
List of sovereign states in Europe by Human Development Index, used to select the countries with a very
high Human Development Index,
http://en.wikipedia.org/wiki/List_of_sovereign_states_in_Europe_by_Human_Development_Index
(retrieved 16 January 2014)
Annex: World Health Organization database
Country Name    Life expectancy  Physicians density (physicians/1,000 population)  Total health expenditure per capita $
Andorra         82               3.912                                             3403
Austria         81               4.862                                             4288
Belgium         80               3.782                                             3948
Croatia         77               2.715                                             1556
Cyprus          71               3756                                              782
Czech Republic  78               3.708                                             2107
Denmark         79               3                                                 4345
Estonia         76               3.343                                             1338
Finland         81               2.735                                             3226
France          82               3.381                                             3969
Germany         81               3.689                                             4219
Greece          81               6.043                                             3054
Hungary         75               3.408                                             1510
Iceland         82               3.456                                             3577
Ireland         81               3.187                                             3761
Italy           82               3.802                                             3071
Latvia          74               2.899                                             1066
Lithuania       74               3.641                                             1292
Luxembourg      82               2.779                                             6592
Malta           80               3.226                                             2141
Netherlands     81               3.921                                             4881
Norway          81               4.076                                             5353
Poland          76               2.068                                             1391
Portugal        80               3.755                                             2690
Slovakia        76               3.000                                             2084
Slovenia        80               2.542                                             2551
Spain           82               3.961                                             3067
Sweden          82               3.868                                             3722
Switzerland     83               4.082                                             5105
United Kingdom  80               343                                               3438