Multiple Regression
Lecturer, Faculty of Business

Regression Analysis
Regression analysis is a technique for studying the dependence of one variable (called the dependent variable) on one or more other variables (called explanatory variables), with a view to estimating and/or predicting the population mean or average value of the former in terms of the known or fixed values of the latter.

Uses of Regression Analysis
1. Estimate the relationship that exists, on average, between the dependent variable and the explanatory variables.
2. Determine the effect of each explanatory variable on the dependent variable, controlling for the effects of all other explanatory variables.
3. Predict the value of the dependent variable for given values of the explanatory variables.

Multiple Regression Model
The equation that describes how the dependent variable $y$ is related to the independent or explanatory variables $x_1, x_2, \ldots, x_p$ and an error term $\varepsilon$ is called the multiple regression model. The multiple regression model with $p$ independent variables can be written as:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$$
where,
$y$ = the value of the dependent variable
$x_i$ = the value of the $i$th independent variable, $i = 1, 2, \ldots, p$
$\beta_0$ = the intercept
$\beta_1, \beta_2, \ldots, \beta_p$ = the regression coefficients
$\varepsilon$ = the random error component
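The equation that describes how the mean value of the dependent variable $y$ is related to the independent or explanatory variables is called the multiple regression equation. The multiple regression equation with $p$ independent variables can be written as:
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$
where,
$E(y)$ = the mean or expected value of the dependent variable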
Estimated Multiple Regression Equation
Generally, the parameters $\beta_0, \beta_1, \ldots, \beta_p$ are unknown in the regression model. The ordinary least squares (OLS) method is used to estimate the parameters from the sample observations. The estimated values of the parameters provide the following estimated multiple regression equation:
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$$
where,
$\hat{y}$ = the estimated value of the dependent variable
and
$b_0, b_1, \ldots, b_p$ = the estimates of $\beta_0, \beta_1, \ldots, \beta_p$
In general, the equation of the multiple regression model estimated by the ordinary least squares method from sample observations is known as the estimated multiple regression equation.
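To make the OLS step concrete, here is a minimal sketch that estimates the coefficients by solving the normal equations $(X'X)b = X'y$. The data are hypothetical (generated from $y = 1 + 2x_1 + 3x_2$ with no error term, so the fit should recover those values), and the function name `ols_fit` is an illustrative choice, not part of any library. It assumes the independent variables are not perfectly collinear.

```python
# Minimal OLS sketch: solve the normal equations (X'X) b = X'y.
# Assumption: the data below are hypothetical, generated from
# y = 1 + 2*x1 + 3*x2 with no random error.

def ols_fit(rows, y):
    """Return [b0, b1, ..., bp] for rows of independent-variable values."""
    X = [[1.0] + list(r) for r in rows]      # leading 1 for the intercept
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
         + [sum(X[i][a] * y[i] for i in range(n))]
         for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * w for v, w in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

x_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]          # hypothetical (x1, x2)
y_obs = [1 + 2 * x1 + 3 * x2 for x1, x2 in x_rows]         # hypothetical y
b = ols_fit(x_rows, y_obs)
print([round(v, 4) for v in b])
```

In practice a statistical package performs this computation and also reports standard errors; the sketch only shows where the coefficient estimates come from.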
Assumptions about the Error Term in the Model
In order to estimate the unknown parameters by OLS, the error term $\varepsilon$ should satisfy the following assumptions:
1. The error $\varepsilon$ is a random variable with a mean or expected value of zero. That is, $E(\varepsilon) = 0$.
2. The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the independent variables.
3. The values of $\varepsilon$ are independent.
4. The error $\varepsilon$ is a normally distributed random variable.
Relationship among SST, SSR and SSE
In multiple regression analysis, the total sum of squares (SST) can be partitioned into two components: the sum of squares due to regression (SSR) and the sum of squares due to error (SSE). The relationship among them is given below:
SST = SSR + SSE
where,
SST = total sum of squares = $\sum (y_i - \bar{y})^2$
SSR = sum of squares due to regression = $\sum (\hat{y}_i - \bar{y})^2$
SSE = sum of squares due to error = $\sum (y_i - \hat{y}_i)^2$

Multiple Coefficient of Determination
The multiple coefficient of determination is defined as the ratio of the sum of squares due to regression (SSR) to the total sum of squares (SST). It is used to evaluate the goodness of fit of the estimated regression equation. It is denoted by $R^2$ and can be written as:
$$R^2 = \frac{SSR}{SST}$$
$R^2$ can be interpreted as the proportion of the variability in the dependent variable that can be explained by the estimated multiple regression equation.

Adjusted Multiple Coefficient of Determination
The adjusted multiple coefficient of determination can be calculated as:
$$R_a^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k}$$
where,
$R_a^2$ = adjusted multiple coefficient of determination
$R^2$ = multiple coefficient of determination
$n$ = number of observations
$k$ = number of regression coefficients including the intercept
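As a quick numerical illustration of the two formulas, the sketch below uses the sums of squares reported in the Butler Trucking output of Example 1 (SSR = 21.601, SST = 23.900, 10 observations, 3 regression coefficients including the intercept). The function names are illustrative choices.

```python
def r_squared(ssr, sst):
    """Multiple coefficient of determination: SSR / SST."""
    return ssr / sst

def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared; k counts regression coefficients including the intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

r2 = r_squared(21.601, 23.900)
r2_adj = adjusted_r_squared(r2, n=10, k=3)
print(f"R-Sq = {r2:.3f}, R-Sq(adj) = {r2_adj:.3f}")
```

The results agree with the R-Sq = 90.4% and R-Sq(adj) = 87.6% shown in that output. The adjustment always lowers the value slightly, because it penalizes the number of estimated coefficients.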
Example 1
Butler Trucking Company is an independent trucking company in southern California whose business involves deliveries throughout its local area. To develop better work schedules, the manager wants to estimate the total daily travel time for the company's drivers. Initially, the manager believed that the total travel time would be closely related to the number of miles traveled and the number of deliveries. A simple random sample of 10 drivers provided information on total daily travel time (in hours), miles traveled, and number of deliveries. The multiple regression output is given below:
Predictor    Coef        SE Coef     T       P
Constant     -0.8687     0.9515      -0.91   0.392
miles        0.061135    0.009888    6.18    0.000
deliveri     0.9234      0.2211      4.18    0.004

S = 0.5731    R-Sq = 90.4%    R-Sq(adj) = 87.6%

Analysis of Variance
Source            DF    SS        MS        F        P
Regression        2     21.601    10.800    32.88    0.000
Residual Error    7     2.299     0.328
Total             9     23.900
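As a reading aid for this output, the sketch below recomputes the derived columns from the basic ones: T is Coef divided by SE Coef, MS is SS divided by DF, F is the ratio of the two mean squares, and S is the square root of the residual mean square. All figures are copied from the output; the variable names are illustrative.

```python
import math

# Figures copied from the regression output above.
coef = {"Constant": -0.8687, "miles": 0.061135, "deliveri": 0.9234}
se_coef = {"Constant": 0.9515, "miles": 0.009888, "deliveri": 0.2211}
ss_regression, df_regression = 21.601, 2
ss_error, df_error = 2.299, 7

# T column: each coefficient divided by its standard error.
t_ratio = {name: coef[name] / se_coef[name] for name in coef}

# MS column: sum of squares divided by its degrees of freedom.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

# F is the ratio of the mean squares; S is the square root of MS error.
f_ratio = ms_regression / ms_error
s = math.sqrt(ms_error)

for name, t in t_ratio.items():
    print(f"{name:9s} T = {t:6.2f}")
print(f"F = {f_ratio:.2f}, S = {s:.4f}")
```

Because the printed sums of squares are rounded, a recomputed value can differ from the output in the last digit.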
a) Develop the regression model of travel time (hours) on miles traveled and number of deliveries.
b) Write down the estimated regression equation and interpret the regression coefficients.
c) Comment on the goodness of fit of the estimated regression equation.
d) Test whether there is a significant effect of miles traveled on travel time at the 5% level of significance.
e) Test whether there is a significant effect of number of deliveries on travel time at the 5% level of significance.
f) Test whether the overall relationship between travel time and the set of independent variables (miles traveled and number of deliveries) is significant at the 5% level of significance.
g) Estimate the travel time (hours) when miles traveled and number of deliveries are 70 and 4, respectively.

Solution
a) The regression model of travel time (hours) on miles traveled and number of deliveries can be written as:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$
where,
$y$ = the value of travel time (hours)
$\beta_0$ = intercept
$\beta_1, \beta_2$ are the regression coefficients
$x_1$ = the value of miles traveled
$x_2$ = the value of number of deliveries
$\varepsilon$ = random error component
Alternative: The regression model of travel time (hours) on miles traveled and number of deliveries can be written as:
$$\text{Time} = \beta_0 + \beta_1\,\text{Miles} + \beta_2\,\text{Deliveries} + \varepsilon$$
b) The estimated regression equation is
$$\hat{y} = -0.8687 + 0.0611\,x_1 + 0.9234\,x_2$$
(or $\widehat{\text{Time}} = -0.8687 + 0.0611\,\text{Miles} + 0.9234\,\text{Deliveries}$)
Interpretation
The estimated coefficient 0.0611 indicates that for an increase of one mile in the distance traveled, the expected travel time increases by 0.0611 hours when the number of deliveries is held constant. The estimated coefficient 0.9234 indicates that for an increase of one delivery, the expected travel time increases by 0.9234 hours when the number of miles traveled is held constant.
c) From the output, we have $R^2 = 90.4\%$, which indicates that 90.4% of the variability in travel time is explained by the estimated multiple regression equation with miles traveled and number of deliveries as the independent variables. Or, from the adjusted $R^2 = 87.6\%$, we may conclude that 87.6% of the variability in travel time is explained by the estimated multiple regression equation with miles traveled and number of deliveries as the independent variables.
d) Critical Value Approach: In order to know whether there is a significant effect of miles traveled on travel time, we need to test the following hypothesis:
$H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$
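Under the null hypothesis, the value of the test statistic (the estimated coefficient divided by its standard error) is given as:
$$t = \frac{0.061135}{0.009888} = 6.18$$
The critical value of $t$ at the 5% level of significance with 7 degrees of freedom (the residual error DF in the output) is $t_{0.025,\,7} = 2.365$.
As $|t| = 6.18 > 2.365$, we reject the null hypothesis.
From the hypothesis test, we may conclude that there is a significant effect of miles traveled on travel time (hours).
p Value Approach: In order to know whether there is a significant effect of miles traveled on travel time, we need to test the following hypothesis:
$H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$
As p value $= 0.000 < 0.05$, we reject the null hypothesis.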
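From the hypothesis test, we may conclude that there is a significant effect of miles traveled on travel time (hours).
e) Critical Value Approach: In order to know whether there is a significant effect of number of deliveries on travel time, we need to test the following hypothesis:
$H_0: \beta_2 = 0$ against $H_1: \beta_2 \neq 0$
Under the null hypothesis, the value of the test statistic (the estimated coefficient divided by its standard error) is given as:
$$t = \frac{0.9234}{0.2211} = 4.18$$
The critical value of $t$ at the 5% level of significance with 7 degrees of freedom (the residual error DF in the output) is $t_{0.025,\,7} = 2.365$.
As $|t| = 4.18 > 2.365$, we reject the null hypothesis.
From the hypothesis test, we may conclude that there is a significant effect of number of deliveries on travel time (hours).
p Value Approach: In order to know whether there is a significant effect of number of deliveries on travel time, we need to test the following hypothesis:
$H_0: \beta_2 = 0$ against $H_1: \beta_2 \neq 0$
As p value $= 0.004 < 0.05$, we reject the null hypothesis.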
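From the hypothesis test, we may conclude that there is a significant effect of number of deliveries on travel time (hours).
f) Critical Value Approach: To test whether the overall relationship between travel time and the set of independent variables (miles traveled and number of deliveries) is significant, we need to test the following hypothesis:
$H_0: \beta_1 = \beta_2 = 0$ against $H_1$: at least one of $\beta_1, \beta_2$ is not equal to zero
Under the null hypothesis, the value of the test statistic, taken from the analysis of variance table, is given as:
$$F = \frac{MSR}{MSE} = 32.88$$
The critical value of $F$ at the 5% level of significance with 2 and 7 degrees of freedom is $F_{0.05;\,2,\,7} = 4.74$.
As $F = 32.88 > 4.74$, we reject the null hypothesis.
From the hypothesis test, we may conclude that there is a significant overall relationship between travel time and the set of independent variables (miles traveled and number of deliveries).
p Value Approach: To test whether the overall relationship between travel time and the set of independent variables (miles traveled and number of deliveries) is significant, we need to test the following hypothesis:
$H_0: \beta_1 = \beta_2 = 0$ against $H_1$: at least one of $\beta_1, \beta_2$ is not equal to zero
As p value $= 0.000 < 0.05$, we reject the null hypothesis.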
From the hypothesis test, we may conclude that there is a significant overall relationship between travel time and the set of independent variables (miles traveled and number of deliveries).
g) The estimated travel time when miles traveled and number of deliveries are 70 and 4, respectively, will be
$$\hat{y} = -0.8687 + 0.061135(70) + 0.9234(4) = 7.10$$
Hence we can say that the estimated travel time will be 7.10 hours if the driver travels 70 miles and makes 4 deliveries.

Example 2
Consider the following data for a dependent variable $y$ and two independent variables $x_1$ and $x_2$.
$y$: 96  90  95  92  94
a) Compute the total sum of squares (SST), sum of squares due to regression (SSR), and sum of squares due to error (SSE).
b) Find the value of $R^2$ and adjusted $R^2$.
c) Compute F and perform the appropriate F test at the 5% level of significance.

Solution
a) We know that,
Total sum of squares (SST) = $\sum (y_i - \bar{y})^2$
Sum of squares due to regression (SSR) = $\sum (\hat{y}_i - \bar{y})^2$
Sum of squares due to error (SSE) = $\sum (y_i - \hat{y}_i)^2$
SST = SSR + SSE
Calculation Table:
$y$     $x_1$    $x_2$    $\hat{y}$    $(y - \bar{y})^2$    $(\hat{y} - \bar{y})^2$
96    5.0    1.5    96.43    6.76    9.15
90
95
92
94
$\bar{y}$ = 93.40
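Hence,
Total sum of squares (SST) = 23.20
Sum of squares due to regression (SSR) = 22.26
Sum of squares due to error (SSE) = SST − SSR = 23.20 − 22.26 = 0.94
b) The multiple coefficient of determination is
$$R^2 = \frac{SSR}{SST} = \frac{22.26}{23.20} = 0.96$$
Comment: The $R^2 = 0.96$ indicates that 96% of the variability in $y$ can be explained by the estimated regression equation with independent variables $x_1$ and $x_2$.
The adjusted $R^2$ will be
$$R_a^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k} = 1 - (1 - 0.96)\,\frac{5 - 1}{5 - 3} = 0.92$$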
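c) Hypothesis: $H_0: \beta_1 = \beta_2 = 0$ against $H_1$: at least one of $\beta_1, \beta_2$ is not equal to zero
Test Statistic: Under the null hypothesis, the test statistic will be
$$F = \frac{MSR}{MSE} = \frac{SSR/2}{SSE/2} = \frac{22.26/2}{0.94/2} = 23.68$$
Critical Value: The critical value of F at the 5% level of significance with degrees of freedom 2 and 2 will be $F_{0.05;\,2,\,2} = 19.00$.
Decision: As $F = 23.68 > 19.00$, we reject the null hypothesis.
Comment: From the hypothesis test, we may conclude that at least one parameter is not equal to zero.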
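The arithmetic of this example can be checked with the short sketch below. It recomputes SST from the five $y$ values, takes SSR = 22.26 from the solution as given (the fitted values needed to recompute it are not all reproduced here), and hard-codes the critical value 19.00 as the F table value for 2 and 2 degrees of freedom at the 5% level. The variable names are illustrative.

```python
y = [96, 90, 95, 92, 94]
n, p = len(y), 2          # 5 observations, 2 independent variables
k = p + 1                 # regression coefficients including the intercept

y_bar = sum(y) / n
sst = sum((yi - y_bar) ** 2 for yi in y)   # total sum of squares
ssr = 22.26                                # taken from the solution above
sse = sst - ssr                            # sum of squares due to error

r2 = ssr / sst
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k)
f_stat = (ssr / p) / (sse / (n - k))       # MSR / MSE
f_critical = 19.00                         # F table value, 5% level, df = (2, 2)

print(f"SST = {sst:.2f}, SSE = {sse:.2f}")
print(f"R-sq = {r2:.2f}, adjusted R-sq = {r2_adj:.2f}")
print(f"F = {f_stat:.2f}, reject H0: {f_stat > f_critical}")
```

With only two error degrees of freedom the critical value is large, so the null hypothesis is rejected here only because the fit is very close.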