
Simple Regression Analysis

Regression Analysis is a statistical tool that utilizes the relation between two or more quantitative
variables so that one variable can be predicted from the other or others.

Functional Relation

A functional relation between two variables is expressed by a mathematical formula

Y = f(X), where, for a given value of X, the function f indicates the corresponding value of Y.

The observed values fall directly on the curve of relationship.

Statistical Relation

A statistical relation, unlike a functional relation, is not a perfect one. In general, the observations for a
statistical relation do not fall directly on the curve of relationship.

*Note

No matter how strong the statistical relation between X and Y, no cause and effect pattern is implied by
the regression model.

Important Features of Model

1. The response Yi in the ith trial is the sum of two components: a constant term and a random error term.
2. Error terms are assumed to have constant variance.
3. Error terms are assumed to be uncorrelated. (These features are illustrated in the sketch below.)
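
As a rough illustration of these features, the sketch below simulates responses from a simple linear regression model with a constant component and uncorrelated, constant-variance error terms; the parameter values, sample size, and use of numpy are assumptions for illustration only.

```python
import numpy as np

# Hypothetical parameter values chosen for illustration only
b0, b1, sigma = 2.0, 0.5, 1.0   # intercept, slope, error standard deviation
n = 50

rng = np.random.default_rng(0)
x = np.linspace(0, 10, n)

# Constant component b0 + b1*x plus a random error term with constant variance;
# errors are drawn independently, so they are uncorrelated
e = rng.normal(loc=0.0, scale=sigma, size=n)
y = b0 + b1 * x + e

print(y[:5])
```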

Meaning of Regression Parameters


The parameters B0 and B1 in the regression model are called regression coefficients.
B1 is the slope of the regression line, which indicates the change in the mean of the probability
distribution of Y per unit increase in X.
B0 is the Y intercept of the regression line.
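
A minimal sketch of estimating and interpreting B0 and B1 on made-up data, assuming statsmodels is available:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 40)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=40)   # made-up data

X = sm.add_constant(x)          # adds the column of 1s for the intercept B0
fit = sm.OLS(y, X).fit()

b0, b1 = fit.params
print(f"B0 (Y intercept): {b0:.3f}")
print(f"B1 (slope, change in mean of Y per unit increase in X): {b1:.3f}")
```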

Data for Regression Analysis

Observational Data - obtained from nonexperimental studies - such studies do not control the
explanatory or predictor variable(s) of interest.

Experimental Data - obtained from studies in which there are treatments and experimental units

Properties of Fitted Regression Line

1. The sum of the residuals is zero.
2. The sum of the squared residuals is a minimum.
3. The sum of the observed values equals the sum of the fitted values.
4. The sum of the weighted residuals is zero when the residual in the ith trial is weighted by the level of the
predictor variable in the ith trial (Σ Xi ei = 0).
5. The sum of the weighted residuals is zero when the residual in the ith trial is weighted by the fitted value of the
response variable for the ith trial (Σ Ŷi ei = 0).
6. The regression line always goes through the point (X̄, Ȳ). (These properties are verified numerically in the sketch below.)
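
As a sanity check, the sketch below fits a least-squares line to made-up data and verifies these properties numerically (up to floating-point error); it assumes numpy is available.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 30)
y = 3.0 + 1.5 * x + rng.normal(scale=2.0, size=30)   # made-up data

b1, b0 = np.polyfit(x, y, deg=1)    # least-squares slope and intercept
y_hat = b0 + b1 * x
e = y - y_hat                        # residuals

print(np.isclose(e.sum(), 0))                      # property 1
print(np.isclose(y.sum(), y_hat.sum()))            # property 3
print(np.isclose((x * e).sum(), 0))                # property 4
print(np.isclose((y_hat * e).sum(), 0))            # property 5
print(np.isclose(b0 + b1 * x.mean(), y.mean()))    # property 6
```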

Model

Multicollinearity and its effects

1. Multicollinearity or intercorrelation exists when at least some of the predictor variables are correlated
among themselves.
2. It refers to a linear relation among the predictors.
3. When independent variables are correlated, the regression coefficient of any independent variable
depends on which other independent variables are included and which are left out. Thus, a
regression coefficient does not reflect any inherent effect of the particular independent variable on the
dependent variable, but only a marginal or partial effect. (See the sketch after this list.)
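
The sketch below illustrates this with fabricated correlated predictors: the estimated coefficient of X1 shifts noticeably depending on whether X2 is included. It assumes numpy and statsmodels; none of the numbers come from the notes.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.2 * rng.normal(size=n)            # x2 highly correlated with x1
y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)  # made-up response

# Model with x1 only
fit_1 = sm.OLS(y, sm.add_constant(x1)).fit()
# Model with both correlated predictors
fit_12 = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

print("coefficient of x1 alone:   ", fit_1.params[1])
print("coefficient of x1 given x2:", fit_12.params[1])
```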

Interaction Regression Models

In statistics, an interaction may arise when considering the relationship among three or more variables,
and describes a situation in which the simultaneous influence of two variables on a third is not additive.
Most commonly, interactions are considered in the context of regression analyses.
Interaction effects occur when the effect of one variable depends on the value of another variable.

The output below tells us that the interaction effect (food*condiment) is statistically significant.
Consequently, we know that the satisfaction you derive from the condiment depends on the type of food.
The best way to interpret an interaction effect and understand what the data are saying is to make an
interaction plot (dependent variable on the y axis, values of the first independent variable on the x axis).
On an interaction plot, parallel lines indicate that there is no interaction effect, while different slopes
suggest that one might be present.

Interpretation of Interaction Effect

Crossed lines on the graph suggest that there is an interaction effect, which the significant p-value for the
food*condiment term confirms. The graph shows that enjoyment levels are higher for chocolate sauce
when the food is ice cream. Conversely, satisfaction levels are higher for mustard when the food is a hot
dog. If you put mustard on ice cream or chocolate sauce on hot dogs, you won't be happy.
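
A minimal sketch of fitting such an interaction model, assuming pandas and statsmodels are available; the food/condiment data frame below is entirely hypothetical and only mimics the pattern described above.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: satisfaction ratings for food/condiment combinations
df = pd.DataFrame({
    "food":      ["ice cream"] * 4 + ["hot dog"] * 4,
    "condiment": ["chocolate", "chocolate", "mustard", "mustard"] * 2,
    "enjoyment": [9, 8, 2, 3, 2, 3, 8, 9],
})

# C() treats the variables as categorical; '*' adds main effects plus the interaction
fit = smf.ols("enjoyment ~ C(food) * C(condiment)", data=df).fit()
print(fit.summary())   # check the p-value of the food:condiment interaction term
```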

Moderator Analysis - used to determine whether the relationship between two variables depends on the
value of a third variable. Moderator analysis is really just a multiple regression equation with an
interaction term.

ASSUMPTIONS FOR THE INTERACTION EFFECT

1. The dependent variable should be measured on a continuous scale (either interval or ratio). Examples of
variables that meet this criterion include revision time (measured in hours), intelligence (measured using
IQ score), exam performance (0 to 100), and weight (kg).
2. There should be one independent variable which is continuous (interval or ratio) and one moderator variable
that is dichotomous (nominal variable with two groups). Examples of dichotomous variables include gender
(male/female), physical activity level (sedentary/active), and body composition (normal/obese).
3. We should have independence of observations (independence of residuals), which we can check using the
Durbin-Watson statistic.
4. There needs to be a linear relationship between the dependent variable and the independent variable for
each group of the dichotomous moderator variable. There are many ways to check for a linear relationship,
but create a scatterplot and visually inspect it to check for linearity.
If the relationship is not linear, run a non-linear regression analysis or "transform" the data.
5. The data need to show HOMOSCEDASTICITY, which is when the error variances are the same for all
combinations of independent and moderator variables. When analyzing, plot the studentized residuals
against the unstandardized predicted values for both groups of the moderator variable.
6. The data must not show MULTICOLLINEARITY, which occurs when two or more independent variables are
highly correlated with each other. It leads to problems with understanding which independent variable
contributes to the variance explained in the dependent variable.
7. There should be no significant outliers or high leverage points (observations made at extreme or
outlying values of the independent variables). These different classifications of unusual points reflect the
different impact they have on the moderated multiple regression.
All these points can have a very negative effect on the regression equation: they reduce the accuracy of
the results as well as the statistical significance.
8. Residuals (errors) should be approximately normally distributed; checks can be based on either
graphical or numerical methods. (A few of these checks are sketched after this list.)
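
As a rough illustration, the sketch below runs a few of these checks (independence of residuals via Durbin-Watson, a numerical normality check, and the ingredients of a residuals-vs-predicted plot) on made-up data, assuming numpy, scipy, and statsmodels are available; the variable names and the choice of a Shapiro-Wilk test here are illustrative assumptions, not part of the notes.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from scipy import stats

# Made-up data: continuous predictor, dichotomous moderator, continuous outcome
rng = np.random.default_rng(4)
n = 100
x = rng.normal(size=n)
group = rng.integers(0, 2, size=n)              # dichotomous moderator (0/1)
y = 1.0 + 0.8 * x + 0.5 * group + 0.7 * x * group + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x, group, x * group]))
fit = sm.OLS(y, X).fit()

# Assumption 3: independence of residuals (values near 2 suggest no autocorrelation)
print("Durbin-Watson:", durbin_watson(fit.resid))

# Assumption 8: approximate normality of residuals (numerical check)
print("Shapiro-Wilk p-value:", stats.shapiro(fit.resid).pvalue)

# Assumption 5: for homoscedasticity, plot studentized residuals vs. predicted values
studentized = fit.get_influence().resid_studentized_internal
predicted = fit.fittedvalues
print(studentized[:5], predicted[:5])
```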

MULTIPLE LINEAR REGRESSION

Yi = B0 + B1Xi1 + B2Xi2 + Ei

B0 - Y intercept
B1 - change in the mean response per unit increase in X1 when X2 is held constant
B2 - change in the mean response per unit increase in X2 when X1 is held constant.

*Note - when independent variables are correlated among themselves, intercorrelation or
multicollinearity among them is said to exist.
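
A minimal sketch of fitting this two-predictor model with least squares and reading off B0, B1, and B2, assuming statsmodels is available; the data and coefficient values below are fabricated for illustration only.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 80
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 1.0 + 0.4 * x1 + 1.2 * x2 + rng.normal(size=n)   # made-up data

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

b0, b1, b2 = fit.params
print(f"B0 = {b0:.3f}, B1 = {b1:.3f}, B2 = {b2:.3f}")
```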

ASSUMPTIONS OF MULTIPLE LINEAR REGRESSION

1. Linear relationship between the outcome variable and the independent variables; scatterplots can
show whether a linear or a curvilinear relationship is exhibited.
2. Multivariate Normality - multiple regression assumes that residuals are normally distributed - errors
between observed and predicted values should be normally distributed. Normality can be checked
through the Kolmogorov-Smirnov test, though this test must be conducted on the residuals themselves.
3. No Multicollinearity - multiple regression assumes that the independent variables are not highly
correlated with each other -- tested using Variance Inflation Factor (VIF) values. Multicollinearity may be
checked in multiple ways: in a correlation matrix, the magnitude of the correlation coefficients should be less than 0.80;
VIF values higher than 10 indicate multicollinearity. If multicollinearity is found, one possible solution is to
center the data - to center, subtract the mean score from each observation for each independent variable.
But the simplest solution is to identify the variables causing multicollinearity issues and drop them from the
regression. (A VIF check is sketched after this list.)
4. Homoscedasticity - this assumption states that the variances of the error terms are similar across the
values of the independent variables. A plot of standardized residuals vs. predicted values shows whether
points are equally distributed across all values of the independent variables. There should be no clear
pattern in the distribution; if there is a cone-shaped pattern, the data are heteroscedastic.
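
A short sketch of the VIF check mentioned in assumption 3, using statsmodels' variance_inflation_factor; the data frame, column names, and degree of correlation are made up.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Made-up predictors; x2 is deliberately correlated with x1
rng = np.random.default_rng(6)
n = 100
x1 = rng.normal(size=n)
df = pd.DataFrame({
    "x1": x1,
    "x2": 0.95 * x1 + 0.1 * rng.normal(size=n),
    "x3": rng.normal(size=n),
})

X = sm.add_constant(df)   # include the intercept column when computing VIFs
for i, name in enumerate(X.columns):
    if name == "const":
        continue
    print(name, variance_inflation_factor(X.values, i))   # values > 10 flag multicollinearity
```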

BUILDING REGRESSION MODELS

Variable selection is intended to select the "best" subset of predictors. In building a model, we aim to select
a set of explanatory variables that best explains the variability in the outcome.

Descriptive Analysis - assess the distributions of explanatory variables and graphically evaluate their
association with the outcome - this analysis will give you a preliminary idea about the association of
explanatory variables with the outcome.
Univariable Analysis - we conduct such analysis to test the unconditional association of each explanatory
variable with the outcome.

Testing of Collinearity - if two explanatory variables are highly correlated with each other, they can
cause problems during multivariable analysis. For two quantitative variables, correlation is tested by the
Pearson correlation; use the Spearman rank correlation coefficient if they are not approximately normally
distributed.
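
A minimal sketch of this collinearity screen with scipy, on made-up variables; which coefficient to rely on in practice depends on the normality of the variables, as described above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
a = rng.normal(size=60)
b = 0.8 * a + 0.3 * rng.normal(size=60)   # made-up variable, correlated with a

r, p = stats.pearsonr(a, b)               # for approximately normal variables
rho, p_s = stats.spearmanr(a, b)          # rank-based alternative otherwise
print(f"Pearson r = {r:.2f} (p = {p:.3f}), Spearman rho = {rho:.2f} (p = {p_s:.3f})")
```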

Multivariable Model Building - allows us to test associations of variables with the outcome after
accounting for other variables.

Backward Elimination
Test one interaction term at a time and delete it if it is not significant; keep deleting terms until all
terms remaining in the model are statistically significant. In short, just keep deleting terms until the
remaining ones are significant (a sketch of this procedure follows the three methods below).

Forward Selection
Start with the null model, then keep adding terms one at a time until all included terms are significant.

Stepwise Selection
Both forward and backward steps are conducted.
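
A rough sketch of p-value-based backward elimination, assuming pandas and statsmodels; the data, the 0.05 cutoff, and the helper function name are illustrative assumptions, not a prescribed procedure.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def backward_eliminate(y, X, alpha=0.05):
    """Drop the least significant predictor until all remaining p-values are below alpha."""
    X = sm.add_constant(X)
    while True:
        fit = sm.OLS(y, X).fit()
        pvals = fit.pvalues.drop("const")     # never drop the intercept
        worst = pvals.idxmax()
        if pvals[worst] < alpha:
            return fit
        X = X.drop(columns=[worst])           # delete the non-significant term and refit

# Made-up data: x3 is pure noise and should be eliminated
rng = np.random.default_rng(8)
n = 120
X = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "x3": rng.normal(size=n),
})
y = 1.0 + 0.9 * X["x1"] + 0.6 * X["x2"] + rng.normal(size=n)

final = backward_eliminate(y, X)
print(final.params)
```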
