Regression Analysis (Simple)

Linear regression analysis is used to determine the relationship between two variables. The dependent variable (Y) is modeled as a linear function of the independent variable (X) using an equation of the form Y = a + bX + ε, where a is the intercept, b is the slope or regression coefficient, and ε is the error term. Regression analysis aims to 1) determine if a relationship exists, 2) describe the nature of the relationship mathematically, and 3) assess the accuracy of prediction. Key assumptions include a linear relationship between variables and random sampling of Y values for each X. Statistical tests evaluate the overall fit and significance of regression coefficients.

Uploaded by

MUHAMMAD HASAN NAGRA

With regression we are trying to be more reflective of the population than the mean of Y (the dependent variable) alone, which would otherwise be our best estimate of a predicted value from a set of given values. We are analyzing the relationship between variables.

Consider the statements "The more a candidate spends in a campaign, the more votes they will get" and "Cabeiri is taller than Arzoo." They differ in that the first implies a causal or functional relationship, and the second does not. One of the activities of researchers is to examine hypothesized functional relationships. Therein lies the rub of regression.

The dependent variable is denoted Y, the independent variable, X. The variables will never be perfectly related, so there is always an error term. Variation in Y can be thought of as having two parts: explained variation, which is accounted for by the independent variable, and unexplained variation, which is unaccounted for by the independent variable (this is the error term). That is, part of the change in a variable is due to another variable that we hypothesize, and part is due to other factors outside our hypotheses. The relationship could also be random, a spurious one, and it is our role to determine if this is the case.

Linear Regression: We are concerned with whether the pattern of the relationship between the values of two variables can be described as a straight line, which is the simplest and most commonly used form. Remember from geometry class that a line is described by the formula:

Y = a + bX (in geometry we said y = mx + b, where m was the slope and b was the y-intercept)

Where Y is the dependent variable, measured in units of the dependent variable, X is the independent variable, measured in units of the independent variable, and a and b are constants defining the nature of the relationship between the variables X and Y. The a or Y-intercept (aka Yint) is the value of Y when X = 0. The b is the slope of the line and is known as the regression coefficient and is the change in Y associated with a one-unit change in X.

The greater the slope or regression coefficient, the more influence the independent variable has on the dependent variable, and the more change in Y is associated with a change in X. The regression coefficient is typically more important than the intercept from a policy researcher's perspective, as we are usually interested in the effect of one variable on another.

Coming back to the equation, we also have a term to capture the error in our estimating equation, denoted ε or e. Also known as the residual, it reflects the unexplained variation in Y, and its magnitude reflects the goodness of fit of the regression line. The smaller the error, the closer the points are to our line. So our general equation describing a line is:

Y = a + bX + e

Remember, b is the regression coefficient and is interpreted as the change in Y associated with a one-unit change in X.

Example of interpretation of a regression equation: Say we are interested in the relationship between family food consumption and family income. We calculate a regression equation, in which consumption is denoted C and income I, both measured in dollars, of:

C = 1375 + .064 I

What is the intercept? 1375. What does it mean? That for a family with no income, food consumption is $1,375.
What is the regression coefficient? .064. How is it interpreted? For every dollar increase in family income there is a .064 dollar increase in food consumption.

Note that we generally would have hypothesized a relationship and the dependent and independent variables. The relationship of I to C could have been reversed. The direction (sign) could have been opposite. This would likely reflect on a prior theory we may have had.

The goal of regression is to draw a line through our data that best represents or describes the relationship between the two variables. Essentially we are trying to do better than just taking the mean observation. Simple regression is a procedure to find specific values for the slope and the intercept. If the line we draw to describe the data is upward sloping, the data suggest a positive relationship. If the line is downward sloping, the data suggest a negative relationship. If horizontal, the data suggest no relationship.

In drawing our line, we want to minimize the distance between the points and our line. In the normal case we plot the dependent variable on the vertical (Y axis) and the independent variable on the horizontal (X axis).
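Returning to the food consumption example, the interpretation of the intercept and the regression coefficient can be checked with a few lines of Python. This is only a sketch: the two coefficients come from the example equation, while the function name and the income values are invented for illustration.

```python
# Coefficients taken from the example equation C = 1375 + .064 I.
INTERCEPT = 1375.0  # predicted food consumption (dollars) at zero income
SLOPE = 0.064       # change in consumption per one-dollar change in income


def predicted_consumption(income):
    """Return predicted food consumption in dollars for a given income."""
    return INTERCEPT + SLOPE * income


# Hypothetical incomes, chosen only to illustrate the interpretation.
at_zero = predicted_consumption(0)
at_20000 = predicted_consumption(20_000)

# A one-dollar difference in income shifts the prediction by the slope.
difference = predicted_consumption(20_001) - predicted_consumption(20_000)

print(f"income 0      -> {at_zero:.2f}")
print(f"income 20000  -> {at_20000:.2f}")
print(f"one more dollar of income -> {difference:.3f} more consumption")
```

The prediction at zero income reproduces the intercept, and the difference between two incomes one dollar apart reproduces the regression coefficient.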

Distance is then measured vertically from an observed point to our estimated line. Since we cannot draw a line that minimizes the distance between all points and the line at the same time, we need a way to average the distances to get a best-fitting line. In the most common form of regression analysis, the technique is to find the sum of the squared values of the vertical distance between each actual value $Y_i$ and its expected value on the line, $\hat{Y}_i$. (Draw a scatterplot and demonstrate these things on it.)

That form of regression is called Ordinary Least Squares, or Least Squares, and it has two key properties:
1. The sum of all actual values minus expected values equals zero.
2. The sum of all (actual - expected) squared is the minimum value possible.

In equation form:
1. $\sum (Y_i - \hat{Y}_i) = 0$
2. $\sum (Y_i - \hat{Y}_i)^2 = \text{minimum}$
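One way to see how the two properties connect, offered as a sketch rather than as part of the original notes: write the expected value as $\hat{Y}_i = a + bX_i$ and treat the sum of squared distances as a function of a and b. The symbol S is introduced here only for this illustration.

$$S(a, b) = \sum (Y_i - a - bX_i)^2$$

$$\frac{\partial S}{\partial a} = -2 \sum (Y_i - a - bX_i) = 0 \quad\Rightarrow\quad \sum (Y_i - \hat{Y}_i) = 0$$

$$\frac{\partial S}{\partial b} = -2 \sum X_i (Y_i - a - bX_i) = 0$$

The first condition is property 1. The two conditions together determine the values of a and b at which the sum of squares in property 2 reaches its minimum.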

Hypothesized Regression Equation/Model and the Estimating Equation

When we follow the steps in regression (coming up shortly) we come up with two forms of our regression line or model. The first is a hypothesized model (following the general format of steps to research design). From a previous example, on Effort and Performance in 520, we had this:

Ex.: Some have hypothesized that there is a cause/effect relationship in this class:

CAUSE → EFFECT
Effort_i → Performance_i
Independent → Dependent

This relationship is expressed in an equation form that uses a CONSTANT and a PARAMETER:

Grade_i = 1.0 + .0002(hours)

Constant (1.0), measured in units of the dependent variable, performance: grade points.
Parameter (.0002), measured in units of both, like gp/h or mph.

In a more general expression of this, we might suggest this as our hypothesized model:

$G_i = \beta_0 + \beta_1 E_i + \varepsilon_i$

where
$G_i$ is the grade of the ith person, the dependent variable, in units (we know);
$\beta_0$ is the hypothesized a, the constant, in units of the DV (unknown);
$\beta_1$ is the regression slope coefficient, in units of both the DV and the IV;
$E_i$ is the IV, effort;
$\varepsilon_i$ is the error term, where $\varepsilon_i = Y_i - \hat{Y}_i$ (= actual - expected).

Estimating Equation, where the parameters (the betas) are determined (by computer):

$\hat{Y}_i = b_0 + b_1 x_i$, or: $\hat{G}_i = 1.0 + .0002\,E_i$

The formula for b is:

$$b = \frac{\sum (Y_i - \bar{Y})(X_i - \bar{X})}{\sum (X_i - \bar{X})^2}$$

And for a:

$$a = \bar{Y} - b\bar{X}$$
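The formulas for b and a can be applied by hand or in a few lines of code. The sketch below uses only the Python standard library; the function name `ols_simple` and the five data points are hypothetical, made up to keep the arithmetic easy to follow.

```python
def ols_simple(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator of b: sum of cross-products of deviations from the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator of b: sum of squared deviations of X from its mean.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    # The intercept makes the line pass through the point of means.
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = ols_simple(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(f"a = {a:.2f}, b = {b:.2f}")  # a = 2.20, b = 0.60
```

For these points the means are 3 and 4, the numerator is 6 and the denominator is 10, so b = 0.6 and a = 4 - 0.6(3) = 2.2. The residuals sum to zero up to floating-point rounding, as the first least-squares property requires.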

Cross Section versus Time Series

Fixed time versus measurement over time. The example above is fixed time, a snapshot in time. To denote a time series analysis, the subscript changes from i to t. OLS as presented here cannot do pooled cross-sectional and time series.

Simple vs Complex or Multiple Regression

Simple linear regression has only one independent variable:

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

Multiple linear regression has multiple independent variables:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i$

Where linear means linear in the parameters (the βs are to the power of one) but not necessarily in the variables.

REGRESSION'S 11 STEPS TO ULTIMATE HAPPINESS
1) Clearly define problem
2) Conceptualize problem (define appropriate variables, identify plausible reasons for change in dependent variable)
3) Operationalize
4) Hypothesize regression model
5) Collect data
6) Check for multicollinearity (multiple regression only)
7) Estimate OLS equation (computer)
8) Do statistical test
   a. For equation: sum of squares
   b. For coefficients
9) Interpret coefficients
10) Check OLS assumptions
11) Conclusions, limitations

Exercise top of 226 W&C in class, by pairs, on computer.

Step 1) Define the problem, clearly define the question. Are expenditures per pupil related to the average performance of pupils on a standardized exam?

Step 2) Conceptualize Problem: What are our variables? What might contribute to performance on a standardized exam? How do we speculate the relationship might work?

Step 3) Operationalize: How would we measure this stuff? Expenditure in dollars per student, performance in points on a standardized exam.

Step 4) Hypothesize Regression Model:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
$\text{Score}_i = \beta_0 + \beta_1 \text{Expenditure}_i + \varepsilon_i$

Step 5) Collect data: Thank you Welch and Comer, see table 225/226 on expenditure/scores.

Step 6) Check for Multicollinearity (done! Well, not done, but only needed for multiple regression)

Step 7) Estimate OLS equation (can be done with the Data Analysis tool in Excel, but we'll do the simple form in the class exercise so people understand the deconstructed version of the black box that is Excel)

Step 8) Do Statistical Tests
8a. Goodness of fit

Simple Regression II

Review/summary of objectives of regression:
1. To determine whether a relationship exists between two variables
2. To describe the nature of the relationship, should one exist, in the form of a mathematical equation
3. To assess the degree of accuracy of description or prediction achieved by the regression equation, and
4. In multiple regression, to assess the relative importance of the various predictor variables in their contribution to variation of the dependent variable.

Assumptions of Linear Regression:
1. The relationship is approximately linear (approximates a straight line in a scatter plot of Y, X).
2. For each value of X there is a probability distribution of independent values of Y, and from each of these Y distributions one or more values is sampled at random.
3. The means of the Y distributions fall on the regression line.

Thus any individual observation can vary from the line, and this variation is captured by the error term, ε.

Left off at Step 8, Statistical tests.

8a) Overall Goodness of Fit Test

Total sum of squares = sum of squares due to regression + sum of squares about regression:

$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$$

TSS = SSDue + SSAbout (aka error, ε)
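The decomposition can be verified numerically. The following sketch fits a least-squares line to a small hypothetical data set (the numbers and the function name are invented for illustration) and computes the three sums of squares, along with the share of the total that is due to regression.

```python
def sum_of_squares_parts(xs, ys):
    """Return (TSS, SSDue, SSAbout) for the least-squares line through the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Least-squares slope and intercept.
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    fitted = [a + b * x for x in xs]
    tss = sum((y - y_bar) ** 2 for y in ys)                   # total, about the mean
    ss_due = sum((f - y_bar) ** 2 for f in fitted)            # due to regression
    ss_about = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # about regression (error)
    return tss, ss_due, ss_about


# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

tss, ss_due, ss_about = sum_of_squares_parts(xs, ys)
share_explained = ss_due / tss

print(f"TSS = {tss:.2f}, SSDue = {ss_due:.2f}, SSAbout = {ss_about:.2f}")
print(f"SSDue / TSS = {share_explained:.2f}")
```

For these points TSS is 6, SSDue is 3.6 and SSAbout is 2.4, so the two parts add up to the total and the regression accounts for 60% of the variation about the mean.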

$R^2$, or the coefficient of determination, is defined as the percent of variation in Y about its mean that is explained by the linear influence of the variation of X. Mathematically it is described by $R^2 = SSD/TSS$, where SSD is the sum of squares due to regression, and it will range between 0 and 1. Closer to zero is a poorer model, closer to one is a better model.

Example: say you had a regression model for which you calculated SSD/TSS as 463.7/502.5 = .92, or, the model explains 92% of the variation about the mean.

8b) Statistical significance of regression coefficients

Need to ask ourselves: statistically speaking, is $\beta_1$ significantly different from zero? (We generally do not test the constant.)

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

$$t = \frac{b_1 - \beta_1}{\text{std. err. } b_1}$$

where d.f. = n - k - 1 and k is the number of independent variables.
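The test statistic is simple enough to compute directly. The sketch below plugs in the figures used in the worked example in these notes: a coefficient of -.459, a standard error of .047, and the tabulated critical value 2.306 for alpha/2 = .025 with 8 degrees of freedom. The variable names are illustrative, and the critical value is entered as a constant from a t table rather than computed. The same quantities also give the interval $b_1 \pm \text{s.e.}(b_1) \times t_{crit}$ used when interpreting the coefficient.

```python
# Values from the worked example in these notes.
b1 = -0.459        # estimated regression coefficient
se_b1 = 0.047      # standard error of b1
beta1_null = 0.0   # value of beta_1 under the null hypothesis
t_crit = 2.306     # t table value for alpha/2 = .025 and n - k - 1 = 8

# Calculated t: how many standard errors b1 lies from the null value.
t_calc = (b1 - beta1_null) / se_b1

# Two-tailed decision rule: reject H0 when |t_calc| exceeds t_crit.
reject_null = abs(t_calc) > t_crit

# 95% confidence interval for beta_1.
margin = se_b1 * t_crit
ci_low, ci_high = b1 - margin, b1 + margin

print(f"t_calc = {t_calc:.2f}, reject H0: {reject_null}")
print(f"95% CI: [{ci_low:.2f}, {ci_high:.2f}]")
```

With these inputs the calculated t is about -9.77, well beyond the critical value in absolute terms, and the interval runs from roughly -0.57 to -0.35.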

Say you've got a b1 of -.459 and a std err of b1 of .047:

t_calc = (-.459 - 0)/.047 = -9.77 (standard errors, or calculated t)
t_crit(alpha/2, n-k-1) = t_crit(.025, 8) = 2.306

Since the absolute value of t_calc, 9.77, exceeds t_crit, 2.306, we reject Ho and conclude that $\beta_1$ is significantly different from zero.

9) Interpret Regression Coefficients

The regression coefficient is the change in Y associated with a one-unit change in X. Specific language for the definition of b1 for time series and cross-sectional studies:

Cross Sectional: If A is one unit higher on the independent variable than another, B, then A will be b1 units of Y greater or less than B. Example: If shopping center A has 1 square foot more space than another shopping center, B, it will generate .003 more trips than the other.

Time Series: When the independent variable increases by one unit, the dependent variable changes by b1 units of Y.

Since we are less confident about point estimates, we give a confidence interval for our regression coefficient. The formula is:

b1 ± s.e.(b1) × t(alpha/2, n-k-1)

At the 95% confidence level, or alpha = .05:

b1 ± .047(2.306) = -.459 ± .108 = range of -0.57 to -0.35
Pr[-.57 ≤ $\beta_1$ ≤ -.35] = .95, or, we are 95% sure that the range will include $\beta_1$.

Step 10) Four Tests for OLS Assumptions and How to Test Them

I. Normality: the error term is distributed normally around a mean of zero. If not normal, it calls into question $\beta_1$.

II. Homoskedasticity: assumes equal variance of the error term for every level of the independent variable (typically a problem with cross-sectional data).

III. Non-Auto Regressive: an error term $\varepsilon_i$, associated with one observation, is not associated with the error term of the next observation (typically a problem with time series data). You should not be able to see trends, or guess the next error term.

IV. Random effects: observations on the independent variable X are 1) randomly selected, and 2) independent of all other independent variables (for multiple regression).

What to do: plot ε vs $X_i$ and look at the first three tests with that.

Interpretation of the error: If a model predicted X and the actual value was X-5, the model overpredicted the value by 5 units. If ε is positive (that is, $Y_i - \hat{Y}_i > 0$), the model is underestimating; if ε is negative, the model is overestimating. ε indicates the success (or lack thereof) of your OLS analysis.
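The sign convention for the error can be made concrete with a small sketch. The function names and the numbers are hypothetical; the only thing taken from the notes is the definition of the error as actual minus predicted.

```python
def residual(actual, predicted):
    """Error term for one observation: actual value minus predicted value."""
    return actual - predicted


def describe(actual, predicted):
    """Say whether the model under- or overestimated this observation."""
    e = residual(actual, predicted)
    if e > 0:
        return "underestimate"   # actual lies above the line
    if e < 0:
        return "overestimate"    # actual lies below the line
    return "exact"


# Hypothetical case matching the text: predicted 100, actual 100 - 5 = 95.
e = residual(actual=95, predicted=100)
label = describe(actual=95, predicted=100)

print(f"residual = {e}, model made an {label}")
```

Here the residual is -5, so the model overpredicted by 5 units.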
