Correlation and Regression
Outline
Regression: Simple Linear Regression; Using the TI-83; Model/Formulas
Applications: Real-life Applications; Practice Problems
Internet Resources: Applets; Data Sources
Correlation
Specific Example
On seven randomly chosen summer days, a person recorded the temperature and their water consumption during a three-hour period spent outside.
Temperature (F):             75  83  85  85  92  97  99
Water Consumption (ounces):  16  20  25  27  32  48  48
Direction of Association
Positive Correlation: as x increases, y tends to increase.
Negative Correlation: as x increases, y tends to decrease.
Interpretation
r = 1: perfect positive linear relationship
r = 0: no linear relationship
r = -1: perfect negative linear relationship
Interpretation
Strength of association: strong, moderate, or weak. The closer r is to 1 or -1, the stronger the linear association.
Formula
Σ = the sum
n = number of paired items
x_i = input variable
x̄ (x-bar) = mean of the x's
s_x = standard deviation of the x's
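The formula itself did not survive on this slide. Assuming it is the usual standardized-product form, r = (1/(n-1)) Σ ((x_i - x̄)/s_x)((y_i - ȳ)/s_y), which matches the symbols listed here, a minimal Python sketch for the temperature and water data could look like this. The function and variable names are illustrative, not from the slides.

```python
from statistics import mean, stdev

temperature = [75, 83, 85, 85, 92, 97, 99]   # x, degrees F
water = [16, 20, 25, 27, 32, 48, 48]         # y, ounces

def correlation(xs, ys):
    """Correlation coefficient r (assumed formula: average product of z-scores, dividing by n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)          # sample standard deviations
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

r = correlation(temperature, water)
print(round(r, 3))
```

For this data set r comes out at roughly 0.96, which is consistent with reading the scatterplot as a strong positive association.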
Regression
Specific statistical methods for finding the line of best fit for one response (dependent) numerical variable based on one or more explanatory (independent) variables.
Regression also includes using statistical methods to assess the "goodness of fit" of the model (e.g., the correlation coefficient).
Simple linear regression: finding the line of best fit for one response (dependent) numerical variable based on one explanatory (independent) variable.
Goal: minimize the sum of the squared errors, where each error is the vertical distance between a data point and the line.
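The least-squares goal can be checked numerically. The sketch below, in Python with illustrative names, computes the closed-form least-squares line for the temperature and water data and confirms that a few hypothetical alternative lines give a larger sum of squared errors.

```python
from statistics import mean

temperature = [75, 83, 85, 85, 92, 97, 99]   # x, degrees F
water = [16, 20, 25, 27, 32, 48, 48]         # y, ounces

def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

def least_squares_line(xs, ys):
    """Closed-form slope and intercept that minimize the sum of squared errors."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

slope, intercept = least_squares_line(temperature, water)
best = sum_squared_errors(temperature, water, slope, intercept)

# Hypothetical alternative lines: none of them should beat the least-squares line.
candidates = [(slope + 0.1, intercept), (slope, intercept + 2), (1.0, -57.0)]
for cand_slope, cand_intercept in candidates:
    assert sum_squared_errors(temperature, water, cand_slope, cand_intercept) >= best

print(round(slope, 3), round(intercept, 3), round(best, 2))
```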
Example
Draw a scatterplot of the data. Visually, consider the strength of the linear relationship.
If the relationship appears relatively strong, find the correlation coefficient as a numerical verification.
If the correlation is still relatively strong, then find the simple linear regression line.
Preliminary Step
Example
Temperature (F):             75  83  85  85  92  97  99
Water Consumption (ounces):  16  20  25  27  32  48  48
Press STAT. Under EDIT, select 1: Edit.
Enter the x-values (input) into L1.
Enter the y-values (output) into L2.
After the data is entered in the lists, press 2nd MODE to quit and return to the home screen.
Note: To clear a list, for example list 1, place the cursor on L1, then press CLEAR and ENTER.
Press 2nd Y= (STAT PLOTS). Select 1: PLOT 1 and press ENTER.
Use the arrow keys to move the cursor to On and press ENTER.
Arrow down to Type: and select the first graph type (the scatterplot).
Under Xlist: enter L1. Under Ylist: enter L2.
Under Mark: select any of the available marks.
Press 2nd MODE to quit and return to the home screen.
To plot the points, press ZOOM and select 9: ZoomStat. The scatterplot will then be graphed.
Press STAT. Press CALC. Select 4: LinReg(ax + b).
Press 2nd 1 (for List 1), press the comma key, press 2nd 2 (for List 2), then press ENTER.
Write down the equation of the line in slope-intercept form.
Press Y= and enter the equation under Y1. (Clear all other equations.)
Press GRAPH and the line will be graphed through the data points.
Questions ???
Interpretation in Context
Regression Equation:
y=1.5*x - 96.9
Water Consumption = 1.5*Temperature - 96.9
Slope = 1.5
For each 1 degree F increase in temperature, you expect an increase of 1.5 ounces in water consumption.
y-intercept = -96.9
For this example, when the temperature is 0 degrees F, the model predicts that a person would drink about -97 ounces of water, which makes no sense. The model is not applicable at x = 0, which lies far outside the observed temperatures (75 to 99 degrees F).
Prediction Example
Predict the amount of water a person would drink when the temperature is 95 degrees F.
Solution: Substitute the value of x=95 (degrees F) into the regression equation and solve for y (water consumption). If x=95, y=1.5*95 - 96.9 = 45.6 ounces.
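The substitution can be written as a small Python function. The sketch below uses the rounded coefficients from the slides; the function names and the range check are illustrative additions that flag predictions outside the observed temperatures, such as x = 0.

```python
OBSERVED_RANGE = (75, 99)   # lowest and highest temperatures in the data, degrees F

def predict_water(temperature_f, slope=1.5, intercept=-96.9):
    """Predicted water consumption in ounces, using the rounded regression equation."""
    return slope * temperature_f + intercept

def is_extrapolation(temperature_f, observed=OBSERVED_RANGE):
    """True when the input lies outside the temperatures the line was fitted on."""
    low, high = observed
    return not (low <= temperature_f <= high)

for temp in (95, 0):
    note = " (extrapolation, not trustworthy)" if is_extrapolation(temp) else ""
    print(f"{temp} F -> {predict_water(temp):.1f} ounces{note}")
```

Rounding the coefficients affects the answer: with the unrounded least-squares values for these seven points (about 1.451 and -96.85), the prediction at 95 degrees F is closer to 41.0 ounces than to 45.6.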
r²
Interpretation of r²
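The slide's own interpretation text is not preserved. A common reading is that r² is the fraction of the variation in y that the regression line accounts for. Under that assumption, a minimal Python sketch for the temperature and water data (illustrative names) is:

```python
from statistics import mean

temperature = [75, 83, 85, 85, 92, 97, 99]   # x, degrees F
water = [16, 20, 25, 27, 32, 48, 48]         # y, ounces

def r_squared(xs, ys):
    """r squared computed as 1 - (unexplained variation / total variation)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))  # unexplained
    sst = sum((y - y_bar) ** 2 for y in ys)                                # total
    return 1 - sse / sst

r2 = r_squared(temperature, water)
print(f"{r2:.1%} of the variation in water consumption is accounted for by temperature")
```

For this data set the result is about 0.93, the square of the correlation coefficient of roughly 0.96.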
Questions ???
There are mathematical assumptions behind the concepts that we are covering today.
Formulas
Prediction Equation:
Nonlinear Application
Predicting when Solar Maximum Will Occur
http://science.msfc.nasa.gov/ssl/pad/solar/predict.htm
Practice Problems
Measure Height vs. Arm Span
Find the line of best fit for height.
Predict the height of one student not in the data set.
Check the predictability of the model.
Practice Problems
Can the number of points scored in a basketball game be predicted by the time a player plays in the game?
(Idea modified from Steven King, Aiken, SC. NCTM presentation 1997.)
Resources
Curriculum and Evaluation Standards for School Mathematics. Addenda Series, Grades 9-12. NCTM. 1992.
Internet Resources
Correlation
Guessing Correlations. An interactive site that lets you try to match correlation coefficients to scatterplots. University of Illinois, Urbana-Champaign Statistics Program. http://www.stat.uiuc.edu/~stat100/java/guess/GCApplet.html
Regression
Estimate the Regression Line. Compare the mean square error from different regression lines. Can you find the minimum mean square error? Rice University Virtual Statistics Lab. http://www.ruf.rice.edu/~lane/stat_sim/reg_by_eye/index.html
FEDSTATS. "The gateway to statistics from over 100 U.S. Federal agencies." http://www.fedstats.gov/
FEDSTATS "Kid's Pages" (not all related to statistics). http://www.fedstats.gov/kids.html
Other
Statistics Applets. Using Web Applets to Assist in Statistics Instruction. Robin Lock, St. Lawrence University. http://it.stlawu.edu/~rlock/maa99/
Ten Websites Every Statistics Instructor Should Bookmark. Robin Lock, St. Lawrence University. http://it.stlawu.edu/~rlock/10sites.html