Lab 5 Instructions

This document provides instructions for completing Laboratory Assignment #5 on epidemiology and biostatistics. It includes tasks to practice and demonstrate:

• Calculating single and multiple linear regression using Excel's Data Analysis Toolpak
• Creating scatterplots for independent and dependent variables
• Developing regression equations from regression analysis results
• Applying regression equations to predict outcomes
• Creating scatterplots in Excel

Uploaded by

Rachael

HSC-H322 Epidemiology and Biostatistics – Laboratory Assignment #5

Learning Objectives:

• Calculate single and multiple linear regression using the Data Analysis Toolpak in Excel
• Create a scatterplot for independent (x-axis) and dependent (y-axis) variables
• Use the results of single and multiple regression analysis to develop regression equations
• Apply the regression equations to predict an outcome
• Demonstrate the ability to create a scatterplot in Excel

Practice Tasks:
Simple Linear Regression Analysis
Open the Lab 5 Excel file. You will use the data presented in the Regression tab. You will create
two simple regression analyses in this section.

• Use the Regression tool in Data Analysis to estimate the line that describes the relationship
between two variables. We will examine the relationship between systolic blood pressure and age.
o Navigate to Data, then Data Analysis.
o Choose Regression and hit OK.
• For the “Input Y Range” highlight the dependent or outcome variable. In this case, it is systolic
blood pressure. Make sure to include the column title when selecting cells.
• For the “Input X Range” highlight the independent variable. In this case, it is age. Make sure to
include the column title when selecting cells.
• Check the Labels box since those were included in the data selection.
• Under the output options, select “Output Range”, choose an empty cell on the same worksheet, and hit OK.
o The Summary Output includes data that is not necessary for this lab, but the
coefficients allow for the determination of the regression equation for the best-fit line:

o y = mx + b, where m = slope, x = predictor variable input, b = y-intercept, and y = estimated
value
o The y-intercept is the “Intercept” coefficient in the Summary Output.
o The slope is the coefficient for age in the Summary Output.
• Write the equation describing the line that best fits the association between age and SBP.
Write the equation below the Summary Output. You may round numbers to two decimal
places.
• Use the regression tool (and instructions listed above) to assess the association between
systolic blood pressure and current smoking status. Use the data generated in the Summary
Output to write the best-fit line equation.
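
The coefficients reported in the Summary Output can be cross-checked outside Excel. The following is a minimal sketch of an ordinary least-squares fit, assuming made-up age and SBP values that were constructed to lie exactly on a line so the result is easy to verify. The values and the names `fit_line` and `predict` are hypothetical and are not taken from the Lab 5 Excel file.

```python
# Minimal sketch of simple linear regression by ordinary least squares.
# The data below are hypothetical and are NOT the Lab 5 data.

def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = m*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x_value):
    """Apply the regression equation y = m*x + b to one input value."""
    return slope * x_value + intercept


# Hypothetical values chosen so that SBP = 0.8 * age + 86 exactly
age = [30, 40, 50, 60, 70]
sbp = [110, 118, 126, 134, 142]

m, b = fit_line(age, sbp)
print(f"SBP = {m:.2f} * age + {b:.2f}")
print(f"Predicted SBP at age 55: {predict(m, b, 55):.2f}")
```

Here `m` plays the role of the age coefficient and `b` the role of the intercept coefficient in the Summary Output. Real data will not fall exactly on a line, so the fitted values will differ from any single pair of points.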
Creating a Scatterplot for Two Continuous Variables
Open the Lab 5 Excel file. You will use the data presented in the Scatter Practice tab. These data are
SAT scores for the verbal and math sections from 527 applicants to Wabash College in the mid-1990s.

• In the Charts section of the Insert tab, choose the x, y scatter chart. This looks like a graph
with only dots showing.
• When the box appears, you will need to choose the data used to create the chart. You can do
this two different ways.
o Right-click the empty chart area and choose Select Data. OR
o Under the Chart Tools section of the upper navigation bar, select the Design tab, then
click the Select Data button.
• A box will appear for you to choose your data points.
o First we will select the Legend Entries (Series). Under that section choose Add,
highlight Cell A1 in your spreadsheet for the Series name. For the Series X values,
highlight all of the verbal data points. For the Series Y values, highlight all of the math
data points.
▪ Typically, the y series is the dependent variable and the x series is the
independent variable. In this case it does not matter, because there is no
clear dependent or independent variable.
o You do not need to do anything with the Horizontal (Category) Axis Labels.
• Click Add Chart Element and add “Verbal SAT Score” as the x-axis (horizontal axis) title and
“Math SAT Score” as the y-axis (vertical axis) title, matching the Series X and Series Y values
selected above.
• Under Add Chart Element, choose Trendline and choose Linear.
• Under Add Chart Element, choose Trendline and choose More Trendline Options. Add the
equation and R² value to the chart.
o Make sure the equation and R² are moved to a location in the chart where they are
readable and not covered by the data points.
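
The R² value that Excel displays next to a linear trendline is the share of the variation in y that the fitted line accounts for. The sketch below shows one way to compute it by hand, assuming five made-up verbal and math scores. The numbers and the function name `linear_trendline` are hypothetical and are not the Wabash College data.

```python
# Minimal sketch: equation and R-squared of a linear trendline.
# The scores below are hypothetical and are NOT the Scatter Practice data.

def linear_trendline(x, y):
    """Return (slope, intercept, r_squared) for the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares: variation left over after fitting the line
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    # Total sum of squares: variation of y about its mean
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


verbal = [480, 520, 560, 600, 640]   # Series X values (hypothetical)
math_sat = [500, 540, 550, 620, 640]  # Series Y values (hypothetical)

slope, intercept, r2 = linear_trendline(verbal, math_sat)
print(f"y = {slope:.2f}x + {intercept:.2f}")
print(f"R² = {r2:.4f}")
```

An R² close to 1 means the points lie close to the trendline; an R² close to 0 means the line explains little of the variation in y.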
Creating a Scatterplot using Correlation Data
Open the Lab 5 Excel file. You will use the data presented in the Regression tab.

• Plot the statistically significant association between systolic blood pressure and age.
o Under Insert and Charts, choose the x, y scatter.
o Select data by right-clicking the empty chart or by using Select Data on the Design tab.
o When the box appears, select Add under “Legend Entries (Series)”.
o For the “Series X Values” choose the age data and for the “Series Y Values” choose the
systolic blood pressure data.
▪ Remember that the y series is the dependent variable and the x series is the
independent variable.
o Click OK to create the chart.
• Click Add Chart Element and add “Age” as the x-axis (horizontal axis) title and “Systolic BP”
as the y-axis (vertical axis) title.
• Under Add Chart Element, choose Trendline and choose Linear.
o Under Trendline Options, add the equation and R² value.
• Compare the trendline equation with the equation generated from the simple linear regression
in the section above.
• Be sure to move your equation and R² value to a location in the chart where they can be read
easily. Move the entire chart to a location in the worksheet where the other data generated
can still be seen.

Multiple Linear Regression Analysis


Open the Lab 5 Excel file. You will use the data presented in the Regression tab. You will create
two multiple regression analyses in this section.

• Multiple linear regression analysis is a technique for estimating the mathematical equation
that best describes the association between a continuous outcome variable (y) and a set of
independent variables, which can be continuous or dichotomous.
• The regression equation is as follows:

o y = b + m₁x₁ + m₂x₂ + m₃x₃ + …, where y = the predicted value of the dependent
variable, b = y-intercept, m₁, m₂, m₃, … = estimated regression coefficients, and
x₁, x₂, x₃, … = independent variable inputs

• The regression tool in Excel can be used to analyze multiple independent variables.
• Navigate to Data, then Data Analysis, and select Regression.
• Keep the SBP data as the “Input Y Range”, but highlight both age and sex data for the “Input X
Range”. Note: Excel can only accept multiple data ranges for X variables if the columns
containing these data are next to each other. If they were not, you would need to
appropriately position the columns before you run the regression.
• Check the Labels box since those were included in the data selection.
• Change the “Output Range” to the same worksheet and hit OK.
• Write the equation that best describes the association between SBP, age, and sex. Write the
equation below the Summary Output. You may round numbers to two decimal places.
• Now, assess the association between HDL (dependent variable) and BMI and sex considered
simultaneously (independent variables) using the instructions listed above. Note: as before,
the columns containing the X variables must be next to each other. If they are not, position
the columns appropriately before you run the regression.
o Determine the equation that best describes the association.
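
The multiple regression equation can also be sketched outside Excel. The example below is a minimal illustration that solves the least-squares normal equations with Gaussian elimination, assuming made-up SBP, age, and sex (coded 0/1) values constructed to satisfy SBP = 90 + 0.5 × age + 4 × sex exactly. The data and the name `fit_multiple` are hypothetical, and this is not necessarily the computation Excel performs internally.

```python
# Minimal sketch of multiple linear regression via the normal equations.
# The data below are hypothetical and are NOT the Lab 5 data.

def fit_multiple(rows, y):
    """Return [b, m1, m2, ...] minimizing squared error for
    y = b + m1*x1 + m2*x2 + ...; rows holds one [x1, x2, ...] per subject."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # leading 1 = intercept
    n, k = len(X), len(X[0])
    # Build the normal equations (X^T X) beta = X^T y
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
         for p in range(k)]
    c = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    # Forward elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        c[col], c[pivot] = c[pivot], c[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for q in range(col, k):
                A[r][q] -= factor * A[col][q]
            c[r] -= factor * c[col]
    # Back substitution
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        s = sum(A[r][q] * beta[q] for q in range(r + 1, k))
        beta[r] = (c[r] - s) / A[r][r]
    return beta


# Each row is [age, sex]; sex is a dichotomous variable coded 0 or 1
predictors = [[30, 0], [40, 1], [50, 0], [60, 1], [70, 0], [35, 1]]
sbp = [105.0, 114.0, 115.0, 124.0, 125.0, 111.5]

b, m_age, m_sex = fit_multiple(predictors, sbp)
print(f"SBP = {b:.2f} + {m_age:.2f} * age + {m_sex:.2f} * sex")

# Apply the equation to predict SBP for a 45-year-old with sex coded 1
predicted = b + m_age * 45 + m_sex * 1
print(f"Predicted SBP: {predicted:.2f}")
```

The coefficient for the dichotomous variable (`m_sex`) is the estimated difference in the outcome between the two groups when age is held constant.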

Graded Tasks:
Laboratory 5 Homework

1. Open the Lab 5 Excel file. You will use the data presented in the HW 1 Scatterplot tab. You have been
given data for BMI and systolic blood pressure (SBP) for 12 individuals. No regression analysis is required
for this set of data.
o Generate a Scatterplot for the data in the HW 1 Scatterplot tab. (0.2 point)
o Label the axes and graph appropriately. (0.2 point)
o Add a linear trendline to the graph. (0.2 point)
o Add the equation and R² value to the chart, making sure they are located where they can be read. (0.4
point)

Open the Lab 5 Excel file. You will use the data presented in the HW 2 Regression Tool tab. Make
sure all output you generate is positioned in the workbook where it can be read easily.

2. Using the data presented, run the regression analysis and determine the simple linear regression
equation relating number of cups of coffee per week to grade point average (GPA). Consider GPA
to be the dependent or outcome variable. (1.5 points)
3. Create a Scatterplot of the cups of coffee and GPA data. Be sure to appropriately label the axes.
(0.5 point)
4. Using the regression tool, determine the simple linear regression equation relating female sex
to GPA. (1.5 points)
5. Determine the multiple linear regression equation relating number of cups of coffee per week,
female sex, and number of hours of exercise per week, considered simultaneously, to GPA
(outcome or dependent variable). Please remember the positioning notes from your practice
tasks. (3.0 points)
6. Using the regression tool, determine the simple linear regression equation relating hours of
exercise per week to average number of drinks per week. Consider the average number of drinks
per week to be the outcome or dependent variable. (1.5 points)
7. Create a Scatterplot of the hours of exercise and average number of drinks data. Be sure to
appropriately label the axes. (0.5 point)

Please label your work appropriately and do not include unnecessary work in the graded portion
of the assignment. (0.5 point)

Save your file and submit to Canvas using the file upload function for Lab 5. All laboratory
homework assignments are to be submitted on Canvas. It is your responsibility to make sure that
the laboratory assignments are submitted online correctly. The document should be visible after
you upload your submission. Double check your files after submission.
