
SAS Baseball Project


PROJECT: Baseball Player Performance

The Baseball dataset contains details of baseball players in the year 1986. The data also has
parameters depicting performance of the players and their career records.

Do the following using SAS:

a) Import the data in SAS.

Solution:
proc import datafile="/folders/myfolders/baseball.xlsx"
    out=work.baseball
    dbms=xlsx
    replace;
run;

proc print data=work.baseball;
run;

b) Generate Descriptive Statistics of the entire data.

Solution:
proc means data=work.baseball;
run;

c) Generate a list of the top 5 Home Run Players.

Solution:
proc sort data=work.baseball
    out=baseball_data;
    by descending nHome;
run;

data top_5H;
    set baseball_data (obs=5);
run;

title "Top 5 Home Run Scorer";
proc print data=top_5H;
run;

d) Generate a list of the top 5 paid Players.


Solution:
proc sort data=work.baseball
    out=baseball2;
    by descending Salary;
run;

data Top_paid;
    set baseball2 (obs=5);
run;

title "Top 5 paid Player";
proc print data=Top_paid;
run;

e) Find the impact of Home Runs on Salary using Linear Regression.

Solution:
title "Regression analysis(Salary~nHome)";
proc reg data=work.baseball;
    model Salary=nHome;
    output out=Predicted predicted=Pred_Salary;
run;
quit;

f) Add more explanatory variables: nAtBat, nHits, nHome, nRuns, nRBI, nBB, nOuts, nError.

Solution:
title "Regression analysis 2";
proc reg data=work.baseball;
    model Salary=nHome nAtBat nHits nRuns nRBI nBB nOuts nError;
    output out=Pred_Salary residual=resid predicted=Pred;
run;
quit;

g) Identify from the results which factors have a high impact on Salary in comparison to Home
Runs.

Solution: In the results above, nHits, nBB, nOuts, and nAtBat have p-values below 0.05, so
they are statistically significant predictors of Salary. The p-value for nHome is 0.7838
(> 0.05), so nHome is not significant once the other variables are in the model. nRuns, nRBI,
and nError also have p-values above 0.05 and are likewise not significant. Compared with
nHome, therefore, nHits, nBB, nOuts, and nAtBat have a high impact on Salary.
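
Because a p-value measures statistical significance rather than the size of an effect, one way to compare how strongly each factor moves Salary is to request standardized estimates. The sketch below is only an illustration: it assumes the same work.baseball data set and variable names used in f), and relies on STB, the PROC REG MODEL option that prints standardized coefficients.

```sas
/* Sketch: same model as in f), with standardized estimates added */
title "Regression analysis 2 (standardized estimates)";
proc reg data=work.baseball;
    /* STB adds a Standardized Estimate column to the parameter table */
    model Salary = nHome nAtBat nHits nRuns nRBI nBB nOuts nError / stb;
run;
quit;
```

A larger absolute standardized estimate means a larger change in Salary per standard deviation of that predictor, which puts predictors measured on different scales on a comparable footing.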

h) Calculate performance scores (ps) by applying the following formula:


ps= 3*nHome + 0.5*nHits + 1*nRuns +1* nAtBat - 1*nRBI + 0.3*nBB + 2*nOuts - 1*nError
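
As a quick sanity check of the formula, it can be applied to a single made-up player before running it on the full data. The values below are hypothetical and chosen only to make the arithmetic easy to follow; ps_check is likewise an illustrative data set name.

```sas
/* Hypothetical player statistics, used only to illustrate the formula */
data ps_check;
    nHome = 10; nHits = 100; nRuns = 50; nAtBat = 400;
    nRBI = 40; nBB = 30; nOuts = 200; nError = 5;
    /* 30 + 50 + 50 + 400 - 40 + 9 + 400 - 5 = 894 */
    ps = 3*nHome + 0.5*nHits + 1*nRuns + 1*nAtBat - 1*nRBI
         + 0.3*nBB + 2*nOuts - 1*nError;
    put ps=;   /* writes ps=894 to the log */
run;
```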

Solution:
data Performance_score;
    set work.baseball;
    /* plain assignment; no DO/END block is needed */
    ps = 3*nHome + 0.5*nHits + 1*nRuns + 1*nAtBat - 1*nRBI + 0.3*nBB
         + 2*nOuts - 1*nError;
run;

proc print data=Performance_score;
run;

i) Calculate the impact of Performance Scores (ps) on Salary.

Solution:
proc reg data=performance_score;
    model Salary=ps;
    output out=performance_score predicted=Pred;
run;
quit;

j) Explain the results.

Solution: The results above show that ps is a statistically significant predictor of Salary:
its p-value (< 0.0001) is below 0.05. However, the adjusted R-square is only 0.1573, well
below the 0.7 threshold used here for a good fit. Salary is therefore correlated with ps, but
ps explains only about 16% of the variability in Salary, so the model has weak explanatory power.
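
In a simple regression with a single predictor, R-square equals the squared Pearson correlation between the response and the predictor, so the weak fit can be cross-checked directly. A minimal sketch, assuming the performance_score data set created in h) still holds Salary and ps:

```sas
/* Sketch: Pearson correlation between Salary and ps */
proc corr data=performance_score;
    var Salary ps;
run;
```

Squaring the reported correlation coefficient should reproduce the unadjusted R-square from PROC REG; the adjusted value quoted above will be slightly lower.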
