
2023_Fall_DAP_COMM215


COMM 215

2023 Fall Data Analysis Project (DAP)

Group Members
Nouran Ismail, 402443244
Tiffany Chee, 40249188
Kristen Podwalski, 40229066

1. Calculate the mean, median, and mode for each indicator

Indicator 1: mean=4.42, median=4.38, mode=4.73 (mode to be verified)
Indicator 2: mean=6.53, median=6.58, mode=6.67
Indicator 3: mean=4.57, median=4.57, mode=3.71
Indicator 4: mean=4.43, median=4.46, mode=6.25
Indicator 5: mean=1.67, median=1.73, mode=1.78
Indicator 6: mean=5.60, median=5.77, mode=5.63
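The three measures above can be reproduced with Python's standard `statistics` module. This is a minimal sketch, not the method used for the project: the `sample` list is hypothetical placeholder data rather than the project dataset, and `summarize` is an illustrative helper name. `statistics.multimode` returns every most-frequent value, which is useful when a mode looks doubtful (as with Indicator 1), since continuous scores often have ties or no repeated value at all.

```python
import statistics

def summarize(scores):
    """Mean, median and mode(s) of a list of scores, rounded like the results above."""
    return {
        "mean": round(statistics.mean(scores), 2),
        "median": round(statistics.median(scores), 2),
        # multimode returns all values tied for the highest frequency
        "mode": statistics.multimode(scores),
    }

# Hypothetical scores for illustration only; not the project's data.
sample = [4.1, 4.7, 3.9, 4.7, 5.2, 4.4]
print(summarize(sample))
```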

2. Calculate the Interquartile Range (IQR) for each indicator

Indicator 1: IQR= 2.05


Indicator 2: IQR= 1.44
Indicator 3: IQR= 1.9
Indicator 4: IQR= 3.26
Indicator 5: IQR= 0.49
Indicator 6: IQR= 2.81
Indicator 7: IQR= 2.0
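The IQR depends on how the quartiles are interpolated, so different tools can give slightly different values for the same data. The sketch below, assuming Python's `statistics.quantiles`, shows both conventions; `iqr` and the `data` list are hypothetical. The "inclusive" method should interpolate the same way as Excel's QUARTILE.INC and "exclusive" the same way as QUARTILE.EXC.

```python
import statistics

def iqr(scores, method="inclusive"):
    """Interquartile range Q3 - Q1.

    method="inclusive" interpolates like Excel's QUARTILE.INC;
    method="exclusive" (Python's default) interpolates like QUARTILE.EXC.
    """
    q1, _median, q3 = statistics.quantiles(scores, n=4, method=method)
    return q3 - q1

# Hypothetical data for illustration only.
data = [1, 2, 3, 4, 5, 6, 7, 8]
print(iqr(data))               # inclusive quartiles are 2.75 and 6.25
print(iqr(data, "exclusive"))  # exclusive quartiles are 2.25 and 6.75
```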

3. Calculate the variance for each indicator and comment on the dispersion and
outliers of each indicator.

Indicator 1: variance= 2.410748
 above-average variation from the mean: normal to wide spread, some outliers
Indicator 2: variance= 1.113176
 scores vary little from the mean: narrow spread, very few outliers
Indicator 3: variance= 1.63997
 scores vary a little from the mean: narrow to normal spread, few outliers
Indicator 4: variance= 4.291126
 scores vary greatly from the mean: wide spread, many outliers
Indicator 5: variance= 0.229172
 scores barely vary from the mean: narrow spread, almost no outliers
Indicator 6: variance= 3.953107
 scores vary considerably from the mean: wide spread, many outliers
Indicator 7: variance= 1.88943
 average variation from the mean: normal spread, some outliers
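Variance measures spread but does not count outliers by itself. One common way to back up the outlier comments above is Tukey's 1.5 × IQR fence rule; it is a swapped-in technique, not necessarily the one used in the course. This sketch assumes sample variance (dividing by n - 1, like Excel's VAR.S), inclusive quartiles, and hypothetical data; `dispersion_report` is an illustrative name.

```python
import statistics

def dispersion_report(scores):
    """Return (sample variance, list of values outside the 1.5 x IQR fences)."""
    # Sample variance (divides by n - 1), the same convention as Excel's VAR.S.
    variance = statistics.variance(scores)
    # Tukey's fences: values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR are flagged.
    q1, _median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    step = 1.5 * (q3 - q1)
    outliers = [x for x in scores if x < q1 - step or x > q3 + step]
    return variance, outliers

# Hypothetical scores for illustration only.
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 15]
variance, outliers = dispersion_report(scores)
print(round(variance, 4), outliers)
```

Here Q1 = 3.25 and Q3 = 5, so the fences are 0.625 and 7.625 and only 15 is flagged, even though that single point inflates the variance considerably.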

4. Construct a histogram for each indicator using the number of classes and class
length explained in Chapter 2.
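The rule from Chapter 2 is not reproduced in this report. The sketch below assumes the common "2 to the k" rule (the smallest k with 2^k >= n) for the number of classes and (max - min) / k as the minimum class width, then builds a frequency table for the histogram. The helper names and the sample scores are hypothetical.

```python
def number_of_classes(n):
    """Smallest k with 2**k >= n (the '2 to the k' rule, assumed here)."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k

def class_width(values, k):
    """Minimum class interval (max - min) / k; in practice round it up to a convenient value."""
    return (max(values) - min(values)) / k

def frequency_table(values, k, width):
    """Count observations per class, with the first class starting at the minimum value."""
    lower = min(values)
    counts = [0] * k
    for v in values:
        idx = int((v - lower) // width)
        # The maximum lands exactly on the last upper limit; keep it in the last class.
        counts[min(idx, k - 1)] += 1
    return counts

# With 1000 observations, 2**10 = 1024 >= 1000, so the rule gives 10 classes.
print(number_of_classes(1000))

# Hypothetical scores for illustration only.
scores = [1.2, 2.5, 3.1, 3.8, 4.0, 4.4, 4.9, 5.5, 6.3, 7.9]
k = number_of_classes(len(scores))
width = class_width(scores, k)
print(k, round(width, 3), frequency_table(scores, k, width))
```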
5. Determine for each indicator whether it is skewed to the right, skewed to the left, or
not skewed.
Indicator 1: not skewed
Indicator 2: skewed to the right
Indicator 3: not skewed
Indicator 4: not skewed
Indicator 5: skewed to the left
Indicator 6: skewed to the right
Indicator 7: skewed to the right
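As a rough numeric cross-check on these histogram-based labels, Pearson's coefficient of skewness, sk = 3(mean - median) / s, can be computed from the means, medians and variances reported in questions 1 and 3. This is a sketch under that assumption (Indicator 7 is left out because no mean or median is reported for it). The inputs are rounded to two decimals, so coefficients this close to zero say little on their own. Indicators 2 and 6 have means slightly below their medians, which gives a negative coefficient (a lean to the left), so those two labels may be worth re-checking against the histograms.

```python
import math

# (mean, median, variance) as reported in questions 1 and 3 of this project.
reported = {
    1: (4.42, 4.38, 2.410748),
    2: (6.53, 6.58, 1.113176),
    3: (4.57, 4.57, 1.63997),
    4: (4.43, 4.46, 4.291126),
    5: (1.67, 1.73, 0.229172),
    6: (5.60, 5.77, 3.953107),
}

def pearson_skewness(mean, median, variance):
    """Pearson's coefficient of skewness: sk = 3 * (mean - median) / s."""
    return 3 * (mean - median) / math.sqrt(variance)

for indicator, stats in reported.items():
    sk = pearson_skewness(*stats)
    if sk > 0:
        lean = "right (positive)"
    elif sk < 0:
        lean = "left (negative)"
    else:
        lean = "symmetric"
    print(f"Indicator {indicator}: sk = {sk:+.3f}, leans {lean}")
```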
6. For the Blue Team, calculate the expected probability of someone being in Category
A.
7 people in Category A
40 people in all categories
P(A) = 7/40 = 0.175
7. For the Blue Team, given that someone is female, what is the probability that she
belongs to Category C?
The probability that someone is in Category C given that she is female equals the probability that she is in Category C and female, divided by the probability that she is female: P(C | F) = P(C and F) / P(F).
7 females in Category C
18 females in total
P(C | F) = (7/40) / (18/40) = 7/18 = 0.3889

8. For the Red Team, calculate the expected probability of someone being in Category
D.

10 individuals in Category D on the Red Team
Total of 40 individuals on the Red Team
P(D) = 10/40 = 0.25

9. For the Red Team, given that someone is male, what is the probability that he
belongs to Category B?
4 males in Category B
22 males in total
P(B | M) = 4/22 = 0.182
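The four probabilities in questions 6 to 9 are relative frequencies, and the conditional ones follow P(X | Y) = P(X and Y) / P(Y). The sketch below uses exact fractions so the cancellation of the common team total (40) is visible; the helper names are illustrative, and the counts are the ones used in the answers above (for question 7, the 7 is the number of females in Category C).

```python
from fractions import Fraction

def probability(count, total):
    """Relative-frequency probability: count / total."""
    return Fraction(count, total)

def conditional(joint_count, condition_count, total):
    """P(X | Y) = P(X and Y) / P(Y); the common total cancels out."""
    return probability(joint_count, total) / probability(condition_count, total)

blue_p_a = probability(7, 40)              # Blue Team, Category A
blue_p_c_given_f = conditional(7, 18, 40)  # Blue Team, Category C given female
red_p_d = probability(10, 40)              # Red Team, Category D
red_p_b_given_m = conditional(4, 22, 40)   # Red Team, Category B given male

for label, p in [("P(A), Blue", blue_p_a), ("P(C | F), Blue", blue_p_c_given_f),
                 ("P(D), Red", red_p_d), ("P(B | M), Red", red_p_b_given_m)]:
    print(f"{label}: {p} = {float(p):.4f}")
```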

10. Construct a pie chart to illustrate the proportion of each category within the Red
Team.
11. Construct a pie chart to illustrate the proportion of each category within the Blue
Team.
12. Joylandia advocates that the number of people in each of the four categories is
equal. Test this claim based on the Red Team and Blue Team samples at a 1% level
of significance.

Observed:
Category   Red Team   Blue Team   Total
A          13         7           20
B          8          11          19
C          9          11          20
D          10         11          21
Total      40         40          80

Expected:
Category   Red Team   Blue Team   Total
A          10         10          20
B          9.5        9.5         19
C          10         10          20
D          10.5       10.5        21
Total      40         40          80

H0: the distribution of people across the four categories is the same for the Red Team and the Blue Team.
HA: the distribution of people across the four categories differs between the Red Team and the Blue Team.

Chi-Square Test:
$$\chi^2 = \frac{(13-10)^2}{10} + \frac{(8-9.5)^2}{9.5} + \frac{(9-10)^2}{10} + \frac{(10-10.5)^2}{10.5} + \frac{(7-10)^2}{10} + \frac{(11-9.5)^2}{9.5} + \frac{(11-10)^2}{10} + \frac{(11-10.5)^2}{10.5} = 2.521303256$$

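Level of significance = 0.01
Degrees of freedom = (4 - 1)(2 - 1) = 3
From the chi-square table, 0.10 < p-value < 0.50 (equivalently, 2.5213 is below the critical value of 11.345 for α = 0.01 and 3 degrees of freedom). Since p > 0.01, do not reject H0.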
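The chi-square calculation above can be reproduced in a few lines. This sketch takes the observed counts from this question and computes each expected count as row total × column total ÷ grand total, which is how the expected table was built. For the p-value it assumes the closed-form upper tail of a chi-square distribution with exactly 3 degrees of freedom, P(X > x) = erfc(√(x/2)) + √(2x/π)·e^(-x/2); that shortcut is valid only for df = 3, and a statistics library is the general tool. The function names are illustrative.

```python
import math

# Observed counts from the table above: category -> (Red Team, Blue Team).
observed = {
    "A": (13, 7),
    "B": (8, 11),
    "C": (9, 11),
    "D": (10, 11),
}

def chi_square_homogeneity(table):
    """Chi-square statistic and degrees of freedom for an r x c table of counts."""
    rows = list(table.values())
    n_cols = len(rows[0])
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(row[j] for row in rows) for j in range(n_cols)]
    grand_total = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for j, obs in enumerate(row):
            expected = row_total * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (n_cols - 1)
    return stat, df

def chi2_upper_tail_df3(x):
    """P(X > x) for a chi-square variable with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

stat, df = chi_square_homogeneity(observed)
p_value = chi2_upper_tail_df3(stat)  # valid only because df == 3 here
print(f"chi-square = {stat:.4f}, df = {df}, p-value = {p_value:.3f}")
print("reject H0" if p_value < 0.01 else "do not reject H0")
```

With these counts the sketch gives the same statistic and a p-value of about 0.47, which falls inside the 0.10 to 0.50 range read from the table.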

13. Using the original data, test the overall significance of this multiple regression
model. The independent variables are the seven indicators and the dependent
variable is the performance index. Use a significance level of α=.01.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.99975207
R Square             0.9995042
Adjusted R Square    0.9995007
Standard Error       0.01668781
Observations         1000

ANOVA
             df    SS           MS           F            Significance F
Regression   7     556.912787   79.5589695   285687.043   0
Residual     992   0.27625508   0.00027848
Total        999   557.189042

             Coefficients   Standard Error   t Stat       P-value      Lower 95%    Upper 95%
Intercept    0.05202106     0.00593055       8.7717114    7.5484E-18   0.0403832    0.06365892
Indicator 1  0.09689272     0.0004794        202.110475   0            0.09595196   0.09783348
Indicator 2  0.15098305     0.00066943       225.540534   0            0.1496694    0.15229671
Indicator 3  0.14922267     0.00056741       262.987116   0            0.1481092    0.15033614
Indicator 4  0.14813929     0.00040936       361.884439   0            0.14733599   0.14894259
Indicator 5  0.14953891     0.00114898       130.149231   0            0.1472842    0.15179362
Indicator 6  0.1495489      0.00041573       359.725431   0            0.14873309   0.15036472
Indicator 7  0.13947183     0.0007213        193.362369   0            0.13805638   0.14088727

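H0: the overall model is not significant (all seven indicator coefficients equal zero).
HA: the overall model is significant (at least one indicator coefficient is nonzero).
Significance F, the p-value of the F-test, is effectively 0 (F = 285,687.043 with 7 and 992 degrees of freedom), so p < 0.01 and we reject H0.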
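The F statistic and the R² values in the summary output follow directly from the ANOVA sums of squares. This sketch recomputes them from SSR and SSE as reported in the SUMMARY OUTPUT for question 13, with k = 7 indicators and n = 1000 observations; `regression_anova` is an illustrative name. It does not compute Significance F, which needs the F distribution (for example `scipy.stats.f.sf(f_stat, k, n - k - 1)` if SciPy is available).

```python
def regression_anova(ssr, sse, k, n):
    """F statistic, R^2 and adjusted R^2 from the ANOVA sums of squares.

    k is the number of independent variables and n the number of observations.
    """
    df_reg, df_res = k, n - k - 1
    msr = ssr / df_reg               # mean square regression
    mse = sse / df_res               # mean square error (residual)
    f_stat = msr / mse
    r2 = ssr / (ssr + sse)           # SST = SSR + SSE
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_res
    return f_stat, r2, adj_r2

f_stat, r2, adj_r2 = regression_anova(ssr=556.912787, sse=0.27625508, k=7, n=1000)
print(f"F = {f_stat:.1f}, R^2 = {r2:.7f}, adjusted R^2 = {adj_r2:.7f}")
```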

14. Joylandia calculates the performance index using the indicators as building blocks.
However, during the games, the performance index was computed based on other
criteria. Test whether the multiple regression model is still significant for both the
Red and Blue teams at a 1% level of significance. Note: The independent variables
remain the seven indicators, but the dependent variable is now the performance
index during the games.
BLUE:

SUMMARY OUTPUT

Regression Statistics
Multiple R           1
R Square             1
Adjusted R Square    1
Standard Error       2.153E-16
Observations         40

ANOVA
             df   SS           MS           F           Significance F
Regression   1    35.8129499   35.8129499   7.726E+32   0
Residual     38   1.7615E-30   4.6354E-32
Total        39   35.8129499

              Coefficients   Standard Error   t Stat       P-value      Lower 95%    Upper 95%
Intercept     1.7764E-15     1.808E-16        9.8248062    5.5669E-12   1.4103E-15   2.1424E-15
X Variable 1  1              3.5977E-17       2.7796E+16   0            1            1

H0: the overall model is not significant.
HA: the overall model is significant.
The p-value is above 0.01; therefore, the regression model is not significant for predicting the performance index during the games for the Blue Team at a 1% significance level.

RED:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.40478252
R Square             0.16384889
Adjusted R Square    -0.0190592
Standard Error       0.91864054
Observations         40

ANOVA
             df   SS           MS           F            Significance F
Regression   7    5.29175737   0.75596534   0.89579919   0.52151571
Residual     32   27.0048143   0.84390045
Total        39   32.2965717

              Coefficients   Standard Error   t Stat       P-value      Lower 95%    Upper 95%
Intercept     5.17948822     1.52747306       3.39088678   0.00186786   2.06812741   8.29084903
X Variable 1  -0.0405612     0.1396473        -0.2904543   0.7733428    -0.3250134   0.24389109
X Variable 2  0.04174764     0.25941724       0.16092857   0.87316135   -0.486668    0.57016326
X Variable 3  -0.1232506     0.1689359        -0.72957     0.47096006   -0.4673617   0.2208606
X Variable 4  -0.0976585     0.12235061       -0.7981856   0.43064679   -0.3468785   0.15156155
X Variable 5  -0.4233616     0.39765616       -1.0646425   0.29500558   -1.2333607   0.38663746
X Variable 6  -0.1546716     0.13854233       -1.1164214   0.27255498   -0.4368731   0.12752987
X Variable 7  0.33973794     0.23040306       1.47453741   0.15010818   -0.1295777   0.80905362

The p-value (Significance F = 0.5215) is above 0.01; therefore, the regression model is not significant for predicting the performance index during the games for the Red Team at a 1% significance level.

15. Reflect on the limitations of this project. If you were to expand this project, what
improvements would you make; what other questions would you ask; would you request
more data; and what other comments would you provide? Justify your answers.
This project covers a good variety of topics that are helpful for the final exam. To extract valuable and applicable insights, an expanded project would need more thorough data, an improved model, and a better understanding of the context in which the data were collected; for example, the game-period regressions in question 14 rest on only 40 observations per team. Better outcomes could come from collaborating with professionals, iterating on the analysis, and taking a comprehensive approach to data collection and analysis.
