Assignment 2
No errors of commission were found, as the means of all the variables are close to the scale points.
2. Missing Values
Respondent IDs 288 and 303 contain missing values in all the variables.
Thus, these two responses are removed.
Other missing values were replaced by the mean, rounded to a whole value, in the following cases:
3,10,12,28,31,172,184,275
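The replacement of missing values by a whole-number mean can be sketched as follows. This is only an illustration: the function name and the sample responses are hypothetical, and the source does not state how the rounding was done.

```python
from statistics import mean

def impute_rounded_mean(values):
    """Replace None entries with the mean of the observed values, rounded to a whole number."""
    observed = [v for v in values if v is not None]
    # Note: Python's round() rounds exact halves to the nearest even integer.
    fill = round(mean(observed))
    return [fill if v is None else v for v in values]

# Hypothetical responses to one 5-point Likert item; None marks a missing answer.
item = [4, 5, None, 3, 4, None, 2]
filled = impute_rounded_mean(item)
print(filled)
```

Rounding keeps the imputed value on the original scale points, which matters for Likert items.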
3. Outliers:
Respondent cases that are extreme outliers were removed:
56 unique cases were removed as extreme outliers.
Remaining cases: 327/384
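The source does not say which rule identified the extreme outliers. One common convention, assumed here, treats a value as extreme when it lies more than 3 interquartile ranges beyond the quartiles. The function name and the scores below are hypothetical.

```python
from statistics import quantiles

def extreme_outlier_flags(values, k=3.0):
    """Flag values lying more than k * IQR below Q1 or above Q3 (k = 3 is an assumed rule)."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]

# Hypothetical scores (for example, a summed scale), one per respondent.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
flags = extreme_outlier_flags(scores)
kept = [v for v, bad in zip(scores, flags) if not bad]
print(kept)
```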
4. Skewness and Kurtosis:
Likert scale data will usually be skewed or kurtotic, and thus removing the variables which
show high skewness or kurtosis would make no sense. If the data were continuous, we would
have applied the standard error x 3 criterion to remove such variables.
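For continuous data, the standard error x 3 criterion could be checked along these lines. The sketch assumes the criterion means "flag a variable when its absolute skewness exceeds three times the standard error of skewness", and it uses the usual small-sample formulas. The function names are hypothetical.

```python
import math
from statistics import mean, stdev

def skewness(values):
    """Adjusted sample skewness (Fisher-Pearson coefficient with small-sample correction)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)

def skewness_se(n):
    """Standard error of skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def exceeds_three_se(values):
    """True when |skewness| is larger than 3 standard errors (assumed reading of the criterion)."""
    return abs(skewness(values)) > 3 * skewness_se(len(values))

symmetric = [1, 2, 3, 4, 5]     # hypothetical, perfectly symmetric
lopsided = [1] * 20 + [5]       # hypothetical, one large value
print(exceeds_three_se(symmetric), exceeds_three_se(lopsided))
```

The same pattern applies to kurtosis with its own standard error.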
5. Normality:
The Shapiro-Wilk test tells us whether our data follow a normal distribution or not. Its null
hypothesis is that the data are normally distributed. As all the significance values are less
than 0.05, we reject that hypothesis: none of the variables is normally distributed (at the
95% confidence level). However, it makes little sense to test normality for Likert scale data,
as such data can be skewed, and we will still proceed with the analysis.
6. Correlation:
To check for multicollinearity, we look for pairs of variables whose Karl Pearson correlation
is >0.9 and remove such variables. On inspecting the correlation matrix, we found no
variables with very high correlation values, and thus no multicollinearity.
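The multicollinearity screen described above can be sketched as a pairwise scan of the correlation matrix. The variable names and data are hypothetical, and the sketch compares the absolute correlation with the 0.9 cut-off, which is an assumption, so that strong negative correlations are also caught.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def collinear_pairs(data, threshold=0.9):
    """Return the pairs of variable names whose |r| exceeds the threshold."""
    return [(a, b) for a, b in combinations(data, 2)
            if abs(pearson(data[a], data[b])) > threshold]

# Hypothetical item scores; item_b is an exact multiple of item_a.
data = {
    "item_a": [1, 2, 3, 4, 5],
    "item_b": [2, 4, 6, 8, 10],
    "item_c": [3, 1, 4, 2, 5],
}
print(collinear_pairs(data))
```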
The KMO value is 0.923, which means that the data and the sample size are adequate for
the test to proceed. Bartlett’s Test is significant, which indicates that the correlation
matrix is not an identity matrix.
Every variable shares variance with the others: the communalities of all the variables
are >0.4.
The total variance explained by the top 7 components is approximately 69%.
The rotation converged after 6 iterations, and the variables were grouped into 7 factors
or components. These 7 constructs are the same as defined earlier. Two variables
exhibited cross-loadings, which are not acceptable because factor analysis checks the
unidimensionality of the data.
Thus, we removed InfoAcq_4 and InfoAcq_5 and ran the factor analysis again. After
removing those two variables, the following were observed:
The KMO value remains good and Bartlett’s Test remains significant, so we can proceed with the test.
Factor 1: Useful
Factor 2: Joy
Factor 3: Decision Quality
Factor 4: Playful
Factor 5: Usage Type
Factor 6: Competency
Factor 7: Acquired Information
8. Cronbach’s Alpha:
Factor 1: Useful
Factor 2: Joy
Factor 4: Playful
Factor 6: Competency
Factor 7: Acquired Information
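As a reminder of what is computed for each factor, Cronbach's alpha can be sketched from the item variances and the variance of the summed scale. The function name and the sample items are hypothetical. The formula is the standard one, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score).

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, all lists covering the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # summed scale per respondent
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Hypothetical three-item scale answered by five respondents.
items = [
    [1, 2, 3, 4, 5],
    [1, 2, 3, 4, 5],
    [1, 2, 3, 4, 5],
]
alpha = cronbach_alpha(items)
print(alpha)
```

Identical items give the upper bound of 1. Real scales fall below it.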
2. Discriminant Analysis
Experience is used as the dependent variable after recoding it into a median split variable. The
new variable is Exp_Split. The median value for Experience (through descriptives and frequencies)
is 3. Thus, Experience values less than or equal to 3 are termed ‘Low’ with value 0, and
Experience values greater than 3 are termed ‘High’ with value 1.
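The recoding rule above can be sketched as follows. The function name and the sample values are hypothetical. Only the rule itself (0 for values at or below the median, 1 above it) comes from the text.

```python
from statistics import median

def median_split(values):
    """Recode values as 0 ('Low', <= median) or 1 ('High', > median)."""
    cut = median(values)
    return [0 if v <= cut else 1 for v in values]

# Hypothetical Experience scores whose median is 3.
experience = [1, 2, 3, 3, 3, 4, 5]
exp_split = median_split(experience)
print(exp_split)
```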
The variables Education, Experience, Playful, Competence, Type Usage and Information
Acquired are significant at the 95% confidence level (sig. value < 0.05). Wilks' lambda tells
us the relative importance of the independent variables: the smaller the Wilks' lambda value,
the greater the importance. Thus, in order of importance:
Experience > Comp_Mean > Type_Mean > Playful_Mean > Education > others
Pearson's correlation between the discriminant scores and the two groups is high (0.8).
Its square is 0.64, so 36% of the variance in the discriminant scores is not explained by the
differences between the two groups (High and Low Exp) while 64% is; thus, the function has
good discriminatory ability.
The associated chi-square statistic tests the hypothesis that the means of the functions listed
are equal across groups. The small significance value indicates that the discriminant function
does better than chance at separating the groups High and Low Exp.
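The figures in this part are linked by two standard relations, sketched here for a single discriminant function. Wilks' lambda equals one minus the squared correlation, and Bartlett's approximation turns lambda into a chi-square statistic. The number of predictors used below is a hypothetical placeholder, and the case count of 327 assumes that all remaining cases entered the analysis.

```python
import math

def wilks_lambda(canonical_r):
    """Share of variance in the discriminant scores NOT explained by group differences."""
    return 1 - canonical_r ** 2

def bartlett_chi_square(lam, n_cases, n_predictors, n_groups=2):
    """Bartlett's chi-square approximation for Wilks' lambda, with its degrees of freedom."""
    stat = -(n_cases - 1 - (n_predictors + n_groups) / 2) * math.log(lam)
    df = n_predictors * (n_groups - 1)
    return stat, df

lam = wilks_lambda(0.8)  # correlation of 0.8 as reported in the text
stat, df = bartlett_chi_square(lam, n_cases=327, n_predictors=6)  # 6 predictors is an assumption
print(round(lam, 2), round(stat, 1), df)
```

A small lambda gives a large chi-square, which is why a small significance value goes together with good separation of the groups.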
When we perform K-Means Cluster Analysis with 3 or 5 clusters, the cluster sizes vary
drastically.
Because the cluster sizes are considerably different, we decrease the number of clusters
requested in the test.
With two clusters, the ratio of cluster sizes is smaller than in the earlier runs.
Conclusion: 2 Clusters
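The comparison of cluster sizes can be made concrete with a size ratio (largest cluster divided by smallest) computed from the cluster membership of each case. The function name and the labels below are hypothetical and do not reproduce the actual solutions.

```python
from collections import Counter

def size_ratio(labels):
    """Ratio of the largest to the smallest cluster size for a list of cluster labels."""
    sizes = Counter(labels).values()
    return max(sizes) / min(sizes)

# Hypothetical cluster memberships for ten cases.
two_clusters = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2]
three_clusters = [1, 1, 1, 1, 1, 1, 1, 2, 2, 3]
print(size_ratio(two_clusters), size_ratio(three_clusters))
```

A ratio closer to 1 means more balanced clusters, which is the property used above to prefer the two-cluster solution.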
Predictor importance of frequency is the highest. We can see what happens if we remove it.
Removing frequency brings us to 3 clusters.
After removing gender too, we are left with 2 clusters.
After this, every further TwoStep Cluster analysis gave widely varying cluster sizes.
Thus, we stop here and conclude that only 2 clusters can be formed.
Structural Equation Modelling
Looking at the estimates, we see that all the observed variables significantly measure the
unobserved variables with confidence greater than 99.9% (CR value > 3.29). The relative
importance of each measure for the construct can be seen from the Estimate value or from
the standardized regression weights.
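The link between a critical ratio (estimate divided by its standard error) and a confidence level can be checked against the standard normal distribution. This is a sketch with hypothetical function names and a hypothetical estimate. It assumes the usual two-tailed test.

```python
from statistics import NormalDist

def critical_ratio_p(estimate, standard_error):
    """Return the critical ratio and its two-tailed p-value under the standard normal."""
    cr = estimate / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(cr)))
    return cr, p

def two_tailed_threshold(confidence):
    """Smallest |CR| that is significant at the given two-tailed confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

cr, p = critical_ratio_p(0.85, 0.10)  # hypothetical loading and standard error
print(round(cr, 2), p < 0.001)
print(round(two_tailed_threshold(0.99), 2), round(two_tailed_threshold(0.999), 2))
```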
Construct pair               Estimate
DecQual    <--> InfoAcq      .652
DecQual    <--> Joy          .419
DecQual    <--> Useful       .553
DecQual    <--> Usage        .186
DecQual    <--> Competency   .166
DecQual    <--> Playful      .291
Joy        <--> InfoAcq      .501
Useful     <--> InfoAcq      .576
Usage      <--> InfoAcq      .188
Competency <--> InfoAcq      .290
Playful    <--> InfoAcq      .373
Useful     <--> Joy          .450
Usage      <--> Joy          .240
Competency <--> Joy          .322
Playful    <--> Joy          .509
Usage      <--> Useful       .210
Competency <--> Useful       .255
Playful    <--> Useful       .322
Competency <--> Usage        .426
Playful    <--> Usage        .414
Playful    <--> Competency   .437
A snippet of the Standardized Total Effects table shows the same result that we got from
our Exploratory Factor Analysis, thereby confirming our constructs and measures.
RMSEA
The RMSEA value, which measures the badness of fit, needs to be near 0 (strict cut-off: 0.05).
Thus, the model fit tests indicate that our model fits the data well.
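For reference, RMSEA can be computed from the model chi-square, its degrees of freedom and the sample size. The sketch below uses the common formula with N - 1 in the denominator. The chi-square and degrees of freedom are hypothetical values, not results from this model.

```python
import math

def rmsea(chi_square, df, n_cases):
    """Root mean square error of approximation; 0 when chi-square does not exceed df."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n_cases - 1)))

value = rmsea(chi_square=400.0, df=300, n_cases=327)  # hypothetical fit statistics
print(round(value, 4), value < 0.05)
```

Because the excess of chi-square over its degrees of freedom is divided by both df and the sample size, a value below the 0.05 cut-off indicates little misfit per degree of freedom.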