Educ 707 Portfolio
Jessica Buckle
WINTER 2020
CALIFORNIA STATE UNIVERSITY, SAN BERNARDINO
Contents
Assignment 1, Terminology and Statistical Tests
Assignment 2, Terminology and Central Tendency
Assignment 3, T-Tests
Assignment 4, Terminology, Research Questions, Hypothesis Testing and Assumptions of Parametric Statistics
Assignment 5, Terminology, Significance, Type I and II Errors, Confidence Intervals, Effect Size, Power, Z-Test
Assignment 6, Terminology, ANOVA, F-Statistic
Assignment 7, Terminology, Correlations, Reliability, Validity, and Generalizability
Assignment 8, Terminology, Correlations, and Linear Regression
Assignment 9, Terminology, Nonparametric Statistics, and Chi-Square
Jessica Buckle
EDUC 707 W20
EDUC 707
3. Inferential Statistics – Inferential statistics are generally (but not always) the next step after
collecting and summarizing data. They are used to make inferences about a population based on a
smaller group of data, usually called a sample (a portion/subset of the population).
4. What are some examples of inferential data?
Examples include making generalizations about all fifth graders in the state of New York
after using a sample of data collected from 150 students, or making generalizations about
women who shop at Target after using a sample of data collected from 200 women who have
shopped at Target.
5. Sample – A sample is a portion or subset of a population.
6. What is an example of a sample?
Examples of a sample would be a group of students at a school, a group of
workers at a business or in an occupational field, or a group of shoppers that
frequent a store or buy a certain product.
and 52 the median would be 76. If you have an even number of values then the median is the
mean of the two middle values. For example, with the values 51, 43, 34, 32, 27, and 12 the
median is 33.
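The even-count rule above can be checked with a short sketch. It uses the six values from the example; the variable names are only illustrative.

```python
import statistics

# Values from the even-count example in the text.
values = [51, 43, 34, 32, 27, 12]

ordered = sorted(values)  # [12, 27, 32, 34, 43, 51]
mid = len(ordered) // 2

# With an even number of values, average the two middle values.
manual_median = (ordered[mid - 1] + ordered[mid]) / 2

print(manual_median)              # 33.0
print(statistics.median(values))  # 33.0
```

Both the hand calculation and the standard-library function average 32 and 34.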
12. Mode – The mode is the value that occurs most frequently. For example, if there is a multiple
choice question and 57 people pick A, 20 people pick B, 12 people pick C, and 11 people pick D,
the modal response is A since it was the answer that was selected the most.
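As a small illustration, the modal response can be found by counting. The counts are the ones given in the example; the variable names are assumptions for this sketch.

```python
from collections import Counter

# Response counts from the multiple choice example in the text.
responses = Counter({"A": 57, "B": 20, "C": 12, "D": 11})

# most_common(1) returns the single most frequent item and its count.
modal_response, count = responses.most_common(1)[0]

print(modal_response, count)  # A 57
```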
13. What do parametric statistics assume? Parametric statistics assume that the variance of each
group is similar and that the sample is large enough to represent the population; it takes a
sample size of about 30 (according to the book) to fulfill this assumption. They also assume a
normal distribution, which results in a bell curve.
14. List 3 examples of parametric tests:
Examples of parametric tests are the Paired t-test, Unpaired t-test, and Pearson
correlation
15. What do nonparametric statistics NOT assume? Nonparametric statistics do not require the
same, possibly restrictive, assumptions as parametric tests (such as a normal distribution), but
this does NOT mean that they make no assumptions at all.
16. List 3 versions of nonparametric tests that align with the parametric tests listed
for number 14:
The nonparametric tests that align with the above parametric tests from #14 are
the Wilcoxon signed-rank test (paired t-test), the Mann-Whitney U test, also called the Wilcoxon
rank-sum test (unpaired t-test), and the Spearman correlation (Pearson correlation).
17. What are Nominal, Ordinal, Interval, and Ratio data? Please provide a definition and
example of each.
Nominal – The nominal level of measurement is defined by the characteristics of an
outcome that fit into one and only one class or category. It is the least precise level of
measurement. Examples are gender (female and male), ethnicity (Caucasian or African
American), or political affiliation (Democrat, Republican, or Independent).
Ordinal – The ordinal level of measurement examines things that are ordered. Examples
would be a rank of candidates for a job or a rank of students based on GPA.
Interval – The interval level of measurement is a test or assessment tool based on some
underlying continuum wherein we can talk about how much more or less something is than
something else (versus just knowing the order of things). The intervals or points along the scale
are equal to one another. An example would be a spelling test where you get 10 answers correct. 10
is twice as many as only getting 5 correct and it is 2 more than only getting 8 correct.
Ratio – The ratio level of measurement is characterized by the presence of an absolute
zero on the scale (meaning there could be an absence of the trait being measured). Examples
include age, height, weight, and years of education.
18. Z-test –
What is a Z-test used for? The Z-test is used to determine whether two population means are
different when the variances are known and the sample size is large (greater than 30).
What are the assumptions of a Z-test?
The assumptions of the Z-test are:
• Interval or ratio scale of measurement (approximately interval)
• Random sampling from a defined population
• Characteristic is normally distributed in the population
Paired (dependent) samples t-test compares two different means from the same group.
One sample t-test tests the mean from one group against a known mean.
What are the assumptions of a T-test?
Data is on a continuous or ordinal scale.
Random sample
Normal distribution
Large enough sample to achieve a bell-shaped curve (i.e. normal distribution)
Homogeneity of variance (i.e. equal variance; the standard deviations of the samples are
approximately equal.)
What is the Null Hypothesis (H0) for a T-test?
One-tail t-test:
H0: µ ≤ a known mean (i.e. sample mean is less than or equal to a known mean)
H0: µ ≥ a known mean (i.e. sample mean is greater than or equal to a known mean)
Two-tail t-test:
H0: µ1 = µ2 (i.e. sample 1 mean is equal to sample two mean)
What is the Alternative Hypothesis (Ha) for a T-test?
One-tail t-test:
Ha: µ > a known mean (i.e. sample mean is greater than a known mean)
Ha: µ < a known mean (i.e. sample mean is less than a known mean)
Two-tail t-test:
Ha: µ1 ≠ µ2 (i.e. sample 1 mean is not equal to sample two mean)
What is an example of a basic question that could be answered using a T-test?
RQ: Did students in Dr. Hughes’ statistics class perform better than the students in the
same course previously?
This is known as a one-sided, directional, or one-tailed t-test. The words “performed
better” indicate a direction.
H0: µ ≤ a known mean
Ha: µ > a known mean
RQ: Did students in Dr. Hughes’ statistics class perform the same as the students in the same
course previously?
Ha: µ1 > µ2 (One of the therapies is significantly better than the others)
21. Chi-Square –
What is a Chi-Square Test used for? The Chi-Square test is used to compare categorical
variables.
What are the two primary types of a Chi-Square Test?
1. Goodness of fit test, which determines if a sample matches the population
2. Test of independence, which checks whether two categorical variables in a contingency table
are related, by comparing the observed counts with the counts expected if they were independent
a. A small chi-square value means the data fit the expected counts
b. A large chi-square value means the data do not fit the expected counts
What is an example of a basic question that could be answered using a Chi-Square Test?
RQ: Does gender affect the preference of the type of vacation?
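To make the vacation question concrete, here is a minimal sketch of the chi-square statistic for a 2x2 contingency table. The counts are hypothetical (invented for illustration, not course data). The critical value 3.841 is the standard table value for 1 degree of freedom at the .05 level.

```python
# Hypothetical counts (NOT course data): rows = two gender groups,
# columns = preferred vacation type (e.g. beach, mountains).
observed = [
    [30, 20],
    [20, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Count expected in this cell if the two variables were unrelated.
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
CRITICAL_VALUE_05_DF1 = 3.841  # chi-square table value for df = 1, alpha = .05

print(chi_square, df)                      # 4.0 1
print(chi_square > CRITICAL_VALUE_05_DF1)  # True
```

With these made-up counts the statistic exceeds the critical value, so the null hypothesis of no relationship would be rejected. Real data could of course lead to the opposite conclusion.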
Jessica Buckle
Winter 2020
EDUC 707
4. Provide at least two examples of percentile points from the data in questions 1 and 2.
If we are given the data set of 9, 8, 6, 5, 3, and 2, we know that 5.5 (the median) is the 50th
percentile (also known as Q2); 50% of the distribution of cases falls below this point.
From there, if we divide our data set into quarters it will give us further percentile points: a score
of 3 would mark the 25th percentile (or Q1) and a score of 8 would mark the 75th percentile (or
Q3).
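The quartile values above can be reproduced with a short sketch. It assumes the "median of each half" method, which is what the answer appears to use. Other quartile conventions (including some software defaults) can give slightly different values for small data sets.

```python
import statistics

data = sorted([9, 8, 6, 5, 3, 2])  # [2, 3, 5, 6, 8, 9]

# Split the ordered data into a lower half and an upper half.
half = len(data) // 2
lower_half = data[:half]   # [2, 3, 5]
upper_half = data[-half:]  # [6, 8, 9]

q1 = statistics.median(lower_half)  # 25th percentile
q2 = statistics.median(data)        # 50th percentile (the median)
q3 = statistics.median(upper_half)  # 75th percentile

print(q1, q2, q3)  # 3 5.5 8
```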
5. Explain quartiles?
Quartiles are similar to percentile points in that they divide the data into four quarters and
give us percentile points.
In this instance there are two modes because 6 and 3 both occur three times, which is
more often than any other value. So in this data set the modes are 3 and 6.
9. What is skew?
A skew is the quality of a distribution that defines the disproportionate frequency of
certain scores. A longer right tail than left corresponds to a smaller number of occurrences at the
high end of the distribution (which would be a positive skew). A shorter right tail than left
corresponds to a larger number of occurrences at the high end of the distribution (which would
be a negative skew).
The range is the most general measure of variability. It gives you an idea of how far apart
scores are from one another. It is computed by subtracting the lowest score in a distribution from
the highest score in the distribution. The formula is r= h-l so with the data set above we would
take 9 (which is the highest score) and subtract 2 (which is the lowest score) to get a range of 7
(9-2=7).
14. What is the standard deviation?
The Standard Deviation is the average distance of each score from the mean, or how
concentrated the data are around the mean (a smaller SD indicates more concentration around the
mean). For example, if students’ test scores out of 100 points are approximately normally
distributed with a mean of 90 points and a standard deviation of 1 point, about 68% of the scores
are within 1 point of 90 points, and about 95% of the scores are within 2 points of 90 points.
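The 68% and 95% figures in this example come from the normal distribution. The sketch below checks them, assuming the scores are exactly normal with mean 90 and standard deviation 1 (an idealization, since real test scores are capped at 100).

```python
from statistics import NormalDist

# Idealized model of the example: mean 90, standard deviation 1.
scores = NormalDist(mu=90, sigma=1)

# Proportion of scores within 1 and within 2 standard deviations of the mean.
within_1_sd = scores.cdf(91) - scores.cdf(89)
within_2_sd = scores.cdf(92) - scores.cdf(88)

print(round(within_1_sd, 4))  # 0.6827
print(round(within_2_sd, 4))  # 0.9545
```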
15. Calculate the standard deviation of this data 9, 8, 6, 5, 3, and 2 (show all work):
Step 1: Work out the mean: 9+8+6+5+3+2 = 33, and 33/6 = 5.5, so the mean is 5.5
Step 2: For each number, subtract the mean and square the result:
(9-5.5)^2 = 3.5^2 = 12.25
(8-5.5)^2 = 2.5^2 = 6.25
(6-5.5)^2 = .5^2 = .25
(5-5.5)^2 = (-.5)^2 = .25
(3-5.5)^2 = (-2.5)^2 = 6.25
(2-5.5)^2 = (-3.5)^2 = 12.25
Step 3: Work out the mean of those squared differences, dividing by n-1:
12.25+6.25+.25+.25+6.25+12.25 = 37.5, n-1 = 6-1 = 5, and 37.5/5 = 7.5
Step 4: Take the square root: the square root of 7.5 = 2.73861, so that is the Standard Deviation
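The four steps can be mirrored in a short sketch. It follows the same sample formula (dividing by n-1); the variable names are only illustrative.

```python
import math

data = [9, 8, 6, 5, 3, 2]

# Step 1: the mean.
mean = sum(data) / len(data)  # 5.5

# Step 2: squared difference of each value from the mean.
squared_diffs = [(x - mean) ** 2 for x in data]

# Step 3: sum the squared differences and divide by n - 1 (sample variance).
sum_of_squares = sum(squared_diffs)           # 37.5
variance = sum_of_squares / (len(data) - 1)   # 7.5

# Step 4: the square root of the variance is the standard deviation.
standard_deviation = math.sqrt(variance)

print(variance)                      # 7.5
print(round(standard_deviation, 5))  # 2.73861
```

The intermediate value 7.5 from Step 3 is the sample variance of the same data.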
16. What is the mean deviation?
The mean deviation is how far, on average, all values are from the middle.
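As a sketch of this definition, the mean deviation below is computed for the data set 9, 8, 6, 5, 3, and 2 used in the earlier questions. It assumes "the middle" means the arithmetic mean and that distances are taken as absolute values.

```python
data = [9, 8, 6, 5, 3, 2]

mean = sum(data) / len(data)  # 5.5

# Distance of each value from the mean, ignoring the sign.
absolute_deviations = [abs(x - mean) for x in data]

# Average of those distances.
mean_deviation = sum(absolute_deviations) / len(data)

print(sum(absolute_deviations))  # 13.0
print(round(mean_deviation, 2))  # 2.17
```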
17. What is the mean deviation of this data -9, 8, 6, -5, 3, and -2 (show all work and
explain):
19. What are the similarities and differences between variance and standard deviation?
Standard Deviation and Variance are similar in that they are both measures of variability,
dispersion, and spread. The formulas used to compute them are also similar. They are different in
several ways, the most important of which is that standard deviation is stated in the original units
from which it was derived (because we take the square root) whereas in variance the units stated
are squared.
20. What is the variance of this data 9, 8, 6, 5, 3, and 2 (show all work):
Jessica Buckle
EDUC 707
Meeting 3 HW
Assignment 3, T-Tests
Research Question 1
Section 1: Research Questions: In this section you will write 5 research questions based on the data
provided in Blackboard
Do male professors make the same salary as female professors at the same university?
Section 2: Null and Alternative Hypotheses: In this section you will write Null and Alternative
Hypotheses for T-tests based on your research questions and the data provided in Blackboard
Null Hypothesis: Male professors’ salaries ARE equal to female professors’ salaries.
Alternative Hypothesis: Male professors’ salaries are NOT equal to female professors’ salaries
Section 3: Check your assumptions and present evidence to suggest your check of the assumptions
(Note: Assumption videos in presentation)
Test for Normality: For this research question, the assumption is that the distributions of both
groups are normal. The Shapiro-Wilk test of normality provides us with p-values of .805 and .847
(Table 1). Because both values are above .05, we fail to reject the null hypothesis of normality and
can treat the distributions as normal.
Table 1
Homogeneity of Variance: For this research question, the assumption is that variances are
equal. According to the test for homogeneity of variance, we are provided with a p-value
of .831 (Table 2) between groups, which is greater than .05, so we fail to reject the null hypothesis
that the variances are equal.
Table 2
Section 4: Run the t-tests (even if the assumptions are not upheld) to answer your hypotheses
Because both the test for normality and the test for homogeneity of variance confirm our
assumptions, we can run the t-test.
Section 5: Present the results of the t-tests using text and tables
T-Test: After running the t-test, we are provided with the p-value of .213 (Table 3).
Because this is above .05, we fail to reject the null hypothesis that the salaries of male and female
professors are equal.
Table 3
Research Question 2
Section 1: Research Questions: In this section you will write 5 research questions based on the data
provided in Blackboard
Is the percentage of Democratic and Republican voters equal in the recent national election?
Section 2: Null and Alternative Hypotheses: In this section you will write Null and Alternative
Hypotheses for T-tests based on your research questions and the data provided in Blackboard
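Null Hypothesis: The percentage of Democratic voters IS equal to the percentage of Republican voters.
Alternative Hypothesis: The percentage of Democratic voters is NOT equal to the percentage of
Republican voters.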
Section 3: Check your assumptions and present evidence to suggest your check of the assumptions
(Note: Assumption videos in presentation)
Test for Normality: For this research question, the assumption is that the distributions of both
groups are normal. The Shapiro-Wilk test of normality provides us with p-values of .134 and .466
(Table 4). Because both values are above .05, we fail to reject the null hypothesis of normality and
can treat the distributions as normal.
Table 4
Homogeneity of Variance: For this research question, the assumption is that variances are
equal. According to the test for homogeneity of variance, we are provided with a p-value
of .931 between groups (Table 5), which is greater than .05, so we fail to reject the null hypothesis
that the variances are equal.
Table 5
Section 4: Run the t-tests (even if the assumptions are not upheld) to answer your hypotheses
Because both the test for normality and the test for homogeneity of variance confirm our
assumptions, we can run the t-test.
Section 5: Present the results of the t-tests using text and tables
After running the t-test, we are provided with the p-value of .947 (Table 6). Because this is above .05,
we fail to reject the null hypothesis that the percentage of Democratic voters is equal to the
percentage of Republican voters.
Table 6
Our research question was: is the percentage of Democratic and Republican voters equal in the
recent national election? Before we ran our t-test, we checked our assumptions by running a test of
normality and a test of homogeneity of variance. Because the p-values in each test were above the
demarcation point of .05 (.134 and .466 for the test of normality and .931 for the homogeneity of
variance), our assumptions were supported, which allowed us to run the t-test. After running
the t-test, we were given a p-value of .947, which is above the demarcation point of .05. Therefore, we
fail to reject the null hypothesis that the percentage of Democratic voters is equal to the percentage of
Republican voters in the recent national election. Thus, the answer to our research question is that we
found no statistically significant difference between the percentage of Democratic voters and the
percentage of Republican voters in the recent national election.
Research Question 3
Section 1: Research Questions: In this section you will write 5 research questions based on the data
provided in Blackboard
Is the height of basketball players the same as the height of football players?
Section 2: Null and Alternative Hypotheses: In this section you will write Null and Alternative
Hypotheses for T-tests based on your research questions and the data provided in Blackboard
Null Hypothesis: The height of basketball players IS the same as the height of football players.
Alternative Hypothesis: The height of basketball players is NOT the same as the height of football
players.
Section 3: Check your assumptions and present evidence to suggest your check of the assumptions
(Note: Assumption videos in presentation)
Test for Normality: For this research question, the assumption is that the distributions of both
groups are normal. The Shapiro-Wilk test of normality provides us with p-values of .043 and .320
(Table 7). Because one of the values is below .05, we cannot assume that both distributions are
normal.
Table 7
Homogeneity of Variance: For this research question, the assumption is that variances are
equal. According to the test for homogeneity of variance, we are provided with a p-value
of .0004078 between groups (Table 8), which is less than .05, so we must reject the null hypothesis that
the variances are equal because there is a statistically significant difference.
Table 8
Section 4: Run the t-tests (even if the assumptions are not upheld) to answer your hypotheses
The tests for normality and homogeneity of variance demonstrate that our assumptions have
been violated, as there is not a normal distribution, nor are the variances equal. However, despite this
we will still run a t-test.
Section 5: Present the results of the t-tests using text and tables
If we look at the p-value for the equal variances not assumed row (because the test for
homogeneity of variance indicated the variances were not equal), we see a significance reported as
.000, meaning p < .001 (Table 9). Because this value is below the demarcation point of .05, we must
reject the null hypothesis that the height of basketball players and football players is the same
because there is a statistically significant difference.
Table 9
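The "equal variances not assumed" row in SPSS output corresponds to Welch's t-test, which does not pool the two variances. The sketch below shows how that t statistic and its degrees of freedom are computed. The heights are hypothetical values invented for illustration (the Blackboard data is not reproduced here), and `welch_t` is an illustrative helper name, not an SPSS or library function.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Each group's sample variance divided by its size (no pooling).
    var_over_n_a = statistics.variance(sample_a) / n_a
    var_over_n_b = statistics.variance(sample_b) / n_b

    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(
        var_over_n_a + var_over_n_b
    )
    df = (var_over_n_a + var_over_n_b) ** 2 / (
        var_over_n_a ** 2 / (n_a - 1) + var_over_n_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical heights in inches (NOT the course data).
basketball = [78, 80, 79, 81, 77, 82]
football = [72, 74, 70, 76, 73, 75]

t, df = welch_t(basketball, football)
print(round(t, 2), round(df, 1))
```

The p-value is then read from a t distribution with the computed degrees of freedom, which is the step SPSS performs when it reports the significance value.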
Our research question was: is the height of basketball players the same as the height of football
players? Before we ran our t-test, we checked our assumptions by running a test of normality and a
test of homogeneity of variance. Because one of the p-values for the test of normality (.043, versus
.320 for the other group) and the p-value for the homogeneity of variance (.0004078) were below the
demarcation point of .05, our assumptions were not met: there was not a normal distribution, nor were
the variances equal. However, we ran the t-test anyway. After running the t-test, we were given a
p-value reported as .000, which is below the demarcation point of .05. Therefore, we must reject the
null hypothesis that the height of basketball players is equal to the height of football players and
accept the alternative hypothesis that the height of basketball players is not equal to the height of
football players. Thus, the answer to our research question is no, the height of basketball players is
not the same as the height of football players.
Research Question 4
Section 1: Research Questions: In this section you will write 5 research questions based on the data
provided in Blackboard
Is the birth rate in California equal to the birth rate in Maine?
Section 2: Null and Alternative Hypotheses: In this section you will write Null and Alternative
Hypotheses for T-tests based on your research questions and the data provided in Blackboard
Null Hypothesis: The birth rate in California IS equal to the birth rate in Maine.
Alternative Hypothesis: The birth rate in California is NOT equal to the birth rate in Maine.
Section 3: Check your assumptions and present evidence to suggest your check of the assumptions
(Note: Assumption videos in presentation)
Test for Normality: For this research question, the assumption is that the distributions of both
groups are normal. The Shapiro-Wilk test of normality provides us with p-values of .995 and .504
(Table 10). Because both values are above .05, we fail to reject the null hypothesis of normality and
can treat the distributions as normal.
Table 10
Homogeneity of Variance: For this research question, the assumption is that variances are
equal. According to the test for homogeneity of variance, we are provided with a p-value
of .000252797 (Table 11) between groups, which is less than .05, so we must reject the null hypothesis
that the variances are equal because there is a statistically significant difference.
Table 11
Section 4: Run the t-tests (even if the assumptions are not upheld) to answer your hypotheses
The test for normality shows that our assumption of normal distribution is supported, but the test
for homogeneity of variance demonstrated that the assumption of equal variances has been violated.
However, despite this we will still run a t-test.
Section 5: Present the results of the t-tests using text and tables
If we look at the p-value for the equal variances not assumed row (because the test for
homogeneity of variance indicated the variances were not equal), we see a significance reported as
.000, meaning p < .001 (Table 12). Because this value is below the demarcation point of .05, we must
reject the null hypothesis that the birth rate in California is equal to the birth rate in Maine.
Table 12
Our research question was: is the birth rate in California equal to the birth rate in Maine? Before we
ran our t-test, we checked our assumptions by running a test of normality and a test of homogeneity of
variance. Based on the p-values of .995 and .504 for the test of normality, we concluded that the
distributions were normal, so that assumption was supported. However, when we ran the
homogeneity of variance test we got a p-value of .000252797, which meant we had to reject our
assumption of the variances being equal, as there is a statistically significant difference. However, we
ran the t-test anyway. After running the t-test, we were given a p-value reported as .000, which is below
the demarcation point of .05. Therefore, we must reject the null hypothesis that the birth rate in
California is equal to the birth rate in Maine and accept the alternative hypothesis that the birth rate
in California is not equal to the birth rate in Maine. Thus, the answer to our research question is no,
the birth rate in California is not equal to the birth rate in Maine.
Research Question 5
Section 1: Research Questions: In this section you will write 5 research questions based on the data
provided in Blackboard
Are the scores of students in Dr. Hughes’ class equal to the scores of students in Dr.
Made-up’s class?
Section 2: Null and Alternative Hypotheses: In this section you will write Null and Alternative
Hypotheses for T-tests based on your research questions and the data provided in Blackboard
Null Hypothesis: The scores of students in Dr. Hughes’ class ARE equal to the scores of
students in Dr. Made-Up’s class.
Alternative Hypothesis: The scores of students in Dr. Hughes’ class are NOT equal to the scores of
students in Dr. Made-Up’s class.
Section 3: Check your assumptions and present evidence to suggest your check of the assumptions
(Note: Assumption videos in presentation)
Test for Normality: For this research question, the assumption is that the distributions of both
groups are normal. The Shapiro-Wilk test of normality provides us with p-values of .423 and .037
(Table 13). Because one of the values is below .05, we cannot assume that both distributions are
normal.
Table 13
Homogeneity of Variance: For this research question, the assumption is that variances are
equal. According to the test for homogeneity of variance, we are provided with a p-value
of .0000000001 (Table 14) between groups, which is less than .05, so we must reject the null
hypothesis that the variances are equal because there is a statistically significant difference. This
assumption has been violated.
Table 14
Section 4: Run the t-tests (even if the assumptions are not upheld) to answer your hypotheses
The tests for normality and homogeneity of variance demonstrate that our assumptions have
been violated as there is not a normal distribution, nor are the variances equal. However, despite this
we will still run a t-test.
Section 5: Present the results of the t-tests using text and tables
If we look at the p-value for the equal variances not assumed row (because the test for
homogeneity of variance indicated the variances were not equal), we see a significance reported as
.000, meaning p < .001 (Table 15). Because this value is below the demarcation point of .05, we must
reject the null hypothesis that the scores of the students in Dr. Hughes’ class are equal to the scores
of the students in Dr. Made-Up’s class.
Table 15
Jessica Buckle
Winter 2020
EDUC 700
3. What is a hypothesis?
A hypothesis is a proposed explanation for a phenomenon. This might be something we
believe, something we are interested in, maybe just a question we have. For it to be a
research hypothesis, the hypothesis must be testable using a statistical analysis that meets
the established criteria for that analysis.
4. What is a null hypothesis?
A null hypothesis is a general statement or default position that there is nothing
significantly different happening, like there is no association among groups or variables,
or that there is no relationship between two or more measured phenomena. For example,
if the research question was: Is the birth rate in California equal to the birth rate in Maine,
the null hypothesis would be: The birth rate in California is equal to the birth rate in
Maine.
5. What is the purpose of a null hypothesis?
The purpose of the null hypothesis is to effectively express a testable hypothesis. It
effectively communicates the common/default belief/position. Research questions can be
confusing; the null hypothesis is meant to be straightforward.
6. What is an example of non-directional research hypothesis?
RQ: Does the birth rate in California equal the birth rate in Maine?
RH: The birth rate in California equals the birth rate in Maine.
7. Explain the non-directional research hypothesis and why it is non-directional.
This is an example of a non-directional hypothesis because it uses the word “equal” and does not
state which birth rate is higher. It is a two-tailed test because the 5% rejection region is split,
with 2.5% on either end of the distribution. It is unlikely that you will use or see a directional
test. A non-directional test is more robust and still provides you with the same information sought
after in a directional test.
power to detect an effect in one direction by not testing the effect in the other direction.
The null hypothesis states there is no relationship between the measured phenomenon (the
dependent variable) and the independent variable. You do not need to believe that the null
hypothesis is true to test it. On the contrary, you will likely suspect that there is a relationship
between a set of variables. One way to prove that this is the case is to reject the null hypothesis.
Rejecting a hypothesis does not mean an experiment was "bad" or that it didn't produce results.
In fact, it is often one of the first steps toward further inquiry.
Note: In quantitative research, Dr. Hughes believes that it is much easier to write a research
question. I take the Null hypothesis and add in more detail to develop a research question. OR,
I have a question and I consider the best statistical test to answer that question. Then I adjust
my question to align with null hypothesis of that statistical test. Writing good research
questions is difficult. I encourage you to keep a notebook with you so that you can write down
questions you have and then try to find ways to answer those questions. Your approach might
be quantitative, qualitative, or mixed methods...
13. In statistics, what is probability?
Probability is the measure of the likelihood that an event will occur in a Random
Experiment. Probability is quantified as a number between 0 and 1, where, loosely
speaking, 0 indicates impossibility and 1 indicates certainty. The higher the probability of
an event, the more likely it is that the event will occur.
Example: A simple example is the tossing of a fair (unbiased) coin. Since the coin is fair,
the two outcomes (“heads” and “tails”) are both equally probable; the probability of
“heads” equals the probability of “tails”; and since no other outcomes are possible, the
probability of either “heads” or “tails” is 1/2 (which could also be written as 0.5 or 50%).
14. Why is probability important?
What we are doing in class is the basics. But you can use statistics to determine the
impact of something like a treatment. Is statistics only concerned with the middle?
Simple answer NO. I want to know why a group exists or does not exist near what is
considered the “middle”. Quantitative research is a “method” that uses statistics to help
answer a question.
Probability is a notion which we use to deal with uncertainty. If an event can have a
number of outcomes, and we don't know for certain which outcome will occur, we can
use probability to describe the likelihood of each of the possible events. ... There are two
possible outcomes: the coin could come up heads or tails.
15. On a distributions curve, what is an asymptotic tail?
An asymptotic tail comes closer and closer to the x-axis but never touches it. This is significant
because you’re looking for differences (situations where people exist in the tails) and this
makes you aware that those people exist.
21. Find, copy, and paste a relatively high quality z-score table here (Note: the image
should have a normal distribution pictured or something indicated like “Table Values
Represents Area to the left of the Z-score” AND both positive and negative z-score values)
(Note: https://www.math.arizona.edu/~rsims/ma464/standardnormaltable.pdf):
22. What does having a significance value of .05 and a z-score of 2.6 tell you?
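It tells us that the individual is way out in the extremes because you are looking
at the tail. A z-score of 2.6 is beyond the two-tailed critical value of about 1.96 for a significance
value of .05, and only about 0.47% of people exist beyond that z-score (the table area to the left of
2.6 is about .9953). They exist in a zone of statistical significance/difference.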
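The tail area for a z-score of 2.6 can be checked without a printed table. This is a small sketch using the standard normal distribution from Python's standard library; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

z = 2.6
area_left = standard_normal.cdf(z)  # what a "area to the left" z-table reports
area_right = 1 - area_left          # proportion beyond the z-score

# Two-tailed critical z for a significance level of .05 (2.5% in each tail).
critical_z = standard_normal.inv_cdf(0.975)

print(round(area_left, 4))   # 0.9953
print(round(area_right, 4))  # 0.0047
print(round(critical_z, 2))  # 1.96
```

Because 2.6 is larger than 1.96, the score falls in the rejection region for a two-tailed test at the .05 level.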
Jessica Buckle
Winter 2020
EDUC 707
8. Explain the statement, just because something is statistically significant does not mean it
is practically significant.
Statistical significance means little if the study is not designed well, if the effect size is
too small, or if the power is low; in those cases the result is not practically significant (it does
not have a big enough effect for it to make a difference). For example, a person with a doctorate
is not necessarily smarter or a better person than a person without a doctorate.
9. What are the steps/events in making an inference?
(1) Random representative sample; (2) Select appropriate test for statistical significance
(i.e. treatment and control groups); (3) Results (i.e. difference between groups); (4) Conclusion
(i.e. why are the groups different?)
10. What are the steps in the plan for testing significance? (step 2 of making an inference)
1. State the null hypothesis: usually state the research question first
2. Select the level of significance, which is also the level of risk for a Type I error
helpful to know whether results have a statistically significant effect, but also the magnitude of
any observed effects.
16. Why is effect size important?
Effect size is important because the larger the effect size, the stronger the relationship
between two variables. It tells us not only whether a result is statistically significant but
also whether it is practically significant.
17. What is statistical power?
Power is the probability of rejecting the null hypothesis when, in fact, it is false. It is the
probability of avoiding a type II error. It is the probability of making a correct decision (i.e. to
reject the null when the null hypothesis is false).
18. Why is statistical power important?
Statistical power is important because it tells us the probability of rejecting the null
hypothesis when, in fact, it is false. It is also important because it helps us avoid a Type II error.
It helps us detect a statistically significant difference when one exists, and it helps us
determine how large a sample size we need.
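As a rough illustration of how power depends on sample size, the sketch below computes the power of a two-sided one-sample z-test for a standardized effect size. The function name z_test_power and the example effect size of 0.5 are assumptions chosen for illustration; they do not come from the course data.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z-test.

    effect_size is the true mean difference divided by the population
    standard deviation (a hypothetical input for this sketch).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = .05
    shift = effect_size * sqrt(n)                # how far the true effect moves the test statistic
    # Probability that the test statistic lands in either rejection region
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)

for n in (10, 32, 100):
    print(n, round(z_test_power(0.5, n), 3))
```

With the effect size held fixed, power rises as n grows, which is why a power calculation is commonly used to choose a sample size before collecting data.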
Note: In addition to these Notes you will run a Z-Test using data in Blackboard for
assignment 5.
Research Question 1:
RQ1: Does the mean height of a sample of male professional basketball players equal the average
height of professional basketball players?
Null Hypothesis 1: The mean height of the sample of professional male basketball players
equals the average height of male professional basketball players.
Alternative Hypothesis 1: The mean height of the sample of male basketball players does
not equal the average height of male professional basketball players.
Assumptions for RQ1: We can check the normality only of our sample, not of the population;
therefore, we are unable to run a normality test on the population. We are assuming that
normality and homogeneity of variance hold, as these tests cannot be easily run in SPSS.
Table 1
Results for RQ1: The p-value of .01294 is below .05, so the sample mean differs significantly
from the population mean. Therefore, we must reject the null hypothesis and accept the
alternative hypothesis. Thus, the answer to RQ1 is that the mean height of the sample of male
professional basketball players does not equal the average height of male professional
basketball players.
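A one-sample z-test of this kind can be sketched in a few lines. The heights, population mean, and population standard deviation below are made-up values for illustration only; they are not the Blackboard data, and the function name one_sample_z_test is an assumption.

```python
from math import sqrt
from statistics import NormalDist, mean

def one_sample_z_test(sample, pop_mean, pop_sd):
    """Two-tailed z-test comparing a sample mean with a known population mean."""
    standard_error = pop_sd / sqrt(len(sample))
    z = (mean(sample) - pop_mean) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical heights in inches (not the course data set)
heights = [80, 79, 82, 78, 81, 83, 80, 79, 84, 82]
z, p = one_sample_z_test(heights, pop_mean=79.0, pop_sd=3.5)
print(f"z = {z:.3f}, p = {p:.4f}")
print("reject null" if p < 0.05 else "do not reject null")
```

The decision rule is the same one used in the write-up: compare the p-value with the chosen significance level of .05.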
Research Question 2:
RQ2: Does the sample’s mean fasting glucose level equal the average glucose level of
nondiabetics?
Null Hypothesis 2: The sample’s mean fasting glucose level is equal to the average glucose
level of nondiabetics.
Alternative Hypothesis 2: The sample’s mean fasting glucose level is not equal to the
average glucose level of nondiabetics.
Assumptions for RQ2: We can check the normality only of our sample, not of the population;
therefore, we are unable to run a normality test on the population. We are assuming that
normality and homogeneity of variance hold, as these tests cannot be easily run in SPSS.
Table 2
Results for RQ2: The p-value of .00000 is below .05, so the sample mean differs significantly
from the population mean. Therefore, we must reject the null hypothesis and accept the
alternative hypothesis. Thus, the answer to RQ2 is that the sample’s mean fasting glucose level
is not equal to the average glucose level of nondiabetics.
Research Question 3:
RQ3: Does the sample percentage of adults with a college degree equal the general percentage
of the population with a college degree?
Null Hypothesis 3: The sample percentage of adults with a college degree is equal to the general
percentage of the population with a college degree.
Alternative Hypothesis 3: The sample percentage of adults with a college degree is not equal to
the general percentage of the population with a college degree.
Assumptions for RQ3: We can check the normality only of our sample, not of the population;
therefore, we are unable to run a normality test on the population. We are assuming that
normality and homogeneity of variance hold, as these tests cannot be easily run in SPSS.
Table 3:
Results for RQ3: The p-value of .00000 is below .05, so the sample percentage differs
significantly from the population percentage. Therefore, we must reject the null hypothesis and
accept the alternative hypothesis. Thus, the answer to RQ3 is that the sample percentage of
adults with a college degree is not equal to the general percentage of the population with a
college degree.
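Because RQ3 compares a percentage rather than a mean, the matching calculation is a one-proportion z-test. The sketch below uses invented counts (130 degree holders out of 400 adults, against a population rate of 25%); these numbers and the function name one_proportion_z_test are assumptions for illustration, not the assignment data.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Two-tailed z-test comparing a sample proportion with a population proportion p0."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)   # standard error under the null hypothesis
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 130 of 400 sampled adults hold a degree; population rate assumed to be 25%
z_prop, p_prop = one_proportion_z_test(130, 400, 0.25)
print(f"z = {z_prop:.3f}, p = {p_prop:.5f}")
```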
It is important because it tells you whether the group means are significantly different. It lets
us know whether to accept or reject the null hypothesis, based on whether the significance value
is below or above .05.
6. Explain ANOVA: two-factor with replication:
A two-way ANOVA with replication is performed when you have two factors and more than one
observation for each combination of factor levels. An example would be if you were comparing
several students’ test scores across different tests from two or more colleges.
7. Explain ANOVA: two-factor without replication:
A two-way ANOVA without replication is used when you have only one observation for each
combination of factor levels. An example would be if you were comparing one group of students’
scores across different tests, with one score per student per test.
8. What is a factorial analysis of variance?
A factorial analysis of variance is an analysis with more than one factor or independent
variable. An example would include two factors such as gender (male and female) and treatment
(high impact or low impact exercise program), and the outcome (weight loss).
9. Why is a factorial analysis of variance important?
Factorial analysis of variance is important because it allows us to test the effects of multiple
factors, and the interaction between them, across two or more groups.
10. What is the main effect in factorial analysis of variance?
The main effect is the effect of one of the independent variables on the dependent
variable, averaged across the levels of the other independent variables.
11. When would you use a factorial ANOVA rather than a simple ANOVA?
You would use a factorial ANOVA over a simple ANOVA when you have multiple
independent variables to compare. For example, if you are asking whether ethnicity or gender
affects test scores, you would run a factorial ANOVA because you are looking at the possible
effect of ethnicity and/or gender on test scores.
Note: In addition to these Notes you will run ANOVAs using data in Blackboard for
assignment 6.
Research Question 1:
RQ1: Does a difference in business startup costs exist between Pizza, Bakery, Shoe, Gift, and
Pet shops/stores?
Null RQ1: Mean startup costs are equal for Pizza, Bakery, Shoe, Gift, and Pet shops/stores.
Alternative RQ1: Mean startup costs are NOT equal for Pizza, Bakery, Shoe, Gift, and Pet
shops/stores.
Assumptions for RQ1: To check our assumptions that the data are normally distributed and that
the variances are homogeneous, we ran the tests. Based on Table 1, the significance values are
below .05, so the data are not normally distributed. However, as the
homogeneity of variance is more important, we ran that test anyway. Based on Table 2, we see
a significance based on the mean of .173, which is above the significance level of .05, so we
accept the null hypothesis of equal variances and will run the One-way ANOVA.
Table 1
Table 2
Results for RQ1: If we look at the ANOVA test in Table 2 and look at the significance value
between groups, we get a significance value of .000. This is below the significance level of .05,
so we reject the null hypothesis that all mean start-up costs are equal, and we must accept the
alternative hypothesis that not all the means are equal. Therefore, the answer to RQ1 is: Yes,
a difference in business startup costs exists between Pizza, Bakery, Shoe, Gift, and Pet
shops/stores.
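The F-statistic behind a one-way ANOVA table is the ratio of between-group to within-group variability. The sketch below computes it by hand for three invented groups; the startup-cost figures and the function name one_way_anova_f are hypothetical and only illustrate the arithmetic, not the SPSS output referenced above.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-groups sum of squares: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: spread of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical startup costs in thousands of dollars (not the course data)
pizza = [80, 90, 100]
bakery = [90, 100, 110]
shoe = [120, 130, 140]
f_stat, df_b, df_w = one_way_anova_f([pizza, bakery, shoe])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F means the group means differ by more than the within-group spread would suggest; statistical software then converts F and its degrees of freedom into the significance value.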
Research Question 2:
RQ2: Which business has the highest start-up cost: Pizza, Bakery, Shoe, Gift, or Pet shops/stores?
Null RQ2: The mean startup costs are equal for Pizza, Bakery, Shoe, Gift, and Pet shops/stores.
Alternative RQ2: The mean startup costs are NOT equal for Pizza, Bakery, Shoe, Gift, and Pet
shops/stores.
Assumptions for RQ2: To check our assumptions that the data are normally distributed and that
the variances are homogeneous, we ran the tests. Based on Table 1, the significance values are
below .05, so the data are not normally distributed. However, as the
homogeneity of variance is more important, we ran that test anyway. Based on Table 2, we see
a significance based on the mean of .173, which is above the significance level of .05, so we
accept the null hypothesis of equal variances and will run the One-way ANOVA.
Table 1
Table 2
Results for RQ2: If we look at the ANOVA test in Table 2 and look at the significance value
between groups, we get a significance value of .000. This is below the significance level of .05,
so we reject the null hypothesis that all mean start-up costs are equal, and we must accept the
alternative hypothesis that not all the means are equal. If we go back to the Descriptives in Table
1, we can see the mean for group 1 (Bakery) is higher than the other groups. Therefore, the answer
to RQ2 is: There is a higher start-up cost for bakeries than there is for pizza shops, shoe
stores, gift shops, or pet stores.
Research Question 3:
RQ 3: Do the day of the week and the section of the paper impact the number of responses a paper
advertisement receives?
Null Hypothesis RQ3 #1: Factors 1 and 2 have no interaction OR the magnitude of responses is
equal across all days of the week and sections of the paper
Null Hypothesis RQ3 #2: Factor 1 mean is the same for all groups OR the number of responses is
equal across the days the advertisement is posted (main effect)
Null Hypothesis RQ3 #3: Factor 2 mean is the same for all groups OR the number of responses is
equal across the sections of the paper in which the advertisement is posted (main effect)
Alternative RQ3 #1: Factors 1 and 2 have an interaction, Alternative RQ3 #2: Factor 1 mean is
not equal for all groups, Alternative RQ3 #3: Factor 2 mean is not equal for all groups
Assumptions RQ 3: We need to check our assumptions by running a test for normality and a
test for homogeneity of variance. Based on Table 3, we can see a significance value of .298,
which is above the significance level of .05, so our assumption that there is a normal
distribution is supported. If we look at our Levene’s Test in Table 4 and examine the
significance value based on the mean, we get a value of .931, which is above .05, so we accept
the null hypothesis of equal variances and will run the Two-way ANOVA.
Table 3
Table 4
Results RQ3:
Based on the p-values from Table 5, we can see we must reject all three null hypotheses
and accept all three alternative hypotheses. If we look at Factor 1&2, the p-value is .000, which
is below .05, so we reject Null Hypothesis RQ3 #1 and accept Alternative RQ3 #1 that there is an
interaction between the two variables and that responses are not equal across all days of the
week and sections of the paper. If we look at Factor 1, we get a p-value of .000, which is below
.05, so we reject Null Hypothesis RQ3 #2 and accept Alternative RQ3 #2 that the mean number of
responses to advertisements is not equal across days of the week. If we look at Factor 2, we
have a value of .000, which is below .05, so we reject Null Hypothesis RQ3 #3 and accept
Alternative RQ3 #3 that the number of responses to advertisements is not equal across sections
of the newspaper. Therefore, the answer to RQ3 is yes, the day of the week and the section of the
paper impact the number of responses a paper advertisement receives.
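The three tests in a two-way ANOVA with replication (Factor 1, Factor 2, and their interaction) each get their own F-statistic. The sketch below shows the arithmetic for a balanced design, meaning the same number of observations in every cell. The response counts, the 2 x 2 layout, and the function name two_way_anova_f are all hypothetical; this is not the newspaper data set and does not compute p-values.

```python
from statistics import mean

def two_way_anova_f(cells):
    """F-statistics for a balanced two-way ANOVA with replication.

    cells[i][j] is the list of replicate observations for level i of
    factor A and level j of factor B; every list must have the same length.
    """
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean(x for row in cells for cell in row for x in cell)
    a_means = [mean(x for cell in row for x in cell) for row in cells]
    b_means = [mean(x for row in cells for x in row[j]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    ss_a = b * r * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * r * sum((m - grand) ** 2 for m in b_means)
    ss_ab = r * sum(
        (cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_within = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )
    ms_within = ss_within / (a * b * (r - 1))
    return {
        "A": (ss_a / (a - 1)) / ms_within,                # main effect of factor A
        "B": (ss_b / (b - 1)) / ms_within,                # main effect of factor B
        "AxB": (ss_ab / ((a - 1) * (b - 1))) / ms_within, # interaction
    }

# Hypothetical responses: rows = 2 days, columns = 2 sections, 2 ads per cell
responses = [[[10, 12], [20, 22]],
             [[14, 16], [30, 32]]]
f_values = two_way_anova_f(responses)
print(f_values)
```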
Table 5
Research Question #4
RQ4: What is the best day to advertise in the paper?
Null Hypothesis RQ4: The magnitude of responses is equal across all days of the week.
Alternative RQ4: The magnitude of responses is NOT equal across all days of the week.
Assumptions RQ4: We need to check our assumptions by running a test for normality and a test
for homogeneity of variance. Based on Table 3, we can see a significance value of .298, which is
above the significance level of .05, so our assumption that there is a normal distribution is
supported. If we look at our Levene’s Test in Table 4 and examine the significance value based
on the mean, we get a value of .931, which is above .05, so we accept the null hypothesis of
equal variances and will run the Two-way ANOVA.
Table 3
Table 4
Table 5
Table 6
Results RQ4: Based on the p-values from Table 5, we can see we must reject the null
hypothesis and accept the alternative hypothesis. If we look at Factor 1, we get a p-value of
.000, which is below .05, so we reject Null Hypothesis RQ4 and accept Alternative RQ4
that the mean number of responses to advertisements is not equal across days of the week. If we
look at the data from Table 6 and examine the significance values and the mean differences, we
can tell that the best day to advertise in the paper is Friday. Therefore, the answer to RQ4 is
that Friday is the best day of the week to place an ad to increase the number of responses a
paper advertisement receives.
Research Question #5
RQ5: What is the best section of the paper to advertise in?
Null Hypothesis RQ5: The magnitude of responses is equal across all sections of the paper.
Alternative RQ5: The magnitude of responses is NOT equal across all sections of the paper.
Assumptions RQ5: We need to check our assumptions by running a test for normality and a test
for homogeneity of variance. Based on Table 3, we can see a significance value of .298, which is
above the significance level of .05, so our assumption that there is a normal distribution is
supported. If we look at our Levene’s Test in Table 4 and examine the significance value based
on the mean, we get a value of .931, which is above .05, so we accept the null hypothesis of
equal variances and will run the Two-way ANOVA.
Table 3
Table 4
Table 5
Table 7
Results RQ5: Based on the p-values from Table 5, we can see we must reject the null
hypothesis and accept the alternative hypothesis. If we look at Factor 2, we get a p-value of
.000, which is below .05, so we reject Null Hypothesis RQ5 and accept Alternative RQ5
that the mean number of responses to advertisements is not equal across sections of the paper.
If we look at the data from Table 7 and examine the significance values and the mean differences,
we can tell that the best section of the paper is the news section. Therefore, the answer to RQ5
is that the news section is the best section to place an ad to increase the number of responses
a paper advertisement receives.
Research Question #6
RQ6: What is the best day and section of the newspaper to place an advertisement?
Null Hypothesis RQ6 #1: Factors 1 and 2 have no interaction OR the magnitude of responses is
equal across all days of the week and sections of the paper
Null Hypothesis RQ6 #2: Factor 1 mean is the same for all groups OR the number of responses is
equal across the days the advertisement is posted (main effect)
Null Hypothesis RQ6 #3: Factor 2 mean is the same for all groups OR the number of responses is
equal across the sections of the paper in which the advertisement is posted (main effect)
Alternative RQ6 #1: Factors 1 and 2 have an interaction, Alternative RQ6 #2: Factor 1 mean is
not equal for all groups, Alternative RQ6 #3: Factor 2 mean is not equal for all groups
Assumptions RQ6: We need to check our assumptions by running a test for normality and a test
for homogeneity of variance. Based on Table 3, we can see a significance value of .298, which is
above the significance level of .05, so our assumption that there is a normal distribution is
supported. If we look at our Levene’s Test in Table 4 and examine the significance value based
on the mean, we get a value of .931, which is above .05, so we accept the null hypothesis of
equal variances and will run the Two-way ANOVA.
Table 3
Table 4
Table 5
Table 6
Table 7
Results RQ6: Based on the p-values from Table 5, we can see we must reject all three null
hypotheses and accept all three alternative hypotheses. If we look at Factor 1&2, the p-value is
.000, which is below .05, so we reject Null Hypothesis RQ6 #1 and accept Alternative RQ6 #1 that
there is an interaction between the two variables and that responses are not equal across all
days of the week and sections of the paper. If we look at Factor 1, we get a p-value of .000,
which is below .05, so we reject Null Hypothesis RQ6 #2 and accept Alternative RQ6 #2 that the
mean number of responses to advertisements is not equal across days of the week. If we look at
Factor 2, we have a value of .000, which is below .05, so we reject Null Hypothesis RQ6 #3 and
accept Alternative RQ6 #3 that the number of responses to advertisements is not equal across
sections of the newspaper. If we look at the data from Table 6 and examine the significance
values and the mean differences, we can tell that the best day to advertise in the paper is
Friday. If we look at the data from Table 7 and examine the significance values and the mean
differences, we can tell that the best section of the paper is the news section. Therefore, the
answer to RQ6 is that Fridays in the news section are the best place to put an ad to increase
the number of responses a paper advertisement receives.
outcomes are repeatable. In essence, reliability is being able to hit the same spot each time,
while validity is hitting the spot you actually intended to measure.
19. Explain the idea of accuracy (i.e. validity) versus precision (Note: consider having a
picture and description):
Accuracy describes the difference between the measurement and the part’s actual value,
while precision describes the variation you see when you measure the same part repeatedly with
the same device.
dressed Foosball player judging was .91, which indicates a high degree of agreement
among judges.
24. What is content validity?
Content validity is when you want to know whether a sample of items truly reflects an
entire universe of items in a certain topic. To do this you would ask Mr. or Ms. Expert to make a
judgement that the test items reflect the universe of items in the topic being measured. For
example, my weekly quiz in my stats class fairly assesses the chapter’s content.
25. What is criterion validity?
Criterion validity is when you want to know whether test scores are systematically related
to other criteria that indicate the test taker is competent in a certain area. To do this you would
correlate the scores from the test with some other measure that is already valid and assesses the
same set of abilities. For example, performance on the EATS test (of culinary skills) has been
shown to be correlated with being a fine chef 2 years after culinary school.
26. What is construct validity?
Construct validity is when you want to know whether a test measures some underlying
psychological construct. To do this you would correlate the set of test scores with some theorized
outcome that reflects the construct for which the test is being designed. For example, it’s
true: men who participate in body contact and physically dangerous sports score higher on the
TEST(osterone) test of aggression.
27. Explain the idea of measurement error (i.e. observational error):
Measurement error is when there is a difference between the value you got when you
measured (what you observed) and the actual value (true value). For example, witnesses to a
crime will have different observations about height/weight/clothing color.
28. What is random error?
Random error is an error in measurement caused by factors which vary unpredictably from one
measurement to another. For example, Chips Ahoy advertised that their cookies had 1000 chips
in every bag, but the count in any one bag will vary by chance.
29. What is systematic error?
Systematic error is when something consistently affects how accurate a measurement is, for
example using a ruler to measure something when the ruler is inaccurate; therefore everything you
measure is going to be inaccurate in the same way.
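The difference between random and systematic error can be seen in a small simulation. In the sketch below, random error is modeled as noise that averages out over many measurements, while systematic error is a fixed bias (like a mis-marked ruler) that does not. The true length of 30.0, the bias of 1.0, the noise level, and the function name simulate_measurements are all assumptions chosen for illustration.

```python
import random
from statistics import mean, stdev

def simulate_measurements(true_value, bias, noise_sd, n, seed=0):
    """Simulate n measurements with a constant bias plus normally distributed random noise."""
    rng = random.Random(seed)   # fixed seed so the sketch is repeatable
    return [true_value + bias + rng.gauss(0, noise_sd) for _ in range(n)]

# Random error only: readings scatter around the true value of 30.0
random_only = simulate_measurements(30.0, bias=0.0, noise_sd=0.5, n=10_000)
# Systematic error added: every reading is shifted by the same hypothetical 1.0 units
biased = simulate_measurements(30.0, bias=1.0, noise_sd=0.5, n=10_000)

print(f"random only: mean={mean(random_only):.2f}, sd={stdev(random_only):.2f}")
print(f"with bias:   mean={mean(biased):.2f}, sd={stdev(biased):.2f}")
```

Averaging many readings shrinks the effect of random error but leaves the bias untouched, which is why systematic error threatens accuracy (validity) while random error mainly reduces precision (reliability).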
30. Explain three other potential sources of error:
1. Environmental errors: The environmental errors occur due to some external conditions of
the instrument such as pressures, temperature, humidity, or magnetic fields.
2. Observational errors: Types of errors that occur due to wrong observations or readings of
the instruments. For example, reading an energy meter incorrectly.
3. Theoretical errors: Caused by simplification of the model system. For example, if a theory
states that the temperature of the system’s surroundings will not change the readings taken
when it actually does, then this factor will become a source of error in measurement.
Table 1
Table 2
Table 3
Table 4
Table 5
Even though we violated our assumptions by not having a normal distribution and by having
outliers, we ran our test for correlation anyway. If we look at Table 5, we can see a correlation
coefficient of -.026, which tells us there is essentially no linear correlation. If we look at our
p-value, it is .856, which is over the significance level of .05, so we accept the null hypothesis
that there is no correlation. Therefore, the answer to RQ1 is: There is no correlation between
crime rate and college graduation.
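The correlation coefficient reported in an SPSS correlation table is Pearson's r, which can be computed from deviations around the two means. The sketch below uses a tiny made-up data set; the variable values and the function name pearson_r are illustrative assumptions and are not the crime-rate data.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # co-variation
    sxx = sum((x - mx) ** 2 for x in xs)                     # variation in x
    syy = sum((y - my) ** 2 for y in ys)                     # variation in y
    return sxy / sqrt(sxx * syy)

# Hypothetical paired observations
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(f"r = {r:.4f}")
```

Values near 0 indicate little or no linear relationship, while values near -1 or +1 indicate a strong one.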
Research Question #2:
RQ#2: Is there a correlation between age and household income in the thousands?
Null Hypothesis RQ2: There is NO correlation between age and household income in the
thousands.
Alternative Hypothesis RQ2: There is a correlation between age and household income in the
thousands.
Assumptions for RQ2: As we can see in Table 6, the p-value is .000, which is below .05, so we
must reject the null hypothesis that the data distribution is normal and accept the alternative
hypothesis that the data is not distributed normally. This tells us we might want to consider a
non-parametric test such as Kendall’s tau. Based on Table 7, we can see there are also outliers,
which violates the assumption of no outliers. However, we are going to continue the test. If we
look at Table 8, we can see that the data is not linear. If we look at Table 9, it does not look
like there is homoscedasticity as there are multiple outliers.
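A rank-based alternative such as Kendall's tau compares pairs of observations instead of relying on normality. The sketch below implements the simplest version, tau-a, which ignores ties; statistical packages commonly report tau-b, which adjusts for ties, so results may differ when the data contain tied values. The data and the function name kendall_tau_a are hypothetical.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant pairs - discordant pairs) / total pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move in the same direction
        elif direction < 0:
            discordant += 1      # the variables move in opposite directions
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical ranks for five people on two variables
tau = kendall_tau_a([1, 2, 3, 4, 5], [1, 3, 2, 5, 4])
print(f"tau-a = {tau:.2f}")
```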
Table 6
Table 7
Table 8
Table 9
Even though we violated our assumptions by not having a normal distribution and by having
outliers, we ran our test for correlation anyway. If we look at Table 10, we can see a correlation
coefficient of .476, which tells us they are moderately, but not strongly, correlated. If we look
at our p-value, it is .000, which is below the significance level of .05, so we reject the null
hypothesis that there is no correlation and accept the alternative hypothesis that there is a
correlation. Therefore, the answer to RQ2 is: There is a correlation between age and household
income in the thousands, but because we violated our assumptions this correlation does not carry
as much meaning.
Research Question #3:
RQ#3: Is there a correlation between age and credit card debt in the thousands?
Null Hypothesis RQ3: There is NO correlation between age and credit card debt in the
thousands.
Alternative Hypothesis RQ3: There is a correlation between age and credit card debt in the
thousands.
Assumptions for RQ3: As we can see in Table 11, the p-value is .000, which is below .05, so
we must reject the null hypothesis that the data distribution is normal and accept the alternative
hypothesis that the data is not distributed normally. This tells us we might want to consider a
non-parametric test such as Kendall’s tau. Based on Table 12, we can see there are also outliers,
which violates the assumption of no outliers. However, we are going to continue the test. If we
look at Table 13, we can see that the data is not linear. If we look at Table 14, it does not
look like there is homoscedasticity as there are multiple outliers.
Table 11
Table 12
Table 13
Table 14
Even though we violated our assumptions by not having a normal distribution and by having
outliers, we ran our test for correlation anyway. If we look at Table 15, we can see a correlation
coefficient of .279, which tells us they are only weakly correlated. If we look at our p-value, it
is .000, which is below the significance level of .05, so we reject the null hypothesis that there
is no correlation and accept the alternative hypothesis that there is a correlation. Therefore,
the answer to RQ3 is: There is a correlation between age and credit card debt in the thousands,
but because we violated our assumptions this correlation does not carry as much meaning.
Research Question #4:
RQ#4: Is there a correlation between age and debt to income ratio?
Null Hypothesis RQ4: There is NO correlation between age and debt to income ratio.
Alternative Hypothesis RQ4: There is a correlation between age and debt to income ratio.
Assumptions for RQ4: As we can see in Table 16, the p-value is .000, which is below .05, so
we must reject the null hypothesis that the data distribution is normal and accept the alternative
hypothesis that the data is not distributed normally. This tells us we might want to consider a
non-parametric test such as Kendall’s tau. Based on Table 17, we can see there are also outliers,
which violates the assumption of no outliers. However, we are going to continue the test. If we
look at Table 18, we can see that the data is not linear. If we look at Table 19, it looks like
there is homoscedasticity.
Table 16
Table 17
Table 18
Table 19
Even though we violated our assumptions by not having a normal distribution and by having
outliers, we ran our test for correlation anyway. If we look at Table 20, we can see a correlation
coefficient of .008, which tells us they are not correlated. If we look at our p-value, it is .810,
which is above the significance level of .05, so we accept the null hypothesis that there is
no correlation. Therefore, the answer to RQ4 is: There is no correlation between age and debt to
income ratio.
Research Question #5:
RQ#5: Is there a correlation between age and years with current employer?
Null Hypothesis RQ5: There is NO correlation between age and years with current employer.
Alternative Hypothesis RQ5: There is a correlation between age and years with current
employer.
Assumptions for RQ5: As we can see in Table 21, the p-value is .000, which is below .05, so
we must reject the null hypothesis that the data distribution is normal and accept the alternative
hypothesis that the data is not distributed normally. This tells us we might want to consider a
non-parametric test such as Kendall’s tau. Based on Table 22, we can see there are also outliers,
which violates the assumption of no outliers. However, we are going to continue the test. If we
look at Table 23, we can see that the data is not linear. If we look at Table 24, it looks like
there is not homoscedasticity because there are multiple outliers.
Table 21
Table 22
Table 23
Table 24
Even though we violated our assumptions by not having a normal distribution and by having
outliers, we ran our test for correlation anyway. If we look at Table 25, we can see a correlation
coefficient of .554, which tells us they are moderately, but not strongly, correlated. If we look
at our p-value, it is .000, which is below the significance level of .05, so we reject the null
hypothesis that there is no correlation and accept the alternative hypothesis. Therefore, the
answer to RQ5 is: There is a correlation between age and years with current employer, but because
we violated our assumptions this correlation does not carry as much meaning.
Linear regression consists of finding the best-fitting straight line through the points.
The best-fitting line is called a regression line. ... The error of prediction for a point is the value
of the point minus the predicted value (the value on the line); that is, the error is the vertical
distance from the point to the line.
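Finding the best-fitting line and the error of prediction for each point can be sketched with the least-squares formulas. The study hours and grades below are invented for illustration, and the function name fit_line is an assumption; the slope and intercept correspond to m and b in the equation of a straight line.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares regression line: returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx   # the line always passes through (mean x, mean y)
    return slope, intercept

# Hypothetical data: hours studied (independent) and grade (dependent)
hours = [1, 2, 3, 4, 5]
grades = [62, 74, 75, 84, 85]
slope, intercept = fit_line(hours, grades)

# Error of prediction: observed value minus the value on the line
residuals = [y - (slope * x + intercept) for x, y in zip(hours, grades)]
print(f"grade = {slope:.1f} * hours + {intercept:.1f}")
print([round(e, 1) for e in residuals])
```

Least squares chooses the line that makes the sum of the squared errors as small as possible, and the errors themselves always sum to zero.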
6. What is a dependent variable?
A dependent variable is a variable that depends on another (the independent variable). In
an example of a grade based on time studying, the grade would be the dependent variable.
7. What is an independent variable?
An independent variable is a variable that is depended upon by another (the dependent
variable). In an example of a grade based on time studying, time would be the independent
variable.
8. What is an easy way to tell the difference between an independent and dependent
variable?
An easy way to tell the difference between an independent variable and a dependent
variable is to look at which one is set or stays fixed regardless of the other (independent) and
which one changes in response (dependent).
9. What is the difference between a linear and curvilinear relationship?
When the ratio of change is constant, the correlation is linear, and the points on the scatter
diagram fall close to a straight line. When the ratio of change is not constant and the points
on the scatter diagram bend close to a smooth curve, the correlation is non-linear (curvilinear).
10. What is the basic mathematical function of a straight line? (Note: label the variables or
write them in words)
y = mx + b, where b is the point where the line crosses the y-axis (the y-intercept), m is the
slope of the line, and x and y are the coordinates of points on the coordinate system.
11. Based on the equation above, what do you need to know to be able to draw a line?
multicollinearity. Lastly, Table 4 shows us there is homoscedasticity as most of the data is an
equal distance from the line and there are no major outliers; therefore, we accept the null that
all are an equal distance from the line. We can accept that our assumptions are true and move on
to running our regression.
Table 1
Table 2
Table 3
RQ #1 Results:
Table 4
Based on Table 4 we can see there is statistical significance because .022 is below .05.
Our R squared value is .104, which tells us there is some predictive ability because it is above 0.
Therefore, we must accept the alternative hypothesis that there is a statistically significant
relationship between high school dropout rate and crime rate. And the answer to our research
question is that high school dropout rate explains about 10% of the variability in the
crime rate.
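R squared is the proportion of the variability in the dependent variable that the regression line accounts for. The sketch below computes it as 1 minus the ratio of unexplained to total variation, using a small made-up data set; the values and the function name r_squared are assumptions and are unrelated to the dropout and crime data.

```python
from statistics import mean

def r_squared(xs, ys):
    """Proportion of variance in ys explained by a least-squares line on xs."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    ss_residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - my) ** 2 for y in ys)
    return 1 - ss_residual / ss_total

# Hypothetical predictor and outcome values
predictor = [1, 2, 3, 4, 5]
outcome = [62, 74, 75, 84, 85]
r2 = r_squared(predictor, outcome)
print(f"R squared = {r2:.3f}, so about {r2:.0%} of the variability is explained")
```

Read this way, an R squared of .104 means roughly 10% of the variability is explained, and the remaining 90% is due to other factors or chance.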
RQ#2
RQ#2: How do high school dropout rate, reported violent crime rate, annual police
funding, rate of people with a high school degree, rate of people in college, and rate of college
graduates affect crime rate?
RQ #2 Null Hypothesis: Each added independent variable does not improve the fit.
RQ #2 Alternative Hypothesis: Each added independent variable does improve the fit.
RQ #2 Assumptions: Before we can run our regression, we need to check the following
assumptions: linear relationship, multivariate normality, multicollinearity, and homoscedasticity.
Based on Table 5, we can see that not all significance values are above .05, so we must reject the
null that the data is normally distributed since not ALL the data is normally distributed. To
check linearity we ran a correlation, shown in Table 6. The correlation matrix in Table 6 shows
that half of the variables are above .05, so we are going to accept the null that all values
exist on a line. However, Table 6 also tells us that we cannot accept the null for
multicollinearity because the independent variables seem to be correlated with one another and
not independent. However, we will run the regression anyway. Table 7 shows us that we can accept
the null for homoscedasticity as there are not too many extreme outliers.
Table 5
Table 6
Table 7
RQ #2 Results:
Table 8
Based on Table 8, our R squared value is .613, which tells us that the model is a pretty good
predictor. Knowing this, we can answer our research question by saying that the independent
variables together explain about 61% of the variability in crime rate.
RQ#3
RQ#3: Does age impact car accidents?
RQ #3 Null Hypothesis: There is no significant relationship between age and car
accidents.
RQ #3 Alternative Hypothesis: There is a statistically significant relationship between
age and car accidents.
RQ #3 Assumptions: Before we can run our regression, we need to check the following
assumptions: linear relationship, multivariate normality, multicollinearity, and homoscedasticity.
Based on a significance value of .488 we can accept the null that the data is normally distributed.
Based on Table 10 it does not look like there is a linear relationship, but if we run a
correlation, as seen in Table 11, we can see that there is a correlation, so we will accept the
null that all the data exists on a line. Since there is only one independent variable, we do not
have to check the multicollinearity. Based on Table 12 we can accept the null for
homoscedasticity as all points appear to be an equal distance from the line, so we will run our
regression.
Table 9
Table 10
Table 11
Table 12
RQ #3 Results:
Table 13
Based on Table 13 we are given an R squared value of .002. Even though this is slight, it
is above 0, and we reject the null hypothesis and accept the alternative hypothesis that there is
a statistically significant relationship between age and car accidents, a decision that rests on
the significance value of the regression in Table 13 rather than on R squared alone. To answer
our research question, age accounts for only about 0.2% of the variability in car accidents.
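Whether a very small R squared is statistically significant also depends on the sample size. As a hedged sketch, the F statistic for a regression with one predictor can be written as F = R²(n − 2) / (1 − R²). The sample sizes below are hypothetical (the actual n is not restated here), and the critical value 3.84 is the approximate .05 cutoff for F with 1 numerator degree of freedom and a large denominator degrees of freedom.

```python
def f_statistic(r2, n):
    """F statistic for a regression with one predictor and n cases."""
    return r2 * (n - 2) / (1 - r2)

# Approximate .05 critical value of F(1, large df); an assumption for this sketch.
F_CRITICAL_LARGE_N = 3.84

for n in (1000, 5000):  # hypothetical sample sizes
    f = f_statistic(0.002, n)
    verdict = "significant" if f > F_CRITICAL_LARGE_N else "not significant"
    print(f"n = {n}: F = {f:.2f} ({verdict} at .05)")
```

The same R squared of .002 can fall on either side of the cutoff depending on n, which is why the significance value in the regression output is what decides the hypothesis test.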
RQ#4
RQ#4: Does age impact credit card debt?
RQ #4 Null Hypothesis: There is no significant relationship between age and credit card
debt.
RQ #4 Alternative Hypothesis: There is a statistically significant relationship between
age and credit card debt.
RQ #4 Assumptions: Before we can run our regression, we need to check the following
assumptions: linear relationship, multivariate normality, multicollinearity, and homoscedasticity.
Based on a significance value below .05, as seen in Table 14, we reject the null that the data
is normally distributed. Based on Table 15, it does not look like there is a linear relationship,
so we ran a correlation to check. In Table 16 we see a correlation of .279 and a significance
value of .000 (p < .001), which tells us there is a statistically significant but weak correlation.
Because there is only one independent variable, we do not have to check multicollinearity.
Table 17 shows that the data appear to be an equal distance from the line, so the homoscedasticity
assumption appears to be met. Although not all our assumptions are met, we will run the regression.
Table 14
Table 15
Table 16
Table 17
RQ #4 Results:
Table 18
According to Table 18, after running our regression, we get an R value of .279 and an R
squared value of .078, which tells us that age has some, though weak, predictive ability. Together
with the significance value of .000 in Table 16, this leads us to reject the null and accept the
alternative hypothesis that there is a statistically significant relationship between age and
credit card debt. The answer to our research question is that age accounts for almost 8% of the
variability in credit card debt.
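With a single predictor, R squared is simply the square of the Pearson correlation r, so the reported values can be checked by hand. The sketch below computes r from a small made-up data set (not the credit card data) and then checks the arithmetic for the reported R of .279.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)
print(f"r = {r:.2f}, r squared = {r ** 2:.2f}")

# Arithmetic check of the reported value: R = .279 gives R squared of about .078.
print(round(0.279 ** 2, 3))
```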
RQ#5
RQ#5: Do age, years with current employer, years at current address, household
income, debt to income ratio, and other debt impact credit card debt?
RQ #5 Null Hypothesis: Each added independent variable does not improve the fit.
RQ #5 Alternative Hypothesis: Each added independent variable does improve the fit.
RQ #5 Assumptions: Before we can run our regression, we need to check the following
assumptions: linear relationship, multivariate normality, multicollinearity, and homoscedasticity.
Based on Table 19, we must reject the null and accept the alternative that the data
is not normally distributed. To check for a linear relationship, we ran a correlation. Based on
the correlation matrix in Table 20, there appears to be a linear relationship. We can also tell
from Table 20 that the assumption of no multicollinearity is not met, because most of the
variables appear to be correlated with one another; however, we will still run our regression.
According to Table 21, the homoscedasticity assumption appears to be met, as most of the data
is the same distance from the line, with a few outliers.
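One common way to put a number on multicollinearity is the variance inflation factor (VIF). As a minimal sketch, for the special case of two predictors with correlation r, each predictor's VIF is 1 / (1 − r²). The correlations below are hypothetical, and the usual cutoffs (often quoted as 5 or 10) are rules of thumb, not values taken from Table 20.

```python
def vif_two_predictors(r):
    """Variance inflation factor when exactly two predictors correlate at r."""
    return 1 / (1 - r ** 2)

for r in (0.3, 0.6, 0.9):  # hypothetical correlations between two predictors
    print(f"r = {r}: VIF = {vif_two_predictors(r):.2f}")
```

With more than two predictors, r² would be replaced by the R squared from regressing each predictor on all of the others.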
Table 19
Table 20
Table 21
RQ #5 Results:
Table 22
Based on Table 22, we get an R value of .771 and an R squared value of .594. This tells
us that the independent variables together are a good predictor. Therefore, we reject the null
hypothesis and accept the alternative hypothesis. Thus, to answer our research question, these
factors together account for about 59% of the variability in credit card debt.
RQ#6
RQ#6: Does age affect a violent first crime?
RQ #6 Null Hypothesis: There is no significant relationship between age and a violent
first crime.
RQ #6 Alternative Hypothesis: There is a statistically significant relationship between
age and a violent first crime.
RQ #6 Assumptions: Before we can run our regression, we need to check the following
assumptions: linear relationship, multivariate normality, multicollinearity, and homoscedasticity.
Based on a p-value of .000 (p < .001), we reject the null hypothesis and accept the alternative
that the data is not normally distributed. Based on Table 24 it does not look like there is a
linear relationship, but we ran a correlation to check. Based on Table 25 there appears to be a
slight correlation, so we will treat the linearity assumption as met. Since there is only one
independent variable, we do not have to check multicollinearity. Based on Table 26, the
homoscedasticity assumption appears to be met, as all points seem to be an equal distance from
the line.
Table 23
Table 24
Table 25
Table 26
RQ #6 Results:
Table 27
Based on Table 27, we get an R value of .010 and an R squared value that is close to 0;
therefore we fail to reject the null hypothesis that there is no significant relationship
between age and a violent first crime. The answer to our research question is that age
does not significantly impact violent first crime.
Jessica Buckle
Winter 2020
EDUC 707
Chi-square is used to determine whether your observed frequencies are what you would
expect. A one-sample chi-square has only one categorical variable. For example, is the number
of respondents equally distributed across all three levels of education (no college, some college,
and college degree)?
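A one-sample (goodness-of-fit) chi-square can be sketched as follows. The counts are hypothetical respondents in the three education levels, and the expected frequencies assume an equal split across categories.

```python
def chi_square_gof(observed, expected=None):
    """Return (chi-square statistic, degrees of freedom) for one categorical variable."""
    total = sum(observed)
    if expected is None:
        # Default assumption: respondents are spread equally over the categories.
        expected = [total / len(observed)] * len(observed)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

# Hypothetical counts: no college, some college, college degree.
counts = [30, 50, 40]
stat, df = chi_square_gof(counts)
print(f"chi-square = {stat:.2f}, df = {df}")
```

The obtained value is then compared with the critical value for the same degrees of freedom.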
4. What is a two-sample chi-square?
A two-sample chi-square has two categorical variables. For example, it might be used to
test whether preference for school vouchers is independent of political affiliation and gender.
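For a two-sample chi-square (test of independence), the expected count in each cell is the row total times the column total divided by the grand total. The sketch below uses a hypothetical 2 x 2 table and applies no continuity correction.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts: rows are two groups, columns are favor / oppose.
table = [[20, 30], [30, 20]]
stat, df = chi_square_independence(table)
print(f"chi-square = {stat:.2f}, df = {df}")
```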
5. What are the eight steps in the chi-square test?
1. State the null and research hypotheses
a. the null hypothesis states that there is no difference in the frequency or the
proportion of occurrences in each category
2. Set the level of risk (or the level of significance of Type I error) associated with the null
hypothesis
a. The Type I error rate is set at .05
3. Select the appropriate test statistic
a. Use the flowchart to determine
4. Compute the test statistic value (called the obtained value)
5. Determine the value needed for rejection of the null hypothesis using the appropriate
table of critical values for the particular statistic
a. Look at degrees of freedom
i. Use this number and the level of risk to look up the critical value on the
chi-square table
6. Compare the obtained value with the critical value
7. And 8. Decision time!
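Steps 5 through 8 can be sketched as a lookup followed by a comparison. The critical values below are the standard .05 values from a chi-square table for 1 to 5 degrees of freedom; the obtained values passed in are hypothetical.

```python
# Chi-square critical values at the .05 level, keyed by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def decide(obtained, df, critical_values=CRITICAL_05):
    """Compare the obtained value with the critical value and state the decision."""
    critical = critical_values[df]
    if obtained > critical:
        return "reject the null"
    return "fail to reject the null"

print(decide(5.0, 2))  # hypothetical obtained value with 2 degrees of freedom
print(decide(4.0, 1))  # hypothetical obtained value with 1 degree of freedom
```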
Note: chi-squared is a special case of the gamma distribution. Researching the gamma
distribution will help you explore the chi-square distribution.
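As a small illustration of the gamma connection: a chi-square distribution with k degrees of freedom is a gamma distribution with shape k/2 and scale 2. For 2 degrees of freedom this reduces to an exponential distribution with mean 2, so the upper-tail probability has a closed form. The sketch below only covers that special case.

```python
import math

def chi2_df2_upper_tail(x):
    """P(X > x) for chi-square with 2 df (gamma with shape 1 and scale 2)."""
    return math.exp(-x / 2)

def chi2_df2_critical(alpha):
    """Value whose upper-tail probability is alpha, for 2 df only."""
    return -2 * math.log(alpha)

print(f"critical value at .05: {chi2_df2_critical(0.05):.3f}")
print(f"upper tail at 5.991:   {chi2_df2_upper_tail(5.991):.3f}")
```

The result agrees with the 5.991 entry in a chi-square table for 2 degrees of freedom at the .05 level.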
Note: In addition to these Notes you will run Nonparametric tests using data in Blackboard
for assignment 9.
RQ1-(Chi-square)
RQ #1: Is there an association between major and socioeconomic status?
Null Hypothesis RQ #1: Major and socioeconomic status are independent.
Alternative Hypothesis RQ #1: Major and socioeconomic status are NOT independent.
Assumptions RQ #1: For a chi-square test the assumptions are that there is a random
sample, a large enough sample, that the observations are independent, and that the observations
are not normally distributed. Based on Table 1, the assumptions of a random sample and a large
enough sample are met, as there is a sample of 272 participants. After running a correlation, we
can tell based on Table 2 that the variables are not strongly correlated, so we will treat the
observations as independent. Lastly, we checked the distributions, as seen in Tables 3 and 4, and
found that both variables are not normally distributed.
Table 1
Table 2
Table 3
Table 4
Results RQ #1:
Table 5
Table 6
Table 7
Based on a significance value below .05, we must reject the null and accept the alternative that
the variances between groups are not equal.
Table 8
Results RQ #2:
Table 9
As we can see in Table 9, we get an Asymp. Sig. value of .330, which is above .05, so we
fail to reject the null hypothesis that the distributions are similar, that is, that median male
faculty salaries are equal to median female faculty salaries. So, the answer to our research
question is that we found no significant difference between male and female faculty salaries,
although we must take this with a grain of salt since we violated the homogeneity of variance
assumption.
RQ3- (Wilcoxon Signed-Rank Test)
RQ #3: Is the percentage of voters who voted Democrat equal to the percentage of voters
who voted Republican in a recent national election?
Null Hypothesis RQ #3: The median of Democratic voters is equal to the median of
Republican voters.
Alternative Hypothesis RQ #3: The median of Democratic voters is NOT equal to the
median of Republican voters.
Assumptions RQ #3: The assumptions for the Wilcoxon Signed-Rank Test are that the
two samples are dependent, that they are independently drawn, that the dependent variable is at
the interval or ratio level, that the independent variable is at least at the ordinal level, and that
there is homogeneity of variance. By looking at the data, we can accept the assumptions that the
samples are dependent, that they are independently drawn, that the dependent variable is at the
interval or ratio level, and that the independent variable is at least at the ordinal level.
Next we check homogeneity of variance by running a one-way ANOVA, as seen in Table 10.
Levene’s test of homogeneity could not be computed, so we cannot confirm that the variances
are equal and we treat this assumption as not met.
Table 10
Results RQ #3:
Table 11
As we can see in Table 11, we get an Asymp. Sig. value of .850, which is above .05, so
we fail to reject the null hypothesis that the distributions are similar, that is, that the median
of Democratic voters is equal to the median of Republican voters. So, the answer to our research
question is that we found no significant difference between the percentage of voters who voted
Democrat and the percentage who voted Republican in a recent national election, although we
must take this with a grain of salt since we violated the homogeneity of variance assumption.
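The statistic behind a signed-rank test can be sketched as follows: take the paired differences, drop zeros, rank the absolute differences, and sum the ranks separately for positive and negative differences. The paired percentages are hypothetical, and the sketch stops at the rank sums rather than computing a significance value.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def signed_rank_sums(first, second):
    """Return (sum of positive ranks, sum of negative ranks) for paired data."""
    diffs = [b - a for a, b in zip(first, second) if b != a]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus

# Hypothetical paired percentages, for illustration only.
democrat = [10, 12, 9, 15]
republican = [12, 11, 13, 12]
print(signed_rank_sums(democrat, republican))
```

Rank sums that are close to each other are consistent with no difference between the paired medians.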
RQ4- (Kruskal Wallis)
RQ #4: Does a difference in business startup costs exist between Pizza, Bakery, Shoe,
Gift, and Pet shop/stores?
Null Hypothesis RQ #4: Mean startup costs are equal for Pizza, Bakery, Shoe, Gift, and
Pet shop/stores.
Alternative Hypothesis RQ #4: Mean startup costs are NOT equal for Pizza, Bakery,
Shoe, Gift, and Pet shop/stores.
Assumptions RQ #4: The assumptions for a Kruskal-Wallis test are that there is a
random sample, that the dependent variable is at an ordinal level or above, that the observed
values are independent, and that there is homogeneity of variance. All these assumptions other
than the homogeneity of variance can be verified practically by looking at the data collected.
Based on the data collected, we can accept that our sample is random, that the dependent
variable is at an ordinal level or above, and that the values are independent. So, we just need
to check the homogeneity of variance.
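The Kruskal-Wallis H statistic can be sketched by ranking all observations together and comparing the rank sums of the groups. The startup costs below are hypothetical values for three store types, and this minimal version applies no correction for ties.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Return (H statistic, degrees of freedom), without a tie correction."""
    pooled = [value for group in groups for value in group]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    return h, len(groups) - 1

# Hypothetical startup costs (in thousands) for three store types.
costs = [[35, 40, 45], [50, 55, 60], [65, 70, 75]]
h, df = kruskal_wallis_h(costs)
print(f"H = {h:.2f}, df = {df}")
```

H is compared against the chi-square critical value for the same degrees of freedom (5.991 for 2 degrees of freedom at the .05 level).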
Results RQ #4:
RQ5- (Kruskal Wallis)
RQ #5:
Null Hypothesis RQ #5:
Alternative Hypothesis RQ #5:
Assumptions RQ #5: The assumptions for a Kruskal-Wallis test are that there is a
random sample, that the dependent variable is at an ordinal level or above, that the observed
values are independent, and that there is homogeneity of variance. All of these assumptions other
than the homogeneity of variance can be verified practically by looking at the data collected.
Results RQ #5: