
Key Points - STATS


1. Correlational Analysis:
• Description: Correlational analysis measures the strength and direction of the relationship between two continuous variables.
• Example: Examining the relationship between students' hours of study and their exam scores. You might use a correlation coefficient such as Pearson's r to determine whether there is a linear relationship between these variables.
2. Linear Regression:
• Description: Linear regression is a statistical method used to model the relationship between one or more independent variables and a dependent variable by fitting a linear equation to observed data.
• Example: Predicting house prices based on factors such as square footage, number of bedrooms, and location.
3. Student's t-test:
• Description: The Student's t-test is used to determine whether there is a significant difference between the means of two independent groups.
• Example: Comparing the mean exam scores of students who received tutoring versus those who did not.
4. ANOVA (Analysis of Variance):
• Description: ANOVA is used to compare the means of three or more independent groups to determine whether there are statistically significant differences between them.
• Example: Comparing the effectiveness of three different teaching methods on student performance.
5. Two-Way ANOVA:
• Description: Two-way ANOVA is an extension of ANOVA that examines the effects of two independent categorical variables on a dependent variable.
• Example: Investigating the effects of both gender and age group on exam scores.
6. Chi-Square Test:
• Description: The chi-square test is used to determine whether there is a significant association between two categorical variables.
• Example: Examining whether there is a relationship between smoking status (smoker/non-smoker) and lung cancer incidence.
7. MANOVA (Multivariate Analysis of Variance):
• Description: MANOVA is an extension of ANOVA that allows the simultaneous comparison of multiple dependent variables across multiple independent groups.
• Example: Assessing whether there are differences in scores across multiple academic subjects between students from different schools.
8. Wilcoxon Matched Pair Test:
• Description: The Wilcoxon matched pair test is a non-parametric test used to determine whether there is a significant difference between two related samples.
• Example: Comparing the test scores of students before and after a tutoring program.
9. Mann-Whitney U Test:
• Description: The Mann-Whitney U test is a non-parametric test used to determine whether there is a significant difference between the distributions of two independent groups.
• Example: Comparing the exam scores of two groups of students who received different teaching methods.
10. Kruskal-Wallis Test:
• Description: The Kruskal-Wallis test is a non-parametric test used to determine whether there are significant differences between three or more independent groups when the dependent variable is ordinal or continuous.
• Example: Comparing the satisfaction levels of customers from different age groups across multiple restaurants.
11. Factor Analysis (SPSS):
• Description: Factor analysis is a statistical method used to identify underlying factors, or latent variables, that explain the pattern of correlations among observed variables. It helps reduce the dimensionality of data by identifying common factors.
• Example: Factor analysis could be used to analyze responses to a survey with many items measuring attitudes or behaviors. By identifying underlying factors such as "customer satisfaction" or "brand loyalty," you can gain insight into the underlying structure of the data.
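As a rough sketch, most of these analyses can be run with Python's SciPy library (two-way ANOVA, MANOVA, and factor analysis need additional packages such as statsmodels). All of the data below are simulated and purely illustrative:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)

    # 1 & 2. Correlation and linear regression: study hours vs. exam scores
    hours = rng.uniform(0, 10, 30)
    scores = 50 + 4 * hours + rng.normal(0, 5, 30)
    r, p_r = stats.pearsonr(hours, scores)        # Pearson's r and its p-value
    reg = stats.linregress(hours, scores)         # slope, intercept, r, p, stderr

    # 3. Independent-groups t-test: tutored vs. non-tutored students
    tutored = rng.normal(75, 8, 25)
    untutored = rng.normal(70, 8, 25)
    t, p_t = stats.ttest_ind(tutored, untutored)

    # 4. One-way ANOVA: three teaching methods
    m1, m2, m3 = (rng.normal(mu, 8, 20) for mu in (70, 73, 76))
    F, p_f = stats.f_oneway(m1, m2, m3)

    # 6. Chi-square test of association: smoking status vs. lung cancer
    table = np.array([[30, 70],    # smokers:     cancer / no cancer
                      [10, 90]])   # non-smokers: cancer / no cancer
    chi2, p_chi, dof, expected = stats.chi2_contingency(table)

    # 8. Wilcoxon matched pairs: scores before vs. after tutoring
    before = rng.normal(70, 8, 20)
    after = before + rng.normal(3, 4, 20)
    w, p_w = stats.wilcoxon(before, after)

    # 9 & 10. Mann-Whitney U and Kruskal-Wallis (rank-based alternatives)
    u, p_u = stats.mannwhitneyu(tutored, untutored)
    h, p_h = stats.kruskal(m1, m2, m3)

    print(f"r = {r:.2f}, t-test p = {p_t:.3f}, ANOVA p = {p_f:.3f}")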
INTERPRETATIONS:
1. P-value:
A p-value less than 0.05 typically indicates a statistically significant result, in which case the null hypothesis is rejected. A p-value greater than 0.05 means that the deviation from the null hypothesis is not statistically significant, and the null hypothesis is not rejected.
2. R-value:
r is always a number between -1 and 1. r > 0 indicates a positive association. r < 0 indicates a
negative association. Values of r near 0 indicate a very weak linear relationship.

The sample correlation coefficient (r) is a measure of how closely the points in a scatter plot cluster around a linear regression line fitted to those points (for example, accumulated savings plotted over time). Possible values of the correlation coefficient range from -1 to +1, with -1 indicating a perfectly linear negative, i.e., inverse, correlation (sloping downward) and +1 indicating a perfectly linear positive correlation (sloping upward).

A correlation coefficient close to 0 suggests little, if any, correlation. For example, a scatter plot of IQ against age that shows no trend suggests that measurements of IQ do not change with increasing age, i.e., there is no evidence that IQ is associated with age.
Correlation Coefficient (r)    Description (Rough Guideline)
+1.0                           Perfect positive association
+0.8 to +1.0                   Very strong positive association
+0.6 to +0.8                   Strong positive association
+0.4 to +0.6                   Moderate positive association
+0.2 to +0.4                   Weak positive association
 0.0 to +0.2                   Very weak positive or no association
 0.0 to -0.2                   Very weak negative or no association
-0.2 to -0.4                   Weak negative association
-0.4 to -0.6                   Moderate negative association
-0.6 to -0.8                   Strong negative association
-0.8 to -1.0                   Very strong negative association
-1.0                           Perfect negative association
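This guideline table can be expressed as a small Python helper; the thresholds below simply mirror the table, and the function name is our own:

    def describe_r(r):
        """Return the rough guideline label for a sample correlation coefficient."""
        a = abs(r)
        sign = "positive" if r >= 0 else "negative"
        if a == 1.0:
            return f"Perfect {sign} association"
        if a >= 0.8:
            return f"Very strong {sign} association"
        if a >= 0.6:
            return f"Strong {sign} association"
        if a >= 0.4:
            return f"Moderate {sign} association"
        if a >= 0.2:
            return f"Weak {sign} association"
        return f"Very weak {sign} or no association"

    print(describe_r(0.75))   # Strong positive association
    print(describe_r(-0.15))  # Very weak negative or no association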

A positive relationship indicates that there is a direct relationship between the variables. A
negative relationship indicates that there is an inverse relationship between X and Y.

Hypothesis testing involves two hypotheses: the alternative hypothesis and the null hypothesis. The alternative hypothesis is the one that claims the difference in results between conditions is due to the independent variable. The null hypothesis is set up to be the logical counterpart of the alternative hypothesis, such that if the null hypothesis is false, the alternative hypothesis must be true. Therefore, these two hypotheses must be mutually exclusive and exhaustive. If the alternative hypothesis is nondirectional, it specifies only that the independent variable has an effect on the dependent variable, without specifying the direction of that effect.
A Type I error is a decision to reject the null hypothesis when the null hypothesis is true.
A Type II error is a decision to retain the null hypothesis when the null hypothesis is false.
The alpha level that the scientist sets at the beginning of the experiment is the level to which he or she wishes to limit the probability of making a Type I error.
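To make the alpha level concrete, here is a minimal simulation sketch (all parameters made up): when the null hypothesis is true, about alpha of all experiments will reject it anyway, and each such rejection is a Type I error.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    alpha = 0.05
    n_experiments = 10_000
    type_i_errors = 0

    for _ in range(n_experiments):
        # Both groups are drawn from the SAME population, so H0 is true by construction.
        a = rng.normal(100, 15, 30)
        b = rng.normal(100, 15, 30)
        if stats.ttest_ind(a, b).pvalue < alpha:
            type_i_errors += 1   # rejecting a true H0 is a Type I error

    print(type_i_errors / n_experiments)  # close to alpha, i.e. roughly 0.05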
KEY POINTS:
1. Alpha (α) level:
• Explanation: The alpha level, denoted α, is the predetermined threshold used to determine the statistical significance of results. It represents the probability of committing a Type I error (false positive).
• Example: If you set α = 0.05, you are willing to accept a 5% chance of incorrectly rejecting the null hypothesis when it is actually true.
2. Alternative hypothesis (H1):
• Explanation: The alternative hypothesis, denoted H1 or Ha, is the hypothesis that the researcher is trying to support. It states that there is a difference, effect, or relationship in the population.
• Example: In a study comparing two treatments, the alternative hypothesis might state that Treatment A is more effective than Treatment B.
3. Beta (β):
• Explanation: Beta, denoted β, is the probability of committing a Type II error (false negative). It represents the likelihood of failing to reject the null hypothesis when it is actually false.
• Example: In a clinical trial, beta would represent the probability of failing to detect a significant difference in treatment effectiveness when there actually is one.
4. Correct decision:
• Explanation: Making the correct decision in hypothesis testing means rejecting or retaining the null hypothesis in accordance with the true state of affairs.
• Example: If a study finds a significant difference between two groups and correctly rejects the null hypothesis, it has made a correct decision.
5. Correlated groups design:
• Explanation: In a correlated groups design, participants are related in some way, as in repeated measures or matched pairs designs.
• Example: In a study assessing the effects of exercise on weight loss, the same participants' weights are measured before and after an exercise program; the before and after weights are correlated within each participant.
6. Directional hypothesis:
• Explanation: A directional hypothesis predicts the direction of the effect or difference between groups.
• Example: "Participants who receive Treatment A will show greater improvement in memory performance than participants who receive Treatment B."
7. Fail to reject null hypothesis:
• Explanation: The decision made when the evidence is not strong enough to reject the null hypothesis.
• Example: A study comparing two teaching methods may find no significant difference in exam scores between the groups, leading to a failure to reject the null hypothesis.
8. Importance of an effect:
• Explanation: The practical significance, or real-world relevance, of an observed effect or difference.
• Example: A small difference in exam scores between two teaching methods may not be practically important, even if it is statistically significant.
9. Nondirectional hypothesis:
• Explanation: A nondirectional hypothesis does not predict the direction of the effect or difference between groups.
• Example: "There is a difference in exam scores between students taught using Method A and Method B."
10. Null hypothesis (H0):
• Explanation: The null hypothesis is the hypothesis of no effect, difference, or relationship in the population being studied.
• Example: "There is no difference in mean scores between Group 1 and Group 2."
11. One-tailed probability:
• Explanation: In hypothesis testing, a one-tailed probability considers only one direction of the distribution, either above or below a certain value.
• Example: When testing whether a new medication increases heart rate, a one-tailed test would focus on whether the medication increases heart rate, without considering the possibility that it decreases heart rate.
12. Reject null hypothesis:
• Explanation: The decision made when the evidence is strong enough to conclude that the null hypothesis is not true.
• Example: A study comparing two treatments finds a significant difference in recovery time between the groups, leading to rejection of the null hypothesis.
13. Repeated measures design:
• Explanation: In a repeated measures design, the same participants are measured multiple times under different conditions.
• Example: A study measuring participants' reaction times before and after receiving a treatment to assess its effect on cognitive function.
14. Replicated measures design:
• Explanation: Similar to a repeated measures design in that participants are measured multiple times, but with different groups of participants for each condition.
• Example: A study comparing the effects of two teaching methods on student performance assigns different groups of students to each method and measures their performance.
15. Retain null hypothesis:
• Explanation: When the evidence is not strong enough to reject the null hypothesis, it is retained.
• Example: A study comparing the effectiveness of two advertising campaigns finds no significant difference in sales between the two groups, leading to retention of the null hypothesis.
16. Sign test:
• Explanation: A non-parametric test used to determine whether the median of a sample (or of paired differences) is equal to a hypothesized value.
• Example: Using the sign test to analyze whether there is a significant difference in customer satisfaction before and after implementing a new service.
17. Significant:
• Explanation: In statistical terms, "significant" refers to a result that is unlikely to have occurred by chance alone.
• Example: A study finds a significant difference in blood pressure between patients who received a new medication and those who received a placebo.
18. Size of effect:
• Explanation: The magnitude or strength of an observed effect or difference.
• Example: In a study comparing the effectiveness of two training programs at improving muscle strength, the size of effect would indicate how much stronger participants are after completing each program.
19. State of reality:
• Explanation: The true state of the population parameter being studied.
• Example: In a study comparing two diet plans for weight loss, the true difference in average weight loss between the two plans represents the state of reality.
20. Two-tailed probability:
• Explanation: In hypothesis testing, a two-tailed probability considers both directions of the distribution, both above and below a certain value.
• Example: When testing whether a new medication affects heart rate, a two-tailed test would consider both the possibility of increasing and of decreasing heart rate.
21. Type I error:
• Explanation: Occurs when the null hypothesis is rejected even though it is actually true, producing a false positive result.
• Example: Concluding that a new drug is effective at treating a disease when, in fact, it has no effect.
22. Type II error:
• Explanation: Occurs when the null hypothesis is retained even though it is actually false, producing a false negative result.
• Example: Failing to detect a significant difference in exam scores between two teaching methods when, in fact, one method is more effective than the other.
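The one-tailed/two-tailed distinction (items 11 and 20) is easy to see numerically. The sketch below uses SciPy's alternative argument (available in SciPy 1.6+) on made-up data:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    drug = rng.normal(78, 10, 40)      # hypothetical medication group
    placebo = rng.normal(72, 10, 40)   # hypothetical placebo group

    p_two = stats.ttest_ind(drug, placebo, alternative="two-sided").pvalue
    p_one = stats.ttest_ind(drug, placebo, alternative="greater").pvalue

    # When the result lies in the predicted direction, the one-tailed
    # probability is half the two-tailed probability.
    print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")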
1. Critical region:
• Definition: The critical region is the range of values of a test statistic that leads to rejection of the null hypothesis. It is determined by the chosen significance level (α) of the test.
• Example: In a two-tailed hypothesis test with α = 0.05, the critical region consists of the extreme 2.5% of values in each tail of the sampling distribution.
2. Critical value of a statistic:
• Definition: The critical value of a statistic is the value that separates the critical region from the non-critical region. It is determined by the chosen significance level (α) and the distribution of the test statistic.
• Example: In a z-test with α = 0.05, the critical value for a two-tailed test is approximately ±1.96, marking the cutoff points beyond which the null hypothesis is rejected.
3. Critical value of X:
• Definition: The critical value of X is the value of the sample statistic X (for example, the sample mean) that separates the critical region from the non-critical region in a hypothesis test.
• Example: In a study comparing the mean heights of two populations, the critical value of X is the point at which the difference in mean heights becomes statistically significant, leading to rejection of the null hypothesis.
4. Critical value of z:
• Definition: The critical value of z is the value that separates the critical region from the non-critical region in a z-test. It is determined by the chosen significance level (α) and the standard normal (z) distribution.
• Example: In a z-test with α = 0.05, the critical value for a two-tailed test is approximately ±1.96.
5. Mean of the sampling distribution of the mean:
• Definition: The mean of the sampling distribution of the mean is the average of all possible sample means that could be obtained from a population. It is equal to the population mean.
• Example: If the mean height of adult males in a population is 175 cm, then the mean of the sampling distribution of the mean is also 175 cm.
6. Null-hypothesis population:
• Definition: The null-hypothesis population is the hypothetical population described by the null hypothesis. It assumes that there is no effect or difference in the population being studied.
• Example: In a study comparing the effectiveness of two medications, the null-hypothesis population assumes that both medications have the same average effect on patients.
7. Sampling distribution of a statistic:
• Definition: The sampling distribution of a statistic is the distribution of all possible values of that statistic obtained from different samples of the same size drawn from a population.
• Example: The sampling distribution of the sample mean shows all possible sample means that could be obtained from repeated samples of a given size from a population.
8. Sampling distribution of the mean:
• Definition: The sampling distribution of the mean is the distribution of all possible sample means obtained from samples of the same size drawn from a population. It is approximately normal if the sample size is sufficiently large (by the central limit theorem).
• Example: If you repeatedly take samples of 100 individuals from a population and calculate the mean height of each sample, the distribution of those sample means is the sampling distribution of the mean.
9. Standard error of the mean:
• Definition: The standard error of the mean is a measure of the variability of sample means around the population mean. It is the standard deviation of the sampling distribution of the mean.
• Example: If the standard error of the mean is 2.5, the typical distance between a sample mean and the population mean is about 2.5 units.
10. μreal:
• Definition: In a hypothesis test, μreal represents the true value of the population mean.
• Example: In a study comparing the effectiveness of two treatments at reducing blood pressure, μreal might represent the true average reduction in blood pressure in the population when a treatment is administered.
11. μnull:
• Definition: μnull represents the population mean specified by the null hypothesis.
• Example: In a study comparing the effectiveness of a new drug to a placebo, μnull might represent the null-hypothesis population mean, indicating no difference in effectiveness between the drug and placebo.
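A minimal numeric sketch ties several of these definitions together, using SciPy's standard normal distribution and made-up population values (μnull = 100, σ = 15, n = 36):

    import math
    from scipy.stats import norm

    alpha = 0.05
    z_crit_two = norm.ppf(1 - alpha / 2)   # critical value of z, two-tailed: about 1.96
    z_crit_one = norm.ppf(1 - alpha)       # critical value of z, one-tailed: about 1.645

    mu_null, sigma, n = 100, 15, 36        # hypothetical null-hypothesis population
    sem = sigma / math.sqrt(n)             # standard error of the mean: 15/6 = 2.5

    # Critical values of the sample mean (the "critical value of X"):
    x_crit_lower = mu_null - z_crit_two * sem
    x_crit_upper = mu_null + z_crit_two * sem
    print(f"reject H0 if the sample mean falls outside [{x_crit_lower:.2f}, {x_crit_upper:.2f}]")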
TESTS AND USAGE:
Descriptive:
1. Measure of Central Tendency:
• Explanation: Represents a single value that summarizes the center, or typical value, of a dataset.
• Usage: Provides a concise description of the dataset.
• Suitable Group: Any dataset with numerical values.
• Example: Calculating the mean (average) height of students in a classroom.
2. Student's t-test for Independent Groups:
• Explanation: Compares the means of two independent groups to determine whether there is a significant difference.
• Usage: Tests hypotheses about population means when the samples are independent.
• Suitable Group: Two independent groups with numerical data.
• Example: Comparing the mean exam scores of students who received two different teaching methods.
3. Student's t-test for Correlated Groups:
• Explanation: Compares the means of two related groups (paired observations) to determine whether there is a significant difference.
• Usage: Tests hypotheses about population means when the samples are correlated.
• Suitable Group: Two related groups with numerical data.
• Example: Comparing pre- and post-treatment blood pressure levels in the same group of patients.
4. t-test:
• Explanation: Determines whether the means of two groups are statistically different from each other.
• Usage: Tests hypotheses about population means.
• Suitable Group: Two groups with numerical data.
• Example: Testing whether there is a significant difference in the mean weights of patients before and after a diet intervention.
5. Normal Deviate (z) Test:
• Explanation: Tests hypotheses about the mean of a population when the sample size is large and the population standard deviation is known.
• Usage: Assesses whether a sample mean differs significantly from a known population mean.
• Suitable Group: Large samples with a known population standard deviation.
• Example: Testing whether the mean cholesterol level of a sample of patients differs significantly from the population mean.
6. Linear Regression:
• Explanation: Examines the relationship between two variables by fitting a straight line to the data.
• Usage: Predicts the value of one variable based on the value of another.
• Suitable Group: Two numerical variables.
• Example: Predicting house prices based on factors such as square footage and number of bedrooms.
7. Correlation:
• Explanation: Measures the strength and direction of the relationship between two numerical variables.
• Usage: Examines how changes in one variable are associated with changes in another.
• Suitable Group: Two numerical variables.
• Example: Examining the correlation between hours spent studying and exam scores among students.
8. Correlation Coefficient:
• Explanation: Represents the strength and direction of the linear relationship between two variables.
• Usage: Quantifies the degree of association between variables.
• Suitable Group: Two numerical variables.
• Example: Calculating the Pearson correlation coefficient between the heights and weights of individuals in a sample.
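A short sketch of the first few entries, using Python's standard library and SciPy on made-up numbers:

    import statistics
    from scipy import stats

    # Measures of central tendency for a small made-up sample of heights (cm)
    heights = [160, 165, 165, 170, 172, 175, 180]
    print(statistics.mean(heights), statistics.median(heights), statistics.mode(heights))

    # Student's t-test for correlated groups: pre- vs. post-treatment
    # blood pressure in the same (hypothetical) patients
    pre = [140, 152, 138, 145, 150, 148, 141, 155]
    post = [135, 147, 136, 140, 144, 145, 138, 149]
    print(stats.ttest_rel(pre, post))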
Inferential:
1. ANOVA (Analysis of Variance):
• Explanation: Determines whether there are statistically significant differences between three or more group means.
• Usage: Compares means across multiple groups.
• Suitable Group: Three or more independent groups with numerical data.
• Example: Comparing the mean test scores of students who received different types of study materials.
2. One-way ANOVA, independent groups design:
• Explanation: Compares means across three or more independent groups to determine whether there is a significant difference.
• Usage: Tests hypotheses about population means when there is one independent variable.
• Suitable Group: Three or more independent groups with numerical data.
• Example: Comparing the mean exam scores of students who received different types of training.
3. Kruskal-Wallis Test:
• Explanation: A nonparametric test used to compare three or more independent groups when the assumptions of ANOVA are violated.
• Usage: Tests hypotheses about population medians.
• Suitable Group: Three or more independent groups with ordinal or continuous data.
• Example: Comparing the median incomes of individuals across different education levels.
4. Mann-Whitney U Test:
• Explanation: A nonparametric test used to compare two independent groups when the assumptions of the t-test are violated.
• Usage: Tests hypotheses about population medians.
• Suitable Group: Two independent groups with ordinal or continuous data.
• Example: Comparing the median response times of participants who received two different training programs.
5. Two-way Analysis of Variance:
• Explanation: Examines the effects of two independent variables on a dependent variable.
• Usage: Tests hypotheses about population means when there are two independent variables.
• Suitable Group: Two independent variables with numerical data.
• Example: Investigating the effects of both gender and age on exam performance.
6. Wilcoxon matched-pairs signed-ranks test:
• Explanation: A nonparametric test used to compare two related groups when the assumptions of the paired t-test are violated.
• Usage: Tests hypotheses about population medians.
• Suitable Group: Two related groups with ordinal or continuous data.
• Example: Comparing the median reaction times of participants before and after a cognitive training program.
7. The Tukey Honestly Significant Difference (HSD) Test:
• Explanation: A post hoc test used after ANOVA to identify which group means differ significantly from one another (see the sketch after this list).
• Usage: Determines pairwise differences in group means after a significant omnibus ANOVA result.
• Suitable Group: Three or more independent groups with numerical data.
• Example: Identifying which of several treatments lead to significantly different levels of pain relief in patients.
8. The Newman-Keuls Test:
• Explanation: A stepwise post hoc test used after ANOVA to identify which specific group means differ; it is generally less conservative than the Tukey HSD test.
• Usage: Determines pairwise differences in group means after a significant omnibus ANOVA result.
• Suitable Group: Three or more independent groups with numerical data.
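As a sketch of the post hoc workflow (omnibus ANOVA first, pairwise comparisons only if it is significant), SciPy ships a tukey_hsd function; this assumes SciPy 1.8 or later, and the data are simulated:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    # Three hypothetical treatment groups with different true means
    g1, g2, g3 = (rng.normal(mu, 5, 20) for mu in (50, 55, 60))

    F, p = stats.f_oneway(g1, g2, g3)       # omnibus one-way ANOVA
    print(f"ANOVA: F = {F:.2f}, p = {p:.4f}")

    if p < 0.05:
        # Post hoc pairwise comparisons (Tukey HSD, SciPy >= 1.8)
        print(stats.tukey_hsd(g1, g2, g3))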

Nonparametric tests, also known as distribution-free tests, are statistical methods used for
analyzing data when the assumptions of parametric tests are not met or when dealing with non-
normally distributed data. Unlike parametric tests, which require specific assumptions about the
population distribution, nonparametric tests do not rely on these assumptions and are more
robust in such situations. Here's a breakdown:
Meaning: Nonparametric tests are statistical procedures that do not make assumptions about
the underlying distribution of the data. Instead, they are based on the ranks or frequencies of
observations in the data set. These tests use the order or frequency of data values rather than
the actual numerical values themselves.
When to Use: Nonparametric tests are appropriate in several situations:
1. Small Sample Sizes: When sample sizes are small and the assumption of normality
cannot be verified.
2. Non-Normal Data: When the data do not follow a normal distribution, nonparametric
tests are more appropriate than parametric tests.
3. Ordinal or Categorical Data: Nonparametric tests can analyze ordinal or categorical
data, where parametric tests may not be applicable.
4. Outliers or Skewed Data: When there are outliers or the data are highly skewed,
nonparametric tests can provide more reliable results.
5. Analysis of Nominal Data: Nonparametric tests are suitable for analyzing nominal data,
where parametric assumptions may not hold.
6. Violation of Assumptions: When the assumptions of parametric tests (such as
homogeneity of variances or independence of observations) are violated.
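To illustrate the skewed-data case above, here is a minimal comparison of a parametric and a nonparametric test on simulated, strongly skewed samples:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    # Exponential data are strongly skewed, so the t-test's
    # normality assumption is questionable at this sample size.
    a = rng.exponential(1.0, 20)
    b = rng.exponential(1.8, 20)

    print("t-test p:        ", stats.ttest_ind(a, b).pvalue)
    print("Mann-Whitney U p:", stats.mannwhitneyu(a, b).pvalue)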
