Name : Kashif Ali
Student ID : 0000357315
Course ID : 8614
Subject : Educational Statistics
Tutor : Bhag Bhari
Assignment # 2
Q.1 How do we calculate the median? Also mention its merits and
demerits.
Median
The median is a positional measure: it identifies the middle value of a data set and divides the set
into two parts. One part comprises all the values greater than or equal to the median, and the
other part comprises all the values smaller than or equal to the median. In simple words, the median is
the middle value when a data set is arranged in order of magnitude. Because the median is defined by
the position of the values rather than their size, its value remains unchanged even if the largest value
grows larger.
To find the median, first arrange the values in order from the lowest to the highest. If the list
contains an odd number of values, the median is the middle value, with an equal count of values below
and above it. If the list contains an even number of values, the median is found by taking the middle
pair, adding them together, and dividing by two.
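A minimal Python sketch of this procedure (the sample values are invented for illustration):

```python
def median(values):
    """Return the median of a list of numbers."""
    ordered = sorted(values)            # arrange from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                      # odd count: take the middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the middle pair

print(median([7, 3, 9, 5, 1]))         # 5 values (odd)  -> 5
print(median([7, 3, 9, 5, 1, 11]))     # 6 values (even) -> (5 + 7) / 2 = 6.0
```

Note that replacing the largest value, 11, with 1,000 would leave the median unchanged, since only the position of the middle values matters.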
Meaning of Median
“A median is that value of the data which divides the group into two equal parts, one part comprising
all the values greater than the median and the other comprising the values less than the median.”
(L.R. Connor)
The median is the middle value of a series when the items are arranged in either ascending or
descending order. It divides the series into two equal parts: one part comprises all the values greater
than the median, and the other comprises all the values smaller than the median.
In educational statistics, the median is a measure of central tendency that indicates the middle value in a
set of data when the values are arranged in ascending or descending order. Like any statistical measure,
the median has its merits and demerits, especially in the context of educational research and
assessment.
Merits of the Median
1. It is rigidly defined and easy to understand and calculate.
2. It is not affected by extreme values, so it represents skewed data (such as test scores with a few outliers) better than the mean.
3. It can be computed for ordinal data and for open-ended distributions.
4. It can be located graphically and, in simple cases, by inspection.
Demerits of the Median
1. It does not use every value in the data set, so it ignores information that the mean captures.
2. It is not amenable to further algebraic treatment; for example, the median of a combined group cannot be computed from the medians of its subgroups.
3. It requires the data to be arranged in order, which can be tedious for large data sets.
4. It is often more affected by sampling fluctuations than the mean.
Conclusion
In educational research, the choice between using the median or another measure of central tendency,
like the mean, often depends on the nature of the data and the specific objectives of the analysis. The
median offers a robust and easily understood measure for certain types of data and distributions, but its
limitations make it less suitable for others. Being aware of these merits and demerits can help educators
and researchers make informed decisions about how to analyze and interpret educational data
effectively.
Q.2 Explain the process and errors in hypothesis testing.
Hypothesis testing is a statistical method used to make decisions about a population based on sample
data. The process involves testing an assumption (hypothesis) about a population parameter. The
outcome of the hypothesis test allows researchers to decide whether to reject the null hypothesis in
favor of the alternative hypothesis or not to reject the null hypothesis based on the evidence provided
by the sample data.
The process typically involves the following steps:
1. State the Hypotheses: Formulate the null hypothesis (H0), which represents the assumption of no
effect or no difference, and the alternative hypothesis (H1), which represents the claim the researcher
seeks to support.
2. Choose a Significance Level (α): The significance level is the probability of rejecting the null
hypothesis when it is actually true (a Type I error). Common choices for α are 0.05, 0.01, and 0.10.
3. Select the Appropriate Test Statistic: Based on the type of data and the hypothesis, choose a
statistical test that can help evaluate the hypothesis. Examples include the t-test, chi-square test, and
ANOVA.
4. Calculate the Test Statistic and P-value: Using the sample data, calculate the test statistic.
Then determine the p-value, which is the probability of observing results at least as extreme as those
obtained, under the assumption that the null hypothesis is true.
5. Make a Decision: If the p-value is less than or equal to α, reject the null hypothesis; otherwise,
fail to reject it.
6. Draw a Conclusion: Based on the decision, draw a conclusion about the hypotheses in the context
of the study.
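A minimal sketch of these steps in Python, assuming SciPy is available (the two groups of exam scores are invented for illustration):

```python
from scipy import stats

# Hypothetical exam scores for two groups of students
group_a = [72, 85, 78, 90, 66, 81, 74, 88]
group_b = [65, 70, 62, 75, 68, 71, 60, 73]

alpha = 0.05  # step 2: chosen significance level

# Step 1: H0 says the two group means are equal; H1 says they differ.
# Steps 3-4: an independent-samples t-test gives the statistic and p-value.
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

# Steps 5-6: decide and conclude.
if p_value <= alpha:
    print("Reject H0: the group means differ significantly.")
else:
    print("Fail to reject H0: the evidence is insufficient.")
```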
Errors in Hypothesis Testing
There are two main types of errors that can occur in hypothesis testing:
1. Type I Error (α): This error occurs when the null hypothesis is rejected when it is
actually true. The probability of making a Type I error is denoted by α, the significance level
chosen by the researcher. For example, setting α = 0.05 means there is a 5% risk of rejecting
the null hypothesis incorrectly.
2. Type II Error (β): This error occurs when the null hypothesis is not rejected when it is
actually false. The probability of making a Type II error is denoted by β. The power of a test
(1 - β) is the probability of correctly rejecting the null hypothesis when it is false, and it is
influenced by the sample size, effect size, and significance level.
Minimizing Errors
Choosing an Appropriate α: A lower α reduces the risk of a Type I error but
increases the risk of a Type II error. Researchers must balance these risks based on the context of
their study.
Increasing the Sample Size: A larger sample size increases the power of a test, reducing the
risk of a Type II error and making it easier to detect a true effect when it exists (see the
simulation sketch below).
Considering the Effect Size: The practical significance of the findings, as measured by the
effect size, should be considered alongside statistical significance to reach meaningful
conclusions.
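The trade-off between the two error types, and the effect of sample size on power, can be illustrated with a small Monte Carlo simulation. This is a sketch assuming NumPy and SciPy; the effect size, sample sizes, and number of trials are arbitrary choices for the example:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha, trials = 0.05, 2000

def rejection_rate(true_diff, n):
    """Fraction of t-tests on simulated data that reject H0 at level alpha."""
    rejections = 0
    for _ in range(trials):
        a = rng.normal(0.0, 1.0, n)            # group with mean 0
        b = rng.normal(true_diff, 1.0, n)      # group with mean true_diff
        if stats.ttest_ind(a, b).pvalue <= alpha:
            rejections += 1
    return rejections / trials

# With no true difference, the rejection rate estimates the Type I error rate (about alpha).
print("Type I error rate:", rejection_rate(0.0, 30))

# With a true difference, the rejection rate estimates power (1 - beta); it grows with n.
for n in (20, 50, 100):
    print(f"n = {n}: power is about {rejection_rate(0.5, n):.2f}")
```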
Understanding the process of hypothesis testing and being aware of potential errors are crucial for
conducting and interpreting research effectively. Careful design and analysis can help minimize these
errors and contribute to more reliable and valid conclusions.
Q.3 What do you understand by ‘Pearson Correlation’? Where is it
used and how is it interpreted?
The Pearson Correlation Coefficient, denoted r, is a statistical measure that calculates the
strength and direction of the linear relationship between two continuous variables. It is a widely used
method in statistics for quantifying the degree to which two variables are linearly related.
Where It Is Used
The Pearson Correlation Coefficient is used in various fields such as psychology, finance, medicine, and
environmental science, among others, for tasks like:
1. Exploratory Data Analysis: To identify relationships between variables that may warrant
further study.
2. Predictive Modeling: To select features for models where relationships between variables are
important.
3. Validity and Reliability Testing: In psychometrics, for example, to check the consistency and
validity of survey instruments or tests.
4. Market Research: To understand the relationship between customer behavior and product sales
or service usage.
How It Is Interpreted
Direction: The sign of r (positive or negative) indicates the direction of the relationship
between the two variables.
Strength: The magnitude of r indicates how closely the data points fit a
straight line. A magnitude closer to 1 or -1 indicates a stronger linear relationship.
Significance: Statistical significance testing (often through a p-value) is used to determine
whether the observed correlation is unlikely to be due to chance. A p-value less than the chosen
significance level (commonly 0.05) indicates that the correlation is statistically significant
(see the sketch below).
Causation: It is critical to understand that correlation does not imply causation. Even if two
variables are strongly correlated, it does not mean that one variable causes the change in the
other.
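To make the interpretation concrete, here is a short sketch that computes r from its definition (the covariance of the two variables divided by the product of their standard deviations) and checks the result against SciPy; the study-hours and exam-score data are invented:

```python
import numpy as np
from scipy import stats

hours  = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)          # hours studied
scores = np.array([52, 55, 61, 58, 67, 71, 74, 80], dtype=float)  # exam scores

# r = cov(x, y) / (std(x) * std(y)), using population (ddof = 0) estimates throughout
cov = np.mean((hours - hours.mean()) * (scores - scores.mean()))
r_manual = cov / (hours.std() * scores.std())

r, p_value = stats.pearsonr(hours, scores)
print(f"manual r = {r_manual:.3f}, scipy r = {r:.3f}, p = {p_value:.4f}")
# A positive r near 1 indicates a strong positive linear relationship;
# a p-value below 0.05 would mark the correlation as statistically significant.
```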
Limitations
Pearson's r also has important limitations: it captures only linear relationships and can miss strong
non-linear (curved) associations; it is sensitive to outliers, which can inflate or deflate the
coefficient; and it assumes the variables are continuous and measured on an interval or ratio scale.
Q.4 What is ANOVA (Analysis of Variance)? Explain its use in research.
The t- and z-test methods developed in the early 20th century were used for statistical analysis until
1918, when Ronald Fisher introduced the analysis of variance. ANOVA is also called Fisher's analysis of
variance, and it is an extension of the t- and z-tests. The term became well known in 1925, after
appearing in Fisher's book, "Statistical Methods for Research Workers." The method was first employed
in experimental psychology and was later extended to more complex subjects.
Analysis of variance, or ANOVA, is a statistical method that separates the observed variance in data into
different components to use for additional tests.
A one-way ANOVA is used for three or more groups of data, to gain information about the
relationship between the dependent and independent variables.
If no true variance exists between the groups, the ANOVA's F-ratio should be close to 1.
The ANOVA test allows a comparison of more than two groups at the same time to determine whether a
relationship exists between them. The result of the ANOVA formula, the F statistic (also called the F-
ratio), allows for the analysis of multiple groups of data to determine the variability between samples
and within samples.
The type of ANOVA test used depends on a number of factors. ANOVA is typically applied to
experimental data, and it can be computed by hand when statistical software is not available; it is
simple to use and best suited to small samples. In many experimental designs, the sample sizes should
be the same across the various factor-level combinations.
ANOVA is helpful for testing three or more group means and serves a similar purpose to running
multiple two-sample t-tests. However, it results in fewer Type I errors and is appropriate for a wide
range of problems. ANOVA identifies group differences by comparing the means of each group, and it
involves partitioning the variance into its different sources. It is employed with subjects, test groups,
between groups, and within groups, as in the sketch below.
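As an illustrative sketch, a one-way ANOVA across three hypothetical teaching-method groups can be run with SciPy's f_oneway (the scores are invented for the example):

```python
from scipy import stats

# Hypothetical exam scores under three teaching methods
lecture    = [70, 74, 68, 72, 75, 71]
discussion = [78, 82, 80, 77, 85, 79]
online     = [72, 69, 74, 70, 73, 71]

# H0: all three group means are equal
f_stat, p_value = stats.f_oneway(lecture, discussion, online)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# An F-ratio well above 1 with p < 0.05 indicates that at least one
# group mean differs from the others.
```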
Q.5 Explain the Chi-Square test and its uses in educational research.
The Chi-Square (χ²) test is a statistical procedure used for two main types of comparison: tests of
goodness of fit and tests of independence (association). It is a non-parametric test that does not make
assumptions about the distribution of the data (unlike t-tests or ANOVA, which assume normally
distributed data). Here is a brief overview of both applications of the Chi-Square test.
Chi-Square Test of Goodness of Fit
This test is used to determine whether there is a significant difference between the expected
frequencies and the observed frequencies in one or more categories. It is useful for checking whether a
sample comes from a population with a specific distribution. For example, an educator might use this
test to check whether the number of students preferring each of several educational methods follows
an expected distribution, as in the sketch below.
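A minimal sketch of such a goodness-of-fit check, assuming SciPy is available (the preference counts and the uniform expected distribution are invented for the example):

```python
from scipy import stats

# Observed counts of 100 students preferring each of four educational methods
observed = [30, 25, 20, 25]
# Expected counts if every method were equally popular (100 / 4 = 25 each)
expected = [25, 25, 25, 25]

chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
# A p-value of 0.05 or more would mean the observed preferences are
# consistent with the expected (uniform) distribution.
```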
Chi-Square Test of Independence
This test is used to determine whether there is a significant association between two categorical
variables. It is applied to data in a contingency table, where the frequencies of the variables are counted
and laid out in a table format. In educational statistics, this can be particularly useful for exploring the
relationship between variables such as gender (male/female) and choice of major (science/humanities),
to see whether the choice of major is independent of gender or whether there is a significant
association between them.
In educational research, the Chi-Square Test of Independence is widely used to explore relationships
between categorical variables. For example, an educational researcher might want to know if student
performance (categorized as high, medium, or low) is independent of the type of instructional strategy
employed (traditional vs. innovative). By applying the Chi-Square Test of Independence, the researcher
can analyze data from a sample of students to determine if there’s a statistically significant association
between the instructional strategy and student performance levels.
Applying the test involves the following steps:
1. Constructing a contingency table: The data are arranged in a table format showing the frequency of
observations in each category.
2. Calculating the Chi-Square statistic: Using the observed frequencies from the contingency
table and the expected frequencies computed under the assumption of independence, the Chi-Square
statistic is calculated.
3. Determining significance: The calculated Chi-Square statistic is compared to a critical value from
the Chi-Square distribution table. If the calculated value is greater than the critical value (or if the p-value
is less than the significance level, typically 0.05), the null hypothesis is rejected, indicating a significant
association between the variables. These steps are illustrated in the sketch below.
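These steps can be sketched with SciPy's chi2_contingency, which computes the expected frequencies under independence and the resulting statistic; the contingency counts below are hypothetical:

```python
import numpy as np
from scipy import stats

# Rows: instructional strategy (traditional, innovative)
# Columns: performance level (high, medium, low)
table = np.array([[20, 35, 25],
                  [35, 30, 15]])

chi2, p_value, dof, expected = stats.chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
# If p < 0.05, reject the null hypothesis of independence: student
# performance appears to be associated with the instructional strategy.
```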
Conclusion
This test is powerful for educational research, allowing for the investigation of relationships between
categorical variables in a variety of contexts, such as analyzing demographic factors, learning methods,
and outcomes. However, it’s important to remember that while the Chi-Square Test can identify
associations, it does not imply causation.