F-Test in Statistics


The F-test is a statistical test used in hypothesis testing to determine whether the variances of two populations or two samples are equal. The data in an F-test conforms to an F-distribution. The test compares two variances by dividing one by the other to obtain the F statistic. Depending on the details of the situation, an F-test can be one-tailed or two-tailed. This article covers the F-test, the F statistic, its calculation, its critical value, and how to use it to test hypotheses.

F-distribution

In statistical hypothesis testing, the Fisher–Snedecor distribution, better known as the F-distribution, is used particularly when comparing variances or testing means across populations. Two degrees-of-freedom parameters, denoted df1 (numerator) and df2 (denominator), define this positively skewed distribution.

In statistical analyses such as ANOVA or regression, the F-test, which is derived from the F-distribution, assesses the equality of variances or means. In hypothesis testing, the rejection region is determined by critical values from the F-distribution, which depend on the degrees of freedom and the significance level. An F-distribution arises when two independent chi-square variables are each divided by their corresponding degrees of freedom and their ratio is taken.

Formula for F-distribution:

\text{F-value} = \frac{X_1 / df_1}{X_2 / df_2}      (Equation 1)

  • X1 and X2 are independent random variables, each following a chi-square distribution.
  • df1 and df2 are the degrees of freedom of the corresponding samples (a short simulation below illustrates this construction).
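
To make Equation 1 concrete, the sketch below (the degrees of freedom are arbitrary, assumed values) draws two independent chi-square variables with numpy, divides each by its degrees of freedom, and checks that their ratio behaves like an F-distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
df1, df2 = 5, 10                      # arbitrary degrees of freedom
n = 200_000

x1 = rng.chisquare(df1, size=n)       # X1 ~ chi-square(df1)
x2 = rng.chisquare(df2, size=n)       # X2 ~ chi-square(df2)
f_values = (x1 / df1) / (x2 / df2)    # Equation 1

# The simulated mean should be close to the theoretical F mean, df2/(df2 - 2)
print(f_values.mean())                # ~1.25
print(stats.f.mean(df1, df2))         # 1.25
```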

Degrees of Freedom

The degrees of freedom represent the number of observations used to calculate the chi-square variables that form the ratio. The shape of the F-distribution is determined by its degrees of freedom. It is a right-skewed distribution, meaning it has a longer tail on the right side. As the degrees of freedom increase, the F-distribution becomes more symmetric and approaches a bell shape.
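
A quick way to see this shape change (a sketch using scipy; the df pairs are chosen arbitrarily) is to compute the skewness of the F-distribution for increasing degrees of freedom:

```python
from scipy import stats

# Skewness of the F-distribution shrinks as both degrees of freedom grow,
# i.e. the distribution becomes more symmetric and bell-shaped.
for df1, df2 in [(2, 8), (10, 20), (100, 100)]:
    skew = stats.f.stats(df1, df2, moments='s')
    print(f"df1={df1:>3}, df2={df2:>3}: skewness = {float(skew):.2f}")
```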

What is F-Test?

The F-test is a statistical technique that uses the F test statistic to determine whether the variances of two samples or populations are equal. The samples or populations must be independent, and the statistic must follow an F-distribution. If the result of the F-test is statistically significant, the null hypothesis is rejected; otherwise, it is not rejected.

We can use this test when:

  • The population is normally distributed.
  • The samples are taken at random and are independent samples.

F-Test Formula using two variances

F_{calc}=\frac{\sigma_{1}^{2}}{\sigma_{2}^{2}}

Here,

  • Fcalc = the calculated F-value (the test statistic).
  • σ₁² and σ₂² = the variances of the two samples.

df = n_{S} - 1

Here,

  • df = Degrees of freedom of the sample.
  • nS = Sample size.
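
As a minimal sketch (the two samples below are hypothetical), the F statistic and degrees of freedom can be computed directly with numpy:

```python
import numpy as np

sample1 = np.array([22.1, 19.8, 24.3, 21.0, 23.5, 20.7])  # hypothetical data
sample2 = np.array([18.2, 18.9, 19.4, 18.0, 19.1])

var1 = sample1.var(ddof=1)   # sample variance sigma_1^2 (ddof=1 -> divide by n-1)
var2 = sample2.var(ddof=1)   # sigma_2^2

f_calc = var1 / var2                             # Fcalc = sigma_1^2 / sigma_2^2
df1, df2 = len(sample1) - 1, len(sample2) - 1    # df = n_S - 1 for each sample
print(f_calc, df1, df2)
```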

Hypothesis Testing Framework for F-test

The F-test is carried out within a hypothesis-testing framework to verify whether the variances are equal. The hypotheses and decision rules for the various tests are as follows:

Left Tailed Test:

Null Hypothesis: H0 : \sigma_{1}^2 = \sigma_{2}^2
Alternate Hypothesis: H1 : \sigma_{1}^2 < \sigma_{2}^2
Decision-Making Standard: Reject the null hypothesis if the F statistic is less than the F critical value.

Right Tailed Test:

Null Hypothesis: H0 : \sigma_{1}^2 = \sigma_{2}^2
Alternate Hypothesis: H1 : \sigma_{1}^2 > \sigma_{2}^2
Decision-Making Standard: Reject the null hypothesis if the F test statistic is greater than the F critical value.

Two Tailed Test:

Null Hypothesis: H0 : \sigma_{1}^2 = \sigma_{2}^2
Alternate Hypothesis: H1 : \sigma_{1}^2 \neq \sigma_{2}^2
Decision-Making Standard: Reject the null hypothesis if the F test statistic exceeds the critical value (computed with α/2, since the significance level is split between the two tails).
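
The three decision rules can be checked numerically. The sketch below (alpha and the degrees of freedom are assumed values) uses scipy.stats.f.ppf, the inverse CDF of the F-distribution, to find the critical values:

```python
from scipy import stats

alpha, df1, df2 = 0.05, 12, 15                 # assumed values for illustration

# Left-tailed: reject H0 if F < critical value
left_crit = stats.f.ppf(alpha, df1, df2)

# Right-tailed: reject H0 if F > critical value
right_crit = stats.f.ppf(1 - alpha, df1, df2)

# Two-tailed: alpha is split between the tails; reject if F < lo or F > hi
lo = stats.f.ppf(alpha / 2, df1, df2)
hi = stats.f.ppf(1 - alpha / 2, df1, df2)

print(left_crit, right_crit, lo, hi)
```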

F Test Statistic Formula Assumptions

Several assumptions underlie the F-test formula. For the formula to be used, the population distributions must be normal and the test samples must be independent. Apart from this, the following points should also be taken into consideration.

  • Right-tailed tests are simpler to calculate. Placing the larger variance in the numerator forces the test to be right-tailed.
  • In two-tailed tests, alpha is divided by two before the critical value is determined.
  • Variances are the squares of the standard deviations.

Steps to calculate F-Test

Step 1: Square the standard deviation (σ) to obtain the variance (σ²) of the data, if it is not already given.

Step 2: Determine the null and alternate hypotheses.

  •   H0: no difference in variances.
  •   H1: difference in variances.

Step 3: Find Fcalc using Equation 1 (F-value).

NOTE: While calculating Fcalc, divide the larger variance by the smaller variance, as this forces the test to be right-tailed and makes the calculation easier.

Step 4: Find the degrees of freedom of the two samples.

Step 5: Find the Ftable value using d1 and d2 obtained in Step 4 from the F-distribution table. Take the significance level α = 0.05 (if not given).

Looking up the F-distribution table: 

In the F-Distribution table (Link here), refer to the table corresponding to the given value of α in the question.

  • d1 (across) = df of the sample with the numerator variance (larger).
  • d2 (down) = df of the sample with the denominator variance (smaller).

Consider the excerpt from the F-distribution table given below while performing a one-tailed F-test.

GIVEN: 
α = 0.05
d1 = 2
d2 = 3

d2 \ d1      1        2      ...
   1       161.4    199.5    ...
   2       18.51    19.00    ...
   3       10.13     9.55    ...
   :         :        :
Then, Ftable = 9.55
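
The same value can be obtained without the printed table; a one-line check with scipy:

```python
from scipy import stats

# Upper critical value for alpha = 0.05 with d1 = 2 (numerator)
# and d2 = 3 (denominator); matches the table entry 9.55.
print(stats.f.ppf(1 - 0.05, dfn=2, dfd=3))   # ~9.55
```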

Step 6: Interpret the results using Fcalc and Ftable.

Interpreting the results:

If Fcalc < Ftable:
Cannot reject the null hypothesis.
∴ The variances of the two populations are similar.

If Fcalc > Ftable:
Reject the null hypothesis.
∴ The variances of the two populations are not similar.
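
The six steps can be gathered into one routine. This is a minimal sketch, assuming raw samples as input and a right-tailed test with the larger variance in the numerator (the function name and sample data are hypothetical):

```python
import numpy as np
from scipy import stats

def f_test(sample1, sample2, alpha=0.05):
    # Step 1: sample variances (ddof=1 for the unbiased estimator)
    var1, var2 = np.var(sample1, ddof=1), np.var(sample2, ddof=1)
    # Step 3 note: put the larger variance in the numerator -> right-tailed test
    if var1 < var2:
        sample1, sample2 = sample2, sample1
        var1, var2 = var2, var1
    f_calc = var1 / var2
    # Step 4: degrees of freedom
    d1, d2 = len(sample1) - 1, len(sample2) - 1
    # Step 5: critical value from the F-distribution
    f_table = stats.f.ppf(1 - alpha, d1, d2)
    # Step 6: reject H0 only if Fcalc exceeds the critical value
    return f_calc, f_table, f_calc > f_table

f_calc, f_table, reject = f_test([22.1, 19.8, 24.3, 21.0], [18.2, 18.9, 19.4])
print(f_calc, f_table, reject)
```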

Example Problem for calculating F-Test

Consider the following example, 

Conduct a two-tailed F-Test on the following samples: 

       Sample 1    Sample 2
σ        10.47        8.12
n           41          21

Step 1: State the hypotheses:

  • H0: no difference in variances.
  • H1: difference in variances.

Step 2: Let’s calculate the variances in the numerator and denominator.

F-value = \frac{\sigma^2_{1}}{\sigma^2_{2}}

  • σ₁² = (10.47)² = 109.62
  • σ₂² = (8.12)² = 65.93

Fcalc = (109.62 / 65.93) ≈ 1.66

Step 3: Now, let’s calculate the degrees of freedom.

Degrees of freedom = sample size – 1

Sample 1 = n1 = 41
Sample 2 = n2 = 21

Degree of sample 1 = d1 = (n1 – 1) = (41 – 1) = 40

Degree of sample 2 = d2 = (n2 – 1) = (21 – 1) = 20

Step 4: The usual significance level of α = 0.05 is selected, since the question does not specify one. Because this is a two-tailed F-test, alpha is halved before the critical value is looked up:
α = 0.05/2 = 0.025
Use d1 = 40 and d2 = 20 in the F-distribution table. (link here)

Step 5: The critical F value is found with alpha at 0.025 using the F table. For (40, 20), the critical value at α = 0.025 is 2.287.
Therefore, Ftable = 2.287

Step 6: Since Fcalc < Ftable (1.66 < 2.287):
We cannot reject the null hypothesis.
∴ The variances of the two populations are similar.
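
The worked example can be reproduced with scipy, using only the numbers given in the problem:

```python
from scipy import stats

f_calc = 10.47**2 / 8.12**2                 # ~1.66
f_table = stats.f.ppf(1 - 0.025, 40, 20)    # ~2.287

# Fcalc < Ftable, so the null hypothesis cannot be rejected
print(f_calc, f_table, f_calc > f_table)
```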

The F-test is most often used when comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population.

Frequently Asked Questions (FAQs)

1. What is the difference between the F-test and t-test?

The t-test is employed to assess whether the means of two groups are significantly distinct, providing a measure of the difference between them. On the other hand, the F-test is utilized to compare the variances of two or more groups, determining whether these variances are significantly different from one another.

2. Is ANOVA and F-test same?

ANOVA and the F-test are related, with the F-test being a component of ANOVA. ANOVA (Analysis of Variance) is a statistical technique built on the F-test: it compares the variance between groups to the variance within groups, helping determine whether there are significant differences among group means.

3. What is p-value of the F-test?

The p-value of the F-test represents the probability of obtaining the observed variance ratio or a more extreme ratio, assuming the null hypothesis of equal variances is true. A smaller p-value indicates stronger evidence against the null hypothesis.
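
For a right-tailed F-test, the p-value is the upper-tail probability of the observed statistic. A small sketch (numbers reused from the worked example above):

```python
from scipy import stats

f_calc, df1, df2 = 1.66, 40, 20
p_value = stats.f.sf(f_calc, df1, df2)   # survival function: P(F >= f_calc) under H0
print(p_value)                           # reject H0 if p_value < alpha
```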

4. What is the F-test to compare variances?

The F-test to compare variances assesses whether the variances of two or more groups are statistically different. It involves comparing the ratio of variances, providing a test statistic that follows an F-distribution under the null hypothesis of equal variances.


