Introduction to Statistical Hypothesis Testing in R

Copyright
© © All Rights Reserved

A statistical hypothesis is an assumption made by the researcher about the population from which the data for an experiment are collected. This assumption does not have to be true.

Hypothesis testing is a formal process for checking the researcher's hypothesis against data.
Checking a hypothesis with certainty would require examining the entire population, which is rarely possible in practice. Instead, a hypothesis test uses a random sample from the population. Based on the result computed from the sample data, the test either rejects the hypothesis or fails to reject it.
Hypothesis testing is a type of statistical analysis in which you put your assumptions about a population parameter to the test. It is commonly used, for example, to check whether an apparent relationship between two variables is statistically significant.

Here are a few real-life examples of statistical hypotheses:

• A teacher assumes that 60% of his college's students come from upper-middle-class families.
• A doctor believes that 3D (Diet, Dose, and Discipline) is 90% effective for diabetic patients.
A statistical hypothesis test involves two competing hypotheses:
• Null Hypothesis – The claim or assumption about the larger population whose validity the test examines. It usually represents the default position, such as "no effect" or "no difference". The null hypothesis is denoted by H0.
• Alternative Hypothesis – The claim that is accepted if the evidence leads us to reject the null hypothesis. The evidence is the data collected in the trial together with the statistical computations performed on it. The alternative hypothesis is denoted by H1 or Ha.

Let’s take the example of a coin. We want to determine whether the coin is biased. The null hypothesis describes the default state: the coin is fair, so the probability of heads is 0.5 and, if the coin is tossed many times, heads and tails should occur in roughly equal numbers. The alternative hypothesis negates the null hypothesis: the probability of heads is not 0.5, so the numbers of heads and tails would differ significantly.
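The coin example can be tested in R with the exact binomial test, binom.test() from the base stats package. This is a minimal sketch: the counts (60 heads in 100 tosses) are hypothetical and are not part of the original example.

```r
# Hypothetical data: 60 heads observed in 100 tosses
heads  <- 60
tosses <- 100

# H0: P(heads) = 0.5 (fair coin); H1: P(heads) != 0.5 (biased coin)
coin_test <- binom.test(heads, tosses, p = 0.5, alternative = "two.sided")

coin_test$p.value   # probability under H0 of a result at least this extreme
coin_test$conf.int  # 95% confidence interval for P(heads)

# Decision at the 5% significance level
if (coin_test$p.value <= 0.05) {
  cat("Reject H0: the coin appears to be biased\n")
} else {
  cat("Fail to reject H0: no significant evidence of bias\n")
}
```

With these made-up counts the two-sided p-value is about 0.057, just above 0.05, so H0 is not rejected even though 60% of the tosses came up heads.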

Simple and Composite Hypothesis Testing


Depending on how precisely it specifies the population parameter, a statistical hypothesis can be classified into two types.

Simple Hypothesis: A simple hypothesis specifies an exact value for the parameter.

Composite Hypothesis: A composite hypothesis specifies a range of values.

Example:

A company is claiming that their average sales for this quarter are
1000 units. This is an example of a simple hypothesis.

Suppose the company claims that the sales are in the range of 900 to
1000 units. Then this is a case of a composite hypothesis.

One-Tailed and Two-Tailed Hypothesis Testing


The one-tailed test, also called a directional test, places the entire critical region in one tail of the sampling distribution. If the test statistic falls into this region, the null hypothesis is rejected in favor of the alternative hypothesis.

In a one-tailed test, the critical region is one-sided: the alternative hypothesis states that the parameter is either greater than or less than a specific value, but not both.

In a two-tailed test, the critical region is two-sided: it is split between both tails of the distribution, and the test checks whether the parameter differs from a specific value in either direction.

If the test statistic falls into the critical region (in either tail, for a two-tailed test), the null hypothesis is rejected and the alternative hypothesis is accepted.

Example:

Suppose H0: mean = 50 and H1: mean not equal to 50

According to H1, the mean can be greater than or less than 50.
This is an example of a two-tailed test.

Similarly, if H0: mean >= 50, then H1: mean < 50.

Here H1 allows only values of the mean less than 50, so this is a one-tailed test.
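The mean-of-50 example can be run in R with t.test(), switching only the alternative argument. The sketch below assumes a small hypothetical sample (not from the original text) and also prints the corresponding critical t values with qt().

```r
# Hypothetical sample of 8 measurements
x <- c(47, 49, 52, 45, 48, 50, 46, 49)

# Two-tailed test: H0: mean = 50, H1: mean != 50
two_tailed <- t.test(x, mu = 50, alternative = "two.sided")

# One-tailed (left-tailed) test: H0: mean >= 50, H1: mean < 50
one_tailed <- t.test(x, mu = 50, alternative = "less")

two_tailed$p.value
one_tailed$p.value

# Critical t values at alpha = 0.05 with n - 1 degrees of freedom
alpha <- 0.05
dof   <- length(x) - 1
qt(c(alpha / 2, 1 - alpha / 2), dof)  # two-tailed: reject if t is outside these bounds
qt(alpha, dof)                        # left-tailed: reject if t is below this value
```

Because the sample mean of these made-up values (48.25) lies below 50, in the direction of H1, the one-tailed p-value is exactly half the two-tailed one. Here that is enough for the one-tailed test to reject H0 at alpha = 0.05, while the two-tailed test does not.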

Level of Significance
The alpha (α) value is a criterion for determining whether a test statistic is statistically significant. In a statistical test, alpha is the probability of a Type I error that the researcher is willing to accept. Because alpha is a probability, it can be anywhere between 0 and 1.

In practice, the most commonly used alpha values are 0.01, 0.05, and 0.1, which represent a 1%, 5%, and 10% chance of a Type I error, respectively (i.e., rejecting the null hypothesis when it is in fact true).
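One way to see that alpha is the Type I error rate is to simulate many tests in which H0 is true by construction and count how often it is wrongly rejected. This is an illustrative sketch; the sample size (20), the number of simulations, and the standard normal data are arbitrary assumptions.

```r
set.seed(42)  # arbitrary seed, used only to make the simulation reproducible
alpha  <- 0.05
n_sims <- 5000

# Simulate data for which H0 is TRUE: 20 draws from N(0, 1), tested against mu = 0
p_values <- replicate(n_sims, t.test(rnorm(20, mean = 0, sd = 1), mu = 0)$p.value)

# Proportion of tests that (wrongly) reject the true H0, i.e. the Type I error rate
type1_rate <- mean(p_values <= alpha)
type1_rate  # close to alpha = 0.05, up to simulation noise
```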

P-Value
A p-value is the probability, computed assuming the null hypothesis is true, of obtaining a result at least as extreme as the one actually observed. It measures how easily the observed difference could have arisen by chance alone. As the p-value decreases, the statistical significance of the observed difference increases. If the p-value is lower than the chosen significance level α, you reject the null hypothesis.

Consider an example in which you want to test whether a new advertising campaign has changed the product's sales.

The null hypothesis states that the new advertising campaign causes no change in sales. A p-value of 0.30 means that, if the campaign truly had no effect, a change in sales at least as large as the one observed would occur about 30% of the time by chance alone, so the data give little reason to doubt H0. A p-value of 0.03 means that such a change would occur only about 3% of the time under H0, so at α = 0.05 you would reject H0. Note that the p-value is not the probability that the null hypothesis is true.

As you can see, the lower the p-value, the stronger the evidence against the null hypothesis, which here supports the conclusion that sales changed (increased or decreased) after the new advertising campaign.
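The definition of the p-value can be checked by computing it by hand from the t statistic and comparing it with the value reported by t.test(). The sales changes below are hypothetical numbers for illustration, and the sketch assumes a one-sample t-test on the per-store change in sales.

```r
# Hypothetical change in weekly sales (units) at 10 stores after the campaign
sales_change <- c(12, -3, 8, 15, 4, -1, 9, 6, 11, 2)

# H0: mean change = 0 (the campaign has no effect)
n      <- length(sales_change)
t_stat <- mean(sales_change) / (sd(sales_change) / sqrt(n))

# p-value: probability, assuming H0 is true, of a t statistic
# at least as extreme (in either direction) as the one observed
p_manual <- 2 * pt(-abs(t_stat), df = n - 1)

# The same p-value reported by t.test()
p_builtin <- t.test(sales_change, mu = 0)$p.value
c(manual = p_manual, t.test = p_builtin)
```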

Hypothesis Testing in R
Statisticians use hypothesis testing to decide formally whether to reject the null hypothesis. Hypothesis testing is conducted in the following manner:
1. State the Hypotheses – State the null and alternative hypotheses.
2. Formulate an Analysis Plan – Before looking at the data, choose the significance level, the test to use, and the decision rule.
3. Analyze Sample Data – Calculate the test statistic and its p-value, as described in the analysis plan.
4. Interpret Results – Apply the decision rule described in the analysis plan.

Hypothesis testing ultimately uses a p-value to weigh the strength of the evidence, in other words, what the data say about the population. The p-value ranges between 0 and 1 and can be interpreted in the following way:
• A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, so you reject it. With this cutoff, you accept a 5% chance of a Type I error.
• A large p-value (> 0.05) indicates weak evidence against the null hypothesis, so you fail to reject it.
A p-value very close to the cutoff (0.05) is considered marginal and could go either way.
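The four steps and the p-value decision rule above can be sketched in R as follows. The claimed mean of 100 and the scores are hypothetical values chosen for illustration, and a one-sample two-sided t-test is assumed as the test in the analysis plan.

```r
# Step 1: State the hypotheses (hypothetical claim)
#   H0: mean = 100    H1: mean != 100
mu0 <- 100

# Step 2: Formulate the analysis plan
alpha <- 0.05  # significance level (accepted Type I error rate)
# Test: one-sample, two-sided t-test; rule: reject H0 if p-value <= alpha

# Step 3: Analyze sample data (hypothetical measurements)
scores <- c(104, 98, 110, 105, 99, 107, 103, 101, 108, 102)
result <- t.test(scores, mu = mu0, alternative = "two.sided")
result$statistic  # t statistic
result$p.value

# Step 4: Interpret results with the decision rule
decision <- if (result$p.value <= alpha) "reject H0" else "fail to reject H0"
decision
```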
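Predicted vs. actual outcomes:

                   Actual: TRUE                     Actual: FALSE
Predicted: TRUE    True Positive                    False Positive (Type I error)
Predicted: FALSE   False Negative (Type II error)   True Negative

In hypothesis-testing terms, a "positive" prediction corresponds to rejecting H0. A false positive therefore means rejecting a true H0 (a Type I error), and a false negative means failing to reject a false H0 (a Type II error).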

True Positive:
Interpretation: You predicted positive and it’s true.
Example: You predicted that a woman is pregnant and she actually is.

True Negative:
Interpretation: You predicted negative and it’s true.
Example: You predicted that a man is not pregnant and he actually is not.

False Positive (Type I Error):
Interpretation: You predicted positive and it’s false.
Example: You predicted that a man is pregnant but he actually is not.

False Negative (Type II Error):
Interpretation: You predicted negative and it’s false.
Example: You predicted that a woman is not pregnant but she actually is.
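In R, the predicted-versus-actual table can be built with table(), and the four counts can be obtained with logical operators. The ten predictions and outcomes below are hypothetical values used only to illustrate the layout.

```r
# Hypothetical predictions and true outcomes for 10 cases
predicted <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)
actual    <- c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)

# Rows = predicted, columns = actual
conf_mat <- table(Predicted = predicted, Actual = actual)
conf_mat

tp <- sum(predicted & actual)    # true positives
tn <- sum(!predicted & !actual)  # true negatives
fp <- sum(predicted & !actual)   # false positives (Type I errors)
fn <- sum(!predicted & actual)   # false negatives (Type II errors)
c(TP = tp, TN = tn, FP = fp, FN = fn)
```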

Decision Errors in R
Two types of error can occur in hypothesis testing:
• Type I Error – A Type I error occurs when the researcher rejects a null hypothesis that is true. The significance level is the probability of a Type I error while testing the hypothesis, and it is represented by the symbol α (alpha).
• Type II Error – Failing to reject a null hypothesis H0 that is false is referred to as a Type II error. Its probability is represented by the symbol β (beta). The power of the test, 1 − β, is the probability of correctly rejecting a false null hypothesis.
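The relationship between power and β can be explored with power.t.test() from the stats package. The settings below (20 observations per group, a true difference in means of 1, a standard deviation of 1, α = 0.05) are hypothetical assumptions; under them the power is roughly 0.87, so β is roughly 0.13.

```r
# Power of a two-sample, two-sided t-test under hypothetical settings
pw <- power.t.test(n = 20, delta = 1, sd = 1, sig.level = 0.05,
                   type = "two.sample", alternative = "two.sided")
test_power <- pw$power
beta       <- 1 - test_power  # probability of a Type II error
c(power = test_power, beta = beta)

# Sample size per group needed for 90% power under the same assumptions
n_needed <- power.t.test(delta = 1, sd = 1, sig.level = 0.05, power = 0.90)$n
n_needed
```

Increasing the sample size raises the power and so lowers β, while α stays fixed at the chosen significance level.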

Using the Student’s T-test in R


The Student’s t-test is a method for comparing means: the mean of one sample against a hypothesized value, or the means of two samples. It can be used to determine whether the means differ significantly. This is a parametric test, and it assumes that the data are approximately normally distributed.
R handles the various versions of the t-test with the t.test() function. It can be used for one- and two-sample tests as well as paired tests.
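A minimal sketch of the three forms of t.test() follows. The samples data.1 and data.2 are simulated, hypothetical data, and the Shapiro-Wilk test (shapiro.test()) is shown as one common way to check the normality assumption; it is not mentioned in the original text.

```r
set.seed(1)  # arbitrary seed; data.1 and data.2 are simulated, hypothetical samples
data.1 <- rnorm(15, mean = 10, sd = 2)
data.2 <- rnorm(15, mean = 12, sd = 2)

# Check the normality assumption with the Shapiro-Wilk test (H0: data are normal)
shapiro.test(data.1)$p.value
shapiro.test(data.2)$p.value

# Two-sample test (Welch's version is the default)
two_sample <- t.test(data.1, data.2)

# One-sample test: H0: mean of data.1 = 10
one_sample <- t.test(data.1, mu = 10)

# Paired test: data.1[i] and data.2[i] are treated as a matched pair
# (only meaningful when the observations really are paired)
paired_test <- t.test(data.1, data.2, paired = TRUE)

two_sample$p.value
one_sample$p.value
paired_test$p.value
```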
Listed below are the main arguments of t.test() and their explanation:
• t.test(data.1, data.2) – The basic way to apply a t-test is to compare two vectors of numeric data.
• var.equal = FALSE – If var.equal is set to TRUE, the variances are assumed to be equal and the standard Student’s test is carried out. If it is FALSE (the default), the variances are assumed to be unequal and the Welch two-sample test is carried out.
• mu = 0 – In a one-sample test, mu is the mean against which the sample is tested. In a two-sample test, it is the hypothesized difference in means.
• alternative = "two.sided" – Sets the alternative hypothesis. The default is "two.sided", but "greater" or "less" can also be used. The value can be abbreviated (for example, "g" or "l").
• conf.level = 0.95 – Sets the confidence level of the interval (default = 0.95).
• conf.level = 0.99 – Corresponds to a significance level of 0.01: H0 is rejected if p < 0.01 and not rejected otherwise. Note that conf.level changes only the confidence interval, not the p-value.
• paired = FALSE – If set to TRUE, a matched-pair t-test is carried out.
• t.test(y ~ x, data, subset) – The data can be specified as a formula of the form response ~ predictor. In this case, data names the data frame that contains the variables, and subset can restrict which rows are used.
• subset = predictor %in% c("sample.1", "sample.2") – If the data are in the form response ~ predictor and the predictor has more than two groups, use subset to select the two samples to compare.
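The arguments listed above can be combined as in the following sketch. The data frame dat, with a three-level grouping column predictor, is simulated and hypothetical. With the formula interface, the difference is taken as the first group minus the second in alphabetical level order, so alternative = "less" here means mean(sample.1) < mean(sample.2).

```r
set.seed(7)  # arbitrary seed; the data below are simulated and hypothetical
dat <- data.frame(
  response  = c(rnorm(10, mean = 5), rnorm(10, mean = 6), rnorm(10, mean = 7)),
  predictor = rep(c("sample.1", "sample.2", "sample.3"), each = 10)
)

# Formula interface: subset keeps only two of the three groups (Welch test by default)
welch_res <- t.test(response ~ predictor, data = dat,
                    subset = predictor %in% c("sample.1", "sample.2"))
welch_res

# Student's test (equal variances), one-sided: H1: mean(sample.1) - mean(sample.2) < 0
res_less <- t.test(response ~ predictor, data = dat,
                   subset = predictor %in% c("sample.1", "sample.2"),
                   var.equal = TRUE, alternative = "less")
res_less$p.value

# Two-sided test with a 99% interval: conf.level changes only the interval,
# and the 99% interval excludes 0 exactly when p < 0.01
res99 <- t.test(response ~ predictor, data = dat,
                subset = predictor %in% c("sample.1", "sample.2"),
                var.equal = TRUE, conf.level = 0.99)
res99$p.value
res99$conf.int
```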
