
BA unit2

This document covers statistical sampling methods and hypothesis testing, including sampling distributions, estimation of population parameters, and types of hypothesis tests. It details various sampling methods, the concept of sampling error, and provides examples of one-sample and two-sample hypothesis testing, as well as ANOVA and Chi-Square tests. The document serves as a comprehensive guide for understanding and applying statistical sampling techniques in business analytics.

Uploaded by Shaik Ashraaf
Copyright © All Rights Reserved

Unit 2

Statistical Sampling
BUSINESS ANALYTICS
B.Tech(CSE) IV Year - I Semester
Open Elective - III

Prof. S.Adinarayana, Dept of CS&SE, College of Engineering, Andhra University
01 SAMPLING DISTRIBUTION

02 SAMPLING METHODS

03 ESTIMATING POPULATION PARAMETERS

04 SAMPLING ERROR

05 HYPOTHESIS TESTING

06 ONE-TAILED & TWO-TAILED TESTS OF HYPOTHESIS

07 ONE-SAMPLE & TWO-SAMPLE TESTING

08 ANOVA

09 CHI-SQUARE TEST
SAMPLING DISTRIBUTION

A sampling distribution is the probability distribution of a statistic (such as the sample mean) obtained
from a large number of samples of the same size drawn from a specific population. It describes how often
each possible value of the statistic would occur across those repeated samples.

• A sampling distribution is a probability distribution of a statistic that is obtained through repeated
sampling of a specific population.
• It describes the range of possible outcomes for a statistic, such as the mean or mode of some variable
in the population.
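A small simulation can make the idea concrete. This is a minimal sketch under assumed values: the population (10,000 normal values with mean 50 and standard deviation 10), the sample size of 25 and the 2,000 repetitions are all hypothetical. It collects the mean of each repeated sample; the spread of those means (the standard error) should come out close to σ/√n.

```python
import random
import statistics

def sample_means(population, sample_size, num_samples, seed=0):
    """Draw repeated simple random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, sample_size))
            for _ in range(num_samples)]

# Hypothetical population: 10,000 values from a normal distribution (mean 50, sd 10).
pop_rng = random.Random(42)
population = [pop_rng.gauss(50, 10) for _ in range(10_000)]

means = sample_means(population, sample_size=25, num_samples=2_000)

print("population mean:     ", round(statistics.mean(population), 2))
print("mean of sample means:", round(statistics.mean(means), 2))
# The spread of the sampling distribution (standard error) is close to sigma / sqrt(n).
print("sd of sample means:  ", round(statistics.stdev(means), 2))
print("sigma / sqrt(25):    ", round(statistics.pstdev(population) / 5, 2))
```

The list `means` is an empirical approximation of the sampling distribution of the mean; plotting it as a histogram would show a bell shape centred on the population mean.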

STATISTICAL SAMPLING: METHODS

Sampling methods fall into two groups: subjective and probabilistic.

Subjective Sampling Methods
• Judgment Sampling: Expert judgment is used to select the sample.
• Convenience Sampling: Samples are selected based on ease of access.

Probabilistic Sampling Methods
• Simple Random Sampling: Each item in the population has an equal chance of being selected.
• Systematic (Periodic) Sampling: Selects every nth item from the population.
• Stratified Sampling: The population is divided into natural subsets (strata) based on characteristics
(gender, age group, etc.), and a random sample is drawn from each stratum.
• Cluster Sampling: The population is divided into clusters, and a random sample of clusters is selected.
• Continuous Process Sampling:
o Random Time Selection: Choose a random time and then select the next n items.
o Random Time Points: Choose n random times and select the next item.
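The probabilistic methods above can be sketched with Python's `random` module. This is a minimal illustration, not a survey toolkit: the item IDs, strata and clusters are hypothetical, the stratified version assumes equal allocation (the same sample size per stratum, whereas proportional allocation is also common), and the function names are introduced here for illustration only.

```python
import random

def simple_random_sample(population, n, rng):
    """Every item has the same chance of being selected."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Random starting point, then every k-th item (k = N // n)."""
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]

def stratified_sample(strata, n_per_stratum, rng):
    """Simple random sample of equal size from each stratum (equal allocation assumed)."""
    return {label: rng.sample(items, n_per_stratum) for label, items in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every item in the chosen clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {label: clusters[label] for label in chosen}

rng = random.Random(1)
population = list(range(1, 101))                       # hypothetical item IDs 1..100
strata = {"male": population[:60], "female": population[60:]}
clusters = {f"store_{i}": population[i * 10:(i + 1) * 10] for i in range(10)}

srs = simple_random_sample(population, 5, rng)
sys_sample = systematic_sample(population, 10, rng)
strat = stratified_sample(strata, 3, rng)
clus = cluster_sample(clusters, 2, rng)
print(srs, sys_sample, strat, sorted(clus), sep="\n")
```

The contrast between the last two is the key design point: stratified sampling draws some items from every subgroup, while cluster sampling keeps only a few subgroups but takes them whole.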

ESTIMATING POPULATION PARAMETERS

Point Estimate:
A point estimate is a single value derived from sample data to estimate an unknown population parameter (for example, the
sample mean x̄ as an estimate of the population mean μ). It is the most commonly used form of estimate.
It provides a specific numerical value as an approximation of the population parameter.
Advantage: easy to calculate. Limitation: it gives no indication of its precision, that is, of how far it may lie from the true value.
Interval Estimates:
An interval estimate provides a range of values within which a population parameter is expected to lie, based on sample data. It offers
more information than a point estimate by accounting for the variability and uncertainty inherent in the estimation process.
• Confidence Intervals: A confidence interval is a range of values within which the population parameter is believed to lie,
together with the probability that the interval-building procedure captures the true (unknown) population parameter. This probability is called the
level of confidence, denoted by 1 - α, where α is a number between 0 and 1. The confidence level is usually expressed as a percent;
common values are 90%, 95%, or 99%. (Note that if the level of confidence is 90%, then α = 0.1.) The margin of error depends on the
level of confidence, the sample size, and the variability of the data.

• Prediction Intervals: A prediction interval provides a range for predicting the value of a new observation from the same population.
This is different from a confidence interval, which provides an interval estimate of a population parameter, such as the mean or
proportion. A confidence interval is associated with the sampling distribution of a statistic, but a prediction interval is associated
with the distribution of the random variable itself.
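The difference between a point estimate, a confidence interval and a prediction interval can be seen in a few lines. This is a minimal sketch under stated assumptions: it uses the normal critical value z(α/2) for both intervals (with small samples a t critical value would normally be used instead), and the 40 order values are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def mean_estimates(data, confidence=0.95):
    """Point estimate plus z-based confidence and prediction intervals for the mean.

    Assumption: uses the normal critical value z_(alpha/2); for small samples a
    t critical value would normally be used instead.
    """
    n = len(data)
    xbar = statistics.mean(data)               # point estimate of the population mean
    s = statistics.stdev(data)                 # sample standard deviation
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)    # about 1.96 for 95% confidence
    ci_margin = z * s / math.sqrt(n)           # margin of error for the mean
    pi_margin = z * s * math.sqrt(1 + 1 / n)   # adds the spread of one new observation
    ci = (xbar - ci_margin, xbar + ci_margin)
    pi = (xbar - pi_margin, xbar + pi_margin)
    return xbar, ci, pi

# Hypothetical sample of 40 order values.
data = [52, 48, 55, 61, 47, 50, 58, 53, 49, 57] * 4
xbar, ci95, pi95 = mean_estimates(data, 0.95)
_, ci99, _ = mean_estimates(data, 0.99)
print("point estimate:", xbar)
print("95% confidence interval:", ci95)
print("95% prediction interval:", pi95)
```

The prediction interval is always wider, because it must cover the variability of a single new observation, not just the uncertainty in the mean; raising the confidence level from 95% to 99% widens the confidence interval.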
SAMPLING ERROR

Sampling error is the difference between a sample statistic (e.g., the sample mean) and the corresponding
population parameter (e.g., the population mean) that arises because the sample is only a subset of the
population.
Causes of Sampling Error:
• Random Variation: Natural differences between samples due to chance.
• Sample Size: Smaller samples tend to have larger sampling errors because fewer observations represent the
population.
• Sampling Method: Non-random sampling methods can introduce bias, increasing the error of the estimate.
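The sample-size effect can be checked numerically. In this sketch the population is hypothetical (50,000 right-skewed values with a mean of about 20), and the "average sampling error" is measured as the mean absolute difference between the sample mean and the population mean over 500 random samples of each size.

```python
import random
import statistics

rng = random.Random(7)
# Hypothetical right-skewed population (e.g. customer wait times, mean about 20).
population = [rng.expovariate(1 / 20) for _ in range(50_000)]
mu = statistics.fmean(population)

def average_sampling_error(n, trials=500):
    """Average |sample mean - population mean| over repeated random samples of size n."""
    return statistics.fmean(
        abs(statistics.fmean(rng.sample(population, n)) - mu) for _ in range(trials)
    )

errors = {n: average_sampling_error(n) for n in (10, 100, 1000)}
for n, err in errors.items():
    print(f"n = {n:>4}: average sampling error = {err:.3f}")
```

Because the standard error scales like 1/√n, each tenfold increase in sample size should shrink the typical error by roughly a factor of √10 ≈ 3.2.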

HYPOTHESIS TESTING

Hypothesis testing involves drawing inferences about two contrasting propositions (each called a hypothesis)
relating to the value of one or more population parameters (such as the mean, proportion, standard deviation,
or variance).
• Null Hypothesis (H0): Represents the existing theory or belief. It is assumed to be true unless the sample
data provide convincing statistical evidence against it.
• Alternative Hypothesis (H1): The complement of the null hypothesis. It is true if the null hypothesis is false.

TYPES OF ERRORS

Striking Balance
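A Type I error is rejecting H0 when it is actually true, and its probability is the significance level α; a Type II error is failing to reject H0 when it is false. Lowering α makes Type I errors rarer but, for a fixed sample size, makes Type II errors more likely, which is the balance to be struck. The sketch below, under hypothetical settings (μ0 = 100, known σ = 15, n = 30), simulates a two-tailed one-sample z-test on data for which H0 is true, so every rejection is a Type I error and the rejection rate should land near α.

```python
import random
import statistics
from statistics import NormalDist

def simulated_type_i_error_rate(alpha=0.05, n=30, trials=4000, seed=3):
    """Rejection rate of a two-tailed one-sample z-test when H0 is actually true.

    Samples are drawn from N(mu0, sigma) with sigma known, so every rejection
    is a Type I error; the rate should come out close to alpha.
    """
    rng = random.Random(seed)
    mu0, sigma = 100.0, 15.0          # hypothetical values; H0: mu = mu0 is true
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (statistics.fmean(sample) - mu0) / (sigma / n ** 0.5)
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        if p_value <= alpha:
            rejections += 1
    return rejections / trials

rate_05 = simulated_type_i_error_rate(alpha=0.05)
rate_01 = simulated_type_i_error_rate(alpha=0.01)
print(rate_05, rate_01)
```

With the same seed, every sample rejected at α = 0.01 is also rejected at α = 0.05, so the stricter level can only produce fewer Type I errors.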
TYPES OF TESTING:

Types of Hypothesis Testing:


There are two types of hypothesis tests, based on the type of data being analysed and what can be assumed about the population:
1. Parametric Tests: Used when you have information about the population and can make assumptions about its distribution (typically normality).
a. t-Test: Small sample size (less than 30), population variance unknown.
b. Z-Test: Large sample size (30 or more), or population variance known.
c. F-Test: Compares the variances of two populations; the F statistic is also used in ANOVA.
d. ANOVA: An extension of the t-test to compare the means of three or more groups.

2. Non-Parametric Tests: Used when assumptions about the population distribution are uncertain or cannot be made.
a. Chi-Square Test: Used for categorical (count) data; requires a simple random sample (a total sample size greater than 50 is a common guideline) and independent observations.

HYPOTHESIS TESTING STEPS:

1. Formulate Hypotheses:
o Null Hypothesis (H0): A statement of no effect or no difference, which we aim to test.
o Alternative Hypothesis (H1 or Ha): A statement that there is an effect or a difference.

2. Choose Significance Level (α):
o Common values are 0.05, 0.01, or 0.10. This is the probability of rejecting the null hypothesis when it is true (Type I
error).

3. Select the Test Statistic:
o Depends on the nature of the data and the hypothesis (e.g., t-test, z-test, chi-square test).
o Establish criteria for rejecting or failing to reject H0.

4. Calculate the Test Statistic and p-Value:
o Use the sample data to compute the test statistic.
o The p-value is the probability, assuming H0 is true, of obtaining a test statistic at least as extreme as the one observed.

5. Make a Decision:
o Compare the p-value to the significance level (α).
o If p-value ≤ α, reject the null hypothesis.
o If p-value > α, do not reject the null hypothesis.
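The five steps map directly onto code. This minimal sketch applies them to a two-tailed one-sample z-test, which assumes the population standard deviation σ is known; the data (36 observations with mean 52), μ0 = 50 and σ = 6 are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-tailed one-sample z-test (population sigma assumed known).

    Step 1: H0: mu = mu0, H1: mu != mu0.
    Step 2: alpha is the significance level.
    Step 3: test statistic z = (xbar - mu0) / (sigma / sqrt(n)).
    """
    n = len(sample)
    xbar = statistics.fmean(sample)
    # Step 4: compute the test statistic and its two-tailed p-value.
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 5: compare the p-value with alpha.
    decision = "reject H0" if p_value <= alpha else "do not reject H0"
    return z, p_value, decision

# Hypothetical data: 36 observations with mean 52; H0: mu = 50, sigma = 6.
sample = [50, 54] * 18
z, p_value, decision = one_sample_z_test(sample, mu0=50, sigma=6)
print(f"z = {z:.2f}, p = {p_value:.4f}, {decision}")
```

Here z = (52 − 50) / (6 / √36) = 2.0, whose two-tailed p-value (about 0.046) is below α = 0.05, so H0 is rejected.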
ONE-SAMPLE HYPOTHESIS TESTING
A one-sample hypothesis test is used to determine if a sample comes from a
population with a specific mean (or another parameter). It is useful when you
want to compare the sample mean to a known population mean or a
hypothesized value.

Null Hypothesis (H₀): The statement being tested, usually a statement of no
effect or no difference. For one-sample tests, it often states that the
population mean (μ) is equal to a specific value (μ₀).
Example: H₀: μ = μ₀

Alternative Hypothesis (H₁): The statement you want to test against the null
hypothesis. It suggests that the population mean is different from the
specified value.
Example: H₁: μ ≠ μ₀ (two-tailed test)
H₁: μ > μ₀ (right-tailed test)
H₁: μ < μ₀ (left-tailed test)
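The form of H₁ decides which tail(s) of the reference distribution count as "extreme". A minimal sketch for a z statistic follows; the labels `"two-sided"`, `"greater"` and `"less"` are naming choices made here for illustration, and z = 1.8 is a hypothetical value.

```python
from statistics import NormalDist

def z_p_value(z, alternative="two-sided"):
    """p-value of a z statistic for each form of the alternative hypothesis."""
    cdf = NormalDist().cdf(z)
    if alternative == "two-sided":   # H1: mu != mu0 (two-tailed)
        return 2 * min(cdf, 1 - cdf)
    if alternative == "greater":     # H1: mu > mu0 (right-tailed)
        return 1 - cdf
    if alternative == "less":        # H1: mu < mu0 (left-tailed)
        return cdf
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

for alt in ("two-sided", "greater", "less"):
    print(alt, round(z_p_value(1.8, alt), 4))
```

For a positive z the right-tailed p-value is half the two-tailed one, which is why a one-tailed test can reject H0 where a two-tailed test on the same data does not.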
TWO-TAILED TEST OF HYPOTHESIS FOR MEAN
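In a two-tailed test of H₀: μ = μ₀, the significance level α is split evenly between the two tails, so H₀ is rejected when the z statistic falls below −z(α/2) or above +z(α/2). A minimal sketch of that rejection rule (function names are illustrative):

```python
from statistics import NormalDist

def two_tailed_critical_values(alpha=0.05):
    """Critical values -z_(alpha/2) and +z_(alpha/2); alpha is split between both tails."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return -z_crit, z_crit

def reject_h0_two_tailed(z, alpha=0.05):
    """Reject H0: mu = mu0 when z falls in either tail's rejection region."""
    lower, upper = two_tailed_critical_values(alpha)
    return z < lower or z > upper

print(two_tailed_critical_values(0.05))   # approximately (-1.96, 1.96)
print(reject_h0_two_tailed(2.3), reject_h0_two_tailed(-1.2))
```

This critical-value rule and the p-value rule (reject when p ≤ α) always lead to the same decision for the same test.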

TWO-SAMPLE HYPOTHESIS TESTING

Two-sample hypothesis testing is used to compare the means (or other parameters) of two independent
groups to determine if there is a statistically significant difference between them. This type of test is often
used to compare experimental and control groups, different treatment groups, or any other two
independent samples.

Two Sample t-test: Example

Suppose we want to know whether or not the mean weight of two different species of turtles is equal.
To test this, we will perform a two-sample t-test at significance level α = 0.05 using the following steps:

Step 1: Gather the sample data.

Suppose we collect a random sample of turtles from each population with the following information:

Step 4: Calculate the p-value of the test statistic t.

According to the T Score to P Value Calculator, the two-tailed p-value associated with t = -1.2508 and
degrees of freedom = n1 + n2 - 2 = 40 + 38 - 2 = 76 is 0.21484.

Step 5: Conclusion.
We fail to reject the null hypothesis since this p-value is not less than
our significance level α = 0.05.
We do not have sufficient evidence to say that the mean weight of turtles between
these two populations is different.
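The whole test can be reproduced from summary statistics with the standard library alone. This is a sketch under stated assumptions: the sample table is not reproduced in this text, so the means and standard deviations below (300 and 18.5 for n1 = 40; 305 and 16.7 for n2 = 38) are assumed values. With them, the pooled (equal-variance) formulas give t ≈ -1.2508 with 76 degrees of freedom and p ≈ 0.2148, in line with Steps 4 and 5. The helpers `_betacf` and `reg_inc_beta` are introduced here to evaluate the Student t tail through the regularized incomplete beta function (a standard continued-fraction method); in practice a library routine such as SciPy's `scipy.stats.ttest_ind_from_stats` would do this.

```python
import math

def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the regularized incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd terms of the continued fraction for this m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t-test from summary statistics (H1: mu1 != mu2)."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)   # pooled SD
    t = (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    p_value = reg_inc_beta(df / 2, 0.5, df / (df + t * t))   # two-tailed P(|T| >= |t|)
    return t, df, p_value

# Assumed summary statistics (the slide's sample table is not reproduced in this text).
t, df, p_value = pooled_t_test(300, 18.5, 40, 305, 16.7, 38)
print(f"t = {t:.4f}, df = {df}, p = {p_value:.5f}")
print("reject H0" if p_value <= 0.05 else "fail to reject H0")
```

The pooled version matches the degrees of freedom n1 + n2 - 2 used above; if the two population variances cannot be assumed equal, Welch's t-test (with adjusted degrees of freedom) is the usual alternative.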

When to use ANOVA

• ANOVA (Analysis of Variance) is used when you have three or more groups and you want to compare their
means to see if they are significantly different.
• It is a statistical method that is used in a variety of research scenarios. Here are some examples of when you
might use ANOVA:

What is an ANOVA

• An ANOVA (“Analysis of Variance”) is a statistical technique used to determine whether or not
there is a significant difference between the means of three or more independent groups.
• The two most common types of ANOVAs are the one-way ANOVA and two-way ANOVA.

• A One-Way ANOVA is used to determine how one factor impacts a response variable.
• For example, we might want to know if three different studying techniques lead to different mean exam
scores.
• To see if there is a statistically significant difference in mean exam scores, we can conduct a one-way
ANOVA.

• A Two-Way ANOVA is used to determine how two factors impact a response
variable and to determine whether or not there is an interaction between the
two factors on the response variable.
• For example, we might want to know how gender and different levels of
exercise impact average weight loss.
• We would conduct a two-way ANOVA to find out.

• It’s also possible to conduct a three-way ANOVA, four-way ANOVA, etc., but these are much
less common, and ANOVA results can be difficult to interpret if too many factors are used.

ANALYSIS OF VARIANCE (ANOVA)
ANOVA is a statistical method used to compare the means of three or more groups to determine if there is a statistically significant
difference between them. Unlike a t-test, which compares the means of two groups, ANOVA can handle multiple groups simultaneously.

If the F-statistic is greater than the critical value, or if the p-value is less than the
significance level (α), reject H0.
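The F statistic in that rule is the ratio of between-group to within-group mean squares. A minimal one-way ANOVA sketch follows; the exam scores for three studying techniques are hypothetical, and the sketch stops at F and its degrees of freedom (the critical value or p-value would come from an F table or a statistics library).

```python
import statistics

def one_way_anova_f(*groups):
    """F statistic and degrees of freedom for a one-way ANOVA.

    F = MS_between / MS_within, where MS = sum of squares / degrees of freedom.
    """
    values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(values)
    group_means = [statistics.fmean(g) for g in groups]
    k, n = len(groups), len(values)
    # Between-group variation: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group variation: observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical exam scores for three studying techniques.
technique_a = [78, 82, 75, 80, 85]
technique_b = [88, 84, 90, 86, 87]
technique_c = [79, 81, 77, 83, 80]
f_stat, df_b, df_w = one_way_anova_f(technique_a, technique_b, technique_c)
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F means the group means differ by more than the within-group scatter would explain by chance, which is what pushes it past the critical value.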

CHI-SQUARE TEST FOR INDEPENDENCE
The Chi-Square Test of Independence is a statistical method used to determine if there is a significant association between two categorical
variables. Essentially, it tests whether the distribution of one variable is independent of the distribution of the other variable.

CHI-SQUARE TEST FOR INDEPENDENCE: USAGE
• A Chi-Square Test of Independence is used to determine whether or not there is a significant association
between two categorical variables.

Example

Suppose we want to know whether or not gender is associated with political party preference.
We take a simple random sample of 500 voters and survey them on their political party preference.
The following table shows the results of the survey:

Use the following steps to perform a Chi-Square test of independence to determine if gender is associated with
political party preference.

Step 1: Define the hypotheses.
We will perform the Chi-Square test of independence using the following hypotheses:
• H0: Gender and political party preference are independent.
• H1: Gender and political party preference are not independent.

Step 2: Calculate the expected values.
Next, we will calculate the expected values for each cell in the contingency table using the following formula:
Expected value = (row sum * column sum) / table sum.
For example, the expected value for Male Republicans is: (230 * 250) / 500 = 115.

We can repeat this formula to obtain the expected value for each cell in the table:
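The expected-value formula can be applied to every cell at once. The survey table itself is not reproduced in this text, so the counts below are assumed: they are chosen to agree with the figures quoted in these steps (500 voters, totals of 230 and 250 for the Male Republican cell's column and row, and an observed count of 120 Male Republicans), and they also reproduce the six cell contributions listed in Step 4.

```python
def expected_counts(observed):
    """Expected count for each cell under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Assumed counts: rows = Male, Female; columns = Republican, Democrat, Independent.
observed = [[120, 90, 40],
            [110, 95, 45]]
expected = expected_counts(observed)
for label, row in zip(("Male", "Female"), expected):
    print(label, row)
```

The expected table keeps the observed row and column totals but spreads each row across the columns in the same proportions, which is exactly what independence would predict.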

[Tables: observed values and expected values]

Step 3: Calculate (O − E)² / E for each cell in the table.

Next, we will calculate (O − E)² / E for each cell in the table, where:
• O: observed value
• E: expected value
For example, Male Republicans would have a value of: (120 − 115)² / 115 = 0.2174.

(O − E)² / E for each cell in the table.

We can repeat this formula for each cell in the table:

Step 4: Calculate the test statistic χ² and the corresponding p-value.

χ² = Σ (O − E)² / E = 0.2174 + 0.2174 + 0.0676 + 0.0676 + 0.1471 + 0.1471 = 0.8642

According to the Chi-Square Score to P Value Calculator, the p-value associated with
χ² = 0.8642 and (2 − 1) × (3 − 1) = 2 degrees of freedom is 0.649198.
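Steps 2 to 4 can be checked end to end in Python (as a cross-check alongside the R implementation that follows). The observed counts are assumed, since the survey table is not reproduced in this text; they are chosen to match the figures quoted above. The helper `chi2_sf` is introduced here and computes the upper-tail chi-square probability exactly for integer degrees of freedom using the standard recurrence for the incomplete gamma function; a library routine such as SciPy's `scipy.stats.chi2_contingency` would normally be used instead.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from Q(1, y) = exp(-y) for even df, or Q(1/2, y) = erfc(sqrt(y)) for odd df,
    # then step up with Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    a, q = (1.0, math.exp(-y)) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while a < df / 2:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1
    return q

def chi_square_independence(observed):
    """Chi-square test of independence on a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total   # Step 2: expected value
            stat += (o - e) ** 2 / e                           # Step 3: (O - E)^2 / E
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df, chi2_sf(stat, df)                         # Step 4

observed = [[120, 90, 40],     # assumed counts: Male (Republican, Democrat, Independent)
            [110, 95, 45]]     # assumed counts: Female
stat, df, p_value = chi_square_independence(observed)
print(f"chi-square = {stat:.4f}, df = {df}, p = {p_value:.4f}")
```

With 2 degrees of freedom the tail probability reduces to exp(−χ²/2), so p = exp(−0.8642 / 2) ≈ 0.649, matching the calculator value above.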

Step 5: Draw a conclusion.

• We fail to reject the null hypothesis since this p-value is not less than
0.05.
• This means we do not have sufficient evidence to say that there is an
association between gender and political party preference.

Chi-square test implementation in R
• First, we will create a table to hold our data:

Step 2: Perform the Chi-Square Test of Independence.
Perform the Chi-Square Test of Independence using the chisq.test() function.

The way to interpret the output is as follows:
• Chi-Square Test Statistic: 0.86404
• Degrees of freedom: 2 (calculated as (#rows − 1) × (#columns − 1))
• p-value: 0.6492
