
Python Session 14092024.ipynb - Colab

The document contains a Python notebook that demonstrates various statistical tests including Z-tests, T-tests, ANOVA, and correlation analyses using libraries like NumPy, SciPy, and Statsmodels. It covers scenarios such as testing average weights of newborns, sales strategies, and the impact of marketing campaigns, providing calculated statistics like Z-scores, T-scores, and P-values. Additionally, it includes visualizations and interpretations of results for different statistical methods.

Uploaded by

psdamania1973
Copyright
© All Rights Reserved

9/28/24, 3:14 PM Python Session 14092024.ipynb - Colab

# Import necessary libraries


import numpy as np
import scipy.stats as stats
import pandas as pd
from statsmodels.stats.weightstats import ztest

Z-Test

Scenario 1: Testing if the average weight of newborns differs from the hospital's claim
# 1. Z-Test
# Scenario 1: Testing if the average weight of newborns differs from the hospital's claim
sample_mean = 3.4
population_mean = 3.5
std_dev = 0.5
n = 50
z_score = (sample_mean - population_mean) / (std_dev / np.sqrt(n))
p_value = 2 * stats.norm.sf(abs(z_score))  # Two-tailed test: double the tail area beyond |z|
print("Z-Test (Scenario 1):")
print(f"Z-Score: {z_score}")
print(f"P-Value: {p_value}\n")

Z-Test (Scenario 1):


Z-Score: -1.4142135623730963
P-Value: 0.1572992070502848
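The calculation above can be wrapped in a small helper that works directly from summary statistics. This is a minimal sketch: the function name `z_test_from_summary` and its `alternative` argument are illustrative assumptions, not part of the notebook or of SciPy.

```python
import numpy as np
from scipy import stats


def z_test_from_summary(sample_mean, population_mean, std_dev, n, alternative="two-sided"):
    """Hypothetical helper: one-sample z-test from summary statistics."""
    # Standard error of the mean, assuming the population std_dev is known
    z = (sample_mean - population_mean) / (std_dev / np.sqrt(n))
    if alternative == "two-sided":
        p = 2 * stats.norm.sf(abs(z))   # both tails beyond |z|
    elif alternative == "greater":
        p = stats.norm.sf(z)            # right tail only
    elif alternative == "less":
        p = stats.norm.cdf(z)           # left tail only
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p


# Scenario 1 inputs from the notebook
z, p = z_test_from_summary(3.4, 3.5, 0.5, 50)
print(f"Z-Score: {z}")
print(f"P-Value: {p}")
```

Using `abs(z)` with the survival function keeps the two-tailed p-value in the range 0 to 1 whether the sample mean is below or above the claimed mean.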


Scenario 2: Testing if a marketing campaign increased average daily sales


sample_mean = 2100
population_mean = 2000
std_dev = 300
n = 50
z_score = (sample_mean - population_mean) / (std_dev / np.sqrt(n))
p_value = stats.norm.sf(z_score)  # One-tailed (right) test: H1 is that the campaign increased sales
print("Z-Test (Scenario 2):")
print(f"Z-Score: {z_score}")
print(f"P-Value: {p_value}\n")


t-test

One-Sample T-Test

Scenario 1: Testing if the average score of students differs from 75


# Scenario 1: Testing if the average score of students differs from 75
sample_scores = np.array([70, 72, 68, 74, 76, 69, 71, 73, 70, 75, 77, 74, 73, 69, 72])
t_score, p_value = stats.ttest_1samp(sample_scores, 75)
print("One-Sample T-Test (Scenario 1):")
print(f"T-Score: {t_score}")
print(f"P-Value: {p_value}\n")

https://colab.research.google.com/drive/1Vp85nKWtvXIv2ZReKpIguicdBscWpeVv#printMode=true 1/6

Scenario 2: Testing if a new sales strategy increased monthly sales from $5000
# Scenario 2: Testing if a new sales strategy increased monthly sales from $5000
sample_sales = np.array([5200, 5100, 5300, 5400, 5000, 5100, 5200, 5300, 5000, 5400,
5200, 5300, 5100, 5200, 5400, 5300, 5100, 5200, 5300, 5400])
t_score, p_value = stats.ttest_1samp(sample_sales, 5000)
print("One-Sample T-Test (Scenario 2):")
print(f"T-Score: {t_score}")
print(f"P-Value: {p_value}\n")


Two-Sample T-Test

Scenario 1: Testing if there is a difference in average test scores between two classes
# Scenario 1: Testing if there is a difference in average test scores between two classes
class_A_scores = np.array([80, 82, 85, 87, 79, 83, 86, 88, 82, 84, 85, 86, 89, 81, 87])
class_B_scores = np.array([75, 77, 79, 76, 78, 77, 76, 79, 78, 77, 75, 76, 77, 78, 76])
t_score, p_value = stats.ttest_ind(class_A_scores, class_B_scores)
print("Two-Sample T-Test (Scenario 1):")
print(f"T-Score: {t_score}")
print(f"P-Value: {p_value}\n")


Scenario 2: Testing if there is a difference in sales between two stores


# Scenario 2: Testing if there is a difference in sales between two stores
store_A_sales = np.array([3000, 3100, 3200, 3300, 3400, 3100, 3200, 3000, 3100, 3200,
3300, 3400, 3000, 3100, 3200, 3000, 3100, 3300, 3400, 3100,
3200, 3000, 3100, 3200, 3300, 3400, 3000, 3100, 3200, 3300])
store_B_sales = np.array([3200, 3300, 3400, 3200, 3300, 3200, 3400, 3300, 3400, 3200,
3300, 3400, 3200, 3300, 3400, 3200, 3300, 3200, 3400, 3300,
3400, 3200, 3300, 3400, 3200, 3300, 3400, 3200, 3300, 3400])
t_score, p_value = stats.ttest_ind(store_A_sales, store_B_sales)
print("Two-Sample T-Test (Scenario 2):")
print(f"T-Score: {t_score}")
print(f"P-Value: {p_value}\n")

Paired T-Test


# Scenario 1: Testing if a training program improves productivity
before_training = np.array([10, 12, 11, 13, 12, 11, 14, 12, 15, 13])
after_training = np.array([15, 16, 15, 18, 17, 16, 19, 17, 20, 18])
t_score, p_value = stats.ttest_rel(before_training, after_training)
print("Paired T-Test (Scenario 1):")
print(f"T-Score: {t_score}")
print(f"P-Value: {p_value}\n")

# Scenario 2: Testing if a promotional offer impacts sales


before_promo = np.array([5000, 5100, 5200, 5300, 5000, 5100, 5200, 5300, 5000, 5400,
5200, 5300, 5100, 5200, 5400])
after_promo = np.array([5300, 5400, 5500, 5600, 5300, 5400, 5500, 5600, 5300, 5700,
5500, 5600, 5400, 5500, 5700])
t_score, p_value = stats.ttest_rel(before_promo, after_promo)
print("Paired T-Test (Scenario 2):")

print(f"T-Score: {t_score}")
print(f"P-Value: {p_value}\n")

Paired T-Test (Scenario 1):


T-Score: -36.0
P-Value: 4.873628935030659e-11

Paired T-Test (Scenario 2):


T-Score: -inf
P-Value: 0.0

C:\Users\admin\AppData\Local\Temp\ipykernel_5468\1620158596.py:14: RuntimeWarning: Precision loss occurred in moment calculation due


t_score, p_value = stats.ttest_rel(before_promo, after_promo)
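The `-inf` T-score and the warning in Scenario 2 are consistent with paired differences that have no spread at all: the paired t statistic divides by the standard deviation of the differences. A quick check of the differences, sketched here with the same arrays, shows this.

```python
import numpy as np

before_promo = np.array([5000, 5100, 5200, 5300, 5000, 5100, 5200, 5300, 5000, 5400,
                         5200, 5300, 5100, 5200, 5400])
after_promo = np.array([5300, 5400, 5500, 5600, 5300, 5400, 5500, 5600, 5300, 5700,
                        5500, 5600, 5400, 5500, 5700])

# Paired differences (after minus before)
diffs = after_promo - before_promo
print(np.unique(diffs))  # → [300]

# If every difference is identical, the t statistic is undefined (division by zero)
is_constant = bool(np.all(diffs == diffs[0]))
print(f"All differences identical: {is_constant}")
```

Every pair differs by exactly 300, so the test has no variability to work with. With real data, a constant shift like this usually indicates synthetic or copied values and is worth checking before interpreting the p-value.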


1. One-Way ANOVA


# Dataset for One-Way ANOVA (Advertising Budget vs. Sales for 3 different campaigns)
data = {'Campaign 1': [500, 700, 800, 900, 950],
'Campaign 2': [1000, 1100, 1200, 1250, 1300],
'Campaign 3': [400, 450, 500, 550, 600]}

df = pd.DataFrame(data)

df

   Campaign 1  Campaign 2  Campaign 3
0         500        1000         400
1         700        1100         450
2         800        1200         500
3         900        1250         550
4         950        1300         600

# One-Way ANOVA
f_stat, p_val = stats.f_oneway(df['Campaign 1'], df['Campaign 2'], df['Campaign 3'])

print("F-Statistic:", f_stat)
print("P-Value:", p_val)

F-Statistic: 32.31279620853076
P-Value: 1.4751779256171162e-05

# Since the p-value is less than 0.05, we reject the null hypothesis:
# there is a statistically significant difference in mean sales among the 3 campaigns
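To see where the F-statistic comes from, it can be rebuilt from the between-group and within-group sums of squares. The following is a sketch of the textbook decomposition on the same data; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

groups = [np.array([500, 700, 800, 900, 950]),      # Campaign 1
          np.array([1000, 1100, 1200, 1250, 1300]), # Campaign 2
          np.array([400, 450, 500, 550, 600])]      # Campaign 3

all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Between-group sum of squares: spread of group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

f_manual = (ss_between / df_between) / (ss_within / df_within)
p_manual = stats.f.sf(f_manual, df_between, df_within)  # right-tail area of the F distribution

print(f"F-Statistic (manual): {f_manual}")
print(f"P-Value (manual): {p_manual}")
```

The manual values should agree with `stats.f_oneway` up to floating-point rounding.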


2. Two-Way ANOVA


import statsmodels.api as sm
from statsmodels.formula.api import ols

# Dataset for Two-Way ANOVA


data = {'Sales': [500, 700, 800, 950, 400, 600, 1000, 1200, 1300, 1250],
'Ad_Campaign': ['Campaign 1', 'Campaign 1', 'Campaign 1', 'Campaign 1', 'Campaign 2',
'Campaign 2', 'Campaign 2', 'Campaign 2', 'Campaign 3', 'Campaign 3'],
'Store': ['Store A', 'Store B', 'Store A', 'Store B', 'Store A',
'Store B', 'Store A', 'Store B', 'Store A', 'Store B']}

df = pd.DataFrame(data)

# Two-Way ANOVA
model = ols('Sales ~ C(Ad_Campaign) + C(Store) + C(Ad_Campaign):C(Store)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)

                           sum_sq   df         F    PR(>F)
C(Ad_Campaign)           417875.0  2.0  1.915759  0.260872
C(Store)                  49000.0  1.0  0.449284  0.539387
C(Ad_Campaign):C(Store)   22875.0  2.0  0.104871  0.902836
Residual                 436250.0  4.0       NaN       NaN
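The residual degrees of freedom in the table can be traced back to how many observations fall in each campaign and store combination. A short sketch using `pd.crosstab` on the same dataset:

```python
import pandas as pd

data = {'Sales': [500, 700, 800, 950, 400, 600, 1000, 1200, 1300, 1250],
        'Ad_Campaign': ['Campaign 1', 'Campaign 1', 'Campaign 1', 'Campaign 1', 'Campaign 2',
                        'Campaign 2', 'Campaign 2', 'Campaign 2', 'Campaign 3', 'Campaign 3'],
        'Store': ['Store A', 'Store B', 'Store A', 'Store B', 'Store A',
                  'Store B', 'Store A', 'Store B', 'Store A', 'Store B']}
df = pd.DataFrame(data)

# Number of observations per (campaign, store) cell
cell_counts = pd.crosstab(df['Ad_Campaign'], df['Store'])
print(cell_counts)

# With a full interaction model, residual df = observations - number of cells
residual_df = len(df) - cell_counts.size
print(f"Residual degrees of freedom: {residual_df}")
```

The design is unbalanced: Campaign 3 has a single observation per store, so the interaction estimate rests on very little data and the non-significant p-values should be read with that in mind.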


3. Pearson’s Correlation


import pandas as pd
from scipy.stats import pearsonr

# Dataset (Advertising Budget vs. Sales)


data = {'Advertising Budget ($)': [10000, 20000, 15000, 25000, 30000],
'Sales (Units)': [500, 1200, 850, 1500, 2000]}

df = pd.DataFrame(data)

# Pearson's Correlation
corr, _ = pearsonr(df['Advertising Budget ($)'], df['Sales (Units)'])
print(f'Pearson Correlation: {corr}')

Pearson Correlation: 0.9963607400395207
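`pearsonr` also returns a p-value, which the cell above discards. The sketch below keeps it and cross-checks the coefficient against the defining formula (covariance divided by the product of the standard deviations); the variable names are illustrative.

```python
import numpy as np
from scipy.stats import pearsonr

budget = np.array([10000, 20000, 15000, 25000, 30000], dtype=float)
sales_units = np.array([500, 1200, 850, 1500, 2000], dtype=float)

# Library result: coefficient and two-sided p-value
r, p_value = pearsonr(budget, sales_units)

# Manual result from centered data
bx = budget - budget.mean()
sy = sales_units - sales_units.mean()
r_manual = np.sum(bx * sy) / np.sqrt(np.sum(bx ** 2) * np.sum(sy ** 2))

print(f"Pearson Correlation: {r}")
print(f"P-Value: {p_value}")
print(f"Manual Correlation: {r_manual}")
```

With only five observations, the p-value is worth reporting alongside the coefficient rather than relying on the size of r alone.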

4. Spearman’s Rank Correlation


from scipy.stats import spearmanr

# Dataset (Customer Satisfaction Rank vs. Revenue Rank)


data = {'Customer Satisfaction Rank': [1, 3, 2, 4],
'Revenue Rank': [2, 4, 1, 3]}

df = pd.DataFrame(data)

# Spearman's Correlation
corr, _ = spearmanr(df['Customer Satisfaction Rank'], df['Revenue Rank'])
print(f'Spearman Correlation: {corr}')

Spearman Correlation: 0.6000000000000001
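Spearman's coefficient is Pearson's correlation applied to ranks. The sketch below checks this on the same data with `scipy.stats.rankdata`, and also applies the shortcut formula 1 - 6Σd²/(n(n² - 1)), which holds when there are no ties.

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr, rankdata

satisfaction = np.array([1, 3, 2, 4])
revenue = np.array([2, 4, 1, 3])

rho, _ = spearmanr(satisfaction, revenue)

# Pearson correlation of the ranks
r_ranks, _ = pearsonr(rankdata(satisfaction), rankdata(revenue))

# Shortcut formula (valid here because neither variable has ties)
d = rankdata(satisfaction) - rankdata(revenue)
n = len(d)
rho_formula = 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

print(rho, r_ranks, rho_formula)
```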

5. Point-Biserial Correlation


from scipy.stats import pointbiserialr

# Dataset (Coupon Usage vs. Amount Spent)


data = {'Used Coupon': [1, 0, 1, 0], # Binary variable: 1 = Yes, 0 = No
'Amount Spent ($)': [250, 180, 300, 120]}

df = pd.DataFrame(data)

# Point-Biserial Correlation
corr, _ = pointbiserialr(df['Used Coupon'], df['Amount Spent ($)'])
print(f'Point-Biserial Correlation: {corr}')

Point-Biserial Correlation: 0.9147021665807529
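The point-biserial coefficient can also be written in terms of the two group means: r = (M1 - M0) / s · sqrt(p·q), where s is the population standard deviation of the continuous variable and p, q are the group proportions. A sketch on the same data, with illustrative variable names:

```python
import numpy as np
from scipy.stats import pointbiserialr

used_coupon = np.array([1, 0, 1, 0])
amount_spent = np.array([250, 180, 300, 120], dtype=float)

r_pb, _ = pointbiserialr(used_coupon, amount_spent)

# Group means for coupon users (1) and non-users (0)
mean_1 = amount_spent[used_coupon == 1].mean()
mean_0 = amount_spent[used_coupon == 0].mean()

# Proportions of each group and population standard deviation (ddof=0)
p = used_coupon.mean()
q = 1 - p
s = amount_spent.std(ddof=0)

r_formula = (mean_1 - mean_0) / s * np.sqrt(p * q)

print(f"Mean spend with coupon: {mean_1}, without: {mean_0}")
print(f"Point-Biserial (SciPy): {r_pb}")
print(f"Point-Biserial (formula): {r_formula}")
```

This form makes the interpretation direct: the coefficient grows with the gap between the group means relative to the overall spread.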

6. Cross-Correlation (Time-Series)


import numpy as np
import matplotlib.pyplot as plt


# Time-series dataset (Social Media Mentions vs. Sales)


social_media_mentions = np.array([500, 600, 700, 850])
sales = np.array([50000, 52000, 55000, 70000])

# Cross-Correlation
cross_corr = np.correlate(social_media_mentions - np.mean(social_media_mentions), sales - np.mean(sales), mode='full')
lags = np.arange(-len(sales) + 1, len(sales))

# Plot the cross-correlation


plt.plot(lags, cross_corr)
plt.title('Cross-Correlation between Social Media Mentions and Sales')
plt.xlabel('Lag')
plt.ylabel('Cross-Correlation')
plt.show()
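`np.correlate` returns raw sums of products, so the values plotted above are not on the -1 to 1 scale. One common normalization, sketched here as an assumption rather than the notebook's method, divides by n times the product of the standard deviations so that the lag-0 value equals the Pearson correlation.

```python
import numpy as np

social_media_mentions = np.array([500, 600, 700, 850], dtype=float)
sales = np.array([50000, 52000, 55000, 70000], dtype=float)

# Center both series
x = social_media_mentions - social_media_mentions.mean()
y = sales - sales.mean()

# Raw cross-correlation at every lag, then normalize
cross_corr = np.correlate(x, y, mode='full')
norm = len(x) * social_media_mentions.std() * sales.std()  # population std (ddof=0)
normalized_cc = cross_corr / norm

lags = np.arange(-len(sales) + 1, len(sales))
zero_lag_value = normalized_cc[lags == 0][0]

for lag, value in zip(lags, normalized_cc):
    print(f"lag {lag:+d}: {value:.4f}")
```

Values at non-zero lags use fewer overlapping points, so with only four observations they are rough indications at best.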

7. Kendall’s Tau Correlation


from scipy.stats import kendalltau

# Dataset (Performance Rank vs. Promotion Rank)


data = {'Performance Rank': [2, 1, 3, 2],
'Promotion Rank': [3, 1, 4, 2]}

df = pd.DataFrame(data)

# Kendall's Tau Correlation


corr, _ = kendalltau(df['Performance Rank'], df['Promotion Rank'])
print(f'Kendall Tau Correlation: {corr}')

Kendall Tau Correlation: 0.912870929175277
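The result is below 1 even though no pair is discordant, because `Performance Rank` contains a tie (two values of 2) and `kendalltau` computes the tie-adjusted tau-b by default. A sketch that counts the pairs by hand, with illustrative variable names:

```python
import numpy as np
from itertools import combinations
from scipy.stats import kendalltau

performance = [2, 1, 3, 2]
promotion = [3, 1, 4, 2]

concordant = discordant = ties_x = ties_y = 0
for (x1, y1), (x2, y2) in combinations(zip(performance, promotion), 2):
    dx, dy = x1 - x2, y1 - y2
    if dx == 0:
        ties_x += 1          # pair tied on performance
    if dy == 0:
        ties_y += 1          # pair tied on promotion
    if dx != 0 and dy != 0:
        if dx * dy > 0:
            concordant += 1  # both ranks move in the same direction
        else:
            discordant += 1  # ranks move in opposite directions

n_pairs = len(performance) * (len(performance) - 1) // 2
# tau-b: tied pairs are removed from each side of the denominator
tau_manual = (concordant - discordant) / np.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))

tau_scipy, _ = kendalltau(performance, promotion)
print(f"Concordant: {concordant}, discordant: {discordant}, ties in x: {ties_x}, ties in y: {ties_y}")
print(f"Tau-b (manual): {tau_manual}")
print(f"Tau-b (SciPy): {tau_scipy}")
```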
