Python Session 14092024.ipynb - Colab
Z-Test
Scenario 1: Testing if the average weight of newborns differs from the hospital's claim
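# 1. Z-Test
# Scenario 1: Testing if the average weight of newborns differs from the hospital's claim
import numpy as np
from scipy import stats

sample_mean = 3.4      # mean weight observed in the sample
population_mean = 3.5  # mean weight claimed by the hospital
std_dev = 0.5          # population standard deviation (a z-test assumes it is known)
n = 50                 # number of newborns sampled
z_score = (sample_mean - population_mean) / (std_dev / np.sqrt(n))
# Two-tailed p-value: twice the tail area beyond |z|, correct for either sign of z
p_value = 2 * stats.norm.sf(abs(z_score))
print("Z-Test (Scenario 1):")
print(f"Z-Score: {z_score}")
print(f"P-Value: {p_value}\n")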
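As a sketch, the z-test above can be wrapped in a small reusable function. The helper name `one_sample_z_test` is hypothetical, not part of the notebook, and it assumes the population standard deviation is known. Using the upper-tail area `stats.norm.sf(abs(z))` keeps the two-tailed p-value valid whether the sample mean lies above or below the claimed mean.

```python
import numpy as np
from scipy import stats


def one_sample_z_test(sample_mean, pop_mean, pop_std, n):
    """Two-tailed one-sample z-test, assuming the population SD is known."""
    standard_error = pop_std / np.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # sf(|z|) is the area beyond |z| in one tail; doubling it covers both tails,
    # so the result is correct whether z is negative or positive.
    p = 2 * stats.norm.sf(abs(z))
    return z, p


# Inputs from Scenario 1
z, p = one_sample_z_test(3.4, 3.5, 0.5, 50)
print(f"z = {z:.4f}, two-tailed p = {p:.4f}")
```

By contrast, a formula such as `stats.norm.cdf(z) * 2` only gives a valid two-tailed p-value when z is negative; for a positive z it exceeds 1.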
T-Test
One-Sample T-Test
https://colab.research.google.com/drive/1Vp85nKWtvXIv2ZReKpIguicdBscWpeVv#printMode=true 1/6
9/28/24, 3:14 PM Python Session 14092024.ipynb - Colab
Scenario 2: Testing if a new sales strategy increased average monthly sales above $5000
# Scenario 2: Testing if a new sales strategy increased average monthly sales above $5000
sample_sales = np.array([5200, 5100, 5300, 5400, 5000, 5100, 5200, 5300, 5000, 5400,
                         5200, 5300, 5100, 5200, 5400, 5300, 5100, 5200, 5300, 5400])
# "Increased" is a one-sided question, so test H1: mean > 5000 (alternative= needs SciPy >= 1.6)
t_score, p_value = stats.ttest_1samp(sample_sales, 5000, alternative='greater')
print("One-Sample T-Test (Scenario 2):")
print(f"T-Score: {t_score}")
print(f"P-Value: {p_value}\n")
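A minimal sketch of what a one-sample t-test computes, assuming the textbook formula t = (x̄ − μ₀) / (s / √n), where s is the sample standard deviation (ddof=1) and the test has n − 1 degrees of freedom. The helper `one_sample_t` is hypothetical. Because the scenario asks whether sales *increased*, the p-value here is the one-sided upper-tail area.

```python
import numpy as np
from scipy import stats


def one_sample_t(sample, mu0):
    """Return the one-sample t statistic and its degrees of freedom."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    # ddof=1 gives the Bessel-corrected sample standard deviation.
    standard_error = sample.std(ddof=1) / np.sqrt(n)
    t = (sample.mean() - mu0) / standard_error
    return t, n - 1


sample_sales = np.array([5200, 5100, 5300, 5400, 5000, 5100, 5200, 5300, 5000, 5400,
                         5200, 5300, 5100, 5200, 5400, 5300, 5100, 5200, 5300, 5400])
t, dof = one_sample_t(sample_sales, 5000)
# One-sided ("greater") p-value: upper-tail area of the t distribution.
p_greater = stats.t.sf(t, dof)
print(f"t = {t:.4f}, df = {dof}, one-sided p = {p_greater:.3g}")
```

Since t is positive here, the one-sided p-value is exactly half of the default two-sided p-value reported by `stats.ttest_1samp`.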
Two-Sample T-Test
Scenario 1: Testing if there is a difference in average test scores between two classes
# Scenario 1: Testing if there is a difference in average test scores between two classes
class_A_scores = np.array([80, 82, 85, 87, 79, 83, 86, 88, 82, 84, 85, 86, 89, 81, 87])
class_B_scores = np.array([75, 77, 79, 76, 78, 77, 76, 79, 78, 77, 75, 76, 77, 78, 76])
# Student's t-test: the default equal_var=True assumes both classes have the same variance
t_score, p_value = stats.ttest_ind(class_A_scores, class_B_scores)
print("Two-Sample T-Test (Scenario 1):")
print(f"T-Score: {t_score}")
print(f"P-Value: {p_value}\n")
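The two classes have visibly different spreads (Class A ranges from 79 to 89, Class B from 75 to 79), so Welch's t-test, which does not assume equal variances, is a reasonable cross-check. The sketch below computes it by hand with the Welch-Satterthwaite degrees of freedom; SciPy exposes the same test as `stats.ttest_ind(..., equal_var=False)`.

```python
import numpy as np
from scipy import stats

class_A_scores = np.array([80, 82, 85, 87, 79, 83, 86, 88, 82, 84, 85, 86, 89, 81, 87])
class_B_scores = np.array([75, 77, 79, 76, 78, 77, 76, 79, 78, 77, 75, 76, 77, 78, 76])

# Squared standard error of each class mean, using sample variances (ddof=1).
va = class_A_scores.var(ddof=1) / class_A_scores.size
vb = class_B_scores.var(ddof=1) / class_B_scores.size

# Welch's t statistic: the variances are not pooled.
t_welch = (class_A_scores.mean() - class_B_scores.mean()) / np.sqrt(va + vb)

# Welch-Satterthwaite approximation for the degrees of freedom.
df_welch = (va + vb) ** 2 / (
    va ** 2 / (class_A_scores.size - 1) + vb ** 2 / (class_B_scores.size - 1)
)
p_welch = 2 * stats.t.sf(abs(t_welch), df_welch)  # two-sided p-value

print(f"Welch t = {t_welch:.4f}, df = {df_welch:.2f}, p = {p_welch:.3g}")
```

The Welch degrees of freedom always fall between min(n₁, n₂) − 1 and n₁ + n₂ − 2, shrinking toward the lower bound as the variances diverge.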
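import pandas as pd
from scipy import stats

# `data` (one column per campaign) is not shown in this printout
df = pd.DataFrame(data)
df
# One-Way ANOVA
f_stat, p_val = stats.f_oneway(df['Campaign 1'], df['Campaign 2'], df['Campaign 3'])
print("F-Statistic:", f_stat)
print("P-Value:", p_val)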
F-Statistic: 32.31279620853076
P-Value: 1.4751779256171162e-05
# Since the p-value (about 1.5e-05) is less than 0.05, we reject the null hypothesis:
# at least one campaign's mean differs significantly from the others.
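To show where the F-statistic comes from, here is a sketch that builds a one-way ANOVA from between-group and within-group sums of squares. The three groups are hypothetical numbers invented for illustration (the notebook's `data` is not part of this printout), and `one_way_anova` is a hypothetical helper.

```python
import numpy as np
from scipy import stats


def one_way_anova(*samples):
    """Return (F, p) for a one-way ANOVA built from sums of squares."""
    samples = [np.asarray(s, dtype=float) for s in samples]
    all_values = np.concatenate(samples)
    grand_mean = all_values.mean()
    k, n_total = len(samples), all_values.size
    # Between-group SS: how far each group mean sits from the grand mean.
    ss_between = sum(s.size * (s.mean() - grand_mean) ** 2 for s in samples)
    # Within-group SS: how far observations sit from their own group mean.
    ss_within = sum(((s - s.mean()) ** 2).sum() for s in samples)
    f_value = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    return f_value, stats.f.sf(f_value, k - 1, n_total - k)


# Hypothetical campaign results, invented for illustration only.
groups = [
    [12.0, 15.0, 14.0, 10.0, 13.0],
    [18.0, 20.0, 17.0, 19.0, 21.0],
    [11.0, 9.0, 13.0, 12.0, 10.0],
]
f_value, p_anova = one_way_anova(*groups)
print(f"F = {f_value:.4f}, p = {p_anova:.3g}")
```

A large F means the group means are spread out relative to the noise inside each group; the p-value is the upper-tail area of an F distribution with (k − 1, N − k) degrees of freedom.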
df = pd.DataFrame(data)
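# Two-Way ANOVA
import statsmodels.api as sm
from statsmodels.formula.api import ols
# Main effects of ad campaign and store plus their interaction; typ=2 gives Type II sums of squares
model = ols('Sales ~ C(Ad_Campaign) + C(Store) + C(Ad_Campaign):C(Store)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)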
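                           sum_sq   df         F    PR(>F)
C(Ad_Campaign)           417875.0  2.0  1.915759  0.260872
C(Store)                  49000.0  1.0  0.449284  0.539387
C(Ad_Campaign):C(Store)   22875.0  2.0  0.104871  0.902836
Residual                 436250.0  4.0       NaN       NaN
# All three p-values are above 0.05, so we fail to reject the null hypotheses: neither the ad
# campaign, the store, nor their interaction has a statistically significant effect on sales here.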
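Each F value in the two-way table is the term's mean square (sum_sq / df) divided by the residual mean square, and each PR(>F) is the upper-tail probability of an F distribution with (term df, residual df) degrees of freedom. The sketch below re-derives those two columns from the printed sum_sq and df values only; it does not use the underlying data.

```python
from scipy import stats

# sum_sq and df for each term, copied from the printed two-way ANOVA table
terms = {
    "C(Ad_Campaign)": (417875.0, 2),
    "C(Store)": (49000.0, 1),
    "C(Ad_Campaign):C(Store)": (22875.0, 2),
}
ss_resid, dof_resid = 436250.0, 4
ms_resid = ss_resid / dof_resid  # residual mean square

results = {}
for name, (ss, dof) in terms.items():
    f_term = (ss / dof) / ms_resid                # F = MS_term / MS_residual
    p_term = stats.f.sf(f_term, dof, dof_resid)   # upper tail of F(dof, dof_resid)
    results[name] = (f_term, p_term)
    print(f"{name:<25} F = {f_term:.6f}  PR(>F) = {p_term:.6f}")
```

With only 4 residual degrees of freedom the denominator mean square is estimated from very little data, which is one reason none of the terms reaches significance.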
df = pd.DataFrame(data)
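# Pearson's Correlation: strength of the linear relationship between two continuous variables
from scipy.stats import pearsonr
corr, _ = pearsonr(df['Advertising Budget ($)'], df['Sales (Units)'])
print(f'Pearson Correlation: {corr}')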
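Pearson's r is the covariance of two variables divided by the product of their standard deviations. The sketch below uses hypothetical budget and sales figures (not the notebook's data) and checks a hand-written version, the hypothetical helper `pearson_r`, against `scipy.stats.pearsonr`.

```python
import numpy as np
from scipy.stats import pearsonr


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    # The 1/(n-1) factors in covariance and variances cancel, so plain sums suffice.
    return (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())


# Hypothetical advertising budgets ($) and units sold, for illustration only.
budget = [1000, 1500, 2000, 2500, 3000, 3500]
units = [110, 140, 185, 200, 260, 270]

r_manual = pearson_r(budget, units)
r_scipy, p_pearson = pearsonr(budget, units)
print(f"manual r = {r_manual:.6f}, scipy r = {r_scipy:.6f}, p = {p_pearson:.3g}")
```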
df = pd.DataFrame(data)
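# Spearman's Correlation: strength of the monotonic relationship between two ranked variables
from scipy.stats import spearmanr
corr, _ = spearmanr(df['Customer Satisfaction Rank'], df['Revenue Rank'])
print(f'Spearman Correlation: {corr}')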
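Spearman's rho is Pearson's r computed on ranks, which is why it captures any monotonic relationship rather than only a linear one. The sketch below uses hypothetical raw values, ranks them with `scipy.stats.rankdata` (tied values receive the average of their ranks), and compares the result with `spearmanr`.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

# Hypothetical values for illustration only; note the tie in satisfaction.
satisfaction = np.array([3, 1, 4, 1, 5, 9, 2, 6])
revenue = np.array([20, 12, 35, 15, 40, 90, 18, 50])

rank_x = rankdata(satisfaction)  # ties get the average rank (1.5, 1.5, ...)
rank_y = rankdata(revenue)
rho_manual, _ = pearsonr(rank_x, rank_y)
rho_scipy, _ = spearmanr(satisfaction, revenue)
print(f"manual rho = {rho_manual:.6f}, scipy rho = {rho_scipy:.6f}")
```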
df = pd.DataFrame(data)
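# Point-Biserial Correlation: relationship between a binary (0/1) variable and a continuous one
from scipy.stats import pointbiserialr
corr, _ = pointbiserialr(df['Used Coupon'], df['Amount Spent ($)'])
print(f'Point-Biserial Correlation: {corr}')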
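The point-biserial correlation is Pearson's r with the binary variable coded as 0/1. It also has a closed form, r_pb = (M₁ − M₀) / sₙ · √(p·q), where M₁ and M₀ are the group means of the continuous variable, sₙ is its population (ddof=0) standard deviation, p is the share of 1s and q = 1 − p. The sketch below checks all three routes on hypothetical coupon data (not the notebook's dataset).

```python
import numpy as np
from scipy.stats import pearsonr, pointbiserialr

# Hypothetical data: 1 = used a coupon, 0 = did not; amounts in dollars.
used_coupon = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
amount = np.array([55.0, 30.0, 62.0, 48.0, 35.0, 28.0, 70.0, 40.0, 58.0, 33.0])

# Closed-form point-biserial coefficient.
m1 = amount[used_coupon == 1].mean()
m0 = amount[used_coupon == 0].mean()
p = used_coupon.mean()
q = 1 - p
r_formula = (m1 - m0) / amount.std(ddof=0) * np.sqrt(p * q)

r_scipy, _ = pointbiserialr(used_coupon, amount)
r_pearson, _ = pearsonr(used_coupon, amount)
print(f"formula = {r_formula:.6f}, pointbiserialr = {r_scipy:.6f}, pearsonr = {r_pearson:.6f}")
```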
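# Cross-Correlation (social_media_mentions and sales: equal-length arrays not shown in this printout)
x = social_media_mentions - np.mean(social_media_mentions)
y = sales - np.mean(sales)
# Normalize by n * std(x) * std(y) so values lie in [-1, 1]; the lag-0 value equals Pearson's r
cross_corr = np.correlate(x, y, mode='full') / (len(sales) * np.std(x) * np.std(y))
lags = np.arange(-len(sales) + 1, len(sales))  # cross_corr[i] corresponds to lags[i]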
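With `np.correlate(a, v, mode='full')`, output index i corresponds to lag k = i − (len(v) − 1), and the value at lag k sums a[t + k] · v[t]. With mentions as `a` and sales as `v`, a peak at a negative lag therefore means mentions move first. The sketch below packages this as a hypothetical helper, `normalized_cross_correlation`, and tests it on a hypothetical random series in which sales copy mentions from two steps earlier.

```python
import numpy as np


def normalized_cross_correlation(x, y):
    """Return (lags, r): r[i] correlates x[t + lags[i]] with y[t], scaled to [-1, 1].

    Assumes x and y have the same length. np.correlate in 'full' mode puts
    lag k at index k + len(y) - 1.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    xc, yc = x - x.mean(), y - y.mean()
    raw = np.correlate(xc, yc, mode="full")
    # Dividing by n * std(x) * std(y) (ddof=0) makes the lag-0 value equal Pearson's r.
    r = raw / (n * xc.std() * yc.std())
    lags = np.arange(-(n - 1), n)
    return lags, r


# Hypothetical series: sales at time t equal mentions at time t - 2,
# i.e. mentions lead sales by two steps.
rng = np.random.default_rng(0)
base = rng.normal(size=202)
mentions_demo = base[2:]
sales_demo = base[:-2]

demo_lags, demo_r = normalized_cross_correlation(mentions_demo, sales_demo)
best_lag = demo_lags[np.argmax(demo_r)]
print(f"peak r = {demo_r.max():.3f} at lag {best_lag}")
```

This normalization divides every lag by the full-length denominator, so values at large |lag| shrink because fewer points overlap; that bias is usually acceptable when only the location of the peak matters.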
df = pd.DataFrame(data)