Coding Probability and Statistics With Python From Scratch

This document provides code examples and explanations for probability and statistics concepts in Python including: 1) Calculating sample mean and standard deviation from data and defining functions for the normal, beta, and student's t-distributions. 2) Plotting the probability density functions and cumulative distribution functions for the normal, beta, and student's t-distributions. 3) An example using Khan Academy data to calculate the mean, standard deviation, and percentage of values less than or equal to 85. 4) An example of numerical integration using the centerpoint method to integrate a function.


Probability and Statistics Experiments with Python

by Thom Ives, Ph.D.

Find this on DagsHub too. On DagsHub, I am ThomIves, and this repo is "Probability_and_Statistics_with_Python".

What is the motivation for such an approach, that is, coding math from scratch without libraries or modules? I like the way my dear friend and brother Manjunatha Gummaraju says it best.

"Hand crafting (without libraries & automation) helps to get a firm grip on
the subject, nuances & its applications. It also helps probably to author new
innovative techniques from the ground up."

Calculating (Sample) Mean and Standard Deviation

$$\mu = \frac{\sum x_i}{n} \qquad\qquad \sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{n}}$$
In [ ]: def mean(X):
            mu = sum(X) / len(X)
            return mu

In [ ]: def standard_deviation(X, mu=None):
            if mu is None:  # "if not mu" would wrongly recompute when mu == 0
                mu = mean(X)
            sigma = (sum([(x - mu)**2 for x in X]) / len(X))**0.5
            return sigma
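As a quick sanity check (an aside, not in the original), these hand-rolled functions can be compared against Python's statistics module, whose pstdev is the same population standard deviation:

```python
import statistics

def mean(X):
    return sum(X) / len(X)

def standard_deviation(X, mu=None):
    if mu is None:
        mu = mean(X)
    return (sum((x - mu)**2 for x in X) / len(X))**0.5

data = [2, 4, 4, 4, 5, 5, 7, 9]  # a small illustrative sample
print(mean(data), standard_deviation(data))             # 5.0 2.0
print(statistics.fmean(data), statistics.pstdev(data))  # 5.0 2.0
```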

Normal Probability Distribution Function

$$p = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$

In [ ]: import matplotlib.pyplot as plt

        def PDF(x, mean=0, std_dev=1):
            # define e and pi explicitly
            e = 2.718281828
            pi = 3.1415927
            # calculate in two steps
            p = 1.0 / (std_dev * ((2 * pi) ** 0.5))
            p *= e ** (-0.5 * ((x - mean)/std_dev)**2)
            return p

        X = [(x - 1000)/200 for x in range(2001)]
        P = [PDF(x) for x in X]
        plt.plot(X, P)
        plt.title(label="Standard Normal Distribution")
        plt.xlabel(xlabel="value")
        plt.ylabel(ylabel="probability of value occurrence")
        plt.show()
Cumulative Normal Distribution Function

$$cdf = \int_{x_{left}}^{x_{right}} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2} dx$$

In [ ]: import matplotlib.pyplot as plt

        def PDF(x, mean=0, std_dev=1):
            # define e and pi explicitly
            e = 2.718281828
            pi = 3.1415927
            # calculate in two steps
            p = 1.0 / (std_dev * ((2 * pi) ** 0.5))
            p *= e ** (-0.5 * ((x - mean)/std_dev)**2)
            return p

        def CDF(mean=0, std_dev=1, x_left=-5, x_right=5, width=0.0001):
            CDF = 0
            X = []      # for plotting only
            CDF_y = []  # for plotting only

            x = x_left + width / 2  # start at the first panel centerpoint
            while x < x_right:
                X.append(x)                            # for plotting only
                panel = PDF(x, mean, std_dev) * width  # panel under PDF
                CDF += panel                           # running sum of panels = integration
                CDF_y.append(CDF)                      # for plotting only
                x += width                             # advance to the next centerpoint

            return CDF, X, CDF_y


total_integral, X, CDF_y = CDF()
P = [PDF(x) for x in X]
total_integral = round(total_integral, 5)
msg = f'Total integral of PDF = {total_integral}'
print(msg)
plt.plot(X, P)
plt.plot(X, CDF_y)
plt.show()

Total integral of PDF = 1.0
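The panel integration above can also be cross-checked against a closed form. This is an aside, not part of the original notebook: the standard normal CDF can be written with the error function as Φ(x) = ½(1 + erf(x/√2)), which the standard library's math.erf provides:

```python
import math

def Phi(x):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def pdf(x):
    # standard normal PDF
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

# centerpoint integration from -5 up to 1, as in the CDF() routine above
width = 1e-4
cdf, x = 0.0, -5 + width / 2
while x < 1.0:
    cdf += pdf(x) * width
    x += width

print(round(cdf, 4), round(Phi(1.0), 4))  # both ≈ 0.8413
```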

The Beta Distribution

$$f(x, \alpha, \beta) = \frac{1}{B(\alpha, \beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1}$$

$$B(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1}\, dt$$

In [ ]: import matplotlib.pyplot as plt

        class Beta_Distribution:
            def __init__(self, alpha, beta, panels=10000):
                self.alpha = alpha
                self.beta = beta
                self.panels = panels
                self.__Beta_Function__()

            def __Beta_Function__(self):
                width = 1 / self.panels
                X = [x/self.panels for x in range(self.panels)]
                # makes total integral of beta_PDF sum to 1
                self.B = sum(
                    [(x**(self.alpha - 1) *
                      (1 - x)**(self.beta - 1)) * width
                     for x in X])

            def beta_PDF(self, x):
                return x**(self.alpha - 1) * \
                       (1 - x)**(self.beta - 1) / self.B

        X = [x/1000 for x in range(1000+1)]

        bd = Beta_Distribution(5, 2)
        Y1 = [bd.beta_PDF(x) for x in X]
        Y_integral = round(sum([y*0.001 for y in Y1]), 3)
        bd = Beta_Distribution(2, 5)
        Y2 = [bd.beta_PDF(x) for x in X]

        print(f"The total integral of beta_PDF is {Y_integral}")

        plt.plot(X, Y1)
        plt.plot(X, Y2)
        plt.title(label="Skewed Beta Distributions")
        plt.xlabel(xlabel="values")
        plt.ylabel(ylabel="probabilities")
        plt.show()

The total integral of beta_PDF is 1.0
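As a cross-check (an aside using the gamma-function identity B(α, β) = Γ(α)Γ(β)/Γ(α + β), which is not in the original), the numerically integrated normalizer can be compared with math.gamma:

```python
import math

def beta_numeric(a, b, panels=100000):
    # centerpoint integration of t^(a-1) * (1-t)^(b-1) over (0, 1)
    w = 1 / panels
    return sum(((i + 0.5) * w)**(a - 1) * (1 - (i + 0.5) * w)**(b - 1) * w
               for i in range(panels))

def beta_exact(a, b):
    # B(a, b) = gamma(a) * gamma(b) / gamma(a + b)
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

print(round(beta_numeric(5, 2), 6), round(beta_exact(5, 2), 6))  # both 0.033333
```

For α = 5, β = 2 the exact value is 1/30, and the centerpoint sum lands on it to many decimal places.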

Student's T-Distribution

$$B(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1}\, dt$$

$$PDF_t(t) = \frac{1}{\sqrt{\nu}\, B\!\left(\frac{1}{2}, \frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}}$$

In [ ]: class T_Distribution:
            def __init__(self, dof=9):
                self.beta = self.beta_function(0.5, dof/2)
                self.front = 1 / (dof ** 0.5 * self.beta)
                self.dof = dof
                self.power = -(dof + 1)/2

            def beta_function(self, x, y):
                pw = 1 / 1000000
                beta = 0
                t = pw / 2
                while t < 1.0:
                    beta += t ** (x - 1) * (1 - t) ** (y - 1) * pw
                    t += pw
                return beta

            def PDFt(self, t):
                # The t probability distribution method
                f_of_t = self.front * (1 + t**2/self.dof) ** self.power
                return f_of_t

            def CDFt(self, t_left, t_right):
                # The t cumulative distribution method
                # We simply numerically integrate under the PDFt curve
                panels = self.dof * 100
                width = (t_right - t_left) / panels
                cdf = 0
                t = t_left + width / 2  # first panel centerpoint
                for i in range(panels):
                    prob = self.PDFt(t)
                    cdf += prob * width
                    t += width  # step one panel (the original "t += i * width" over-stepped)
                return cdf
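A sketch of an independent check, not in the original: using the identity B(1/2, ν/2) = √π Γ(ν/2)/Γ((ν+1)/2), the front factor can be written with math.gamma, and by symmetry the PDF should integrate to about 0.5 over the left half-line:

```python
import math

def t_pdf(t, dof):
    # Student's t PDF, with the beta function replaced by gamma functions
    num = math.gamma((dof + 1) / 2)
    den = math.sqrt(dof * math.pi) * math.gamma(dof / 2)
    return num / den * (1 + t * t / dof) ** (-(dof + 1) / 2)

# centerpoint integration over (-8, 0); by symmetry this should be ~0.5
dof, panels = 9, 20000
w = 8 / panels
half = sum(t_pdf(-8 + (i + 0.5) * w, dof) * w for i in range(panels))
print(round(half, 3))  # ≈ 0.5
```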

In [ ]: import matplotlib.pyplot as plt

        def PDF(x, mean=0, std_dev=1):
            # define e and pi explicitly
            e = 2.718281828
            pi = 3.1415927
            # calculate in two steps
            p = 1.0 / (std_dev * ((2 * pi) ** 0.5))
            p *= e ** (-0.5 * ((x - mean)/std_dev)**2)
            return p

        X = [(x - 1000)/250 for x in range(2001)]
        P = [PDF(x) for x in X]
        plt.plot(X, P)
        for dof in [1, 2, 3, 5, 10, 30]:
            t_dist = T_Distribution(dof=dof)
            TP = [t_dist.PDFt(x) for x in X]
            plt.plot(X, TP)
        plt.title(label="Student's t Distributions vs. Normal")
        plt.xlabel(xlabel="value")
        plt.ylabel(ylabel="probability of value occurrence")
        plt.show()

Basic Determination Of Significance Value

A Khan Academy Problem

In [ ]: X = [80]*5 + [82.5]*24 + [85]*72 + [87.5]*181 + [90]*281 + \
            [92.5]*272 + [95]*136 + [97.5]*27 + [100]*2

        mu = mean(X)
        std = standard_deviation(X, mu=mu)
        print(mu, std)

90.54 3.362796455332963

In [ ]: the_85_and_less = [x for x in X if x <= 85]
        percentage_LE_85 = len(the_85_and_less)/len(X)
        print(percentage_LE_85)

0.101
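As an aside the original does not make: a normal model built from the mean and standard deviation computed above predicts a noticeably smaller left tail at 85 than the empirical 10.1%, largely because the scores are coarsely binned. A hypothetical check using math.erf:

```python
import math

mu, sigma = 90.54, 3.362796455332963  # values computed above

def Phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = (85 - mu) / sigma
tail = Phi(z)
print(round(z, 3), round(tail, 3))  # z ≈ -1.65, tail ≈ 0.05
```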

Basic Centerpoint Integration

We start t at w/2 to use centerpoints for each panel. There are other methods of numerical integration. Centerpoint is pretty good at balancing areas above and below the function being integrated.


In [ ]: import matplotlib.pyplot as plt
        import math

        w = 1/1000
        f_of_t = math.sin

        T = [w/2]                  # first panel centerpoint
        S = [f_of_t(T[0])]         # our function
        C = [-1 + f_of_t(T[0])*w]  # integral of sin is -cos, so start at -cos(0) = -1
        for t in range(10000):
            T.append(T[-1] + w)                # our time step
            S.append(f_of_t(T[-1]))            # evaluate at the panel centerpoint
            C.append(f_of_t(T[-1])*w + C[-1])  # integrating

        plt.plot(T, S)
        plt.plot(T, C)
        plt.show()

Null And Alternate Hypotheses

Distributions With Dynamic Significance Level
For the LaTeX in MatPlotLib Inside Colab, See:
https://stackoverflow.com/a/62075348/996205

In [ ]: import matplotlib
from matplotlib import rc
import matplotlib.pyplot as plt
%matplotlib inline

rc('text', usetex=True)
matplotlib.rcParams['text.latex.preamble'] = [r'\usepackage{amsmath}']
!apt install texlive-fonts-recommended texlive-fonts-extra cm-super dvipng

In [ ]: import matplotlib.pyplot as plt


import math
import time

def PDF(x, mean=0, std_dev=1):


p = 1.0 / (std_dev * ((2 * math.pi) ** 0.5))
p *= math.e ** (-0.5 * ((x - mean)/std_dev)**2)
return p

pw = 1/1000
X = [(x + 0.5)*pw for x in range(22000)]
P1 = [PDF(x, 8, 2) for x in X]
P2 = [PDF(x, 14, 2) for x in X]

C1 = [] # C2 = []
sum1 = 0 # sum2 = 0
for i in range(len(X)):
sum1 += P1[i]*pw # sum2 += P2[i]*pw
C1.append(sum1) # C2.append(sum2)

SigLevels = [(v/2 + 90)/100 for v in range(17)]


SigLevels = [0.975]

for sl in SigLevels:
for i in range(len(X)):
if C1[i] > sl:
sig_i = i
break

plt.figure(figsize = (10,5))
plt.plot(X, P1)
plt.plot(X, P2)

plt.title(
    label="Distributions For Null And Alternative Hypotheses", fontsize=16)
plt.xlabel(xlabel="Values", fontsize=14)
plt.ylabel(ylabel="Probability of Occurrence", fontsize=14)

plt.fill_between(X[sig_i:], 0, P1[sig_i:], facecolor='green', alpha=0.5)
plt.fill_between(X[:sig_i], 0, P2[:sig_i], facecolor='orange', alpha=0.5)

plt.text(8, 0.13, 'NULL', fontsize=18, ha='center')


plt.text(14, 0.13, 'ALT', fontsize=18, ha='center')

plt.text(8, 0.06, 'True\nNegative', fontsize=16, ha='center')


plt.text(14, 0.06, 'True\nPositive', fontsize=16, ha='center')

plt.text(1, 0.15, r'$\beta$ = percent'+'\nfalse negative',


fontsize=16, ha='left')
plt.text(18, 0.15, r'$\alpha$ = percent'+'\nfalse positive',
fontsize=16, ha='left')
this_text = f'significance\nlevel = {round(sl, 3)}'
plt.text(18, 0.10, this_text,
fontsize=16, ha='left')

plt.text(8, 0.1, r'$1 - \alpha$', fontsize=18, ha='center')


plt.text(14, 0.1, r'$1 - \beta$', fontsize=18, ha='center')
plt.text(X[sig_i] + 0.25, 0.005, r'$\alpha$', fontsize=18)
plt.text(X[sig_i] - 0.75, 0.005, r'$\beta$', fontsize=18)

plt.savefig(f'hypo_{round(sl, 3)}.png')
plt.show()
time.sleep(1)
plt.figure().clear()

<Figure size 432x288 with 0 Axes>

Null And Alternate Hypotheses

Distributions With Dynamic Alternative Mean
In [ ]: import matplotlib.pyplot as plt
import math
import time

def PDF(x, mean_=0, std_dev_=1):


p = 1.0 / (std_dev_ * ((2 * math.pi) ** 0.5))
p *= math.e ** (-0.5 * ((x - mean_)/std_dev_)**2)
return p

pw = 1/1000
X = [(x + 0.5)*pw for x in range(22000)]
P1 = [PDF(x, 8, 2) for x in X]

C1 = [] # C2 = []
sum1 = 0
for i in range(len(X)):
sum1 += P1[i]*pw
C1.append(sum1)
sig_level = 0.975
for i in range(len(X)):
if C1[i] > sig_level:
sig_i = i
break

for i in range(19):
mu_alt = 12.4 + i * 0.2
mu_alt = round(mu_alt, 1)

P2 = [PDF(x, mu_alt, 2) for x in X]


plt.figure(figsize = (10,5))
plt.plot(X, P1)
plt.plot(X, P2)

plt.title(
    label="Distributions For Null And Alternative Hypotheses", fontsize=16)
plt.xlabel(xlabel="Values", fontsize=14)
plt.ylabel(ylabel="Probability of Occurrence", fontsize=14)

plt.fill_between(X[sig_i:], 0, P1[sig_i:], facecolor='green', alpha=0.5)
plt.fill_between(X[:sig_i], 0, P2[:sig_i], facecolor='orange', alpha=0.5)

plt.text(8, 0.13, 'NULL', fontsize=16, ha='center')


plt.text(mu_alt, 0.13, 'ALT', fontsize=16, ha='center')

plt.text(8, 0.06, 'True\nNegative', fontsize=14, ha='center')


plt.text(mu_alt, 0.06, 'True\nPositive', fontsize=14, ha='center')

plt.text(1, 0.15, r'$\beta$ = percent'+'\nfalse negative',


fontsize=14, ha='left')
plt.text(20, 0.15, r'$\alpha$ = percent'+'\nfalse positive',
fontsize=14, ha='left')
this_text = f'significance\nlevel = {round(sig_level, 3)}'
plt.text(20, 0.10, this_text,
fontsize=14, ha='left')
plt.text(20, 0.06, r'$\mu_{alt}$' + f' = {mu_alt}',
fontsize=16, ha='left')

plt.text(8, 0.1, r'$1 - \alpha$', fontsize=16, ha='center')


plt.text(mu_alt, 0.1, r'$1 - \beta$', fontsize=16, ha='center')

plt.text(X[sig_i] + 0.25, 0.005, r'$\alpha$', fontsize=18)


plt.text(X[sig_i] - 0.75, 0.005, r'$\beta$', fontsize=18)

plt.xlim([0, 26])
plt.savefig(f'hypos_{round(mu_alt, 1)}.png')
plt.show()
time.sleep(1)
plt.figure().clear()

ROC Curve From Dynamic Significance Level

ROC = True Positive Rate vs. False Positive Rate

In [ ]: import matplotlib.pyplot as plt


import math
import time

def PDF(x, mean=0, std_dev=1):


p = 1.0 / (std_dev * ((2 * math.pi) ** 0.5))
p *= math.e ** (-0.5 * ((x - mean)/std_dev)**2)
return p

pw = 1/1000
X = [(x + 0.5)*pw for x in range(22000)]
P1 = [PDF(x, 8, 2) for x in X]
P2 = [PDF(x, 11, 2) for x in X]

C1 = []
C2 = []
sum1 = 0
sum2 = 0
for i in range(len(X)):
sum1 += P1[i]*pw
sum2 += P2[i]*pw
C1.append(sum1)
C2.append(sum2)

ROC_X = []
ROC_Y = []
pts = 41
Sig_Levels = [(v * 100/(pts-1))/100 for v in range(0, pts)]

for sig_lev in Sig_Levels:


if sig_lev == 1:
sig_lev = 0.999
for i in range(len(X)):
if C1[i] > sig_lev:
sig_i = i
break

if sig_lev == 0.999:
sig_lev = 1

TP_Rate = 1 - C2[sig_i]
FP_Rate = 1 - C1[sig_i]

ROC_X.append(FP_Rate)
ROC_Y.append(TP_Rate)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))


ax1.plot(X, P1)
ax1.plot(X, P2)

msg = "Receiver Operator Curve (ROC) From Sweeping Significance Level"


fig.suptitle(msg, fontsize=18)
msg = "Significance Level On Null & Alt Distributions"
ax1.set_title(label=msg)
ax1.set_xlabel(xlabel="Values", fontsize=14)
ax1.set_ylabel(ylabel="Probability of Occurrence", fontsize=14)

ax1.fill_between(X[sig_i:], 0, P1[sig_i:], facecolor='green', alpha=0.5)
ax1.fill_between(X[:sig_i], 0, P2[:sig_i], facecolor='orange', alpha=0.5)

ax1.text(1, 0.15, r'$\beta$ = percent'+'\nfalse neg',


fontsize=16, ha='left')
ax1.text(14, 0.15, r'$\alpha$ = percent'+'\nfalse pos',
fontsize=16, ha='left')
this_text = f'significance\nlevel = {round(sig_lev, 3)}'
ax1.text(14, 0.10, this_text,
fontsize=16, ha='left')

ax1.text(X[sig_i] + 0.25, 0.005, r'$\alpha$', fontsize=18)


ax1.text(X[sig_i] - 0.75, 0.005, r'$\beta$', fontsize=18)

ax1.set_xlim([0, 21])

ax2.plot([0, 1], [0, 1])


ax2.plot(ROC_X, ROC_Y)
ax2.set_title(label="Receiver Operator Curve (ROC)")
ax2.set_xlabel(xlabel="False Positive Rate", fontsize=14)
ax2.set_ylabel(ylabel="True Positive Rate", fontsize=14)
ax2.set_xlim(0, 1)
ax2.set_ylim(0, 1)
plt.show()

fig.savefig(f'hypo_{round(sig_lev, 3)}.png')
time.sleep(0.2)
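For two equal-variance normal distributions there is also a closed-form AUC, Φ((μ_alt − μ_null)/(σ√2)), which makes a handy cross-check for the swept ROC above. This identity is an aside, not from the original:

```python
import math

def Phi(x):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# null N(8, 2) vs alt N(11, 2), as in the cell above
mu_null, mu_alt, sigma = 8, 11, 2
auc = Phi((mu_alt - mu_null) / (sigma * math.sqrt(2)))
print(round(auc, 4))  # ≈ 0.8556
```

The trapezoidal area under the plotted ROC_X, ROC_Y points should approach this value as more significance levels are swept.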

ROC Curve Changes Due To Separation Of NULL And ALT Distributions' Means

ROC = True Positive Rate vs. False Positive Rate

In [ ]: import matplotlib.pyplot as plt


import math
import time

def PDF(x, mean=0, std_dev=1):


p = 1.0 / (std_dev * ((2 * math.pi) ** 0.5))
p *= math.e ** (-0.5 * ((x - mean)/std_dev)**2)
return p

pw = 1/1000
X = [(x + 0.5)*pw for x in range(22000)]
P1 = [PDF(x, 8, 2) for x in X]

C1 = []
sum1 = 0
for i in range(len(X)):
sum1 += P1[i]*pw
C1.append(sum1)

pts = 101
Sig_Levels = [(v * 100/(pts-1))/100 for v in range(0, pts)]

for i in range(71):
mu_alt = 8 + 0.1 * i
mu_alt = round(mu_alt, 1)

P2 = [PDF(x, mu_alt, 2) for x in X]


C2 = []
sum2 = 0
for i in range(len(X)):
sum2 += P2[i]*pw
C2.append(sum2)

ROC_X = []
ROC_Y = []
for sig_lev in Sig_Levels:
if sig_lev == 1:
sig_lev = 0.999
for i in range(len(X)):
if C1[i] > sig_lev:
sig_i = i
break

if sig_lev == 0.999:
sig_lev = 1

TP_Rate = 1 - C2[sig_i]
FP_Rate = 1 - C1[sig_i]

ROC_X.append(FP_Rate)
ROC_Y.append(TP_Rate)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))


ax1.plot(X, P1)
ax1.plot(X, P2)

msg = "Receiver Operator Curve (ROC) From Sweeping Significance Level"


fig.suptitle(msg, fontsize=18)

msg = "Significance Level On Null & Alt Distributions"


ax1.set_title(label=msg)
ax1.set_xlabel(xlabel="Values", fontsize=14)
ax1.set_ylabel(ylabel="Probability of Occurrence", fontsize=14)

ax1.text(8, 0.09, 'NULL\nHypothesis',


fontsize=14, ha='center')
ax1.text(mu_alt, 0.04, 'ALT\nHypothesis',
fontsize=14, ha='center')

ax1.set_xlim([0, 21])
ax2.plot([0, 1], [0, 1])
ax2.plot(ROC_X, ROC_Y)
ax2.set_title(label="Receiver Operator Curve (ROC)")
ax2.set_xlabel(xlabel="False Positive Rate", fontsize=14)
ax2.set_ylabel(ylabel="True Positive Rate", fontsize=14)
ax2.set_xlim(0, 1)
ax2.set_ylim(0, 1)
plt.show()

fig.savefig(f'hypo_{round(mu_alt, 3)}.png')
# time.sleep(0.05)

ROC Curve Changes Due To Changes In NULL And ALT Distributions' Standard Deviations

ROC = True Positive Rate vs. False Positive Rate

In [ ]: import matplotlib.pyplot as plt


import math
import time

def PDF(x, mean=0, std_dev=1):


p = 1.0 / (std_dev * ((2 * math.pi) ** 0.5))
p *= math.e ** (-0.5 * ((x - mean)/std_dev)**2)
return p

pw = 1/1000
X = [(x + 0.5)*pw for x in range(22000)]

pts = 101
Sig_Levels = [(v * 100/(pts-1))/100 for v in range(0, pts)]

for i in range(26):
std_alt = 3.5 - 0.1 * i
std_alt = round(std_alt, 1)

P1 = [PDF(x, 8, std_alt) for x in X]


C1 = []
sum1 = 0
for i in range(len(X)):
sum1 += P1[i]*pw
C1.append(sum1)

P2 = [PDF(x, 12, std_alt) for x in X]


C2 = []
sum2 = 0
for i in range(len(X)):
sum2 += P2[i]*pw
C2.append(sum2)
ROC_X = []
ROC_Y = []
for sig_lev in Sig_Levels:
if sig_lev == 1:
sig_lev = 0.999
for i in range(len(X)):
if C1[i] > sig_lev:
sig_i = i
break

if sig_lev == 0.999:
sig_lev = 1

TP_Rate = 1 - C2[sig_i]
FP_Rate = 1 - C1[sig_i]

ROC_X.append(FP_Rate)
ROC_Y.append(TP_Rate)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))


ax1.plot(X, P1)
ax1.plot(X, P2)

msg = "Receiver Operator Curve (ROC) From Sweeping Significance Level"


fig.suptitle(msg, fontsize=18)

msg = "Significance Level On Null & Alt Distributions"


ax1.set_title(label=msg)
ax1.set_xlabel(xlabel="Values", fontsize=14)
ax1.set_ylabel(ylabel="Probability of Occurrence", fontsize=14)

ax1.text(8, 0.09, 'NULL\nHypothesis',


fontsize=14, ha='center')
ax1.text(12, 0.04, 'ALT\nHypothesis',
fontsize=14, ha='center')

ax1.set_xlim([0, 21])

ax2.plot([0, 1], [0, 1])


ax2.plot(ROC_X, ROC_Y)
ax2.set_title(label="Receiver Operator Curve (ROC)")
ax2.set_xlabel(xlabel="False Positive Rate", fontsize=14)
ax2.set_ylabel(ylabel="True Positive Rate", fontsize=14)
ax2.set_xlim(0, 1)
ax2.set_ylim(0, 1)
plt.show()

fig.savefig(f'hypo_{round(sig_lev, 3)}.png')
# time.sleep(0.05)

F1 Score

$$F_1 = \frac{2}{\frac{1}{Recall} + \frac{1}{Precision}}$$

F1 = Harmonic Mean Of Recall and Precision

________________________________________________________

Accuracy, Recall, Precision

$$Accuracy = \frac{TP + TN}{TP + FP + TN + FN} = \frac{\text{What We Got Right}}{\text{All Cases}}$$

$$Recall = \frac{TP}{TP + FN} = \frac{\text{What Positives We Got Right}}{\text{All Actual Positives}}$$

$$Precision = \frac{TP}{TP + FP} = \frac{\text{What Positives We Got Right}}{\text{All Positive Predictions}}$$
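The definitions above translate directly into code. A small sketch with a made-up confusion matrix (the counts are illustrative, not from the original):

```python
def classification_metrics(tp, fp, tn, fn):
    # direct translations of the formulas above
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 / (1 / recall + 1 / precision)  # harmonic mean of recall and precision
    return accuracy, recall, precision, f1

acc, rec, prec, f1 = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(acc, round(rec, 3), prec, round(f1, 3))  # 0.85 0.889 0.8 0.842
```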

LASSO In Logistic Regression to Compare with Statistics Version
In [ ]: from sklearn.linear_model import LogisticRegression as LR
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
lr_mod = LR(penalty='l1', solver='liblinear')
lr_mod.fit(X, y)
print(lr_mod.coef_)

[[ 0. 2.52235623 -2.83220134 0. ]
[ 0.32846823 -1.79370624 0.66582088 -1.57267348]
[-2.62263278 -2.50833176 3.26131365 4.61826807]]

Regressor Coefficient From Statistics Vs. Machine Learning Methods
In [ ]: import matplotlib.pyplot as plt
import numpy as np

# Synthesize some data (i.e. create fake data)


X = np.random.uniform(0, 1, 1000)
Y = 2.0 * X
Y_noise = np.max(Y) * 0.073
Y += np.random.normal(0, 0.073, 1000)

# Statistics Way To Create Model


X_std = np.std(X)
Y_std = np.std(Y)
r = np.corrcoef(X, Y)
Cs = Y_std / X_std * r[0, 1]
print(Cs)

# Machine Learning
from sklearn.linear_model import LinearRegression  # needed; not imported earlier in this cell
mod_LR = LinearRegression(fit_intercept=False, copy_X=True)
mod_LR.fit(X.reshape(-1, 1), Y.reshape(-1, 1))
Cml = mod_LR.coef_[0, 0]
print(Cml)

# Visualize
plt.figure(figsize=(10, 5))
plt.plot(X, Cs*X+1) # + 1 separates the two exact plots
plt.plot(X, Cml*X)
plt.title('Models Determined By Stats And ML')
plt.legend(('Stats Way', 'ML Way'))
plt.show()

1.9918506427390517
1.9957689952415045
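The agreement is no accident: the slope r·(σ_Y/σ_X) used in the "statistics way" is algebraically identical to the least-squares slope. A plain-Python sketch on small made-up points (my illustration, not the notebook's data):

```python
def slope_stats(X, Y):
    # the "statistics way": slope = r * (std_Y / std_X)
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    sx = (sum((x - mx)**2 for x in X) / n) ** 0.5
    sy = (sum((y - my)**2 for y in Y) / n) ** 0.5
    r = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / (n * sx * sy)
    return r * sy / sx

def slope_ols(X, Y):
    # the least-squares slope that a regression fit produces
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    num = sum((x - mx) * (y - my) for x, y in zip(X, Y))
    den = sum((x - mx)**2 for x in X)
    return num / den

Xd = [0.0, 0.5, 1.0, 1.5, 2.0]
Yd = [0.1, 1.1, 1.9, 3.2, 3.9]
print(round(slope_stats(Xd, Yd), 6), round(slope_ols(Xd, Yd), 6))  # 1.94 1.94
```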

Regressor Coefficient AND "Intercept" FROM Statistics
In [ ]: import matplotlib.pyplot as plt
import numpy as np

# Synthesize some data (i.e. create fake data)


X = np.random.uniform(0, 1, 1000)
Y = 2.0 * X + 1
Y_noise = np.max(Y) * 0.073
Y += np.random.normal(0, 0.073, 1000)

# Statistics Way To Fit Model Coefficient


X_std = np.std(X)
Y_std = np.std(Y)
r = np.corrcoef(X, Y)
Cs = Y_std / X_std * r[0, 1]
print(Cs)

# Statistics Way To Calculate Intercept


X_mean = np.mean(X)
Y_mean = np.mean(Y)
b = Y_mean - Cs*X_mean
print(b)

1.9992493711567196
1.001206029449972

In [ ]: # Visualize
plt.figure(figsize=(10, 5))
plt.scatter(X, Y, color='magenta')
plt.plot(X, Cs*X+b, color='black')
plt.ylim((0, 3.5))
plt.title('Models Determined By Stats And ML')
plt.show()

Central Limit Theorem Principles

Sample Means Distribution For Increasing Sample Sizes
In [ ]: import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

die_values = [1, 2, 3, 4, 5, 6]
sample_sizes = [2, 4, 8, 16, 32]

for experiment in range(1):


for sample_size in sample_sizes:
sample_means = []
for num_samples in range(1000):
die_cast = np.random.choice(
die_values, size=sample_size)
sample_mean = np.mean(die_cast)
sample_means.append(sample_mean)

experiment_mean = np.mean(sample_means)
experiment_std = np.std(sample_means)
x_min = min(sample_means)
x_max = max(sample_means)
x = np.arange(x_min, x_max, 0.01)
y = norm.pdf(x, experiment_mean, experiment_std)
plt.plot(x, y)

legend_texts = [f'Sample Size = {v}' for v in sample_sizes]


plt.legend(legend_texts)
plt.title("Distribution Changes By Sample Size")
plt.show()
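The shrinking spread in these curves follows the central limit theorem's σ/√n rule for the standard error of the mean. A rough empirical check (hedged: random.choices and the fixed seed are my additions, standing in for the original's np.random.choice):

```python
import math
import random

die_values = [1, 2, 3, 4, 5, 6]
pop_mean = sum(die_values) / 6                                       # 3.5
pop_std = math.sqrt(sum((v - pop_mean)**2 for v in die_values) / 6)  # ~1.708

random.seed(0)
results = {}
for n in [2, 4, 8, 16, 32]:
    means = [sum(random.choices(die_values, k=n)) / n for _ in range(20000)]
    mu = sum(means) / len(means)
    sd = math.sqrt(sum((m - mu)**2 for m in means) / len(means))
    results[n] = sd
    # empirical spread vs the theoretical standard error sigma / sqrt(n)
    print(n, round(sd, 3), round(pop_std / math.sqrt(n), 3))
```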

Approaching The Central Limit Theorem


In [ ]: import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import os

cwd = os.getcwd()
if not os.path.isdir(f"{cwd}/images"):
os.mkdir(f"{cwd}/images")

no_images = True
image_num = 0
if no_images:
die_values = [1, 2, 3, 4, 5, 6]
sample_sizes = [2, 4, 8, 16, 32]
num_add_samples_list = [2] + [1]*8 + [2]*5 + [10]*8 + [100]*9
sample_means_D = {k: [] for k in sample_sizes}
total_samples = 0

for num_samples in num_add_samples_list:


total_samples += num_samples
plt.figure(figsize=(12, 6))
for sample_size in sample_sizes:
for sample_num in range(num_samples):
die_cast = np.random.choice(
die_values, size=sample_size)
sample_mean = np.mean(die_cast)
sample_means_D[sample_size].append(sample_mean)

experiment_mean = np.mean(sample_means_D[sample_size])
experiment_std = np.std(sample_means_D[sample_size])
x_min = min(sample_means_D[sample_size])
x_max = max(sample_means_D[sample_size])
x = np.arange(x_min, x_max, 0.001)
y = norm.pdf(x, experiment_mean, experiment_std)
plt.plot(x, y)

legend_texts = [f'Sample Size = {v}' for v in sample_sizes]


plt.legend(legend_texts)
title = f"Distribution Of Means For {total_samples} "
title += "Samples For Various Sample Sizes"
plt.title(title)
plt.xlim([1, 6])
if total_samples == 2:
for i in range(5):
plt.savefig(f"{cwd}/images/{image_num:02d}.png")
image_num += 1
elif total_samples == 1000:
for i in range(5):
plt.savefig(f"{cwd}/images/{image_num:02d}.png")
image_num += 1
else:
plt.savefig(f"{cwd}/images/{image_num:02d}.png")
image_num += 1

plt.close()

# Run below on command line to create movie - needs ffmpeg


# ffmpeg -framerate 4 -pattern_type glob -i "*.png" output.avi

Sample From Huge Population To See When Central Limit Theorem Is Reached
In [ ]: import numpy as np
        import matplotlib.pyplot as plt
        from scipy.stats import norm  # needed below; missing from the original cell

        die_values = [1, 2, 3, 4, 5, 6]
        die_rolls = [np.random.choice(die_values, size=1)[0] for _ in range(int(1e6))]

        # plt.hist(die_rolls, bins=6, width=0.73)
        # plt.show()

        mean = round(np.mean(die_rolls), 1)
        print(f'Population mean is {mean}')

        for sample_size in [2, 4, 8, 16, 32, 64]:
            sample_means = []
            samples = 0
            while True:
                samples += 1
                rolls = np.random.choice(die_rolls, size=sample_size)
                sample_mean = np.mean(rolls)
                sample_means.append(sample_mean)

                running_mean = round(np.mean(sample_means), 2)
                if running_mean == 3.50:
                    break

            title = f'Mean sample means = {running_mean} for {samples} samples of {sample_size}'
            running_std = np.std(sample_means)
            x = np.arange(1, 6, 0.001)
            y = norm.pdf(x, running_mean, running_std)
            plt.xlim([1, 6])
            plt.plot(x, y)
            plt.title(title)
            plt.axvline(3.5)
            plt.show()

The Ugly Approach To The Central Limit Theorem


In [ ]: import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import os

cwd = os.getcwd()
if not os.path.isdir(f"{cwd}/images"):
os.mkdir(f"{cwd}/images")

no_images = True
image_num = 0
if no_images:
die_values = [1, 2, 3, 4, 5, 6]
num_add_samples_list = [2] + [1]*8 + [2]*5 + [10]*8 + [100]*9
num_add_samples_list += [9000] + [90000] + [900000] + [1000000]
sample_means = []
total_samples = 0

for num_samples in num_add_samples_list:


total_samples += num_samples
for sample_num in range(num_samples):
die_casts = np.random.choice(
die_values, size=32)
sample_mean = np.mean(die_casts)
sample_means.append(sample_mean)

fig, ax1 = plt.subplots()


fig.set_figheight(6)
fig.set_figwidth(12)

color = 'tab:blue'
ax1.set_xlabel('Sample Means')
ax1.set_ylabel('Occurrence Rate', color=color)
bins = len(set(sample_means))
ax1.hist(sample_means, bins=bins, density=True, stacked=True)
ax1.set_xlim([2, 5])
ax1.set_ylim([0, 4])

ax2 = ax1.twinx()

color = 'tab:red'
ax2.set_ylabel('Probability', color=color)
ax2.tick_params(axis='y', labelcolor=color)
running_mean = round(np.mean(sample_means), 2)
title = f'Mean of sample means = {running_mean} '
title += f'for {total_samples} samples of size 32'
running_std = np.std(sample_means)
x = np.arange(2, 5, 0.001)
y = norm.pdf(x, running_mean, running_std)
ax2.plot(x, y, color=color)
ax2.set_ylim([0, 2])

plt.title(title)
plt.axvline(3.5)

fig.tight_layout()

# if total_samples == 2:
# for i in range(5):
# plt.savefig(f"{cwd}/images/{image_num:02d}.png")
# image_num += 1
# elif total_samples > 1000000:
# for i in range(5):
# plt.savefig(f"{cwd}/images/{image_num:02d}.png")
# image_num += 1
# else:
# plt.savefig(f"{cwd}/images/{image_num:02d}.png")
# image_num += 1
plt.show();

# plt.close()

# Run below on command line to create movie - needs ffmpeg


# ffmpeg -framerate 4 -pattern_type glob -i "*.png" output.avi

Two Y Axes
In [ ]: import numpy as np
import matplotlib.pyplot as plt

# Create some mock data


t = np.arange(0.01, 10.0, 0.01)
data1 = np.exp(t)
data2 = np.sin(2 * np.pi * t)

fig, ax1 = plt.subplots()

color = 'tab:red'
ax1.set_xlabel('time (s)')
ax1.set_ylabel('exp', color=color)
ax1.plot(t, data1, color=color)
ax1.tick_params(axis='y', labelcolor=color)

ax2 = ax1.twinx() # instantiate a second axes that shares the same x-axis

color = 'tab:blue'
ax2.set_ylabel('sin', color=color) # we already handled the x-label with ax
ax2.plot(t, data2, color=color)
ax2.tick_params(axis='y', labelcolor=color)

fig.tight_layout() # otherwise the right y-label is slightly clipped


plt.show()

Binomial Variables

Rules For Binomial Variables


1. Comprised Of Independent Trials
2. Each Trial Can Be Regarded As A Success Or Failure
3. There Are A FIXED Number Of Trials
4. Probability Of Success On Each Trial Is Constant (i.e. Rule 1 Repeats)

NOTE: Use The 10% Rule For Approximate Independence Of Trials When Resampling Is
Not Possible

Binomial Combinatorics

$$p(k \text{ of } n) = \binom{n}{k} p^k (1 - p)^{n - k}$$

$$\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$$

$$\left(\!\!\binom{n}{k}\!\!\right) = \binom{n + k - 1}{k} = \frac{(n + k - 1)!}{k!\,(n - 1)!} = \frac{n(n + 1)(n + 2)\cdots(n + k - 1)}{k!}$$

$$p(k \text{ of } n) = \frac{n!}{k!\,(n - k)!}\, p^k (1 - p)^{n - k}$$

In [ ]: # Independent Trials Success Probability = 0.73


import math
import matplotlib.pyplot as plt

fact = math.factorial

Ps = 0.73
p_D = {}
n = 7
for k in range(n+1):
nCk = fact(n) / (fact(k) * fact(n-k))
p = nCk * Ps**k * (1 - Ps)**(n - k)
p_D[k] = round(p, 2)

plt.bar(p_D.keys(), p_D.values())
plt.show()
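A cross-check on the cell above (an aside, not in the original): math.comb gives the same binomial coefficients, the pmf must sum to 1, and its mean must equal n·p:

```python
import math

n, Ps = 7, 0.73
pmf = [math.comb(n, k) * Ps**k * (1 - Ps)**(n - k) for k in range(n + 1)]
print(round(sum(pmf), 12))                              # 1.0
print(round(sum(k * p for k, p in enumerate(pmf)), 2))  # n * Ps = 5.11
```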
