Simple Linear Regression: Math Behind
Simple Linear Regression: Math Behind
● Simple linear regression solves problems with only one input feature
● Multiple linear regression solves problems with multiple input features
Assumptions
Take-home point
Math behind
● In a nutshell, simple linear regression is based on coefficients - and which you need
to find in order to solve a line equation:
Line equation:
B1 coefficient:
B0 coefficient:
Implementation
Testing
● For validation sake, we'll split the dataset into training and testing parts:
In [4]:
from sklearn.model_selection import train_test_split
● You can now initialize and train the model, and afterwards make predictions:
In [5]:
model = SimpleLinearRegression()
model.fit(X_train, y_train)
preds = model.predict(X_test)
● If you re-train the model of the entire dataset and then make predictions for the
entire dataset, you'll get the best fit line
● You can then visualize this line with Matplotlib:
In [14]:
model_all = SimpleLinearRegression()
model_all.fit(X, y)
preds_all = model_all.predict(X)
sk_model = LinearRegression()
sk_model.fit(np.array(X_train).reshape(-1, 1), y_train)
sk_preds = sk_model.predict(np.array(X_test).reshape(-1, 1))
sk_model.intercept_, sk_model.coef_
21.351850699502783
● Ours was 21.351850699502787, so nearly identical.