Supervised Learning With Scikit-Learn: Introduction To Regression
Introduction to regression
Supervised Learning with scikit-learn
In [2]: print(boston.head())
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX \
0 0.00632 18.0 2.31 0 0.538 6.575 65.2 4.0900 1 296.0
1 0.02731 0.0 7.07 0 0.469 6.421 78.9 4.9671 2 242.0
2 0.02729 0.0 7.07 0 0.469 7.185 61.1 4.9671 2 242.0
3 0.03237 0.0 2.18 0 0.458 6.998 45.8 6.0622 3 222.0
4 0.06905 0.0 2.18 0 0.458 7.147 54.2 6.0622 3 222.0
In [4]: y = boston['MEDV'].values
In [7]: y = y.reshape(-1, 1)
In [12]: plt.show()
In [16]: reg.fit(X_rooms, y)
In [20]: plt.show()
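The cells above show only fragments of the fitting session. A self-contained sketch of the same workflow is below; the synthetic rooms/price data stands in for the Boston columns, and the coefficient value 5.0 is an illustrative assumption, not from the course:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the number-of-rooms feature and house prices
rng = np.random.default_rng(42)
X_rooms = rng.uniform(4, 9, size=(100, 1))            # already 2-D, as after reshape(-1, 1)
y = 5.0 * X_rooms + rng.normal(0, 1.0, size=(100, 1))  # price grows with rooms, plus noise

# Fit a line to the single feature
reg = LinearRegression()
reg.fit(X_rooms, y)

# Predict over the observed range, e.g. to draw the fitted line
prediction_space = np.linspace(X_rooms.min(), X_rooms.max(), 50).reshape(-1, 1)
y_pred = reg.predict(prediction_space)
```

With a single feature, `reg.coef_` holds the slope and `reg.intercept_` the intercept of the fitted line.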
Let’s practice!
The basics of linear regression
Regression mechanics
● y = ax + b
● y = target
● x = single feature
● a, b = parameters of model
● How do we choose a and b?
● Define an error function for any given line
● Choose the line that minimizes the error function
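The two steps above — define an error function, then choose the line that minimizes it — can be sketched with ordinary least squares; the data and the true slope/intercept (2.0, 1.0) are illustrative assumptions:

```python
import numpy as np

# Synthetic data: y = 2x + 1 plus noise (illustrative values, not from the course)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, 200)

def sse(a, b):
    # Error function: sum of squared vertical distances from each point to y = a*x + b
    return np.sum((y - (a * x + b)) ** 2)

# The least-squares fit picks the (a, b) that minimizes sse
a_hat, b_hat = np.polyfit(x, y, deg=1)

# Any perturbed line produces a larger error
assert sse(a_hat, b_hat) < sse(a_hat + 0.1, b_hat)
assert sse(a_hat, b_hat) < sse(a_hat, b_hat + 0.5)
```

The vertical distances summed inside `sse` are exactly the residuals discussed on the next slide.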
Residual
[Figure: residuals are the vertical distances between each data point and the fitted line]
Let’s practice!
Cross-validation
Cross-validation motivation
● Model performance is dependent on the way the data is split
● A single train/test split may not be representative of the model’s ability to generalize
● Solution: Cross-validation!
Cross-validation basics
[Diagram: 5-fold cross-validation — the data is divided into Folds 1–5; in each of the 5 splits, one fold is held out for evaluation and the other four are used for training, yielding Metrics 1–5]
Cross-validation in scikit-learn
In [1]: from sklearn.model_selection import cross_val_score
In [2]: from sklearn.linear_model import LinearRegression
In [3]: import numpy as np
In [4]: reg = LinearRegression()
In [5]: cv_results = cross_val_score(reg, X, y, cv=5)
In [6]: print(cv_results)
[ 0.63919994  0.71386698  0.58702344  0.07923081 -0.25294154]
In [7]: np.mean(cv_results)
Out[7]: 0.35327592439587058
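The session above runs on the Boston data, which the excerpt doesn't load; the same pattern works end-to-end on synthetic data (the data and coefficients below are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data standing in for the Boston features/target
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.1, 150)

reg = LinearRegression()
# cv=5 trains and evaluates 5 times, giving one R^2 score per held-out fold
cv_results = cross_val_score(reg, X, y, cv=5)
mean_score = np.mean(cv_results)
```

For regression estimators, `cross_val_score` defaults to the estimator's `score` method, which is R².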
SUPERVISED LEARNING WITH SCIKIT-LEARN
Let’s practice!
SUPERVISED LEARNING WITH SCIKIT-LEARN
Regularized regression
Why regularize?
● Recall: Linear regression minimizes a loss function
● It chooses a coefficient for each feature variable
● Large coefficients can lead to overfitting
● Penalizing large coefficients: Regularization
Ridge regression
● Loss function = OLS loss function + α ∗ Σᵢ₌₁ⁿ aᵢ²
● Alpha: Parameter we need to choose
● Picking alpha here is similar to picking k in k-NN
● Hyperparameter tuning (More in Chapter 3)
● Alpha controls model complexity
● Alpha = 0: We get back OLS (can lead to overfitting)
● Very high alpha: Can lead to underfitting
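The effect of alpha on the coefficients can be seen directly; in this sketch the synthetic data and the alpha values 0.1 and 1000 are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data: only the first of five features matters (an assumption for illustration)
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.0, 0.0, 0.0, 0.0]) + rng.normal(0, 0.1, 100)

# A small alpha barely penalizes coefficient size; a huge alpha shrinks
# all coefficients toward zero (toward underfitting)
ridge_small = Ridge(alpha=0.1).fit(X, y)
ridge_large = Ridge(alpha=1000.0).fit(X, y)
```

Comparing `np.abs(coef_).sum()` for the two fits shows the shrinkage that the alpha bullet points describe.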
Lasso regression
● Loss function = OLS loss function + α ∗ Σᵢ₌₁ⁿ |aᵢ|
In [7]: _ = plt.ylabel('Coefficients')
In [8]: plt.show()
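The plotting cells above assume a Lasso model fitted earlier in the session, which the excerpt doesn't show. A self-contained sketch is below (the synthetic data and alpha=0.1 are assumptions); it illustrates the property the coefficient plot is meant to show — the L1 penalty drives irrelevant coefficients to zero, so Lasso can be used for feature selection:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: only the first of five features matters (an assumption for illustration)
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.0, 0.0, 0.0, 0.0]) + rng.normal(0, 0.1, 100)

lasso = Lasso(alpha=0.1).fit(X, y)
# The L1 penalty zeroes out the coefficients of the irrelevant features,
# while the informative feature keeps a large (slightly shrunk) coefficient
lasso_coef = lasso.coef_
```

Plotting `lasso_coef` against the feature names, as in the cells above, makes the surviving features easy to read off.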
Let’s practice!