Lab 1
Lab 1
Figure 1: A linear regression represents a linear correlation between a feature and the targets
Experimental Results:
DATA DESCRIPTION:
The Diabetes dataset consider in this experiment contains physiological data collected on 442
patients and as a corresponding target an indicator of the disease progression after a year. The
physiological data occupy the first 10 columns with values that indicate respectively the
following: • Age • Sex • Body mass index • Blood pressure • S1, S2, S3, S4, S5, and S6 (six
blood serum measurements. When we execute the below line, we get the linear regression
score(R2). The linear regression score gives the variance of the feature (R2) ie the linear
relationship between the feature and the targets. The values of R2 range from 0 to 1. An R2 close
to zero indicates a model with very little explanatory power. An R2 close to one indicates a model
with more explanatory power. But in our experiment, we got 0.58 as the score which shows that
the model is providing medium explanatory power.
linreg.score(x_test,y_test)
Out[ ]: 0.58507530226905713
Next, we have drawn regression line for all 10 features, creating 10 models and seeing the
result for each of them through a linear chart.
Figure 2. Ten Linear charts showing the correlations between physiological factors and
the progression of diabetes
From the above 10 charts, we observed that features 'age', 'bmi', 'blood
pressure','s1','s2','s3','s4','s5','s6' are showing regression line with linear relationship whereas the
feature 'sex' is not showing linear relationship with the target variable(disease).
Result:
Hence we implemented Linear Regression model on diabetes dataset .
Logistic Regression
import numpy as np
def sigmoid(x):
return 1/(1+np.exp(-x))
class LogisticRegression():
for _ in range(self.n_iters):
linear_pred = np.dot(X, self.weights) + self.bias
predictions = sigmoid(linear_pred)
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets
import matplotlib.pyplot as plt
from LogisticRegression import LogisticRegression
bc = datasets.load_breast_cancer()
X, y = bc.data, bc.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1234)
clf = LogisticRegression(lr=0.01)
clf.fit(X_train,y_train)
y_pred = clf.predict(X_test)
Result:
From the output results, we found out that Logistic Regression has given 93% on breast cancer
dataset.