Assignment AI-ML
The Boston Housing Dataset is derived from information collected by the U.S. Census Service concerning housing in the area of Boston, MA. The following describes the dataset columns:
LOGIC EXPLANATION:
Step-1: We have imported libraries such as pandas and NumPy so that we can use their functionality in our assignment. We have also imported LinearRegression, a model that uses the relationship between the data points to fit the straight line that best matches them (the line generally does not pass through every point). This line can then be used to predict new values.
Matplotlib is a cross-platform data visualization and graphical plotting library for Python and its numerical extension NumPy. As such, it offers a viable open-source alternative to MATLAB.
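The report does not reproduce its import cell, so the block below is an assumed version of it, followed by a tiny illustration of how a fitted straight line predicts an unseen value. The toy points are hypothetical (not Boston data), and `np.polyfit` is used here only as a quick stand-in for fitting the line.

```python
# Assumed import cell: the original notebook code is not shown in this report.
import numpy as np                                  # numerical arrays and maths
import pandas as pd                                 # DataFrame for tabular data
import matplotlib.pyplot as plt                     # plotting
from sklearn.linear_model import LinearRegression   # least-squares linear model

# Hypothetical toy points (not Boston data) to show how a fitted line predicts a new value.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# np.polyfit with deg=1 returns the slope first, then the intercept.
slope, intercept = np.polyfit(x, y, deg=1)
prediction = intercept + slope * 6.0  # predicted y for an unseen x = 6
print(f"y = {intercept:.2f} + {slope:.2f}x; prediction at x=6: {prediction:.2f}")
```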
OUTPUT:
LOGIC EXPLANATION:
OUTPUT:
LOGIC EXPLANATION:
Step-5: We have calculated the correlation of column ['MEDV'] with all other columns.
The corr() method computes the correlation between every pair of columns in the data set (Pearson correlation by default). The result is a correlation matrix whose values range from -1 to 1: a value near 1 indicates a strong positive linear relationship, a value near -1 a strong negative one, and a value near 0 little or no linear relationship.
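A minimal sketch of this step with pandas, on a hypothetical three-column toy DataFrame where `target` plays the role of MEDV (column names and values are made up for illustration). It ranks the other columns by the absolute value of their correlation with the target, so a strong negative correlation counts as much as a strong positive one.

```python
import pandas as pd

# Hypothetical toy data (NOT the Boston dataset); "target" stands in for MEDV.
df = pd.DataFrame({
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 1, 4, 3, 5],
    "target": [10, 8, 6, 4, 2],
})

# Pairwise Pearson correlation matrix; every entry lies between -1 and 1.
corr = df.corr()

# Correlation of each other column with the target, ranked by strength (absolute value).
target_corr = corr["target"].drop("target")
best_feature = target_corr.abs().idxmax()

print(target_corr)
print("strongest predictor:", best_feature)
```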
OUTPUT:
After comparing the correlation of each column with MEDV, we have concluded that LSTAT has the strongest correlation with MEDV (the largest in absolute value; the relationship is negative). That is why we work further with the LSTAT and MEDV columns to build our regression model.
LOGIC EXPLANATION:
Step-6: We have defined a variable y that stores the values of the LSTAT column.
We then plotted a graph of MEDV against LSTAT, with MEDV on the X-axis and LSTAT on the Y-axis.
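A minimal plotting sketch for this step, using hypothetical toy values rather than the real columns. The axis assignment follows the text (MEDV on the X-axis, LSTAT on the Y-axis); the non-interactive "Agg" backend is an assumption so the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical toy values, NOT taken from the dataset.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # stands in for MEDV
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])  # stands in for LSTAT

fig, ax = plt.subplots()
points = ax.scatter(x, y)   # one marker per (x, y) pair
ax.set_xlabel("MEDV")       # X-axis: MEDV, as described in Step-6
ax.set_ylabel("LSTAT")      # Y-axis: LSTAT
ax.set_title("LSTAT vs MEDV")
# In a notebook, plt.show() would display the figure.
```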
OUTPUT:
LOGIC EXPLANATION:
Step-7: We have computed x_mean, the mean of the MEDV values (x), and y_mean, the mean of the LSTAT values (y).
Using these means, we calculated the slope b1 and the intercept b0 of the regression line and printed them. We stored the values given by the resulting linear equation, y_pred = b0 + b1·x, in the variable y_pred. We then used plt.scatter() to plot the data points and used the y_pred values to draw the fitted line.
From the graph we can see that the points lie close to the plotted line, which suggests that the line captures the overall trend reasonably well; the error metrics in the following steps quantify how good the fit is.
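The report does not show its exact formulas, so the sketch below assumes the standard ordinary least-squares estimates built from the two means: b1 = Σ(x − x_mean)(y − y_mean) / Σ(x − x_mean)² and b0 = y_mean − b1·x_mean. The helper name `fit_line` and the toy data are hypothetical.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares slope b1 and intercept b0 for y = b0 + b1 * x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance-style sum divided by the spread of x around its mean.
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line always passes through (x_mean, y_mean).
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Hypothetical toy data (not Boston values).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

b0, b1 = fit_line(x, y)
y_pred = b0 + b1 * x  # points on the fitted line
print(b0, b1)         # approximately 2.2 and 0.6
print(y_pred)
```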
OUTPUT:
LOGIC EXPLANATION:
Step-8: We have imported mean_absolute_error from sklearn.metrics to calculate the Mean Absolute Error (MAE), the average of the absolute differences between the predicted values and the actual values.
Step-9: We have imported mean_squared_error from sklearn.metrics to calculate the Mean Squared Error (MSE), the average of the squared differences between the predicted values and the actual values. Because the errors are squared, large errors are penalized more heavily than in MAE.
Step-11: We have imported r2_score from sklearn.metrics to calculate the R-squared (R²) value for our data. R² measures the proportion of the variance in the actual values that is explained by the model; it is commonly used to evaluate a linear regression model, and a value closer to 1 indicates a better fit.
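To make the three metrics concrete, the sketch below computes MAE, MSE and R² by hand on hypothetical toy values and checks them against the sklearn.metrics functions named in the steps above. R² is computed as 1 − SS_res / SS_tot (residual sum of squares over total sum of squares).

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical toy values (not Boston data): actual targets and the predictions
# of the line y = 2.2 + 0.6x fitted to them.
y_true = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
y_pred = np.array([2.8, 3.4, 4.0, 4.6, 5.2])

residuals = y_true - y_pred
mae = np.mean(np.abs(residuals))                # average absolute error
mse = np.mean(residuals ** 2)                   # average squared error
ss_res = np.sum(residuals ** 2)                 # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1 - ss_res / ss_tot                        # share of variance explained

# The hand-computed values agree with the sklearn.metrics functions.
assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
assert np.isclose(mse, mean_squared_error(y_true, y_pred))
assert np.isclose(r2, r2_score(y_true, y_pred))

print(f"MAE={mae:.2f}  MSE={mse:.2f}  R2={r2:.2f}")  # prints MAE=0.64  MSE=0.48  R2=0.60
```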
OUTPUT:
LOGIC EXPLANATION:
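Step-13: We have used the reshape() function, which changes the shape of an array without changing its data. The shape of an array is determined by the number of elements in each dimension, so reshaping lets us add or remove dimensions in an array. This is typically needed here because scikit-learn's LinearRegression expects the input features as a 2-D array of shape (n_samples, n_features), whereas a single column of values is a 1-D array.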
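A short NumPy sketch of the reshapes that usually matter for a single-feature regression, on a hypothetical toy array. Passing -1 for one dimension lets NumPy infer its size from the number of elements.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # 1-D array, shape (5,)
X = x.reshape(-1, 1)   # column vector, shape (5, 1): one sample per row, one feature
row = x.reshape(1, -1) # row vector, shape (1, 5)
flat = X.reshape(-1)   # back to 1-D, shape (5,)

print(x.shape, X.shape, row.shape, flat.shape)  # prints (5,) (5, 1) (1, 5) (5,)
```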
LinearRegression fits a linear model with coefficients that minimize the residual sum of squares between the observed targets in the dataset and the targets predicted by the linear approximation. Regression analysis is a predictive modelling technique that investigates the relationship between a dependent (target) variable and an independent (predictor) variable. It is used for forecasting, time series modelling and studying relationships between variables; on its own, however, a regression shows association rather than a causal effect.
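The sketch below fits scikit-learn's LinearRegression on hypothetical toy data and checks that its predictions match the closed-form least-squares line computed from the means, which is the kind of agreement the conclusion reports between its two sets of predicted values. The data and variable names are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data (not Boston values).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

model = LinearRegression()
model.fit(x.reshape(-1, 1), y)  # X must be 2-D: (n_samples, n_features)

slope = model.coef_[0]          # one coefficient per feature
intercept = model.intercept_

# Closed-form least-squares estimates for comparison.
x_mean, y_mean = x.mean(), y.mean()
b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b0 = y_mean - b1 * x_mean

manual_pred = b0 + b1 * x
sklearn_pred = model.predict(x.reshape(-1, 1))

print(slope, intercept)                         # approximately 0.6 and 2.2
print(np.allclose(manual_pred, sklearn_pred))   # True: both minimize the same squared error
```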
OUTPUT:
CONCLUSION:
After processing the whole data set, we have concluded that LSTAT has the strongest correlation with MEDV. The plotted graph supports the same finding.
Finally, we displayed the m_pred and m_predicted values, which came out to be the same; hence we have built a simple (single-variable) linear regression model to predict the MEDV value.