1 - Linear - Regression - Ipynb - Colaboratory
Variable
The table below shows current home prices in Monroe Township, New Jersey, together with each
home's area in square feet.
Problem Statement: Given the data above, build a machine learning model that can predict home
prices from the area in square feet.
You can plot the values in the table as a scatter plot (the values are shown as red markers)
and then draw the straight line that best fits the points on the chart.
Many such lines can be drawn, but we choose the one for which the total error, measured as the
sum of squared differences between the actual and predicted prices, is smallest.
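The idea of picking the line with the smallest total error can be illustrated with a small sketch. It uses the five data points from this notebook and compares the sum of squared errors for three lines. The "too flat" and "too steep" lines are hypothetical candidates chosen by eye for illustration; the third uses the slope and intercept that the fitted model reports further down in this notebook.

```python
# Data from the home price table in this notebook.
areas = [2600, 3000, 3200, 3600, 4000]
prices = [550000, 565000, 610000, 680000, 725000]


def sum_squared_error(m, b):
    """Total squared vertical distance between the line m*x + b and the data."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(areas, prices))


# Hypothetical candidate lines (slope, intercept), picked only for comparison.
candidates = {
    "too flat": (100.0, 300000.0),
    "too steep": (170.0, 70000.0),
    "least squares": (135.78767123, 180616.43835616),
}

errors = {name: sum_squared_error(m, b) for name, (m, b) in candidates.items()}
best = min(errors, key=errors.get)

for name, err in errors.items():
    print(f"{name:>13}: {err:,.0f}")
print("smallest error:", best)
```

Any other straight line would give a total squared error at least as large as the least squares line, which is why that line is the one the model keeps.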
You may remember the equation of a straight line from high school math. Home prices can be
expressed with the following equation:
price = m * area + b
where m is the slope (coefficient) and b is the intercept.
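For a single input variable, the slope and intercept of the best-fit line have a simple closed form. The sketch below computes them in plain Python for the five data points in this notebook; the variable names (s_xy, s_xx, m, b) are illustrative choices, and the result should agree with what the scikit-learn model reports later on.

```python
# Data from the home price table in this notebook.
areas = [2600, 3000, 3200, 3600, 4000]
prices = [550000, 565000, 610000, 680000, 725000]

n = len(areas)
mean_area = sum(areas) / n
mean_price = sum(prices) / n

# Slope: sum of cross-deviations divided by the sum of squared area deviations.
s_xy = sum((x - mean_area) * (y - mean_price) for x, y in zip(areas, prices))
s_xx = sum((x - mean_area) ** 2 for x in areas)
m = s_xy / s_xx

# Intercept: the best-fit line always passes through the point of means.
b = mean_price - m * mean_area

print(f"m = {m:.4f}, b = {b:.2f}")
```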
import pandas as pd
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt
df = pd.read_csv('homeprices.csv')
df
   area   price
0  2600  550000
1  3000  565000
2  3200  610000
3  3600  680000
4  4000  725000
%matplotlib inline
# Scatter plot of price against area, drawn with red '+' markers
plt.xlabel('area')
plt.ylabel('price')
plt.scatter(df.area,df.price,color='red',marker='+')
<matplotlib.collections.PathCollection at 0x25c8eb78d68>
# Features: keep area as a one-column DataFrame (scikit-learn expects 2-D input)
new_df = df.drop('price',axis='columns')
new_df
area
0 2600
1 3000
2 3200
3 3600
4 4000
# Target: the price column as a Series
price = df.price
price
0 550000
1 565000
2 610000
3 680000
4 725000
# Create linear regression object
reg = linear_model.LinearRegression()
reg.fit(new_df,price)
LinearRegression(..., normalize=False)
reg.predict([[3300]])
array([628715.75342466])
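The double brackets in the call above matter: predict expects a 2-D input with one row per sample and one column per feature. The sketch below rebuilds the model from the five rows shown earlier (so that it does not depend on homeprices.csv) and passes the query as a one-row DataFrame. Using the same column name as in training is an optional choice made here; recent scikit-learn versions may warn about missing feature names when a model fitted on a DataFrame is given a plain list.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# The five rows shown earlier in this notebook, entered by hand.
df = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000],
    "price": [550000, 565000, 610000, 680000, 725000],
})
reg = LinearRegression().fit(df[["area"]], df["price"])

# One row, one column: shape (1, 1), with the column name used during fit.
query = pd.DataFrame({"area": [3300]})
prediction = reg.predict(query)

print(query.shape, prediction.shape)
print(prediction[0])
```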
reg.coef_
array([135.78767123])
reg.intercept_
180616.43835616432
3300*135.78767123 + 180616.43835616432
628715.7534151643
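The manual check above can be wrapped in a small helper. This is only a sketch: predict_price is a hypothetical name, and the constants are copied from the reg.coef_ and reg.intercept_ outputs shown above.

```python
# Values copied from the reg.coef_ and reg.intercept_ outputs above.
M = 135.78767123
B = 180616.43835616432


def predict_price(area_sqft, m=M, b=B):
    """Apply price = m * area + b for a single area in square feet."""
    return m * area_sqft + b


for area in (3300, 5000):
    print(area, round(predict_price(area), 2))
```

The results agree with reg.predict to within the rounding of the copied coefficient.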
reg.predict([[5000]])
array([859554.79452055])
area_df = pd.read_csv("areas.csv")
area_df.head(3)
area
0 1000
1 1500
2 2300
p = reg.predict(area_df)
array([..., 1144708.90410959])
area_df['prices']=p
area_df
    area        prices
0   1000  3.164041e+05
1   1500  3.842979e+05
2   2300  4.929281e+05
3   3540  6.613048e+05
4   4120  7.400616e+05
5   4560  7.998082e+05
6   5490  9.260908e+05
7   3460  6.504418e+05
8   4750  8.256079e+05
9   2300  4.929281e+05
10  9000  1.402705e+06
11  8600  1.348390e+06
12  7100  1.144709e+06
area_df.to_csv("prediction.csv")
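By default, to_csv also writes the DataFrame index as an unnamed first column. If that column is not wanted in the output file, index=False leaves it out. The sketch below shows the difference on the first three rows of the table above (prices rounded for brevity) and returns the CSV as a string instead of writing a file.

```python
import pandas as pd

# First three rows of the prediction table above, prices rounded for brevity.
area_df = pd.DataFrame({
    "area": [1000, 1500, 2300],
    "prices": [316404.1, 384297.9, 492928.1],
})

# With no path given, to_csv returns the CSV text as a string.
with_index = area_df.to_csv()                # default: index is written
without_index = area_df.to_csv(index=False)  # index column omitted

print(with_index.splitlines()[0])     # header begins with an empty index field
print(without_index.splitlines()[0])
```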
Exercise
Predict Canada's per capita income in the year 2020. There is an exercise folder on GitHub
at the same level as this notebook; download it and you will find the
canada_per_capita_income.csv file. Use it to build a regression model and predict the per
capita income of Canadian citizens in the year 2020.
Answer
41288.69409442