Simple Linear Regression

The document demonstrates using linear regression to predict home prices and Canadian per capita income based on area and year respectively. It shows loading and exploring data, plotting scatter plots, fitting linear regression models, making predictions, and adding prediction columns.

Uploaded by

jaymehta1444

3/8/24, 4:48 PM MlYt1.ipynb - Colaboratory

Question in the video


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

df = pd.read_csv("homeprices.csv")
df

   area   price
0  2600  550000
1  3000  565000
2  3200  610000
3  3600  680000
4  4000  725000


Plot a scatterplot for the data available.

%matplotlib inline
plt.xlabel("area (sqr ft)") # labels the x-axis
plt.ylabel("price (US $)")
plt.scatter(df.area, df.price, color="red", marker="+")

<matplotlib.collections.PathCollection at 0x7bbdf44ec1c0>

We fit a linear regression model to the available data.

The features are passed as a 2D structure, selected with double brackets ( df[['area']] ).

The target variable is passed as a single 1D column ( df.price ).

reg = linear_model.LinearRegression()
reg.fit(df[['area']], df.price)
print(reg.coef_) # gives the coefficient (slope) of the linear equation
print(reg.intercept_) # gives the y-intercept of the equation

[135.78767123]
180616.43835616432
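
For a single feature, the fitted line is price = slope * area + intercept. As a rough cross-check of the two numbers above, the slope and intercept can be recomputed by hand with the ordinary least-squares formulas. This is a minimal sketch in plain Python, not what scikit-learn runs internally; the helper name `fit_line` is made up for illustration, and the data is copied from the five rows of homeprices.csv shown earlier.

```python
# Areas and prices copied from the homeprices.csv rows shown earlier.
areas = [2600, 3000, 3200, 3600, 4000]
prices = [550000, 565000, 610000, 680000, 725000]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points.

    Hypothetical helper for illustration only.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(areas, prices)
print(slope)      # should agree with reg.coef_[0] up to floating-point rounding
print(intercept)  # should agree with reg.intercept_ up to floating-point rounding
```

The slope is the estimated price increase per additional square foot, and the intercept is the price the line assigns to an area of zero.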

%matplotlib inline
plt.xlabel("area (sqr ft)")
plt.ylabel("price (US $)")
plt.scatter(df.area, df.price, color="red", marker="+")
plt.plot(df.area, reg.predict(df[['area']]), color='blue') # plots the Linear Regression Line.

https://colab.research.google.com/drive/19T8cNCsKWIzrDmgmVNTbrb_LOBxMtsp8#scrollTo=XJelj1L0xusa&printMode=true 1/4

[<matplotlib.lines.Line2D at 0x7bbdf4506aa0>]

print(reg.predict([[3300]])) # predicts the value of the given input.


print(reg.predict([[5000]]))

[628715.75342466]
[859554.79452055]
/usr/local/lib/python3.10/dist-packages/sklearn/base.py:439: UserWarning: X does not have valid feature names, but LinearRegression
warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/base.py:439: UserWarning: X does not have valid feature names, but LinearRegression
warnings.warn(
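
The UserWarning above appears because the model was fitted on a DataFrame with a named column ( area ), while predict received a plain nested list with no column names. The predictions are still correct. One way to avoid the warning, sketched below, is to pass the inputs as a DataFrame with the same column name. The variable name `query` is an assumption for illustration, and the training data is rebuilt inline from the rows of homeprices.csv so the snippet runs on its own.

```python
import pandas as pd
from sklearn import linear_model

# Training data copied from the homeprices.csv rows shown earlier.
df = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000],
    "price": [550000, 565000, 610000, 680000, 725000],
})

reg = linear_model.LinearRegression()
reg.fit(df[["area"]], df["price"])

# Hypothetical query frame: same column name as the training feature,
# so scikit-learn can match feature names and does not warn.
query = pd.DataFrame({"area": [3300, 5000]})
predictions = reg.predict(query)
print(predictions)
```

The two values printed should match the two predictions shown above.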

 

d = pd.read_csv("areas.csv")
d.head(3)

   area
0  1000
1  1500
2  2300


p = reg.predict(d) # predicting values for an array input


p

array([ 316404.10958904,  384297.94520548,  492928.08219178,
        661304.79452055,  740061.64383562,  799808.21917808,
        926090.75342466,  650441.78082192,  825607.87671233,
        492928.08219178, 1402705.47945205, 1348390.4109589 ,
       1144708.90410959])

d['prices'] = p # adding a column in the sheet for the predicted values


d


    area        prices
0   1000  3.164041e+05
1   1500  3.842979e+05
2   2300  4.929281e+05
3   3540  6.613048e+05
4   4120  7.400616e+05
5   4560  7.998082e+05
6   5490  9.260908e+05
7   3460  6.504418e+05
8   4750  8.256079e+05
9   2300  4.929281e+05
10  9000  1.402705e+06
11  8600  1.348390e+06
12  7100  1.144709e+06
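
The prices column is displayed in scientific notation because the predictions are floats with many decimal places. If plain currency-style numbers are preferred, one option is to round the column before displaying or exporting it. The sketch below is an assumption about presentation, not part of the original notebook; it uses only the first three rows shown above, and the name `formatted` is made up for illustration.

```python
import pandas as pd

# First three rows of the predictions shown above.
d = pd.DataFrame({
    "area": [1000, 1500, 2300],
    "prices": [316404.10958904, 384297.94520548, 492928.08219178],
})

# Round to two decimals (cents) so the stored values are easier to read.
d["prices"] = d["prices"].round(2)

# Render as text with thousands separators instead of scientific notation.
formatted = d.to_string(index=False, float_format=lambda v: f"{v:,.2f}")
print(formatted)
```

Rounding changes the stored values, so it is best done on a copy if the full-precision predictions are needed later.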


d.to_csv("prediction.csv", index = False) # exporting the csv file with no index column

Exercise

df1=pd.read_csv("/content/canada_per_capita_income.csv")
df1.head()

   year          pci
0  1970  3399.299037
1  1971  3768.297935
2  1972  4251.175484
3  1973  4804.463248
4  1974  5576.514583


plt.xlabel("Year")
plt.ylabel("Per Capita Income")
plt.scatter(df1.year, df1.pci, color='red', marker='+')

<matplotlib.collections.PathCollection at 0x7bbdf4705960>

reg1 = linear_model.LinearRegression()
reg1.fit(df1[['year']], df1.pci)
print(reg1.coef_)
print(reg1.intercept_)

[828.46507522]
-1632210.7578554575

plt.xlabel("Year")
plt.ylabel("Per Capita Income")
plt.scatter(df1.year, df1.pci, color='red', marker='+')
plt.plot(df1.year, reg1.predict(df1[['year']]), color='blue')

[<matplotlib.lines.Line2D at 0x7bbdf4798ca0>]

reg1.predict([[2020]])

/usr/local/lib/python3.10/dist-packages/sklearn/base.py:439: UserWarning: X does not have valid feature names, but LinearRegression
warnings.warn(
array([41288.69409442])
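
The 2020 prediction can be checked by plugging the year into the fitted line, pci = slope * year + intercept, using the coefficient and intercept printed earlier. This is a small arithmetic sketch; the function name `predict_pci` is hypothetical, and because the printed coefficient is rounded to eight decimals the result matches the model's output only approximately.

```python
# Values copied from the printed reg1.coef_ and reg1.intercept_ above.
slope = 828.46507522
intercept = -1632210.7578554575


def predict_pci(year):
    """Evaluate the fitted line for a given year (hypothetical helper)."""
    return slope * year + intercept


pci_2020 = predict_pci(2020)
print(round(pci_2020, 2))  # about 41288.69
```

The large negative intercept is the value the line assigns to year 0, far outside the data, so it has no practical meaning on its own.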

 
