
5 Multiple Linear Regression

The document uses multiple linear regression to predict home prices from area, number of bedrooms, and age, and applies the fitted model to two sample homes. It then sets a task on a hiring dataset: predict salaries from a candidate's experience, test score, and interview score, for two sample candidates.

Uploaded by

Sudheer Redus
LINEAR REGRESSION USING MULTIPLE VARIABLES / MULTIPLE LINEAR REGRESSION

Here, we use more than one independent variable (x1, x2, x3, …).

The dependent variable is y = m1x1 + m2x2 + m3x3 + … + b, where m1, m2, m3, … are the coefficients and b is the intercept.

Problem: We are given home prices in Monroe Township, NJ (USA). We need to predict the prices
of the following homes:
a) 3000 sq ft area, 3 bedrooms, 40 years old
b) 2500 sq ft area, 4 bedrooms, 5 years old

dataset
homeprices.csv

Note: One value in the bedrooms column is missing from the dataset, so we have to clean the
data first. We find the median of that column and substitute it into the missing cell.
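
The cleaning step described in the note can be tried on a small example first. This is a minimal sketch: the `toy` DataFrame and its values are made up for illustration and are not taken from homeprices.csv.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical bedrooms column with one missing value (NaN).
toy = pd.DataFrame({"bedrooms": [3.0, 4.0, np.nan, 3.0, 5.0]})

# median() skips NaN by default; floor it so the bedroom count is a whole number.
median_bedrooms = math.floor(toy["bedrooms"].median())

# Replace the missing cell with the floored median.
filled = toy["bedrooms"].fillna(median_bedrooms)

print(median_bedrooms)
print(filled.tolist())
```

Flooring is a choice made here because a fractional number of bedrooms is not meaningful; rounding would be an equally reasonable option.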

Linear equation

price = m1 * area + m2 * bedrooms + m3 * age + b

program

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

# load the data into a DataFrame
df = pd.read_csv("E://ds/2-multiple-regression/homeprices.csv")
df

# fill the missing value (NaN) with the median of the bedrooms column
import math
median_bedrooms = math.floor(df.bedrooms.median())
median_bedrooms # 3
# replace the NaN cell in the bedrooms column with this median value
df.bedrooms = df.bedrooms.fillna(median_bedrooms)
df

# create the linear regression model
# pass the independent variables first, then the dependent variable
reg = linear_model.LinearRegression()
reg.fit(df[['area', 'bedrooms', 'age']], df['price']) # fitting means training

# coefficients, i.e. the m1, m2, m3 values
reg.coef_ # 137.25, -26025. , -6825.

# intercept
reg.intercept_ # 383725.

# predict the price of a 3000 sq ft, 3-bedroom, 40-year-old house
reg.predict([[3000, 3, 40]]) # 444400.

# predict the price of a 2500 sq ft, 4-bedroom, 5-year-old house
reg.predict([[2500, 4, 5]]) # 588625.
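
As a cross-check, the two predictions can be reproduced by hand from the linear equation, using the coefficient and intercept values quoted in the comments of the program above. This is a minimal sketch; the helper name `predict_price` is made up for illustration.

```python
# Coefficients and intercept copied from the comments in the program above.
m1, m2, m3 = 137.25, -26025.0, -6825.0
b = 383725.0


def predict_price(area, bedrooms, age):
    """Evaluate price = m1 * area + m2 * bedrooms + m3 * age + b."""
    return m1 * area + m2 * bedrooms + m3 * age + b


print(predict_price(3000, 3, 40))  # 444400.0
print(predict_price(2500, 4, 5))   # 588625.0
```

Both results match the values returned by reg.predict, which shows that the fitted model is just this equation with the learned m1, m2, m3 and b.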

Task on Multiple Linear Regression

The hiring.csv file contains hiring statistics for a firm: each candidate's experience, written test score,
and personal interview score. Based on these 3 factors, HR decides the salary. Given this data,
you need to build a machine learning model for the HR department that can help them decide salaries
for future candidates. Use the model to predict salaries for the following candidates:

a) 2 yr experience, 9 test score, 6 interview score

b) 12 yr experience, 10 test score, 10 interview score

dataset
hiring.csv
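
One possible starting point for the task, following the same steps as the home price program, is sketched below. The column names (`experience`, `test_score`, `interview_score`, `salary`) and the helper names `fit_salary_model` and `predict_salary` are assumptions, not taken from the file. The sketch also assumes every feature column is already numeric; if experience is stored as text, it would have to be converted first.

```python
import pandas as pd
from sklearn import linear_model

# Assumed column names; adjust them to match the header row of hiring.csv.
FEATURES = ["experience", "test_score", "interview_score"]
TARGET = "salary"


def fit_salary_model(df):
    """Fill missing feature values with the column median, then fit the model."""
    cleaned = df.copy()
    for col in FEATURES:
        cleaned[col] = cleaned[col].fillna(cleaned[col].median())
    reg = linear_model.LinearRegression()
    reg.fit(cleaned[FEATURES], cleaned[TARGET])
    return reg


def predict_salary(reg, experience, test_score, interview_score):
    """Predict the salary for one candidate."""
    row = pd.DataFrame([[experience, test_score, interview_score]], columns=FEATURES)
    return float(reg.predict(row)[0])
```

With the real file, usage would look like `reg = fit_salary_model(pd.read_csv("hiring.csv"))` followed by `predict_salary(reg, 2, 9, 6)` and `predict_salary(reg, 12, 10, 10)`.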
