House Price Prediction

Abstract:- This paper demonstrates the use of machine learning algorithms for predicting real estate/house prices on two real datasets downloaded from Kaggle: the Boston dataset created by Harrison, D. and Rubinfeld, D.L., and the Melbourne dataset created by Anthony Pino. To this day, the literature on machine learning prediction of house prices in India is extremely limited. This paper reviews the use of existing machine learning algorithms on two very different datasets and attempts to implement the resulting prediction engine for real-life use. The findings indicate that the choice of algorithm can drastically change accuracy and that a poor dataset can negatively affect the predictions; they also provide sufficient evidence of which algorithm is best suited to this task.

I. INTRODUCTION
Machine Learning (ML) is a vital aspect of present-day business and research. It progressively improves the performance of computer systems by using algorithms and neural network models. Machine learning algorithms automatically build a mathematical model from sample data, also referred to as "training data", and use it to make decisions without being specifically programmed to make those decisions.

People and real estate agencies buy or sell houses: people buy to live in or as an investment, and agencies buy to run a business. Either way, we believe everyone should get exactly what they pay for. Over-valuation and under-valuation in housing markets have always been an issue, and proper detection measures are lacking. Broad measures, such as house/real-estate price-to-rent ratios, give a first pass, but deciding the issue requires in-depth analysis and judgment. This is where machine learning comes in: by training an ML model on hundreds of thousands of data points, a solution can be developed that is powerful enough to predict prices accurately and can cater to everyone's needs.

The primary aim of this paper is to use these machine learning techniques and curate them into ML models that can then serve users. The main objective of a buyer is to find their dream house, one that has all the amenities they need. They also look for these houses/real estates with a price in mind, and there is no guarantee that they will get the property at a fair price rather than an overpriced one. Similarly, a seller looks for a figure to put on the estate as a price tag, and this cannot be a wild guess; a lot of research is needed to arrive at the valuation of a house. Additionally, there is the possibility of underpricing the property. Predicting the price for these users may help them buy or sell estates at the price they deserve, no more and no less.
III. LITERATURE REVIEW

Real estate has become more than a necessity in the 21st century; it represents something much more nowadays, not only for people looking to buy real estate but also for the companies that sell it. Real estate property is not only a basic need but today also represents a person's wealth and prestige. Investment in real estate generally seems profitable because property values do not decline rapidly. Changes in real estate prices affect household investors, bankers, policymakers, and many others, and investment in the real estate sector is an attractive choice. Predicting real estate value is therefore an important economic index.

One study suggests that every organization in today's real estate business operates to achieve a competitive edge over its competitors, and that there is a need to simplify the process for an ordinary person while still providing the best results. The authors proposed using machine learning and artificial intelligence techniques to develop an algorithm that can predict housing prices from certain input features. The business application of this algorithm is that classified websites can use it directly to predict the prices of newly listed properties by taking some input variables and producing a correct and justified price, i.e. avoiding price inputs from customers and thus keeping errors out of the system. The work used the Google Colab/Jupyter IDE. Jupyter is an open-source web application that lets us create and share documents containing live code, visualizations, equations, and narrative text, and it offers tools for data cleaning, data transformation, numerical simulation, statistical modeling, data visualization, and machine learning.

[10] designed a system that helps people find a price close to the precise value of a property. Users can enter their requirements and get the prices of the desired houses; they can also get a sample plan of a house for reference. In [5], the housing values of a Boston suburb are analyzed and forecast with SVM, LSSVM, and PLS methods and the corresponding characteristics. After removing the missing samples from the original data set, 400 samples are treated as training data and 52 samples as test data.

As per [1]'s findings, the best accuracy was provided by the Random Forest Regressor, followed by the Decision Tree Regressor. A similar result was produced by Ridge and Linear Regression, with a very slight reduction for Lasso. Across all groups of feature selections there is no extreme difference, regardless of whether the groups are strong or weak. This is a good sign that buying prices alone can be used to predict selling prices without considering other features, which helps avoid model over-fitting. A reduction in accuracy is, however, apparent for the very weak features group. The same pattern of results is visible in the Root Mean Square Error (RMSE) for all feature selections.

[2] observed that their data set took more than one day to prepare. Instead of performing the computations sequentially, multiple processors could be used to parallelize the computations involved, which could reduce both the preparation time and the prediction time. To add more functionality to the model, users could be given the option to select a district or locality for which heat maps are generated, instead of entering an index.

Another work used a data set of 100 houses with several parameters, using 50 percent of the data to train the model and 50 percent to test it; the results were quite accurate and were validated with different parameters as well. Not using PSO makes it easier to train models on complex problems, and hence regression is used. Another study experimented with the most fundamental machine learning algorithms, such as the decision tree classifier, decision tree regression, and multiple linear regression, implemented with the Scikit-Learn machine learning tool. That work helps users predict both the availability of houses in a city and their prices.

Another study used machine learning algorithms to predict house prices and described a step-by-step procedure for analyzing the dataset. The feature sets were given as input to four algorithms, and a CSV file was generated containing the predicted house prices. The authors expressed the need to use a mix of these models, since a linear model gives a high bias whereas a high-complexity model gives a high variance. The outcome of this study can be used in the annual revision of the guideline value of land, which may add more revenue to the State Government when such transactions are made. Finally, one study concludes from experiments with various machine learning algorithms that random forest and gradient boosted trees perform better, with higher accuracy percentages and fewer error values; when the predictions are compared with the labels, these algorithms predict well.

IV. PROPOSED WORK

The purpose of this system is to determine the price of a house from the various features given as input by the user. These features are passed to the ML model, which produces a prediction based on how they affect the label. This is done by first searching for an appropriate dataset that suits the needs of both the developer and the user. Once the dataset is finalized, it goes through data cleaning, where all data that is not needed is eliminated and the raw data is turned into a .csv file. The data then goes through preprocessing, where missing values are handled and, if needed, label encoding is applied. Next, it goes through data transformation, where it is converted into a NumPy array so that it can finally be used to train the model. During training, various machine learning algorithms are used, their error rates are extracted, and consequently the algorithm and model that yield the most accurate predictions are finalized (a sketch of this comparison step is given below).
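
As an illustration of this model-selection step, the following minimal sketch (Python with scikit-learn) trains a few commonly used regressors and keeps the one with the lowest test error. The file name "housing_cleaned.csv" and the "price" column are placeholders; the paper does not specify the exact schema.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Placeholder file and column names; the actual dataset schema is not given here.
data = pd.read_csv("housing_cleaned.csv")
X = data.drop(columns=["price"]).to_numpy()   # features as a NumPy array
y = data["price"].to_numpy()                  # label (house price)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

candidates = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Train each candidate, record its test RMSE, and keep the most accurate model.
best_name, best_model, best_rmse = None, None, float("inf")
for name, model in candidates.items():
    model.fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    print(f"{name}: RMSE = {rmse:.2f}")
    if rmse < best_rmse:
        best_name, best_model, best_rmse = name, model, rmse

print("Selected model:", best_name)
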
Users and companies will be able to log in and fill in a form describing the various attributes of the property whose price they want to predict. After a thorough selection of attributes, the form is submitted. The data entered by the user is then sent to the model, and within seconds the user can view the predicted price of the property they entered (a sketch of such a prediction call is given below).
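
The sketch below shows, under stated assumptions, how a submitted form could be turned into a prediction. The form fields ("rooms", "bathroom", "landsize", "distance") are hypothetical and only illustrate the idea; they are not taken from the paper.

import pandas as pd

# Hypothetical form fields; the paper does not list the exact attributes collected.
user_form = {
    "rooms": 3,
    "bathroom": 2,
    "landsize": 250.0,
    "distance": 7.5,
}

def predict_price(model, form):
    """Convert the submitted form into a one-row frame and return the model's price estimate."""
    features = pd.DataFrame([form])  # columns must match what the model was trained on
    return float(model.predict(features)[0])

# Example usage (assuming a model trained on these same columns):
# print(predict_price(best_model, user_form))
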
4.1 Block Diagram of the System
The block diagram above shows the traditional machine learning approach. It consists of two sections: training and testing. The training section has the following components: the label, the input, the feature extractor, and the machine learning algorithm. The testing section has the following components: the input, the feature extractor, the regression model, and the output label.
Input: The input consists of data collected from various sources.
Feature Extractor: Only the important features which affect the prediction results are kept; other unnecessary attributes, such as ID or name, are discarded.
Features: After feature extraction, only those inputs which contribute most to the model's prediction are retained.
Machine Learning Algorithm: The ML Algorithm is the method by which an AI
system performs its task, and is most commonly used to predict output values from
given input values. Regression is one of the main processes of machine learning.
The Regression Model: The regression model consists of a set of machine-learning
methods that allow us to predict a label variable (y) based on the values of one or
more attribute/feature variables (x). Briefly, the goal of a regression model is to build
a mathematical equation that defines y as a function of the x variables.
Label: The label is the output obtained from the model after training. The data obtained from the dataset is first given as training input and the relevant training features are extracted. These training features are preprocessed to obtain a normalized dataset, and each data row is labeled. The training dataset is fed to the machine learning algorithm, whose result is fed to the regression model, thus producing a trained model or trained regressor. This trained regressor can take new data, i.e. the features extracted from the test input, and predict its output label (an end-to-end sketch of this flow is given below).
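
To tie the diagram's components together, here is a minimal end-to-end sketch assuming a tabular dataset with placeholder columns "id", "name", and "price"; the diagram itself names only the components, not the schema.

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Placeholder file and column names.
data = pd.read_csv("housing_cleaned.csv")

# Feature extractor: discard attributes that do not affect the prediction, such as ID or name.
features = data.drop(columns=["id", "name", "price"])
labels = data["price"]

X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=42)

# Preprocess the training features to obtain a normalized dataset.
scaler = StandardScaler()
X_train_norm = scaler.fit_transform(X_train)
X_test_norm = scaler.transform(X_test)

# The machine learning algorithm produces a trained regressor from features and labels.
regressor = RandomForestRegressor(random_state=42)
regressor.fit(X_train_norm, y_train)

# The trained regressor takes the extracted test features and predicts their output label.
predicted_prices = regressor.predict(X_test_norm)
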
Handling Missing Values
The Boston dataset had only five missing values, whereas the Melbourne dataset had thousands. Dropping the null values was not an option, since doing so negatively affected the accuracy. Instead, these null values were handled by replacing them with the median value of the column. The replacement was implemented with the simple imputer inside the pipeline itself, so that any missing value encountered in the future is handled as soon as the data passes through the pipeline. Additionally, the Melbourne dataset had missing label/price values, which had to be dropped for better results (a sketch of this imputation step is given below).
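
A minimal sketch of this imputation step, assuming the Melbourne data is loaded from a CSV with a "Price" label column (placeholder names), could look as follows.

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder file and column names.
data = pd.read_csv("melbourne.csv")

# Rows with a missing label (price) are dropped rather than imputed.
data = data.dropna(subset=["Price"])
numeric_features = data.drop(columns=["Price"]).select_dtypes(include="number")

# Median imputation sits inside the pipeline, so any future missing value is filled in
# as soon as the data passes through it.
preprocessing = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])

X = preprocessing.fit_transform(numeric_features)
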

5.2 Data Preprocessing

Train and Test Split


We split the dataset into two sets, i.e. a training set and a testing set. The training set consists of 80% of the dataset and the testing set has the remaining 20%. Some columns had only two distinct values, and we wanted to make sure that the split divided these values in equal proportions. Therefore, we used a stratified shuffle split for the train/test split, which gave better results (a sketch of this split is given below).
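
A minimal sketch of the stratified split, assuming the Boston data is in a CSV and using the binary CHAS column as a stand-in for whichever two-valued column the split was stratified on, could look as follows.

import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Placeholder file name; CHAS is a two-valued column in the Boston dataset and is
# only an assumed choice of stratification column.
data = pd.read_csv("boston.csv")

split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
for train_index, test_index in split.split(data, data["CHAS"]):
    train_set = data.iloc[train_index]
    test_set = data.iloc[test_index]

# Both sets now contain the two distinct values in (approximately) equal proportion.
print(len(train_set), "training rows /", len(test_set), "test rows")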

Figure 5.2.a Stratified Shuffle Split in Boston dataset

Figure 5.2.b Stratified Shuffle Split in Melbourne dataset
