Project Immo en

Real estate analysis and prediction
Younes Cherkaoui
The Beijing University of Technology
Introduction :
The hazards of a company's life and the rapid changes in society make forecasting
complex and uncertain. Mathematics and analysis can reduce the margin of error, but
they are not enough to guarantee investors the outcome of their investment. This is
why a recommendation tool is useful for identifying, or even anticipating, an
opportunity in the real estate market.
This is the real estate investment recommendation system that I built for my final
project during my university exchange in Beijing.
Plan :
1. Which data and model to use for this project :
- The model strongly depends on the outputs that I want, so I need to define
them first.
- Once the type of output data is defined, I need to determine the goal
(prediction, optimization, trend, ...) in order to choose the type of model
best suited to that output.
2. Pre-processing :
2.1 Data collection :
I collected different types of data that I personally considered important for
the prediction.
My dataset contains house prices by geographical area in Paris, the number of
real estate transactions by geographical area, and the long-term borrowing rate
that applies to real estate loans. I also wanted to add a more unusual input:
income migration in Paris. There may be a link between the migration of certain
income brackets and the evolution of prices.
2.2 Extract, Transform, Load (ETL) :
Collecting these data, especially the salary migration data, required some
transformations. In France, wages are declared by salary bracket. These brackets
have changed over time, as can be seen for 2004 and 2014, and I had to take this
change into account to carry out my study.
In addition, my goal was to see whether there is a link between the migration of
salary levels and the price of real estate in the same geographical area.
[Figures: salary brackets in 2004 and in 2014]
Since the salary brackets differ between the two years, I first had to transform
them upstream into comparable brackets.
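A minimal sketch of one such transformation, assuming that households are spread uniformly inside each bracket: the counts of one year are re-allocated onto a common set of bracket edges. The edges and counts below are made-up placeholders, not the actual French brackets, and `rebin_counts` is a hypothetical helper name.

```python
import numpy as np

def rebin_counts(old_edges, counts, new_edges):
    """Re-allocate counts from old brackets onto new brackets.

    Assumes values are uniformly distributed inside each old bracket.
    """
    old_edges = np.asarray(old_edges, dtype=float)
    new_edges = np.asarray(new_edges, dtype=float)
    counts = np.asarray(counts, dtype=float)
    new_counts = np.zeros(len(new_edges) - 1)
    for i in range(len(counts)):
        lo, hi = old_edges[i], old_edges[i + 1]
        for j in range(len(new_counts)):
            # Length of the overlap between old bracket i and new bracket j.
            overlap = min(hi, new_edges[j + 1]) - max(lo, new_edges[j])
            if overlap > 0:
                new_counts[j] += counts[i] * overlap / (hi - lo)
    return new_counts

# Hypothetical brackets (in euros) and household counts.
edges_2004 = [0, 10000, 20000, 40000]
counts_2004 = [100, 200, 300]
edges_common = [0, 15000, 40000]

rebinned = rebin_counts(edges_2004, counts_2004, edges_common)
print(rebinned)
```

The total number of households is preserved as long as the common edges cover the same range as the original ones.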
3 Machine Learning Algorithm :
3.1 Multiple Linear Regression
I use the most basic and most widely used statistical technique for predictive
modeling. It gives us an equation in which our features are the independent
variables.
My linear regression equation looks like this :
$$ Y = \sum_{i} \theta_i X_i $$
Where :
Y corresponds to the dependent variable
X corresponds to the independent variables
θ corresponds to the coefficients (basically the weights assigned to the features)
In my model I have 3 variables, defined as follows :

Data columns : Time, AreaCode, Migration, Price, Rate
INPUT : {X1 = Income migration ; X2 = Real Estate Price ; X3 = Loan Rate}
OUTPUT : {Area Code, Predicted Price}
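Model :
Each label value is written as a linear combination of the income migration, the
previous price and the loan rate, and yields the prediction for the next period:
$$
\begin{aligned}
Y_2 &= \theta_{11}\,\mathrm{Income} + \theta_{12}\,Y_1 + \theta_{13}\,\mathrm{LoanRate} &&\longrightarrow\ \tilde{Y}_3 \\
Y_3 &= \theta_{21}\,\mathrm{Income} + \theta_{22}\,Y_2 + \theta_{23}\,\mathrm{LoanRate} &&\longrightarrow\ \tilde{Y}_4
\end{aligned}
$$
For the whole series, with the label values on the left and the predictions on the right:
$$
\underbrace{\begin{pmatrix} Y_2 \\ Y_3 \\ Y_4 \\ \vdots \\ Y_{t-1} \\ Y_t \end{pmatrix}}_{\text{label value}}
= \sum_{i=1}^{t-1} \theta_i X_i
\;\longrightarrow\;
\underbrace{\begin{pmatrix} \tilde{Y}_3 \\ \tilde{Y}_4 \\ \tilde{Y}_5 \\ \vdots \\ \tilde{Y}_t \\ \tilde{Y}_{t+1} \end{pmatrix}}_{\text{prediction}}
$$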
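As an illustration of how such coefficients can be estimated, here is a minimal sketch using ordinary least squares with NumPy. The series are synthetic placeholders, not the project's data, and fitting without an intercept is an assumption that mirrors the equations of this section rather than a confirmed detail of the original implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic series for one area (placeholders, not the project's data).
T = 40
income = rng.normal(0.0, 1.0, T)        # income migration
loan_rate = rng.uniform(1.0, 4.0, T)    # loan rate
price = np.empty(T)
price[0] = 100.0
true_theta = np.array([0.5, 1.0, -0.2])
for t in range(T - 1):
    price[t + 1] = true_theta @ np.array([income[t], price[t], loan_rate[t]])

# Features observed at time t, label observed at time t + 1.
X = np.column_stack([income[:-1], price[:-1], loan_rate[:-1]])
y = price[1:]

# Ordinary least squares: theta minimizes ||X theta - y||^2.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ theta
print(theta)
```

Because the synthetic series is generated without noise, the fit recovers the coefficients used to build it; with real data the estimate would only be approximate.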
Loss function :
To evaluate the accuracy of my model I use the MSE (mean squared error), which
assesses the quality of a predictor.
If a vector of n predictions is generated from a sample of n data points on all
variables, and Y is the vector of observed values of the variable being
predicted, then the within-sample MSE of the predictor is computed as :
$$ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \tilde{Y}_i \right)^2 $$
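A minimal sketch of this computation with NumPy; `mse` is an illustrative helper name and the numbers are placeholders.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Two exact predictions and one that is off by 2: (0 + 0 + 4) / 3.
example = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(example)
```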
My model :
With the linear regression model I have two ways to make my predictions :
- First, I take the theta coefficients obtained from training and apply them to
the observed data to compute each prediction.
The results show that this model is not really good: the prediction simply
follows the observed price with a lag of one step.
For a good model the MSE should converge; in my model the MSE behaves like a
stochastic variable. However, its amplitude decreases, which is a good
point.
- Second, at each iteration I use the last predicted price, obtained from the
theta coefficients, as the input price.
Here the predicted prices are a bit different. However, every value is derived from a
single starting value, which means that I need only one real price to make these
predictions. The model follows the trend, which is good.
Here the MSE is not good: its amplitude does not decrease. Overall, however,
the model does not predict individual values but a trend.
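The difference between the two prediction modes can be sketched as follows. The coefficients and series are placeholder values chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def one_step_predictions(theta, income, price, loan_rate):
    """Predict price[t + 1] from the observed price[t] at every step."""
    X = np.column_stack([income[:-1], price[:-1], loan_rate[:-1]])
    return X @ theta

def recursive_predictions(theta, income, first_price, loan_rate):
    """Predict the whole path from a single observed price.

    Each prediction is fed back as the price input of the next step.
    """
    preds = []
    current = first_price
    for t in range(len(income) - 1):
        current = theta @ np.array([income[t], current, loan_rate[t]])
        preds.append(current)
    return np.array(preds)

# Placeholder inputs; theta would come from the trained model.
theta = np.array([0.5, 0.9, -0.1])
income = np.array([1.0, 2.0, 0.0, 1.0])
loan_rate = np.array([2.0, 2.0, 3.0, 3.0])
price = np.array([100.0, 95.0, 90.0, 85.0])

direct = one_step_predictions(theta, income, price, loan_rate)
rolled = recursive_predictions(theta, income, price[0], loan_rate)
print(direct, rolled)
```

The two paths agree on the first step and then diverge, because the recursive mode accumulates its own errors instead of being corrected by each new observed price.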
3.2 LSTM
For the second part of my project I use a recurrent neural network (RNN), a
class of artificial neural networks in which connections between nodes form a
directed cycle. LSTM (Long Short-Term Memory) is a variant of RNN that has
successfully tackled many problems in AI.
I use LSTM as the backbone of my solution to get better accuracy, with the same
labeled data (input, output) as for the first model.
The input is a data matrix of dimension (132 x 81) composed of 20 vectors, with
one vector for each area, composed as follows :
INPUT : {X1 = Income migration ; X2 = Real Estate Price ; X3 = Loan Rate} (132x81)
OUTPUT : {Area Code, Predicted Price} (2x1)
Model :
I did not need to do a preliminary study of the correlation between my input
data and the output data. Since I use a deep learning model, the algorithm assigns
weights to the different inputs as they are processed in its layers, so an input
that is not very important for the analysis ends up with little influence.
These weights sit between the values that feed into the block (including the
input vector and the output from the previous time step) and each of the gates.
Thus, the LSTM block determines how to maintain its memory as a function of
those values, and training its weights causes the LSTM block to learn the function
that minimizes the loss. LSTM blocks are usually trained with backpropagation
through time.
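To make the role of the gates concrete, here is a minimal NumPy sketch of the forward pass of one standard LSTM cell. It is not the network used in this project: the weights are random, the hidden size is arbitrary, and the gate ordering in the stacked matrices is a convention chosen for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell.

    W: (4 * hidden, n_inputs), U: (4 * hidden, hidden), b: (4 * hidden,)
    Gate order in the stacked weights: input, forget, output, candidate.
    """
    hidden = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0 * hidden:1 * hidden])   # input gate
    f = sigmoid(z[1 * hidden:2 * hidden])   # forget gate
    o = sigmoid(z[2 * hidden:3 * hidden])   # output gate
    g = np.tanh(z[3 * hidden:4 * hidden])   # candidate memory
    c = f * c_prev + i * g                  # updated cell memory
    h = o * np.tanh(c)                      # block output
    return h, c

rng = np.random.default_rng(0)
n_inputs, hidden = 3, 4   # 3 features as in the report; hidden size is arbitrary
W = rng.normal(0, 0.1, (4 * hidden, n_inputs))
U = rng.normal(0, 0.1, (4 * hidden, hidden))
b = np.zeros(4 * hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
sequence = rng.normal(0, 1, (5, n_inputs))  # 5 time steps of placeholder data
for x in sequence:
    h, c = lstm_step(x, h, c, W, U, b)
print(h)
```

Training would adjust `W`, `U` and `b` by backpropagation through time; only the forward computation is shown here.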
Loss function :
As for the first model, I use the MSE to evaluate the accuracy of my model, since
the MSE assesses the quality of a predictor.
My model :
This is the first time I have implemented a neural network, and I admit that it is
very exciting. I encountered several problems, especially with the change of
dimension between my input and output data. My model works and shows a greater
sensitivity than the linear regression model: predicted values vary by as little as
one thousandth of a percent for the smallest variation and by 6% for the largest
variation.
Graph :
My error decreases with each iteration of backpropagation. I tried different
numbers of units and different numbers of epochs to get the best possible
accuracy.
Regarding the predictions, as noted above, the predicted price values change very
little. This problem may come from the small size of my training set: training a
neural network properly requires a large amount of data, which is not the case
here.
3.3 Final part :
My goal was not only to predict the price: I need this prediction to find the
sequence of actions (buy / sell / do nothing) that captures the best opportunities
and maximizes my profit.
For this section, consider the following dynamic programming formulation. Time
is discrete, $t \in \{0, 1, \dots, T\}$; $x_t \in X$ is the state at time t and
$a_t \in A_t$ is the action at time t.
I need to define my plant equation, meaning that the state evolves according to
functions
$$ f_t : X \times A_t \to X $$
$$ x_{t+1} = f_t(x_t, a_t) $$
A policy $\pi$ chooses an action $\pi_t$ at each time t. The (instantaneous) reward
for taking action a in state x at time t is $r_t(x, a)$, and $r_T(x)$ is the reward
for terminating in state x at time T.
Given the initial state $x_0$, a dynamic program is the optimization
$$ W(x_0) := \max_{a} R(a) $$
$$ R(a) := \sum_{t=0}^{T-1} r_t(x_t, a_t) + r_T(x_T) $$
Bellman's equation :
$W_T(x_T) = r_T(x_T)$, and for $t = T-1, \dots, 1, 0$
$$ W_t(x_t) = \sup_{a_t \in A_t} \left\{ r_t(x_t, a_t) + W_{t+1}(x_{t+1}) \right\} $$
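Bellman's equation can be solved by backward induction when the state and action sets are finite. The following is a minimal sketch on a toy problem (a walk on the states 0, 1, 2 where the reward is the state reached); the toy problem and the function names are illustrative assumptions, not part of the project.

```python
def backward_induction(states, actions, f, r, r_T, T):
    """Solve W_t(x) = max_a [ r(t, x, a) + W_{t+1}(f(t, x, a)) ] backwards in time.

    actions(t, x) returns the feasible actions; f must map into `states`.
    Returns the value tables W[t][x] and the greedy policy pi[t][x].
    """
    W = [dict() for _ in range(T + 1)]
    pi = [dict() for _ in range(T)]
    for x in states:
        W[T][x] = r_T(x)
    for t in range(T - 1, -1, -1):
        for x in states:
            best_a, best_v = None, float("-inf")
            for a in actions(t, x):
                v = r(t, x, a) + W[t + 1][f(t, x, a)]
                if v > best_v:
                    best_a, best_v = a, v
            W[t][x] = best_v
            pi[t][x] = best_a
    return W, pi

# Toy problem: walk on {0, 1, 2}, reward equals the state reached.
states = [0, 1, 2]
actions = lambda t, x: [a for a in (-1, 0, 1) if 0 <= x + a <= 2]
f = lambda t, x, a: x + a
r = lambda t, x, a: float(x + a)
r_T = lambda x: 0.0

W, pi = backward_induction(states, actions, f, r, r_T, T=3)
print(W[0], pi[0])
```

Starting from state 0 with three steps, the best policy moves up immediately and collects rewards 1, 2 and 2.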
So I have defined my constraints as follows :
Plant equation :
Portfolio value at time zero : $x_0 = 50000$
Portfolio value at time t : $P_t = x_0 + a_{t-1} \cdot P_{t-1}$
Action set : {-1, 0, 1}, corresponding respectively to {buy, do nothing, sell}
If at time s $a_s = -1$ (buy), then $a_t \in \{0, 1\}$ for $s < t$; indeed, you cannot
sell if you have not bought before.
Objective :
$$ \max_{a_t} P_t $$
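A minimal sketch of how these constraints can be turned into a dynamic program over a series of predicted prices. Several points are assumptions of this sketch and not stated in the report: at most one unit is held at a time, there are no transaction costs, the budget is never binding, and a unit still held at the end is valued at the last price. The prices are placeholders and `best_plan` is a hypothetical name.

```python
from functools import lru_cache

def best_plan(prices, cash0=50000.0):
    """Choose actions in {-1: buy, 0: do nothing, 1: sell} for one asset.

    Assumptions of this sketch: at most one unit is held at a time, there
    are no transaction costs, and a unit still held at the end is valued
    at the last price.
    """
    T = len(prices) - 1

    @lru_cache(maxsize=None)
    def W(t, holding):
        # Terminal reward: mark the held unit to the last price.
        if t == T:
            return (prices[T] if holding else 0.0), ()
        # You can only sell what you hold, and only buy when you hold nothing.
        feasible = (0, 1) if holding else (-1, 0)
        best = None
        for a in feasible:
            reward = a * prices[t]            # buying costs, selling pays
            next_holding = holding if a == 0 else (a == -1)
            value, plan = W(t + 1, next_holding)
            if best is None or reward + value > best[0]:
                best = (reward + value, (a,) + plan)
        return best

    gain, plan = W(0, False)
    return cash0 + gain, list(plan)

predicted_prices = [100.0, 90.0, 120.0, 110.0, 130.0]  # placeholder forecasts
final_value, plan = best_plan(predicted_prices)
print(final_value, plan)
```

On this placeholder series the plan is to wait, buy at 90, sell at 120, and buy again at 110, ending with a unit valued at 130.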
References :
[1] House Price Prediction Using LSTM. Xiaochen Chen et al.
[2] Predicting a house's selling price through inflating its previous selling price.
Andrew T. Brint et al.
Web sites :
- https://arxiv.org/ftp/arxiv/papers/1709/1709.08432.pdf
- https://www.analyticsvidhya.com/blog/2017/12/fundamentals-of-deep-learning-introduction-to-lstm/
- https://www.analyticsvidhya.com/blog/2018/10/predicting-stock-price-machine-learningnd-deep-learning-techniques-python/
- Cnis.fr
- Insee.fr
- BASE.fr (French notarial database for Paris)
- PERVAL.fr (French notarial database for PACA)
- MeilleursAgent.com
- Datastream.com
- Stat4decision.com
- Darques.eu
