TTT 8
In the project you will have to collect your own data, run data analysis, and prepare a report and an
interactive dashboard with the results.
The topic of the project is rental accommodation in Australia. We have all read in the mass media that the
rental market has gone crazy. We want to get a better understanding of this market in different locations.
You will use the website www.homely.com.au to collect the data. You can look at larger areas, like
SA, or focus on smaller well-defined areas, like Adelaide CBD or Glenelg or Barossa Valley. When
choosing locations, keep two things in mind:
1. Data availability: it does not make sense to run an analysis if the selected location has only a
very small number of properties available. You need a reasonably large number of rental
properties.
2. Data should be comparable: it does not make much sense to compare the Sydney CBD and Barossa
Valley markets. Select locations that are “somewhat comparable”, e.g. Adelaide CBD and
Melbourne CBD.
If there are too many properties in the selected location, you can limit yourself to a sub-sample of
the most recent (newest) properties on the market instead of downloading all of them.
You do not need to generate location links in the code. Go to the website, select your location(s)
manually and put their URLs in the code.
There are two levels of pages: (1) brief information about multiple properties; and (2) detailed
information about a single selected property. You need to download and process both levels.
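The two levels can be handled by two small parsing functions. The sketch below is only an illustration: the CSS selectors (article.listing, .address, .price, .description, .feature) and the demo HTML are made up, so inspect the real pages in your browser to find the actual ones. It runs on small inline pages so that you can try it without touching the website.

```r
library(rvest)

# Level 1: one row per property on a search-results page.
# NOTE: all CSS selectors here are hypothetical placeholders.
parse_listing_page <- function(page) {
  cards <- html_elements(page, "article.listing")
  data.frame(
    address    = html_text2(html_element(cards, ".address")),
    price_text = html_text2(html_element(cards, ".price")),
    link       = html_attr(html_element(cards, "a"), "href"),
    stringsAsFactors = FALSE
  )
}

# Level 2: details of a single property page.
parse_property_page <- function(page) {
  data.frame(
    description = html_text2(html_element(page, ".description")),
    features    = paste(html_text2(html_elements(page, ".feature")),
                        collapse = "; "),
    stringsAsFactors = FALSE
  )
}

# Made-up pages standing in for the downloaded HTML.
demo_results <- minimal_html('
  <article class="listing">
    <a href="/for-rent/1"><span class="address">1 Example St</span></a>
    <span class="price">$450 per week</span>
  </article>
  <article class="listing">
    <a href="/for-rent/2"><span class="address">2 Sample Rd</span></a>
    <span class="price">$520 per week</span>
  </article>')

demo_property <- minimal_html('
  <div class="description">Bright apartment close to the tram.</div>
  <ul><li class="feature">2 bedrooms</li><li class="feature">1 bathroom</li></ul>')

listings <- parse_listing_page(demo_results)
details  <- parse_property_page(demo_property)

# With real data you would start from a URL copied by hand, for example:
# location_url <- "<search URL copied from the website>"
# listings <- parse_listing_page(read_html(location_url))
# details  <- lapply(listings$link, function(l) {
#   Sys.sleep(1)  # pause between requests to be polite to the server
#   parse_property_page(read_html(xml2::url_absolute(l, location_url)))
# })
```

Keeping the parsing separate from the downloading makes it easier to test your selectors on one saved page before you loop over many properties.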
Select two or three (or more) locations to study and compare. Scrape as much information
as you believe is necessary to investigate the areas of interest listed below. I deliberately avoid using
the word “questions”. These are not questions to answer but areas to investigate. Different students
might find it important to focus on different research questions.
1. Describe the state of the rental market at the moment of data collection: number of
properties, prices, features, etc. We don’t have access to the historical data, so we focus on
the current state of the market.
2. Analyse text descriptions of properties. Property agents use certain words to make properties
more attractive to potential clients. What are the most common words? What are the most
important words with respect to rental accommodation prices? For example, you might find that cheap
properties have no description at all because agents don’t bother, or that all expensive properties
mention fantastic views.
3. Carry out a price analysis and make the best possible predictions of property rental prices
based on all available data, including the results of your analysis in the areas above.
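As a rough illustration of how the three areas connect, here is a minimal base R sketch on a made-up table. The column names, the toy rows, and the choice of the word "views" as a feature are all assumptions for the example; your scraped data and your model will look different, and a linear model is only one possible starting point.

```r
# Toy data standing in for scraped results (entirely made up).
rentals <- data.frame(
  location    = rep(c("Adelaide CBD", "Melbourne CBD"), each = 3),
  price_text  = c("$450 per week", "$620 pw", "$380 per week",
                  "$700 per week", "$540 pw", "Contact agent"),
  bedrooms    = c(1, 2, 1, 2, 1, 2),
  description = c("Bright apartment with city views", "",
                  "Compact studio close to tram",
                  "Luxury apartment with stunning views",
                  "Modern apartment near station", "Spacious apartment"),
  stringsAsFactors = FALSE
)

# Area 1: turn price text into a number; NA when no dollar amount is given.
parse_weekly_price <- function(x) {
  num <- sub("^[^0-9]*([0-9][0-9,]*).*$", "\\1", x)
  num[!grepl("[0-9]", x)] <- NA
  as.numeric(gsub(",", "", num))
}
rentals$price <- parse_weekly_price(rentals$price_text)
median_by_location <- aggregate(price ~ location, data = rentals, FUN = median)

# Area 2: word frequencies in the descriptions (very short words dropped).
tokenise <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z]+"))
  words[nchar(words) > 2]
}
word_counts <- sort(table(tokenise(rentals$description)), decreasing = TRUE)

# Text-based features that can feed into the price model.
rentals$has_description <- nchar(trimws(rentals$description)) > 0
rentals$mentions_views  <- grepl("\\bviews?\\b", rentals$description,
                                 ignore.case = TRUE, perl = TRUE)

# Area 3: a first, very simple price model (rows with NA price are dropped).
fit <- lm(price ~ bedrooms + mentions_views, data = rentals)
new_property <- data.frame(bedrooms = 2, mentions_views = TRUE)
predicted_price <- predict(fit, newdata = new_property)
```

With only a handful of rows the model is meaningless; the point is the workflow: clean prices, derive features from text, then model.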
Your final deliverables will be (besides R script/s) a static report (MS Word/PDF file) and an interactive Shiny
dashboard. Your MS Word/PDF report will include explanations of the main functions of your dashboard
(besides the “normal” report writing stuff).
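To give an idea of the scale of a dashboard, here is a minimal Shiny sketch with one input and two outputs. The data frame, the input and output names, and the layout are assumptions made up for the example; your dashboard should be built around your own data and findings.

```r
library(shiny)

# Made-up data standing in for your cleaned rental table.
rentals <- data.frame(
  location = rep(c("Adelaide CBD", "Melbourne CBD"), each = 4),
  price    = c(450, 620, 380, 500, 700, 540, 610, 820)
)

ui <- fluidPage(
  titlePanel("Rental market overview"),
  sidebarLayout(
    sidebarPanel(
      selectInput("location", "Location", choices = unique(rentals$location))
    ),
    mainPanel(
      plotOutput("price_hist"),
      tableOutput("price_summary")
    )
  )
)

server <- function(input, output, session) {
  # Rows for the location currently chosen in the drop-down.
  selected <- reactive(rentals[rentals$location == input$location, ])

  output$price_hist <- renderPlot(
    hist(selected()$price, main = input$location, xlab = "Weekly rent ($)")
  )
  output$price_summary <- renderTable(
    data.frame(properties   = nrow(selected()),
               median_price = median(selected()$price))
  )
}

# Build the app object; launch it with shiny::runApp(app).
app <- shinyApp(ui, server)
```

The app object is assigned rather than printed so that the script can be sourced without starting the server.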
The last five weeks of the study period will be dedicated to your project. I strongly encourage everyone
to start in week 9 and do some work towards the project every week. Week 9 gives you the tools to
download and prepare the data. Week 10 shows tools for the analysis of categorical and text data.
Weeks 11 and 12 teach you how to create dashboards.
This is not an assignment. This is a project. There are no simple questions and there are no simple
right answers. You should take steps in the right direction and use the right tools. We will be working on it
during computer practicals (at least partially), and I will be available for discussions, review, and
feedback on your progress.
This is an individual assessment, but I encourage students to discuss R and data analysis in class and
on the forum. Just keep academic integrity in mind.
If you have any questions, you are always welcome to ask them on the forum or by email.
To encourage an early start on the project, I have made the first step, downloading the data, your Test 3.
Potential problems:
1. Some locations might have too many properties available, possibly thousands. It is difficult
to download that many properties. Other
locations might have only a small number of properties available, not enough for a conclusive
analysis. Select locations wisely and/or limit the download to a sub-sample of a reasonable size.
2. The downloaded text description of a property will be limited to the first several lines only. This
is OK. It is a peculiarity of the website's server and a limitation of rvest functionality. You will
download whatever is possible and later do your analysis on these limited data.
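For the first problem, a quick check on the collected property links can tell you whether a location is too small or needs capping. The sketch below uses made-up links, and the thresholds (30 and 200) are arbitrary illustrations, not requirements. It also assumes the links are stored in the order the website returned them, with the newest first; check the sort order on the site.

```r
# Illustrative thresholds only; choose values that suit your analysis.
min_per_location <- 30
max_per_location <- 200

# Made-up property links, assumed to be ordered newest first within a location.
links <- data.frame(
  location = c(rep("Adelaide CBD", 250), rep("Barossa Valley", 12)),
  link     = c(paste0("/for-rent/a", 1:250), paste0("/for-rent/b", 1:12)),
  stringsAsFactors = FALSE
)

# Locations with too few properties for a conclusive analysis.
counts    <- table(links$location)
too_small <- names(counts)[as.vector(counts) < min_per_location]

# Keep at most max_per_location of the first (newest) links per location.
capped <- do.call(
  rbind,
  lapply(split(links, links$location), head, n = max_per_location)
)
```

Doing this check before downloading the detail pages saves a lot of waiting time.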