Assignment 5
Submitted By – Pankhuri Mishra
23/UMBA/72
MBA Section – B
AMES HOUSING PROJECT
Executive Summary
This project uncovers patterns in housing sales by grouping houses into distinct clusters based on their characteristics. Descriptive and cluster analyses identified Sale Price, Gr_Liv_Area (above-ground living area), Lot Area, and Overall Quality as the critical factors driving these groupings. Using these variables, the data were systematically divided into four clusters. The clusters clarify how housing attributes relate to one another and how they shape market segmentation.
Introduction
The real estate market is inherently complex, and many factors determine property values and buyer preferences. Informed decisions therefore require analyzing houses and grouping them by shared characteristics. This project addresses that need by using statistical techniques to explore the relationships between housing attributes and to divide the dataset into clusters.
Descriptive analysis identified Sale Price, Gr_Liv_Area, Lot Area, and Overall Quality as significant determinants of the housing clusters. A cluster analysis then grouped the houses into four categories, each representing a distinct combination of features. This segmentation gives actionable insights to stakeholders such as developers, real estate professionals, and policymakers. It helps them understand market trends, target specific buyer segments, and make data-driven decisions.
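The report does not state which clustering algorithm or software produced the four clusters. The Python below is a minimal sketch only. It assumes k-means (a common method when the number of clusters is fixed in advance) applied to z-scored versions of the four key variables, so that variables on very different scales contribute comparably. The function names (`standardize`, `sq_dist`, `kmeans`), the fixed seed, and the twelve rows in `houses` are hypothetical illustrations, not values from the Ames data.

```python
import random
import statistics

# Hypothetical houses: [SalePrice, Gr_Liv_Area, Lot_Area, Overall_Qual] (made-up values).
houses = [
    [95000, 850, 7000, 4], [105000, 900, 7500, 5], [99000, 880, 7200, 4],
    [160000, 1400, 9000, 6], [170000, 1450, 9500, 6], [165000, 1420, 9200, 6],
    [180000, 1500, 30000, 6], [175000, 1480, 28000, 5], [185000, 1520, 32000, 6],
    [320000, 2400, 12000, 9], [300000, 2300, 11500, 8], [310000, 2350, 11800, 9],
]


def standardize(rows):
    """Convert every column to z-scores (mean 0, population std 1)."""
    cols = list(zip(*rows))
    stats = [(statistics.fmean(c), statistics.pstdev(c)) for c in cols]
    return [[(v - m) / s if s else 0.0 for v, (m, s) in zip(row, stats)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=4, iters=100, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # start from k random houses
    labels = None
    for _ in range(iters):
        # Assignment step: each house joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [statistics.fmean(col) for col in zip(*members)]
        if new_labels == labels:  # no house changed cluster: converged
            break
        labels = new_labels
    return labels, centroids


z = standardize(houses)
labels, centroids = kmeans(z, k=4)
for row, lab in zip(houses, labels):
    print(f"cluster {lab}: {row}")
```

Plain k-means depends on its starting centroids. A real analysis would therefore usually try several seeds and keep the run with the lowest within-cluster sum of squares.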
Key findings from the descriptive analysis:
• Sale Price is less skewed than Lot Area, which is the more heavily skewed of the two.
• Overall Quality ratings range from 1 to 9, and Overall Condition ratings range from 3 to 9.
• Sale Price has a strong positive correlation with Gr_Liv_Area and with Overall Quality.
• Lot Area has a weak positive correlation with Overall Quality and with Overall Condition.
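Skewness and correlation figures like these take only a few lines of Python to compute. The sketch below is illustrative only. The seven houses are made-up values, not the Ames data, and the helper names `skewness` and `pearson` are hypothetical. `skewness` uses the adjusted Fisher-Pearson formula, the same one Excel's `SKEW` function uses. `pearson` computes the Pearson correlation coefficient.

```python
import statistics

# Hypothetical sample of seven houses (made-up values, NOT the Ames data).
sale_price = [105000, 139500, 172000, 189900, 215000, 244000, 310000]
gr_liv_area = [896, 1056, 1329, 1629, 1604, 2110, 2400]
lot_area = [11622, 8400, 9600, 13830, 9978, 11160, 45000]
overall_qual = [5, 5, 6, 6, 7, 7, 8]


def skewness(values):
    """Adjusted Fisher-Pearson sample skewness (the formula Excel's SKEW uses)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation, n - 1 denominator
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in values)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


print(f"skew(SalePrice) = {skewness(sale_price):.2f}")
print(f"skew(Lot_Area)  = {skewness(lot_area):.2f}")
print(f"r(SalePrice, Gr_Liv_Area)  = {pearson(sale_price, gr_liv_area):.2f}")
print(f"r(SalePrice, Overall_Qual) = {pearson(sale_price, overall_qual):.2f}")
```

On this toy sample, the single very large lot gives Lot_Area a much higher skewness than SalePrice. Both correlations with SalePrice come out strongly positive.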
[Figure: SalePrice vs Gr_Liv_Area (x-axis: Gr_Liv_Area; y-axis: Sale Price)]
[Figure: SalePrice vs Lot_Area (x-axis: Lot_Area; y-axis: Sale Price)]
These findings will help the client group the houses sold into categories for analysis and implement strategies for selling similar houses in the future.
Conclusion
Cluster analysis is a statistical technique used to group similar data points or objects into clusters based on their characteristics. It helps identify patterns or groupings within a dataset, which can be useful for understanding relationships, segmenting markets, and making predictions.
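In practice, "similar" is usually measured as a distance between houses' feature vectors. The hypothetical sketch below shows why variables are normally scaled before clustering: raw Euclidean distance is dominated by whichever variable has the largest units. The three houses are made up, and the per-variable spreads in `spread` are assumed values chosen only for illustration.

```python
import math

# Hypothetical houses: [SalePrice, Gr_Liv_Area, Lot_Area, Overall_Qual] (made-up values).
house_a = [150000, 1200, 9000, 5]
house_b = [150000, 1200, 9000, 9]  # same as A except a much higher quality rating
house_c = [151000, 1200, 9000, 5]  # same as A except $1,000 more expensive

# Assumed standard deviation of each variable, for illustration only.
spread = [80000, 500, 8000, 1.4]


def euclidean(p, q, scale=None):
    """Euclidean distance, optionally dividing each feature by its spread first."""
    scale = scale or [1] * len(p)
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(p, q, scale)))


print("raw:    A-B", euclidean(house_a, house_b), " A-C", euclidean(house_a, house_c))
print("scaled: A-B", round(euclidean(house_a, house_b, spread), 3),
      " A-C", round(euclidean(house_a, house_c, spread), 3))
```

Unscaled, house A looks far closer to B (distance 4) than to C (distance 1,000), even though a four-point quality gap matters much more than a $1,000 price difference. After dividing by each variable's spread, the ordering flips.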
Using this analysis, dealers and other stakeholders can group customers into clusters according to their preferences. When a customer wants an unusual combination of features, they can make that customer a customized offer.