This project presents an exploratory analysis of a car dataset using R. The goal was to uncover basic insights around car pricing, fuel types, and company-wise trends by cleaning, visualizing, and analyzing key attributes.
The objective was to perform structured EDA and pre-processing on a car dataset to understand:
- Which brands dominate the dataset?
- How are fuel types distributed across different cars?
- How does pricing vary across brands?
- How does pricing vary with car age?
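As a rough sketch of how these four questions map onto `dplyr` summaries, the snippet below uses a small made-up data frame. The column names `Fuel_Type`, `Price`, and `Year`, and the `reference_year` value, are assumptions for illustration and may not match the actual dataset.

```r
library(dplyr)

# Toy stand-in for the cleaned dataset; every value here is made up.
cars <- tibble(
  Car_Company = c("Maruti", "Maruti", "Hyundai", "Toyota",
                  "Audi", "Hyundai", "Maruti", "Toyota"),
  Fuel_Type   = c("Petrol", "Diesel", "Petrol", "Diesel",
                  "Diesel", "Petrol", "CNG", "Petrol"),
  Price       = c(4.5, 5.2, 6.0, 14.5, 28.0, 7.1, 3.2, 9.8),
  Year        = c(2015, 2017, 2018, 2016, 2019, 2020, 2012, 2014)
)

# Which brands dominate the dataset?
brand_counts <- count(cars, Car_Company, sort = TRUE)

# How are fuel types distributed?
fuel_share <- cars %>%
  count(Fuel_Type) %>%
  mutate(share = n / sum(n))

# How does pricing vary across brands?
price_by_brand <- cars %>%
  group_by(Car_Company) %>%
  summarise(median_price = median(Price), cars = n(), .groups = "drop") %>%
  arrange(desc(median_price))

# How does pricing vary with car age? (reference_year is an assumed value)
reference_year <- 2024
price_by_age <- cars %>%
  mutate(Age = reference_year - Year) %>%
  group_by(Age) %>%
  summarise(mean_price = mean(Price), .groups = "drop")

print(brand_counts)
print(fuel_share)
print(price_by_brand)
print(price_by_age)
```

Medians are used for the brand comparison because a few expensive cars can pull a mean upward; that is a design choice in this sketch, not something the project states.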
- Data Cleaning
  - Split the `Car_Name` column into `Car_Company` and `Car_Model` for better granularity.
  - Removed unwanted characters and standardized company names.
  - Handled missing values using `mice` where appropriate.
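The cleaning steps above could look roughly like the following. This is a minimal sketch on made-up data, not the project's actual script: the numeric columns (`Age`, `Kms_Driven`, `Price`), the regular expression used to strip characters, and the choice of predictive mean matching (`method = "pmm"`) are all assumptions.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(mice)

set.seed(123)
n_rows <- 40

# Toy data with messy names and a few missing values; all values are made up.
raw_names <- c("maruti swift dzire", "Hyundai  i20",
               "TOYOTA innova crysta", "Audi. a4")
cars <- tibble(
  Car_Name = rep(raw_names, length.out = n_rows),
  Age      = sample(1:12, n_rows, replace = TRUE)
) %>%
  mutate(
    Kms_Driven = pmax(1000, round(Age * 9000 + rnorm(n_rows, sd = 15000))),
    Price      = round(15 - 0.8 * Age + rnorm(n_rows, sd = 1.5), 2)
  )
cars$Kms_Driven[c(3, 11, 27)] <- NA
cars$Price[c(5, 19)] <- NA

cars_clean <- cars %>%
  # Collapse repeated whitespace so the split is predictable.
  mutate(Car_Name = str_squish(Car_Name)) %>%
  # First word becomes the company; everything after it is the model.
  separate(Car_Name, into = c("Car_Company", "Car_Model"),
           sep = " ", extra = "merge", fill = "right") %>%
  # Drop non-letter characters and standardize the casing.
  mutate(Car_Company = str_to_title(str_remove_all(Car_Company, "[^A-Za-z]")))

# Impute only the numeric columns (assumed choice: predictive mean matching).
num_cols <- c("Age", "Kms_Driven", "Price")
imp <- mice(as.data.frame(cars_clean[num_cols]),
            m = 5, method = "pmm", seed = 123, printFlag = FALSE)

# mice::complete is namespaced because tidyr also exports complete().
cars_clean[num_cols] <- mice::complete(imp, 1)

print(head(cars_clean))
```

Only the first of the five imputed datasets is kept here for simplicity; pooling results across imputations would be more rigorous if the imputed values feed into a model.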
- Exploratory Data Analysis
  - Bar plots to show most frequently sold cars.
  - Boxplots for fuel efficiency comparisons across brands.
  - Scatter plots to explore relationships between different variables.
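The three plot types listed above can be sketched with `ggplot2` as follows. The data is randomly generated, and the column names `Fuel_Efficiency` and `Kms_Driven` are assumptions, so the shapes in these plots say nothing about the real dataset.

```r
library(ggplot2)

set.seed(42)
n_rows <- 60

# Randomly generated stand-in data; not the project's dataset.
cars <- data.frame(
  Car_Company     = sample(c("Maruti", "Hyundai", "Toyota", "Audi"),
                           n_rows, replace = TRUE),
  Fuel_Efficiency = round(runif(n_rows, 10, 25), 1),
  Kms_Driven      = round(runif(n_rows, 5000, 120000))
)
cars$Price <- round(pmax(1, 20 - cars$Kms_Driven / 10000 +
                           rnorm(n_rows, sd = 2)), 2)

# Bar plot: how many cars each company contributes.
p_bar <- ggplot(cars, aes(x = Car_Company)) +
  geom_bar() +
  labs(x = "Company", y = "Number of cars")

# Boxplot: spread of fuel efficiency within each company.
p_box <- ggplot(cars, aes(x = Car_Company, y = Fuel_Efficiency)) +
  geom_boxplot() +
  labs(x = "Company", y = "Fuel efficiency")

# Scatter plot: relationship between distance driven and price.
p_scatter <- ggplot(cars, aes(x = Kms_Driven, y = Price)) +
  geom_point(alpha = 0.6) +
  labs(x = "Kilometres driven", y = "Price")

print(p_bar)
print(p_box)
print(p_scatter)
```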
- Key Insights
  - Cars with higher fuel efficiency tend to fall in a higher price bracket.
  - Cars with higher mileage tend to have lower sale prices, although some high-mileage cars still command high prices.
  - Audi and Toyota have the highest fuel efficiency among all manufacturers in the dataset.
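One way to check statements like these numerically is sketched below. The data is simulated, so the printed numbers are not the project's results, and the column names (`Fuel_Efficiency`, `Kms_Driven`, `Price`) are assumed rather than taken from the dataset.

```r
library(dplyr)

set.seed(7)
n_rows <- 80

# Simulated stand-in data; the numbers carry no meaning about real cars.
cars <- tibble(
  Car_Company     = sample(c("Maruti", "Hyundai", "Toyota", "Audi"),
                           n_rows, replace = TRUE),
  Fuel_Efficiency = runif(n_rows, 10, 25),
  Kms_Driven      = runif(n_rows, 5000, 120000),
  Price           = runif(n_rows, 2, 30)
)

# Direction and strength of the linear association with price.
cor_eff_price <- cor(cars$Fuel_Efficiency, cars$Price, use = "complete.obs")
cor_kms_price <- cor(cars$Kms_Driven, cars$Price, use = "complete.obs")

# Rank manufacturers by median fuel efficiency.
efficiency_by_brand <- cars %>%
  group_by(Car_Company) %>%
  summarise(median_efficiency = median(Fuel_Efficiency), .groups = "drop") %>%
  arrange(desc(median_efficiency))

# High-mileage cars that are still priced above the overall median.
pricey_high_mileage <- cars %>%
  filter(Kms_Driven > quantile(Kms_Driven, 0.75),
         Price > median(Price))

print(c(efficiency_vs_price = cor_eff_price, kms_vs_price = cor_kms_price))
print(efficiency_by_brand)
print(pricey_high_mileage)
```

A correlation coefficient only captures linear association, so it is best read alongside the scatter plots.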
- `ggplot2`: bar charts, boxplots, and scatter plots.
- `patchwork`: used to arrange multiple plots.
- `GGally::ggpairs()`: for pairwise relationship exploration.
- `kableExtra`: for clean and styled data tables in output reports.
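A short sketch of how these tools can fit together is shown below. It uses simulated data; the layout, column names, and table caption are illustrative assumptions, not the project's actual report code.

```r
library(ggplot2)
library(patchwork)
library(GGally)
library(kableExtra)

set.seed(1)
n_rows <- 50

# Simulated stand-in data; not the project's dataset.
cars <- data.frame(
  Car_Company     = sample(c("Maruti", "Hyundai", "Toyota"),
                           n_rows, replace = TRUE),
  Price           = runif(n_rows, 2, 30),
  Kms_Driven      = runif(n_rows, 5000, 120000),
  Fuel_Efficiency = runif(n_rows, 10, 25)
)

p1 <- ggplot(cars, aes(x = Car_Company)) + geom_bar()
p2 <- ggplot(cars, aes(x = Car_Company, y = Fuel_Efficiency)) + geom_boxplot()
p3 <- ggplot(cars, aes(x = Kms_Driven, y = Price)) + geom_point()

# patchwork: two plots side by side on top, one full-width plot below.
combined <- (p1 | p2) / p3 +
  plot_annotation(title = "Overview of the car data (simulated)")

# GGally: pairwise scatter plots, densities, and correlations.
pairs_plot <- ggpairs(cars, columns = c("Price", "Kms_Driven", "Fuel_Efficiency"))

# kableExtra: styled summary table for an R Markdown report.
summary_tbl <- aggregate(Price ~ Car_Company, data = cars, FUN = median)
styled_tbl <- kable_styling(
  kbl(summary_tbl, digits = 2, caption = "Median price by company (simulated)"),
  full_width = FALSE
)

print(combined)
print(pairs_plot)
```

In an R Markdown document, placing `styled_tbl` on its own line in a chunk renders the table in the knitted output.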
tidyverse # data manipulation and visualization
lubridate # handling date/time if needed
knitr # rendering tables and reports
GGally # pair plots and correlation visuals
mice # missing value imputation
patchwork # combining multiple ggplots
readr # data import
kableExtra # enhanced table output