
SMDM Guided Project Report


Contents:
Question 1: How many rows and columns are present in the data? [0.5 mark]

Question 2: What are the datatypes of the different columns in the dataset? [0.5 mark]

Question 3: Are there any missing values in the data? If yes, treat them using an appropriate method. [1 Mark]

Question 4: Check the statistical summary of the data. What is the minimum, average, and maximum time it takes
for food to be prepared once an order is placed? [2 marks]

Question 5: How many orders are not rated? [1 mark]

Univariate Analysis

Question 6: Explore all the variables and provide observations on their distributions. (Generally, histograms,
boxplots, countplots, etc. are used for univariate exploration.) [8 marks]

Question 7: Which are the top 5 restaurants in terms of the number of orders received? [1 mark]

Question 8: Which is the most popular cuisine on weekends? [1 mark]

Question 9: What percentage of the orders cost more than 20 dollars? [2 marks]

Question 10: What is the mean order delivery time? [1 mark]

Question 11: The company has decided to give 20% discount vouchers to the top 3 most frequent customers. Find
the IDs of these customers and the number of orders they placed. [1 mark]

Multivariate Analysis

Question 12: Perform a multivariate analysis to explore relationships between the important variables in the
dataset. (It is a good idea to explore relations between numerical variables as well as relations between numerical
and categorical variables) [9 marks]

Question 13: The company wants to provide a promotional offer in the advertisement of the restaurants. The
condition to get the offer is that the restaurants must have a rating count of more than 50 and the average rating
should be greater than 4. Find the restaurants fulfilling the criteria to get the promotional offer. [3 marks]

Question 14: The company charges the restaurant 25% on the orders having cost greater than 20 dollars and 15% on
the orders having cost greater than 5 dollars. Find the net revenue generated by the company across all orders. [3
marks]

Question 15: The company wants to analyze the total time required to deliver the food. What percentage of orders
take more than 60 minutes to get delivered from the time the order is placed? (The food has to be prepared and
then delivered.)[2 marks]

Question 16: The company wants to analyze the delivery time of the orders on weekdays and weekends. How does
the mean delivery time vary during weekdays and weekends? [2 marks]

Question 17: What are your conclusions from the analysis? What recommendations would you like to share to help
improve the business? (You can use cuisine type and feedback ratings to drive your business recommendations.) [6
marks]

 Graphs & Plots


Question 1: How many rows and columns are present in the data? [0.5 mark]
Ans: There are 1898 rows and 9 columns in the data.
Question 2: What are the datatypes of the different columns in the dataset? [0.5 mark]
Ans: The datatypes are float64(1), int64(4), object(4).
Question 3: Are there any missing values in the data? If yes, treat them using an
appropriate method. [1 Mark]
Ans: There were no null values in the data frame, so no missing-value treatment was needed.
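The checks behind Questions 1 to 3 can be sketched as follows. This is a minimal example on a made-up three-row frame, not the project data. The column names order_id and cost_of_the_order are assumptions.

```python
import pandas as pd

# Toy stand-in for the order data (values invented for illustration).
df = pd.DataFrame({
    "order_id": [1, 2, 3],                      # assumed column name
    "cost_of_the_order": [12.5, 30.0, 8.25],    # assumed column name
    "cuisine_type": ["American", "Japanese", "Italian"],
})

n_rows, n_cols = df.shape                 # number of rows and columns
dtype_counts = df.dtypes.value_counts()   # how many columns have each dtype
missing_per_column = df.isnull().sum()    # null count for every column

print(n_rows, n_cols)
print(dtype_counts)
print(missing_per_column)
```

If every entry of missing_per_column is zero, as reported above, no imputation step is required.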
Question 4: Check the statistical summary of the data. What is the minimum, average, and
maximum time it takes for food to be prepared once an order is placed? [2 marks]
Ans: Minimum: 20.00, Average: 27.37, Maximum: 35.00
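As a rough sketch of how these three figures can be read off, the example below aggregates a toy column. The name food_preparation_time is an assumption and the values are invented.

```python
import pandas as pd

# Toy preparation times in minutes (invented values).
df = pd.DataFrame({"food_preparation_time": [20, 25, 35, 30]})

# Minimum, mean, and maximum in one call; describe() would report the same
# three numbers alongside the quartiles.
prep_stats = df["food_preparation_time"].agg(["min", "mean", "max"])
print(prep_stats)
```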
Question 5: How many orders are not rated? [1 mark]
Ans: 736
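One way to obtain such a count, assuming unrated orders carry the string "Not given" in the rating column (the label mentioned under Question 12), is sketched below on invented values.

```python
import pandas as pd

# Toy rating column; "Not given" marks an order without a rating.
ratings = pd.Series(["5", "Not given", "4", "Not given", "3"])

# Comparing against the label gives a boolean Series; summing counts the Trues.
not_rated = int((ratings == "Not given").sum())
print(not_rated)  # 2 for this toy data
```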
Univariate Analysis
Question 6: Explore all the variables and provide observations on their distributions.
(Generally, histograms, boxplots, countplots, etc. are used for univariate exploration.) [8
marks]
Ans: The code is organised into three sections based on the type of data being analyzed:
categorical data with a large number of categories, categorical data with a small number of
categories, and numerical data.
In the first section, the code loops over two categorical columns (restaurant_name and
cuisine_type) and creates a countplot for each, showing the frequency of each category.
Only the top 10 categories are plotted to avoid overcrowding.
In the second section, two countplots are created for the categorical columns rating and
day_of_the_week, which have a small number of categories. These plots are placed side by
side in a 1×2 grid of subplots.
In the third section, the code defines a list of numerical variables and creates a 2×3 grid of
subplots, with each numerical variable plotted as both a boxplot and a histogram. The
boxplots summarise each variable's median, spread, and outliers, while the histograms show
the shape of its frequency distribution.
Question 7: Which are the top 5 restaurants in terms of the number of orders received? [1
mark]
Ans: Shake Shack (219), The Meatball Shop (132), Blue Ribbon Sushi (119), Blue Ribbon
Fried Chicken (96), Parm (68)
Question 8: Which is the most popular cuisine on weekends? [1 mark]
Ans: American is the most popular cuisine on weekends (415 orders), followed by Japanese
(335), Italian (207), Chinese (163), and Mexican (53).
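A ranking of this kind can be produced by filtering to weekend orders and counting cuisines. The sketch below uses invented rows and assumes day_of_the_week holds the labels "Weekend" and "Weekday".

```python
import pandas as pd

# Toy orders (invented); the "Weekend"/"Weekday" labels are an assumption.
df = pd.DataFrame({
    "cuisine_type": ["American", "Japanese", "American", "Italian", "American"],
    "day_of_the_week": ["Weekend", "Weekend", "Weekday", "Weekend", "Weekend"],
})

# Keep weekend orders only, then count orders per cuisine (sorted descending).
weekend_counts = df.loc[df["day_of_the_week"] == "Weekend", "cuisine_type"].value_counts()
most_popular = weekend_counts.idxmax()
print(most_popular, weekend_counts[most_popular])
```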
Question 9: What percentage of the orders cost more than 20 dollars? [2 marks]
Ans: 29.24% of the orders cost more than $20. The gt method builds a boolean Series from
the column, with True for rows whose value is greater than the argument and False
otherwise. Calling mean on that Series gives the proportion of True values, since True
counts as 1 and False as 0 when the values are summed.
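The gt-then-mean idiom described here looks roughly like this on a toy cost column (invented values):

```python
import pandas as pd

# Toy order costs in dollars (invented values).
costs = pd.Series([12.0, 25.5, 8.0, 31.0])

above_20 = costs.gt(20)              # boolean Series: True where cost > 20
pct_above_20 = above_20.mean() * 100 # mean of booleans = share of True values
print(f"{pct_above_20:.2f}%")
```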
Question 10: What is the mean order delivery time? [1 mark]
Ans: Mean order delivery time: 24.16
Question 11: The company has decided to give 20% discount vouchers to the top 3 most
frequent customers. Find the IDs of these customers and the number of orders they
placed. [1 mark]
Ans: The top 3 customers are 52832 (13 orders), 47440 (10 orders), and 83287 (9 orders).
The next two are 250494 (8 orders) and 259341 (7 orders).
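Frequency rankings like this one (and the restaurant ranking in Question 7) can be sketched with value_counts. The customer IDs below are invented, and the column name customer_id is an assumption.

```python
import pandas as pd

# Toy customer IDs, one entry per order (invented values).
df = pd.DataFrame({
    "customer_id": [101, 101, 101, 101, 102, 102, 102, 103, 103, 104],
})

# value_counts sorts by frequency, so head(3) keeps the three most frequent IDs.
top3 = df["customer_id"].value_counts().head(3)
print(top3)
```

Using head(3) rather than head(5) returns exactly the customers the voucher question asks about.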
Multivariate Analysis
Question 12: Perform a multivariate analysis to explore relationships between the
important variables in the dataset. (It is a good idea to explore relations between
numerical variables as well as relations between numerical and categorical variables) [9
marks]
Ans: The first part contains a nested loop that draws a violin plot for each combination of a
numerical variable (on the y-axis) and a categorical variable (on the x-axis). The loop iterates
over all combinations of the two types of variables. The resulting figure is a compact grid of
violin plots that allows for easy comparison of the distributions of numerical variables
across different categories. A violin plot was chosen because it also shows the shape of the
distribution, which a box plot does not.
The second part removes rows with “Not given” ratings, changes the data type of the
“rating” column to float, and then creates two box plots showing the distribution of ratings
across different categories. The first box plot shows ratings by cuisine type, and the second
box plot shows ratings by day of the week. These plots allow for easy comparison of ratings
across different categories.
The third part creates a bar plot showing the mean rating for each cuisine type. The data is
grouped by cuisine type, and then the mean rating for each group is calculated and plotted.
The resulting bar plot allows for easy comparison of the average ratings of different cuisine
types.
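The data preparation behind the second and third parts can be sketched without the plotting calls. The rows below are invented. Only the "Not given" label and the rating and cuisine_type column names come from the report.

```python
import pandas as pd

# Toy orders (invented); ratings are stored as strings, as in the report.
df = pd.DataFrame({
    "cuisine_type": ["Thai", "Thai", "Korean", "Korean", "Thai"],
    "rating": ["5", "Not given", "3", "4", "4"],
})

# Drop unrated orders, then convert the remaining ratings to floats.
rated = df[df["rating"] != "Not given"].copy()
rated["rating"] = rated["rating"].astype(float)

# Mean rating per cuisine; this Series is what a bar plot would display.
mean_rating = rated.groupby("cuisine_type")["rating"].mean().sort_values(ascending=False)
print(mean_rating)
```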
Question 13: The company wants to provide a promotional offer in the advertisement
of the restaurants. The condition to get the offer is that the restaurants must have a
rating count of more than 50 and the average rating should be greater than 4. Find the
restaurants fulfilling the criteria to get the promotional offer. [3 marks]
Ans: Blue Ribbon Fried Chicken, Blue Ribbon Sushi, Shake Shack, The Meatball Shop
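A filter of this kind can be sketched by aggregating the rating count and mean per restaurant. The helper name eligible_restaurants, the restaurant names, and the small thresholds used on the toy data are all hypothetical; the question itself uses a count above 50 and a mean above 4.

```python
import pandas as pd

def eligible_restaurants(rated, min_count, min_mean):
    """Return restaurants with more than min_count ratings and a mean above min_mean."""
    stats = rated.groupby("restaurant_name")["rating"].agg(["count", "mean"])
    keep = (stats["count"] > min_count) & (stats["mean"] > min_mean)
    return stats[keep].index.tolist()

# Toy rated orders (invented); unrated orders are assumed to be removed already.
rated = pd.DataFrame({
    "restaurant_name": ["A", "A", "A", "B", "B", "C"],
    "rating": [5.0, 5.0, 4.0, 5.0, 3.0, 5.0],
})

# Thresholds scaled down for the toy data.
print(eligible_restaurants(rated, min_count=2, min_mean=4))
```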
Question 14: The company charges the restaurant 25% on the orders having cost
greater than 20 dollars and 15% on the orders having cost greater than 5 dollars. Find
the net revenue generated by the company across all orders. [3 marks]
Ans: The net revenue generated by the company is: $6166.30
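The tiered charge can be sketched as below. This is one reading of the rule, under two assumptions: an order of exactly $20 falls in the 15% tier, and orders of $5 or less generate no revenue. The costs are invented and the function name commission is hypothetical.

```python
import pandas as pd

def commission(cost):
    """Company's charge on a single order under the assumed tier rule."""
    if cost > 20:
        return 0.25 * cost
    if cost > 5:
        return 0.15 * cost
    return 0.0  # assumption: no charge on orders of $5 or less

# Toy order costs in dollars (invented values).
costs = pd.Series([25.0, 10.0, 4.0, 20.0])

# Apply the rule to every order and add up the charges.
net_revenue = costs.apply(commission).sum()
print(f"${net_revenue:.2f}")
```

The 25% check comes first so that an order above $20 is not also charged at 15%.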
Question 15: The company wants to analyze the total time required to deliver the
food. What percentage of orders take more than 60 minutes to get delivered from the
time the order is placed? (The food has to be prepared and then delivered.)[2 marks]
Ans: 10.54% of orders take more than 60 minutes to get delivered.
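A sketch of this calculation adds the two times per order and takes the share above 60 minutes. The column names food_preparation_time and delivery_time are assumptions, and the values are invented.

```python
import pandas as pd

# Toy times in minutes (invented values).
df = pd.DataFrame({
    "food_preparation_time": [25, 35, 30, 20],
    "delivery_time": [20, 30, 25, 15],
})

# Total time from order placement to delivery.
total_time = df["food_preparation_time"] + df["delivery_time"]

# Share of orders whose total time exceeds 60 minutes, as a percentage.
pct_over_60 = (total_time > 60).mean() * 100
print(f"{pct_over_60:.2f}%")
```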
Question 16: The company wants to analyze the delivery time of the orders on
weekdays and weekends. How does the mean delivery time vary during weekdays and
weekends? [2 marks]
Ans: The mean delivery time is longer on weekdays than on weekends.
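The comparison can be sketched with a groupby on the day type. The rows are invented, and the "Weekday"/"Weekend" labels and the delivery_time column name are assumptions.

```python
import pandas as pd

# Toy orders (invented values), delivery times in minutes.
df = pd.DataFrame({
    "day_of_the_week": ["Weekday", "Weekday", "Weekend", "Weekend"],
    "delivery_time": [30, 26, 22, 24],
})

# Mean delivery time for each day type.
mean_by_day = df.groupby("day_of_the_week")["delivery_time"].mean()
print(mean_by_day)
```

Reporting both means next to each other makes the size of the weekday/weekend gap explicit.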
Question 17: What are your conclusions from the analysis? What recommendations
would you like to share to help improve the business? (You can use cuisine type and
feedback ratings to drive your business recommendations.) [6 marks]
Ans: American, Japanese, Italian, and Chinese are the most popular cuisine types, and
Spanish, Thai, Indian, and Mexican seem to be the most highly rated cuisine types.
Incorporating them into the restaurants' dishes can be a viable way to increase either
the number of customers or their satisfaction. The majority of customers are indifferent to
delivery times; however, delivery time still has a noticeable effect on a customer's rating.
Expanding to more branches or improving the routes taken, especially on weekdays, would
be ideal. Korean cuisine isn't highly rated, but its dishes are cheap and easy to prepare, and
such dishes account for the bulk of the orders.
 Graphs & Plots:
THANK YOU