
Yelp Vs Zomato Analysis


Emmi Carr

Madeline Richards
Final Project Report
4/22/2019

Yelp vs Zomato Analysis

1. Goals

● Our goal for this project was to collect data on at least 100 Ann Arbor restaurants from
each website, including star rating, price range, restaurant category, and the number of
reviews.
● Then we wanted to compare the results and trends we observed in the data between
Yelp and Zomato. Specifically, from each database, we wanted to compare the
following:
○ Number of restaurants in each restaurant category
○ Category with the most restaurants
○ Restaurant category with the highest average star rating
○ Price range with the highest average star rating
○ Overall average star rating and price range for all restaurants in the database
● For each database, we wanted to create at least two of the following graphs/charts:
○ Scatterplot of star rating vs. number of reviews for each restaurant
○ Histogram showing the star rating for each restaurant category including the
overall average star rating
○ Histogram showing the number of restaurants in each restaurant category

2. Goals Achieved

● We obtained all the data we set out to collect for 100 restaurants from each website.
● From our original data analysis goals, we accomplished the following:
○ Number of restaurants in each restaurant category
■ Example: Yelp had 2 out of 100 restaurants in the “Italian” category, while
Zomato had 10.
■ Able to find the category with the most restaurants
● Yelp: “Coffee & Tea” (7)
● Zomato: “American” (28)
○ Average star rating based on restaurant category
■ Able to find the restaurant category with the highest average star rating
● Yelp: “Breakfast & Brunch” (4.75/5.0)
○ “Local Flavor” rated higher (5.0/5.0), but it does not really count as a
restaurant category, so we disregarded it
● Zomato: “Cuban” (4.6/5.0)
○ Average star rating based on price range
■ Price range with the highest average star rating
● Yelp: 1 out of 4 / $ out of $$$$
● Zomato: 4 out of 4 / $$$$ out of $$$$
○ Overall average star rating
■ Yelp: 4.075 / 5
■ Zomato: 4.071 / 5
● For our data visualizations, we generated the following:
○ Scatterplot of star rating vs. number of reviews for each restaurant
○ We revised the histogram to display the distribution of star ratings for each
database

3. Issues Faced

● Our first issue was getting city-level information from the Zomato data. For example,
Ann Arbor was listed under the locality of Detroit, and it was only after carefully studying
the documentation and testing with print statements that we learned how to obtain only
Ann Arbor restaurants.
● The most difficult issue we faced was adding 20 rows at a time without dropping the
table. Our original code dropped the table if it already existed and, using a loop with
offsets, grabbed 20 restaurants at a time, so the complete code only needed to be run
once to get 100 rows of data. After realizing this conflicted with the project requirements,
we struggled with how to modify our code to meet them. To solve this issue, we replaced
our offsets with a count system, and we added “INSERT OR IGNORE INTO” statements so
that each restaurant’s data is added only if it is not already in the database.
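
The count-based approach described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the table name Yelp, its columns, the add_batch helper, and the use of the restaurant name as the primary key are all hypothetical.

```python
import sqlite3

BATCH_SIZE = 20  # rows added per run, matching the 20-at-a-time requirement


def add_batch(conn, restaurants):
    """Insert up to BATCH_SIZE restaurants that are not already stored.

    `restaurants` is a list of (name, rating, price, category, reviews)
    tuples; this row shape is an assumption made for the sketch.
    """
    cur = conn.cursor()
    # Create the table only if it is missing, so earlier rows survive each run.
    cur.execute(
        "CREATE TABLE IF NOT EXISTS Yelp "
        "(name TEXT PRIMARY KEY, rating REAL, price INTEGER, "
        "category TEXT, reviews INTEGER)"
    )
    added = 0
    for rest in restaurants:
        if added >= BATCH_SIZE:
            break
        cur.execute("INSERT OR IGNORE INTO Yelp VALUES (?, ?, ?, ?, ?)", rest)
        # rowcount is 1 for a newly inserted row and 0 when the insert
        # was ignored because the primary key already exists.
        added += cur.rowcount
    conn.commit()
    return added
```

Under these assumptions, calling add_batch five times on the same list of 100 restaurants fills the table 20 rows at a time, and a sixth call adds nothing.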

4. Calculation File

● Data-Analysis.py

5. Visualizations

● ratings_by_reviews.png
● Ratings_dist_hist.png

6. Instructions for Running the Code

1. Run the Python file Gathering_Data.py five times.
a. 100 rows of data will now be in the SQL database: rest_data.sqlite
2. Run the Python file Data-Analysis.py.
a. The two matplotlib visualizations are generated
b. The JSON file is generated: data_analysis.json
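
As a rough illustration of step 2b, the calculations could be written to data_analysis.json along these lines. The write_results helper and the key names in the example dictionary are assumptions for this sketch; only the file name, the use of json.dump(), and the two overall averages come from this report.

```python
import json


def write_results(results, path="data_analysis.json"):
    """Write a dictionary of calculation results to a JSON-formatted file."""
    with open(path, "w") as f:
        json.dump(results, f, indent=4)


# Overall averages reported in section 2; the key names are hypothetical.
EXAMPLE_RESULTS = {
    "overall_average_rating": {"Yelp": 4.075, "Zomato": 4.071},
}
```

Calling write_results(EXAMPLE_RESULTS) would produce a small, human-readable JSON file in the working directory.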

7. Documentation for Each Function

Function                      Inputs                        Outputs

get_num_of_rests_by_cat       Category column for each      Dictionary named “categories.” Keys are
(Data-Analysis.py)            Yelp and Zomato               categories, values are the number of
                                                            restaurants in the category

get_rating_by_cat             Category and Rating columns   Dictionary named “averages.” Keys are
(Data-Analysis.py)            for each Yelp and Zomato      categories, values are average star
                                                            rating of each category

get_rating_by_price           Price and Rating columns for  Dictionary named “averages.” Keys are the
(Data-Analysis.py)            each Yelp and Zomato          price ranges, and values are the average
                                                            star rating for each range

get_overall_average_rating    Rating column for each Yelp   Float values of average star rating for
(Data-Analysis.py)            and Zomato                    each Yelp and Zomato

main function                 NA                            Function makes the database connection,
(Gathering_Data.py)                                         calls both APIs and inserts restaurant
                                                            data into the database. Must call 5 times
                                                            in order to populate database with 100
                                                            rows.

main function                 NA                            Selects data from the database, performs
(Data-Analysis.py)                                          all the calculations specified and writes
                                                            them to a json file and makes
                                                            visualizations.
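
To make the documented inputs and outputs concrete, here is a hedged sketch of how three of these functions might look. The function names and the dictionary names come from the table above; the exact signatures and the shape of the inputs (plain Python lists of values or tuples selected from the database) are assumptions, and the code in the repository may differ.

```python
from collections import defaultdict


def get_num_of_rests_by_cat(category_column):
    """Count restaurants per category from a list of category strings."""
    categories = {}
    for cat in category_column:
        categories[cat] = categories.get(cat, 0) + 1
    return categories


def get_rating_by_cat(rows):
    """Average star rating per category from (category, rating) rows."""
    totals = defaultdict(lambda: [0.0, 0])  # category -> [rating sum, count]
    for cat, rating in rows:
        totals[cat][0] += rating
        totals[cat][1] += 1
    averages = {cat: total / count for cat, (total, count) in totals.items()}
    return averages


def get_overall_average_rating(ratings):
    """Mean star rating over all restaurants (0.0 for an empty list)."""
    ratings = list(ratings)
    return sum(ratings) / len(ratings) if ratings else 0.0
```

A get_rating_by_price function could follow the same pattern as get_rating_by_cat, with the price range as the dictionary key instead of the category.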

8. Documentation of Resources Used

Date     Issue Description             Location of Resource                                          Result (Did it fix the issue?)

4/17/19  Needed to make visualizations matplotlib                                                    Yes; we were able to make
                                                                                                     scatterplots and histograms
                                                                                                     using the module

4/14/19  Needed to insert data into a  sqlite                                                        Yes; we created a database
         SQL database and later select                                                               with two tables for each
         it from the database                                                                        website

4/20/19  Needed to be able to load in  json                                                          Yes; used json.loads() and
         requested data from APIs and                                                                json.dump()
         to be able to write our
         calculations to a
         json-formatted file

4/14/19  Needed to know documentation  https://developers.zomato.com/documentation#/                 Yes; we extracted the data we
         of the data for the site in   (Zomato API Documentation)                                    need
         order to extract the
         information we needed

4/14/19  Needed to know documentation  https://www.yelp.com/developers/documentation/v3/get_started  Yes; we extracted the data we
         of the data for the site in   (Yelp API Documentation)                                      needed
         order to extract the
         information we needed

Link to Repository: https://github.com/escarr/Final-Project
