DOI: 10.1145/3308558.3313683
research-article

Is Yelp Actually Cleaning Up the Restaurant Industry? A Re-Analysis on the Relative Usefulness of Consumer Reviews

Published: 13 May 2019

Abstract

Social media provides the government with novel methods to improve regulation. One leading case has been the use of Yelp reviews to target food safety inspections. While previous research on data from Seattle finds that Yelp reviews can predict unhygienic establishments, we provide a more cautionary perspective. First, we show that prior results are sensitive to what we call “Extreme Imbalanced Sampling”: extreme because the dataset was restricted from roughly 13k inspections to a sample of only 612 inspections with only extremely high or low inspection scores, and imbalanced by not accounting for class imbalance in the population. We show that extreme imbalanced sampling is responsible for claims about the power of Yelp information in the original classification setup. Second, a re-analysis that utilizes the full dataset of 13k inspections and models the full inspection score (regression instead of classification) shows that (a) Yelp information has lower predictive power than prior inspection history and (b) Yelp reviews do not significantly improve predictions, given existing information about restaurants and inspection history. Contrary to prior claims, Yelp reviews do not appear to aid regulatory targeting. Third, this case study highlights critical issues when using social media for predictive models in governance and corroborates recent calls for greater transparency and reproducibility in machine learning.
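The effect of restricting a dataset to extreme outcomes can be illustrated with a small simulation. The sketch below is not the authors' code and uses no Seattle or Yelp data: it draws synthetic standardized inspection scores and a hypothetical, weakly informative "review signal" (the names `simulate`, `accuracy`, `extreme_sample`, and the `signal=0.3` strength are all assumptions made for illustration). It then compares a simple threshold rule on the full synthetic population against the same rule on a 612-row sample built from only the lowest and highest scores. It illustrates the "extreme" part of the sampling only, not the class-imbalance part.

```python
import random


def simulate(n=13000, signal=0.3, seed=0):
    """Return synthetic (score, feature) pairs; a higher score means a worse inspection."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        score = rng.gauss(0.0, 1.0)                     # standardized inspection score (synthetic)
        feature = signal * score + rng.gauss(0.0, 1.0)  # weak, noisy stand-in for review information
        rows.append((score, feature))
    return rows


def accuracy(rows, score_cut=0.0, feature_cut=0.0):
    """Accuracy of the rule 'flag if feature > feature_cut' against the label score > score_cut."""
    hits = sum((f > feature_cut) == (s > score_cut) for s, f in rows)
    return hits / len(rows)


def extreme_sample(rows, k=306):
    """Keep only the k lowest- and k highest-scoring inspections (2k rows in total)."""
    ordered = sorted(rows, key=lambda r: r[0])
    return ordered[:k] + ordered[-k:]


rows = simulate()
full_acc = accuracy(rows)
extreme_acc = accuracy(extreme_sample(rows))
print(f"full data ({len(rows)} rows): accuracy = {full_acc:.3f}")
print(f"extreme sample ({2 * 306} rows): accuracy = {extreme_acc:.3f}")
```

Under these assumptions the same weak signal looks considerably more predictive on the extreme sample than on the full population, because the hard-to-classify middle of the score distribution has been discarded. The size of the gap depends entirely on the assumed signal strength and says nothing about the magnitudes reported in the paper.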

Published In

WWW '19: The World Wide Web Conference
May 2019
3620 pages
ISBN: 9781450366748
DOI: 10.1145/3308558
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Yelp
  2. consumer reviews
  3. food safety
  4. regulation
  5. replication

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WWW '19: The Web Conference
May 13 - 17, 2019
San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 52
  • Downloads (Last 6 weeks): 8
Reflects downloads up to 12 Jan 2025

Cited By

  • (2024) Enhancing Recommendation Accuracy and Diversity with Box Embedding: A Universal Framework. Proceedings of the ACM Web Conference 2024, 3756-3766. DOI: 10.1145/3589334.3645577. Online publication date: 13-May-2024.
  • (2024) Self-supervised progressive graph neural network for enhanced multi-behavior recommendation. International Journal of Machine Learning and Cybernetics. DOI: 10.1007/s13042-024-02353-7. Online publication date: 4-Sep-2024.
  • (2023) Multi-behavior Self-supervised Learning for Recommendation. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 496-505. DOI: 10.1145/3539618.3591734. Online publication date: 19-Jul-2023.
  • (2023) Multi-behavior recommendation based on intent learning. Multimedia Systems 29:6, 3655-3668. DOI: 10.1007/s00530-023-01191-x. Online publication date: 1-Dec-2023.
  • (2022) Fair Decision-Making for Food Inspections. Proceedings of the 2nd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization, 1-11. DOI: 10.1145/3551624.3555289. Online publication date: 6-Oct-2022.
  • (2021) Interpretable Aspect-Aware Capsule Network for Peer Review Based Citation Count Prediction. ACM Transactions on Information Systems 40:1, 1-29. DOI: 10.1145/3466640. Online publication date: 24-Nov-2021.
  • (2021) Graph Meta Network for Multi-Behavior Recommendation. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 757-766. DOI: 10.1145/3404835.3462972. Online publication date: 11-Jul-2021.
  • (2020) Multiplex Behavioral Relation Learning for Recommendation via Memory Augmented Transformer Network. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2397-2406. DOI: 10.1145/3397271.3401445. Online publication date: 25-Jul-2020.
