
Evorus: A Crowd-powered Conversational Assistant Built to Automate Itself Over Time

Published: 21 April 2018


Crowd-powered conversational assistants have been shown to be more robust than automated systems, but at the cost of higher response latency and monetary cost. A promising direction is to combine the two approaches to obtain high-quality, low-latency, and low-cost solutions. In this paper, we introduce Evorus, a crowd-powered conversational assistant built to automate itself over time by (i) allowing new chatbots to be easily integrated to automate more scenarios, (ii) reusing prior crowd answers, and (iii) learning to automatically approve response candidates. Our 5-month-long deployment with 80 participants and 281 conversations shows that Evorus can automate itself without compromising conversation quality. Crowd-AI architectures have long been proposed as a way to reduce cost and latency for crowd-powered systems; Evorus demonstrates how automation can be introduced successfully in a deployed system. Its architecture allows future researchers to innovate further on the underlying automated components in the context of a deployed open-domain dialog system.
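To make the three mechanisms in the abstract concrete, the following is a minimal toy sketch, not the Evorus implementation. It assumes that response candidates can come from registered chatbots, from previously accepted answers, or from crowd workers, and that a candidate is sent once its combined automatic and crowd score reaches a threshold. Every name here (`ToyAssistant`, `Candidate`, `register_bot`, `auto_vote`, `threshold`) is hypothetical, and the exact-match reuse lookup and the additive scoring rule are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical type: a chatbot maps a user message to a reply, or None if it has nothing to say.
Chatbot = Callable[[str], Optional[str]]


@dataclass
class Candidate:
    text: str
    source: str          # "bot:<name>", "reuse", or "crowd"
    score: float = 0.0


class ToyAssistant:
    """Toy illustration only; scoring and reuse rules are assumptions."""

    def __init__(self, auto_vote: Callable[[str, Candidate], float],
                 threshold: float = 1.0) -> None:
        self.bots: Dict[str, Chatbot] = {}
        self.memory: Dict[str, str] = {}   # normalized message -> accepted reply
        self.auto_vote = auto_vote         # stand-in for a learned approval model
        self.threshold = threshold         # assumed acceptance threshold

    def register_bot(self, name: str, bot: Chatbot) -> None:
        # (i) New chatbots plug in as additional candidate sources.
        self.bots[name] = bot

    def propose(self, message: str, crowd_replies: Sequence[str] = ()) -> List[Candidate]:
        candidates = []
        for name, bot in self.bots.items():
            reply = bot(message)
            if reply is not None:
                candidates.append(Candidate(reply, "bot:" + name))
        # (ii) Reuse a previously accepted answer (exact match, for simplicity).
        key = message.strip().lower()
        if key in self.memory:
            candidates.append(Candidate(self.memory[key], "reuse"))
        candidates.extend(Candidate(text, "crowd") for text in crowd_replies)
        return candidates

    def respond(self, message: str, crowd_replies: Sequence[str] = (),
                crowd_votes: Optional[Dict[str, float]] = None) -> Optional[Candidate]:
        crowd_votes = crowd_votes or {}
        accepted = []
        for cand in self.propose(message, crowd_replies):
            # (iii) An automatic vote adds to whatever the crowd contributed.
            cand.score = self.auto_vote(message, cand) + crowd_votes.get(cand.text, 0.0)
            if cand.score >= self.threshold:
                accepted.append(cand)
        if not accepted:
            return None  # nothing approved yet; more crowd input is needed
        best = max(accepted, key=lambda c: c.score)
        self.memory[message.strip().lower()] = best.text
        return best
```

In this sketch, raising the weight returned by `auto_vote` shifts approval from crowd workers to the automated component, which is one simple way to picture a system that automates itself gradually.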

Supplementary Material

suppl.mov (pn2817-file5.mp4)
Supplemental video





    Published In

    CHI '18: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems
    April 2018
    8489 pages
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].



    Association for Computing Machinery

    New York, NY, United States



    • Honorable Mention

    Author Tags

    1. chatbot
    2. conversational assistant
    3. crowd-powered system
    4. crowdsourcing
    5. real-time crowdsourcing


    • Research-article


    CHI '18

    Acceptance Rates

    CHI '18 Paper Acceptance Rate 666 of 2,590 submissions, 26%;
    Overall Acceptance Rate 6,199 of 26,314 submissions, 24%


    Article Metrics

    • Downloads (last 12 months): 99
    • Downloads (last 6 weeks): 10
    Reflects downloads up to 14 Oct 2024


    Cited By

    • (2024) Understanding the Longitudinal Impact of a Chatbot to Facilitate a Virtual Community of Practice for Teachers in Rural Côte d’Ivoire. ACM Journal on Computing and Sustainable Societies 2:3 (1-37). DOI: 10.1145/3675762. Online publication date: 16-Sep-2024
    • (2024) How Does Conversation Length Impact User’s Satisfaction? A Case Study of Length-Controlled Conversations with LLM-Powered Chatbots. Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems (1-13). DOI: 10.1145/3613905.3650823. Online publication date: 11-May-2024
    • (2024) Requirements Conflicts Detection: Advancing with Conversational AI. 2024 IEEE 32nd International Requirements Engineering Conference Workshops (REW) (101-107). DOI: 10.1109/REW61692.2024.00019. Online publication date: 24-Jun-2024
    • (2024) Knowledge-Enhanced Conversational Agents. Journal of Computer Science and Technology 39:3 (585-609). DOI: 10.1007/s11390-024-2883-4. Online publication date: 22-Jul-2024
    • (2023) Designing for Hybrid Intelligence: A Taxonomy and Survey of Crowd-Machine Interaction. Applied Sciences 13:4 (2198). DOI: 10.3390/app13042198. Online publication date: 8-Feb-2023
    • (2023) Leveraging Human-AI Collaboration in Crowd-Powered Source Search: A Preliminary Study. Journal of Social Computing 4:2 (95-111). DOI: 10.23919/JSC.2023.0002. Online publication date: Jun-2023
    • (2023) ContextBot. Proceedings of the 34th ACM Conference on Hypertext and Social Media (1-14). DOI: 10.1145/3603163.3609031. Online publication date: 4-Sep-2023
    • (2023) Powering an AI Chatbot with Expert Sourcing to Support Credible Health Information Access. Proceedings of the 28th International Conference on Intelligent User Interfaces (2-18). DOI: 10.1145/3581641.3584031. Online publication date: 27-Mar-2023
    • (2023) It Is All About Criticism: Understanding the Effect of Social Media Discourse on Legal Crowdfunding Campaigns. Proceedings of the ACM on Human-Computer Interaction 7:CSCW1 (1-37). DOI: 10.1145/3579500. Online publication date: 16-Apr-2023
    • (2023) Compass: Supporting Large Group Mentorship in a Chat-Based UI. Proceedings of the ACM on Human-Computer Interaction 7:CSCW1 (1-25). DOI: 10.1145/3579470. Online publication date: 16-Apr-2023
