DOI: 10.1145/3453483.3454046
research-article
Open access

DIY assistant: a multi-modal end-user programmable virtual assistant

Published: 18 June 2021

Abstract

While Alexa can perform over 100,000 skills, its capability covers only a fraction of what is possible on the web. Individuals need and want to automate a long tail of web-based tasks which often involve visiting different websites and require programming concepts such as function composition, conditional, and iterative evaluation. This paper presents DIYA (Do-It-Yourself Assistant), a new system that empowers users to create personalized web-based virtual assistant skills that require the full generality of composable control constructs, without having to learn a formal programming language.
With DIYA, the user demonstrates their task of interest in the browser and issues a few simple voice commands, such as naming the skills and adding conditions on the action. DIYA turns these multi-modal specifications into voice-invocable skills written in the ThingTalk 2.0 programming language we designed for this purpose. DIYA is a prototype that works in the Chrome browser. Our user studies show that 81% of the proposed routines can be expressed using DIYA. DIYA is easy to learn, and 80% of users surveyed find DIYA useful.
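The Python sketch below makes this description concrete. It shows one way a recorded demonstration with a voice-named parameter could become a callable skill, and how two such skills might then be combined through iteration, a condition, and composition. It is not DIYA's implementation and does not use ThingTalk 2.0 syntax. `Step`, `FakeBrowser`, `compile_skill`, `buy_if_cheap`, the `https://shop.example.com` URL, and the CSS selectors are all hypothetical, and the in-memory fake browser stands in for real page automation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One recorded browser action (hypothetical trace format, not DIYA's)."""
    op: str                       # "load", "type", "click", or "read"
    target: str                   # URL for "load", CSS selector otherwise
    text: Optional[str] = None    # literal text, or a parameter name if param=True
    param: bool = False           # True when the user said "this is a <name>"


class FakeBrowser:
    """In-memory stand-in for a real browser driver; used only for this sketch."""

    def __init__(self, prices: Dict[str, float]):
        self.prices = prices
        self.query = ""
        self.cart: List[str] = []

    def load(self, url: str) -> None:
        pass  # navigation is a no-op in the fake

    def type_text(self, selector: str, text: str) -> None:
        self.query = text  # the fake page has a single search box

    def click(self, selector: str) -> None:
        if selector == "#add-to-cart":
            self.cart.append(self.query)

    def read(self, selector: str) -> str:
        return str(self.prices[self.query])  # price shown for the current query


def compile_skill(steps: List[Step]) -> Callable[..., Optional[str]]:
    """Replay a demonstration, substituting named parameters for typed text."""

    def skill(browser: FakeBrowser, **args: str) -> Optional[str]:
        result = None
        for s in steps:
            if s.op == "load":
                browser.load(s.target)
            elif s.op == "type":
                browser.type_text(s.target, args[s.text] if s.param else s.text)
            elif s.op == "click":
                browser.click(s.target)
            elif s.op == "read":
                result = browser.read(s.target)  # value the skill returns
        return result

    return skill


# Two skills "demonstrated" on a hypothetical shop; "item" is the voice-named parameter.
get_price = compile_skill([
    Step("load", "https://shop.example.com"),
    Step("type", "#search", "item", param=True),
    Step("click", "#go"),
    Step("read", ".price"),
])
add_to_cart = compile_skill([
    Step("load", "https://shop.example.com"),
    Step("type", "#search", "item", param=True),
    Step("click", "#add-to-cart"),
])


def buy_if_cheap(browser: FakeBrowser, items: List[str], limit: float) -> List[str]:
    """Iterate over items, test one skill's output, and feed it into another skill."""
    for item in items:
        price = get_price(browser, item=item)
        if price is not None and float(price) < limit:
            add_to_cart(browser, item=item)
    return browser.cart
```

With prices `{"eggs": 3.5, "milk": 4.0, "saffron": 12.0}` and a limit of 5.0, `buy_if_cheap` adds only eggs and milk to the cart. Each demonstration stays an ordinary parameterized function, so the loop and the condition can live outside the recording and be reused across skills. How DIYA actually represents these constructs in ThingTalk 2.0 is a matter for the paper itself.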



Information

Published In

PLDI 2021: Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation
June 2021
1341 pages
ISBN:9781450383912
DOI:10.1145/3453483
This work is licensed under a Creative Commons Attribution-ShareAlike International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. end-user programming
  2. programming by demonstration
  3. virtual assistants
  4. voice user interfaces
  5. web automation

Qualifiers

  • Research-article

Conference

PLDI '21

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Bibliometrics

  • Downloads (last 12 months): 304
  • Downloads (last 6 weeks): 37
Reflects downloads up to 16 Oct 2024

Cited By

  • (2024) Efficient Bottom-Up Synthesis for Programs with Local Variables. Proceedings of the ACM on Programming Languages 8:POPL, 1540-1568. DOI: 10.1145/3632894. Online publication date: 5-Jan-2024
  • (2024) DynaVis: Dynamically Synthesized UI Widgets for Visualization Editing. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-17. DOI: 10.1145/3613904.3642639. Online publication date: 11-May-2024
  • (2024) ReactGenie: A Development Framework for Complex Multimodal Interactions Using Large Language Models. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-23. DOI: 10.1145/3613904.3642517. Online publication date: 11-May-2024
  • (2024) Eva: Python-based Desktop Virtual Assistant for Visually Impaired. 2024 7th International Conference on Circuit Power and Computing Technologies (ICCPCT), 582-586. DOI: 10.1109/ICCPCT61902.2024.10673357. Online publication date: 8-Aug-2024
  • (2024) Co-designing the integration of voice-based conversational AI and web augmentation to amplify web inclusivity. Scientific Reports 14:1. DOI: 10.1038/s41598-024-66725-3. Online publication date: 13-Jul-2024
  • (2023) Low-Code Programming Models. Communications of the ACM 66:10, 76-85. DOI: 10.1145/3587691. Online publication date: 22-Sep-2023
  • (2023) "This machine is for the aides": Tailoring Voice Assistant Design to Home Health Care Work. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1-19. DOI: 10.1145/3544548.3581118. Online publication date: 19-Apr-2023
  • (2023) ONYX: Assisting Users in Teaching Natural Language Interfaces Through Multi-Modal Interactive Task Learning. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1-16. DOI: 10.1145/3544548.3580964. Online publication date: 19-Apr-2023
  • (2023) A Human-Computer Collaborative Editing Tool for Conceptual Diagrams. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1-29. DOI: 10.1145/3544548.3580676. Online publication date: 19-Apr-2023
  • (2023) Generating voice user interfaces from web sites. Behaviour & Information Technology, 1-24. DOI: 10.1080/0144929X.2023.2272192. Online publication date: 30-Oct-2023
