research-article

Open access

PUMICE: A Multi-Modal Agent that Learns Concepts and Conditionals from Natural Language and Demonstrations

Authors:

Toby Jia-Jun Li,

Marissa Radensky,

Kirielle Singarajah,

Tom M. Mitchell,

Brad A. Myers

UIST '19: Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology

Pages 577 - 589

https://doi.org/10.1145/3332165.3347899

Published: 17 October 2019

Abstract

Natural language programming is a promising approach for enabling end users to teach new tasks to intelligent agents. However, our formative study found that end users often use unclear, ambiguous, or vague concepts when instructing tasks in natural language, especially when specifying conditionals. Existing systems offer limited support for letting users teach agents new concepts or explain unclear ones. In this paper, we describe a new multi-modal, domain-independent approach that combines natural language programming with programming by demonstration. Users first describe tasks and their associated conditions naturally at a high level, and then collaborate with the agent to recursively resolve any ambiguity or vagueness through conversations and demonstrations. Users can also define new procedures and concepts by demonstrating them and referring to content within the GUIs of existing mobile apps. We implement this approach in PUMICE, an end-user programmable agent. A lab study with 10 users showed that PUMICE is usable.
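The recursive resolution of unclear concepts described in the abstract can be illustrated with a minimal sketch. This is not PUMICE's implementation: the `resolve` function, the `explain` callback standing in for the user's conversational answers or demonstrations, the dictionary representation of learned concepts, and the "hot" example are all hypothetical names and data chosen for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical representation: a concept maps to the sub-concepts that its
# explanation mentions. An empty list means the concept was grounded
# directly, for example by a demonstration on an app GUI.
Explain = Callable[[str], List[str]]


def resolve(concept: str, known: Dict[str, List[str]], explain: Explain) -> List[str]:
    """Return the concepts learned, in order, while resolving `concept`."""
    if concept in known:       # already taught, so there is nothing to ask
        return []
    parts = explain(concept)   # ask the user to explain or demonstrate
    known[concept] = parts     # record first so a circular explanation cannot loop
    learned: List[str] = []
    for part in parts:         # recursively resolve anything still unclear
        learned += resolve(part, known, explain)
    learned.append(concept)
    return learned


# Scripted stand-in for the user's side of the conversation (assumed data).
answers = {
    "hot": ["temperature", "above 85"],
    "temperature": [],         # grounded by a demonstration
}
known = {"above 85": []}       # assume simple comparisons are built in
order = resolve("hot", known, lambda c: answers[c])
print(order)                   # ['temperature', 'hot']
```

Once a concept has been resolved it is stored in `known`, so a later instruction that mentions "hot" triggers no further questions. That reuse is the point of teaching concepts rather than only procedures.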

Supplementary Material

MP4 File (ufp4255pv.mp4): Preview video, 6.21 MB

MP4 File (ufp4255vf.mp4): Supplemental video, 121.58 MB

MP4 File (p577-li.mp4): 577.08 MB

Index Terms

PUMICE: A Multi-Modal Agent that Learns Concepts and Conditionals from Natural Language and Demonstrations
1. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interaction paradigms
      1. Natural language interfaces

Published In

UIST '19: Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology

October 2019

1229 pages

ISBN: 9781450368162

DOI: 10.1145/3332165

General Chair: François Guimbretière (Cornell University, USA)

Program Chairs: Michael Bernstein (Stanford University, USA), Katharina Reinecke (University of Washington, USA)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Science Foundation
J.P. Morgan
Verizon
Oath

Conference

UIST '19: The 32nd Annual ACM Symposium on User Interface Software and Technology

October 20 - 23, 2019

New Orleans, LA, USA

Acceptance Rates

Overall Acceptance Rate 561 of 2,567 submissions, 22%
