DOI: 10.1145/3332165.3347899
research-article
Open access

PUMICE: A Multi-Modal Agent that Learns Concepts and Conditionals from Natural Language and Demonstrations

Published: 17 October 2019

Abstract

Natural language programming is a promising approach to enable end users to instruct new tasks for intelligent agents. However, our formative study found that end users often use unclear, ambiguous, or vague concepts when instructing tasks in natural language, especially when specifying conditionals. Existing systems have limited support for letting users teach agents new concepts or explain unclear ones. In this paper, we describe a new multi-modal, domain-independent approach that combines natural language programming and programming-by-demonstration to allow users to first naturally describe tasks and associated conditions at a high level, and then collaborate with the agent to recursively resolve any ambiguities or vagueness through conversations and demonstrations. Users can also define new procedures and concepts by demonstrating and referring to contents within GUIs of existing mobile apps. We demonstrate this approach in PUMICE, an end-user programmable agent. A lab study with 10 users showed its usability.
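
To make the idea of recursively resolving unclear concepts more concrete, the following is a minimal sketch and not PUMICE's actual implementation. It assumes a toy setting in which each unknown concept is explained by a scripted "user" whose explanation may itself mention further unknown concepts. The names `Answer`, `resolve`, `learn`, and `SCRIPTED_USER`, the example concepts, and the threshold of 85 are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Answer:
    """A (hypothetical) user explanation of one concept."""
    definition: str
    depends_on: List[str] = field(default_factory=list)


def resolve(concept: str,
            known: Dict[str, str],
            ask_user: Callable[[str], Answer],
            trace: List[str]) -> None:
    """Resolve `concept`, first resolving any concepts its explanation uses."""
    # Skip concepts already learned or already being asked about
    # (the second check guards against circular explanations).
    if concept in known or concept in trace:
        return
    trace.append(concept)
    answer = ask_user(concept)
    for sub in answer.depends_on:
        resolve(sub, known, ask_user, trace)
    known[concept] = answer.definition


# Scripted stand-in for the conversation and demonstration steps (assumption).
SCRIPTED_USER = {
    "hot": Answer("temperature is above 85", ["temperature"]),
    "temperature": Answer("value the user pointed to in a weather app GUI"),
}


def learn(concepts: List[str]):
    """Resolve every concept in `concepts`; return learned concepts and ask order."""
    known: Dict[str, str] = {}
    trace: List[str] = []
    for c in concepts:
        resolve(c, known, SCRIPTED_USER.__getitem__, trace)
    return known, trace


if __name__ == "__main__":
    known, trace = learn(["hot"])
    print("asked about:", trace)
    print("learned:", known)
```

In this sketch the agent asks about "hot" first, discovers that the explanation relies on "temperature", and only stores "hot" once "temperature" has been grounded. A real system would replace the scripted dictionary with a dialogue and a GUI demonstration.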

Supplementary Material

MP4 File (ufp4255pv.mp4)
Preview video
MP4 File (ufp4255vf.mp4)
Supplemental video
MP4 File (p577-li.mp4)

    Published In

    UIST '19: Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology
    October 2019
    1229 pages
    ISBN:9781450368162
    DOI:10.1145/3332165
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. end user development
    2. multi-modal interaction
    3. natural language programming
    4. programming by demonstration

    Conference

    UIST '19

    Acceptance Rates

    Overall Acceptance Rate 561 of 2,567 submissions, 22%
