DOI: 10.1145/3586183.3606773
Open access

CrossTalk: Intelligent Substrates for Language-Oriented Interaction in Video-Based Communication and Collaboration

Published: 29 October 2023


Despite the advances and ubiquity of digital communication media such as videoconferencing and virtual reality, they remain oblivious to the rich intentions expressed by users. Beyond transmitting audio, videos, and messages, we envision digital communication media as proactive facilitators that can provide unobtrusive assistance to enhance communication and collaboration. Informed by the results of a formative study, we propose three key design concepts to explore the systematic integration of intelligence into communication and collaboration, including the panel substrate, language-based intent recognition, and lightweight interaction techniques. We developed CrossTalk, a videoconferencing system that instantiates these concepts, which was found to enable a more fluid and flexible communication and collaboration experience.








      Information & Contributors


      Published In

      UIST '23: Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology
      October 2023
      1825 pages
      This work is licensed under a Creative Commons Attribution International 4.0 License.



      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. Context-aware Computing
      2. Language-oriented Interaction
      3. Natural Language Interface
      4. Videoconferencing


      • Research-article
      • Research
      • Refereed limited


      UIST '23

      Acceptance Rates

      Overall Acceptance Rate 561 of 2,567 submissions, 22%



      Article Metrics

      • Downloads (last 12 months): 1,411
      • Downloads (last 6 weeks): 132
      Reflects downloads up to 10 Feb 2025


      Cited By

      • (2024) Augmented Physics: Creating Interactive and Embedded Physics Simulations from Static Textbook Diagrams. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 1–12. https://doi.org/10.1145/3654777.3676392. Online publication date: 13-Oct-2024.
      • (2024) DrawTalking: Building Interactive Worlds by Sketching and Speaking. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 1–25. https://doi.org/10.1145/3654777.3676334. Online publication date: 13-Oct-2024.
      • (2024) The CoExplorer Technology Probe: A Generative AI-Powered Adaptive Interface to Support Intentionality in Planning and Running Video Meetings. Proceedings of the 2024 ACM Designing Interactive Systems Conference, 1638–1657. https://doi.org/10.1145/3643834.3661507. Online publication date: 1-Jul-2024.
      • (2024) Exploring the Potential for Generative AI-based Conversational Cues for Real-Time Collaborative Ideation. Proceedings of the 16th Conference on Creativity & Cognition, 117–131. https://doi.org/10.1145/3635636.3656184. Online publication date: 23-Jun-2024.
      • (2024) DrawTalking: Towards Building Interactive Worlds by Sketching and Speaking. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 1–8. https://doi.org/10.1145/3613905.3651089. Online publication date: 11-May-2024.
      • (2024) CoExplorer: Generative AI Powered 2D and 3D Adaptive Interfaces to Support Intentionality in Video Meetings. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 1–10. https://doi.org/10.1145/3613905.3650797. Online publication date: 11-May-2024.
      • (2024) Are Claims Grounded in Data? An Empowering Linking Approach for Misalignment Identification in Online Data-Driven Discussions. IEEE Access 12, 182045–182061. https://doi.org/10.1109/ACCESS.2024.3511039. Online publication date: 2024.
