DOI: 10.1145/3635636.3664252
C&C '24 Conference Proceedings · Poster

How do video content creation goals impact which concepts people prioritize for generating B-roll imagery?

Published: 23 June 2024
    Abstract

    B-roll is vital when producing high-quality videos, but finding the right images can be difficult and time-consuming. Moreover, which B-roll is most effective can depend on a video content creator’s intent—is the goal to entertain, to inform, or something else? While new text-to-image generation models provide promising avenues for streamlining B-roll production, it remains unclear how these tools can support content creators with different goals. To close this gap, we aimed to understand how video content creators’ goals guide which visual concepts they prioritize for B-roll generation. Here we introduce a benchmark containing judgments from more than 800 people as to which terms in 12 video transcripts should be assigned highest priority for B-roll imagery accompaniment. We verified that participants reliably prioritized different visual concepts depending on whether their goal was to help produce informative or entertaining videos. We then explored how well several algorithms, including heuristic approaches and large language models (LLMs), could predict systematic patterns in human judgments. We found that none of these methods fully captured human judgments in either goal condition, with state-of-the-art LLMs (i.e., GPT-4) even underperforming a baseline that sampled only nouns, or nouns and adjectives. Overall, our work identifies opportunities to develop improved algorithms to support video production workflows.
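    The part-of-speech baseline mentioned in the abstract can be illustrated with a short sketch. This is not the paper's implementation: the toy POS lexicon, the function name `sample_broll_terms`, and the sampling scheme are all hypothetical stand-ins, and a real system would obtain tags from a trained POS tagger (e.g., NLTK or spaCy) rather than a hand-labeled dictionary.

```python
# Illustrative sketch of a heuristic B-roll baseline: sample candidate
# terms from a transcript, restricted to nouns (or nouns + adjectives).
import random

# Toy POS lexicon (hypothetical); in practice a POS tagger supplies these.
POS = {"bear": "NOUN", "village": "NOUN", "hungry": "ADJ",
       "quiet": "ADJ", "wandered": "VERB", "the": "DET", "into": "ADP"}

def sample_broll_terms(words, k=2, include_adjectives=False, seed=0):
    """Sample up to k candidate B-roll terms, keeping only nouns
    (and optionally adjectives), mimicking the POS baseline."""
    keep = {"NOUN"} | ({"ADJ"} if include_adjectives else set())
    candidates = [w for w in words if POS.get(w) in keep]
    rng = random.Random(seed)
    return rng.sample(candidates, min(k, len(candidates)))

words = "the hungry bear wandered into the quiet village".split()
print(sorted(sample_broll_terms(words, k=2)))  # → ['bear', 'village']
```

    Such a baseline has no notion of the creator's goal, which is exactly why comparing it against human judgments in the informative vs. entertaining conditions is informative.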



    Published In

    C&C '24: Proceedings of the 16th Conference on Creativity & Cognition
    June 2024, 718 pages
    ISBN: 9798400704857
    DOI: 10.1145/3635636

    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. human behavioral benchmarking
    2. text-to-image
    3. video B-roll
    4. visual communication

    Qualifiers

    • Poster
    • Research
    • Refereed limited

    Funding Sources

    • ONR Science of Autonomy award
    • NSF Career Award

    Conference

    C&C '24: Creativity and Cognition
    June 23–26, 2024
    Chicago, IL, USA

    Acceptance Rates

    Overall acceptance rate: 108 of 371 submissions (29%)

    Article Metrics

    • Total citations: 0
    • Total downloads: 32
    • Downloads (last 12 months): 32
    • Downloads (last 6 weeks): 32

    Reflects downloads up to 26 Jul 2024.
