DOI: 10.1145/3635636.3664252
C&C '24 Conference Proceedings · Poster

How do video content creation goals impact which concepts people prioritize for generating B-roll imagery?

Published: 23 June 2024
    Abstract

    B-roll is vital when producing high-quality videos, but finding the right images can be difficult and time-consuming. Moreover, which B-roll is most effective can depend on a video content creator’s intent—is the goal to entertain, to inform, or something else? While new text-to-image generation models provide promising avenues for streamlining B-roll production, it remains unclear how these tools can support content creators with different goals. To close this gap, we aimed to understand how video content creators’ goals guide which visual concepts they prioritize for B-roll generation. Here we introduce a benchmark containing judgments from more than 800 people as to which terms in 12 video transcripts should be assigned highest priority for B-roll imagery accompaniment. We verified that participants reliably prioritized different visual concepts depending on whether their goal was to help produce informative or entertaining videos. We then explored how well several algorithms, including heuristic approaches and large language models (LLMs), could predict systematic patterns in human judgments. We found that none of these methods fully captured human judgments in either goal condition, with state-of-the-art LLMs (i.e., GPT-4) even underperforming a baseline that sampled only nouns, or nouns and adjectives. Overall, our work identifies opportunities to develop improved algorithms to support video production workflows.
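    The part-of-speech baseline mentioned in the abstract can be illustrated with a short sketch. This is not the paper's implementation: the toy POS lexicon, the function name `sample_broll_terms`, and the sampling scheme are all hypothetical stand-ins, and a real system would obtain tags from a trained POS tagger (e.g., NLTK or spaCy) rather than a hand-labeled dictionary.

```python
# Illustrative sketch of a heuristic B-roll baseline: sample candidate
# terms from a transcript, restricted to nouns (or nouns + adjectives).
import random

# Toy POS lexicon (hypothetical); in practice a POS tagger supplies these.
POS = {"bear": "NOUN", "village": "NOUN", "hungry": "ADJ",
       "quiet": "ADJ", "wandered": "VERB", "the": "DET", "into": "ADP"}

def sample_broll_terms(words, k=2, include_adjectives=False, seed=0):
    """Sample up to k candidate B-roll terms, keeping only nouns
    (and optionally adjectives), mimicking the POS baseline."""
    keep = {"NOUN"} | ({"ADJ"} if include_adjectives else set())
    candidates = [w for w in words if POS.get(w) in keep]
    rng = random.Random(seed)
    return rng.sample(candidates, min(k, len(candidates)))

words = "the hungry bear wandered into the quiet village".split()
print(sorted(sample_broll_terms(words, k=2)))  # → ['bear', 'village']
```

    Such a baseline has no notion of the creator's goal, which is exactly why comparing it against human judgments in the informative vs. entertaining conditions is informative.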



    Published In

    C&C '24: Proceedings of the 16th Conference on Creativity & Cognition
    June 2024, 718 pages
    ISBN: 9798400704857
    DOI: 10.1145/3635636

    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. human behavioral benchmarking
    2. text-to-image
    3. video B-roll
    4. visual communication

    Qualifiers

    • Poster
    • Research
    • Refereed limited

    Funding Sources

    • ONR Science of Autonomy award
    • NSF Career Award

    Conference

    C&C '24: Creativity and Cognition
    June 23–26, 2024
    Chicago, IL, USA

    Acceptance Rates

    Overall acceptance rate: 108 of 371 submissions (29%)

    Article Metrics

    • Total citations: 0
    • Total downloads: 32
    • Downloads (last 12 months): 32
    • Downloads (last 6 weeks): 32

    Reflects downloads up to 26 Jul 2024.
