DOI: 10.1145/3607542.3617359

Research article · Open access

Why Don't You Speak?: A Smartphone Application to Engage Museum Visitors Through Deepfakes Creation

Published: 29 October 2023

Abstract

In this paper, we present a gamification-based application for the cultural heritage sector that aims to enhance how visitors learn about and enjoy museum artworks. The application encourages users to experience history and culture in the first person, building on the idea that the artworks in a museum can tell their own story, thereby deepening visitor engagement and conveying information about the artworks themselves.
Specifically, we propose an application that lets museum visitors create a deepfake video of a sculpture directly on their smartphone. Starting from a few live frames of a statue, the application quickly generates a deepfake video in which the statue talks, moving its lips in sync with a text or audio file. The application builds on generative adversarial network technology and has been specialized on a custom statue dataset collected for this purpose. Experiments show that the generated videos are highly realistic in the vast majority of cases and demonstrate the importance of a reliable statue face detection algorithm. The final aim of our application is to make the museum experience different, offering more immersive interaction and an engaging user experience that could attract more people to explore classical history and culture.
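
To make the workflow concrete, here is a minimal sketch of the pipeline the abstract describes: capture a few frames, locate the statue's face, and synthesize a lip-synced clip. It is an illustration under stated assumptions, not the authors' implementation; detect_statue_face and lip_sync are hypothetical stand-ins for the paper's custom statue face detector and GAN generator.

```python
"""Sketch of the statue-to-deepfake pipeline (hypothetical stand-ins)."""
from dataclasses import dataclass
import numpy as np

@dataclass
class FaceBox:
    x: int
    y: int
    w: int
    h: int

def detect_statue_face(frame: np.ndarray) -> FaceBox | None:
    # Stand-in for the paper's custom statue face detector. A stock face
    # detector trained on human photos often misses marble or bronze faces,
    # which is why a dedicated statue dataset matters.
    h, w = frame.shape[:2]
    return FaceBox(w // 4, h // 4, w // 2, h // 2)  # dummy centered box

def lip_sync(face_crop: np.ndarray, audio: np.ndarray,
             n_frames: int) -> list[np.ndarray]:
    # Stand-in for the GAN generator that animates the statue's lips in
    # sync with the narration; here it just repeats the crop.
    return [face_crop.copy() for _ in range(n_frames)]

def make_deepfake(frames: list[np.ndarray],
                  audio: np.ndarray) -> list[np.ndarray]:
    # Use the first captured frame whose face is detected as the source.
    for frame in frames:
        box = detect_statue_face(frame)
        if box is not None:
            crop = frame[box.y:box.y + box.h, box.x:box.x + box.w]
            # 16 kHz audio, 640 samples per frame -> 25 fps video.
            return lip_sync(crop, audio, n_frames=len(audio) // 640)
    raise ValueError("no statue face found in the captured frames")

# Example: five dummy 480x640 frames and one second of silent 16 kHz audio.
frames = [np.zeros((480, 640, 3), np.uint8) for _ in range(5)]
audio = np.zeros(16000, np.float32)
clip = make_deepfake(frames, audio)  # 25 stand-in frames
```

The dummy detector that always returns a centered box is deliberately naive; as the abstract notes, the realism of the output hinges on reliably finding the statue's face first.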

Supplementary Material

MP4 File (10-video.mp4)
The video shows practical usage of the proposed application (WDyS) in a museum. Although it is only a demo, it illustrates the interaction between people and statues that the app enables. All three functionalities presented in the paper appear in the video, to make the aim of the project clearer.
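
Since GAN inference is heavy for a phone, one plausible reading of generating the video "directly on their smartphone" is that the app captures frames on-device and offloads generation to a backend. That split, and everything in the sketch below (endpoint URL, field names, MP4 response), is our assumption, not something the page states.

```python
"""Hypothetical client call: upload frames and narration, get back a video."""
from contextlib import ExitStack
import requests

def request_deepfake(frame_paths: list[str], audio_path: str,
                     server: str = "http://example.org/generate") -> bytes:
    # Send the captured frames and the audio track as a multipart POST;
    # the (assumed) server replies with the rendered MP4 bytes.
    with ExitStack() as stack:
        files = [("frames", stack.enter_context(open(p, "rb")))
                 for p in frame_paths]
        files.append(("audio", stack.enter_context(open(audio_path, "rb"))))
        resp = requests.post(server, files=files, timeout=120)
    resp.raise_for_status()
    return resp.content

# with open("statue_talks.mp4", "wb") as f:
#     f.write(request_deepfake(["f1.jpg", "f2.jpg"], "speech.wav"))
```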


Information

    Published In

SUMAC '23: Proceedings of the 5th Workshop on analySis, Understanding and proMotion of heritAge Contents
November 2023, 75 pages
ISBN: 9798400702792
DOI: 10.1145/3607542
General Chairs: Valerie Gouet-Brunet, Ronak Kosti, Li Weng

Publisher

Association for Computing Machinery, New York, NY, United States

    Author Tags

    1. deepfake
    2. face detection
    3. generative adversarial network
    4. museum user experience

    Funding Sources

    • European Union - NextGenerationEU
    • Sapienza University of Rome

    Conference

    MM '23

    Acceptance Rates

    Overall Acceptance Rate 5 of 6 submissions, 83%
