DOI: 10.1145/3607542.3617359

Research article · Open access

Why Don't You Speak?: A Smartphone Application to Engage Museum Visitors Through Deepfakes Creation

Published: 29 October 2023

Abstract

In this paper, we present a gamification-based application for the cultural heritage sector that aims to enhance how visitors learn about and enjoy museum artworks. The application encourages users to experience history and culture in the first person, building on the idea that the artworks in a museum can tell their own story, thereby deepening visitor engagement and conveying information about the artworks themselves.
Specifically, we propose an application that lets museum visitors create a deepfake video of a sculpture directly on their smartphone. Starting from a few live frames of a statue, the application quickly generates a deepfake video in which the statue talks, moving its lips in sync with a text or audio file. The application builds on generative adversarial network technology and has been specialized on a custom statue dataset collected for this purpose. Experiments show that the generated videos are highly realistic in the vast majority of cases and demonstrate the importance of a reliable statue face detection algorithm. The final aim of our application is to make the museum experience different, offering more immersive interaction and an engaging user experience that could attract more people to explore classical history and culture.
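
To make the workflow concrete, here is a minimal sketch of the pipeline the abstract describes: capture a few frames, locate the statue's face, and synthesize a lip-synced clip. It is an illustration under stated assumptions, not the authors' implementation; detect_statue_face and lip_sync are hypothetical stand-ins for the paper's custom statue face detector and GAN generator.

```python
"""Sketch of the statue-to-deepfake pipeline (hypothetical stand-ins)."""
from dataclasses import dataclass
import numpy as np

@dataclass
class FaceBox:
    x: int
    y: int
    w: int
    h: int

def detect_statue_face(frame: np.ndarray) -> FaceBox | None:
    # Stand-in for the paper's custom statue face detector. A stock face
    # detector trained on human photos often misses marble or bronze faces,
    # which is why a dedicated statue dataset matters.
    h, w = frame.shape[:2]
    return FaceBox(w // 4, h // 4, w // 2, h // 2)  # dummy centered box

def lip_sync(face_crop: np.ndarray, audio: np.ndarray,
             n_frames: int) -> list[np.ndarray]:
    # Stand-in for the GAN generator that animates the statue's lips in
    # sync with the narration; here it just repeats the crop.
    return [face_crop.copy() for _ in range(n_frames)]

def make_deepfake(frames: list[np.ndarray],
                  audio: np.ndarray) -> list[np.ndarray]:
    # Use the first captured frame whose face is detected as the source.
    for frame in frames:
        box = detect_statue_face(frame)
        if box is not None:
            crop = frame[box.y:box.y + box.h, box.x:box.x + box.w]
            # 16 kHz audio, 640 samples per frame -> 25 fps video.
            return lip_sync(crop, audio, n_frames=len(audio) // 640)
    raise ValueError("no statue face found in the captured frames")

# Example: five dummy 480x640 frames and one second of silent 16 kHz audio.
frames = [np.zeros((480, 640, 3), np.uint8) for _ in range(5)]
audio = np.zeros(16000, np.float32)
clip = make_deepfake(frames, audio)  # 25 stand-in frames
```

The dummy detector that always returns a centered box is deliberately naive; as the abstract notes, the realism of the output hinges on reliably finding the statue's face first.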

Supplementary Material

MP4 File (10-video.mp4)
The video shows practical usage of the proposed application (WDyS) in a museum. Although it is only a demo, it illustrates the interaction between people and statues that the app enables. All three functionalities presented in the paper appear in the video, to make the aim of the project clearer.
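
Since GAN inference is heavy for a phone, one plausible reading of generating the video "directly on their smartphone" is that the app captures frames on-device and offloads generation to a backend. That split, and everything in the sketch below (endpoint URL, field names, MP4 response), is our assumption, not something the page states.

```python
"""Hypothetical client call: upload frames and narration, get back a video."""
from contextlib import ExitStack
import requests

def request_deepfake(frame_paths: list[str], audio_path: str,
                     server: str = "http://example.org/generate") -> bytes:
    # Send the captured frames and the audio track as a multipart POST;
    # the (assumed) server replies with the rendered MP4 bytes.
    with ExitStack() as stack:
        files = [("frames", stack.enter_context(open(p, "rb")))
                 for p in frame_paths]
        files.append(("audio", stack.enter_context(open(audio_path, "rb"))))
        resp = requests.post(server, files=files, timeout=120)
    resp.raise_for_status()
    return resp.content

# with open("statue_talks.mp4", "wb") as f:
#     f.write(request_deepfake(["f1.jpg", "f2.jpg"], "speech.wav"))
```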


Information

    Published In

SUMAC '23: Proceedings of the 5th Workshop on analySis, Understanding and proMotion of heritAge Contents
November 2023, 75 pages
ISBN: 9798400702792
DOI: 10.1145/3607542
General Chairs: Valerie Gouet-Brunet, Ronak Kosti, Li Weng

Publisher

Association for Computing Machinery, New York, NY, United States

    Author Tags

    1. deepfake
    2. face detection
    3. generative adversarial network
    4. museum user experience

    Funding Sources

    • European Union - NextGenerationEU
    • Sapienza University of Rome

    Conference

    MM '23

    Acceptance Rates

    Overall Acceptance Rate 5 of 6 submissions, 83%
