Abstract
Accompanying a text description with images or videos can improve the understanding of the information it conveys. This opportunity is particularly relevant for books and other traditional instructional material. Videos or, more generally, (interactive) graphics content can increase the effectiveness of this material by providing, e.g., an animated representation of the steps required to carry out a given procedure. The generation of 3D animated content, however, is still very labor-intensive and time-consuming. Systems able to speed up this process by offering flexible and easy-to-use interfaces are thus becoming of paramount importance. This paper describes a system designed to automatically generate a computer graphics video by processing a text description and a set of associated images. The system combines Natural Language Processing and image analysis to extract the information needed to visually represent, using 3D animations, the procedure depicted in an instruction manual. It relies on a database of 3D models and preconfigured animations that are activated according to the information extracted from this input. Moreover, by analyzing the images, the system can also generate new animations from scratch. Promising results were obtained when assessing the system's performance in a specific use case focused on printer maintenance.
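The idea of activating preconfigured animations from extracted text information can be illustrated with a minimal sketch. It is not the system described in the paper: the animation database, the synonym table, the clip names, and the keyword-matching logic are all hypothetical stand-ins, and a real pipeline would presumably rely on proper NLP and image analysis instead of word lookup.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical database: (action, object) -> name of a preconfigured animation clip.
ANIMATION_DB = {
    ("open", "cover"): "cover_open_clip",
    ("remove", "cartridge"): "cartridge_remove_clip",
    ("insert", "cartridge"): "cartridge_insert_clip",
    ("close", "cover"): "cover_close_clip",
}

# Hypothetical synonym table standing in for a lexical resource.
SYNONYMS = {"lift": "open", "extract": "remove", "pull": "remove", "lid": "cover"}


@dataclass
class AnimationStep:
    action: str
    target: str
    clip: Optional[str]  # None means no preconfigured clip matched


def normalize(token: str) -> str:
    """Lowercase a token, strip punctuation, and map it to a canonical word."""
    word = token.lower().strip(".,;:!?")
    return SYNONYMS.get(word, word)


def plan_step(sentence: str) -> Optional[AnimationStep]:
    """Find the first known action and object in a sentence and look up a clip."""
    tokens = [normalize(t) for t in sentence.split()]
    actions = {a for a, _ in ANIMATION_DB}
    targets = {o for _, o in ANIMATION_DB}
    action = next((t for t in tokens if t in actions), None)
    target = next((t for t in tokens if t in targets), None)
    if action is None or target is None:
        return None  # nothing recognizable to animate in this sentence
    return AnimationStep(action, target, ANIMATION_DB.get((action, target)))


def plan_procedure(text: str) -> List[AnimationStep]:
    """Split a procedure into sentences and plan one animation step per sentence."""
    steps = []
    for sentence in text.split("."):
        step = plan_step(sentence)
        if step is not None:
            steps.append(step)
    return steps


if __name__ == "__main__":
    manual = "Lift the lid. Pull out the ink cartridge. Insert the new cartridge. Close the cover."
    for step in plan_procedure(manual):
        print(step.action, step.target, step.clip)
```

In this sketch, a step whose `clip` is `None` marks a case where the database has no matching entry; that is the point where a system like the one described would need to generate a new animation, for instance from the accompanying images.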
This work was developed in the frame of the VR@POLITO initiative. The research was supported by PON “Ricerca e Innovazione” 2014-2020 – DM 1062/2021 funds.
Notes
1. NeuralCoref: https://github.com/huggingface/neuralcoref
2. spaCy: https://spacy.io/
3. Scene Graph Parser: http://tiny.cc/dnqpuz
4. Mask R-CNN: https://github.com/matterport/Mask_RCNN
5. WordNet: https://wordnet.princeton.edu/
6. VGG Image Annotator: http://tiny.cc/fnqpuz
7. Canon PIXMA-MX495 manual: https://bit.ly/3hxeohx
8. Epson WF-7010 manual: https://bit.ly/3pxLNNu
9. HP Deskjet 3000 manual: https://bit.ly/3hw1nVv
10. Video generated in the experiments: http://tiny.cc/iiopuz
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Cannavò, A., Gatteschi, V., Macis, L., Lamberti, F. (2022). Automatic Generation of 3D Animations from Text and Images. In: De Paolis, L.T., Arpaia, P., Sacco, M. (eds) Extended Reality. XR Salento 2022. Lecture Notes in Computer Science, vol 13445. Springer, Cham. https://doi.org/10.1007/978-3-031-15546-8_6
DOI: https://doi.org/10.1007/978-3-031-15546-8_6
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15545-1
Online ISBN: 978-3-031-15546-8
eBook Packages: Computer Science, Computer Science (R0)