Proceedings of the ACM SIGGRAPH Symposium on Applied Perception, 2015
We convey a tremendous amount of information vocally. In addition to the obvious exchange of semantic information, we unconsciously vary a number of acoustic properties of the speech wave to provide information about our emotions, thoughts, and intentions. [Cahn 1990] Advances in understanding of human physiology combined with increases in the computational power available in modern computers have made the simulation of the human vocal tract a realistic option for creating artificial speech. Such systems can, in principle, produce any sound that a human can make. Here we present two experiments examining the expression of emotion using prosody (i.e., speech melody) in human recordings and an articulatory speech synthesis system.
This paper deals with the highly challenging problem of reconstructing the shape of a refracting object from a single image of its resulting caustic. Due to the ubiquity of transparent refracting objects in everyday life, reconstruction of their shape entails a multitude of practical applications. The recent Shape from Caustics (SfC) method casts the problem as the inverse of a light propagation simulation for synthesis of the caustic image, which can be solved by a differentiable renderer. However, the inherent complexity of light transport through refracting surfaces currently limits the practicability with respect to reconstruction speed and robustness. To address these issues, we introduce Neural-Shape from Caustics (N-SfC), a learning-based extension that incorporates two components into the reconstruction pipeline: a denoising module, which alleviates the computational cost of the light transport simulation, and an optimization process based on learned gradient descent, which e...
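As a rough illustration of the learned-gradient-descent idea mentioned in this abstract, the sketch below shows a single optimization step: render a candidate refractive surface with a differentiable simulator, compare the result to the target caustic, and let a small network turn the raw gradient into a parameter update. The names (`UpdateNet`, `render_caustic`) and the network layout are assumptions for illustration, not the N-SfC implementation.

```python
import torch
import torch.nn as nn

class UpdateNet(nn.Module):
    """Hypothetical network that predicts a parameter update from the
    current gradient and the current surface estimate."""
    def __init__(self, channels=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, grad, height_field):
        # Stack the raw gradient with the current estimate and predict a step.
        x = torch.cat([grad, height_field], dim=1)
        return self.net(x)

def learned_gd_step(height_field, target_caustic, render_caustic, update_net):
    """One learned optimization step: differentiable render, compare, update.
    `render_caustic` stands in for any differentiable caustic simulator."""
    height_field = height_field.detach().requires_grad_(True)
    rendered = render_caustic(height_field)                # forward simulation
    loss = torch.nn.functional.mse_loss(rendered, target_caustic)
    grad, = torch.autograd.grad(loss, height_field)        # raw gradient
    with torch.no_grad():
        step = update_net(grad, height_field)              # learned update rule
    return height_field.detach() + step
```

In a full system the update network would itself be trained on reconstruction trajectories; the snippet above only shows how it would be invoked at inference time.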
Proceedings of the ACM SIGGRAPH Symposium on Applied Perception in Graphics and Visualization - APGV '11, 2011
Figure 1: In this work, we investigate the impact of the retargeting process on fixation patterns, under the assumption that viewers' fixations should not change when viewing the source image and when viewing the retargeted results. We gather the users' fixations on the images and derive their saliency maps from these eye-tracking data. We compare the saliency maps of the retargeted images to the saliency maps of the original-size images with six different automatic image similarity metrics. We also evaluate the differences between all these saliency maps and the ones obtained from a model of prediction of human fixations, in order to analyze whether this model is able to match actual eye movements in a retargeting context.
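For readers unfamiliar with the methodology, the sketch below shows one common way to turn eye-tracking fixations into a saliency map (a Gaussian-blurred fixation density) and to compare two maps with a correlation coefficient, one of several similarity metrics in widespread use. The parameter values and function names are illustrative assumptions, not the ones used in the study.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def saliency_from_fixations(fixations, height, width, sigma=25.0):
    """Build a saliency map from a list of (x, y) fixation coordinates by
    accumulating fixation counts and blurring them with a Gaussian."""
    density = np.zeros((height, width), dtype=np.float64)
    for x, y in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < height and 0 <= xi < width:
            density[yi, xi] += 1.0
    blurred = gaussian_filter(density, sigma=sigma)   # spread each fixation
    return blurred / (blurred.max() + 1e-12)          # normalize to [0, 1]

def pearson_cc(map_a, map_b):
    """Correlation coefficient between two saliency maps of equal shape."""
    a = (map_a - map_a.mean()) / (map_a.std() + 1e-12)
    b = (map_b - map_b.mean()) / (map_b.std() + 1e-12)
    return float(np.mean(a * b))
```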
Motion blur is a frequent requirement for the rendering of high-quality animated images. However, the computational resources involved are usually higher than those for images that have not been temporally antialiased. In this article we study the influence of high-level properties such as object material and speed, shutter time, and antialiasing level. Based on scenes containing variations of these parameters, we design different psychophysical experiments to determine how influential they are in the perception of image quality. This work gives insights into the effects these parameters have and exposes certain situations where motion-blurred stimuli may be indistinguishable from a gold standard. As an immediate practical application, images of similar quality can be produced while the computing requirements are reduced. Algorithmic efforts have traditionally been focused on finding new improved methods to alleviate sampling artifacts by steering computation to the most important dim...
Communications in Computer and Information Science
Figure 1: This paper investigates the perception of different emotions in reenacted portrait videos. Using a modified state-of-the-art technique that operates on the UV maps (b) of the manipulated meshes, we alter the facial expression of an input video (a) to display happiness (c), disbelief (d), positive surprise (e), and disgust (f).
26th ACM Symposium on Virtual Reality Software and Technology
We present a new pipeline to enable head-motion parallax in omnidirectional stereo (ODS) panorama... more We present a new pipeline to enable head-motion parallax in omnidirectional stereo (ODS) panorama video rendering using a neural depth decoder. While recent ODS panorama cameras record short-baseline horizontal stereo parallax to offer the impression of binocular depth, they do not support the necessary translational degrees-of-freedom (DoF) to also provide for head-motion parallax in virtual reality (VR) applications. To overcome this limitation, we propose a pipeline that enhances the classical ODS panorama format with 6 DoF free-viewpoint rendering by decomposing the scene into a multi-layer mesh representation. Given a spherical stereo panorama video, we use the horizontal disparity to store explicit depth information for both eyes in a simple neural decoder architecture. While this approach produces reasonable results for individual frames, video rendering usually suffers from temporal depth inconsistencies. Thus, we perform successive optimization to improve temporal consistency by fine-tuning our depth decoder for both temporal and spatial smoothness. Using a consumer-grade ODS camera, we evaluate our approach on a number of real-world scene recordings and demonstrate the versatility and robustness of the proposed pipeline.
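The temporal and spatial smoothness fine-tuning described above can be pictured as adding two regularizers to the depth decoder's objective. The following is a hedged PyTorch approximation under that assumption; the exact terms and weights used in the paper may differ.

```python
import torch

def spatial_smoothness(depth):
    """Penalize large depth gradients within a single frame (B, 1, H, W)."""
    dx = (depth[..., :, 1:] - depth[..., :, :-1]).abs().mean()
    dy = (depth[..., 1:, :] - depth[..., :-1, :]).abs().mean()
    return dx + dy

def temporal_consistency(depth_t, depth_prev):
    """Penalize depth changes between consecutive frames."""
    return (depth_t - depth_prev).abs().mean()

def finetune_loss(depth_t, depth_prev, data_term, w_spatial=0.1, w_temporal=1.0):
    """Combine a data term (e.g., disparity reconstruction error) with
    spatial and temporal regularizers; the weights are placeholders."""
    return (data_term
            + w_spatial * spatial_smoothness(depth_t)
            + w_temporal * temporal_consistency(depth_t, depth_prev))
```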
Videos obtained by current face swapping techniques can contain artifacts that are potentially detectable, yet unobtrusive to human observers. However, the perceptual differences between real and altered videos, as well as the properties leading humans to classify a video as manipulated, are still unclear. Thus, to support research on perceived realism and conveyed emotions in face swap videos, this paper introduces a high-resolution dataset providing the community with the necessary sophisticated stimuli. Our recording process has been specifically designed to focus on human perception research and entails three scenarios (text-reading, emotion-triggering, and free-speech). We assess the perceived realness of our dataset through a series of experiments. The results indicate that our stimuli are overall convincing, even for long video sequences. Furthermore, we partially annotate the dataset with noticeable facial distortions and artifacts reported by participants.
Proceedings of the 18th International Conference on Intelligent Virtual Agents, 2018
People tend to personify machines. Giving machines the ability to actually produce social information can help improve human-machine interactions. Embodied Conversational Agents (ECAs) are virtual software agents that can process and produce speech, facial expressions, gestures, and eye gaze, enabling natural, multimodal, human-machine communication. On the one hand, the field of personality psychology provides insights into how we could describe and measure the virtual personality of ECAs. On the other hand, ECAs provide a method to systematically examine how different factors affect the perception of personality. This paper shows that standardized, validated personality questionnaires can be used to evaluate ECAs psychologically, and that state-of-the-art ECAs can manipulate their perceived personality through appearance and behavior.
Creating convincing representations of humans is a fundamental problem in both traditional arts and modern media. In our digital world, virtual avatars allow us to simulate and render the human body for a variety of applications, including movie production, sports, human-computer interaction, and medical sciences. However, capturing digital representations of a person's shape, appearance, and motion is an expensive and time-consuming process which usually requires a lot of manual adjustments.
2021 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), 2021
We investigate the mitigation of cybersickness (CS) in 360° videos, a phenomenon caused by the visually induced impression of ego-motion while being physically at rest. We evaluate the effectiveness of scene modulations that reduce motion in the peripheral visual field by deliberately blurring or opaquely occluding eccentric view areas of up to ten degrees. Our results indicate that both methods effectively reduce CS in pre-recorded 360° video, with the dynamic opaque occlusion method yielding the best results.
We are increasingly approaching the point where computer-based technology is truly ambient and omnipresent. People tend to personify their technical servants, including giving them human names as well as attributing personality traits and intentions to them. The more those devices advance from simple tools to intelligent assistants, the more seriously we need to take this personification. That is, if computers perform human-like tasks in collaboration with humans, and humans already tend to treat computers as human-like, it is only reasonable to give those devices a human-like appearance and conversational abilities. Therefore, one approach to designing advanced human-machine interfaces relies heavily on so-called Embodied Conversational Agents (ECAs). An ECA is a virtual software agent that can process and produce speech, facial expressions, gestures, and eye gaze and, as a result, enables natural, multimodal, human-machine communication. Decades of research in psychology and re...
Cybersickness is an unpleasant phenomenon caused by the visually induced impression of ego-motion while in fact being seated. To reduce its negative impact in VR experiences, we analyze the effectiveness of two techniques – peripheral blurring and field-of-view reduction – through an experiment in an interactive racing game environment displayed on a commercial head-mounted display with an integrated eye tracker. To measure the level of discomfort experienced by our participants, we utilize self-report and physiological measurements. Our results indicate that, of the two techniques, reducing the displayed field of view by up to 10 degrees is the most effective at mitigating cybersickness.
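As an illustration of how such a gaze-contingent field-of-view reduction can be applied to a rendered frame, the sketch below blacks out pixels beyond a gaze-centered angular radius. The geometry (pixels-per-degree conversion, baseline field of view) and the function name are assumptions for illustration, not the apparatus or parameters used in the experiment.

```python
import numpy as np

def restrict_fov(frame, gaze_xy, pixels_per_degree, fov_reduction_deg=10.0,
                 full_fov_deg=100.0):
    """Opaquely occlude pixels whose angular distance from the gaze point
    exceeds (full_fov_deg / 2 - fov_reduction_deg). `frame` is an (H, W, 3)
    image array; `gaze_xy` is the gaze position in pixel coordinates."""
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    dist_px = np.hypot(xs - gaze_xy[0], ys - gaze_xy[1])
    eccentricity_deg = dist_px / pixels_per_degree     # pixels -> visual degrees
    visible = eccentricity_deg <= (full_fov_deg / 2.0 - fov_reduction_deg)
    return frame * visible[..., None]                   # black outside the mask
```

A softer variant would blend a blurred copy of the frame into the occluded region instead of blacking it out, which corresponds to the peripheral-blurring condition described above.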