The Alteration Experiment examines the effects of hand animation accuracy and character appearance on participants’ comprehension, perception of the character, and social presence. A between-group design was used: each participant saw either 15 short clips or six long motions, on one of two avatars, with one of eight motion alterations, leading to a total of 32 different conditions: 8 (Motion Alteration) \(\times\) 2 (Avatar) \(\times\) 2 (Clip Length).
4.1 Motion Alterations
Our baseline motions are the original, unmodified motion-captured data (Original) and the complete lack of hand motion (Static). Additionally, we created six motion alterations based on typical errors in the motion capture process or on methods to synthesize or post-process motion data. Based on our results, we realized that our alterations can be grouped into three categories: Full motion data displays the fully accurate captured motion, Partial motion data represents data that has been altered from the captured data, and No motion data includes conditions where no information on the hand motions is used and hand motions are either absent or synthesized from scratch. In the following, we summarize and detail the eight motion conditions.
• Full Motion Data
— Original: unmodified motion-captured data
• Partial Motion Data
— Reduced: simplified motion capture
— Jitter: random rotational noise
— Popping: periodic freezes
— Smooth: smoothed motion capture
• No Motion Data
— Passive: passive hand motion
— Random: unrelated motion capture data
— Static: no hand motion
Original. Original corresponds to the detailed, unaltered motion-captured data. These motions were recorded with a high-fidelity motion capture system and manually post-processed. This quality cannot typically be achieved with real-time, consumer-level equipment (yet). It is our most accurate motion.
Reduced. The Reduced condition simulates a reduced marker set from Hoyet et al. [2012], assuming only six markers, two each on the thumb, index, and pinky fingers. We use the markers to get the fingertip positions for the index, pinky, and thumb. The fingertip positions for the middle and ring fingers are computed using linear interpolation. Based on the fingertip positions, we compute rotations for the finger joints using inverse kinematics. This type of motion arises when a hand tracking system is used that only records the fingertip positions [Advanced Realtime Tracking 2022].
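The fingertip interpolation step can be sketched as follows. This is a minimal illustration, not the authors’ implementation: the 1/3 and 2/3 interpolation weights and all identifiers are our assumptions, and the inverse-kinematics solve that follows is omitted.

```python
def lerp(a, b, t):
    """Linear interpolation between two 3D points."""
    return tuple(a_i + (b_i - a_i) * t for a_i, b_i in zip(a, b))

def estimate_missing_fingertips(index_tip, pinky_tip):
    """Estimate the middle and ring fingertip positions from the tracked
    index and pinky tips (interpolation weights are illustrative)."""
    middle_tip = lerp(index_tip, pinky_tip, 1.0 / 3.0)
    ring_tip = lerp(index_tip, pinky_tip, 2.0 / 3.0)
    return middle_tip, ring_tip
```

In practice the interpolated tip positions would then feed the per-finger inverse-kinematics solve that recovers joint rotations.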
Jitter. Jitter induces random rotational movement (jitter) along the primary rotational axes of the wrist, fingers, and thumb. This condition simulates the effects of sensor noise, which can cause jumpiness and small fluctuations in the animation. For each frame, we compute a small random rotation perturbation by sampling an angle \(\theta\) from a normal distribution: \(\theta \sim \mathcal {N}(0, \sigma)\). Segen and Kumar [1998] examined the ranges of jitter in hand tracking and reported that typical orientation jitter is less than 2 degrees, which also corresponds to our experience. We stay consistent with this result by setting the standard deviation \(\sigma\) to 0.667, so that the angle \(\theta\) stays between \(-\)2 and 2 degrees in 99.7% of cases. We also stay within the range used by Toothman and Neff [2019], who add jitter to whole-body motions to evaluate the impact of avatar tracking errors in virtual reality. They apply a rotational jitter between 0 and 0.5 degrees, then between 0 and 1 degree, and finally between 0 and 6 degrees. Jitter is also encountered in current consumer equipment and can increase in low-light conditions [Oculus VR 2021].
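The per-frame perturbation can be sketched directly from the sampling rule above; the function and variable names here are ours, not the authors’.

```python
import random

SIGMA = 0.667  # standard deviation in degrees; ~99.7% of samples lie in [-2, 2]

def jitter_angle(angle_deg, rng=random):
    """Perturb one joint's rotation about a primary axis with Gaussian noise."""
    return angle_deg + rng.gauss(0.0, SIGMA)
```

Applied independently per frame, this produces the small, uncorrelated fluctuations characteristic of sensor noise rather than a smooth drift.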
Popping. The Popping condition periodically freezes the joints of the wrist, fingers, and thumb and then pops them back to their current rotations. It simulates the effects of abrupt transitions in the motion, such as those caused by temporary occlusions or loss of tracking. This type of error is common with head-mounted inside-out hand-tracking technologies when the hands leave the tracking space [Ferstl et al. 2021]. We induce popping with a freeze duration of 0.8 seconds at intervals between 7 and 9 seconds to prevent the popping from looking too regular. Pops are more visible when the hands are moving a lot. We ensured each clip had at least one pop.
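One way to schedule and apply such freeze windows is sketched below; the names and structure are our illustration under the stated parameters (0.8-second freezes, 7–9-second gaps), not the authors’ implementation.

```python
import random

FREEZE_DURATION = 0.8                   # seconds a pose is held
MIN_INTERVAL, MAX_INTERVAL = 7.0, 9.0   # seconds between freezes

def freeze_windows(clip_length, rng=random):
    """Return (start, end) times during which joint rotations are held."""
    windows = []
    t = rng.uniform(MIN_INTERVAL, MAX_INTERVAL)
    while t < clip_length:
        windows.append((t, min(t + FREEZE_DURATION, clip_length)))
        t += FREEZE_DURATION + rng.uniform(MIN_INTERVAL, MAX_INTERVAL)
    return windows

def apply_popping(get_pose, t, windows):
    """Sample the animation, holding the pose from a window's start while
    frozen; the pose 'pops' back to the live rotation when the window ends."""
    for start, end in windows:
        if start <= t < end:
            return get_pose(start)
    return get_pose(t)
```

Randomizing the interval keeps successive pops from falling into a visibly regular rhythm.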
Smooth. Most systems perform smoothing to counteract jitter from sensors. We implement this condition by applying an exponentially weighted average on the original animation curves of the wrist, fingers, and thumb, sampled at 30 frames per second. This smoothing technique blends the incoming frame \(f^{{\it orig}}\) with the previously computed frame \(f^{t-1}\) such that \(f^t = f^{{\it orig}} \alpha + f^{t-1} (1-\alpha)\). Choosing a lower \(\alpha\) weights the previous values over the new value, which produces a smoother curve at the expense of a loss of detail. We set \(\alpha\) to 0.2 to simulate a slight, not too obvious smoothing that might also be applied in practice in such applications.
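The exponentially weighted average follows directly from the formula, applied per animation curve; identifiers here are our own.

```python
ALPHA = 0.2  # lower alpha favors previous values and smooths more aggressively

def smooth(frames, alpha=ALPHA):
    """Apply f_t = f_orig * alpha + f_{t-1} * (1 - alpha) along one curve
    of per-frame joint values sampled at a fixed rate (e.g., 30 fps)."""
    out = [frames[0]]
    for f_orig in frames[1:]:
        out.append(f_orig * alpha + out[-1] * (1 - alpha))
    return out
```

With \(\alpha = 0.2\), a sudden step in the input is reached only gradually, which removes jitter but also rounds off fast, detailed finger movements.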
Passive. The Passive condition uses the method developed by Neff and Seidel [2006] to implement digits that move solely under the effect of gravity. The result is a hand that appears uncontrolled and lax. The authors provide the results of their simulation in a table, driven by wrist orientation, which we implement directly. The motivation for including this condition is that in cases where no information on the finger and thumb motion is available, it might look more realistic to add some motion than to have none at all.
Random. Based on the same motivation as the Passive condition (some hand motion might be preferred to none), Random adds captured hand motions that might not fit the body motions: for each charade, we applied the hand motion from the next charade (order is alphabetical by title), starting at the middle of the charade to avoid similar beginnings. This technique creates somewhat random hand motions within the same style. The short clips were extracted from the resulting long motions.
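The reassignment scheme for the Random condition can be sketched as follows. The data layout (title, frame count) and the wrap-around for the alphabetically last charade are our assumptions for illustration; the paper does not specify how the last charade is handled.

```python
def random_hand_assignment(charades):
    """Map each charade title to (source title, start frame) for its hand
    motion: the next charade in alphabetical order, starting at its middle
    to avoid similar beginnings. charades: list of (title, n_frames)."""
    ordered = sorted(charades, key=lambda c: c[0])
    mapping = {}
    for i, (title, _) in enumerate(ordered):
        # Wrap-around for the last charade is our assumption.
        src_title, src_frames = ordered[(i + 1) % len(ordered)]
        mapping[title] = (src_title, src_frames // 2)
    return mapping
```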
Static. The hand does not move. We set the wrist, fingers, and the thumb to a relaxed pose to make the effect more subtle. This condition occurs when an avatar’s hands are shown but there is no detailed hand tracking, for example when using simple controllers.
4.3 Results and Discussion
If not otherwise mentioned, results were analyzed with an 8 \(\times\) 2 \(\times\) 2 repeated measures ANOVA with between-subjects factors Motion Alteration (8), Avatar (2), and Clip Length (2). As typical tests for normality do not provide reliable answers for large datasets, we instead inspected the distribution of the answers in histograms. As a large number of analyses was run, p-values were adjusted for Type I error using False Discovery Rate (FDR) control over all values from the 15 measures [Benjamini and Hochberg 1995]. If significance was found, post-hoc testing used Tukey HSD comparisons. Only significant results are reported. Statistics for the Alteration experiment are provided in Table 1 of the Appendix. We follow the order of Table 1 when presenting and discussing our results, starting with the main effects of Motion Alteration, Avatar, and Clip Length, followed by any interaction effects for each examined concept.
Comprehension. Our analysis revealed a main effect of Motion Alteration for Motion Comprehension and Perceived Comprehension; the No Motion Data conditions performed significantly worse than the Partial and Full Motion Data conditions, with the exception of a non-significant difference between Passive and Reduced for Perceived Comprehension, see Figure 3.
These results support part of H1, that the complete absence of motion data reduces comprehension. This effect could not be diminished by adding synthesized motions as in the Random and Passive conditions. However, the first part of H1 was not supported, as errors or reduced information, at least up to the levels we tested, did not affect comprehension in our experiment. The hand motion data in our Partial Data conditions was sufficient to understand the meaning of our clips as correctly as when the accurate hand motion was depicted.
We also found main effects of Avatar for Motion Comprehension and Perceived Comprehension. As shown in Figure 4, participants were on average able to guess more words or movies with the Realistic avatar than with the Mannequin. Despite efforts to keep the avatars as similar as possible, including their degrees of freedom and primary colors, participants were not able to understand the Mannequin as well as the Realistic avatar. This result could be due to the shading of the hands leading to slightly less contrast for the Mannequin, or due to increased familiarity with the Realistic avatar. It does show how important the design of the avatar is when accurate comprehension is key.
Furthermore, we found main effects of Clip Length for both Motion Comprehension and Perceived Comprehension, as the Long movies received lower ratings than the Short clips for both comprehension measures. The better results for the Short clips could be because they were taken from the most comprehensible segments of the Long movies, or maybe guessing words is an easier task than guessing movies. For the Long movies, participants might have guessed parts of the answer correctly but did not manage to infer the correct movie, which may have contributed to the lower comprehension scores.
Finally, there was a significant interaction effect between Motion Alteration and Clip Length for Perceived Comprehension, see Figure 5. The effect occurs because for the Full and Partial Motion Data conditions, the Long and Short clips are perceived to be similarly comprehensible, whereas for the No Motion Data conditions the Long movies were perceived to be less comprehensible than the Short clips. Interestingly, this interaction effect is not present when it comes to actual Motion Comprehension. This result may imply that viewers had enough time with the Long clips to realize that not everything could be understood, leading to a lower perceived comprehension. Alternatively, this difference could be attributed to the differences in tasks and a different perception of task difficulty. One conclusion could be that to achieve a high level of perceived comprehension, accurately tracked hand motions are more important in longer interactions.
Perception of Character. Main effects of Motion Alteration were present for nine of the twelve Perception of Character measures, see Figure 6. Agreeableness, Extraversion, and Emotional Stability were the exceptions. For each measure, some of the No Motion Data conditions were rated as significantly worse than some of the Full or Partial Data conditions. In most cases the Static condition received the least positive results. The only additional significant differences affect the Naturalness measure: the Jitter condition was rated as significantly less natural than the Original condition, and Random was perceived as significantly more natural than Static. The detailed significant differences for each measure are listed in Table 1 in the Appendix.
To quantify these results, we counted how often each condition received a significantly higher value (+1) or a significantly lower value (−1) than any other condition for all measures related to the perception of the character. We found the following results: Original 12, Reduced 7, Jitter 3, Popping 5, Smooth 13, Passive -5, Random -9, Static -26, confirming our observations. According to these sums, the conditions can be divided into four groups: Original and Smooth were rated most favorably followed by Reduced, Jitter, and Popping. The next group consists of Passive and Random, seen as less positive than the Reduced, Jitter, and Popping conditions. Finally, in the Static condition the character was perceived least favorably by far.
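This counting scheme amounts to a simple tally over the significant pairwise differences; the pairs in the test below are illustrative, not our full results.

```python
def tally_scores(significant_pairs):
    """significant_pairs: (higher, lower) condition names, one per
    significant pairwise difference across all character measures.
    The higher-rated condition gains +1, the lower-rated one -1."""
    scores = {}
    for higher, lower in significant_pairs:
        scores[higher] = scores.get(higher, 0) + 1
        scores[lower] = scores.get(lower, 0) - 1
    return scores
```

Because each pair contributes +1 and \(-\)1, the scores always sum to zero, as the reported sums do.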
These results strongly support the first part of H2, that changes to the hand motions will affect the participants’ perception of the character. The significant effects are nearly all based on the No Motion Data conditions being rated less favorably. These results imply that hand motions are important when it comes to a positive impression of a virtual character. Surprisingly, the Partial Motion Data conditions did not significantly change participants’ perception of the character when compared to the Full Motion Data condition, meaning that errors in hand tracking did not significantly affect how people perceive a virtual character at least up to the levels of error we tested in this first experiment. One exception is Jitter, but even Jitter only reduced the perceived Naturalness of the character, not other measures such as Familiarity.
A closer look at our results reveals further insights:
— When some type of hand motion is added (Passive and Random conditions), our virtual characters receive lower ratings less often than without any hand motion (Static). While these conditions still perform significantly worse than selected Full Motion Data or Partial Motion Data conditions for some measures, adding some motion and having correct wrist motions seems to be advantageous.
— While there were no significant differences between the Partial and the Full Motion Data conditions (except for the naturalness of Jitter), the Original and Smooth conditions were more often significantly different from the No Motion Data conditions than the other Partial Data conditions, so Original and Smooth were rated most positively overall.
Significant main effects of Avatar were found for Realism, Appeal, Familiarity, Assuredness, and Agreeableness. In all cases, the Mannequin avatar was ranked significantly lower than the Realistic avatar. These results strongly support the second part of H2, that changes to character appearance will affect the participants’ perception of the character. It furthermore shows that the design of an avatar is a crucial element of any application where interaction with a virtual character is important.
Main effects of Clip Length were present for Naturalness, Realism, Appeal, Familiarity, and Openness to Experience. The Long movies were rated worse than the Short clips in all cases. This result is in line with our results for Motion Comprehension and Perceived Comprehension, where longer movies also performed worse.
These results indicate that negative effects are more noticeable when the motions are seen for longer times; viewers might have more time to notice errors and imperfections. Finally, we found interaction effects between Avatar and Clip Length for several measures related to the perception of the character: Naturalness, Realism, Appeal, Familiarity, and Trustworthiness (Figure 7). In most cases, the Long movie clips with the Mannequin were rated significantly worse than all other conditions (see Appendix Table 1 for details).
Social Presence. We found a main effect of Motion Alteration for Social Presence. Participants who watched the Static condition rated Social Presence significantly lower than participants who watched any of the Full and Partial Motion Data conditions, with the exception of Reduced, see Figure 8, left. Furthermore, there was a main effect of Avatar, with the Realistic avatar leading to significantly higher ratings than the Mannequin (Figure 8, right).
These results support H3, that less natural hand motions or a less realistic character will reduce social presence, as the Static condition and the Mannequin both had that effect. The other No Motion Data conditions, Passive and Random, did reduce the perceived social presence on average, but considerably less so and without reaching significance, implying that some motion, even if partly incorrect, is still better than none. It is unclear why the difference between Reduced and Static was not significant; perhaps the decrease in detail did impact social presence for the Reduced condition. Based on these results, we also recommend using a more realistic avatar when high social presence is desired.