As our approach solves residual collisions by modifying the motion of crowd characters, the resulting increase in physical realism comes at the cost of altering the overall character motion. It is therefore essential for the evaluation of our method to investigate how users perceive such motion corrections. To this end, we designed a perceptual experiment where viewers were presented with a number of videos displaying stationary or non-stationary animated crowds of changing density, generated with or without applying our method. We asked them about the realism of character contacts (which is related to the presence of collisions with overlaps in animations), the motion quality (which is related to character motions being influenced by their neighbors), and the overall realism of the scene. The goal of this experiment is, therefore, to explore the following hypotheses.
4.4.1 Experimental Design.
To evaluate our hypotheses, we asked participants to watch a set of 3D crowd animation videos, populated at various density levels, with or without residual contact corrections using our method, rendered from several viewpoints, and displaying either stationary or non-stationary crowds. The stationary crowd scenarios illustrated a situation resembling a concert crowd, while the non-stationary crowd scenarios displayed characters walking in a corridor. As in our quantitative evaluation from Section 4.2, we used 5 Density levels, from sparse to dense, corresponding, respectively, to 2.22, 3.33, 4.44, 5.55, and \(6.66~\text{characters} / \text{m}^{2}\) (labeled D1 to D5). We also used 4 Viewpoint levels: front (V1), upper-front (V2), front-side (V3), and overhead (V4). Finally, for each condition, residual collisions were either left untouched (Baseline) or resolved by applying our method (Ours).
The experiment was performed in two stages. In the first stage, we chose to demonstrate the benefits of using our method in the simplest, stationary scenario, while simultaneously exploring the effects of density and viewpoint. Participants saw a total of 40 videos, presented in random order: 5 Density (D1 to D5) × 4 Viewpoint (V1 to V4) × 2 Method (Baseline vs. Ours). Following the positive results of the first stage, we then ran the second stage of the experiment to evaluate the benefits of our method on the more complex non-stationary scenario. Based on the analysis of the results of the first stage (described in Section 4.4.4), we selected only two viewpoints for this stage (V3 and V4), as they showed more varied results in the stationary Baseline condition. This subset makes it easier for participants to focus their attention on the goal of the study, despite the added complexity of the videos (i.e., non-stationary scenes are more dynamic). In the second stage, the same group of participants therefore saw a total of 20 videos, presented in random order: 5 Density (D1 to D5) × 2 Viewpoint (V3, V4) × 2 Method (Baseline vs. Ours). See Figure 8 for samples of the stimuli for different viewpoints, densities, and scenarios.
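The factorial design of the two stages can be sketched as follows. This is a minimal illustration, not the software used in the study: the function name `build_playlist` and the `seed` parameter are assumptions introduced here, and the only facts taken from the text are the factor levels and the random presentation order.

```python
import itertools
import random

DENSITIES = ["D1", "D2", "D3", "D4", "D5"]
VIEWPOINTS = ["V1", "V2", "V3", "V4"]
METHODS = ["Baseline", "Ours"]


def build_playlist(densities, viewpoints, methods, seed=None):
    """Return every Density x Viewpoint x Method combination in random order.

    `seed` is a hypothetical per-participant value that makes the order
    reproducible; the text only states that the order was random.
    """
    trials = list(itertools.product(densities, viewpoints, methods))
    random.Random(seed).shuffle(trials)
    return trials


# Stage 1 (stationary): all four viewpoints.
stage1 = build_playlist(DENSITIES, VIEWPOINTS, METHODS, seed=1)
# Stage 2 (non-stationary): only V3 and V4.
stage2 = build_playlist(DENSITIES, ["V3", "V4"], METHODS, seed=1)

print(len(stage1), len(stage2))  # 40 20
```

Because every combination appears exactly once per participant, the design is fully crossed, which is what allows the within-subject analysis of all three factors.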
Participants were asked to rate the following assertions (using 6-point Likert scales ranging from (1) “I strongly disagree” to (6) “I fully agree”), chosen to provide information about:
(i) contact realism: Q1 “Contacts made by characters between them seem natural to me”, Q2 “Character bodies overlap, it is not physically realistic to me”;
(ii) character awareness: Q3 “Characters seem to make contact and sometimes push one another”, Q4 “Characters move as if they were alone, their motion ignore neighbors”;
(iii) overall realism: Q5 “A crowd moving like this one could exist in the real life”, Q6 “I feel this crowd as a whole does not move in a natural way”.
Note that each pair of questions was formulated to contain one positively worded and one negatively worded assertion (as with the control questions commonly used in evaluation questionnaires). Therefore, higher scores on Q1, Q3, and Q5, and lower scores on Q2, Q4, and Q6, indicate better performance of our method.
The different crowd animations were generated using the procedure presented in Section 4.1. Each crowd animation was then rendered as a video at a \(1920\times 1080\) resolution. Each video lasted \(20~\text{seconds}\) and looped until the participant had answered the six assertions about the video (see Figure 7 for a screenshot of the UI design, which was displayed on a 24-inch display).
4.4.3 Analysis.
Questions with a similar but opposed meaning were first grouped by pair, after checking the internal reliability between paired questions, by inverting the answer given to the negative questions so that \({\bf \bar{Q}}i=7-{\bf Q}i,\ i\in \lbrace 2,4,6\rbrace\). We therefore analyze answers about contact realism \({\bf Q}_{\text{CR}}= ({\bf Q1} + {\bf \bar{Q}2})/2\), character awareness \({\bf Q}_{\text{CA}} = ({\bf Q3} + {\bf \bar{Q}4})/2\), and overall realism \({\bf Q}_{\text{OR}} = ({\bf Q5} + {\bf \bar{Q}6})/2\).
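The scoring described above can be sketched in a few lines. The function names and the dictionary layout of the answers are assumptions made for illustration; the reverse-coding rule (7 minus the answer on the 6-point scale) and the pairwise averages follow the definitions in the text.

```python
def reverse(score, scale_max=6):
    """Invert an answer to a negatively worded question: 7 - score on a 6-point scale."""
    if not 1 <= score <= scale_max:
        raise ValueError(f"score must be in 1..{scale_max}, got {score}")
    return (scale_max + 1) - score


def category_scores(answers):
    """Combine the six answers (keys 'Q1'..'Q6') into the three analyzed scores."""
    return {
        "CR": (answers["Q1"] + reverse(answers["Q2"])) / 2,  # contact realism
        "CA": (answers["Q3"] + reverse(answers["Q4"])) / 2,  # character awareness
        "OR": (answers["Q5"] + reverse(answers["Q6"])) / 2,  # overall realism
    }


# Hypothetical answers from one participant for one video.
example = {"Q1": 5, "Q2": 2, "Q3": 4, "Q4": 3, "Q5": 6, "Q6": 1}
scores = category_scores(example)
print(scores)  # {'CR': 5.0, 'CA': 4.0, 'OR': 6.0}
```

After reverse-coding, all three scores lie in the range 1 to 6, and a higher value is always the more favorable rating.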
To assess the effect of our method, density, and viewpoint on participants’ answers, we conducted, for each crowd scenario, 3-way repeated-measures ANOVAs with within-subject factors Density, Viewpoint, and Method. We set the level of significance to \(\alpha = 0.05\) and used the notations * (p-value \(\lt 0.05\)), ** (\(\lt 0.01\)) and *** (\(\lt 0.001\)) to highlight significant differences in the figures. Results are reported as mean \(\pm\) standard deviation. We assessed the normality assumption using Q-Q plots, and sphericity using Mauchly’s tests. Greenhouse-Geisser adjustments to the degrees of freedom were applied when the sphericity assumption was violated. In case of a significant main or interaction effect, we performed pairwise comparisons using post-hoc Tukey tests.
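As a minimal sketch of what a Greenhouse-Geisser adjustment does to the degrees of freedom of a within-subject main effect: both the effect and error degrees of freedom are multiplied by the sphericity estimate \(\varepsilon\). The function name, the participant count of 30, and the value \(\varepsilon = 0.76\) are illustrative assumptions, not values stated in the text.

```python
def gg_corrected_df(levels, n_participants, epsilon):
    """Greenhouse-Geisser corrected (df_effect, df_error) for one within-subject factor.

    Uncorrected: df_effect = k - 1 and df_error = (k - 1)(n - 1).
    Both are scaled by epsilon, which lies between 1/(k - 1) and 1.
    """
    df_effect = levels - 1
    if not 1.0 / df_effect <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [1/(k-1), 1]")
    df_error = df_effect * (n_participants - 1)
    return epsilon * df_effect, epsilon * df_error


# Assumed values: a 5-level factor, 30 participants, epsilon = 0.76.
df1, df2 = gg_corrected_df(levels=5, n_participants=30, epsilon=0.76)
print(f"{df1:.2f}, {df2:.2f}")  # 3.04, 88.16
```

With \(\varepsilon = 1\) (sphericity holds) the function returns the uncorrected integer degrees of freedom, which is why corrected tests show fractional values while uncorrected ones do not.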
4.4.4 Results.
As we found similar effects across the three categories of questions, we present here the general results of our perceptual evaluation based on the studied factors. Also, we focus in this section on the principal results for the sake of clarity, while additional details are provided in the supplemental material.
Effect of Correcting Contacts. We found a strong main effect of Method in both scenarios for all the categories of questions: Contact Realism (stationary: \(F_{1,29}=154.81, p\lt 0.001, \eta ^{2}_{p}=0.84\); non-stationary: \(F_{1,29}=326.63, p\lt 0.001, \eta ^{2}_{p}=0.92\)), Character Awareness (stationary: \(F_{1,29}=161.77, p\lt 0.001, \eta ^{2}_{p}=0.85\); non-stationary: \(F_{1,29}=158.07, p\lt 0.001, \eta ^{2}_{p}=0.85\)) and Overall Realism (stationary: \(F_{1,29}=67.79, p\lt 0.001, \eta ^{2}_{p}=0.70\); non-stationary: \(F_{1,29}=168.44, p\lt 0.001, \eta ^{2}_{p}=0.85\)).
The results show that participants considered contacts to be inappropriate in the Baseline condition across scenarios and categories of questions. More specifically, they suggest that when our method is applied (i) contacts between characters are perceived to be more natural (validating H1-a), (ii) characters are perceived as adjusting their motion to the presence of neighbors (validating H1-b), and (iii) animated crowds appear to be more realistic overall (validating H1-c). These results therefore completely validate H1.
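The effect sizes reported above can be related to the F statistics through the standard identity \(\eta^{2}_{p} = F \cdot df_{1} / (F \cdot df_{1} + df_{2})\). The sketch below applies it to two of the reported stationary and non-stationary Contact Realism tests; the function name is an assumption, and small differences from published values can arise because the F values themselves are rounded.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of Method on Contact Realism, stationary: F(1, 29) = 154.81.
print(round(partial_eta_squared(154.81, 1, 29), 2))  # 0.84
# Main effect of Method on Contact Realism, non-stationary: F(1, 29) = 326.63.
print(round(partial_eta_squared(326.63, 1, 29), 2))  # 0.92
```

Values above roughly 0.14 are conventionally read as large effects, so the Method effect is large in every category under that rule of thumb.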
Effect of Density. For the stationary scenario, we found a main effect of Density for all the categories of questions: Contact Realism (\(F_{3.04,88.2}=16.37, p\lt 0.001, \eta ^{2}_{p}=0.36\)), Character Awareness (\(F_{3.08,89.30}=5.57, p\lt 0.001, \eta ^{2}_{p}=0.16\)) and Overall Realism (\(F_{4,116}=4.38, p\lt 0.01, \eta ^{2}_{p}=0.13\)). For the non-stationary scenario, we only found a main effect of Density on Character Awareness (\(F_{2.60,75.41}=6.43, p\lt 0.001, \eta ^{2}_{p}=0.18\)). These main effects are illustrated in the supplemental material.
More interestingly, we also observed a Density × Method interaction in both scenarios for all the categories of questions (illustrated in Figure 9): Contact Realism (stationary: \(F_{4,116}=18.00, p\lt 0.001, \eta ^{2}_{p}=0.38\); non-stationary: \(F_{4,116}=10.55, p\lt 0.001, \eta ^{2}_{p}=0.27\)), Character Awareness (stationary: \(F_{4,116}=22.60, p\lt 0.001, \eta ^{2}_{p}=0.43\); non-stationary: \(F_{4,116}=4.29, p\lt 0.01, \eta ^{2}_{p}=0.13\)) and Overall Realism (stationary: \(F_{4,116}=15.09, p\lt 0.001, \eta ^{2}_{p}=0.34\); non-stationary: \(F_{4,116}=4.97, p\lt 0.001, \eta ^{2}_{p}=0.15\)). For both Contact Realism and Overall Realism, post-hoc analyses showed an effect of density only when our method was not applied (Baseline). Overall, this shows that, despite larger occlusions between characters as density increases, participants perceive unresolved body overlaps as increasingly unrealistic, which also negatively impacts the overall realism of the crowd. For Character Awareness, the results even suggest that characters were perceived to be more aware of their surroundings at the highest densities when contacts were corrected using our approach. These results justify applying our method in both sparse and dense conditions, since scores for contact realism, overall realism, and character awareness are high and are not negatively affected by the density level with our method. This, however, contradicts H2-a and H2-b, as we expected our method to be more efficient at lower densities. This completely rejects H2, while demonstrating that our method has a domain of validity in densities which is larger than we expected.
Effect of Viewpoint. First, we found a main effect of Viewpoint on Contact Realism (\(F_{3,87}=27.08, p\lt 0.001, \eta ^{2}_{p}=0.48\)) and Overall Realism (\(F_{2.43,70.55}=13.43, p\lt 0.001, \eta ^{2}_{p}=0.32\)), but only for the stationary scenario. More interestingly, we also observed a Viewpoint × Method interaction for all the categories of questions, again only for the stationary scenario: Contact Realism (\(F_{3,87}=13.69, p\lt 0.001, \eta ^{2}_{p}=0.32\)), Character Awareness (\(F_{3,87}=15.47, p\lt 0.001, \eta ^{2}_{p}=0.35\)) and Overall Realism (\(F_{3,87}=9.83, p\lt 0.001, \eta ^{2}_{p}=0.25\)). We also observed less significant Viewpoint × Density and Viewpoint × Density × Method interaction effects for all the categories of questions (reported in the supplementary material).
The Viewpoint × Method interaction results all reveal an unfavorable viewpoint (front-side V3) in the Baseline condition (illustrated in Figure 10-top), with lower scores on average than with any other viewpoint condition. This was the only viewpoint where both the whole body of foreground characters and the upper body of background characters could be observed. In contrast, the overhead bird’s-eye viewpoint (V4) in the Baseline condition seems to be less favorable for perceiving unnatural contacts (which contradicts H3-a), leading to higher overall realism and perceived character awareness (which contradicts H3-b and H3-c). However, post-hoc analysis of the significant 3-way interaction effect for Contact Realism, Character Awareness, and Overall Realism suggests that these results only appeared for low densities (D1 and D2), as scores were not significantly different between viewpoints for the other densities (D3, D4, and D5). It is also worth noting that these two viewpoints were not perceived to be significantly different in the non-stationary scenario across the categories of questions. Moreover, realism and awareness scores were relatively high across viewpoints when our method was applied, in both stationary and non-stationary scenarios. This suggests that corrected contacts are perceived to be quite realistic across viewpoints, with positive impacts on both character awareness and overall realism. These results therefore completely invalidate H3.