1 Introduction
With the growing prevalence of Virtual Reality (VR) technology and applications, developers and researchers must understand how people interact with VR environments. The most common control method for interactions in VR has been tracked controllers such as the Oculus Touch and HTC Vive controllers, but advancing technology is enabling real-time hand tracking in virtual environments [Han et al. 2018], which is now available in commercial VR headsets such as the Valve Index and Oculus Quest. As hand tracking in VR becomes more widespread, research into how it affects VR experiences has grown [Dewez et al. 2021; Seinfeld et al. 2020].
Although using hand tracking in VR can feel more natural than controllers, it typically lacks haptic feedback and the tracked fingers can intersect with the geometry of the virtual objects. The lack of haptic feedback makes it difficult to know if an object has been grasped, and the intersections with virtual objects look unrealistic and might reduce immersion. Visual feedback when grasping can help mitigate these issues. An example visualization used in VR applications is to hide the virtual hand as long as an object is grasped and to constrain the object to the position of the hand during that time.
Based on these observations, we study the effects of two control modes (controllers vs. hand tracking) and two grasping visualizations (continuously tracked hands vs. virtual hands that disappear when grasping) on ownership, realism, efficiency, enjoyment, and presence. Previous research has studied these or very similar effects [Lin et al. 2019; Canales et al. 2019; Prachyabrued and Borst 2012] using more typical experimental designs with short tasks that are repeated in several conditions. These tasks do not reflect our experience in VR applications or games, and participants are often aware of the concepts studied and experience all conditions. In this article, our goal is to investigate these effects during an experience that might be closer to a typical VR experience, where the user’s attention is focused on gameplay rather than on the interaction conditions. To this aim, we designed a VR Escape Room game (see Figure 1). Can we still observe similar effects when the participants are not aware of the purpose of the experiment, when they are not able to compare different conditions, and when they might be distracted and not even pay attention to the interaction being used? Our design furthermore allows us to study effects that would be difficult to examine in a repetitive task, such as the influence of control modes and grasping visualizations on enjoyment.
3 Experimental Design
3.1 Conditions
Our study uses a between-subjects experimental design comparing the independent variables of Control Modes (conditions: Controllers vs. tracked Gloves) and Grasping Visualizations (conditions: Tracked Hand vs. Disappearing Hand); see Table 1.
In the Control Modes conditions, participants either use Oculus Touch Controllers to interact with the scene or have their hands tracked by wearing Gloves with 19 motion capture markers attached to each finger joint and the back of the hand (Figure 2). The markers are tracked at 120 fps using 15 Optitrack Prime 17W cameras and labeled in real time using Han et al.’s [2018] optical marker-based hand tracking algorithm. Participants can freely move their hands, and the movements of the avatar’s hands mimic their own. In the Controllers condition, the fingers are directed by a thumb button, index finger trigger, and hand trigger (typically activated with the middle finger); fingers are extended if the buttons are untouched, partially extended if they are touched, and pinched if they are pressed. The avatar’s arms are animated using inverse kinematics based on the position of the hands. Prior to entering the virtual environment, participants choose the glove that fits their hand most tightly out of six different sizes. The hand size of the avatar is then adjusted accordingly.
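The controller mapping described above (untouched, touched, or pressed inputs driving extended, partially extended, or pinched fingers) can be illustrated with a minimal sketch. All names here (`ButtonState`, `FingerPose`, `controller_hand_pose`) are hypothetical and do not come from the study's implementation; the sketch only encodes the three-state rule stated in the text.

```python
from enum import Enum


class ButtonState(Enum):
    UNTOUCHED = 0
    TOUCHED = 1
    PRESSED = 2


class FingerPose(Enum):
    EXTENDED = "extended"
    PARTIALLY_EXTENDED = "partially extended"
    PINCHED = "pinched"


# Three-state rule from the text: untouched -> extended,
# touched -> partially extended, pressed -> pinched.
POSE_FOR_STATE = {
    ButtonState.UNTOUCHED: FingerPose.EXTENDED,
    ButtonState.TOUCHED: FingerPose.PARTIALLY_EXTENDED,
    ButtonState.PRESSED: FingerPose.PINCHED,
}


def controller_hand_pose(thumb_button, index_trigger, hand_trigger):
    """Return the pose driven by each of the three controller inputs.

    The keys name the inputs, not individual avatar fingers; which
    fingers the hand trigger drives is not specified in the text.
    """
    return {
        "thumb_button": POSE_FOR_STATE[thumb_button],
        "index_trigger": POSE_FOR_STATE[index_trigger],
        "hand_trigger": POSE_FOR_STATE[hand_trigger],
    }


example_pose = controller_hand_pose(
    ButtonState.PRESSED, ButtonState.PRESSED, ButtonState.UNTOUCHED
)
```

In this example the thumb and index inputs are pressed while the hand trigger is untouched, which corresponds to a pinch with the remaining input left extended.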
We create two types of Grasping Visualizations: Tracked Hand and Disappearing Hand (Figure 3). In the Tracked Hand condition the virtual hands are always visible and follow the players’ hands or the controller motions.
In the Disappearing Hand condition the virtual hands disappear once a participant grabs an item and reappear upon release. We chose the Disappearing Hand condition as it imitates the grasping visualization used in VR games such as Job Simulator or I Expect You to Die. It is simple to implement as the hand pose does not need to be adjusted based on the object, which might be why it is a popular approach. The Disappearing Hand is furthermore investigated in Canales et al.’s work [2019], where it was rated significantly lower in questions related to ownership than some of the other tested conditions and was, on average, the least preferred of all tested conditions.
Whether an item is grasped or not in the Gloves condition depends on the positions and velocities of the thumb and index fingers in relation to each other and on the number of contacts between a hand and an object. An item is detected as “grabbed” if the distance between the index finger and thumb is below 5 mm or if the velocity between those two digits is greater than 15 cm/s; additionally, the nearby item needs at least two points of contact with the hand. An item is released when the thumb and the index finger move apart at a velocity above a threshold of 30 cm/s. The thresholds were adjusted through tests with multiple pilot participants.
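The grab and release rules above can be sketched as a small state update. This is an illustration under stated assumptions, not the study's implementation: the names are hypothetical, the "velocity between those two digits" is assumed to mean the speed at which thumb and index finger approach each other, and a grabbed item is assumed to stay grabbed until the release rule fires.

```python
from dataclasses import dataclass

# Thresholds taken from the text.
GRAB_DISTANCE_MM = 5.0
GRAB_CLOSING_SPEED_CM_S = 15.0
RELEASE_OPENING_SPEED_CM_S = 30.0
MIN_CONTACT_POINTS = 2


@dataclass
class PinchSample:
    """One tracking frame for the thumb/index pair (hypothetical structure)."""
    distance_mm: float            # thumb-to-index distance
    separation_speed_cm_s: float  # assumed sign: > 0 moving apart, < 0 closing
    contact_points: int           # contacts between the hand and the nearby item


def update_grasp(is_grabbed: bool, sample: PinchSample) -> bool:
    """Return the new grabbed state after processing one frame."""
    if is_grabbed:
        # Release only when the digits move apart faster than the threshold.
        return sample.separation_speed_cm_s <= RELEASE_OPENING_SPEED_CM_S
    closing_speed = -sample.separation_speed_cm_s
    pinching = (
        sample.distance_mm < GRAB_DISTANCE_MM
        or closing_speed > GRAB_CLOSING_SPEED_CM_S
    )
    # A pinch only counts as a grab if the item touches the hand often enough.
    return pinching and sample.contact_points >= MIN_CONTACT_POINTS
```

The release threshold is twice the grab velocity threshold, so slow finger adjustments while holding an item do not drop it.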
3.2 Hypotheses
Our hypotheses on ownership, realism, and efficiency are based on Lin et al.’s [2019] and Canales et al.’s [2019] work. We anticipate higher presence and thus higher game enjoyment [Tamborini and Skalski 2006] with the Tracked Hand and the Gloves condition. We therefore formulate our hypotheses as follows:
(1) H1. Ownership:
    (a) Greater ownership in the Gloves condition than in the Controllers condition [Lin et al. 2019].
    (b) Greater ownership in the Tracked Hand condition than in the Disappearing Hand condition [Canales et al. 2019].
(2) H2. Realism: Greater realism in the Gloves condition than in the Controllers condition [Lin et al. 2019].
(3) H3. Efficiency: Greater efficiency in the Controllers condition than in the Gloves condition [Lin et al. 2019].
(4) H4. Enjoyment:
    (a) Greater enjoyment for the Tracked Hand than for the Disappearing Hand, as we assume presence increases for the Tracked Hand and increased presence leads to increased enjoyment.
    (b) Greater enjoyment in the Gloves condition compared to the Controllers condition, as the Gloves are the more “natural” control mode.
(5) H5. Presence:
    (a) Greater presence for the Tracked Hand than for the Disappearing Hand, as the Disappearing Hand may be slightly jarring and thus reduce presence.
    (b) Greater presence in the Gloves condition compared to the Controllers condition, due to the gloves’ more accurate tracking of the hand motions.
3.3 Participants
A total of 72 participants were recruited for this IRB-approved study, 17 for each Controllers condition and 19 for each Gloves condition because we expected technical issues; 62.5% identified as male, 36.1% as female, and 1.4% as other. Ages of participants ranged from 19 to 69, with a mean age of 26. All participants were locally recruited through email, Reddit, and word of mouth, with a majority being university students. Participants were assigned conditions in sequential order, round-robin style with the two extra participants at the end. A total of eight participants were excluded from the analysis: three because the motion capture system was not well calibrated, two because of separate technical errors, and three because they had difficulties understanding how to play the game in general. This left 64 participants for analysis; demographics are detailed in Table 2.
3.4 The Game
For this experiment, we designed an Escape Room type video game, modeled after the popular live-action activity, where a person or a group of people is locked in a room and has to get out by finding clues and solving puzzles. In our case, the player was locked to a chair in an escape pod in space and had to solve puzzles to find the key to the lock.
Advantages of that specific genre are that it uses a first-person player perspective and that the player does not walk or run around. Participants stayed seated for the duration of the game and all necessary puzzle-solving objects were provided within our tracking space. The puzzles allowed us to create a variety of interactions and to design a fun experience where players would use their virtual hands. Early pilots showed that placing all clues in front of the players at the same time was confusing. Therefore, only the objects related to the current puzzle were placed on a table in front of the participant. When a puzzle was solved, the table surface was lowered, then rose with the next puzzle’s objects in place. A total of seven puzzles were implemented, with four primary complex puzzles. A quick playthrough of the game can be seen in the video; impressions of the game are shown in Figures 1, 4, and 5.
For one of the puzzles, we implemented a mimic (a box with teeth that suddenly closes when one tries to retrieve the object in it) as a threat condition. The mimic was used as an indication of the strength of the virtual hand illusion, as has been done in other studies [Lin and Jörg 2016; Lin et al. 2019; Canales et al. 2019; Argelaguet et al. 2016; Ma and Hommel 2013; Yuan and Steed 2010].
Objects became highlighted when picked up or when the players’ hands were within touching distance. In addition, objects that could interact with held objects also became highlighted when both objects touched. There was no gravity: if an object was let go of in mid-air, it stayed there until it was picked up again.
As a neutral actor, we used a robot from Unity’s 4.0 Mecanim Animation Tutorial [Unity 2012], which we modified in Maya 2017 and Unity 5.6.1 to allow resizable hands. The participant could look down and see their virtual body. The avatar hand provided all degrees of freedom for movement of the 20 finger joints, but did not perform subtler movements such as skin stretching and palm flexing. The game models were created in Maya 2017, textures were designed in Adobe Photoshop CC 2017 and PaintTool SAI, and game functionality was implemented in Unity 5.6.1.
3.5 Procedure
At the start of the experiment, participants are asked to sign a consent form and answer a pre-experiment cybersickness questionnaire. Participants who answer “yes” to more than two of the four cybersickness questions are excluded from the study; none were.
Before putting on the Oculus Rift headset, participants are guided on how to adjust the spacing between the lenses to match their interocular distance and, if necessary, how to put on the headset with glasses. Participants are assisted with tightening and adjusting the headset for a satisfactory fit.
Prior to entering the VR environment, participants in the Gloves condition are instructed to pick up items by pinching with their thumb, index, and middle fingers. Participants using the Controllers condition are instructed to pick up items by grabbing with the primary thumb button and index finger trigger, resulting in a similar motion to the pinching action of the Gloves condition. Participants in all conditions determine their hand size by trying on the tracking gloves; the size of their virtual hands in the game environment is then adjusted to match their real-world hand size for increased presence. The avatar height and arm length are also adjusted to match those of each participant.
Participants are introduced to the concept of the experiment, an Escape Room video game in VR, described in Section 3.4, at the start of the study. During the course of the game, participants who take more than a set amount of time to solve a puzzle (dependent on the puzzle and determined in pilot tests; max: 255 s, min: 27 s, mean: 137 s) are prompted with situational clues such as “There is something below you that can be interacted with” or “That stove could use some fuel” and the key puzzle items also flash briefly.
Finally, participants are given time to explore and practice grasping, moving, and placing items in a training phase. Participants can color-match simple shapes and blocks to grow comfortable with the interaction methods and the virtual environment.
Game completion takes on average 7 minutes and 33 seconds, not including the average 92 seconds it takes for participant calibration and training. Once participants finish the game, they are offered congratulations and directed to complete a post-experiment questionnaire (Table 3) on a nearby desktop computer. When the questionnaire is completed, participants are asked whether they noticed the threat condition (the toothy mimic that tries to bite their hand), what they thought of it, and what they thought of the game. After these questions, participants choose whether to sign a release of information form for the data gathered during the experiment (all participants signed) and then receive their incentive card.
3.6 Measures
We investigate the influence of our four interaction conditions on the players’ feeling of ownership of the virtual hands, the perceived realism, the perceived efficiency of the interactions, the players’ enjoyment, and the players’ feeling of presence. The effects of different interaction types on ownership, realism, and efficiency have been investigated in two recent studies [Lin et al. 2019; Canales et al. 2019], and we compare our results to theirs. We furthermore examine the effect of our interaction conditions on presence and enjoyment, which are typical measurements for game experiences. Our questions are listed in Table 3. The questions on ownership, realism, and efficiency were adapted from previous studies. We use the PENS Presence [Rigby and Ryan 2007] questionnaire as a measure of game presence, as PENS is statistically validated and generally comparable to the other popular questionnaires IEQ and EEngQ [Denisova et al. 2016]. Seven items slightly altered from the Intrinsic Motivation Inventory [Center for Self-Determination Theory 2000] are used to measure game enjoyment.
4 Results
Results of our experiment were analyzed with a two-way independent ANOVA. Levene’s Test was used to assess the homogeneity of variance across groups. A significant difference of variance (
) was found in one measure, E2 on the IMI Enjoyment Questionnaire (Table
3). All measures were significantly non-normal. We therefore also ran a robust ANOVA based on trimmed means; it yielded the same set of significant results as the two-way ANOVA and is thus not reported. All questionnaire results are summarized in Table 3.
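Robust ANOVA variants of the kind mentioned above compare trimmed means rather than ordinary means. As a minimal illustration of that one ingredient, the sketch below computes a symmetric trimmed mean. The 20% trimming proportion and the sample ratings are assumptions chosen for illustration; the text does not state the proportion used, and the data are not from the study.

```python
def trimmed_mean(values, proportion=0.2):
    """Mean after dropping the lowest and highest `proportion` of values.

    `proportion` is the fraction removed from EACH tail (assumed 0.2 here).
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    cut = int(len(ordered) * proportion)  # number of values dropped per tail
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)


# Hypothetical 7-point Likert ratings with one outlying low answer.
ratings = [1, 6, 6, 7, 7]
plain_mean = sum(ratings) / len(ratings)
robust_mean = trimmed_mean(ratings)
```

Because the extreme answers in each tail are discarded before averaging, a single outlying rating has less influence on the trimmed mean than on the ordinary mean.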
Ownership. We found a significant main effect of Control Mode for questions O3
and O4
. As expected, participants reported higher levels of ownership when using gloves compared to using controllers. A significant main effect of Grasping Visualization was found for question O2
. Ownership was perceived to be greater when participants used the Tracked Hand visualization for grasping (see Figure
6).
Of the 52 (out of 64) participants who responded when asked about the threatening mimic, 27 (52%) reported that it was frightening in some way. Of those 27, 7 (26%) participants used controllers and 20 (74%) used gloves. Of the 25 (48%) who reported it as non-frightening or unnoticed, 14 (56%) used controllers and 11 (44%) used gloves. Pearson’s chi-squared test showed a significant association between the type of Control Mode and whether participants reported the mimic as scary (
). The odds of participants reporting the mimic as frightening were 3.5 (CI: 0.99, 13.9) times higher in the Gloves condition than in the Controllers condition.
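The association test on the mimic responses can be retraced from the counts given above. The sketch below computes Pearson's chi-squared statistic for the 2x2 table using only the standard library. It assumes no continuity correction, which the text does not specify, and it uses the simple cross-product odds ratio, which is not necessarily the estimator behind the reported value of 3.5. Function names are hypothetical.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]].

    Returns the statistic and the p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def cross_product_odds_ratio(a, b, c, d):
    """Sample odds ratio (a/b) / (c/d) for the table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


# Rows: Gloves, Controllers. Columns: frightening, not frightening/unnoticed.
gloves_scared, gloves_not = 20, 11
controllers_scared, controllers_not = 7, 14

chi2, p_value = chi_squared_2x2(
    gloves_scared, gloves_not, controllers_scared, controllers_not
)
odds_ratio = cross_product_odds_ratio(
    gloves_scared, gloves_not, controllers_scared, controllers_not
)
```

With these counts the uncorrected statistic is about 4.88 and the p-value falls below 0.05, consistent with the significant association reported. The cross-product odds ratio is 280/77, or about 3.6.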
Participants in either condition made comments such as “I didn’t want to lose my hand” or “I hesitated until I remembered it was VR.” Twenty-two participants visibly jumped or exclaimed when the mimic chomped.
Realism. A significant main effect of Control Mode was present for question R2
; see Figure 6. The motion of the hand was perceived to be more realistic in the Gloves condition than in the Controllers condition.
Efficiency. We found no significant effects on perceived efficiency. However, analysis of game completion time showed a significant main effect of Grasping Visualization
, with participants in the Disappearing Hand condition taking longer to complete the game than those in the Tracked Hand condition.
Enjoyment. Significant main effects of Control Mode were present for three of the five game enjoyment questions: E2
, E3
, and E4
, with enjoyment being rated as higher by participants who used the gloves. Additionally, an effect of Grasping Visualization was found for E2
, with participants reporting greater enjoyment with the Tracked Hand visualization. A significant interaction effect was found for question E1, but a Tukey HSD post-hoc test did not show any significant results. Main effects of enjoyment can be seen in Figure 7.
Presence. Measuring presence yielded significant main effects for Control Mode for four of the nine presence questions: P5
, P7
, P8
, and P9
. For all effects, the Gloves condition induced higher perceived presence than the Controllers condition (see Figure 8). An additional interaction effect was found for question P3; however, a Tukey HSD post-hoc test did not show any significant differences between conditions.
5 Discussion
In this section, we discuss if our hypotheses are supported, compare our results to previous studies, discuss the advantages and disadvantages of using a game-like experience to study interactions in VR, and give tips for preparing such studies.
Ownership and Realism. Our results confirm our hypothesis H1 (a) based on significant differences for questions O3 and O4 as well as the reactions to the threat. In all cases, ownership was perceived to be higher in the Gloves condition, where the motions of the virtual fingers corresponded to the players’ motions, than in the Controllers condition, which only displayed base poses. H1 (b) is supported by only one question (O2), so our evidence in this case is weak: for that question, participants in the Tracked Hand condition experienced higher ownership than those in the Disappearing Hand condition.
Realism metric R2 showed that participants perceived the movement of the virtual hands to be more realistic in the Gloves condition than in the Controllers condition, confirming hypothesis H2.
These overall results correspond to the findings from Lin et al. [2019] and Canales et al. [2019]. However, the results for each individual question are not always the same. Lin et al. averaged the answers to their ownership questions in their analysis and found a significant effect. We ran the same analysis and also found a significant effect. However, they did not find a significant effect for O3 when considered individually, whereas we did. Lin et al. also found a significant effect for question R1 (Gloves rated as more realistic than Controllers), which we did not (they did not ask R2). Canales et al. found a significant difference for O1 and two ownership questions that we did not ask, but not for O3 or O4. Findings from the different studies are shown in direct comparison in Figure 9.
Efficiency. We did not find significant differences between the Controllers condition and the Gloves condition in either the perceived efficiency or the actual game completion time. Thus, we cannot confirm our hypothesis H3. Lin et al. found that the controllers were perceived to be more efficient than the gloves. In a simpler grasping task and in direct comparison, differences in efficiency might be more noticeable than in a relatively slow-paced game such as this one that is focused on solving puzzles.
Enjoyment and Presence. We can confirm Hypothesis H4 (b), that enjoyment was higher in the Gloves condition compared to the Controllers condition, based on the significant differences in the answers of E2, E3, and E4. Enjoyment was rated very high in general, which shows that we successfully created an enjoyable game experience. Hypothesis H4 (a), that enjoyment will be greater for the Tracked Hand than for the Disappearing Hand, was only supported by a significant effect of E2, so the evidence in this case is too weak to draw confident conclusions.
We find evidence to support Hypothesis H5 (b) but not H5 (a). Presence questions P5, P7, P8, and P9 all showed that using gloves to interact in VR leads to a greater feeling of presence when compared to using controllers.
Experiments with Game-Like Experiences. As a goal of VR research is to understand our perception to create better VR experiences, our findings and hypotheses should be confirmed in scenarios that are similar to actual user experiences outside of lab settings in addition to experiments with repetitive tasks (not instead). However, the design of such studies also presents many challenges: The development of a suitable game can be very laborious, the variance between participants’ reactions can be increased through further confounding factors such as how skilled participants are at playing specific games, and the number of participants needed is typically larger (based on the estimated variance and the fact that such studies might require between-subjects designs). Furthermore, effects might become diluted in some types of games. For example, it is more difficult to measure efficiency and performance in a game that focuses on slow-speed puzzles than in a first-person shooter where speed is a key to success. Being able to compare different conditions without distractions in a repetitive design might lead to the participants’ “recalibration of the scale” and show more subtle differences. However, these differences might then not be important in a more immersive application. Despite the challenges, we consider experiments using more realistic applications as a necessary and important addition to studies with procedures using repetitive tasks because they can provide more true-to-life observations of immersive virtual experiences.
When planning such an experiment, we recommend adjusting the game type to the concepts being studied. Different types of games might need to be used to evaluate different concepts, and ideally, the same concepts would be tested in several scenarios. A series of applications of different types that is accessible to the community for experiments would allow hypotheses to be tested in a variety of genres.
6 Conclusion, Limitations, and Future Work
In this article, we present a study that investigates the effect of two control modes (Gloves vs. Controllers) and two grasping visualizations (Tracked Hands vs. Disappearing Hands) on ownership, realism, efficiency, enjoyment, and presence when playing an Escape Room game in which players interact with objects to solve puzzles. Our results show that ownership, realism, enjoyment, and presence significantly increased when using hand tracking (Gloves) as an input modality compared to controllers. We also found limited evidence that a Tracked Hand visualization increases ownership and enjoyment compared to a virtual hand that disappears during grasps.
We therefore recommend considering hand tracking as an input modality in place of controllers when creating VR applications, and continuing to improve this technology and increase its accuracy for consumers. Our results were found using a motion capture system that was specifically developed to track hand motions in real time. Further studies would need to demonstrate whether our findings hold with current commercially available hand tracking devices.
A limitation of this work is that the user’s hands are represented by a robotic model low in realism. While this model is in line with the model used in previous studies and allows for better comparison, the results might look different with a more realistic hand model. Additionally, our grasping representations are not realistic as the participants’ fingers intersect with the object when grasping if they do not disappear. Interestingly, none of the participants commented on the hands moving through the objects. Future work could investigate the effect of hand model and grasping representation realism in game-like experiences. It would also be interesting to investigate whether visualizing the hands with controllers in the Controllers condition would affect results. Finally, we only tried one game and cannot generalize our results to other games or genres. Exploring our results with experiments that use other game genres of varying levels of immersion, use players of different experience, or use existing games with modifications would further the generalizability of our findings.
While our results cannot be generalized to other games, one also has to be cautious when generalizing studies with a repetitive design. We often cannot confirm with certainty that such results will still be the same with an altered task or a different participant sample, who might, for example, have more experience with virtual reality. Most research progress in our field (and in any other field) is not made through individual studies but through many studies. Findings need to be replicated and validated in different contexts. While we do not replicate other studies (we would need to follow the exact same protocol to do so), verifying how specific conditions are perceived in different situations can reinforce and strengthen findings, which is one of the main contributions of this article.