1 Introduction
People increasingly use video-conferencing platforms for workplace meetings [108], education, entertainment [98], and social interaction with families and friends [79], accelerated by the need to interact remotely during the global COVID-19 pandemic. In video communication, people often need to show physical things (e.g., documents and design artifacts) to remote users for work [58] or in social settings [61], just as they do in in-person social interactions [33].
However, showing physical objects in video meetings poses several challenges. Due to the constraints of screen space and the camera’s field of view, users often must choose between a close-up view of an object that excludes themselves and a wider view that shows both themselves and the object at the expense of the object’s details. Observing an object from different perspectives requires asking the local user to adjust the camera or the object, since the camera captures only a 2D image from a fixed point of view. Referencing objects often involves multiple rounds of verbal communication because the remote user has no access to the object. Presenting digital objects, such as scanned 3D models, images, and video clips, can be an effective way to facilitate object-centered communication in meetings; however, preparing these objects prior to the meeting can be time-consuming and may not adapt to unplanned situations such as impromptu questions from other attendees.
To support remote collaboration, Buxton [6] identified three essential communication channels for video-mediated and remote communication: person space for delivering facial expressions, task space for establishing a shared workspace, and reference space for supporting shared gesturing in the task space. Prior work has explored shared (first-person) views and desk views in object-focused scenarios, sharing and accessing remote spaces or specialized surfaces that separate the task-reference space from the person space, typically with extensive hardware configurations (e.g., multiple cameras [77], projectors [52], robot telepresence [59, 87], shape-changing interfaces [23]). While hardware setups that enable stable and hands-free interactions [52, 79, 90, 103, 104] can be beneficial, they are complex, require preparation, and are not yet ubiquitous [58]. This constrains use cases to special-purpose scenarios that are not yet common in everyday video-conferencing setups [58, 61].
In contrast to hardware solutions, we are interested in understanding and developing software solutions that allow people to share physical objects using everyday video-conferencing tools. Previous studies [22, 61, 73] have shown that the face-to-face configuration of conventional video-conferencing platforms can hinder remote collaboration on tasks involving physical objects. However, little is known about how people overcome these challenges in their everyday use of video-mediated communication [7, 61]. To better understand these practices and the challenges people face, we conducted a formative study with 124 crowd workers. Participants reported challenges such as showing detailed views of objects, keeping objects within the camera view, referencing specific sections of an object, and displaying multiple objects or perspectives.
Informed by the findings from the formative study, we developed ThingShare, a video-conferencing system that lets users fluidly share physical objects with a remote partner by creating digital copies. ThingShare provides user-interface controls and interactions for users to easily create, manipulate, and reference digital copies for effective discussions around physical objects. Imagine the following usage scenario, in which two hardware engineers, Alice and Charlie, discuss the design of a game controller. Using ThingShare, Alice creates a digital copy of the game controller in her hand with a simple drag-and-drop (Figure 1a), which she can then place on her video feed (Figure 1b). Because the digital copy is displayed on her video feed, she can use gestures to refer to specific parts of the controller or hold another game controller in her hand. While discussing the placement of electrical components, Alice opens a Task View to display a close-up view of the controller, where they can also make annotations (Figure 1d). Charlie finds that the trigger-button design of her controller differs from Alice’s, so she creates a digital copy of her own controller and adds it to the Task View for comparison (Figure 1e). In brief, these interactions show that users can have expressive conversations around physical objects through ad-hoc interaction with digital copies via ThingShare.
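At its core, this interaction amounts to segmenting an object out of a video frame and alpha-compositing the resulting cut-out back onto a live feed. The sketch below illustrates one plausible way to implement it; it assumes a binary object mask from an off-the-shelf segmentation model and is illustrative only, not ThingShare’s actual implementation.

import cv2
import numpy as np

def cut_out_object(frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Crop the masked object from a frame into an RGBA 'digital copy'."""
    x, y, w, h = cv2.boundingRect(mask)            # tight box around mask pixels
    copy = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2BGRA)
    copy[:, :, 3] = mask[y:y+h, x:x+w]             # mask becomes the alpha channel
    return copy

def paste_copy(feed: np.ndarray, copy: np.ndarray, x: int, y: int) -> np.ndarray:
    """Alpha-blend a stored digital copy onto the person-view video feed."""
    h, w = copy.shape[:2]
    roi = feed[y:y+h, x:x+w].astype(np.float32)
    alpha = copy[:, :, 3:4].astype(np.float32) / 255.0
    feed[y:y+h, x:x+w] = (alpha * copy[:, :, :3] + (1.0 - alpha) * roi).astype(np.uint8)
    return feed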
To evaluate the system, we conducted a user study with 16 participants. Our findings illustrate that capturing and storing physical objects enhances conversations around physical objects and improves the local user’s ability to reference properties of objects and manage the privacy boundaries between sharing a view of their local environment vs. the objects within it. Supporting the remote user’s access, control, and reference to digital copies helped them to better understand and contribute to the communication. We conclude that future designs should focus on providing an object-separated layer in front of users.
In summary, we contribute: 1) ThingShare, a video-conferencing system with commodity hardware designed for object-sharing practices. 2) Findings from a formative study on everyday object-sharing practices and challenges in remote meetings. 3) Insights from a user study on the use of ThingShare for sharing and discussing objects in remote meetings.
3 Formative Study of Object Sharing Practices
Despite a significant body of prior work, little is known about everyday object-sharing practices in video-conferencing tools and how people’s behaviors and objectives vary with context. Therefore, we conducted an IRB-approved survey to understand how people currently show objects, their typical behaviors and primary purposes for showing physical objects, and their concerns and challenges in showing things in different contexts, i.e., work-related and leisure-related meetings or video calls.
3.1 Methods
We posted a survey on Amazon Mechanical Turk, recruiting workers with a 98% approval rating and more than 5,000 approved HITs (i.e., tasks on MTurk); each was paid $3.00 for approximately 10-15 minutes. A screening question filtered out respondents who did not have regular video calls (less than once or twice a week) and hence could not provide insights for our research questions. We received 124 valid responses (74 male, 49 female, and 1 non-binary) from 156 respondents, representing diverse occupations, including IT professionals, research lab managers, instructors, designers and creative professionals, salespeople, and homemakers. Regarding frequency of video calls in different contexts, 86.29% of respondents reported having work-related online meetings or video calls at least once or twice a week, and 60.17% reported having leisure-related online meetings or video calls at least once or twice a week.
3.2 Findings
Here, we report the frequency, behaviors, purposes, and challenges of showing and sharing physical objects in work-related and leisure-related meetings. The complete summary of results is available in Appendix A.1. Overall, for users having work-related meetings at least once or twice a week, more than half of respondents (72 respondents, 58.0%) reported showing objects at least occasionally (43.8% occasionally; 12.4% frequently; 1.7% always). For users having leisure-related meetings at least once or twice a week, more than half of respondents (71 respondents, 57.2%) reported showing objects at least occasionally (37.2% occasionally; 19.0% frequently; 0.8% always).
3.2.1 Behaviors of Sharing Physical Objects.
Most respondents “Hold the object up to the camera” in work-related (50.3%) and leisure-related meetings (45.1%) (see Figure 3 for other behaviors).
3.2.2 Use Cases of Sharing Physical Objects.
We analyzed the variety of use cases for sharing physical objects in video meetings. Overall, the purposes in our sample include sharing objects for collaborative, instructional, or ad-hoc needs that vary across work-related and leisure-related meetings. We first categorized use cases found in work-related meetings (72 respondents) as follows.
(1) Discussing details of products (31/72) focuses on the detailed properties (size, color, design styles) of an object, such as prototypes and products, e.g., “I work in a women’s clothing wholesale. I showed color swatches or fabric samples as an example of the product. Images on paperwork were shown to see the item’s style.”
(2) Remote instruction (11/72) requires users to convey instructions for training or troubleshooting purposes, e.g., to instruct others how to test and use a prototype (“I showed my test device in a company training because I want to show others how I test the products”), or a troubleshooting meeting that requires a user to show broken parts to determine the root cause of a failure.
(3) Ad-hoc demonstration and clarification (43/72), e.g., “I showed sample documents because I find it easier at the moment than sharing digital doc.” The ad-hoc need for showing physical objects could also involve explaining complicated details with data graphs, sketches, visualizations, and calculations on paper or whiteboards (10/72) for clarification and explanation. For example, when discussing product modifications, a product analyst mentioned the need to show product changes or make calculations and notes on paper or tablets and hold them up to the camera for remote users to see: “Some of the company’s products that are marketed to customers occasionally would refer to a wall chart with information; I might make notes or calculations spur of the moment and hold them up for others to see.”
(4) Showing personal items for small talk (17/72). For example, some showed personal items when others noticed things in the background and asked about them: “I showed a 3D printed item I made during a weekly meeting because people saw it in the background and were curious about it.” Some respondents showed items as part of small talk, e.g., “I showed my dogs at various meetings to break the ice and garner some soft attention,” or used objects to show reactions, e.g., “I showed a cup raising it as a toast to acknowledge a good idea.”
Respondents who never (12.4%) or rarely (29.8%) showed objects generally indicated three reasons for not needing or wanting to share physical objects: 1) their work-related objects are digital, 2) the objects felt too personal and unprofessional for work contexts, or 3) sharing is challenging with current commercial tools.
During leisure-related meetings (71 respondents), respondents often shared personal items in their conversations. Most respondents reported showing handheld and background items in social settings, where the purpose could be ad-hoc and related to the conversation.
(1) Sharing handheld items. More than half of these respondents (37/71) showed items that they had just bought, e.g., “I showed outfits I purchased because my mother was curious about what I bought after my shopping trip to the mall.” Respondents also showed items to enjoy the moment with their family and friends (11/71), e.g., having food and drinks together (8/71) or playing music remotely (3/71).
(2) Sharing background items. Some respondents (15/71) showed background items, or how things fit in their space, to remote family members, e.g., “I was excited to show my new house to my mother, who lives far away...” and “I showed flowers in my garden in a conversation with my mother-in-law because we were talking about yard work and the season.” Moreover, some provided how-to instructions (10/71) to remote family members.
3.3 Challenges and Concerns of Showing Physical Objects
The survey findings verified our hypothesis that ad-hoc sharing via “holding a physical object up to the camera” is widespread. The challenges and concerns we emphasize here occur across all kinds of devices (desktops, laptops, and mobile devices), though they differ slightly across behaviors (e.g., capturing desk surfaces). Below, we summarize the challenges and concerns of holding objects up to the camera that motivated our design.
• C1: Difficult to show the finer detail or the actual size of the object. Respondents found it hard to show the detail and the actual size of an object, e.g., “I can not control the position quickly to show the right size.” P54 (Engineer), who holds objects up to the camera, found it hard to “... show off some of the finer details of our prototype...” P41 (Art Designer) usually shows documents and paper artifacts through a combination of taking images and holding the paper up to the camera during work meetings, noting, “The first I learned a lot about how to have a screenshot placement so they could see it. It was not right after I tried a few more times... I hold them up so they can see them.”
• C2: Unable to capture multiple pages of a document, different perspectives, or the entire image of a large item. Respondents noted that “it is hard to keep together and refer to the document when the document is of multiple pages.” The majority found it difficult to display multiple sides of a product, e.g., “Some idea is not easily copied on the phone or laptop, such as in a product design draft picture.”, to show two items at once, e.g., “I can not show the object and my image at the same time.”, and to show items that cannot be picked up, e.g., “...trying to show my mom a vacuum cleaner that had a confusing part to it trying to get her to solve it and couldn’t manage to get it into view.”
• C3: Limited view to reference and understand the remote user’s attention due to occlusion. Respondents reported challenges in “point[ing] to a specific section so that everyone can see exactly what I’m referring to” (P47, engineer, work meeting) and were concerned about whether a concept had been conveyed properly to everyone: “It has to be clearly shown to others, and everyone has to notice what is being shown.” (P66, production manager, work meeting)
• C4: Laborious to repeatedly frame, coordinate, and adjust the position and angle of the camera or the object. In both work-related and leisure-related meetings, respondents reported adjusting the angle to fit physical objects properly in the video, and some mentioned feeling tired from doing so. This can be because “the camera reverses images” and because users want to make sure to “get the object in the field of vision” and “hold the camera or the thing steady.”
• C5: Incompatible with virtual backgrounds in current video chat or meeting platforms. Respondents noted challenges when showing objects with a virtual background, e.g., “I was showing the notebook and phone; the blurring background obscured them and made it harder to decipher what they were at first glance.”, and that they would need to give up their privacy to show objects, e.g., “I have to turn off the virtual background to show objects.”
3.4 ThingShare Design Goals
The formative study findings demonstrated the need to show physical objects in both work and social contexts using everyday applications, along with the challenges of current showing practices. These findings reinforced our hypothesis that ad-hoc and collaborative sharing of physical objects should be supported within a face-to-face metaphor. To facilitate this, we decided to support a heterogeneous set of physical objects, ranging from consumer goods to work-related artifacts such as paper documents and sketchbooks, and hybrid objects like mobile phone screens that present digital content in physical form. In light of these considerations, we formulated five goals to direct the design of ThingShare.
• G1: Provide in-context and detailed views for sharing physical objects.
• G2: Capture and store various perspectives of an object.
• G3: Support remote gestures for collaborative referencing.
• G4: Support efficient hands-free and temporal manipulation of object size and position.
• G5: Support flexibly showing and hiding objects from the surrounding environment.
6 Results
A one-way ANOVA (α = 0.05) was used to analyze the general and task-related questionnaire questions. The results of the questionnaire are presented in Table 1, and visualizations can be found in Appendix A.2.1. For overall preference, 11 out of 16 participants preferred ThingShare to the Baseline in Task 1, 15 in Task 2, and 11 in Task 3.
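For concreteness, this per-item analysis can be reproduced with a standard statistics library; the sketch below uses hypothetical placeholder ratings, not the study’s actual responses.

from scipy import stats

# 7-point Likert ratings for one questionnaire item, per condition
# (hypothetical placeholder data, not the study's responses).
thingshare_ratings = [6, 7, 5, 6, 7, 6, 5, 6]
baseline_ratings = [4, 5, 3, 4, 5, 4, 4, 3]

# One-way ANOVA with alpha = 0.05, as in the paper.
f_stat, p_value = stats.f_oneway(thingshare_ratings, baseline_ratings)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}, significant = {p_value < 0.05}")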
We present our qualitative findings under the following themes: reflecting on usability (6.1), presence and absence of digital copies (6.2), factors affecting choices of digital copies (6.3), and constructing shared perspectives (6.4).
6.1 Reflecting on Usability of Capturing and Storing Digital Copies
Overall, participants were positive about the user interface and interactions of ThingShare. Many participants (11/16) found the contour highlighting helpful for identifying and selecting detected objects, and some (9/16) found the drag-and-drop interaction intuitive and easy to use. Most participants (15/16) found resizing and repositioning the snapshots useful. On the other hand, some participants (4/16) suggested including the user’s hand in the snapshots captured in Context Mode, as the hand serves as a reference for perceiving the size of the object. For example, P16 stated, “...I prefer having the hands with captured objects because I can let the remote partner understand how large the item is...” The inability to directly place objects into another user’s person-view video was also an occasional source of difficulty, as participants mentioned that directly placing objects would be more efficient (P11) and help gauge attention (P15) in some situations. We intentionally blocked this feature, designing each user’s person-view video as a private space that others cannot manipulate. Several users (4/16) also asked for a download button beside each snapshot or short video in the Object Library so they could archive and take notes on the item being discussed (“Also, making a collage and saving it for later was so cool, but I would like to download it immediately”), in part because they sometimes forgot the context in which some snapshots were captured. Additionally, P11 and P15 suggested auto or semi-auto features for object selection (i.e., automatic selection or suggestions) with less cursor-based interaction; as P11 noted, they wanted to “quickly capture snapshots without mouse interaction.” These suggestions may be valuable for future design considerations.
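The contour highlighting that participants found helpful could, for reference, be rendered with standard OpenCV calls; the sketch below assumes a binary segmentation mask is already available from an upstream object detector and is illustrative only, not ThingShare’s actual rendering code.

import cv2
import numpy as np

def highlight_object(frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Draw a detected object's contour on the frame so it can be selected."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    highlighted = frame.copy()
    cv2.drawContours(highlighted, contours, -1, color=(0, 255, 255), thickness=3)
    return highlighted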
6.2 Presence and Absence of Digital Copies
Our results show that users appropriated combinations of interactions with digital copies (i.e., freezing, storing, diminishing, and revealing digital copies within their person views) for a variety of social interactions. These interactions not only augmented object-sharing behaviors but also allowed users to reveal objects from virtual backgrounds.
Blending Digital Copies and the Person View Window.
We observed that participants frequently blended digital copies of physical objects with their person-view videos for casual and playful conversations. For example, P9 froze an apple in the person view and pretended to eat it in the video, saying, “See, I am eating this” (Figure 10b). Another participant (P10) blended digital copies of her Psyduck toy and her partner’s apple to create an amusing scene, saying, “my duck just stole your apple.” P7 created multiple copies of her partner’s object to decorate and tile her virtual background. Participants also used their body projections to perform embodied interactions with the captured objects, as if they were interacting with a physical object (e.g., see Figure 10c). Moreover, similar to findings in [73], participants occasionally forgot to present the item to the remote audience when looking at their objects in the baseline. With ThingShare, however, after creating digital copies, participants rarely showed objects live to their remote partners via the person view. Instead, some would freeze an object in the person view and then shift between looking at the physical item and pointing to the digital copy in their video window using their embodied projections (see Figure 10d). This demonstrated the efficiency of the blended effect in facilitating the transition from examining an object’s physical properties to physically referencing it. Overall, ThingShare provided an effective embodied affordance for more efficient sharing.
Remote Access to Digital Copies Enables Parallel Exploration.
All participants (16/16) appreciated being able to interact with digital copies of their partner’s objects despite being remote from the actual object. We also observed remote users dragging digital copies of physical objects off the local user’s person view into their own view to magnify and see details during a discussion. For instance, when a local user was promoting an apple variety in Task 1, the remote user dragged a stored apple into his own person view and zoomed in on it, saying, “but this apple has a hole on it.” In another example, a user was describing the details on a Starbucks cup sleeve (Figure 10e) while the remote user viewed it in his own person view and asked questions about specific details. This enabled parallel exploration and provided a more efficient way to reference specific details of an object for instant questions.
Maintaining Privacy While Sharing Physical Objects.
According to the questionnaire results, participants were able to show objects without compromising their privacy with ThingShare (ThingShare: Median = 4.5, IQR = 1; Baseline: Median = 3, IQR = 1). Furthermore, participants (14/16) appreciated being able to maintain privacy while selectively expanding the shared space to include physical objects, whereas all participants found it difficult to show objects in the baseline with the virtual background (e.g., “the object gets blurred if the object is placed outside the human body.”, P4). This was particularly the case for background objects that could not be held up to the camera. Some participants appreciated the controllability of the baseline condition, as they found that a handheld object could be shown when it was within the boundary of the human body and hidden when moved outside it (e.g., P1 noted, “I can control whether I want the audience to see the object or not using the background.”). However, all participants found the shared area enabled by the body region in the baseline condition to be limited (see Figure 10a).
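The body-bounded visibility that participants describe in the baseline follows directly from how person-segmentation virtual backgrounds composite frames: only pixels inside the person mask are kept, so a handheld object survives inside the silhouette while everything else is obscured. A minimal sketch of that mechanism, assuming a person mask from any off-the-shelf segmentation model:

import cv2
import numpy as np

def apply_virtual_background(frame: np.ndarray, person_mask: np.ndarray) -> np.ndarray:
    """Keep person pixels and blur the rest; objects outside the
    silhouette are obscured, as participants observed in the baseline."""
    blurred = cv2.GaussianBlur(frame, (51, 51), 0)
    keep = (person_mask > 0)[:, :, None]           # broadcast mask over channels
    return np.where(keep, frame, blurred)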
6.3 Factors Affecting Choices of Digital Copies
We summarize the factors that led to the use of different formats of digital copies, i.e., live videos, snapshots, and short videos.
Hands-Free Benefits of Sharing Non-Live Digital Copies over Live Ones.
Participants (13/16) preferred non-live copies (i.e., snapshots and short videos) to live videos because non-live copies freed their hands from the object and let them discuss it efficiently (e.g., “I do not need to hold the item all the time, and also I can zoom in to let the picture become bigger”, P12). In particular, some participants (4/16) expressed concern about the difficulty of digesting multiple frames of information and remembering details in live videos, e.g., “I need to digest multiple details of live video over time and remember the details I saw versus what I am seeing at the moment” (P11). Three participants also commented on the need to capture copies for local users and revisit past activities if only live sharing were available.
The Potential Value of Combining Live and Non-Live Digital Copies.
While non-live copies were useful in many situations, we also observed instances showing the potential value of combining live and non-live digital copies for increased awareness and collaboration. In Task 2, some local users occasionally showed live video of themselves holding the controller through their person-view window to ask the remote user whether they were holding or using it correctly, while both users viewed the non-live copy of the controller in the task view. Some participants suggested combining live and non-live copies to provide more detailed information in certain situations, trading off between live and non-live views; e.g., P13 commented, “I hope in Focus Mode, there could be a way to expand my video [person views] when I wish to signpost the non-live copies but also show it lively through my video.”
Use Cases Affecting the Sharing of Non-Live Digital Copies.
Participants had divided preferences between capturing snapshots and short videos. Some participants (11/16) liked that short videos mitigate the tediousness of capturing snapshots multiple times, especially when a captured snapshot is not optimal. They preferred short videos when showing objects with multiple facets (P11, P13, P16), such as a cube, or when capturing the motion of an object for tutorials (P9, P12), such as demonstrating how to switch light bulbs. In contrast, some participants (4/16) did not favor short videos, as they can be unnecessarily long with transitioning scenes, e.g., P7 commented, “it is hard to find the best framing of the short video so that I need to coordinate the framing again and again and make a longer video than expected.”
6.4 Constructing Shared Perspectives
The Focus Mode reconfigured the layouts with a Task View that enacted shared perspectives for diverse uses in both asymmetric sharing tasks (one user shares objects) and symmetric sharing tasks (both users have the same set of objects).
Digital Copies Enable a Shared Perspective for Asymmetric Tasks.
For Task 2 (asymmetric task), participants in the baseline condition generally used two object-sharing behaviors: 1) the remote user instructed the local user to show different perspectives of the object, and the local user rotated the object in their hand to interact with it before rotating it back towards the webcam for communication; 2) the local user turned towards the object to share the webcam’s perspective, as if both were looking at the object from the same side (see Figure 11a-b). With ThingShare, however, local users rarely turned themselves or moved objects back and forth for interaction, instead utilizing the snapshots and Task View (see Figure 11d). Participants appreciated the Collage View, which displayed objects from multiple perspectives at once (see Figure 11c), reducing the inefficient verbal requests for perspective changes. Participants (15/16) also reported that ThingShare helped construct a detailed shared view for the remote user to reference, allowing live annotation on the focus-shared object and letting them interact with the object based on remote instructions without switching views (Figure 11d).
Digital Copies Help Gauge Attention in Symmetric Tasks.
As both users had the same set of objects in Task 3 (symmetric task), we observed that participants in the baseline condition struggled to synchronize their views. When discussing furniture, most participants (13/16) looked at their own physical artifacts instead of holding the object up to create a shared digital view. This made it difficult to confirm whether they were looking at the same page (“In baseline, I didn’t know whether the remote user was looking at the same page with me”, P9) and to gauge the other user’s attention (“when I was showing it [to the camera] to confirm, my peer is still looking at his book [brochure]”, P12). Figure 12c-d shows a user not paying attention to the screen while his partner was holding something up to show it. As a result, participants in the baseline preferred to communicate using verbal descriptions, such as page numbers and locations on the page. In contrast, participants using ThingShare used the Collage View to place and discuss multiple snapshots of the objects. Some participants also used minimized snapshots as a “cursor” to indicate where they wanted to place furniture. However, we observed two participants using finger gestures (Figure 12f) to point to shared items on the screen, which ThingShare did not support.
All participants liked that the shared object automatically filled the Task View window without showing unrelated objects and details in ThingShare. In the baseline, participants had to physically move an object close to the webcam to show a detailed view, which was also difficult to maintain because the object could block their view (see Figure 12d). However, some participants (7/16) still brought objects closer to the webcam for better resolution, which led to challenges with object tracking: the tracking model often lost the object because it could not detect the object’s boundaries, an issue future work may need to address. Some participants also suggested a post-cropping feature to further eliminate unnecessary information in snapshots.
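The auto-fill behavior participants liked can be approximated by cropping a snapshot to the object’s bounding box and scaling it to the Task View; the sketch below assumes an object mask is available and is illustrative only, not the system’s actual code.

import cv2
import numpy as np

def fit_to_task_view(frame: np.ndarray, mask: np.ndarray,
                     view_w: int, view_h: int) -> np.ndarray:
    """Crop the masked object and scale it to fill the Task View,
    preserving aspect ratio so no surrounding clutter is shown."""
    x, y, w, h = cv2.boundingRect(mask)            # tight crop around the object
    crop = frame[y:y+h, x:x+w]
    scale = min(view_w / w, view_h / h)
    return cv2.resize(crop, (int(w * scale), int(h * scale)),
                      interpolation=cv2.INTER_LINEAR)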