1 Introduction

The emergence of musical performances in VR has enabled audiences to actively participate in virtual concerts without physical presence. A 360-degree view of the virtual stage and real-time interaction with online participants Moritzen (2022) promote the perception of “Being There” (Sense of Presence) in VR performances Charron (2017); Velt et al. (2015). As mentioned in Webb et al. (2016), resolving a few challenges in social co-presence (e.g., enabling performers to switch levels of social interaction or providing subtle physical cues of audience engagement) has the potential to provide distributed liveness for remote platforms like VR. To this end, previous works improved virtual avatars’ motions in VR to strengthen co-presence Yakura and Goto (2020); Kaneko et al. (2018); Yan et al. (2020); Wang et al. (2020). Still, researchers focus primarily on visual and auditory cues, which limits the possibility of further improving the sense of presence.

Recently, researchers have started adopting multi-sensory stimuli that encompass visual and auditory cues as well as haptic stimuli Melo et al. (2020). For example, researchers translated an offline music performer’s movement, physical interaction with the instrument, and audio output into vibrotactile feedback Turchet et al. (2021). Although this work enriched the audience’s experience with haptic stimuli, it only triggered discrete haptic feedback based on specific gestures or signals rather than reflecting continuous motion. Another work Abe et al. (2022) provided haptic feedback based on the audience’s excitement level computed from biometric data (e.g., pulse). However, biometric data can deliver haptic feedback that is irrelevant to the performance context, which lowers the sense of unity. Thus, we focus on forming a haptic rendering pipeline that blends with the context of a VR performance, such as the performer’s choreographic and communicative motions.

Translation of visual and audio data into haptic feedback has been explored in many contexts. Previous work automatically generated haptic rendering parameters from given input video and audio data Li et al. (2021); Kim et al. (2013); Rasool and Sourin (2014). For example, the saliency map of a movie frame, integrated with psycho-acoustic features, computes tactile intensity and rendering locations for vibrotactile feedback on a chair Li et al. (2021). However, extracting features from two-dimensional (2D) video is inefficient for obtaining the precise motion information needed for immersive vibrotactile feedback. Obtaining detailed and specific motion data, which also carries contextual information, remains a challenge. In our work, referencing Yun et al. (2021), which considers the holistic context of interaction while playing first-person shooter games, we detect contextual information from the performer’s movement.

Inspired by previous works, we further propose an automatic haptic rendering pipeline that translates the performer’s full 3D motion data into meaningful vibrotactile feedback (Fig. 1). A higher level of immersion would lead audiences to feel as if they were the virtual performer, which would further enhance the sense of embodiment in a given VR performance. To transfer effective vibrotactile feedback to users, we employ an upper-body wearable configuration.

Fig. 1

HapMotion is a motion-to-tactile framework that translates the performer’s motion in real time to enable an immersive VR performance experience

With the proposed motion-to-tactile framework, we aim to enrich the immersion of VR performances. To go beyond utilizing conventional visual and audio data, our approach translates tactile sensation based on 3D feature points directly acquired from the performer’s motions. By controlling the vibrotactile intensity and localization properties, we enhance audiences’ embodiment and attention by generating a coherent tactile sensation with the VR performance.

In this work, we devise our translation pipeline to cover a variety of performer motion types including communicative (e.g., waving hands, passing a mic toward audiences, hand clapping) and choreographic motions. By enabling a new contextual experience in VR performances, we expect to further increase the level of participation and immersion compared to existing offline or VR concerts. Our contributions are as follows:

  • A novel haptic rendering algorithm to translate the performer’s diverse motion contexts into haptic intensity and localization parameters;

  • A novel motion-to-tactile framework that converts multimedia contents including performers’ 3D motions and audio data into vibrotactile feedback;

  • A wearable haptic interface to support full 3D upper-body haptic feedback;

  • Analysis of user studies demonstrating the user experiences with HapMotion for naturalness, immersion, satisfaction, consistency, and embodiment.

2 Related work

2.1 Vibrotactile translation of multimedia data

Researchers have attempted to translate multimedia data into meaningful haptic feedback in order to enhance the given audiovisual experiences Danieau et al. (2012); MacLean et al. (2017). This translation has been applied to watching videos Abdur Rahman et al. (2010), undergoing rehabilitation Alamri et al. (2010), playing games Rehman et al. (2008), and experiencing 4D films Seo et al. (2018). In our work, we aim to translate multimedia data from live VR performances into haptic feedback to promote immersive experiences.

Audio-to-Tactile Translation Audio-to-tactile translation was initially suggested because adding a tactile channel eases the communication of auditory information to users Chang and O’Sullivan (2005); Remache-Vinueza et al. (2021). Moreover, researchers demonstrated that audio-to-tactile translation improved the user’s perception of physical stimuli Imschloss and Kuehnl (2019) and the media experience Mazzoni and Bryan-Kinns (2016). Earlier works often translated musical information into haptic sensations in a chair embedded with multiple vibrotactile actuators Karam et al. (2010); Nanayakkara et al. (2009); Hayes (2012); Yamazaki and Ohkura (2018); Fontana et al. (2016); Altinsoy and Merchel (2010). These works mainly focused on providing spatiotemporal vibration patterns to the body surface based on acoustic features from given audio. Recent works bring the audio-to-tactile experience into wearables such as belts Yamazaki et al. (2017), gloves Enriquez et al. (2020), armbands Turchet and Barthet (2018), jackets Hashizume et al. (2018), and whole-body suits West et al. (2019). Still, tactile translation based solely on the audio modality has limitations in conveying the full media context to users.

Visual-to-Tactile Translation To go beyond acoustic feature-based tactile experiences, visual-to-tactile translation has been explored to mediate immersive experiences Kim et al. (2013); Kruijff et al. (2017). In previous works, adding haptic feedback created from visual stimuli enabled rich multimodal sensations that improve the level of immersion Kim et al. (2010); Wilson et al. (2016); Lin et al. (2021). Specifically, researchers employed a visual saliency map representing contextual event locations to generate spatiotemporal vibrotactile effects Li et al. (2021). However, the context of visual stimuli remains in 2D RGB pixels, which cannot express the full 3D information of state-of-the-art multimedia content (e.g., 3D videos and motion-captured skeleton data).

Motion-to-Tactile Translation In terms of motion-to-tactile translation, there have been a few attempts at translating the movement of a character or a camera. HapSeat Danieau et al. (2012) translates a first-person point-of-view simulation into three force-feedback devices, trying to mimic the sensation a principal actor might have felt while recording with sensors attached to the body. Other research considered camera motion as an input and translated it into six ERM motors attached to a moving chair Seo et al. (2018). Despite the diverse rendering choices available in haptic technologies, there are no guidelines for rendering human movement that is explicitly associated with a performance and performer in a virtual environment. In our work, we extend the capability of the vibrotactile translation framework by supporting 3D motion data. We minimize the burden of attaching sensors to the performer and let audiences enjoy the haptic experience anywhere they prefer through wearable devices. Here, we devise a full 3D upper-body wearable haptic interface that automatically renders the performer’s motion into meaningful vibrotactile feedback.

2.2 Haptic feedback for immersive experience

The number of virtual performances has surged over the past years with the advancement of commercially available VR headsets and users’ preference for remote participation Charron (2017). To enrich the experience of participating in VR live performances, recent work Yakura and Goto (2020) focused on improving the motions of virtual audience avatars. Also, various commercial platforms for virtual performances (The show must go beyond 2022; Emotionwave XR) have been launched to support live music performances of global artists in VR.

For haptic feedback in virtual performance, previous work improved the sense of unity by sharing responses through visual, auditory, and haptic stimuli based on audiences’ biometric data Abe et al. (2022), reporting increases in the sense of unity and embodiment. Aligned with this direction, we propose a system that focuses on translating virtual performers’ motions into vibrotactile feedback. To support more immersive haptic feedback, we cover the whole upper body, including the shoulders, by integrating haptic sleeves along with the haptic vest.

To ground haptic feedback design for virtual performances, referencing offline performance research was essential. A musical haptic wearable device for offline audiences Turchet et al. (2019) enriched audiences’ musical experiences by leveraging the sense of touch and providing new capabilities for creative participation.

A broad range of tactile feedback approaches has been suggested for immersive musical experiences for musicians. For example, virtual drums, Cellomobo Berdahl et al. (2008), and a virtual reed Smyth et al. (2006) offer musicians a chance to interact with musical instruments integrated with a haptic system. Moreover, a virtual violin supporting bowing Nichols (2002) and AirPiano Hwang et al. (2017), based on a mid-air haptic system, improved the players’ experience. Still, haptic feedback systems that exclusively support the performance context are rare. To this end, we propose a haptic system that directly applies to VR performances.

Haptic feedback helps convey meaningful information; in particular, it is used as an important part of storytelling in various fields. Feel Effects Israr et al. (2014) devised a haptic vocabulary and authoring methods for creating realistic haptic representations. Providing relevant and analogous haptic feedback that considers ongoing cues from the context of the environment is the most important feature for establishing a convincing haptic sensation. In our work, we create novel haptic rendering algorithms to translate motion, which often contains contextual information about the performance.

2.3 Design approach for rendering vibrotactile feedback

Eccentric rotating mass (ERM) motors are commonly used to implement vibrotactile feedback García-Valle et al. (2020). Although it is hard to control the frequency of these motors independently Miklós and Szabó (2015), researchers still utilize ERMs because of their performance (e.g., strong vibrations) and scalability (e.g., low cost) Yun et al. (2019); Li et al. (2021); Tawa et al. (2021); Park and Choi (2018); Zhang et al. (2020). Previous researchers have developed various graphical editing tools that allow designers to author haptic effects Schneider et al. (2015); Cuartielles et al. (2012). For example, perceptually optimized interpolation algorithms for sparse vibrotactile grids Schneider et al. (2015) are one authoring approach for feeling haptic sensations from media data. In order to build a comfortable and effective tool for real-time VR concerts Lalioti et al. (2021), it is essential to have a haptic authoring tool that automatically generates haptic feedback through a data-driven algorithm.

Recently, researchers have explored methods to generate haptic effects with automatic authoring pipelines Israr and Poupyrev (2011); Israr et al. (2014, 2016). For example, previous work defined a foundational library of usable haptic vocabulary to explicitly match linguistic phrases to corresponding haptic patterns Israr et al. (2014). Recent works introduce the use of the designer’s voice to design vibrotactile feedback in an iterative haptic design process Degraen et al. (2021) and cross-modal information (diegetic audio and localization of sounding objects) to automatically generate tactile effects Zhang et al. (2020). These design approaches suggest that context-based automatic haptic feedback generation is essential to accommodate a complex set of multimedia data.

Inspired by previous works, we propose an automatic vibrotactile translation framework that utilizes 3D motion data from VR performances. For our framework, we focus on understanding the motions’ representative features and associated context, which are known to be crucial components of performance in general Blumenfeld-Jones (2008). Unlike manual authoring tools, our work suggests a fully automated vibrotactile rendering framework that extracts representative key points from the performer’s motions and converts them into real-time vibrotactile feedback.

3 Design space

In this section, we share survey results on how performers’ motions occur during offline and VR performances. We analyzed the recorded videos of both offline and VR performances to categorize and quantify the types of motions and their occurrence frequencies.

3.1 Offline and VR performance survey

To better understand which motions could provide effective haptic feedback and promote an immersive experience, it is essential to observe the types and characteristics of motions that occur during offline and virtual performances. Before getting into the analysis, we define the motion-type terminology: choreographic motion refers to dance movements that follow the music during a concert or a technique of combining movements and performing them as dance Bisig (2022); Pehkonen (2017), and communicative motion refers to actions that share the performer’s emotions and serve as nonverbal communication with audiences Kaneko et al. (2018).

3.1.1 Survey method

We conducted a survey and analysis of videos showing the audiences’ responses to the performer’s actions. For our survey, we categorize a motion as communicative if the performer aims to bond with the audience and elicit social interaction (e.g., handing over a microphone or waving to the audience). We validate our categorization by checking the audiences’ responses right after the communicative motion occurred. For example, we checked whether audiences shouted, raised their hands, clapped, or sang along during offline performances, and whether they raised visual feedback or added text comments (e.g., flying hearts in Justin Bieber’s Virtual Experience Bieber (2021)) during VR performances.

3.1.2 Live offline performances

We first compiled the song list by selecting the two highest-ranked songs per year (between 2010 and 2021) from a trusted music chart, the BILLBOARD HOT 100 Billboard (2022). We mainly picked songs that have 95\(\sim\)140 beats per minute (BPM) and support both choreographic and communicative motion. We categorize the tempo of music as slow (95-115 BPM), medium (115-135 BPM), and fast (135-140 BPM) Karageorghis et al. (2011). We found that audiences’ responses were apparent after the performer’s communicative motion for songs with over 100 BPM. In this work, we focused on translating a solo artist rather than a group since a solo artist’s motions clearly show representative choreographic and communicative motions (Fig. 2).

Fig. 2

The footage of offline and VR performances. A and C represent communicative motions, while B and D indicate choreographic motions

3.1.3 VR performances

All selected VR performances convert performers into avatars: real performers go through a motion capture system that maps their motion data onto live avatars. We selected 15 pre-recorded VR live performances, totaling 250 min of playing time, from YouTube. As with the offline concert selection, we chose the most-viewed VR concert videos.

3.2 Survey takeaway and design considerations

Both offline and virtual performances are all about the music, choreography, and nonverbal communication between the audience and the players Kaneko et al. (2018).

Figure 3 shows the ratio of communicative and choreographic motions from the performances. According to our analysis, live offline performances generally consisted of 88.95% choreographic motions and 11.04% communicative motions. For VR performances, the ratio of choreographic motions decreases to 73.64% and that of communicative motions increases to 26.35%.

Fig. 3

The ratio of communicative motion and choreographic motion from 25 live offline performances (Left) and 12 virtual performances video clips (Right)

Both offline and VR performances show higher occupation rates of choreographic motions compared to communicative motions. However, VR performances show a markedly higher rate of communicative motions than offline performances (26.35% vs. 11.04%). This result indicates that we need to consider how to translate communicative motions in a more adaptive way for future performances. As shown in Fig. 4a, communicative motions rely more on the hands in order to convey nonverbal interaction, such as pointing at audiences to induce singing along or clapping.

Fig. 4

A Exemplary communicative motions and B chosen choreographic motions excerpted from the survey

Choreographic motions also consisted of movements that mainly use the upper body, largely through the placement of the hands and wrists in meaningful spaces. Therefore, we should focus on translating both the detailed and macro-level flow of upper-body motions. Inspired by previous works Tsai et al. (2022); Li et al. (2021); Fang (2021); Gonzalez-Franco and Peck (2018), we focused on five key aspects including naturalness, immersion, satisfaction, consistency, and embodiment when designing our system. To achieve this goal, we apply the following design considerations.

Naturalness To transfer both the fine detail and overall flow of the performer’s movements, we apply vibrotactile sensations to the whole upper body. We customize haptic sleeves on both shoulders to cover the region missing from the existing haptic vest.

Immersion To render immersive vibrotactile feedback, we control spatiotemporal parameters with various warping approaches. We adjust vibrotactile amplitudes based on physics-based elements (e.g., acceleration, and distance from the audience’s body) rather than applying constant intensity.

Satisfaction Musical interaction with choreographic motions is crucial in supporting accessibility and satisfaction for audiences Veronesi (2014). To this end, we devise our system to support audio-to-haptic feedback along with the vibrotactile feedback rendered from the performer’s motions. We assume that the multimodal (music and motion) vibrotactile rendering approach could improve the haptic experience while enjoying VR performance.

Consistency To maintain consistency in the haptic experience, it is essential to provide integrated haptic feedback based on both the performance context and the performer’s direct motion flow. To support this, we propose a novel rendering algorithm that operates a set of vibrotactile devices to convey the intended contexts from the performer’s motions in real time with minimal latency.

Embodiment A previous study showed that embodying an outgroup member can enhance empathy Thériault et al. (2021). Along this line, we believe that feeling a third person’s motions would improve the embodiment of the virtual performer and enhance the immersion level of the virtual performance. We aim to provide vibrotactile feedback that translates haptic locations in a mirrored way. Here, Body Swapping is a key approach in which synchronizing movements between two users results in an illusion of body ownership analogous to other bodily illusions Botvinick and Cohen (1998). Syncing movements in a mirrored way, as if the user were glancing at a mirror, enhances the relationship between the two participants. Thus, we horizontally flip the locations of tactile feedback.

4 Motion-to-tactile translation approach

Previous researchers introduced haptic effects using an RGB-image-based visual saliency map that works with audio data Li et al. (2021). In our case, we add 3D motion data to translate every movement of the performer into a meaningful haptic effect. We propose a motion salient triangle (MST) that aims to effectively translate the characteristics of movements into vibrotactile haptic feedback. In this section, we describe our novel rendering design approach using the proposed MST. Our rendering approach processes spatiotemporal parameters extracted from three-dimensional (3D) joint coordinates. Furthermore, one-dimensional (1D) haptic phantom sensation Park and Choi (2018) is adopted in order to express the detailed flow of the performer’s motions across consecutive frames. We support robust real-time data processing without noticeable data loss. Therefore, our method achieves a high correlation between vibrotactile effects and the virtual performer’s movement to improve the audience’s experience in virtual performance.

4.1 Computing motion salient triangle from key element vertices

MST is a key motion event localization method for translating one’s motion into vibrotactile feedback. As mentioned in Sect. 3.2, a large portion of choreographic and communicative motions includes upper-body movements. Moreover, we observe that hand joint coordinates play a crucial role in upper-body movements such as handing the microphone to the audience and inducing the audience to do a Mexican wave. For this reason, we assign hand joint coordinates as active joint coordinates \(J_{A}\), which carry rich information about the motion. In this work, we formulate a 3D joint coordinate as J = (x,y,z).

We further define root joint coordinates (\(J_{R}\)) and the center of mass of the torso (\(J_{T}\)). As shown in Fig. 5, \(J_{R}\) represents a stable point on the shoulder opposite to the \(J_{A}\) side, which reflects a balanced position while carrying out diverse motions. Since the shoulders’ translational displacement is low compared to other joints during the performer’s motion, we pick the shoulders for \(J_{R}\) Golomer et al. (2009). \(J_{T}\) provides a stable point inside the torso, which mostly stays at its initial position. Using these two stationary points, our proposed algorithm considers not only the micro-level motion flow but also the macro-level stream of movement across continuous frames. We name \(J_{A}\), \(J_{T}\), and \(J_{R}\) the key element vertices, which are required to form the MST.

Fig. 5

Overall concept of MST. From the original upper-body motion (Left), we extract key elements vertices for MST (Middle). Then, we concatenate the vertices with edges (Right) and create a 3D triangle called motion salient triangle (MST)

By concatenating these key element vertices, we generate a 3D polygon. MST-based algorithms employ real-time human body tracking consisting of 32 joints from the Azure Kinect DK Microsoft (2020). We designate \(J_{A}\) as either the \(\text{Hand}_{Right}\) or \(\text{Hand}_{Left}\) joint given by the Azure Kinect. We place \(J_{R}\) at either \(\text{Shoulder}_{Left}\) or \(\text{Shoulder}_{Right}\), whichever is symmetrically opposite to \(J_{A}\).

Referencing the computation of the center of mass of human body segments Adolphe et al. (2017), we first consider the spine navel point as the center of mass of the human body. We then compute \(\textrm{r} = \frac{\textrm{R}\cdot \textrm{l}}{\textrm{Q}}\) using the Unity 3D engine. Here, \(\textrm{R}\) is the reactive force, set to 1, \(\textrm{l}\) is the length of the lever, computed from the height of the virtual character, and \(\textrm{Q}\) is the mass of the human body, calculated automatically. We finally calibrate the center of mass of the torso through this equation.
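
As a toy illustration of this calibration step, a minimal Python sketch might look as follows; the character height and body mass values are placeholders of our own, not measurements from the paper:

```python
def torso_com_offset(reactive_force=1.0, character_height=1.75, body_mass=70.0):
    """Lever-arm calibration r = R * l / Q used to place the torso center of
    mass J_T around the spine-navel point (illustrative values only)."""
    return reactive_force * character_height / body_mass

print(torso_com_offset())   # offset used to calibrate J_T for this character
```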

MST Dynamic Point After creating the 3D triangle, we compute the MST dynamic point (\(\text{MST}_{DP}\)) as shown in Eq. 1. Here, \(J_{C}\) refers to the centroid of the MST. \(\omega _{\text{Torso}}\), \(\omega _{\text{Active}}\), and \(\omega _{\text{Root}}\) indicate the weighting coefficients for each key element vertex. We set \(\text{MST}_{DP}\) at a weighted distance from each key element vertex, which translates the direct flow of the movement. For the initial frame, we set the \(\omega\) values to 1 and adjust them afterward according to the movement of the performer.

$$\begin{aligned} \text{MST}_{DP} = J_{C} +\frac{(J_{A}-J_{C})\cdot \omega _{\text{Active}} + (J_{R}-J_{C})\cdot \omega _{\text{Root}} +(J_{T}-J_{C})\cdot \omega _{\text{Torso}}}{\omega _{\text{Active}} + \omega _{\text{Root}} + \omega _{\text{Torso}}} \end{aligned}$$
(1)
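
For clarity, a minimal Python sketch of Eq. 1 could look as follows; the joint coordinates in the example are hypothetical, and we assume each joint arrives as a 3D NumPy vector:

```python
import numpy as np

def mst_dynamic_point(j_active, j_root, j_torso,
                      w_active=1.0, w_root=1.0, w_torso=1.0):
    """Compute the MST dynamic point (Eq. 1) from the three key element vertices.

    Each joint argument is a 3D coordinate (x, y, z); the weights are the
    per-vertex coefficients, all initialized to 1 for the first frame.
    """
    j_c = (j_active + j_root + j_torso) / 3.0           # centroid of the MST
    weighted = (w_active * (j_active - j_c)
                + w_root * (j_root - j_c)
                + w_torso * (j_torso - j_c))
    return j_c + weighted / (w_active + w_root + w_torso)

# Example frame (hypothetical coordinates in meters)
hand = np.array([0.45, 1.30, 0.20])       # J_A: active hand joint
shoulder = np.array([-0.18, 1.45, 0.00])  # J_R: opposite shoulder
torso = np.array([0.00, 1.10, 0.00])      # J_T: center of mass of the torso
print(mst_dynamic_point(hand, shoulder, torso))
```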

Figure 6 illustrates the overall system flow of our proposed haptic translation method, described as follows (a simplified code sketch of this per-frame loop follows the list):

  1. Collect an offline performer’s 3D joint data (Azure Kinect) in real time and transfer the joint data to a virtual avatar in the Unity plugin.

  2. Compute the \(J_{T}\) point if the current frame is not the initial frame.

  3. Set \(\text{Hand}_{Right}\) and \(\text{Hand}_{Left}\) as potential active joints and keep track of both distances to \(J_{T}\). By comparing the computed distances, we determine the number of active joints, \(J_{A}\).

  4. Compute the acceleration of the \(J_{A}\)(s). If both \(J_{A}\)s are above the threshold, we use two \(J_{A}\)s for computing \(\text{MST}_{DP}\). If one \(J_{A}\) is above the threshold, we use that \(J_{A}\) and \(J_{R}\).

  5. Distribute localization weights to each key element vertex (see Sect. 4.2.2) and compute \(\text{MST}_{DP}\) using Eq. 1.

  6. Process mapping and warping of \(\text{MST}_{DP}\) (Fig. 7). If \(\text{MST}_{DP}\) is inside the bounded area, the tactile location is assigned through the 3D warping method. If not, we apply direct surface mapping.

  7. Set the intensity level based on the distance from \(\text{MST}_{DP}\) to \(J_{T}\).

  8. Increase the haptic intensity level if \(\text{MST}_{DP}\)’s acceleration goes above the threshold, as described in Sect. 4.3. If the 3D-warped point falls between actuator nodes (the exception cases illustrated in Fig. 11a), we employ the 1D phantom sensation when adjusting the intensity level (see Fig. 11b).
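
A condensed Python sketch of this per-frame loop is shown below. It reuses mst_dynamic_point from the sketch after Eq. 1; the joint dictionary keys, the 1.85 m normalization, and the bonus level are simplifications of our own rather than the exact implementation:

```python
import numpy as np

def joint_accel(p0, p1, p2):
    """Finite-difference acceleration magnitude over three consecutive positions."""
    return np.linalg.norm((p2 - p1) - (p1 - p0))

def process_frame(joints, history, threshold, base_level=6, bonus_level=2):
    """One simplified pass over steps 2-8 for a single tracked frame."""
    j_t = joints["spine_navel"]                                   # step 2
    acc = {s: joint_accel(*history[f"hand_{s}"][-3:]) for s in ("left", "right")}
    active = [s for s in acc if acc[s] > threshold]               # steps 3-4

    if len(active) == 2:                                          # two active joints:
        j_a, j_r = joints["hand_left"], joints["hand_right"]      # both hands act as active vertices
    else:                                                         # one active joint + root
        s = active[0] if active else "right"
        j_a = joints[f"hand_{s}"]
        j_r = joints["shoulder_left" if s == "right" else "shoulder_right"]

    mst_dp = mst_dynamic_point(j_a, j_r, j_t)                     # step 5, Eq. 1
    # step 6: MST_DP is warped or surface-mapped onto a display node (Sect. 4.2.1)
    distance = np.linalg.norm(mst_dp - j_t)
    intensity = min(base_level, base_level * distance / 1.85)     # step 7
    if max(acc.values()) > threshold:                             # step 8
        intensity += bonus_level
    return mst_dp, intensity
```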

Fig. 6

Overall system flow of MST-based algorithms including data processing, tactile localization, and tactile intensity adjustment

4.2 Translating tactile location

4.2.1 Rendering MST dynamic point

The proposed algorithm maintains a controlled proportion of the distance between \(\text{MST}_{DP}\) and the surface of the torso. In our system, we cast a direct vector3-type ray toward the target point (TP). TP is the centroid of four representative joint coordinates, including the front and back of the torso and the left/right shoulders (Fig. 7a).

Fig. 7

Overall process of warping and mapping of \(\text{MST}_{DP}\). A First, we compute the target point (TP). B Then, we process 3D warping or direct mapping within the defined ranges. There could be several mapping cases including 3D warping, direct surface mapping, and out-of-range projection. (See Fig. 8 as well.)

Figure 8a shows the top and side view of the warping range and how raycasting stimulates each haptic node. The range of the warping boundary is set based on the range of motion (ROM) data from our previous surveys (See Sect. 3.2). Therefore, the performer’s maximum and minimum X and Z ROM become the range of the X-axis and Z-axis of the warping boundary. We measure the maximum and minimum length between \(J_{A}\) and the local coordinate of the performer’s \(J_{T}\) to define the range of warping availability. In general, the maximum and minimum ranges come out as 185 cm and 13 cm. If \(\text{MST}_{DP}\) is out of these ranges, we adjust \(\text{MST}_{DP}\) to the closest boundary.

Fig. 8

We show examples of A 3D warping, B direct surface mapping, and C out-of-range projection while employing an MST-based localizing algorithm

As shown in Fig. 8a, there are two exemplary cases for assigning haptic feedback by 3D warping. By applying a homogeneous transformation matrix, we convert \(\text{MST}_{DP}\) from 3D coordinates to the 2D haptic display nodes. Figure 8b indicates surface mapping, which occurs when \(\text{MST}_{DP}\) hits a haptic display node directly. We explain 3D warping and direct surface mapping in more detail below.

When the ray lands between consecutive actuator nodes (\(\text{Node}_{A}\), \(\text{Node}_{B}\), \(\text{Node}_{C}\), \(\text{Node}_{D}\)), we deploy a modified 1D phantom sensation Park and Choi (2018). Figure 8c shows the haptic output after raycasting. When the ray hits between actuator nodes, the surrounding nodes are actuated at the same time with different intensity levels due to the 1D phantom sensation and 2D grid-based sensation. We describe the two main cases of this additive sensation in Sect. 4.3.4.

3D Warping Fig. 8a illustrates how we process raycasting to warp \(\text{MST}_{DP}\) to a haptic display node. The raycasting starts from \(\text{MST}_{DP}\) toward the target point. If the ray hits a node located within the boundary, we set that node as a haptic proxy.

In order to convey a natural and embodied user experience, our system aims for mirrored haptic feedback from the virtual performer. This means that audiences feel the performer’s motions flipped horizontally, as if they were watching the performer in a mirror. We consider that this mirrored rendering design enhances the level of embodiment while experiencing the vibrotactile feedback, as mentioned in Sect. 3.2.

Direct Surface Mapping When the distance of \(\text{MST}_{DP}\) is smaller than the minimum warping range shown in Fig. 7b, \(\text{MST}_{DP}\) generally falls directly on the surface nodes of the performer’s torso. In this case, these surface nodes become haptic proxies to transfer tactile feedback, as shown in Fig. 8b.

Out-of-Range Projection Since our system supports real-time rendering, robust handling of unexpected cases is necessary. If \(\text{MST}_{DP}\) is positioned outside the pre-calculated maximum range, we project the excluded \(\text{MST}_{DP}\) onto the closest coordinate within the maximum range (Fig. 8c).
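
A minimal sketch of this three-way case handling, assuming the 13 cm / 185 cm bounds reported above and leaving the actual ray cast toward TP as a comment, could look like this:

```python
import numpy as np

MIN_RANGE, MAX_RANGE = 0.13, 1.85     # warping bounds in meters (Sect. 4.2.1)

def mapping_case(mst_dp, j_t):
    """Classify MST_DP into one of the three mapping cases and, for the
    out-of-range case, clamp it back onto the boundary (Fig. 8)."""
    d = np.linalg.norm(mst_dp - j_t)
    if d < MIN_RANGE:
        return "direct_surface_mapping", mst_dp      # already on the torso nodes
    if d > MAX_RANGE:
        clamped = j_t + (mst_dp - j_t) * (MAX_RANGE / d)
        return "out_of_range_projection", clamped
    return "3d_warping", mst_dp                      # ray cast toward TP selects the node(s)

# Example: a point 2.1 m from the torso center is pulled back to the 1.85 m boundary.
case, point = mapping_case(np.array([2.1, 1.2, 0.0]), np.array([0.0, 1.2, 0.0]))
print(case, point)
```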

4.2.2 Integrating motion context to MST dynamic point

To cover the various motion types from the virtual performance, we adjust weight distribution when computing \(\text{MST}_{DP}\). We consider single active joint and dual active joints conditions.

For weight distribution, we compare each appointed \(J_{A}\)’s acceleration data in every frame. For the acceleration threshold, we compute the average acceleration over the previous three frames. If the acceleration value of the current frame is higher than this real-time threshold, we add the weight value (\(A_{t-2} - A_{t-1}\)) to the \(J_{A}\). The calculated weight value is then applied in real time with Eq. 1.

Regarding dual active joints, the key element vertices consist of two \(J_{A}\)s and a single \(J_{T}\). In this case, we compute both the acceleration value and the distance value. If the acceleration values in the current frame t for both active joints are higher than the real-time average threshold, we confirm there are two active joints to render. Then, we compute the distance from the left \(J_{A}\) to \(\text{MST}_{DP}\) and from the right \(J_{A}\) to \(\text{MST}_{DP}\) at the same time. By comparing the distance values of the two active joints, we distribute the weight value (\(A_{t-2} - A_{t-1}\)) to the joint that records the larger distance from \(\text{MST}_{DP}\). Figure 9b depicts the condition for two active joints.
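
The weight-distribution logic for both cases can be sketched as follows; this is a rough reading of the description above, and the history layout and variable names are our own:

```python
import numpy as np

def update_weights(accel_history, dist_left=None, dist_right=None):
    """Distribute the localization weight to the active joint(s).

    accel_history maps 'left'/'right' to at least four recent acceleration
    magnitudes [..., A_{t-3}, A_{t-2}, A_{t-1}, A_t]. The threshold is the mean
    over the previous three frames; dist_* are hand-to-MST_DP distances used
    only in the dual-active-joint case.
    """
    weights = {"left": 1.0, "right": 1.0}
    active = []
    for side, acc in accel_history.items():
        threshold = np.mean(acc[-4:-1])              # average of previous 3 frames
        if acc[-1] > threshold:
            active.append(side)

    increment = lambda acc: acc[-3] - acc[-2]        # weight term (A_{t-2} - A_{t-1})

    if len(active) == 1:
        weights[active[0]] += increment(accel_history[active[0]])
    elif len(active) == 2:
        # the extra weight goes to the hand farther away from MST_DP
        side = "left" if dist_left > dist_right else "right"
        weights[side] += increment(accel_history[side])
    return weights, active

hist = {"left": [0.1, 0.5, 0.2, 0.9], "right": [0.2, 0.2, 0.2, 0.2]}
print(update_weights(hist, dist_left=0.6, dist_right=0.3))
```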

Fig. 9

Overall workflow and visualization of weight distributions for tactile translation. We consider both A single active joint and B dual active joint cases

4.3 Translating tactile intensity

4.3.1 Hardware intensity calibration

To translate tactile intensity with a given set of hardware, we first define the hardware calibration coefficient (C) to provide precise tactile stimuli. This calibration identifies the input–output relationship of our actuators, which ensures reliable results. We measured the output acceleration from each eccentric rotating mass (ERM) using a high-precision 9DoF IMU (SparkFun, ICM-20948) while changing the input amplitude. The measured acceleration in each condition was fit with linear interpolation. For the output amplitude, we recorded the vibrotactile actuators of the bHaptics vest and sleeve across the corresponding vibration frequencies (range 1.00\(\sim\)4.37 G). Here, G refers to gravitational acceleration. The most effective vibrotactile frequency for human perception lies between 130 and 230 Hz Sun et al. (2022). To satisfy both vibrotactile intensity and frequency, we set C to 6, which corresponds to level 6 of bHaptics’s intensity parameter (3.16 G at 142 Hz).
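
The calibration boils down to a linear fit between commanded intensity levels and measured acceleration. The sketch below uses hypothetical intermediate measurements; only the 1.00–4.37 G range and the level-6 value of 3.16 G are taken from the text above:

```python
import numpy as np

# Intensity level vs. measured peak acceleration in G (intermediate values
# here are illustrative placeholders, not the actual measurement table).
levels = np.arange(1, 11)
measured_g = np.array([1.00, 1.23, 1.70, 2.10, 2.65, 3.16, 3.55, 3.90, 4.15, 4.37])

slope, intercept = np.polyfit(levels, measured_g, 1)   # linear fit of the response
predict_g = lambda level: slope * level + intercept

# Choose the calibration coefficient C whose predicted output is closest to
# the target acceleration that also falls in the 130-230 Hz perceptual range.
target_g = 3.16
C = int(levels[np.argmin(np.abs(predict_g(levels) - target_g))])
print(C, round(predict_g(C), 2))
```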

4.3.2 Intensity control strategy

To accurately simulate the sensation of upper-body movement, we adjust the intensity level according to the distance from \(\text{MST}_{DP}\) to \(J_{T}\), so controlling a fine level of intensity is necessary Li et al. (2021). We control the ERMs’ intensity parameter value to effectively convey the performer’s motions. The level of tactile intensity scales linearly with the distance from \(\text{MST}_{DP}\) to \(J_{T}\): the larger the ROM, the higher the tactile amplitude. By adjusting tactile intensity based on this distance, which represents the quantity of the performer’s motion, users can easily notice the flow of movements from the performer. The proposed intensity control strategy benefits motions that contain precise and dynamic contexts, like choreographic and communicative motions.

$$\begin{aligned} I_{t} = \left( \alpha \cdot D_{t} \cdot C + (1-\alpha ) \cdot I_{t-1} \right) \end{aligned}$$
(2)

Equation 2 is based on an exponential filter that uses exponentially weighted averaging to produce an output value. Here, \(I_{t}\), \(\alpha\), and \(D_{t}\) refer to the total intensity value, the smoothing factor, and the distance between the two vertices, respectively. In our work, we set \(\alpha\) to 0.5, which gives the same importance weight to the current frame (t) and the previous frame (t-1).

As mentioned previously, we adjust C to transfer the intended tactile intensity to the bHaptics vest and sleeve. As stated in Sect. 4.3.1, we confirm that bHaptics’s level 6 intensity parameter is the most comfortable value Maereg et al. (2017). Thus, we set C to level 6.
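
A direct transcription of Eq. 2 in Python, with alpha = 0.5 and C = 6 as described above (the example distances are arbitrary), is shown below:

```python
def smoothed_intensity(d_t, i_prev, alpha=0.5, c=6.0):
    """Exponentially smoothed intensity (Eq. 2): equal weight on the current
    distance-driven term and on the previous frame's intensity."""
    return alpha * d_t * c + (1.0 - alpha) * i_prev

# Example: the MST_DP-to-J_T distance grows over three frames (arbitrary values).
i = 0.0
for d in (0.2, 0.5, 0.9):
    i = smoothed_intensity(d, i)
    print(round(i, 2))   # 0.6, 1.8, 3.6
```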

4.3.3 Intensity distribution based on motion dynamics

As previously mentioned, Distance(\(J_{T}\) - \(\text{MST}_{DP}\)) indicates the distance between \(\text{MST}_{DP}\) and \(J_{T}\), as shown in Fig. 10. We increase the intensity as the distance gets larger, which intuitively conveys the performer’s movement as a tactile experience. We also accommodate dynamic motions by controlling the intensity level based on the active joint’s (\(J_{A}\)) acceleration. The overall intensity is modified if the acceleration exceeds the threshold, which is the mean acceleration value over the three most recent frames. If the acceleration of the current frame exceeds the threshold, up to level 2 of additional intensity (1.23 G at 103 Hz) is added, chosen with respect to the minimum human-noticeable intensity Verrillo (1966). Therefore, the maximum intensity corresponds to intensity level 8.

Fig. 10

Overall methods for intensity control. (Top) The intensity increases as the distance gets larger from the performer’s body. (Bottom) We employ an acceleration threshold and use average acceleration to further control the tactile intensity. When the average acceleration is lower than the threshold, the intensity level decreases and vice versa

For example, some communicative and choreographic motions move the active joint(s) at the same speed but translate them forward or backward over continuous frames; these must be rendered with different levels of intensity on the haptic display. Because our vibrotactile intensity translation reflects the amount of displacement between the two main element vertices, it accommodates different types of motions, covering both large and small translations.

4.3.4 1D phantom and 2D grid-based tactile sensation

In order to convey subtly ruled intensity, we provide the phantom sensation inspired by Park and Choi (2018). When the ray cast hits the computed bounded area, which has a width equal to the length between adjacent nodes divided by ten units (Unity 3D), we add a supplementary vibration intensity to the primarily computed intensity. The intensity for each node is adjusted along these divided units, obtained by multiplying the value K, the distance between nodes, and the normalized distance portion \(\alpha _{n}\).

Figure 11a shows the 1D phantom sensation between two consecutive nodes. By tracking the destination of the ray based on the normalized distance portion, the node closest to the ray is designated as the main node and regarded as the starting point with the coordinate (0,0). If the ray hits near \(\text{Node}_{A}\), \(\text{Node}_{A}\) gains \(\textrm{K} \cdot (1-\alpha _{n})\), while \(\text{Node}_{B}\) gains \(\textrm{K} \cdot \alpha _{n}\).

Fig. 11

Overall cases of the node processing approach. A Node processing for 1D phantom tactile sensations and the intensity plot for \(\text{Node}_{A}\) and \(\text{Node}_{B}\). B Node processing for the 2D grid-based method and the intensity plot for \(\text{Node}_{A}\) and the rest of the nodes. Here, the X- and Y-axes refer to normalized distance and intensity, respectively. The highlighted vertical line in the plots marks the illusory location where the perceived node is located

In the case of 2D grid-based tactile sensation, Fig. 11b indicates cases where a ray hits among four adjoining nodes. This rendering rule extends the previously mentioned 1D phantom sensation. We distribute the computed intensity separately to the four nodes near the perceived node by the following rules. In 2D grid-based sensation, the node closest to the destination of the ray is regarded as the main node (\(\text{Node}_{A}\)). We examine three correlation sets between the main node and the supplementary nodes (\(\text{Node}_{B}\), \(\text{Node}_{C}\), \(\text{Node}_{D}\)). The two nodes \(\text{Node}_{B}\) and \(\text{Node}_{C}\) comply with the rule of the 1D phantom sensation. For the intensity of \(\text{Node}_{D}\), we average the distributed values of \(\text{Node}_{B}\) and \(\text{Node}_{C}\), which lets users experience the continuity of the transition across connected nodes. Regarding the intensity of \(\text{Node}_{A}\), we set the multiplying coefficient to 0.5 in order to prevent the node from saturating at high intensity.
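
Our reading of these two distribution rules can be summarized in the following sketch; the coefficients follow the description above, but the 2D variant in particular should be treated as an approximation of the implementation:

```python
def phantom_1d(K, alpha_n):
    """1D phantom sensation between two adjacent nodes (Fig. 11a).
    alpha_n is the normalized distance from the main node Node_A toward Node_B."""
    return {"A": K * (1 - alpha_n), "B": K * alpha_n}

def grid_2d(K, alpha_x, alpha_y):
    """2D grid-based sensation among four adjoining nodes (Fig. 11b).
    Node_B and Node_C follow the 1D rule along each axis, Node_D takes their
    average, and the main node Node_A is capped with a 0.5 coefficient to
    avoid high-intensity saturation."""
    b, c = K * alpha_x, K * alpha_y
    return {"A": 0.5 * K, "B": b, "C": c, "D": (b + c) / 2.0}

print(phantom_1d(6.0, 0.3))    # ray lands 30% of the way toward Node_B
print(grid_2d(6.0, 0.3, 0.6))
```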

5 Hardware and software configuration

5.1 System overview

Figure 12 shows the four main steps of the overall hardware workflow. First, we collect 3D joint coordinates from the offline performer with the Azure Kinect. Once a set of point cloud data is captured from the offline performer, we convert these data into a virtual performer in Unity 3D. Then, we extract joint-based spatiotemporal parameters from the converted avatar, which are used as input to our MST-based algorithm to carry out the tactile translation. Lastly, we apply our real-time tactile translation to the proposed full upper-body wearable haptic interface covering the torso and shoulders.

Fig. 12

System overview. A We collect 60 Hz 3D motion data from the virtual performer, B compute the joint-based spatiotemporal measurements, and C transfer the associated driving parameters to the haptic driver based on our proposed algorithm. D Users experience real-time tactile feedback translated from the performer’s motions wearing a customized haptic vest along with haptic sleeves on both shoulders

5.2 System configuration

To convey the performer’s upper-body motion to users, we configure our prototype with two different types of wearable haptic devices. We employ the Tactsuit X40 and a pair of Tactosy sleeves from bHaptics (2019), placing the Tactosy sleeves on each shoulder as shown in Fig. 12. Every vibrotactile display module is wireless and battery-powered. The suit consists of 40 individually controllable ERMs (20 ERMs each on the front and back) and weighs 1.7 kg. For the sleeves, we alter their intended equipped location from the forearm to the shoulder. Each haptic sleeve consists of 6 individually controllable ERMs and weighs 0.32 kg. We used the Oculus Quest 2 Inc (2020) as our VR platform. We utilize the bHaptics Unity plugin to assign vibration locations and intensities from our algorithm.

6 User experience study

We carry out multiple user studies to validate our proposed MST-based algorithms with the hardware and software configuration described in Sect. 5. We collect participants’ subjective ratings through two different user studies. In the first study, we compare our pipeline with a baseline approach to confirm how our algorithm performs in translating discrete choreographic and communicative motion sequences. For the second study, we set up several virtual concert scenes and collect subjective ratings to compare various media-to-vibrotactile translation approaches and their combination (Fig. 13).

Fig. 13

User study setup. A Participants experience the virtual concert using the given HMD. B Wearing the customized upper-body haptic device. C We carry out in-VR questionnaires to collect subjective ratings

6.1 Study setup

We recruited 24 participants (11 male and 13 female) for the experiment, with a mean age of 25 (ranging from 20 to 37). No participants reported any sensory disorders that could affect their auditory, visual, or haptic perception. All participants had experience using a head-mounted display (HMD), and 12 participants had experience with a haptic suit. All participants mentioned they had no prior knowledge about VR performances, so we thoroughly explained the concept of a virtual performance before carrying out the study. Researchers walked participants through the user study procedures and equipped them with the HMD and the customized wearable haptic hardware as shown in Fig. 12.

A training session was given to the participants before the actual session. During the main session, the participants verbally answered in-VR questionnaires (Table 1) after experiencing each method. We collect subjective ratings on the given haptic rendering methods using a 7-point Likert scale (1=Strongly Disagree, 7=Strongly Agree). We give a 10–15 s break after each method to prevent user adaptation and fatigue. The entire study took about 1.5–2 h.

6.2 Questionnaires

We devise questionnaires to investigate whether the proposed MST-based rendering pipeline enhances the VR experience regarding naturalness, immersion, consistency, satisfaction, and embodiment. Higher ratings on these aspects would support the validity of our system. Since translating motion data into vibrotactile feedback in real time requires a direct response, which affects users’ experience and satisfaction Lin et al. (2021), we added a questionnaire item for latency. We ask users whether the tactile feedback was delivered on time along with the performer’s motion (visual aid). Table 1 shows the questionnaires used in both studies; we slightly modify the wording to better represent each study context.

Table 1 Questionnaires for user study 1 and 2
Table 2 Two-way ANOVA results for six subjective ratings

6.3 Study 1: motion-to-tactile framework performance

In this study, we compare two different haptic rendering approaches shown in Fig. 14. We select six motion sequences consisting of four choreographic motions (side-to-side hip-hop dance, forward-to-back hip-hop dance, diagonally side-to-side jazz motion, and waving side-to-side motion) and two communicative motions (waving toward audiences (motion 1) and throwing a ball/mic toward audiences (motion 4)) from Fig. 4. Each motion lasts about 20\(\sim\)25 s. A total of 12 combinations (2 haptic rendering methods \(\times\) 6 motions) were tested. The presentation order of the motion conditions was randomized for each participant, and that of the rendering approaches was randomized within all motion conditions.

Fig. 14

(Right) Scene of user HMD. (Left) We show two rendering results of baseline and MST-based pipeline

The baseline method directly maps the active joint to the haptic feedback. This is a widely used, conventional approach Schneider et al. (2015) in which the haptic feedback is rendered based on a key factor such as active joint information. To solely compare the performance difference, we use the same warping algorithm (Fig. 7) and hardware configuration for both methods. The only difference is that the “Baseline” method uses the active joint while the “MST-based pipeline” employs \(\text{MST}_{DP}\) to deliver haptic experiences.

Table 3 One-way ANOVA results for each motion

Results and Discussion We conduct a two-way within-subjects analysis of variance (ANOVA) first and carry out one-way ANOVAs for the cases presenting meaningful interaction effects. Then, we run Tukey’s HSD test for each of the 6 subjective ratings to confirm the effects of the haptic rendering methods and their significance.
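
For reference, an analysis of this form could be scripted as below; this is a hedged sketch using statsmodels on a hypothetical long-format ratings file, not the actual analysis code used in the study:

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical long-format data: one row per participant x method x motion,
# with a column for each subjective rating.
df = pd.read_csv("study1_ratings.csv")

for rating in ["naturalness", "immersion", "satisfaction",
               "consistency", "latency", "embodiment"]:
    # Two-way within-subjects ANOVA: rendering method x motion.
    res = AnovaRM(df, depvar=rating, subject="participant",
                  within=["method", "motion"]).fit()
    print(rating)
    print(res.anova_table)

# Post hoc pairwise comparison (Tukey HSD) on one rating as an example.
print(pairwise_tukeyhsd(df["immersion"], df["method"], alpha=0.05).summary())
```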

We look at the effects of the haptic rendering methods across the 6 discrete motions by analyzing the two-way ANOVA results (Table 2). Excluding “latency,” the haptic rendering methods show statistically significant main effects with p <  0.001. The results indicate that the haptic rendering method affects the user experience (effect size \(\eta ^2\) near 0.05). For naturalness (5.74 vs. 3.55), immersion (5.77 vs. 3.38), satisfaction (5.57 vs. 3.52), consistency (5.8 vs. 3.24), and embodiment (4.84 vs. 3.68), the MST rendering pipeline shows a much higher score than the baseline. This result indicates our algorithm successfully translates choreographic and communicative motions into vibrotactile feedback. For motions with a large range of motion (e.g., Motion 5), we notice higher consistency, satisfaction, naturalness, and immersion levels.

In terms of “latency,” a lower score indicates better performance, meaning there was no perceived latency in the vibrotactile feedback. Both haptic rendering methods presented fast and responsive tactile stimuli to users, and we found no significant differences in latency for the main and interaction effects. Therefore, we continued our statistical analysis excluding this subjective area.

Figure 15 shows the average Likert scores of the 6 subjective areas from participants. In all motions, we observe that the average rating for our algorithm is superior to the baseline approach in general. To statistically confirm the validity of these observations, we perform a one-way ANOVA to assess the performance and effectiveness of the proposed MST-based algorithm compared to the “Baseline” method.

Fig. 15

The user experience results for given haptic feedback in terms of naturalness, immersion, satisfaction, consistency, latency, and embodiment. The error bars represent standard errors. **,*** = p value < 0.01, and 0.001

According to Table 3, for “Motion 2–6,” the overall user experience with the MST-based algorithm comes out superior to the baseline, which results in statistically significant main effects (p < 0.05) on most of the subjective ratings in Table 3. For “Motion 1,” we only see a statistically significant effect on naturalness. We notice that the absolute magnitude of the Likert score for “Baseline” is particularly high compared to other motions, since the waving and shout-inducing gestures in this communicative motion are hard to translate into effective haptic feedback with either an active joint or \(\text{MST}_{DP}\). Still, our algorithm shows better ratings in general.

6.4 Study 2: tactile translation preference

Our proposed motion-to-tactile framework showed promising results for translating a virtual performer’s motion into effective vibrotactile feedback. We further investigate the holistic user experience using our framework with and without a conventional audio-to-tactile approach. We examine subjective ratings and user preferences under three different conditions: (1) motion-to-tactile, (2) audio-to-tactile, and (3) audio and motion (multimodal)-to-tactile (Fig. 16).

Fig. 16

We confirm user experience in VR performance with (Left) audio-to-tactile, (Middle) motion-to-tactile, and (Right) audio and motion-to-tactile translations

Table 4 Detail VR performance scene information for Study 2
Table 5 Two-way ANOVA results for 3 different media-to-tactile translation methods for each scene’s dependent variables: naturalness, immersion, satisfaction, consistency, latency, and embodiment

For the motion-to-tactile condition, we employ the MST-based algorithm from Sect. 4. For audio-to-tactile translation, we utilize the audio-to-haptics feature from bHaptics (2019), which provides several audio-to-tactile themes with varying frequencies. We choose the POP theme, which supports 80–90 Hz, as this frequency range effectively conveys a variety of audio cues, such as sudden changes in pitch and rhythm. Lastly, we test a combination of the audio-to-tactile and motion-to-tactile translation methods. Here, we set the intensity ratio of audio to motion as 2 to 3 since it provides adequate and comfortable feedback. We notice that increasing the audio intensity ratio generally overwhelms the whole sensation, which is not desirable for a balanced mixture.
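
The multimodal condition simply blends the two intensity streams; a weighted sum with the 2:3 ratio (our own formulation, not necessarily the exact bHaptics mixing rule) would look like this:

```python
def mixed_intensity(audio_i, motion_i, audio_w=2.0, motion_w=3.0):
    """Blend audio- and motion-driven intensities with the 2:3 audio-to-motion ratio."""
    return (audio_w * audio_i + motion_w * motion_i) / (audio_w + motion_w)

print(mixed_intensity(4.0, 6.0))   # -> 5.2
```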

We select motions for each study scene from Sect. 3.2. To appropriately distribute choreographic and communicative motions that align with the audio context, we refer to the actual performance of each selected song (Table 4). As a result, user study scenes consist of approximately 80% choreographic motions and 20% communicative motions.

In summary, we empirically test 9 combinations (3 haptic translation methods \(\times\) 3 scenes) for multimedia rendering conditions in virtual concerts (Table 4). The presentation order of the three scenes was randomized for each participant, and that of the rendering modality methods was randomized within all scenes.

Results and Discussion We conduct a two-way ANOVA on each subjective rating.

As shown in Table 5, different media-to-tactile translation methods show a statistically significant effect on all subjective ratings (p <  0.01 or p <  0.001). We observe a main effect of scene on consistency, but the participants reported high consistency scores across all scenes (4.90, 4.48, and 4.91 for Scenes 1-3). The study did not find significant interaction effects between the media-to-tactile methods and scene types except for latency. The latency rating was 2.01 for the audio-to-tactile approach, the largest value among the approaches. We conduct post hoc analysis with Tukey’s Honest Significant Difference (HSD) test to further investigate the effects of the various media-to-tactile approaches.

Figure 17 illustrates the post hoc test results with grouping labels, where different letters indicate significantly different groups. We observe that the proposed motion-to-tactile (\(\mu\)=5.38, SD=0.12) method receives more positive feedback from users than audio-to-tactile (\(\mu\)=4.09, SD=0.17) or multimodal-to-tactile (\(\mu\)=4.76, SD=0.10). The results clearly show that adding tactile feedback created from performer motions induces an immersive VR performance experience.

Fig. 17

Result of Tukey’s HSD test for subjective ratings (naturalness, immersion, satisfaction, consistency, and embodiment) with various media-to-tactile translation methods. Error bars represent standard errors. The conditions grouped with the same letter did not show statistically significant differences

Moreover, we directly ask users to rank their preferred media-to-tactile translation methods based on their experiences in the study. We also obtain verbal feedback to fully understand users’ opinions. Our results show that users prefer the motion-to-tactile (58.4%) technique over the audio-to-tactile (11.11%) and multimodal-to-tactile (30.54%) approaches.

As shown in Fig. 18, we observe that users prefer motion-to-tactile and multimodal-to-tactile over audio-to-tactile. Notably, the users choose haptic feedback created from performer motions computed with the MST algorithms as the most preferred experience. This suggests that users favor haptic feedback connected to visual cues (the virtual performer’s motion). However, users prefer both multimodal-to-tactile and motion-to-tactile in Scene 3. Unlike the other scenes, where communicative motions are given regardless of audio context (e.g., the same audio is maintained for hand waving), we design Scene 3 with a balanced allocation of communicative motions for the given audio context (e.g., slow tempo/low volume for hand waving). The results indicate that reflecting the performer’s motions, whether aligned with the audio context or not, still improves the user experience of VR performance over the conventional audio-to-tactile method.

Fig. 18

Preference from users on 3 different tactile translation methods for user experience

We notice that the users least preferred the sole audio-to-tactile approach. One participant (P6) mentioned that it is always better to have a haptic feedback feature while watching the virtual performance. However, P6 noted that it was hard to extract any context from that haptic experience. Another participant (P2) commented that the MST-based algorithm felt more immersive since the choreographic motions and given audio fully express the performance context through vibrotactile feedback. This implies that users consider “Motion” a key factor for tactile translation. As shown in Fig. 18, the “Motion”-integrated tactile translation methods (motion-to-tactile and multimodal-to-tactile) were preferred by nearly 90% of users. Thus, we believe that the proposed tactile translation method shows great potential in facilitating immersive experiences for virtual performance.

7 General discussion and conclusion

We propose MST-based algorithms that provide contextual motion-to-tactile translation and enable sophisticated real-time haptic experiences. Throughout the user studies, participants reported that the haptic experience was consistent and well designed to support VR concerts. Reflecting on our design considerations, we identify several design guidelines and challenges along with future work.

Multimodal-to-tactile Framework From the study results, the motion-to-tactile approach is preferred the most by users. However, we observe that users prefer multimodal-to-tactile for the scene whose motion allocation is carefully designed around the given audio context. Since most VR performance scenes would be designed with careful motion allocation along with the audio context, we encourage utilizing the multimodal-to-tactile approach.

Embodiment for VR performance The main objective of our work is to provide an immersive VR performance experience by translating the performer’s motion to the users. In our studies, we observe that users feel like dancing even in a seated position when motion-to-tactile translation is applied. Moreover, our approach further enhances the bond between the VR performer and the user. Aligned with previous research Thériault et al. (2021), we observe that users perceive the same motion as the virtual character when using our method. This suggests that our method has the potential to provide a sense of presence and realism as well as to increase the bond with the remote performer in a VR performance.

Future Work In this work, we mainly focus on a virtual solo performer. The current MST-based algorithm struggles to reflect multiple performers’ motions in representative haptic feedback. Therefore, for future work, we are interested in finding an effective method to derive meaningful haptic feedback from multiple performers. A potential solution would be tracking the user’s attention (e.g., eye-tracking and head-tracking) to efficiently reflect the overall haptic experience.

We foresee that the MST algorithm’s flexibility lends itself to expansion to other promising performance platforms such as augmented (AR) and mixed reality (MR). For instance, AR/MR can enhance the live concert experience by adding digital overlays to the real-world environment. It is possible to correlate our tactile translation with the digital content overlaid on the live performance. Overall, the use of tactile feedback in live concerts opens up new opportunities to support engagement and creativity and to enhance the cultural experience for both performers and fans.

We present a novel media-to-tactile translation method based on an MST-based framework. It translates the performer’s choreographic as well as communicative motions into meaningful vibrotactile feedback. We also customize an upper-body wearable haptic interface to provide full 3D haptic feedback reflecting the performer’s various motions. Through user studies, we confirm the proposed algorithm’s performance over the conventional approach in terms of subjective ratings and user preference. Our work enables an immersive VR performance experience by proposing a novel motion-to-tactile framework that reflects contextual information about the performance.