Understanding Software Quality Metrics For Virtual Reality Products - A Mapping Study
ABSTRACT
Virtual Reality (VR) software has become more mainstream in recent years. It has given VR practitioners an opportunity to explore new domains and deliver cutting-edge products. The success of a VR product depends primarily on its contextual relevance and the qualities it exhibits. However, it is unclear how VR practitioners curb software quality challenges and improve the essence of the VR product over every release. In this paper, we present a Systematic Mapping Study of the software quality metrics adopted by VR practitioners for assessing the quality of their VR products. The study showed that practitioners used unique metrics to measure the quality of their VR products in addition to adopting some existing enterprise software metrics. Further, we consolidate these metrics into different themes that future practitioners may use for developing VR products.

CCS CONCEPTS
• Software and its engineering → Software development techniques; Software testing and debugging.

KEYWORDS
Software Quality; Virtual Reality; Industrial Practices; Metrics

ACM Reference Format:
Mohit Kuri, Sai Anirudh Karre, and Y. Raghu Reddy. 2021. Understanding Software Quality Metrics for Virtual Reality Products - A Mapping Study. In 14th Innovations in Software Engineering Conference (formerly known as India Software Engineering Conference) (ISEC 2021), February 25–27, 2021, Bhubaneswar, Odisha, India. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3452383.3452391

1 MOTIVATION
Virtual Reality (VR) is known for interpreting complex visual experiences into simple ones for real-world events using Head Mounted Devices (HMDs) [31]. In the Gartner report [12], VR is presented as a ‘Strategic Technology Trend’, i.e., it is meant to guide organizations that have digital use-cases best solved using immersive experience. Technology based on VR can help people perceive the digital world in various use-cases like news, shopping, education, art, and analytics. In the initial days, VR systems were mostly limited to the Aviation and Shipping Industry. They were built to train pilots and sailors on navigation systems. With the advent of abridged versions of various Head Mounted Devices (HMDs), a multitude of personalized products are being built using VR, creating a significant impact in the digital consumer market.

The practices followed by VR developers originated from the Gaming Industry due to its widespread presence in Online Games [63]. Game Developers started contributing to Core VR product development with an idea of building serious enterprise VR products. Although the game development life cycle resembles the traditional software (for example, enterprise) product development life cycle, VR products are yet to adopt a lot of its practices. The major reason is the various challenges specific to the VR setup that are translated to its products. As a result, assessment of the quality of VR software products is still not as systematic as it is for enterprise software. Previously, we conducted a study to understand the modalities of Virtual Reality Product Development in the Software Industry [34]. Some of the observations pertinent to process and product quality in VR software captured from the study are:
• VR software development is complex, disorganized and can be correlated to the level of practitioners’ participation.
• Quality assessment for VR software products is considered to be cost-intensive. Also, it is difficult to generalize the quality attributes to all the end-users as VR products tend to be personalized.
• Design and Usability reflect VR product sensitivities. They have a direct impact on product quality.
• Design versioning and Sustenance maintenance are time-consuming and confusing at times for unstructured VR product builds.
• Support tools for VR product development practices are inadequate.
• Stakeholder conflicts are far more frequent given the wider variety of stakeholder involvement in the development of VR products.
• There are almost no comprehensive testing strategies for VR
ISEC 2021, February 25–27, 2021, Bhubaneswar, Odisha, India M. Kuri et al.
appropriate metrics while conducting Software Quality studies during VR Product development. In section 4, we discuss some threats to validity and finally we present some conclusions in section 5.

2 THE MAPPING STUDY
The ISO Standard 9126 [17] quality model classifies quality into a collection of quality characteristics and sub-characteristics. Various metrics can be used to measure these quality characteristics. Product owners have to adopt diverse software quality metrics to track the health of the software product after every product release. Software quality of an enterprise software product involves quality assessment, quality assurance, and quality evaluation. Researchers have proposed various approaches and metrics to address software quality problems at different stages of software production. Shihab et al. [72] conducted a literature review of more than 100 published research papers on software defect prediction. It was found that most of the approaches did not provide guidance on industrial adoption and rarely considered the impact, risk, and dependency associated with the predicted or forecasted defects. The practical adoption of software quality methods in the industry is limited as the software industry tends to be reactive. Compared to traditional software developers, VR practitioners have fairly limited knowledge about the state-of-the-art metrics needed for quality management of VR products [34]. In this paper, we detail a systematic mapping study performed on existing VR literature to explore software quality metrics or indicators used by VR developers while developing VR products.

2.1 Research Questions
The Systematic Mapping Study described in this paper uses the guidelines suggested by Petersen et al. [56]. The primary goal of the study is to capture the details of software quality indicators or metrics adopted by VR developers while developing their respective VR Products. We followed an evidence-based approach called the PICOC method. PICOC is an acronym for Population, Intervention, Comparison, Outcome, and Context [46]. Application of the PICOC concepts to the VR products study is shown in Table 1. The PICOC method helped formulate the research questions effectively and document the scope of the mapping study. The work presented in this paper addresses the following research questions:
R1: What are the existing software quality metrics/indicators used as part of VR product(s)/app(s)?
R2: Is there any trend in adapting certain software quality metrics/indicators in VR product(s)/app(s)?
R3: Are there any domain specific VR Software Quality metrics?

Table 1: Application of PICOC method to VR products study
Criterion | Description
Population | Virtual Reality related software products and applications
Intervention | Software quality metrics or indicators
Comparison | Comparison between the results captured in various software quality metrics
Outcome | Studies where software quality metrics/indicators are applied to VR products and apps
Context | Academia, software industry and other empirical studies

2.2 Search Strategy
For any systematic mapping study, establishing a search string is key. As a first step, keywords pertinent to Software Quality metrics in VR applications were identified. The research questions R1, R2, and R3 are related to each other and hence only one search string was constructed. The search strategy was set to enable identification of studies that describe the presence of at least one software quality metric or indicator applied to a VR Software Product.

Search String: The search terms were chosen with concepts resulting from the PICOC method. Below are the details of the search strings.
C1: “Virtual Reality” OR “Virtual Programming” OR “Virtual Reality Product” OR “Virtual Learning” OR “VR” OR “Virtual Environment”
C2: “Software Quality” OR “Software Metrics” OR “Software Indicators” OR “Software Evaluation” OR “Metric” OR “Metrics” OR “Indicator” OR “Indicators” OR “Quality Assessment” OR “Quality Improvement” OR “Quality Evaluation” OR “Quality Measure-
C3: “Publication year” > “2000”

The resulting string formulated for addressing research questions R1, R2, and R3 is ‘C1’ AND ‘C2’ AND ‘C3’. We considered Virtual Reality and Virtual Programming related keywords as the initial search filter. Software Quality Metrics is a major factor, hence we expanded the search space to consider all potential keywords pertaining to quality metrics. We conducted a multi-level analysis [39] on the Virtual Reality research area and found that with the advent of new hardware there was a significant shift in VR technology and corresponding software after the year 2000. Hence, the year 2000 was considered as a limit for publication year to extract the literature.

Search Quality Assessment: We reviewed the search strings multiple times and incrementally developed them based on a peer-review approach [39]. We also conducted a manual search of the search string incrementally and compared the results in every validation cycle. Our peer-reviewers graded the manual search validation results and helped us finalize the search string. We worked with fellow researchers from similar research areas to finalize the search string. Once the search string was finalized, the authors independently conducted the search activity against all available attributes of a research paper including abstract, contents of the paper, keywords, etc. and recorded the respective results. We filtered these attributes further to avoid miscellaneous research papers. Our review supplement data can be found here [47].
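The composition ‘C1’ AND ‘C2’ AND ‘C3’ described in Section 2.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`or_group`, `build_query`, `matches`) are hypothetical, the final C2 term is truncated in the text and is therefore left out, and real databases such as IEEE Xplore or the ACM Digital Library each have their own query syntax that is not reproduced here.

```python
import re

# Term groups copied from C1 and C2 of the search strategy.
C1 = ["Virtual Reality", "Virtual Programming", "Virtual Reality Product",
      "Virtual Learning", "VR", "Virtual Environment"]
# The last C2 term is truncated in the text, so it is omitted here.
C2 = ["Software Quality", "Software Metrics", "Software Indicators",
      "Software Evaluation", "Metric", "Metrics", "Indicator", "Indicators",
      "Quality Assessment", "Quality Improvement", "Quality Evaluation"]
MIN_YEAR = 2000  # C3: publication year strictly greater than 2000


def or_group(terms):
    """Quote each term, join with OR, and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(c1=C1, c2=C2):
    """Generic boolean string for C1 AND C2 (the year filter is applied separately)."""
    return f"{or_group(c1)} AND {or_group(c2)}"


def mentions_any(text, terms):
    """True if any term occurs as a whole word or phrase, ignoring case."""
    return any(re.search(r"\b" + re.escape(t) + r"\b", text, re.IGNORECASE)
               for t in terms)


def matches(text, year, c1=C1, c2=C2, min_year=MIN_YEAR):
    """Local check of C1 AND C2 AND C3 against one record's text and year."""
    return mentions_any(text, c1) and mentions_any(text, c2) and year > min_year


if __name__ == "__main__":
    print(build_query())
    print(matches("A metric for VR scene quality", 2015))
```

A local check like `matches` is useful mainly for the manual validation cycles mentioned above, where hand-picked records are compared against what the string is expected to return.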
2.3 Databases and Paper Selection
We conducted this search activity against major electronic research databases including IEEE Xplore, ACM Digital Library, Springer, and ScienceDirect. The search order was based on the databases that returned the most results. The search fields and search string were formulated to ensure that the search process is similar across these electronic research databases. We omitted the grey literature (research produced by organizations outside of the traditional commercial or academic publishing and distribution channels) and focused only on active publications. Our review considers research papers published up to September 2020.
Exclusion Criteria - Articles short on metric description details were ignored. Research papers with no clarity on their VR product setup were ignored as the setup is critical for judging the quality metric used in a VR scene. Articles that focused only on software quality processes/models/techniques, topics related to the description of software quality engineering, or industry white papers were excluded from our study. Also, papers that did not mention anything about the quality aspect of the VR product built as part of their research were not considered. Books were not included as part of this mapping study as books tend to cover broader data that is more useful for in-depth analysis than for a mapping study.
Inclusion Criteria - Papers that discussed the use or implementation of a Software Quality Metric(s) or Indicator(s) in their title, abstract, or keywords were considered. Peer-reviewed publications that contain clear details about the VR products and documents were given primary consideration. Only articles written in English were considered as part of our study.

spectrum of VR product development. In the review process, we came across several papers where researchers used software quality metrics for validating their VR Prototype or Product.
Based on the review of literature, we conducted a coding-based thematic analysis [11] of the gathered research papers. Based on the coding results, we broadly categorized the gathered data into six themes: VR Audio Quality, VR Scene Quality, VR Video Quality, VR QoE (Quality of Experience), VR Image Quality and VR Code Quality. In this section, we analyze the research questions and discuss the relevant observations.
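The screening criteria of Section 2.3 and the six themes named above can be read as a small pipeline: filter the candidate papers, then bucket the survivors by theme. The sketch below is a hypothetical illustration, not the authors' tooling. The `Paper` fields, the `EXCLUDED_KINDS` labels and the function names are assumptions introduced here, and the theme of each paper is taken as already assigned by manual coding.

```python
from dataclasses import dataclass

# The six themes listed in the thematic analysis.
THEMES = ("VR Audio Quality", "VR Scene Quality", "VR Video Quality",
          "VR QoE", "VR Image Quality", "VR Code Quality")

# Hypothetical labels for publication kinds that the criteria exclude.
EXCLUDED_KINDS = {"book", "white paper", "grey literature"}


@dataclass
class Paper:
    title: str
    language: str
    kind: str                 # e.g. "peer-reviewed", "book"
    describes_metric: bool    # metric or indicator is described in enough detail
    describes_vr_setup: bool  # VR product setup is clear
    theme: str                # assigned by manual coding, one of THEMES


def is_included(paper):
    """Apply the inclusion and exclusion criteria to one paper."""
    return (paper.language == "English"
            and paper.kind not in EXCLUDED_KINDS
            and paper.describes_metric
            and paper.describes_vr_setup)


def group_by_theme(papers):
    """Return {theme: [titles]} for the papers that pass screening."""
    grouped = {theme: [] for theme in THEMES}
    for paper in papers:
        if not is_included(paper):
            continue
        if paper.theme not in grouped:
            raise ValueError(f"unknown theme: {paper.theme!r}")
        grouped[paper.theme].append(paper.title)
    return grouped
```

Keeping the criteria in one predicate makes it easy for independent reviewers to apply the same rules and compare their selections.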
using 3D Convolutional Neural Networks. They conducted an empirical study on a 3D Panoramic VR Video Dataset and determined the initial quality of these videos using a subjective score for each data sample. They proposed a ‘fusion strategy score’ to rank and determine the quality of the VR Video project process. They considered MultimediaQA, VRVideoQA, and ImageQA indexes as a measure for the quality of their assessment. Their future plans include the application of their proposed score on large VR video databases to help VR Practitioners adopt the proposed score and metrics as part of conventional VR product development. Alireza et al. [87] conducted a comparative quality assessment study between the tile-based method and the truncated square pyramid (TSP) method of projecting VR Videos. These methods are primarily involved in Streaming VR Videos, which are subject to latency issues. The Quality-Assessment-View (QAV) Index was used to assess the quality of VR Video. Subjective evaluation was performed and the observed data was analyzed to determine the merits and demerits of these methods. Sijia Chen et al. [10] reviewed Omnidirectional VR Videos and conducted an objective evaluation to calculate the spherical structural similarity index (SSSI) and compare the results of this quality metric with traditional heuristic methods. They also conducted an experimental assessment to determine the relationship between two domains to determine the video quality.
Sebastian et al. [70] made attempts to understand the methods to render virtual viewports from supplementary depth information to create a better VR video quality. Depth-image-based rendering and Peak-Signal-to-Noise ratio are two quality metrics used to evaluate the VR video quality. They conducted a subjective evaluation and published their findings. Naty et al. [73] conducted an experimental study on understanding the performance and computational complexity of 360 degrees VR Video. They were part of a research group that developed new coding tools to address video encoding and decoding to avoid noise and achieve a better bit rate in VR Videos. They have used Weighted-Peak-Signal-to-Noise Ratio as a custom metric to evaluate the 360 degrees VR Video quality. Shu Yang et al. have introduced a quality assessment method for panoramic videos which is based on multi-level quality factors. This is calculated based on the region of interest in a given VR Scene [85]. They conducted a subjective evaluation using a few panoramic scenes and captured insightful results. Their observations show that the quality assessment method is easy to implement when compared with the traditional video quality assessment method. Carlos et al. proposed two novel metrics to study the user behavior under 360-degree movie cuts [45]. This is to examine the influence of user perception on 360-degree movie cuts over large scale video scenes.

VR Image Quality: Wei et al. [75] conducted a subjective quality evaluation of compressed virtual reality images in VR scenes. They performed a correlation study with popular objective quality measures and published their observations. They used the Single-Stimulus method to collect the subjective scores from participants and have computed the MultiScale Structural Similarity Index (MSSI) to determine the quality of the images in a VR Scene. Huiyu Duan et al. [14] established an exhaustive VR Image dataset and worked on perceptual quality assessment of Omnidirectional images in VR. The image quality assessment measures were applied on their dataset and suggestions were provided on the improvement of images based on VR Scene setup. Junfei Qiao et al. [60] reviewed 3D Synthesized views used for the rapid development of high-quality VR Scenes. They formulated an algorithm as part of their previous work and compared it with their existing dedicated No-Reference Image Quality Assessment (NIQA) method to assess the quality of the Image in a high-quality VR Scene. They set up an experimental study to comprehend image quality degradation and distortions. Further, they have recommended guidelines for low-level compression of images in a VR Scene to avoid rendering issues. Rahim et al. [62] introduced a content-dependent objective quality assessment procedure to evaluate the distortions that occur while building the viewport in VR Scenes. They used a supervised learning method to classify their dataset to determine the viewport quality of 360 degrees images in a given VR Scene. They set up an experiment to predict the proposed metric against the viewport quality of the image set with reasonable accuracy.

VR Audio Quality: Miroslaw et al. have reviewed issues with spatial audio as part of their previous work and have now reviewed the quality aspects of compressed audio on emerging HMDs [50]. They adopted the MUSHRA test methodology (Multiple Stimuli with Hidden Reference and Anchor) to assess the quality of the soundscape of a 360 degree streaming VR for immersion setup. They conducted a subjective evaluation of raw audio content and captured the quality of user experience. They reviewed the options of compressing the spatial audio and proposed a few guidelines for practitioners. They plan to develop an objective spatial audio quality metric as part of their future work. Jules et al. [18] were the first to study continuous movement recognition and real-time sound parameter generation in a VR Scene. They conducted a series of experiments to understand the mapping between the design process and user interaction through a VR Scene using a machine learning approach. They used Auditory Feedback as a quality measure to determine the health of their study. David Triantafyllou et al. [78] are pioneers in studying sound in VR scenes. They conducted two experiments to determine the relationship and shortcomings between physical-world sound and VR Scene sound. They identified the difference between these two experimental conditions and used auditory feedback to assess the quality. They have published a few guidelines for practitioners on building combinations of surfaces with better natural sounds in virtual environments. Ceenu et al. [22] investigated whether audio signals and haptic feedback can act as indicators for real-world boundaries, such as objects, walls, and people. They used the NASA TLX Survey and Presence questionnaire to gather feedback on presence and workload from the participants in their study setup. Adrielle et al. studied Audio localization in a VR Scene using spatial audio metrics like the NASA TLX questionnaire to capture workload while performing actions like gaze pointing and wand pointing towards the projected audio in a VR Scene [48].

VR Scene Quality: Blaine Bell et al. [5] were the first to investigate the quality of a rendered VR plane. As part of their work, the authors focused on view management in a 3D view plan and determined the properties of objects, position, size, transparency, and shapes of the virtual world. They proposed a layout decision approach and conducted a subjective evaluation. Further, they proposed a custom quality metric to determine the view plan representation
in a virtual world. Ying Zhang et al. [86] created a multi-modal interface for the virtual environment for an assembly application. They evaluated multi-modal feedback on assembly task activities through simulation in the virtual world. They were the first to build such an interface and conduct a heuristic auditory and sensory feedback study to assess the quality of their environment. They also conducted a subjective evaluation and captured the observations via a questionnaire. They provided recommendations to the practitioners on building an efficient task-performance-based VR Scene. Shun Li et al. worked on a virtual surgical simulation setup with an intent to understand the thermal damage of bone tissue during bone drilling [41]. They formulated a virtual surgical process and evaluated the virtual scenes using a customized quality metric called ‘Temperature Distribution’. Dongdong et al. explored the differences in visual discomfort caused by long-term immersion between virtual environments and physical environments [26] across a variety of VR scenes. Visual fatigue scale (VFS), change of pupil size (PS), and accommodation response (ACR) are captured as metrics. Bilal et al. conducted a task-based subjective evaluation of a driving simulation [64] using the user-experience questionnaire and Gaze interaction metrics.
Carvalheiro et al. [8] proposed a haptic-based interaction system for virtual reality products. This includes a combination of tracking devices for hands and objects in a given scene. Their solution receives haptic stimuli by manipulating real objects mapped to virtual objects. They conducted a subjective evaluation via an experiment setup and proposed a quality metric called ‘Simulation Awareness’ to understand the stimuli experience of the end-users in a virtual world. Rohan et al. [9] presented a novel algorithmic framework to optimize the depth of camera placements for a given virtual environment. As part of the study, they utilized quality metrics called simulated annealing and depth inaccuracy to evaluate the quality of the construction of the scene in virtual space. In [38], the author investigated a unique visual comfort model for the real prediction of a 3D VR Scene, which is based on the physiologic mechanism. They conducted an experimental study to evaluate their model by utilizing a customized quality metric called multi-modal interactive continuous scoring. This helped them understand the stability and perception of 3D VR Scene images. Jann et al. conducted a VR Scene tolerance study using a custom scale called the Cybersickness Susceptibility Questionnaire [21]. The scope of this metric is to predict the Scene tolerance of the participant.
Hak et al. [37] proposed a quality metric of exceptional motion in VR Video Contents for VR sickness assessment. Their metric was developed to improve the quality of VR scenes to avoid sickness issues. They validated the work using Simulator sickness questionnaires in VR environments. Viktorija et al. studied a Levitation Simulator using a VR-based shooting scene to capture workload using the NASA TLX Task-based survey [53]. Jeremy et al. proposed an innovative method to navigate into VR Scenes by opting for acceleration parameters from the users in real-time [57]. The motivation was to address VR Sickness issues. They have conducted a case study and used customized quality metrics called Motion-Sickness Dose Value and Electro-dermal activity to judge their results. Ke Gu et al. proposed a novel referenceless quality metric for depth-image-based rendering of VR Scenes [25]. This is to ensure that the synthesized free-viewpoint videos are generated with higher accuracy and with less geometric distortion. An experiment was conducted to review the quality method, which was compared with traditional models. Yingxue et al. worked on improving the immersive viewing experience for VR Scenes [88]. They proposed a display protocol and evaluated it against panoramic VR Scene videos to review the distortions and video coding compression. They conducted a case study on all these VR Scenes and heuristically reviewed the quality of the scenes using the mean opinion score method from the participants. Deba et al. proposed a method for defining a quality metric on context-aware intelligent environments with inference to address physiological processes [67]. Their work focused on VR Scenes which require a visual feedback model, and they developed a Supermarket application for validation. They used electrodermal activity as a quality metric and conducted a task-based experiment to assess their proposed method.
Brendean John et al. [32] worked towards investigating pupillary light response in the virtual scene setup. They used a custom quality metric called pupil light response as a reference scale to detect and identify the rates of cognitive-emotional responses from the participants who were part of the case study. They built a task-based activity experiment and proposed guidelines for building effective scenes with reasonable pupillary interaction. Markus Wirth et al. investigated interaction techniques in a virtual reality scene setup focused on a diagnostic radiology application. This is the first domain-specific VR Application of its kind in which a VR Scene based quality metric was evaluated from a Software Engineering perspective [83]. Attractiveness and Pragmatic/Hedonic Quality attributes were evaluated as part of a thorough experiment for radiologists.

VR QoE (Quality of Experience): Mapar et al. [44] and Akpan et al. [1] adopted a heuristic-based personalized quality metric to evaluate a space flight simulation setup and an assembly product. They conducted a study to determine the quality of experience and used egocentric depth perception as a metric [33]. Jarvinen et al. [30] conducted a series of experiments on the spatial setup in VR scenes and evaluated their experimental VR scene to assess the quality of experience in spatial memory. They used a customized test called the Spatial memory test to gauge the quality. Ruddle et al. [66] conducted a VR scene evaluation using travel time, collision index, and speed profile index as quality-of-experience measures in their case study VR Scenes. Markus et al. [82] assessed the personality traits of athletes in a VR football game scene using the Presence Questionnaire. Monthir et al. compared Game-pad and Naturally-mapped Controller Effects on Perceived Virtual Reality Experiences [2] and captured the customized metrics Self-reported Presence, Self-reported Engagement, and Self-reported Accuracy. Peng et al. [55] conducted a comparative study between a PC and VR based presence evaluation of emotional challenge-based games.
Lugrin et al. [42] conducted a subjective evaluation of a task-based assembly activity to analyze the quality of experience of a participant through the quality metrics In-game performance index, In-game navigation, and Multi-Screen usage index to determine the adaptability of the VR Scene. Charles et al. [58] built a workstation in a VR environment to understand the physical risk factors in humans. The quality of the experience was evaluated in a subjective evaluation using rapid upper limb assessment, averaged muscle activations, and Total task time as quality indicators. Hamam et al. [27]
Table 2: Software Quality Metrics mapped with respective type and VR Theme
VR Quality Domain (or) Theme | Software Quality Metrics Used | Type of VR Product | References
Streaming VR Video Quality | Quality Assessment View (QAV) | Streaming VR Video | [87]
VR Audio Quality | MUSHRA test methodology | Use-Case Specific VR Product | [50]
VR Audio Quality | Auditory Feedback | ML based Approach | [18]
VR Audio Quality | Auditory Feedback | Questionnaire | [78] [22] [48]
VR Code Quality | LOC, CC, %Lack of Cohesion | Case Study | [23]
VR Image Quality | MultiScale Structural Similarity (MSSIM) | Use-Case Specific VR Product | [75]
VR Image Quality | Spherical Structural Similarity Index | Use-Case Specific VR Product | [10]
VR Image Quality | DIBR-synthesized IQA metric | Use-Case Specific VR Product | [60]
VR Image Quality | Perceived viewport quality, Content Dependent Objective quality metric | 360 degree video/image | [62]
VR Panoramic Scene Quality | BP-based quality assessment of panoramic videos | Use-Case Specific VR Product | [85]
VR Panoramic Scene Quality | Heuristic | Coding Lab | [88] [76]
VR QoE | Electrodermal Activity, Heart Rate, Miss Clicks | Speech and Language Therapy | [35] [36]
VR QoE | Immersive Experience Questionnaire | Use-Case Specific VR Product | [65]
VR QoE | Interaction Modality | Use-Case Specific VR Product | [52]
VR QoE | Hand Movement Velocity | Rehabilitation Game | [43]
VR QoE | Interface quality, Realism Index | Use-Case Specific VR Product | [71]
VR QoE | Locomotion Index | Use-Case Specific VR Product | [7] [6] [19]
VR QoE | UserState Measure, Perception Measure, Physiological Measure | Game | [27]
VR QoE | Egocentric Depth Perception | Task Based Activity | [33]
VR QoE | Electrocardiographic signal, galvanic skin response, blood volume pressure, electrodermal activity | Task Based Activity | [69] [81]
VR QoE | Flow Experience Analysis | Task Based Activity | [29]
VR QoE | Heuristic | Assembly Product | [1]
VR QoE | Presence Evaluation | Use-Case Specific VR Product | [74] [82] [2]
VR QoE | Subjective Evaluation | Exercise IoT App | [61] [79]
VR QoE | Subjective Evaluation | Use-Case Specific VR Product | [66] [20]
VR QoE | Subjective Evaluation | Task Based Activity | [28] [68]
VR QoE | Subjective Evaluation, content resolution, start delay, stalling ratio | Task Based Activity | [15] [40]
VR QoE | Subjective Evaluation | Walk-In-Place activity | [54]
VR QoE | Heuristic Auditory and Sensory Feedback | Case Study - Automobile Driving | [80]
VR QoE | Heart Rate, Skin Conductivity | Case Study - Public Speaking | [16]
VR QoE | Spatial Memory Test | Case Study - Spaces | [30] [49]
VR QoE | Vibrotactile Feedback | Task Based Activity | [51]
VR QoE | Realism, control, interface quality, ability to examine, performance, haptic sub-scales | Haptic Based Case Study | [4] [55]
VR QoE | Laban Movement Analysis | Task Based Activity | [3]
VR QoE | In-game Performance, In-game Navigation, Multi-Screen usage | Task Based Activity | [42] [24]
VR Scene Quality | Simulated Annealing, Depth Inaccuracy | Use-Case Specific VR Product | [9]
VR Scene Quality | Motion-Sickness Dose Value, Electro-Dermal Activity | Use-Case Specific VR Product | [57]
VR Scene Quality | Depth image-based rendering | Use-Case Specific VR Product | [25]
VR Scene Quality | Multi-modal interactive continuous scoring | Use-Case Specific VR Product | [38]
VR Scene Quality | Temperature distribution | Virtual Surgery - Bone Drilling | [41]
VR Scene Quality | Electrodermal activity quality metric | Super Market | [67]
VR Scene Quality | Subjective Evaluation, Visual Discomfort | Task Based Activity | [32] [26]
VR Scene Quality | Subjective Evaluation | Custom Haptic Setup | [8]
VR Scene Quality | Heuristic Auditory and Sensory Feedback | Assembly Product | [86]
VR Scene Quality | Heuristic Auditory and Sensory Feedback | Case Study | [37]
VR Scene Quality | View Plane Representation | View Plan - Visible Surface Determination | [5]
VR Scene Quality | Attractiveness, Pragmatic and Hedonic Quality | Diagnostic Radiology - Interaction | [83]
VR Simulation Quality | Heuristic | Space Flight Simulation, Driving | [44] [64]
VR Task Quality | Rapid upper limb assessment, averaged muscle activations, Total Task Time | Assembly Product | [58]
VR Video Quality | MultimediaQA, VRVideoQA, Image QA | Use-Case Specific VR Product | [84]
VR Video Quality | Depth-image-based rendering, peak signal-to-noise ratio | Use-Case Specific VR Product | [70]
VR Video Quality | Omnidirectional IQA | Use-Case Specific VR Product | [14]
VR Video Quality | Weighted Peak Signal to Noise Ratio | Use-Case Specific VR Product | [73] [45]
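Table 2 lists peak signal-to-noise ratio among the VR Video Quality metrics. As a reminder of what that metric measures, the sketch below computes the standard, unweighted PSNR between a reference and a distorted signal, PSNR = 10 · log10(MAX² / MSE). It is the generic textbook definition, not the exact procedure of any surveyed paper. The weighted variant used for 360 degree video applies per-sample weights that are not shown here, and the function names and the flat-list input format are assumptions made for illustration.

```python
import math


def mse(reference, distorted):
    """Mean squared error between two equally long sequences of sample values."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the signals are identical."""
    error = mse(reference, distorted)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)


if __name__ == "__main__":
    ref = [52, 55, 61, 66]   # hypothetical 8-bit pixel values
    out = [54, 55, 60, 65]
    print(round(psnr(ref, out), 2))
```

A higher value means the distorted frame is closer to the reference. Objective scores of this kind are typically reported alongside the subjective evaluations that most of the studies in the table rely on.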
built a game to analyze the quality of user experience in a VR scene by capturing User State Measure, perception measure, and physiological measure as quality indicators, and shared significant insights. Steinicke et al. [74] built a simulation system to study the presence of a participant using a presence evaluation method. Aristidou et al. [3] investigated motion capture technology for virtual spaces with a specific focus on folk dance motion analysis. They present a framework to identify the styles in dance motion and use Laban Movement Analysis to study the quality of experience of the generated scene. Gayathri et al. studied spatial memory with respect to heights in adults and teens using a virtual staircase [49]. Metrics like Turning Error, Latency, and Corsi Scores were captured as part of a case study. Precision and Task Completion Time were captured along with a Subject Questionnaire to evaluate collaborative tasks in VR [20]. Additionally, various other quality metrics were implemented on prototypes for focused applications through subjective evaluation to understand the quality of experience. They include Interaction Modality [52] and Locomotion Index [7] [19]. Haptic-based systems used realism, control, interface quality, ability to examine, performance, and haptic subscales [4]; speech and language-based therapy applications used Electrodermal Activity, Heart Rate, and Miss Clicks [35]; and a rehabilitation game employed Hand Movement Velocity as a customized metric [43]. Heuristic and survey-based evaluations were conducted in studies [28], [54], [80], [65], where perceptual ration, estimated path length, recall time, and immersion score were used as metrics. Andreea et al. conducted a quality evaluation of the effect of thermal visual representation on users' grasping interaction in a Virtual Reality application [6]. A novel metric called Grasp Aperture was presented and used as part of the study.

VR applications focused on healthcare can broadly be categorized as detection applications and intervention applications. Given that VR is still nascent, studies have focused more on detection than on intervention. Metrics like Electrocardiograph signal, galvanic skin response, blood volume pressure, and electrodermal activity [69] [36], Flow Experience Analysis for task-based activities [29], Miss Rate, Merge Rate, and Fragmentation Rate in rehabilitation and trauma-based VR applications [61], Effect of Reachability [15], Vibrotactile Feedback [51], Interface Quality and Realism Index along with an immersive tendency survey [71], and Heart Rate and Skin Conductivity [16] are used to assess the quality of applications. In [81], PPG signals and EEG signals are captured to study the participant's attention in terms of learning VR applications. This study was conducted to understand the multi-dimensional physiological characteristics of the participants towards immersed learning. Elena et al. were the first to study anti-stress adaptation to a new educational environment for foreign students [77], using a VR tool called Emotional Experience designer. This tool captures emotional and muscle relaxation as quality metrics. Multi-user Isness medical condition experiences were studied using a discussion-based questionnaire [24].

VR applications that focus on providing a high quality of experience through enhanced bandwidth use interesting approaches. Krogfoss et al. studied the impact of 5G bandwidth on improving VR audio and video quality. They conducted a QoE assessment with customized metrics like content resolution (cod – coding), start delay (s), and stalling ratio (t) to assess the experience of a VR scene using 5G bandwidth [40]. Robertro et al. developed a two-stage method to enable efficient streaming via QoE-aware mobile networks for AR/VR scenes [76]. The proposed dynamic path selection approach uses a new QoE model for video streaming. A pupillometry-based metric was proposed by Kenya et al. as a method to evaluate reading comprehension in VR-based educational comics [68]. Maria et al. adapted a human-centered QoE approach to assess Delay, Jitter, and Packet Loss in VR applications [79].

R2: Is there any trend in adapting certain software quality metrics/indicators in VR product(s)/app(s)?

As part of our study, we observed that the metrics that can be considered a common set across VR software applications are fairly limited. Most of the quality indicators or metrics for VR applications are limited to a particular usage and need. The trend seemed to be more towards researchers opting for customized quality metrics and building a methodology around the quality metric with a subjective or objective evaluation. Most of these customized metrics are unique and are used as part of focused VR applications like rehabilitation, trauma, education, and fun task-based prototypes. Metrics like Temperature Distribution [41], Electrodermal Activity, Heart Rate, Miss Clicks [35], Hand Movement Velocity [43], and Pragmatic and Hedonic Quality [83] have been widely used in healthcare-based applications for over a decade. Other metrics like Rapid upper limb assessment, averaged muscle activations, Total Task Time [58], Heuristic Evaluation [44] [1], Auditory and Sensory Feedback [86], and Immersion Score [80] have been distinctively followed in the manufacturing domain for over a decade. Owing to differences in scene design across domains, we found no trace of a generic software metric or indicator being practiced.

In some cases [22], researchers relied on a common quality indicator for multiple quality factors like workload and presence. This shows that there is a need for new approaches for use-case based adoption. It also shows that a practitioner's attitude towards adopting a software quality metric is unique and varies based on the intent of the scene. Most practitioners have not considered even one metric in common to address essential quality requirements. These observations motivate the need for a Software Quality Evaluation framework for VR applications, which includes strategies for addressing Image, Video, Code, and Audio quality challenges in a generic way. Such frameworks can be realized only if multiple empirical studies are attempted on large VR product data sets. This will help future VR practitioners to adopt generalized quality metrics as a basis and then develop focused quality metrics on top of the framework, based on their business needs. We plan to explore this avenue as part of our future work and will attempt to work towards formulating a generalized Software Quality Evaluation framework for VR products.

Scope for Automation - As part of our review study, it was surprising to note that practitioners or VR product(s) are not using automated methods or metrics to assess software quality. Developers are making progress in building frameworks/tools that can be used for VR testing. While most of the VR quality metrics are defined to achieve Quality of Experience, a large number of these metrics are being evaluated manually. Metrics like lines of code [23],
ISEC 2021, February 25–27, 2021, Bhubaneswar, Odisha, India M. Kuri et al.
lack of cohesion [23], Omnidirectional IQA [14], and depth image-based rendering [25] are currently captured using semi-automated methods. New automated methods, as well as moving current semi-automated methods to fully automated ones, can make the assessment more objective. There is tremendous scope for developing automated approaches to assess/improve VR quality. For example, in VR simulation software, interface testing software, etc., automated assessment of various qualities can significantly help in making the applications better.

R3: Are there any domain specific VR Software Quality metrics?

Table 2 illustrates the details of the metrics used as part of VR product quality assessment across specific Quality Themes (or) Domain. Domain - It is not a type of industry or business but a common theme under which a VR product is developed. To categorize, we consider VR applications that involve assembling objects in a specific order to come under the Assembly domain. VR apps which require the users to perform tasks and generate actions in the given VR scene are regarded as the Task-Action domain. VR apps based on games for fun or serious games are acknowledged as the Gaming domain. VR apps which provide healthcare solutions come under the Healthcare domain. The categorization of domain here is not specific to a business need; it is a heuristic set portrayed as a domain. To present the metrics in Table 2 in a well-defined form, we categorized them into the types below.

Widely Used Metrics: Heuristic Evaluation with Presence Survey is the most widely used quality metric among simulation-based VR applications [44] [88] [55] [1]. Auditory Feedback is another metric used by practitioners to assess VR audio quality [18] [78]. Apart from these two approaches, the rest appear to be either distinct to a particular VR product or not relevant for generic usage. QoE-based metrics like Jitter, Delay, and Packet Loss were captured using Subjective Evaluation [79].

Unique Metrics: We observed a few quality metrics which are one of a kind. Flow Experience Analysis [29] is used in task-based quality assessment methods. This metric can be customized to a specific scale and updated based on the practitioner's strategy. User State Measure [27], Realism Index [4], Spatial Memory Test [30], Simulation Awareness [8], and Laban Movement Analysis [3] are quality metrics which are unique in how results are gathered and evaluated. In one case, researchers used the Presence Survey and NASA TLX Survey for capturing quality data for both audio and haptic feedback, which was uncommon in other works [22]. Visual fatigue scale (VFS), Change of pupil size (PS), and Accommodation response (ACR) are captured to evaluate Fatigue Rate due to prolonged immersion in a large VR scene [26]. A gender-based case study was conducted to understand VR scene tolerance [21] by introducing a metric called Cybersickness Susceptibility Questionnaire.

Novel Metrics: We observed a few novel metrics which are not found to be used in traditional software product development. Effect of Reachability [15], Pupillary Light Response [32], Miss Rate - Merge Rate - Fragmentation Rate [61], Temperature Distribution [41], and Vibrotactile Feedback [51] are a few of these metrics. We observe that these metrics are novel and can be adopted by upcoming VR products that are planning to focus on Quality-of-Experience and VR scene quality. New QoE metrics, namely content resolution (cod – coding), start delay (s), and stalling ratio (t), were used to assess the experience of a VR scene using 5G bandwidth [40]. Frames to reach a ROI (framesToROI) and Percentage of total fixations inside the ROI (percFixInside) are new metrics developed to study user behavior in a 360-degree video scene [45]. Pupillometry-based metrics such as pupil size and dilation time were captured to study reading comprehension in educational comics [68]. Customized metrics like Self-reported Presence, Self-reported Engagement, and Self-reported Accuracy [2] are captured in comparison studies. Grasp in VR is studied through the Grasp Aperture metric [6].

Empirical Metric Evaluation Methods: The domain-specific VR applications relied on empirical approaches to gather the metric data. Question Survey [23] [35], Subjective Survey [84] [44], Presence Survey [7], and Presence Questionnaire [4] [22] [82] have primarily been used as part of Task-Action based VR applications. Empirical methods based on Mean Opinion Scores [88], Comparative Analysis [60], and Case Study [57] [87] [58] [14] [9] [25] [41] [73] [67] [36] [27] [61] [21] [24] [32] are practiced in Gaming based VR applications. Empirical methods based on Experimental Setup [33] [69] [29] [1] [66] [28] [15] [54] [8] [86] [37] [80] [16] [30] [51] [5] [18] [78] [83] [3] [42] [20] [77] [64], Temporal Visual Comfort Model [38], Kinect skeletal model [43], and Kennedy-Lane Simulator Sickness Questionnaires [74] are used in the Healthcare domain. Single-Stimulus [75] and Immersive Experience Questionnaire [65] [81] [52] are followed in Assembly-simulation based VR applications. Susanne et al. studied the effectiveness of questionnaires in VR user studies as a quality aspect and found that questionnaires reduce Break in Presence (BIP) and theoretical bias [59].

Customized Quality Models: A few researchers proposed quality models for improving the streaming of audio and video data [40] [76]. These models are formulated based on network bandwidth. LTE and 5G bandwidths play a vital role in these quality models, where network key quality indicators and QoE quality indicators are compared across varied scales of network bandwidth to judge the streaming quality. A Cloud Quality Assessment Model using local binary patterns was proposed to improve the screen quality of VR HMDs for rendering effective scene quality [13].

4 THREATS TO VALIDITY

In this section, we cover the threats to validity of our systematic literature review.

Conclusion Validity - In our study, we considered research papers written in English only. It helped us construct the search string in an appropriate language. There is a possibility of research work involving Software Quality for VR in other languages. We overlooked such papers as it is challenging to comprehend the observations in all possible languages.

Internal Validity - We worked with Software Quality domain experts to monitor and assess the quality of string search, filtration of search content, review of results, and overall analysis. We received constant feedback from the domain experts, who were part of both industry and academia, to judge our search strategy and the progress of our study. Of course, there could be minor mistakes by authors regarding the judgment of a research paper during the filtration
