
Advanced Predictive Tile Selection Using Dynamic Tiling for Prioritized 360° Video VR Streaming

Published: 24 August 2023

Abstract

The widespread availability of smart computing and display devices such as mobile phones, gaming consoles, laptops, and tethered/untethered head-mounted displays has fueled an increase in demand for omnidirectional (360°) videos. 360° video applications enable users to change their viewing angles while interacting with the video during playback. This allows users to have a more personalized and interactive viewing experience. Unfortunately, these applications require substantial network and computational resources that conventional infrastructure and end devices cannot support. Recently proposed viewport adaptive fixed tiling solutions stream only relevant video tiles based on user interaction with the virtual reality (VR) space to use existing transmission resources more efficiently. However, achieving real-time accurate viewport extraction and transmission in response to both head movements and bandwidth dynamics can be challenging, which can impact the user's Quality of Experience (QoE). This article proposes innovative dynamic tiling-based adaptive 360° video streaming solutions to achieve high viewer QoE. First, novel and easy-to-scale tiling layout selection methods are introduced, and the best tiling layouts are employed in each adaptation interval based on the prediction-assisted visual quality metric and the observed viewport divergence. Second, a novel proactive tile selection approach is presented, which adaptively extracts tiles for each selected tiling layout based on two low-complexity viewport prediction mechanisms. Finally, a practical dynamic tile priority-oriented bitrate adaptation scheme is introduced, which uniformly distributes the bitrate budget among different tiles during 360° video streaming. Extensive trace-driven experiments are conducted to evaluate the proposed solutions using head motion traces from 48 VR users for five 360° videos with tiling layouts of 4 × 3, 6 × 4, and 8 × 6 and segment durations of 1s, 1.5s, and 2s. The experimental evaluations show that the dynamic video tiling solutions achieve up to 11.2% more viewport matches and an average improvement in QoE of 9.7% to 18% compared to state-of-the-art 360° streaming approaches.

1 Introduction

Recently, 360° virtual reality (VR) video has enhanced the traditional streaming format by providing a complete spherical field of view (FoV), allowing the viewer to feel fully immersed in the video. This is achieved by capturing video from all directions using multiple cameras and then stitching the footage together into a single, seamless sphere. Users can have an incredibly immersive experience, especially when using high-resolution head-mounted display (HMD) devices [53]. However, remote transmission and rendering of ultra-high-resolution panoramic content significantly exceeds the capacity of conventional infrastructure. The emerging 5G and beyond wireless network technologies are expected to bridge this performance gap by offering higher network flexibility, transmission capacity, and mobility support [3].
Currently, a standard way to mitigate the transmission cost of ever-increasing 360° video services is through viewport-based adaptive streaming frameworks (i.e., monolithic streaming [7, 64] and tile-based streaming [37, 54]). In monolithic streaming, multiple versions of pre-defined viewports are prepared on the server side; for each viewing direction, the entire spherical frame is delivered with higher quality inside the viewport and gradually lower quality outside it. In contrast, tile-based streaming lowers these requirements by spatially partitioning the video frames into independently encodable rectangular video parts known as tiles [21, 59]. The VR user can watch the FoV tiles at higher-quality levels [31, 65] compared to the other tiles, which are delivered at lower resolution [12, 38] or even discarded [49]. The user's head motion pattern is essential input for quality-efficient remote transmission, but in many cases it is available only with limited accuracy and lead time. Viewport prediction can help to reduce the time it takes for new tiles to be loaded as the viewer changes their viewing angle, improving the overall streaming experience. The client can then allocate more bits to the tiles most likely to be visited [28].
The spatial partitioning structure of tiles plays a vital role in balancing viewport availability and bandwidth utilization. Existing fixed tiling layout solutions [15, 16, 36] stream variable-quality views in order to reduce data transmission. However, this can still lead to poor visual boundaries and inefficient use of bandwidth. In contrast, a dynamic tiling-based streaming framework reduces redundant data and provides improved FoV availability for different viewing behaviors of users. However, it is challenging to support dynamic tiling-based streaming under complex viewing patterns. Similarly, identifying and selecting prioritized views is necessary but not simple. Using traditional bitrate adaptation heuristics [42, 52] for tile-based streaming in the presence of various uncertainties (such as connection speed, user movements, and segment sizes) is not practical due to the spatial and temporal separation of 360° content. Even if a learning-based [22, 45, 46] or control-based adaptation technique [60, 61] can correctly calculate the bitrate for the next segment in real time, it remains strenuous to match the quality scores under instantaneous short-term viewport updates.
This article introduces two novel Dynamic video Frames Tiling-based (DFT) 360° video streaming solutions involving a three-tier adaptation in terms of tiling layout adaptation, streaming tile selection, and bitrate adaptation. In an end-to-end remote 360° video transmission, the first solution, DFT1, decides an optimal tiling layout based on a newly proposed priority-assisted weighted visual quality metric. The second solution, referred to as DFT2, intelligently adapts the tiling version based on the head movement prediction accuracy for each video segment. The proposed DFT solutions perform prioritized tile selection by classifying streaming regions into the following cases: (1) Case 1: fixed viewport with no marginal tiles; (2) Case 2: fixed viewport with marginal tiles; and (3) Case 3: extended viewport with no marginal tiles. Finally, a DFT bitrate adaptation heuristic is designed in such a way as to support the dynamic tiling-based streaming framework by implementing prioritized bitrate budget distribution between different tile groups. This article has the following main contributions:
(1)
Adaptive Tiling Layout Switching Based on Visual Quality and Prediction Relevance: Two innovative solutions that dynamically determine tiling layouts, taking into account both visual quality prioritization (DFT1) and viewport prediction accuracy (DFT2), during each segment playback are introduced. In particular, DFT1 selects the highest-quality tiling layout to deliver an optimal viewing experience, effectively addressing the complexity and scalability issues faced by existing solutions. The second strategy, DFT2, tailors tiling layouts based on viewport prediction performance, thereby enhancing viewport availability across a variety of motion content.
(2)
Efficient Computation of Streaming Regions: A low-complexity yet precise solution for determining the optimal arrangement of streaming tiles is described, utilizing a combination of two viewport prediction mechanisms, where the viewport is defined by 110\(^{\circ }\) angles in both the horizontal and vertical directions. This approach employs advanced tile classification, i.e., dynamic viewport and marginal regions, in order to improve the displayed viewport's adaptability in response to unanticipated head movements.
(3)
Region-based Uniform Bitrate Adaptation: A dynamic tiling-based uniform bitrate adaptation algorithm that incorporates diverse adaptation policies, including aggressive, weighted, and conservative, is proposed. This novel algorithm proactively allocates the available bandwidth to specific spatial regions and optimizes viewer experience according to the desired adaptation strategy.
We present extensive experimental evaluations using real head motion traces of 48 VR users considering five 4K videos prepared in three tiling layouts (4 \(\times\) 3, 6 \(\times\) 4, 8 \(\times\) 6) and with three segment durations (1s, 1.5s, 2s). Experimental results show that DFT improves the streaming performance measured in terms of viewport overlap (8.6% to 11.2%) and QoE (9.70% to 18%) under dynamic bandwidth conditions in comparison to popular fixed tiling-based and dynamic tiling-based solutions.
This work presents significant new contributions compared to our previously proposed solutions, CFOV [55] and DVS [56]. Compared to [55] and [56], the proposed solutions introduce the following new elements. First, two novel options for tiling layout selection are proposed that can improve viewport availability and reduce the transmission of redundant pixels under variable head movement prediction accuracy. Second, the DFT tile selection mechanisms are comprehensively different from those proposed before. DFTs employ adaptive marginal and extension region selections, which are fine-grained and help with highly dynamic viewing patterns. DVS considered visual complexity and circular distance between viewpoints to classify viewport, marginal, and background tile sets, while CFOV considered fixed and extended FoV scenarios and adopted a wider marginal region based on prediction results. Third, DFT solutions introduce a novel bitrate adaptation algorithm designed to handle dynamic adaptation decisions for multiple tiling layouts, which is a significant new contribution in contrast with the previously introduced fixed tiling-based solutions. DVS specifically switches between uniform (per-region) and non-uniform (per-tile) quality allocation strategies, while DFT considers per-region uniform bitrate adaptation. Finally, a significantly expanded testing setup is used to comparatively evaluate the streaming behaviors of both fixed and dynamic tiling-based solutions.
Article Organization: Section 2 discusses the most recent literature on 360° tile-based streaming. Section 3 details the structure of the proposed 360° adaptation framework and problem formulation. The details of tiling layout selection, tile selection, and tile bitrate adaptation are introduced in Section 4. Section 5 presents the experimental settings, results, and performance analysis. Finally, Section 6 offers conclusive remarks.

2 Background and Related Works

This section presents the important technical background linked to our research and provides a comprehensive overview of the most recent streaming techniques, applications, and limitations.

2.1 Fixed Viewport-based Streaming

In this streaming approach, the size of the viewer window (the “viewport”) is fixed. The system delivers a higher-quality version of the video to the portion of the video that is within the viewport. This approach takes into account the viewer’s dynamic motion patterns, as the viewport is adjusted to follow their movements.
Hosseini and Swaminathan [16] proposed a priority-based bitrate adaptation (PBA) algorithm for 360° video streaming that takes into account the location of different tiles within the video (central, surrounding, and outside). The algorithm starts by assigning the lowest-quality version of the video to the entire segment and then gradually increases the quality of the central tile to the highest level, followed by the surrounding and outer tiles. However, the PBA algorithm was evaluated using a VR setup with a 2K resolution and videos encoded using H.264/AVC, which may not be optimal for enriched 360° videos. Similarly, Chen et al. [4] proposed a system for adapting the quality of 360° video based on the location of different tiles within the viewport, with higher priority given to tiles in the center and lower priority given to tiles in the corners. However, this system does not take into account viewer motion or use any prediction mechanism and was evaluated using fixed network connections. Nasrabadi et al. [28] employed a cube map projection-based scalable video coding scheme where each face of the cube was divided into two horizontal and two vertical tiles and encoding was performed using one base layer and two enhancement layers. The experimental evaluations using four streams of different spatiotemporal complexities demonstrate that, compared to non-scalable coding, layer-assisted tile coding results in fewer rebuffering events while offering improved quality. Van der Hooft et al. [15] proposed a Uniform ViewPort (UVP) quality solution that is designed for use with a fixed viewport. UVP divides the video into two regions: the viewport, which is the portion of the video that is currently being displayed to the viewer, and the non-viewport, which is the rest of the video. The tiles in both regions are assigned quality using a prediction approach that extrapolates the viewer's head motion to anticipate their upcoming viewing points. However, this method was only tested using three videos with a single segment duration. Wei et al. [45] proposed a hybrid adaptation solution that controls viewport prediction and adaptation decisions by leveraging a deep reinforcement learning (DRL) method: it first computes the segment bitrate and then the per-tile bitrate based on predicted fixed viewport maps, and uses them in a cooperative bargaining game theory approach. The proposed solution processes head movement and eye fixation information to adjust the prioritized quality decisions within the spatial and temporal domains.

2.2 Marginal Region-based Streaming

In this streaming approach, a spatial extension, known as the "marginal area," is defined around the viewport. The purpose of the marginal region is to provide a buffer around the viewport to account for possible errors in head movement prediction. Petrangeli et al. [36] proposed an adaptive virtual reality (AVR) streaming approach that divides the tiles of the 360° video into viewport, adjacent, and outside groups. The authors collected viewport traces using the Gear VR framework while 10 users watched a single 360° video. However, the evaluation was limited to a single 60-second-long 360° video clip. Ben Yahia et al. [2] divided the equirectangular frame into viewport, marginal, immediate background, and far background regions. The proposed model involves two viewport prediction intervals, i.e., before and during the delivery of the same segment. The client assigns variable weights to different priority regions and can update the resource allocation based on updated prediction results. Zou et al. [65] introduced a convolutional neural network (CNN)-based prediction mechanism and then distributed the communication resources for the quality selection of predicted tiles. The proposed solution maps the spherical representation to the planar projection to calculate the viewing probability of each tile. The tiles are then divided into viewport, marginal, and background tile groups. The marginal tiles surround the viewport in all directions, similar to [36]. However, CNN-based viewport prediction models are computationally expensive and are difficult to extend for different videos. Yuan et al. [57] proposed a simple yet effective buffer-based quality-aware bitrate adaptation algorithm to allocate different quality levels to the viewport, marginal, and outside tiles. The experimental evaluations using three 4K test sequences prepared in a 6 \(\times\) 4 tiling layout under staged bandwidth variations show that the proposed solution favors high visible quality levels with considerable navigation smoothness. However, only brief simulations (about 10s) were performed for each video. Yadav and Ooi [50] modeled the per-tile bitrate allocation problem as a multiclass knapsack problem based on a dynamic profit function of the current FoV, buffer level, and per-tile representation level. The proposed tile-rate allocation solution, based on the previously proposed non-tiled ABR algorithm [51], achieves good results in terms of reducing playback interruptions and quality switches while improving the overall quality and bandwidth savings. However, this approach may lead to higher spatial quality variance within the viewport, and the use of a separate buffer for each tile can cause the playback of the entire video to stall if one of the tiles is not downloaded in time.

2.3 Extended Viewport-based Streaming

Extended viewport-based streaming is a technique of delivering 360° video in which the viewport is virtually extended by a certain percentage, typically 10% to 30%, in order to provide a buffer around the viewport to account for viewer movements. Van der Hooft et al. [15] proposed a quality adaptation approach by considering the extended viewport (full-frame) region. This approach, called Center Tile First (CTF), focuses on improving the quality of the center or viewpoint tile and then gradually increases the quality of the remaining tiles. CTF was evaluated considering the weighted viewport quality metric, which assigns higher weights to the center tile quality and gradually lowers the weights toward the end tiles. It was shown to outperform the uniform viewport quality allocation solution, UVP, for the weighted viewport quality metric. However, when tested using average viewport quality, UVP performs better than CTF.
He et al. [14] proposed a joint adaptation solution that adjusts both the size of the viewport and the bitrate of the video based on network conditions. The algorithm measures the round-trip time (RTT) of the network connection and uses this information to determine the viewport size and the necessary bitrate for smooth streaming. Simulation results using the Network Simulator (NS)-3 tool showed that this adaptable viewport coverage approach can improve the quality of the streaming experience. However, the details of this work, such as the viewport prediction mechanism, the dataset and tiling layout used, and the content resolution, are not provided. Similarly, Hu et al. [17] proposed a system called MELiveOV for live streaming high-resolution 360° video using 5G-enabled edge servers to distribute processing tasks. This edge-based live streaming system adjusts the size of the viewport based on network conditions, with a smaller viewport (90°) requested at higher bitrates under poor network conditions and a larger viewport (120°) selected for streaming under ideal conditions. However, the performance of this work was only compared to a viewport-independent streaming approach. Guo et al. [13] proposed a solution for 360° video streaming that takes into account random motion patterns and variable network conditions for each viewer and tries to use multicast opportunities to reduce redundant data transmissions. The proposed solution computes the actual viewport tiles for the current user and adds more tiles to the viewing region based on the common interest of other users. The authors considered 100° viewport coverage and an extra 15° in both horizontal and vertical directions. Similarly, Long et al. [24] optimized the overall utility of multiple users in a wireless network environment with a single server. The proposed solution takes into account factors such as transmission time, video quality smoothness, and power constraints in order to maximize the aggregated utility of the users.

2.4 Dynamic Tiling-based Streaming

In dynamic tiling-based adaptive streaming, multiple tiling layouts are prepared on the server side in order to optimize the delivery of a 360° video to a viewer. The tiling layout that is used for a particular viewer may be changed dynamically in order to adapt to their viewing and network conditions. Khiem et al. [39] investigated the impact of tiling layouts on interactive zoomable video streaming by employing the dynamic cropping of regions of interest (RoI). The authors compared the performance of regular monolithic streaming and tile-based streaming using two HD videos and found that larger tiles can improve compression efficiency, but at the cost of transmitting redundant pixels. In this work, we attempt to reduce the transmitted bits and provide improved viewport availability, but with an unmodified decoder. In the follow-up work [30], the authors employed user access patterns to encode the different streaming regions with different encoding parameters. Our DFT solutions also assign variable uniform bitrates to different streaming regions, but with more profound viewing region selection and dynamic bandwidth distribution. Nguyen et al. [32] proposed an adaptive tiling selection (ATS) solution for 360° video streaming. The authors evaluated four different tiling layouts (4 \(\times\) 3, 6 \(\times\) 4, 8 \(\times\) 4, and 8 \(\times\) 8) and divided the selected tiles into viewport and non-viewport groups for each layout. During each adaptation interval, the tile sets that resulted in the minimum viewport distortion or the maximum viewport bitrate were chosen for streaming. However, this approach did not incorporate any viewport prediction mechanism and was tested using fixed network connections. Xiao et al. [48] proposed an optimal tiling solution by partitioning a 360° segment into variable-size sub-rectangles to minimize the storage cost on the server side. The proposed solution estimates the storage and transmission cost by extracting the motion vectors and sizes of all basic sub-rectangles. An integer linear program (ILP) is then used to output the optimal tiling version that covers possible views of the segment. The proposed solution achieves interesting results, but at the cost of increased computational complexity. We attempt to achieve a similar goal of balancing storage size and data transmission, but with reduced server-side storage overhead and by utilizing standard computing and streaming components. This makes the proposed solutions attractive for viewers who want to enjoy an immersive and interactive VR experience without investing in additional hardware.
Kattadige and Thilakarathna [20] proposed a method for selecting the tiling layout of each segment of a video based on the visual attention of the user. The approach involves analyzing the frames of the video, creating visual attention maps for the user, and dividing the frames into three regions based on the user's attention. The proposed solution was compared to three fixed tiling layouts (4 \(\times\) 6, 6 \(\times\) 6, and 10 \(\times\) 20) and was found to be more efficient in terms of pixel and bandwidth usage. Ozcinar et al. [34] employed visual attention maps to improve the network capacity planning for different tile groups. Variable-sized non-overlapping tiles are adaptively selected for each segment. However, real-time visual attention map computation and transmission require extensive resources, which weighs against this proposed solution. In a follow-up work [35], the authors extended their visual-attention-aware variable-size non-overlapping tile mapping to benefit from the dynamic tiling structure. Each 360° video frame was split into two fixed-size polar tiles (one-fourth of the frame from the top and one-fourth from the bottom). The remaining equator region was horizontally divided into 1 and 2 tiles, and then each part was divided into 1, 2, 4, 8, and 16 vertical tiles. Numerous dynamic tiling combinations can be considered using this division. The authors employed seven different spatial and temporal motion content types, but all with a duration of 10s. However, this type of tiling structure is not feasible in real-time streaming scenarios, as the two fixed-size polar tiles (half of the frame) need to be transmitted in full quality if any part of the viewport is predicted to be in that region.
Table 1 summarizes the most significant tile-based adaptive 360° video streaming techniques. These algorithms use user-specific viewing preferences to improve the user's QoE by establishing a stable background. Most of the fixed viewport-based solutions [4, 15, 16] define variable quality levels within the viewport, which can lead to severe spatial quality oscillations even for perfect prediction results. Several solutions [2, 12, 36, 65] simply employ a fixed marginal area around the viewport in all directions. This can compensate for the highly dynamic viewing nature of the user; however, significant bandwidth waste can be observed under medium to high prediction accuracy. Similarly, always extending the viewport region by 15° [13] or 10° [24] can lead to unnecessary transmission under perfect predictions. Different from previous works, in our approach the viewport and marginal region are treated as special cases in the quest to overcome viewing uncertainty. Dynamic tiling solutions [20, 30, 34, 35] are theoretically effective in terms of increasing the picture quality and users' QoE. However, some of these solutions require real-time visual mapping, which makes them difficult to implement in traditional on-demand scenarios. Mixing tiles of different resolutions [44] to provide a non-redundant viewport transmission [20, 34, 35] can result in users sensing quality variations and degradation for high and relatively static motion content. These solutions are difficult to extend to different content types and are associated with additional coding and reconstruction overheads.
Table 1.

| Streaming Technique | Works | Design | Dataset | Tile Layouts | Resolution | Segment Duration | Experimental Duration |
|---|---|---|---|---|---|---|---|
| Fixed Viewport | [16] | Non-uniform VP | 5 Videos, 1 User | 6 tiles | 720p-4K | - | Video duration |
| | [4] | Non-uniform VP | 5 Videos [23] | 3×3, 4×4, 5×5 | 2K | 1s | 20s |
| | [15] | Uniform and Non-uniform VP | 3 Videos, 48 Users [47] | 1×1, 2×2, 4×2, 4×4, 8×4, 8×6, 8×8, 16×12/16 | 4K | 1.067s | Video duration |
| | [49] | Probability Based | 1 Video, 5 Users | 6×12 | 2K | 1s | 3m |
| | [28] | Layer Assisted | 4 Videos, 5 Users | 6 and 24 tiles | 4K | 32 frames | Video duration |
| Marginal Region | [36] | Fixed Margin | 1 Video, 10 Users | 6 tiles | 8K | 1s, 2s, 4s | 60s |
| | [2] | Fixed Margin | 3 Videos, 3 Users [6] | 6×4 | 4K | 1s | 1m |
| | [65] | Fixed Margin | 3 Videos, 10 Users [1] | 8×8 | 4K | 1s | Video duration |
| | [57] | Dynamic Margin | 3 Videos, 1 Trace | 4×6 | 4K | 2s | 10s |
| Extended Viewport | [14] | Dynamic Extension | - | - | - | - | - |
| | [17] | Dynamic Extension | 4 Videos, 1 User | 4×6 | 4K | Live | Video duration |
| | [13] | Fixed Extension (15°) | 1 Video | 36×2 | - | 0.1s | Video duration |
| | [24] | Fixed Extension (10°) | 1 Video | 18×36 | - | - | Video duration |
| Dynamic Tiling | [32] | Visual Distortion | 1 Video, 10 Users | 4×3, 6×4, 8×4, 8×8 | 4K | 1s | 60s |
| | [35] | Visual Attention | 7 Videos, 25 Users | Multiple | 8K | - | 10s |
| | [20] | Region Based | 30 Videos, 30 Users | Multiple | HD-4K | - | 60s |
| | [48] | Variable Rectangles | 5 Videos, 58 Users | Multiple | 2K & 4K | - | - |

Table 1. Summary of Tile-based Viewport Adaptive 360° Video Streaming Solutions

3 Proposed Dynamic Tiling-based Architecture

3.1 Dynamic Tiling-based System Architecture

Figure 1 illustrates the workflow of DFT solutions. On the server side, the 360° video is pre-processed by dividing it into a number of segments, i.e., \(\mathcal {S}=\lbrace \mathcal {S}(1), \mathcal {S}(2), \ldots , \mathcal {S}(i), \ldots , \mathcal {S}(I)\rbrace\). Each segment is then divided into \(l\) tiling layouts, i.e., \(\mathcal {T}_l(i), \forall l \in \lbrace x, y, z\rbrace\), containing a small, medium, and large number of tiles, respectively. Each tiling layout is further divided into a number of tiles, i.e., \(\mathcal {T}_l=\lbrace \mathcal {T}_l^{1}(i), \mathcal {T}_l^{2}(i), \ldots , \mathcal {T}_l^{k}(i), \ldots , \mathcal {T}_l^{K}(i)\rbrace\). These tiles are then encoded at a number of different bitrates, i.e., \(\mathcal {L}_l=\lbrace \mathcal {L}^k_{l,1}(i), \mathcal {L}^k_{l,2}(i), \ldots , \mathcal {L}^k_{l,j}(i), \ldots , \mathcal {L}^k_{l,J}(i)\rbrace\), where \(\mathcal {L}^{k}_{l,j}(i)\) represents the \(j\)th bitrate of the \(k\)th tile in the \(l\)th tiling layout of the \(i\)th segment.
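For illustration, the following minimal Python sketch (our own, not part of the original system) shows one way to organize this server-side index of segments, tiling layouts, tiles, and per-tile bitrate ladders. The layout shapes match the evaluation setup, while the segment count and bitrate values are assumed placeholders.

```python
LAYOUTS = {"x": (4, 3), "y": (6, 4), "z": (8, 6)}    # (columns, rows), as in the evaluation
NUM_SEGMENTS = 100                                   # I, an assumed placeholder
BITRATE_LADDER = [0.6, 1.2, 2.4, 4.8, 7.0]           # J per-tile bitrates in Mbps, illustrative

def build_media_index():
    """index[i][l][k] -> list of bitrates L_{l,j}^k(i) available for tile k."""
    index = {}
    for i in range(NUM_SEGMENTS):
        index[i] = {}
        for l, (cols, rows) in LAYOUTS.items():
            num_tiles = cols * rows                  # K tiles in layout l
            index[i][l] = {k: list(BITRATE_LADDER) for k in range(num_tiles)}
    return index

media = build_media_index()
print(len(media[0]["z"]))  # 48 tiles in the 8 x 6 layout
```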
Fig. 1.
Fig. 1. The proposed 360° client-server streaming architecture.
The DFT clients, which control the adaptive streaming operations, need to know in advance about the available tiling layouts on the server side. DFT2 performs tiling layout selection before determining the streaming tiles and bitrate allocations during each adaptation interval. The tiling layout selection module in DFT2 checks the overlap between the actual and predicted viewport areas during the previous segment. The streaming tile selection module selects sets of tiles for different priority regions (i.e., viewport (\(\mathcal {T}_{l}^{v}(i)\)), marginal (\(\mathcal {T}_{l}^{m}(i)\)), and background (\(\mathcal {T}_{l}^{b}(i)\))) based on the predicted viewport coordinates for each segment. This helps to ensure that the video is able to adapt to the viewer’s movements and maintain a high level of quality by pre-downloading tiles that are most likely to be watched. The tile bitrate adaptation unit then selects appropriate bitrates for each tile based on the associated region and the available network capacity. DFT1, on the other hand, first calculates the streaming regions and relevant bitrates for each tiling layout. It then selects the tiling layout that results in the highest-weighted-area-based visual quality score in each adaptation interval. The segment request is then sent, and upon receiving the segments, the client decodes and reconstructs the requested views similar to fixed tiling-based views in the post-processing phase with no additional decoding overhead. The requested content is then presented to the user.

3.2 Problem Definition

In 360° adaptive video streaming, it is important to consider the user’s quality expectations, which depend largely on the quality of the visible area. Even if the viewport tiles are played at higher-quality levels, the intra- and inter-segment quality oscillations may not satisfy the user. The QoE metric used in this context includes viewport quality and spatial and temporal smoothness factors, as well as the risk of playback buffer issues.
Viewport Quality: The user is able to visualize only certain tiles during 360° video playback. The viewport quality reflects how much a user is satisfied with the visual perception. The client can be presented with any visual quality representation, but the average quality levels of the viewport tiles are highly correlated with the average bitrate that is actually consumed by the viewer. Therefore, by averaging the quality of the actual viewport tiles in segment \((i)\), for the \(l\)th tiling layout, the viewport quality is given as follows [37, 63]:
\begin{equation} f_{1}(i)=\frac{\sum _{k \in \mathcal {T}_{l}^{\hat{v}}(i)}\sum _{j \in \mathcal {L}_l}\mathcal {Q}(\mathcal {L}^{k}_{l,j}(i))}{|\mathcal {T}_{l}^{\hat{v}}(i)|}, \end{equation}
(1)
where \(\mathcal {T}_{l}^{\hat{v}}(i)\) represents the actual viewport tiles set in the \((i)\)th segment and \(|\mathcal {T}_{l}^{\hat{v}}(i)|\) indicates the cardinality of the set. \(\mathcal {Q}(\mathcal {L}^{k}_{l,j}(i))\) maps the \(j\)th bitrate of the \(k\)th tile to the particular video quality level.
Temporal Quality Oscillations: The inter-segment quality switches can reduce the "sense of being there" in an immersive environment. This may happen not only because of network fluctuations but also due to head movement prediction errors. The user's experience can be impaired by physiological symptoms such as dizziness and headache when observing frequent visual disparity [41]. Therefore, the inter-segment quality fluctuations should not be drastic; they can be calculated as the difference between the observed viewport quality levels of two consecutive segments [37, 63]:
\begin{equation} f_{2}(i)=|f_{1}(i)-f_{1}(i-1)|. \end{equation}
(2)
Spatial Quality Oscillations: Visible tiles with different quality levels lead to a disturbed perception. Cybersickness, viewing irritation, nausea, fatigue, and aversion [11] can be driven by inconsistent quality levels within the viewport. Compared to regular 2D videos, if the perceived quality of 360° tiles is not smooth, the overall QoE is reduced. Following [19], we measured the spatial quality oscillations according to the coefficient of variation (CV) of the viewport tiles' quality:
\begin{equation} f_{3}(i)=\frac{\sigma (\mathcal {Q}(\mathcal {L}^{k}_{l,j}(i)))}{\mu (\mathcal {Q}(\mathcal {L}^{k}_{l,j}(i)))},\quad \forall k \in \mathcal {T}_{l}^{\hat{v}}(i),\ \forall j \in \mathcal {L}_l. \end{equation}
(3)
The standard deviation of the viewport quality samples is in the numerator, and the mean of the samples is in the denominator.
Playback Buffer Risk: A large buffer capacity may not be efficient for 360° video streaming because of the constantly changing FoV during playback [9, 33]. Pre-buffering high-quality tiles can be risky, as the user's FoV may shift at the time of playback. Instead of relying on the traditional playback discontinuity metric under short-term viewport prediction, it is more beneficial to directly assess risky buffer events based on the available connection bandwidth and the selected video bitrates. This can be expressed as follows [45]:
\begin{equation} f_{4}(i) = \begin{cases} 1, & \text{if } \widehat{B}(i) < \sum _{k\in \mathcal {T}_{l}(i)}\mathcal {L}^k_{l,j}(i) \\ 0, & \text{otherwise} \end{cases}, \end{equation}
(4)
where \(\widehat{B}(i)\) represents the available bandwidth budget for the \((i)\)th segment.
Following the principle behind the QoE metric for traditional video [26], some works [37, 62, 63] consider video quality, quality variations, rebuffering events, and so forth to model a QoE metric for 360° videos. The user-perceived QoE for each 360° segment is defined by a weighted summation formulation:
\begin{equation} QoE(i)=\alpha \times f_{1}(i) - \beta \times f_{2}(i) - \gamma \times f_{3}(i) - \delta \times f_{4}(i), \end{equation}
(5)
where \(\alpha\), \(\beta\), \(\gamma\), and \(\delta\) are the parameters indicating how much importance a user gives to video bitrate, temporal and spatial quality variances, and rebuffering risk, respectively. As users do not want to experience quality fluctuations and rebuffering events, the terms \(f_{2}(i)\), \(f_{3}(i)\), and \(f_{4}(i)\) enter the QoE with negative signs.
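To make Equations (1) through (5) concrete, here is a small Python sketch (our illustration; the weight values are placeholders, not the paper's settings) that evaluates the per-segment QoE from the viewport tile qualities, the previous segment's viewport quality, and the bandwidth check of Equation (4).

```python
import statistics

def viewport_quality(qualities):                     # Equation (1)
    """Mean quality level over the actually viewed tiles."""
    return sum(qualities) / len(qualities)

def temporal_oscillation(f1_now, f1_prev):           # Equation (2)
    return abs(f1_now - f1_prev)

def spatial_oscillation(qualities):                  # Equation (3): coefficient of variation
    mu = statistics.mean(qualities)
    return statistics.pstdev(qualities) / mu if mu else 0.0

def buffer_risk(bandwidth, requested_bitrates):      # Equation (4)
    return 1 if bandwidth < sum(requested_bitrates) else 0

def qoe(qualities, f1_prev, bandwidth, requested,
        alpha=1.0, beta=1.0, gamma=1.0, delta=1.0):  # Equation (5); placeholder weights
    f1 = viewport_quality(qualities)
    return (alpha * f1
            - beta * temporal_oscillation(f1, f1_prev)
            - gamma * spatial_oscillation(qualities)
            - delta * buffer_risk(bandwidth, requested))

# Viewport tiles at quality levels 4, 4, 3, 4; previous viewport quality 3.5;
# 8 Mbps available against 6.5 Mbps requested for the whole segment.
print(qoe([4, 4, 3, 4], 3.5, 8.0, [1.5, 1.5, 1.0, 1.0, 0.75, 0.75]))
```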
Accurate evaluation of QoE is essential for optimizing the performance of traditional, multimedia [58], and immersive video content. The level of satisfaction a user experiences while watching a VR video is determined by how long they feel immersed in the scene. The proposed clients aim to select optimal bitrates for each segment in a dynamic tiling streaming system in order to maximize the user’s long-term QoE reward. The mathematical problem formulation is as follows:
Problem:
\begin{equation} \max \sum _{i\in \mathcal {S}} QoE(i) \end{equation}
(6)
The proposed solutions solve this problem by implementing a three-tier adaptation mechanism. First, they select a relevant tiling layout for each segment. Next, DFT solutions dynamically perform the viewing area selection based on the two viewport prediction mechanisms to predict the most likely to be watched tiles. Finally, the tile bitrate adaptation mechanism improves the bitrate budget distribution between different tile groups. These mechanisms are elaborated on in the next section.
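The control flow of this three-tier mechanism can be summarized with the following hypothetical Python skeleton; `ToyDFTClient` and its trivial method bodies are our stand-ins for Algorithms 1 through 3 described in Section 4, not the authors' pseudocode.

```python
class ToyDFTClient:
    """Hypothetical stand-in: each method abbreviates one adaptation tier."""

    def select_layout(self, i):
        # Tier 1 (Section 4.1): pick a tiling layout per segment.
        return "z" if i == 0 else "y"

    def select_tiles(self, i, layout):
        # Tier 2 (Section 4.2): classify tiles into priority regions.
        return {0, 1}, {2}, {3, 4, 5}

    def allocate_bitrates(self, i, layout, viewport, marginal, background):
        # Tier 3 (Section 4.3): distribute the bitrate budget per region.
        return {k: 2.0 if k in viewport else 1.0 if k in marginal else 0.5
                for k in viewport | marginal | background}

client = ToyDFTClient()
for i in range(3):                                   # stream three segments
    layout = client.select_layout(i)
    vp, mg, bg = client.select_tiles(i, layout)
    print(i, layout, client.allocate_bitrates(i, layout, vp, mg, bg))
```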

4 Proposed Dynamic Tiling-based Adaptation Algorithms

This section presents the adaptation algorithms for DFT1 and DFT2 streaming clients.

4.1 DFT Tiling Layout Selection Algorithms

Tile-based encoding brings several opportunities, such as efficient video coding [40], improved quality distribution, and parallel [25] and partial decoding [5], for VR video applications. The choice of the appropriate tiling layout, which reflects the spatial partitioning of frame areas, impacts the overall video compression performance. In 360° video, the polar regions have higher viewing distortions and a lower viewing probability than the equator regions when transforming a spherical representation into a two-dimensional planar format, i.e., the equirectangular projection. Therefore, encoding polar areas with more pixels consumes the user's limited bandwidth to transmit data related to less relevant image regions. Fixed tiling solutions encode polar and equator regions at similar bitrate levels, leading to unattractive viewport boundaries and missing compression opportunities. Employing a smaller number of tiles (i.e., larger-resolution tiles) can improve the compression performance in some cases. Yet, at the same time, it may include unnecessary higher-quality portions outside the viewport [35]. Conversely, smaller-resolution tiles can reduce the number of redundant pixels [45]; however, they may also cause visual distortions such as flickering, floating, and blurring at the edges of the tiles [8]. Finding ways to dynamically select the most appropriate tiling layout for a given viewing scenario and preferences is an important area of research. By developing smart techniques that can take these factors into account and adjust the tiling layout accordingly, it may be possible to improve the overall viewing experience. Therefore, the proposed solution considers two tiling layout selection methods to lower redundant data transmission and facilitate fine-grained visual perception for different motion content.
DFT1: The proposed DFT1 solution decides an optimal tiling layout during each adaptation interval based on the observed visual quality scores. Since the user's gaze point is mostly located around the center of the viewport [27, 43], the viewpoint quality should have a higher priority compared to other tiles. Therefore, we design a priority-assisted visual quality metric to adaptively select the suitable tiling layout during 360° video streaming. In this context, DFT1 assigns different priority weights to the viewport tiles such that the tiles closer to the viewpoint have a higher priority than the other tiles. The tiles are ranked based on how far they are located from the viewpoint. The priority weights are assigned such that the most important parts of the image, as determined by their proximity to the center of the viewer's focus, are rendered with the highest quality, while less important parts of the image are rendered with lower quality. In this context, the highest and lowest weights are allocated to the mapped quality of the viewpoint tile and the last tile, respectively, in the sorted tile set. The weighted quality metric is given in Equation (7):
\begin{equation} \mathcal {WQ}_{l}^{v}(i) = \frac{\sum _{k=1}^{|\mathcal {T}_{l}^{v}(i)|}\sum _{j=1}^{J} 2^{|\mathcal {T}_{l}^{v}(i)|-k}\times \mathcal {Q}(\mathcal {L}^{k}_{l,j}(i))}{2^{|\mathcal {T}_{l}^{v}(i)|}-1}, \end{equation}
(7)
where the quantity \(|\mathcal {T}_{l}^{{v}}(i)|\) represents the number of tiles in the set of tiles predicted to be within the viewport, and \(\mathcal {Q}(\mathcal {L}^{k}_{l,j}(i))\) maps the video bitrate to a specific quality level. Since we consider the extended viewport case, elaborated in Section 4.2, the visual area can differ between tiling layouts; for instance, an extended viewport with \(\mathcal {T}_{x}^{{v}}(i)\) could cover a larger region than an extended viewport with \(\mathcal {T}_{z}^{{v}}(i)\). Therefore, we define the visual-area-based weighted video quality metric, which balances the visual area and the weighted quality and is given in Equation (8):
\begin{equation} \mathcal {VQ}_{l}^{v}(i) = \frac{|\mathcal {T}_{l}^{v}(i)|}{|\mathcal {T}_{l}(i)|}\times \mathcal {WQ}_{l}^{v}(i), \end{equation}
(8)
where \(|\mathcal {T}_{l}(i)|\) represents the total number of tiles in the tiling layout \(l\). The tiling layout selection procedure for DFT1 is given as follows:
(1)
For each tiling layout:
Perform streaming tile selection and identify the streaming case using Algorithm 2.
Perform bitrate adaptation for the tile groups of the selected case using Algorithm 3.
Compute the prioritized visual-area-based quality scores using Equations (7) and (8).
(2)
Stream the tiles from the tiling layout that results in the highest visual quality score (a sketch of this procedure follows below).
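As a sketch of the procedure above, the following Python fragment computes Equations (7) and (8) for each candidate layout and picks the best one; the per-layout quality levels and tile counts in the example are invented purely for illustration.

```python
def weighted_quality(sorted_qualities):
    """Equation (7): exponentially decaying weights over the viewport tiles,
    sorted by distance from the viewpoint (closest tile first)."""
    n = len(sorted_qualities)
    num = sum(2 ** (n - k) * q for k, q in enumerate(sorted_qualities, start=1))
    return num / (2 ** n - 1)

def visual_area_quality(sorted_qualities, total_tiles):
    """Equation (8): weight the score by the fraction of the frame covered."""
    return len(sorted_qualities) / total_tiles * weighted_quality(sorted_qualities)

# Invented per-layout viewport qualities and total tile counts (x: 4x3, y: 6x4, z: 8x6).
candidates = {"x": ([4, 4, 3], 12), "y": ([4, 3, 3, 3, 2], 24), "z": ([3, 3, 3, 2, 2, 2, 1], 48)}
best = max(candidates, key=lambda l: visual_area_quality(*candidates[l]))
print(best)  # the layout with the highest visual-area-based weighted quality
```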
DFT2: DFT2 decides an optimal tiling layout based on the viewport prediction performance. Unlike DFT1, which is based on visual area, DFT2 measures the closeness between actual and predicted viewport tile sets in terms of viewport overlap to select the appropriate tiling layout for the next segment. Let \(\mathcal {O}(i-1)\) denote the overlap percentage of the actual and predicted viewport tiles for the \((i-1)\)th segment; it is given as [29]
\begin{equation} \mathcal {O}(i-1) = \frac{|\mathcal {T}_{l}^{\hat{v}}(i-1) \cap \mathcal {T}_{l}^{v}(i-1)|}{|\mathcal {T}_{l}^{\hat{v}}(i-1)|}\times 100. \end{equation}
(9)
Algorithm 1 details the tiling layout selection procedure in DFT2. As no information is available at the start, the tiling layout with a larger number of tiles (\(\mathcal {T}_z(i)\)) is selected for the first segment (lines 1–2). If there is no overlap between actual and predicted viewing tiles, then the tiling layout with a smaller number of tiles (\(\mathcal {T}_x(i)\)) is selected for the \((i)\)th segment to deal with fast head rotations (lines 3–4). If the actual and predicted viewports perfectly overlapped during the previous segment, the smallest-resolution tiles are selected to reduce the number of imperceptible pixels delivered outside the viewport region (lines 5–6). If the actual and predicted viewports partially overlapped during the playback of the previous segment, medium-resolution tiles, represented as \(\mathcal {T}_y(i)\), are streamed for the next segment (lines 7–8). DFT solutions do not involve complex frame partitioning and ensure a flexible uniform tiling structure without any modifications of existing video coding and stream processing tools, which makes them attractive for adoption in on-demand and live streaming scenarios. DFT1 is a scalable solution that can work with any number of tiling layouts. It is also practical for both simulation and real-time environments.
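A minimal Python rendering of Equation (9) and the Algorithm 1 logic just described might look as follows; tile sets are represented as plain Python sets, and the branches follow the no/partial/perfect overlap cases above.

```python
def viewport_overlap(actual, predicted):
    """Equation (9): percentage of actually viewed tiles that were predicted."""
    return 100.0 * len(actual & predicted) / len(actual)

def select_layout_dft2(i, actual_prev=None, predicted_prev=None):
    """Sketch of Algorithm 1: map last segment's overlap to the next layout."""
    if i == 0:
        return "z"                   # no feedback yet: largest number of tiles (8x6)
    overlap = viewport_overlap(actual_prev, predicted_prev)
    if overlap == 0:
        return "x"                   # prediction missed entirely: fewest tiles (4x3)
    if overlap == 100:
        return "z"                   # perfect prediction: smallest-resolution tiles
    return "y"                       # partial overlap: medium layout (6x4)

print(select_layout_dft2(5, actual_prev={1, 2, 5}, predicted_prev={2, 5, 6}))  # 'y'
```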

4.2 DFT Streaming Tile Selection Algorithm

The ability to choose the best-fit tiles in response to the user’s unpredictable head movements is one of the fundamental criteria for 360° video applications. The prediction accuracy of current streaming solutions based on a single viewport prediction technique can decrease when predicting longer in the future. To adaptively encompass the real viewing region, this work employs two viewpoint/viewport prediction techniques. It’s interesting to note that, in the majority of cases, the naive prediction model (using the current coordinates as predicted points) outperforms more sophisticated models [10]. The primary viewport tile set (\(\mathcal {T}_{l}^{vn}(i)\)) contains the viewport tiles actually watched by the user during the previous segment. The secondary viewport tile set (\(\mathcal {T}_{l}^{vs}(i)\)) is computed using a spherical walk approach described in [15].
Algorithm 2 aims to find appropriate tiles for the viewport, marginal, and background regions, respectively. The tile identification and selection are dynamically performed for each adaptation interval. Algorithm 2 takes as input the tile set \(\mathcal {T}_{l}(i)\) with tiling layout \(l\) for the \((i)\)th segment, the primary predicted viewport tile set \(\mathcal {T}_{l}^{vn}(i)\), and the secondary predicted viewport tile set \(\mathcal {T}_{l}^{vs}(i)\). It outputs the estimated viewport tile set \(\mathcal {T}_{l}^v(i)\), the estimated marginal tile set \(\mathcal {T}_{l}^{m}(i)\), and the estimated background tile set \(\mathcal {T}_{l}^b(i)\). The algorithm first determines the viewport tile set based on the intersection between the primary and secondary predicted viewport tile sets. If the primary and secondary predicted viewport tile sets are disjoint, then the viewport tile set is the union of the primary and secondary predicted viewport tile sets. Otherwise, the primary predicted viewport tile set is assigned to the viewport tile set. Next, the algorithm determines the marginal tile set: if the intersection of the primary and secondary viewport sets is empty, then the marginal tile set is empty; otherwise, the marginal tile set is the difference between the secondary predicted viewport tile set and the primary predicted viewport tile set. Finally, the algorithm determines the background tile set by checking each tile against the viewport and marginal tile sets: all the tiles that do not belong to the viewport or marginal tile sets are added to the background tile set. Figures 2 and 3 illustrate the tile selection cases in DFT2 based on the output of Algorithm 1 for two consecutive segments. The black rectangle represents the primary predicted viewport, while the blue rectangle represents the secondary predicted viewport. The potential viewport tiles are represented by a purple window, whereas the marginal and background tiles are marked in light green and brown, respectively.
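Since Algorithm 2 reduces to a few set operations on the two predicted viewport tile sets, it can be sketched in Python as follows; this is a hedged illustration consistent with the description above, not the authors' exact pseudocode.

```python
def classify_tiles(all_tiles, primary_vp, secondary_vp):
    """Sketch of Algorithm 2. primary_vp: tiles watched during the previous
    segment (naive prediction); secondary_vp: spherical-walk prediction [15]."""
    if primary_vp.isdisjoint(secondary_vp):
        viewport = primary_vp | secondary_vp     # Case 3: extended viewport, no margin
        marginal = set()
    else:
        viewport = primary_vp                    # Case 1 or 2: fixed viewport
        marginal = secondary_vp - primary_vp     # empty in Case 1, non-empty in Case 2
    background = all_tiles - viewport - marginal
    return viewport, marginal, background

tiles = set(range(24))                           # e.g., the 6 x 4 layout
print(classify_tiles(tiles, {7, 8, 13, 14}, {8, 9, 14, 15}))
```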
Fig. 2.
Fig. 2. Tile selection cases in DFT2 for \(\mathcal {T}_z(i)\) tiling layout of \((i)\)th segment.
Fig. 3.
Fig. 3. Tile selection cases in DFT2 for \(\mathcal {T}_x(i+1)\) tiling layout of \((i+1)\)th segment.

4.3 DFT Tile Bitrate Adaptation Algorithm

Adaptive streaming players usually maintain a large buffer space for regular 2D videos to absorb the uneven motions in video scenes and playback interruptions. However, for 360° videos, a large buffer capacity is not encouraged due to FoV dynamics. In practice, for 360° tiled video streaming, the buffer should be as small as possible (usually two segments [15]) to accommodate the new chunks in response to the user movements within the immersive video. Algorithm 3 takes into account both the predicted tiles and network conditions to more accurately adjust the video quality for a smoother viewing experience. This algorithm is specifically designed for dynamic tiling-based 360° video streaming. Both DFT1 and DFT2 clients employ the same bitrate adaptation algorithm to decide the suitable bitrates for tiles.
In the absence of buffer consideration, accurate bandwidth estimation is crucial to achieving higher playback performance [53]. An over/under-estimation of the available bandwidth can result in frequent rebuffering/lower-quality playback. Following [28], the bandwidth for the \((i)\)th segment is computed as follows:
\begin{equation} \widehat{B}(i)=\frac{\sum _{\forall k, j}\mathcal {L}_{l,j}^k(i-1)\times \tau }{\mathcal {D}(i-1)}, \end{equation}
(10)
where \(\mathcal {L}_{l,j}^k(i-1)\) represents the bitrate of the previous segment, \(\tau\) is the playback duration of the segment, and \(\mathcal {D}(i-1)\) represents the download time of the \((i-1)\)th segment. The proposed bitrate allocation algorithm considers aggressive, weighted, and conservative quality adjustments for different tile selection cases to improve the corresponding bitrate choice for each tile that the network can support. For tile selection Case 1, an aggressive quality adjustment is performed for viewport tiles. The algorithm performs a weighted quality adjustment if the marginal region is non-empty (Case 2 of Algorithm 2). A relatively conservative bitrate selection is performed for Case 3, where the viewport region is extended to lower the viewport mismatch while sacrificing the quality.
Algorithm 3 determines the bitrate selection for the tiles belonging to different priority regions calculated in Section 4.2. The input to the algorithm consists of various sets of video tiles (viewport, marginal, and background tiles), the number of tiles in the viewport and marginal regions, the available bandwidth for each segment of the video, and initial priority weights for the viewport and marginal tiles. The output of the algorithm is the selected bitrates for each tile in each segment of the video. The playback adaptation is performed for each segment after the previous segment has been fully downloaded. The algorithm begins by checking if the available bandwidth is less than or equal to the sum of the lowest bitrate options for all tiles in the current segment. If this is the case, the lowest bitrate is selected for all tiles (lines 1–2). If the available bandwidth is greater than or equal to the sum of the highest bitrate options for all tiles, the highest bitrate is selected for all tiles (lines 3–4). In other cases, the algorithm sets the bitrate for all tiles to the lowest bitrate option and calculates the remaining available bandwidth (lines 6–7). If there are tiles in the marginal region (i.e., \(\mathcal {T}_{l}^{m}(i) \ne \emptyset\)), the algorithm updates the priority weights for the viewport and marginal tiles (lines 9–10). The priority weights are determined based on the number of tiles in the viewport and marginal regions, with the viewport tiles being given higher priority. The viewport and marginal tiles (only possible in Case 2) are then allocated bandwidth based on the computed weights (lines 11–12). Next, the highest possible bitrates for the viewport and marginal tiles are chosen based on the available bandwidth for each region (lines 13–14). This ensures the weighted quality adaptation for viewport and marginal tiles. If there are no marginal tiles, then for Case 1 or Case 3 of Algorithm 2, an aggressive or relatively conservative quality allocation is considered for viewport tiles to ensure visual smoothness. After determining the bitrates for the viewport and marginal tiles, the bandwidth for the background tiles is calculated by subtracting the sum of these bitrates from the revised overall bandwidth budget (line 15). Finally, the bitrate of the background tiles is also increased, as long as it does not exceed the available bandwidth budget (line 16).
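The following Python sketch illustrates the flavor of Equation (10) and Algorithm 3. The bitrate ladder and the region weights `w_vp`/`w_mg` are illustrative assumptions, and the per-region split is a simplification of the weighted adjustment described above; the line numbers in the comments refer to Algorithm 3.

```python
BITRATES = [0.6, 1.2, 2.4, 4.8]   # per-tile ladder in Mbps (illustrative)

def estimate_bandwidth(prev_segment_mbps, tau, download_time):
    """Equation (10): throughput observed while fetching segment i-1."""
    return prev_segment_mbps * tau / download_time

def best_level(per_tile_budget):
    """Highest ladder entry affordable within the per-tile budget."""
    affordable = [r for r in BITRATES if r <= per_tile_budget]
    return affordable[-1] if affordable else BITRATES[0]

def allocate(viewport, marginal, background, bandwidth, w_vp=0.7, w_mg=0.3):
    tiles = viewport | marginal | background
    if bandwidth <= BITRATES[0] * len(tiles):    # lines 1-2: floor everything
        return {k: BITRATES[0] for k in tiles}
    if bandwidth >= BITRATES[-1] * len(tiles):   # lines 3-4: ceiling everything
        return {k: BITRATES[-1] for k in tiles}
    rates = {k: BITRATES[0] for k in tiles}      # lines 6-7: start from the floor
    budget = bandwidth - sum(rates.values())
    if marginal:                                 # Case 2: weighted adjustment
        vp_share, mg_share = budget * w_vp, budget * w_mg
    else:                                        # Case 1/3: residual to the viewport
        vp_share, mg_share = budget, 0.0
    for k in viewport:
        rates[k] = best_level(BITRATES[0] + vp_share / len(viewport))
    for k in marginal:
        rates[k] = best_level(BITRATES[0] + mg_share / len(marginal))
    leftover = bandwidth - sum(rates.values())   # line 15: background budget
    if background and leftover / len(background) >= BITRATES[1]:
        for k in background:                     # line 16: lift the background
            rates[k] = BITRATES[1]
    return rates

print(allocate({0, 1}, {2}, set(range(3, 24)), bandwidth=18.0))
```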

5 Experimental Evaluation

This section presents the experimental evaluations of our proposed solutions using a diverse range of content and network conditions.

5.1 Experimental Setup

The proposed solutions are evaluated by modifying a VR player provided by [15] on a machine with an Intel Core i7-7500U CPU and 16 GB of memory running Ubuntu 16.04. In the experiments, the VR player retrieves 360° video segments from an HTTP server while the connection speed between the VR player and the HTTP server is varied, as illustrated in Figure 4. Bandwidth trace 1 has more irregular increasing and decreasing trends than bandwidth trace 2. The maximum connection speed for trace 1 is 20 Mbps, while for trace 2 the maximum bandwidth value is 12 Mbps.
Fig. 4.
Fig. 4. Bandwidth traces employed in experiments.

5.1.1 Content Pre-processing.

This work employs a highly cited open-source video and head movement dataset captured by Wu et al. [47]. The dataset contains real head movement patterns of 48 unique VR users viewing 18 long-duration videos in two learning-based testing sessions using an HTC Vive headset with a field of view of 110°. In the first experiment, participants were asked to explore the content without paying too much attention to the specifics of what they were looking at. In the second experiment, on the other hand, they were asked to focus on the content and pay close attention to it, simulating certain behaviors or habits. We choose five videos, namely, LOSC Football (experiment 1), Weekly Idol-Dancing (experiment 2), Google Spotlight-HELP (experiment 1), GoPro VR-Tahiti Surf (experiment 1), and Rio Olympics VR Interview (experiment 2), from this dataset. This is in line with the recommendations of ITU-T Rec. P.913 [18] and is typical for research and development solution evaluations. The five immersive clips, of different durations, can be classified into four categories: Sport (LOSC Football and GoPro VR-Tahiti Surf), Performance (Weekly Idol-Dancing), Film (Google Spotlight-HELP), and Talkshow (Rio Olympics VR Interview). These videos are referred to as Football, Performance, Spotlight, Surfing, and VR Interview throughout the rest of the article. Table 2 summarizes the content features of the five videos. All of the videos were resized to 4K resolution using FFmpeg. Following [12], we spatially split the 360° videos into 4 \(\times\) 3, 6 \(\times\) 4, and 8 \(\times\) 6 tiling layouts; that work suggests that the 6 \(\times\) 4 tiling structure results in an optimal tradeoff between viewport availability, bitrate overhead, and bandwidth requirements. The video tiles were encoded using the open-source Kvazaar encoder with five different quantization parameter (QP) values: 22, 27, 32, 37, and 42. Considering the experimental recommendations for selecting segment duration for viewport adaptive streaming [7, 38], MPEG-DASH video segments of three different durations (1s, 1.5s, and 2s) were generated using GPAC MP4Box. The playback buffer was set to two segments for each experiment. The average segment sizes for each video are shown in Table 3. The simulation length was set according to the duration of each video.
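For reproducibility, a content pre-processing pass along these lines could be scripted as below. The exact command-line flags for FFmpeg, Kvazaar, and MP4Box are assumptions to be checked against each tool's documentation, and all file names are placeholders.

```python
import subprocess

def prepare(video, qp=27, tiles="6x4", seg_ms=1000):
    """Hypothetical pre-processing pass for one video, QP, and tiling layout.
    All CLI flags below are assumptions; verify against each tool's docs."""
    subprocess.run(["ffmpeg", "-i", video, "-vf", "scale=3840:2160",
                    "-pix_fmt", "yuv420p", "resized.yuv"], check=True)   # resize to 4K raw YUV
    subprocess.run(["kvazaar", "-i", "resized.yuv", "--input-res", "3840x2160",
                    "--qp", str(qp), "--tiles", tiles,
                    "-o", "tiled.hvc"], check=True)                      # HEVC tile encoding
    subprocess.run(["MP4Box", "-add", "tiled.hvc", "tiled.mp4"], check=True)
    subprocess.run(["MP4Box", "-dash", str(seg_ms), "tiled.mp4"], check=True)  # DASH segments

# e.g., prepare("football.mp4", qp=32, tiles="8x6", seg_ms=1500)
```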
Table 2.

| Videos | Category | Duration | Resolution | FPS |
|---|---|---|---|---|
| Football | Sport | 2′44″ | 3840 × 2160 | 25 |
| Performance | Performance | 4′38″ | 3840 × 1920 | 29 |
| Spotlight | Film | 4′53″ | 3840 × 2160 | 30 |
| Surfing | Sport | 3′25″ | 3840 × 1920 | 29 |
| VR-Interview | Talkshow | 3′07″ | 3840 × 1920 | 25 |

Table 2. Content Characteristics
Table 3.

| Video | QP | 4×3 (1s) | 6×4 (1s) | 8×6 (1s) | 4×3 (1.5s) | 6×4 (1.5s) | 8×6 (1.5s) | 4×3 (2s) | 6×4 (2s) | 8×6 (2s) |
|---|---|---|---|---|---|---|---|---|---|---|
| Football | 22 | 6.9 ± 2.3 | 7.0 ± 2.3 | 7.2 ± 2.3 | 10.5 ± 5.1 | 10.6 ± 5.1 | 10.9 ± 5.2 | 13.8 ± 4.6 | 14.1 ± 4.6 | 14.4 ± 4.6 |
| | 27 | 3.5 ± 1.3 | 3.6 ± 1.4 | 3.8 ± 1.4 | 5.3 ± 2.8 | 5.5 ± 2.8 | 5.7 ± 2.9 | 7.1 ± 2.7 | 7.3 ± 2.7 | 7.6 ± 2.7 |
| | 32 | 1.9 ± 0.8 | 2.0 ± 0.8 | 2.2 ± 0.8 | 2.9 ± 1.6 | 3.1 ± 1.6 | 3.3 ± 1.6 | 3.9 ± 1.5 | 4.1 ± 1.5 | 4.5 ± 1.6 |
| | 37 | 1.1 ± 0.4 | 1.2 ± 0.4 | 1.4 ± 0.4 | 1.7 ± 0.9 | 1.8 ± 0.9 | 2.1 ± 1.0 | 2.3 ± 0.9 | 2.4 ± 0.9 | 2.8 ± 0.9 |
| | 42 | 0.7 ± 0.2 | 0.7 ± 0.2 | 0.9 ± 0.2 | 1.0 ± 0.5 | 1.1 ± 0.5 | 1.4 ± 0.6 | 1.3 ± 0.5 | 1.5 ± 0.5 | 1.8 ± 0.5 |
| Performance | 22 | 8.5 ± 2.9 | 8.6 ± 2.9 | 8.9 ± 3.0 | 12.8 ± 5.9 | 13.0 ± 5.9 | 13.4 ± 6.0 | 17.0 ± 4.7 | 17.3 ± 4.7 | 17.8 ± 4.8 |
| | 27 | 4.6 ± 1.7 | 4.7 ± 1.7 | 5.0 ± 1.7 | 6.9 ± 3.3 | 7.1 ± 3.3 | 7.5 ± 3.4 | 9.3 ± 2.7 | 9.5 ± 2.7 | 10.0 ± 2.7 |
| | 32 | 2.6 ± 0.9 | 2.7 ± 0.9 | 2.9 ± 0.9 | 4.0 ± 1.9 | 4.1 ± 1.9 | 4.5 ± 2.0 | 5.3 ± 1.5 | 5.5 ± 1.5 | 6.0 ± 1.5 |
| | 37 | 1.6 ± 0.5 | 1.7 ± 0.5 | 1.9 ± 0.5 | 2.4 ± 1.1 | 2.5 ± 1.1 | 2.8 ± 1.2 | 3.2 ± 0.9 | 3.4 ± 0.8 | 3.8 ± 0.9 |
| | 42 | 0.9 ± 0.3 | 1.0 ± 0.3 | 1.2 ± 0.3 | 1.4 ± 0.6 | 1.6 ± 0.6 | 1.9 ± 0.7 | 1.9 ± 0.5 | 2.1 ± 0.5 | 2.5 ± 0.5 |
| Spotlight | 22 | 13.6 ± 8.8 | 13.9 ± 8.8 | 14.3 ± 8.9 | 20.4 ± 15.0 | 20.9 ± 15.2 | 21.5 ± 15.4 | 27.1 ± 17.1 | 27.7 ± 17.2 | 28.5 ± 17.3 |
| | 27 | 7.2 ± 5.3 | 7.4 ± 5.3 | 7.7 ± 5.4 | 10.8 ± 8.9 | 11.1 ± 9.0 | 11.6 ± 9.1 | 14.3 ± 10.3 | 14.8 ± 10.4 | 15.5 ± 10.5 |
| | 32 | 4.0 ± 3.1 | 4.2 ± 3.1 | 4.5 ± 3.2 | 6.1 ± 5.2 | 6.3 ± 5.3 | 6.7 ± 5.4 | 8.1 ± 6.1 | 8.4 ± 6.1 | 9.0 ± 6.2 |
| | 37 | 2.3 ± 1.8 | 2.4 ± 1.8 | 2.7 ± 1.8 | 3.5 ± 2.9 | 3.7 ± 3.0 | 4.1 ± 3.1 | 4.7 ± 3.5 | 4.9 ± 3.5 | 5.4 ± 3.5 |
| | 42 | 1.3 ± 0.9 | 1.4 ± 0.9 | 1.6 ± 0.9 | 2.0 ± 1.5 | 2.2 ± 1.6 | 2.5 ± 1.6 | 2.7 ± 1.7 | 2.9 ± 1.8 | 3.3 ± 1.8 |
| Surfing | 22 | 22.7 ± 11.2 | 23.0 ± 11.3 | 23.5 ± 11.4 | 34.0 ± 21.0 | 34.5 ± 21.2 | 35.3 ± 21.4 | 45.3 ± 22.2 | 45.9 ± 22.3 | 46.9 ± 22.5 |
| | 27 | 12.8 ± 6.7 | 13.0 ± 6.8 | 13.4 ± 6.8 | 19.2 ± 12.4 | 19.5 ± 12.5 | 20.2 ± 12.7 | 25.5 ± 13.3 | 26.0 ± 13.4 | 26.8 ± 13.5 |
| | 32 | 7.2 ± 3.9 | 7.4 ± 3.9 | 7.7 ± 3.9 | 10.8 ± 7.1 | 11.1 ± 7.2 | 11.6 ± 7.3 | 14.4 ± 7.7 | 14.7 ± 7.8 | 15.4 ± 7.8 |
| | 37 | 4.0 ± 2.2 | 4.1 ± 2.2 | 4.4 ± 2.2 | 6.0 ± 3.9 | 6.2 ± 4.0 | 6.6 ± 4.1 | 7.9 ± 4.3 | 8.2 ± 4.3 | 8.8 ± 4.3 |
| | 42 | 2.1 ± 1.1 | 2.2 ± 1.1 | 2.5 ± 1.1 | 3.2 ± 2.0 | 3.4 ± 2.1 | 3.7 ± 2.2 | 4.2 ± 2.2 | 4.5 ± 2.2 | 5.0 ± 2.2 |
| VR Interview | 22 | 7.6 ± 1.0 | 7.7 ± 1.1 | 7.8 ± 1.1 | 11.4 ± 4.1 | 11.5 ± 4.2 | 11.8 ± 4.3 | 15.2 ± 2.0 | 15.4 ± 2.0 | 15.7 ± 2.1 |
| | 27 | 3.7 ± 0.7 | 3.8 ± 0.7 | 3.9 ± 0.7 | 5.5 ± 2.1 | 5.7 ± 2.2 | 5.9 ± 2.3 | 7.4 ± 1.3 | 7.6 ± 1.3 | 7.9 ± 1.4 |
| | 32 | 1.7 ± 0.3 | 1.8 ± 0.4 | 2.0 ± 0.4 | 2.6 ± 1.0 | 2.8 ± 1.1 | 3.0 ± 1.2 | 3.5 ± 0.7 | 3.7 ± 0.7 | 4.0 ± 0.8 |
| | 37 | 0.9 ± 0.2 | 1.0 ± 0.2 | 1.2 ± 0.2 | 1.4 ± 0.6 | 1.6 ± 0.6 | 1.8 ± 0.7 | 1.9 ± 0.4 | 2.1 ± 0.4 | 2.5 ± 0.4 |
| | 42 | 0.6 ± 0.1 | 0.7 ± 0.1 | 0.8 ± 0.1 | 0.9 ± 0.3 | 1.0 ± 0.4 | 1.3 ± 0.4 | 1.2 ± 0.2 | 1.4 ± 0.2 | 1.7 ± 0.2 |

Table 3. Average and Standard Deviations of Segment Bitrates (Mbps) for the Football, Performance, Spotlight, Surfing, and VR Interview Videos

5.1.2 Comparative Approaches.

The proposed DFT solutions are compared with a dynamic tiling-based solution (ATS) and fixed tiling-based solutions (UVP, CTF, PBA, and AVR):
(1) ATS [32]: This solution performs adaptive tile selection based on weighted viewport distortions. The tiling layout resulting in the minimum viewport distortion or the maximum viewport bitrate is selected for streaming during each decision interval.
(2) UVP [15]: A straightforward approach that applies uniform quality adaptation per region, with frame areas classified using a spherical-walk user viewport prediction mechanism.
(3) CTF [15]: An extended version of UVP that considers the entire frame as a potential viewing area. Rather than dividing the frame into regions and assigning bitrates evenly across them, this method increases video quality in a per-tile fashion, beginning with the center tiles and working outward toward the edges.
(4) PBA [16]: This highly cited approach divides tiles into three zones: \(Z_1\) (viewport center tile), \(Z_2\) (surrounding tiles), and \(Z_3\) (background tiles). Priority-based bitrate adaptation is applied to tiles within these zones while considering the available bandwidth budget.
(5) AVR [36]: One of the early approaches that enables efficient resource use while maintaining high playback quality by dividing 360° frames into viewport, adjacent, and outside regions.

5.1.3 Evaluation Metrics.

The performance of the proposed and comparative schemes is assessed in terms of the following metrics:
(1) Streaming Behavior: We evaluate how DFT1 and DFT2 switch between tiling layouts and how they adopt the tile selection and bitrate adaptation scenarios. We also show how the ATS client switches between the available tiling layouts in each streaming session.
(2) Tile Overlap: This metric measures the overlap between the real and the predicted viewport tiles, as defined in Equation (9); a minimal sketch of this computation is given after this list.
(3) Average QoE: The average quality score across all users for each video, based on the QoE metric defined in Equation (5).
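Since Equation (9) is defined earlier in the article and not reproduced here, the sketch below assumes a common formulation of tile overlap: the fraction of tiles intersecting the real viewport that were also predicted as viewport tiles. The function name and the set representation are illustrative.

```python
def tile_overlap(actual: set[int], predicted: set[int]) -> float:
    """Assumed reading of Equation (9): |A ∩ P| / |A|, where A holds the
    indices of tiles intersecting the real viewport and P the predicted set."""
    if not actual:
        return 1.0  # degenerate case: no viewport tiles were recorded
    return len(actual & predicted) / len(actual)

# Example: 5 of the 6 actually viewed tiles were predicted -> ~0.833
print(tile_overlap({1, 2, 3, 7, 8, 9}, {2, 3, 4, 7, 8, 9}))
```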

5.2 Experimental Results

This subsection presents the results of experiments and a thorough analysis of the performance of each solution in a variety of testing conditions.

5.2.1 Streaming Behavior.

Table 4 provides insight into how the DFT1 solution performs in terms of tiling layout selection, tile selection, and bitrate adaptation for the five 360° videos with different motion characteristics. DFT1 supports a larger visual area with higher-quality streaming; therefore, for all the videos, layouts with larger tiles (i.e., 4 \(\times\) 3 and 6 \(\times\) 4) are predominantly selected. However, the use of the 6 \(\times\) 4 tiling layout decreases, while the use of the 4 \(\times\) 3 tiling layout slightly increases (by 5.38%), when the segment duration is increased from 1s to 2s for all the videos. Overall, only a small percentage of layouts with smaller tiles (i.e., 8 \(\times\) 6) is selected for all videos. DFT1 selects the 4 \(\times\) 3 tiling layout for more than 67% of the VR Interview video and mostly performs aggressive bitrate selection for the selected tiling layouts. DFT1 fetches the segments of the Football, Performance, Spotlight, Surfing, and VR Interview videos with aggressive bitrate selection in up to 59.14%, 75.33%, 66.27%, 58.75%, and 78.43% of cases, respectively, averaged across the three segment durations. DFT1 performs weighted quality adjustments for segments of these videos in up to 32.37%, 21.08%, 27.64%, 32.66%, and 17.18% of cases. Interestingly, the percentage of aggressive quality adjustments decreases and the percentage of weighted quality adjustments increases when the segment duration is increased. In addition, a small percentage of conservative bitrate selections is observed for all the videos in the DFT1 solution.
Table 4. Streaming Behavior of DFT1 Client in Terms of Tiling Layout Selection, Tile Selection, and Bitrate Adaptation Scenarios. The percentage results are averaged for five videos watched by 48 VR users. The columns give the tiling layout selection shares and, per layout, the shares of Tile Selection Case 1 (aggressive bitrate, C1), Case 2 (weighted bitrate, C2), and Case 3 (conservative bitrate, C3).

| Video | Seg. dur. | Layout 8×6 | Layout 6×4 | Layout 4×3 | C1 8×6 | C1 6×4 | C1 4×3 | C2 8×6 | C2 6×4 | C2 4×3 | C3 8×6 | C3 6×4 | C3 4×3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Football | 1s | 17.73 | 52.27 | 30.00 | 8.84 | 37.39 | 20.45 | 8.70 | 14.36 | 6.45 | 0.19 | 0.53 | 3.09 |
| | 1.5s | 17.53 | 50.10 | 32.38 | 7.42 | 32.59 | 17.74 | 9.54 | 16.02 | 7.91 | 0.57 | 1.49 | 6.73 |
| | 2s | 17.48 | 47.97 | 34.55 | 6.43 | 30.06 | 16.51 | 9.88 | 15.19 | 9.07 | 1.17 | 2.72 | 8.97 |
| Performance | 1s | 4.96 | 66.74 | 28.30 | 2.68 | 55.40 | 21.41 | 2.16 | 11.08 | 5.42 | 0.12 | 0.25 | 1.47 |
| | 1.5s | 5.42 | 63.77 | 30.81 | 2.80 | 51.00 | 21.32 | 2.48 | 12.18 | 6.48 | 0.14 | 0.59 | 3.02 |
| | 2s | 6.71 | 60.49 | 32.79 | 3.75 | 46.66 | 20.97 | 2.67 | 13.08 | 7.69 | 0.30 | 0.75 | 4.14 |
| Spotlight | 1s | 21.00 | 44.49 | 34.51 | 12.99 | 32.75 | 27.48 | 7.71 | 11.13 | 5.09 | 0.30 | 0.61 | 1.93 |
| | 1.5s | 19.01 | 42.00 | 39.00 | 10.30 | 27.67 | 27.24 | 8.10 | 12.86 | 7.47 | 0.61 | 1.46 | 4.28 |
| | 2s | 18.06 | 38.90 | 43.04 | 8.57 | 24.14 | 27.66 | 8.65 | 12.50 | 9.41 | 0.84 | 2.27 | 5.97 |
| Surfing | 1s | 20.46 | 42.23 | 37.32 | 11.12 | 28.75 | 27.63 | 9.10 | 12.89 | 7.00 | 0.24 | 0.59 | 2.68 |
| | 1.5s | 19.30 | 39.81 | 40.89 | 8.67 | 23.62 | 24.68 | 10.04 | 14.32 | 9.87 | 0.59 | 1.87 | 6.34 |
| | 2s | 17.84 | 36.67 | 45.49 | 7.12 | 19.72 | 24.96 | 9.59 | 14.02 | 11.17 | 1.13 | 2.93 | 9.36 |
| VR Interview | 1s | 6.46 | 26.42 | 67.11 | 3.93 | 19.11 | 59.60 | 2.46 | 6.87 | 5.99 | 0.07 | 0.45 | 1.52 |
| | 1.5s | 7.29 | 24.14 | 68.57 | 4.33 | 16.15 | 57.48 | 2.67 | 7.17 | 7.64 | 0.29 | 0.82 | 3.44 |
| | 2s | 8.60 | 23.16 | 68.23 | 4.61 | 14.45 | 55.65 | 3.20 | 7.53 | 8.00 | 0.78 | 1.19 | 4.59 |
The streaming behavior of the DFT2 client is presented in Table 5. DFT2 achieves a perfect viewport match (in up to 65.50% of cases), a partial viewport match (up to 27.71%), and a complete viewport mismatch (up to 6.77%) by selecting, on average, the 8 \(\times\) 6, 6 \(\times\) 4, and 4 \(\times\) 3 tiling layouts, respectively. In particular, DFT2 observes a perfect viewport match in 57.35% of cases for the Football video, 73.88% for the Performance video, 64.95% for the Spotlight video, 55.46% for the Surfing video, and 75.86% for the VR Interview video, averaged across the three prediction horizons. The lower perfect-viewport-match values for the sports videos, i.e., Football and Surfing, reflect the fast-moving objects within these videos. Consequently, the client observes lower percentages of aggressive bitrate adaptation with the 8 \(\times\) 6 tiling layout, 34.03% and 31.31% for the Football and Surfing videos, in comparison to the other videos. For content with minimal movement, such as the Performance and VR Interview videos, there is only a small percentage of viewport mismatch even when the segment duration is set to 2s. DFT2 requests the 4 \(\times\) 3 tiling layout for up to 3.36% and 5.94% of the Performance and VR Interview videos, respectively, at the 2s segment duration. Therefore, these videos show a limited percentage of extended viewport and conservative bitrate adaptation cases compared to the other videos. Interestingly, the percentage of 4 \(\times\) 3 and 6 \(\times\) 4 tiling layout selections increases with segment duration. Additionally, as the segment duration increases, the viewer tends to experience a higher percentage of weighted and conservative quality adjustments. Conversely, the percentage of fixed viewport cases with aggressive bitrate adjustments tends to decrease with longer segment durations. This is because prediction accuracy tends to decline when predicting further into the future.
Table 5. Streaming Behavior of DFT2 Client in Terms of Tiling Layout Selection, Tile Selection, and Bitrate Adaptation Scenarios. The percentage results are averaged for five videos watched by 48 VR users. The column groups follow Table 4: layout selection shares, then Case 1 (aggressive, C1), Case 2 (weighted, C2), and Case 3 (conservative, C3) per layout.

| Video | Seg. dur. | Layout 8×6 | Layout 6×4 | Layout 4×3 | C1 8×6 | C1 6×4 | C1 4×3 | C2 8×6 | C2 6×4 | C2 4×3 | C3 8×6 | C3 6×4 | C3 4×3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Football | 1s | 62.94 | 31.97 | 5.09 | 39.46 | 10.66 | 0.74 | 23.24 | 21.16 | 2.39 | 0.24 | 0.15 | 1.96 |
| | 1.5s | 56.31 | 33.68 | 10.02 | 33.03 | 9.27 | 0.73 | 22.76 | 24.06 | 4.05 | 0.52 | 0.34 | 5.24 |
| | 2s | 52.82 | 34.55 | 12.63 | 29.62 | 8.51 | 0.76 | 22.41 | 25.28 | 4.07 | 0.79 | 0.76 | 7.80 |
| Performance | 1s | 77.81 | 20.51 | 1.68 | 57.95 | 8.03 | 0.20 | 19.66 | 12.40 | 0.85 | 0.19 | 0.08 | 0.62 |
| | 1.5s | 73.85 | 23.28 | 2.87 | 52.48 | 7.38 | 0.17 | 21.07 | 15.64 | 1.19 | 0.30 | 0.26 | 1.51 |
| | 2s | 69.99 | 26.65 | 3.36 | 48.91 | 7.76 | 0.10 | 20.65 | 18.51 | 0.99 | 0.43 | 0.37 | 2.26 |
| Spotlight | 1s | 71.39 | 24.18 | 4.44 | 46.03 | 8.26 | 0.75 | 25.21 | 15.73 | 1.92 | 0.15 | 0.19 | 1.76 |
| | 1.5s | 64.12 | 28.40 | 7.48 | 38.76 | 7.69 | 0.56 | 24.96 | 20.42 | 2.69 | 0.41 | 0.29 | 4.23 |
| | 2s | 59.35 | 30.77 | 9.88 | 35.50 | 9.18 | 1.50 | 23.38 | 21.03 | 2.95 | 0.47 | 0.55 | 5.43 |
| Surfing | 1s | 62.08 | 32.17 | 5.74 | 37.10 | 10.25 | 0.87 | 24.75 | 21.78 | 2.71 | 0.23 | 0.14 | 2.15 |
| | 1.5s | 54.41 | 34.93 | 10.66 | 30.52 | 8.06 | 0.81 | 23.51 | 26.51 | 4.43 | 0.38 | 0.36 | 5.43 |
| | 2s | 49.90 | 36.25 | 13.86 | 26.31 | 7.71 | 0.85 | 22.92 | 27.51 | 4.53 | 0.67 | 1.03 | 8.47 |
| VR Interview | 1s | 79.18 | 17.67 | 3.15 | 58.25 | 6.38 | 0.29 | 20.68 | 11.23 | 1.59 | 0.25 | 0.06 | 1.27 |
| | 1.5s | 74.75 | 20.33 | 4.92 | 53.41 | 6.15 | 0.34 | 20.93 | 13.93 | 1.65 | 0.40 | 0.25 | 2.94 |
| | 2s | 73.66 | 20.41 | 5.94 | 50.13 | 5.62 | 0.31 | 22.92 | 14.14 | 1.68 | 0.60 | 0.65 | 3.94 |
Figure 5 presents the streaming behavior of the ATS algorithm in terms of the average tiling layout selection over the entire video dataset. ATS selects tiling layouts based on the minimum weighted viewport distortion, aiming to achieve the maximum viewport bitrate. ATS mostly selects the 4 \(\times\) 3 tiling layout for the Football video, followed by the 8 \(\times\) 6 and 6 \(\times\) 4 tiling grids. ATS requests the 6 \(\times\) 4 and 8 \(\times\) 6 tiling layouts for about 15.06% and 50.94% of the streaming session for the Performance video across the 1s, 1.5s, and 2s segment durations. The 6 \(\times\) 4 tiling layout is mostly requested for the VR Interview video with a 2s segment duration. For the Spotlight and Surfing videos, ATS mostly requests the 8 \(\times\) 6 tiling layout (41.23% and 38.71%), followed by the 6 \(\times\) 4 (34.97% and 30.29%) and 4 \(\times\) 3 (23.79% and 30.98%) layouts, respectively. Over the entire test dataset, the ATS method selects the 8 \(\times\) 6 layout 42.61% of the time, the 6 \(\times\) 4 layout 29%, and the 4 \(\times\) 3 layout 28.39%. This is because layouts with more tiles result in relatively larger segment sizes.
Fig. 5. Average tiling layout selection in the ATS method.

5.2.2 Average Tile Overlap.

Figure 6 summarizes the average tile overlap results (48 head movement traces per video) for the DFT1, DFT2, ATS, and UVP methods under various prediction horizons. The ATS, UVP, CTF, PBA, and AVR streaming algorithms all use the spherical-walk prediction method to inform adaptive tile selection and bitrate selection. Figure 6 shows that the DFT1 method leads to higher tile overlap for all five videos. This is because the tiles in the dynamic tiling layouts produced by DFT1 are arranged based on the arc distance between the viewpoint and the center of each tile, which allows DFT1 to cover the viewport and reduce the risk of gaps in the visual field. The Football and Surfing videos tend to elicit more dynamic head movements from viewers because they contain fast-moving outdoor sports-related objects. In contrast, the Performance and VR Interview videos tend to have a higher average tile overlap because they feature slower-moving indoor objects that are the primary focus of attention. This suggests that the nature of the content being watched can impact the amount of head movement and, in turn, the tile overlap observed in the video. Notably, DFT1 and DFT2 attain higher matching performance and outperform the ATS and UVP methods for different user behaviors. For all 48 VR users, DFT1 and DFT2 experience an average tile overlap of 85.40% and 81.95% (Football) and 92.22% and 90.43% (Performance) for 1s (Figure 6(a)), 84.09% and 80.50% (Spotlight) and 90.03% and 86.94% (VR Interview) for 1.5s (Figure 6(b)), and 74.93% and 70.91% (Surfing) and 88.92% and 85.54% (Performance) for 2s (Figure 6(c)) prediction windows. The proactive tile selection methods adapt more effectively to the varied spatial and temporal information present in different motion scenes, which explains their superior performance. At the same time, the ATS method exhibits a lower average tile overlap than the UVP method for content with fast and stable head movements. As can be seen for the Spotlight video, DFT1 outperforms the ATS and UVP methods by up to 8.88% and 11.19% for the next 1.5s (Figure 6(b)) and by 14.66% and 12.43% for a 2s prediction horizon (Figure 6(c)), respectively. Similarly, DFT2 demonstrates its ability to increase viewport overlap for the Surfing video, outperforming the other methods by about 7.37%, 9.02%, and 10.35% for 1s, 1.5s, and 2s prediction times, respectively. For the Spotlight video, the average gain of the DFT methods ranges from 6.27% to 9.32%, 7.02% to 10.61%, and 9.28% to 12.98% for the different prediction horizons. The tile overlap of DFT2 is reduced by 8.64% (Football) and by 10.23% (Surfing) when the segment duration is increased from 1s to 2s. In contrast, for the ATS and UVP methods, the tile overlap is reduced by 11.83% and 11.57% (Football) and by 13.29% and 13.17% (Surfing), respectively (Figure 6). This indicates that the DFT2 method is more effective at maintaining a high level of tile overlap even when the segment duration is increased. As a result, it can be concluded that employing two prediction mechanisms (as in DFT) leads to better viewing probability than employing a single prediction mechanism for fixed (UVP) and dynamic (ATS) tiling-based streaming.
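As a concrete illustration of the arc-distance ordering mentioned above, the sketch below ranks tiles by the great-circle distance between a predicted viewpoint and each tile center. It assumes directions are given as (yaw, pitch) pairs in radians; the paper's exact weighting of this distance is not reproduced.

```python
import math

def arc_distance(yaw1: float, pitch1: float,
                 yaw2: float, pitch2: float) -> float:
    """Great-circle (arc) distance in radians between two viewing
    directions on the unit sphere, via the spherical law of cosines."""
    cos_d = (math.sin(pitch1) * math.sin(pitch2)
             + math.cos(pitch1) * math.cos(pitch2) * math.cos(yaw1 - yaw2))
    return math.acos(max(-1.0, min(1.0, cos_d)))  # clamp rounding noise

def rank_tiles(viewpoint: tuple, tile_centers: list) -> list:
    """Order tile indices from angularly closest to farthest, so the
    closest tiles can be assigned to the viewport region first."""
    vy, vp = viewpoint
    return sorted(range(len(tile_centers)),
                  key=lambda i: arc_distance(vy, vp, *tile_centers[i]))
```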
Fig. 6. Average tile overlap achieved by the DFT1, DFT2, and Spherical Walk methods for the Football, Performance, Spotlight, Surfing, and VR Interview videos, prepared in 4 \(\times\) 3, 6 \(\times\) 4, and 8 \(\times\) 6 tiling layouts and watched by 48 VR users. The recorded results are for 1s, 1.5s, and 2s segment durations.

5.2.3 Average QoE.

Next, the performance of the proposed solutions is tested against five tile-based methods using bandwidth trace 1 and trace 2 for the Football and Performance videos. The values of the QoE functions defined in Equations (1) through (4) are normalized. The QoE weight coefficients are set to \(\alpha =1, \beta = 0.8, \gamma = 0.6, \delta =0.2\). The weights are selected to emphasize different combinations of QoE objectives: a larger value of \(\alpha\) indicates that the user is more concerned with viewport quality, while a smaller value of \(\delta\) indicates that the user places less importance on playback buffer risk. Increasing the \(\beta\), \(\gamma\), and \(\delta\) weights results in negative QoE values for the CTF and PBA clients for the Surfing video. Therefore, these values are selected to provide a useful QoE comparison between the proposed and the other solutions.
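As a concrete reading of this weighting, the sketch below combines the four normalized terms into one score. Only the weighted-sum structure is assumed here; the exact functional forms of Equations (1) through (5) are given earlier in the article.

```python
def qoe_score(viewport_quality: float, temporal_osc: float,
              spatial_osc: float, buffer_risk: float,
              alpha: float = 1.0, beta: float = 0.8,
              gamma: float = 0.6, delta: float = 0.2) -> float:
    """Assumed composition of the normalized QoE terms: reward viewport
    quality and penalize quality oscillations and playback buffer risk."""
    return (alpha * viewport_quality - beta * temporal_osc
            - gamma * spatial_osc - delta * buffer_risk)

# Example with normalized inputs in [0, 1]:
print(qoe_score(0.9, 0.1, 0.05, 0.02))  # ~0.786
```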
The reference tile-based delivery solutions use viewers' head motion patterns to adaptively select bitrates. Figure 7 depicts the video quality experienced, averaged across 48 users, for 1s, 1.5s, and 2s segments. It can be seen that the performance of the algorithms in Figures 7(a) and 7(c) is higher than that in Figures 7(b) and 7(d): for the same QoE weight coefficients, the average QoE values decrease accordingly as the available bandwidth decreases. The higher QoE scores of the finer tiling layouts (i.e., 6 \(\times\) 4 and 8 \(\times\) 6) for the 1s Performance video (Figures 7(c) and 7(d)) are due to their higher average tile overlap. Despite the lower tile overlap, the UVP, CTF, PBA, and AVR streaming methods achieve higher quality scores for the Football video with a 1s segment duration due to the smaller average segment sizes (Figures 7(a) and 7(b)). Figure 7(a) shows that DFT1 improves QoE compared to the other methods by about 3.96%, 9.29%, and 12.90% for the Football video with 1s, 1.5s, and 2s segment durations, respectively, when employing bandwidth trace 1. For both bandwidth traces, DFT1 outperforms ATS by about 25.31% to 38.71%, UVP by about 2.25% to 4.25%, CTF by about 5.08% to 7.67%, PBA by about 11.16% to 15.42%, and AVR by about 13.37% to 20.07% for the Football video with a 1.5s segment duration. Figure 7(b) shows that DFT2 achieves about 5.44% (for 1s), 12.56% (for 1.5s), and 15.98% (for 2s) higher average QoE for Football video streaming in comparison to the other solutions. The growing gain with increasing segment duration reflects the better prediction accuracy of the DFT solutions over longer horizons. Similarly, Figures 7(c) and 7(d) show that the DFT solutions deliver the highest visual quality levels for all segment durations, since they accommodate the user's viewing directions better than the other methods. In particular, DFT1 achieves an average gain of 7.45% (1s), 14.42% (1.5s), and 17.69% (2s) for Performance video streaming under bandwidth trace 1, increasing to 10.34% (1s), 23.20% (1.5s), and 27.23% (2s) under bandwidth trace 2. Viewport mismatch leads to a drop in quality for tile-based streaming methods at longer segment lengths. In the DFT methods, the combination of viewport coverage selection and bitrate selection policies favors higher-quality perceptibility of the viewing area. For the Performance video with a 2s segment duration, DFT2 outperforms the fixed tiling-based solutions by about 2.34% to 6.39%, 11.76% to 21.67%, 27.75% to 43.46%, and 23.80% to 35.58% for both bandwidth scenarios. The improved performance of the DFT solutions over the CTF and PBA methods stems from their uniform quality allocation to the predicted tiles, which favors higher visual quality levels in the viewing area while reducing the amount of data spent on the background tiles.
Fig. 7. Average QoE achieved by DFT1, DFT2, ATS, UVP, CTF, PBA, and AVR streaming clients for the Football and Performance videos.
The results of the experiments on the Spotlight, Surfing, and VR Interview videos are shown in Figure 8. The Surfing and Spotlight videos require higher bitrates for satisfactory quality scores (see Table 3), making it more difficult to achieve a high QoE over limited network connections with high QoE expectations. On the other hand, the VR Interview video attains higher QoE scores due to its smaller average segment sizes and higher viewport overlap. Therefore, factors such as segment size, bandwidth capacity, and viewport prediction significantly impact the streaming performance of 360° videos. For example, when streaming the Spotlight video with a 2s segment duration, the DFT1 method achieves average QoE improvements of up to 29.8%, 12.15%, 24.36%, 28.7%, and 30.6% compared to ATS, UVP 8 \(\times\) 6, CTF 8 \(\times\) 6, PBA 6 \(\times\) 4, and AVR 8 \(\times\) 6, respectively (Figure 8(b)). This is because DFT1 has 14.65% and 12.15% higher average tile overlap than the ATS and UVP methods for the Spotlight video with a 2s segment duration (Figure 6(c)). The average quality score for the Surfing video with a 1s segment duration under bandwidth trace 2 (Figure 8(d)) is 64.21% for DFT1, 61.57% for DFT2, 37.53% for ATS, 56.45% for UVP 4 \(\times\) 3, 48.79% for CTF 6 \(\times\) 4, 38.9% for PBA 8 \(\times\) 6, and 41.08% for AVR 4 \(\times\) 3. For the VR Interview video with a 1.5s segment duration, DFT2 improves the average QoE by up to 20.55% compared to ATS, 3.02% compared to UVP, 9% compared to CTF, 17.61% compared to PBA, and 37.74% compared to AVR for bandwidth trace 2 (Figure 8(f)), while the average improvement for DFT1 is 25.7%, 5.3%, 13.94%, 23.64%, and 42.3% over all tiling layouts of ATS, UVP, CTF, PBA, and AVR, respectively, for the 2s VR Interview video (Figure 8(e)). The ATS method performs better than the AVR method in only a few cases for the Performance and VR Interview videos. The poor performance of the ATS method is due to its restriction of the background tile quality to minimum levels, which leads to lower quality scores under low and medium prediction performance. Figure 8 also shows that, when simulated over all tiling layouts, segment durations, and bandwidth profiles, the DFT1 and DFT2 methods achieve QoE improvements of 16.53%, 15.56%, and 13.62% for the Spotlight, Surfing, and VR Interview videos, respectively. This is because the QoE metric used favors higher visible quality. The lower QoE values for the PBA algorithm are due to its strategy of assigning different priorities to tiles within the viewport zones (\(Z_1\) and \(Z_2\)), which leads to poor user-perceived quality and visual smoothness. The AVR method, meanwhile, performs poorly even under stable head movements because it unnecessarily increases the quality of adjacent tiles. In general, the DFT1 and DFT2 solutions lead to average QoE improvements of 9.70% to 10.56% for the Football, 16.33% to 16.72% for the Performance, 15.08% to 18% for the Spotlight, 14.33% to 16.79% for the Surfing, and 13.45% to 13.79% for the VR Interview videos compared to the other solutions.
Fig. 8. Average QoE achieved by DFT1, DFT2, ATS, UVP, CTF, PBA, and AVR streaming clients for the Spotlight, Surfing, and VR Interview videos.

5.2.4 Ablation Study–Impact of QoE Weight Coefficients.

We investigated the influence of the QoE weight coefficients on the streaming performance of the adaptive 360° video solutions. For each streaming solution, we collected the streaming metrics presented in Equations (1) through (4), including viewport quality, temporal quality oscillations, spatial quality oscillations, and playback buffer risk, across a comprehensive testing dataset encompassing five videos, three tiling patterns, three segment durations, and two bandwidth traces. Figure 9(a) illustrates the sampled QoE weight coefficients, where the values of \(\alpha\), \(\beta\), \(\gamma\), and \(\delta\) range between 0 and 1. Figure 9(b) displays the average QoE values for each corresponding weight sample. The findings from Figure 9(b) reveal that the DFT1 and DFT2 solutions consistently outperform the other methods, achieving the highest QoE scores across all combinations of QoE weight samples. The average QoE scores obtained are: DFT1 (60.82%), DFT2 (59.17%), ATS (35.08%), UVP (56.06%), CTF (48.45%), PBA (38.23%), and AVR (36.02%). In general, DFT1 and DFT2 surpass ATS by 24% to 25.74%, UVP by 3.11% to 4.76%, CTF by 10.71% to 12.36%, PBA by 20.94% to 22.59%, and AVR by 23.14% to 24.79% in terms of QoE performance. The average values of the sampled QoE weight coefficients are \(\alpha\) = 0.885, \(\beta\) = 0.835, \(\gamma\) = 0.817, and \(\delta\) = 0.466.
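The weight sweep can be mimicked with a simple uniform sampler over \([0, 1]^4\); the actual sample set behind Figure 9 is not listed in the article, so the snippet below is only indicative.

```python
import random

def sample_qoe_weights(n: int, seed: int = 0) -> list:
    """Draw n (alpha, beta, gamma, delta) vectors uniformly from [0, 1]^4,
    mirroring the ablation's sweep over QoE weight combinations."""
    rng = random.Random(seed)
    return [tuple(round(rng.random(), 3) for _ in range(4)) for _ in range(n)]

# Each sampled vector would be used to recompute each client's average QoE.
print(sample_qoe_weights(3))
```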
Fig. 9. Average QoE obtained by DFT1, DFT2, ATS, UVP, CTF, PBA, and AVR streaming clients for the comprehensive dataset (comprising five videos, three tiling patterns, three segment durations, and two bandwidth traces) when assessed under varying QoE weight coefficients.

5.3 Discussion

Existing fixed tiling-based adaptive streaming solutions aim to improve visual quality while reducing spatial and temporal quality variations and the risk of playback interruptions. The proposed dynamic tiling-based streaming solutions, however, achieve more accurate viewport prediction and higher QoE levels, since they select the best-resolution tiling layouts for both static and dynamic motion scenes. The ATS and UVP solutions allocate bitrate uniformly to tiles in the same classification to improve the visual smoothness objectives defined in Equations (2) and (3). However, ATS limits the background quality to the minimum level, suffers from lower viewport matching performance, and achieves the lowest average QoE values over the entire test dataset. The UVP solution, on the other hand, increases the quality of the whole video to the highest possible level and produces better quality scores even under difficult-to-predict head movements. The CTF and PBA solutions focus primarily on improving the quality of the center tile, which leads to significantly degraded quality under poor viewport prediction and to spatial quality variations even when viewport prediction is stable. The AVR streaming method underperforms under both drastic and stable viewport switches because its inefficient tile arrangement consumes a substantial share of the network bandwidth. In contrast, the DFT solutions devote a much larger bandwidth share to the tiles most likely to be watched and achieve higher QoE scores than the comparative methods across all tested datasets. DFT1 provides a useful tradeoff between visual area and visual quality, while DFT2 works to minimize the viewport mismatch ratio. Both proposed solutions perform well under different testing settings and avoid unacceptable viewport deviations for end-users. The DFT solutions allocate a fair share of the bandwidth to tiles in the viewport, marginal, and background regions, resulting in lower spatial and temporal quality variations across different viewport prediction outcomes. Under stable or variable motions of experienced or naive VR users, the dynamic selection of tiling layouts and visible-region coverage (fixed/extended), combined with the aggressive, weighted, and/or conservative quality adjustment policies, provides improved QoE for different bandwidth settings, segment sizes, and motion trends. Therefore, the proposed solutions demonstrate their potential to offer a superior quality of experience compared to other approaches for delivering 360° video.
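A minimal sketch of such a region-weighted allocation is given below: each tile carries a weight derived from its region, and quality levels are raised greedily while the segment still fits the bandwidth budget. The region weights and the greedy rule are assumptions for illustration, not the exact DFT policy.

```python
def allocate(tiles: list, sizes: dict, budget: float) -> dict:
    """Greedy, weight-ordered quality allocation.
    tiles: list of (tile_id, region_weight) pairs; sizes[tile_id] is a list
    of segment sizes at each quality level (ascending). Every tile starts at
    the lowest level; upgrades go to the highest-weight tiles first."""
    level = {t: 0 for t, _ in tiles}
    spent = sum(sizes[t][0] for t, _ in tiles)  # baseline cost of all tiles
    for t, _ in sorted(tiles, key=lambda x: -x[1]):  # viewport tiles first
        while level[t] + 1 < len(sizes[t]):
            extra = sizes[t][level[t] + 1] - sizes[t][level[t]]
            if spent + extra > budget:
                break
            level[t] += 1
            spent += extra
    return level

# Example: tile 0 in the viewport (weight 1.0), tile 1 in the background.
print(allocate([(0, 1.0), (1, 0.2)], {0: [1, 2, 4], 1: [1, 2, 4]}, 6.0))
```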

6 Conclusions and Future Works

This article proposed and evaluated two novel dynamic video frame tiling-based solutions, DFT1 and DFT2, for advanced predictive tile selection during adaptive 360° video streaming. The DFT solutions achieve an appropriate balance between viewport availability and perceived visual quality. DFT1 performs interactive tiling layout selection by leveraging the visual area and the associated weighted quality while coping with the dynamics of the user's attention field. DFT2 accounts for potential viewport prediction errors to best accommodate different tiling layouts. The DFT solutions extract the user attention fields by leveraging two viewport prediction mechanisms to select the best-fit dynamic-size regions for transmission over bandwidth-limited networks. The proposed solutions consider the level of interest in each region when deciding how much bitrate it should receive, simplifying the selection of an appropriate bitrate for each tile. The effectiveness of the DFT algorithms was evaluated through extensive trace-driven experiments. The experimental results on publicly available datasets, under different segment lengths and bandwidth settings, demonstrate that the proposed solutions achieve up to 8.6%, 9.77%, and 11.2% improved viewport availability for 1s, 1.5s, and 2s segment durations, respectively. At the same time, the DFT solutions improve QoE by 9.7% to 18% for VR videos with different motion characteristics compared to alternative solutions. In the future, we aim to develop a guidance-enhanced fuzzy reinforcement learning (FRL) solution to control continuous tile selection and bitrate adaptation for equirectangular, cubemap, and truncated squared pyramid projected 360° videos under more complex network and head movement datasets. Using advanced QoE metrics, we will evaluate the effectiveness of our FRL-based solution and identify potential optimization opportunities.

References

[1] Yanan Bao, Huasen Wu, Tianxiao Zhang, Albara A. H. Ramli, and Xin Liu. 2016. Shooting a moving target: Motion-prediction-based transmission for 360-degree videos. In 2016 IEEE International Conference on Big Data (Big Data'16). IEEE, 1161–1170.
[2] M. Ben Yahia, Y. Le Louedec, G. Simon, and L. Nuaymi. 2018. HTTP/2-based streaming solutions for tiled omnidirectional videos. In 2018 IEEE International Symposium on Multimedia (ISM'18). 89–96.
[3] Tengfei Cao, Changqiao Xu, Mu Wang, Zhongbai Jiang, Xingyan Chen, Lujie Zhong, and Luigi Alfredo Grieco. 2019. Stochastic optimization for green multimedia services in dense 5G networks. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 15, 3 (2019), 1–22.
[4] Xiaolei Chen, Di Wu, and Ishfaq Ahmad. 2021. Optimized viewport-adaptive 360-degree video streaming. CAAI Transactions on Intelligence Technology 6, 3 (2021), 347–359.
[5] Cyril Concolato, Jean Le Feuvre, Franck Denoual, Frédéric Mazé, Eric Nassor, Nael Ouedraogo, and Jonathan Taquet. 2017. Adaptive streaming of HEVC tiled videos using MPEG-DASH. IEEE Transactions on Circuits and Systems for Video Technology 28, 8 (2017), 1981–1992.
[6] Xavier Corbillon, Francesca De Simone, and Gwendal Simon. 2017. 360-degree video head movement dataset. In Proceedings of the 8th ACM on Multimedia Systems Conference. 199–204.
[7] Xavier Corbillon, Gwendal Simon, Alisa Devlic, and Jacob Chakareski. 2017. Viewport-adaptive navigable 360-degree video delivery. In 2017 IEEE International Conference on Communications (ICC'17). IEEE, 1–7.
[8] R. G. d. A. Azevedo, N. Birkbeck, F. De Simone, I. Janatra, B. Adsumilli, and P. Frossard. 2019. Visual distortions in 360-degree videos. IEEE Transactions on Circuits and Systems for Video Technology 30, 8 (2019), 2524–2537.
[9] Pingping Dong, Rongcheng Shen, Xiaowei Xie, Yajing Li, Yuning Zuo, and Lianming Zhang. 2022. Predicting long-term field of view in 360-degree video streaming. IEEE Network (2022), 1–8.
[10] Miguel Fabian Romero Rondon, Lucile Sassatelli, Ramon Aparicio Pardo, and Frederic Precioso. 2019. Revisiting deep architectures for head motion prediction in 360° videos. arXiv preprint arXiv:1911.11702 (2019).
[11] Ajoy S. Fernandes and Steven K. Feiner. 2016. Combating VR sickness through subtle dynamic field-of-view modification. In 2016 IEEE Symposium on 3D User Interfaces (3DUI'16). IEEE, 201–210.
[12] Mario Graf, Christian Timmerer, and Christopher Mueller. 2017. Towards bandwidth efficient adaptive streaming of omnidirectional video over HTTP: Design, implementation, and evaluation. In Proceedings of the 8th ACM on Multimedia Systems Conference (MMSys'17). ACM, New York, NY, 261–271.
[13] Chengjun Guo, Ying Cui, and Zhi Liu. 2018. Optimal multicast of tiled 360 VR video. IEEE Wireless Communications Letters 8, 1 (2018), 145–148.
[14] Dongbiao He, Cedric Westphal, and J. Garcia-Luna-Aceves. 2018. Joint rate and FoV adaptation in immersive video streaming. In ACM SIGCOMM Workshop on AR/VR Networks.
[15] Jeroen van der Hooft, Maria Torres Vega, Stefano Petrangeli, Tim Wauters, and Filip De Turck. 2019. Tile-based adaptive streaming for virtual reality video. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 15, 4 (2019), 1–24.
[16] Mohammad Hosseini and Viswanathan Swaminathan. 2016. Adaptive 360 VR video streaming: Divide and conquer. In 2016 IEEE International Symposium on Multimedia (ISM'16). 107–110.
[17] Xinjue Hu, Wei Quan, Tao Guo, Yu Liu, and Lin Zhang. 2019. Mobile edge assisted live streaming system for omnidirectional video. Mobile Information Systems 2019 (2019), 8487372:1–8487372:15.
[18] ITU-T Recommendation. 2014. P.913: Methods for the subjective assessment of video quality, audio quality and audiovisual quality of Internet video and distribution quality television in any environment.
[19] Xiaolan Jiang, Yi-Han Chiang, Yang Zhao, and Yusheng Ji. 2018. Plato: Learning-based adaptive streaming of 360-degree videos. In 2018 IEEE 43rd Conference on Local Computer Networks (LCN'18). IEEE, 393–400.
[20] Chamara Kattadige and Kanchana Thilakarathna. 2021. VAD360: Viewport aware dynamic 360-degree video frame tiling. arXiv preprint arXiv:2105.11563 (2021).
[21] Jean Le Feuvre and Cyril Concolato. 2016. Tiled-based adaptive streaming using MPEG-DASH. In Proceedings of the 7th International Conference on Multimedia Systems. ACM, 41.
[22] Weihe Li, Jiawei Huang, Wenjun Lyu, Baoshen Guo, Wanchun Jiang, and Jianxin Wang. 2022. RAV: Learning-based adaptive streaming to coordinate the audio and video bitrate selections. IEEE Transactions on Multimedia (2022), 1–14.
[23] Wen-Chih Lo, Ching-Ling Fan, Jean Lee, Chun-Ying Huang, Kuan-Ta Chen, and Cheng-Hsin Hsu. 2017. 360 video viewing dataset in head-mounted virtual reality. In Proceedings of the 8th ACM on Multimedia Systems Conference. 211–216.
[24] Kaixuan Long, Chencheng Ye, Ying Cui, and Zhi Liu. 2018. Optimal multi-quality multicast for 360 virtual reality video. In 2018 IEEE Global Communications Conference (GLOBECOM'18). IEEE, 1–6.
[25] Pietro Lungaro, Rickard Sjöberg, Alfredo Jose Fanghella Valero, Ashutosh Mittal, and Konrad Tollmar. 2018. Gaze-aware streaming solutions for the next generation of mobile VR experiences. IEEE Transactions on Visualization and Computer Graphics 24, 4 (2018), 1535–1544.
[26] Hongzi Mao, Ravi Netravali, and Mohammad Alizadeh. 2017. Neural adaptive video streaming with Pensieve. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication. ACM, 197–210.
[27] Gebremariam Mesfin, Nadia Hussain, Alexandra Covaci, and Gheorghita Ghinea. 2019. Using eye tracking and heart-rate activity to examine crossmodal correspondences QoE in mulsemedia. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 15, 2 (June 2019), Article 34, 22 pages.
[28] Afshin Taghavi Nasrabadi, Anahita Mahzari, Joseph D. Beshay, and Ravi Prakash. 2017. Adaptive 360-degree video streaming using scalable video coding. In Proceedings of the 2017 ACM on Multimedia Conference. ACM, 1689–1697.
[29] Afshin Taghavi Nasrabadi, Aliehsan Samiei, and Ravi Prakash. 2020. Viewport prediction for 360° videos: A clustering approach. In Proceedings of the 30th ACM Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV'20). ACM, New York, NY, 34–39.
[30] Khiem Quang Minh Ngo, Ravindra Guntur, and Wei Tsang Ooi. 2011. Adaptive encoding of zoomable video streams based on user access pattern. In Proceedings of the 2nd Annual ACM Conference on Multimedia Systems. 211–222.
[31] Duc V. Nguyen, Huyen T. T. Tran, Anh T. Pham, and Truong Cong Thang. 2019. An optimal tile-based approach for viewport-adaptive 360-degree video streaming. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 9, 1 (2019), 29–42.
[32] Duc V. Nguyen, Huyen T. T. Tran, and Truong Cong Thang. 2019. Adaptive tiling selection for viewport adaptive streaming of 360-degree video. IEICE Transactions on Information and Systems 102, 1 (2019), 48–51.
[33] Leandro Ordonez-Ante, Jeroen van der Hooft, Tim Wauters, Gregory Van Seghbroeck, Bruno Volckaert, and Filip De Turck. 2022. Explora-VR: Content prefetching for tile-based immersive video streaming applications. Journal of Network and Systems Management 30, 3 (2022), 1–30.
[34] Cagri Ozcinar, Julián Cabrera, and Aljosa Smolic. 2018. Omnidirectional video streaming using visual attention-driven dynamic tiling for VR. In 2018 IEEE Visual Communications and Image Processing (VCIP'18). IEEE, 1–4.
[35] C. Ozcinar, J. Cabrera, and A. Smolic. 2019. Visual attention-aware omnidirectional video streaming using optimal tiles for virtual reality. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 9, 1 (March 2019), 217–230.
[36] Stefano Petrangeli, Viswanathan Swaminathan, Mohammad Hosseini, and Filip De Turck. 2017. An HTTP/2-based adaptive streaming framework for 360 virtual reality videos. In Proceedings of the 2017 ACM on Multimedia Conference. ACM, 306–314.
[37] Feng Qian, Bo Han, Qingyang Xiao, and Vijay Gopalakrishnan. 2018. Flare: Practical viewport-adaptive 360-degree video streaming for mobile devices. In Proceedings of the 24th Annual International Conference on Mobile Computing and Networking. ACM, 99–114.
[38] Feng Qian, Lusheng Ji, Bo Han, and Vijay Gopalakrishnan. 2016. Optimizing 360 video delivery over cellular networks. In Proceedings of the 5th Workshop on All Things Cellular: Operations, Applications and Challenges. ACM, 1–6.
[39] Ngo Quang Minh Khiem, Guntur Ravindra, Axel Carlier, and Wei Tsang Ooi. 2010. Supporting zoomable video streams with dynamic region-of-interest cropping. In Proceedings of the 1st Annual ACM SIGMM Conference on Multimedia Systems. 259–270.
[40] Yago Sánchez, Robert Skupin, and Thomas Schierl. 2015. Compressed domain video processing for tile based panoramic streaming using HEVC. In 2015 IEEE International Conference on Image Processing (ICIP'15). IEEE, 2244–2248.
[41] Muhammad Shahid Anwar, Jing Wang, Sadique Ahmad, Asad Ullah, Wahab Khan, and Zesong Fei. 2020. Evaluating the factors affecting QoE of 360-degree videos and cybersickness levels predictions in virtual reality. Electronics 9, 9 (2020), 1530.
[42] Kevin Spiteri, Rahul Urgaonkar, and Ramesh K. Sitaraman. 2016. BOLA: Near-optimal bitrate adaptation for online videos. In The 35th Annual IEEE International Conference on Computer Communications (INFOCOM'16). IEEE, 1–9.
[43] Evgeniy Upenik and Touradj Ebrahimi. 2017. A simple method to obtain visual attention data in head mounted virtual reality. In 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW'17). 73–78.
[44] Hui Wang, Vu-Thanh Nguyen, Wei Tsang Ooi, and Mun Choon Chan. 2014. Mixing tile resolutions in tiled video: A perceptual quality assessment. In Proceedings of Network and Operating System Support on Digital Audio and Video Workshop. ACM, 25.
[45] Xuekai Wei, Mingliang Zhou, Sam Kwong, Hui Yuan, and Weijia Jia. 2022. A hybrid control scheme for 360-degree dynamic adaptive video streaming over mobile devices. IEEE Transactions on Mobile Computing 21, 10 (2022), 3428–3442.
[46] Xuekai Wei, Mingliang Zhou, Sam Kwong, Hui Yuan, Shiqi Wang, Guopu Zhu, and Jingchao Cao. 2021. Reinforcement learning-based QoE-oriented dynamic adaptive streaming framework. Information Sciences 569 (2021), 786–803.
[47] Chenglei Wu, Zhihao Tan, Zhi Wang, and Shiqiang Yang. 2017. A dataset for exploring user behaviors in VR spherical video streaming. In Proceedings of the 8th ACM on Multimedia Systems Conference. 193–198.
[48] Mengbai Xiao, Chao Zhou, Yao Liu, and Songqing Chen. 2017. OpTile: Toward optimal tiling in 360-degree video streaming. In Proceedings of the 25th ACM International Conference on Multimedia. 708–716.
[49] Lan Xie, Zhimin Xu, Yixuan Ban, Xinggong Zhang, and Zongming Guo. 2017. 360ProbDASH: Improving QoE of 360 video streaming using tile-based HTTP adaptive streaming. In Proceedings of the ACM Multimedia Conference. 315–323. https://doi.org/10.1145/3123266.3123291
[50] Praveen Kumar Yadav and Wei Tsang Ooi. 2020. Tile rate allocation for 360-degree tiled adaptive video streaming. In Proceedings of the 28th ACM International Conference on Multimedia. 3724–3733.
[51] Praveen Kumar Yadav, Arash Shafiei, and Wei Tsang Ooi. 2017. QUETRA: A queuing theory approach to DASH rate adaptation. In Proceedings of the 25th ACM International Conference on Multimedia (MM'17). ACM, New York, NY, 1130–1138.
[52] A. Yaqoob, T. Bi, and G. M. Muntean. 2019. A DASH-based efficient throughput and buffer occupancy-based adaptation algorithm for smooth multimedia streaming. In 2019 15th International Wireless Communications & Mobile Computing Conference (IWCMC'19). 643–649.
[53] A. Yaqoob, T. Bi, and G. M. Muntean. 2020. A survey on adaptive 360° video streaming: Solutions, challenges and opportunities. IEEE Communications Surveys & Tutorials 22, 4 (2020), 2801–2838.
[54] Abid Yaqoob and Gabriel-Miro Muntean. 2020. A weighted tile-based approach for viewport adaptive 360° video streaming. In 2020 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB'20). IEEE.
[55] Abid Yaqoob and Gabriel-Miro Muntean. 2021. A combined field-of-view prediction-assisted viewport adaptive delivery scheme for 360° videos. IEEE Transactions on Broadcasting 67, 3 (2021), 746–760.
[56] Abid Yaqoob, Mohammed Amine Togou, and Gabriel-Miro Muntean. 2022. Dynamic viewport selection-based prioritized bitrate adaptation for tile-based 360° video streaming. IEEE Access 10 (2022), 29377–29392.
[57] Hui Yuan, Shiyun Zhao, Junhui Hou, Xuekai Wei, and Sam Kwong. 2019. Spatial and temporal consistency-aware dynamic adaptive streaming for 360-degree videos. IEEE Journal of Selected Topics in Signal Processing 14, 1 (2019), 177–193.
[58] Zhenhui Yuan, Shengyang Chen, Gheorghita Ghinea, and Gabriel-Miro Muntean. 2014. User quality of experience of mulsemedia applications. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 11, 1s (2014), 1–19.
[59] Alireza Zare, Alireza Aminlou, Miska M. Hannuksela, and Moncef Gabbouj. 2016. HEVC-compliant tile-based streaming of panoramic video for virtual reality applications. In Proceedings of the 24th ACM International Conference on Multimedia. ACM, 601–605.
[60] Haodan Zhang, Yixuan Ban, Zongming Guo, Ken Chen, and Xinggong Zhang. 2022. RAM360: Robust adaptive multi-layer 360 video streaming with Lyapunov optimization. IEEE Transactions on Multimedia (2022).
[61] Lei Zhang, Yanyan Suo, Ximing Wu, Feng Wang, Yuchi Chen, Laizhong Cui, Jiangchuan Liu, and Zhong Ming. 2021. TBRA: Tiling and bitrate adaptation for mobile 360-degree video streaming. In Proceedings of the 29th ACM International Conference on Multimedia. 4007–4015.
[62] Yuanhong Zhang, Zhiwen Wang, Junquan Liu, Haipeng Du, Qinghua Zheng, and Weizhan Zhang. 2022. Deep reinforcement learning based adaptive 360-degree video streaming with field of view joint prediction. In 2022 IEEE Symposium on Computers and Communications (ISCC'22). 1–8.
[63] Yuanxing Zhang, Pengyu Zhao, Kaigui Bian, Yunxin Liu, Lingyang Song, and Xiaoming Li. 2019. DRL360: 360-degree video streaming with deep reinforcement learning. In IEEE Conference on Computer Communications (INFOCOM'19). IEEE, 1252–1260.
[64] Chao Zhou, Zhenhua Li, Joe Osgood, and Yao Liu. 2018. On the effectiveness of offset projections for 360-degree video streaming. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 14, 3s (2018), 1–24.
[65] Junni Zou, Chenglin Li, Chengming Liu, Qin Yang, Hongkai Xiong, and Eckehard Steinbach. 2019. Probabilistic tile visibility-based server-side rate adaptation for adaptive 360-degree video streaming. IEEE Journal of Selected Topics in Signal Processing 14, 1 (2019), 161–176.
