research-article

A Visual Sensitivity Aware ABR Algorithm for DASH via Deep Reinforcement Learning

Authors:

Wenchao Jiang

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 3

Article No.: 77, Pages 1–22

https://doi.org/10.1145/3591108

Published: 10 November 2023

Abstract

Adaptive video streaming technology has been proposed to cope with fluctuating network bandwidth and to provide smooth video services. In particular, adaptive bitrate (ABR) algorithms are widely used in Dynamic Adaptive Streaming over HTTP (DASH) to improve quality of experience (QoE). However, existing ABR algorithms ignore the inherent visual sensitivity of the human visual system (HVS). As the final receiver of the video, the HVS is sensitive to quality distortion to different degrees depending on the video content, and content with high visual sensitivity needs to be allocated more bitrate. Existing ABR algorithms are therefore limited in how reasonably they allocate bitrate and how well they maximize QoE. To address this problem, this paper designs an adaptive bitrate strategy from the perspective of user vision, studies how to model visual sensitivity, and proposes a visual sensitivity aware ABR algorithm. We extract a set of content features and attribute features from the video and, by simulating the HVS, establish a total masking effect model that reflects visual sensitivity more accurately. The network status, buffer occupancy, and visual sensitivity are then jointly considered in a deep reinforcement learning framework to select the bitrate that maximizes QoE. We implement the proposed algorithm, evaluate it with realistic trace-driven experiments, and compare its performance with several state-of-the-art algorithms. Experimental results show that our algorithm aligns the ABR strategy with visual sensitivity, achieving better QoE on content with high visual sensitivity, and improves the average perceptual video quality and overall user QoE by 18.3% and 22.8%, respectively. Additionally, we demonstrate the feasibility of our algorithm through a subjective evaluation in a real environment.
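The abstract says the agent weighs network status, buffer occupancy, and visual sensitivity when choosing a bitrate, but it does not give the state or reward definitions. As a rough illustration only, the sketch below assumes a Pensieve-style linear QoE reward (quality minus a rebuffering penalty minus a smoothness penalty) in which a hypothetical per-chunk visual sensitivity score scales the quality term, plus a flat observation vector a reinforcement learning agent could consume. The function names (`chunk_reward`, `build_state`), the penalty weights, and the multiplicative sensitivity rule are all assumptions, not the paper's formulation.

```python
from typing import List, Sequence

# Hypothetical penalty weights in the style of a linear QoE reward
# (bitrate in Mbps, stall in seconds); NOT values taken from the paper.
REBUF_PENALTY = 4.3   # reward lost per second of rebuffering
SMOOTH_PENALTY = 1.0  # reward lost per Mbps of quality switch


def chunk_reward(quality: float, prev_quality: float, rebuffer_s: float,
                 sensitivity: float) -> float:
    """Per-chunk reward for one download decision.

    `sensitivity` is a hypothetical visual sensitivity score (>= 0) for the
    chunk: the higher it is, the more the chunk's quality counts, so the
    agent is pushed to spend bitrate on sensitive content.
    """
    return (sensitivity * quality
            - REBUF_PENALTY * rebuffer_s
            - SMOOTH_PENALTY * abs(quality - prev_quality))


def build_state(throughputs_mbps: Sequence[float], buffer_s: float,
                sensitivity_next: float, history: int = 8) -> List[float]:
    """Flatten an illustrative observation: the last `history` throughput
    samples (left-padded with zeros), the buffer level in seconds, and the
    hypothetical sensitivity score of the next chunk."""
    recent = list(throughputs_mbps)[-history:]
    recent = [0.0] * (history - len(recent)) + recent  # pad short histories
    return recent + [buffer_s, sensitivity_next]


if __name__ == "__main__":
    # Same 3 Mbps chunk, no stall, no switch: a highly sensitive chunk
    # earns more reward than a weakly sensitive one.
    print(chunk_reward(3.0, 3.0, 0.0, 1.5))  # 4.5
    print(chunk_reward(3.0, 3.0, 0.0, 0.5))  # 1.5
    print(build_state([1.0, 2.0], 10.0, 0.8, history=4))
```

Scaling only the quality term is the simplest way to make the reward favor sensitive content; a real design could instead weight the stall or switching penalties, or feed the sensitivity score to the policy without changing the reward at all.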


Index Terms

Information systems → Information systems applications → Multimedia information systems → Multimedia streaming



Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 20, Issue 3

March 2024

665 pages

EISSN:1551-6865

DOI:10.1145/3613614

Editor:
Abdulmotaleb El Saddik
Mohamed Bin Zayed University of Artificial Intelligence, UAE and University of Ottawa, Canada

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 November 2023

Online AM: 29 June 2023

Accepted: 26 March 2023

Revised: 28 December 2022

Received: 16 April 2022

Published in TOMM Volume 20, Issue 3

Funding Sources

Project of End to End Transmission Theory and Key Technologies Ensuring Deterministic Delay
Research on Load Balancing Mechanism for Heterogeneous Traffic in Data Center Network
Key Project of Guangxi Science & Technology
Ministry of Education, Singapore, under its Academic Research Fund Tier 2
National Research Foundation, Singapore and Infocomm Media Development Authority under its Future Communications Research & Development Programme; and the Key Project of Guangxi Science & Technology

Article Metrics

Total Citations: 0
Total Downloads: 279
Downloads (Last 12 months): 176
Downloads (Last 6 weeks): 6

Reflects downloads up to 10 Nov 2024