Abstract
Netflix and Hulu are examples of HTTP-based Adaptive Streaming (HAS). HAS is unique in that it attempts to manage the user’s perceived quality by adapting video quality. Current HAS research fails to address whether these adaptations actually make a difference. The main challenge in answering this question is the lack of consideration for the end user’s perceived quality. The research community is converging on an accepted set of ‘component metrics’ for HAS; however, determining an objective Quality of Experience (QoE) estimate remains an open issue. We conducted a between-subject user study of Netflix to shed light on the user’s perception of quality. We found that users prefer to receive lower video quality first, with marginal improvements made over time, whereas content providers currently switch between the highest and lowest quality levels. This paper describes an alternative method that led to higher user satisfaction as measured by Mean Opinion Score (MOS) values.
Keywords
- Perceived video quality
- Internet video streaming
- HTTP-based adaptive streaming
- Simulation modeling
- Home network
- Video performance assessment
- User-Experience assessment
1 Introduction
Sandvine’s recent Internet usage report estimates that 65 % of downstream traffic during peak usage times for fixed access networks is ‘real-time entertainment’ [1]. This traffic category represents streamed content and consists primarily of Netflix and YouTube traffic. Ten years ago, the term video streaming implied UDP transport; now, video streaming typically refers to HTTP-based adaptive streaming (HAS). Similar HAS approaches have evolved at companies such as Netflix, Microsoft, Apple, and Google, and this evolution motivated the development of the Dynamic Adaptive Streaming over HTTP (DASH) protocol. DASH provides a standard method for containing and distributing video content over the Internet [2, 3]. While it is not clear when, or if, the current set of HAS applications will converge towards a single standard, it is clear that HAS applications will be the dominant consumer of bandwidth in broadband access networks for the foreseeable future. Given its popularity, we chose Netflix as the video content delivery system for our study.
The idea behind HAS is that matching the video content bitrate to the available path bandwidth leads to a better user experience and to reduced bandwidth consumption compared to streaming the video at a fixed bitrate. This implies that the application voluntarily gives up available TCP bandwidth on the assumption that doing so improves the end user experience. This behavior is rationalized throughout the literature: the work in [4] suggests that buffer stalls have the biggest impact on user engagement; the work in [5] suggests that frequent adaptations are distracting; and the work in [6] suggests that sudden changes in video quality trigger poor subjective scores. In addition, several recent performance studies of HAS (e.g., [7–9]) do consider Quality of Experience (QoE). However, determining the perceived QoE of a video streaming session is very complex because the assessment depends on many factors, including the viewer, the video encoding details, and the content.
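To make this adaptation loop concrete, the following is a minimal sketch of the kind of throughput-driven rate selection a HAS player performs; the bitrate ladder, safety margin, and buffer threshold are hypothetical values chosen for illustration, not the logic of Netflix or any other specific player.

```python
# Minimal sketch of a throughput-driven HAS rate-selection heuristic.
# The ladder, safety margin, and buffer threshold are hypothetical;
# they illustrate the idea, not any specific player's algorithm.

BITRATE_LADDER_KBPS = [235, 375, 750, 1750, 3000, 5800]  # example encoding ladder
SAFETY_MARGIN = 0.8        # use only 80% of the measured throughput
MIN_BUFFER_SECONDS = 10.0  # below this, fall back to the lowest bitrate

def select_bitrate(measured_throughput_kbps: float, buffer_seconds: float) -> int:
    """Pick the highest bitrate the measured throughput can sustain.

    If the playout buffer is nearly empty, drop to the lowest bitrate
    to avoid a stall (the event shown in [4] to hurt engagement most).
    """
    if buffer_seconds < MIN_BUFFER_SECONDS:
        return BITRATE_LADDER_KBPS[0]
    budget = measured_throughput_kbps * SAFETY_MARGIN
    candidates = [rate for rate in BITRATE_LADDER_KBPS if rate <= budget]
    return candidates[-1] if candidates else BITRATE_LADDER_KBPS[0]

if __name__ == "__main__":
    # 4000 kbps measured, healthy buffer -> picks 3000, not 5800:
    print(select_bitrate(4000, buffer_seconds=25.0))  # 3000
    # Same throughput, but buffer nearly drained -> lowest rung:
    print(select_bitrate(4000, buffer_seconds=4.0))   # 235
```

Note that the player deliberately leaves 20 % of the measured throughput unused; this headroom is exactly the “voluntarily given up” bandwidth discussed above.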
This study explores techniques content providers can use to positively influence user perception and investigates the power of setting user expectations. Our work focuses on two research questions. First, we address the issue of how quickly video rendering should start. The tradeoff may seem trivial: when network conditions are excellent, should HAS build up buffered content while showing the user fairly high resolution video, or should it spend all of the available bandwidth on rendering the highest resolution video, leaving no reserve of buffered content for the times when network impairments are experienced?
To provide insight into this complex question, we investigate two simpler questions. First, if video quality degradation is necessary, is it beneficial for video content providers to deliver content at two extremes (i.e., starting a video at a lower quality and eventually moving to the highest quality, or conversely, starting at the highest quality and degrading over time)? Or, if the user’s expectation is set low by low video quality in the beginning, are users satisfied with a sub-par, but slightly better, quality for the rest of the video? Second, do users have different video quality expectations if they are told that the video they are watching is online content, such as Netflix, versus cable TV provider content, such as an on-demand movie?
This work is a continuation of a previous study in which user satisfaction with an online game was evaluated. In the previous study, end users’ home networks were emulated, including network impairments such as packet delay, using the Linux network emulator netem. While under observation, users played a popular online game, Call of Duty: Modern Warfare 2, on an Xbox 360. Survey responses were used to calculate a Mean Opinion Score (MOS). It was concluded that although more experienced gamers are more sensitive to network delays than novice users, they are still unable to adequately quantify the amount of degradation they experience. Because online video is also a large source of bandwidth consumption in homes today [1], we evaluated the online video streaming space in this study. In the research presented in this paper, a subjective evaluation of HAS in the presence of controlled levels of network impairment was performed by conducting actual user studies. The work in [21] supports the argument that QoE for multimedia services should be driven by the user’s perception of quality rather than by raw engineering parameters such as latency, jitter, and bandwidth.
2 Related Work
The networking community is just beginning to study DASH-like protocols. Most of the recent work has focused on characterizing widely deployed applications like Netflix or YouTube [7–9, 14–17]. The work in [7, 8] is similar to this work as they incorporate QoE assessment in the analysis methodology. Jiang, Sekar, and Zhang introduce an instability metric in [7] that quantifies the level and relative weight of bitrate switches.
In addition, very few studies assess the perceived quality of modern Internet protocol television (IPTV) video distribution systems. The art of perceived quality assessment of video broadcasts is well established; however, the majority of this work focuses on traditional broadcast technology and assumes a methodology involving reference streams, in which the deviation of the received content from the original content is assessed using numerous standard metrics [19, 20].
Assessing the perceived quality of video without a reference is much more challenging. The 3GPP community has identified several quality metrics for DASH, including HTTP request/response transaction times, average throughput, and initial playout delay [2]. In [4], the authors explore measures that impact the perceived quality of Internet broadcasts and find that the percent buffering time has the largest impact on user engagement, although the specific impact varies by content genre. Other metrics that are established in the Internet broadcast community are listed below; a sketch of how they can be computed from a playback log follows the list.
- Zapping time: the time required for a new stream to begin rendering.
- Rate of re-buffering: the frequency of playback interruptions caused by re-buffering events.
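The following sketch shows how these component metrics, along with the percent buffering time studied in [4], could be computed from a player event log; the event format and field names are hypothetical, not drawn from any real player API.

```python
# Sketch: computing the component metrics above from a hypothetical
# player event log. The event format is illustrative only.
from dataclasses import dataclass

@dataclass
class PlaybackEvent:
    timestamp: float  # seconds since the stream was requested
    kind: str         # "request", "first_frame", "stall_start", "stall_end"

def zapping_time(events: list) -> float:
    """Time from the stream request to the first rendered frame."""
    start = next(e.timestamp for e in events if e.kind == "request")
    first = next(e.timestamp for e in events if e.kind == "first_frame")
    return first - start

def rebuffer_metrics(events: list, session_seconds: float):
    """Return (re-buffering events per minute, percent buffering time)."""
    starts = [e.timestamp for e in events if e.kind == "stall_start"]
    ends = [e.timestamp for e in events if e.kind == "stall_end"]
    stalled = sum(end - start for start, end in zip(starts, ends))
    rate_per_min = len(starts) / (session_seconds / 60.0)
    return rate_per_min, 100.0 * stalled / session_seconds

if __name__ == "__main__":
    log = [PlaybackEvent(0.0, "request"), PlaybackEvent(2.1, "first_frame"),
           PlaybackEvent(95.0, "stall_start"), PlaybackEvent(98.5, "stall_end")]
    print(zapping_time(log))             # 2.1 s to first frame
    print(rebuffer_metrics(log, 600.0))  # (0.1 stalls/min, ~0.58 % buffering time)
```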
The work in [8] performed subjective tests to determine which bitrate adaptation behaviors led to the highest overall perceived quality. The work in [18] evaluated three commercial HAS products (Microsoft Smooth Streaming, Adobe Dynamic Streaming, and Apple Live Streaming) as well as an open source HAS implementation. The authors determined that, in vehicular scenarios, none of the methods consistently achieves the maximum available bandwidth with a minimal number of quality switches.
To date, there has not been a human factors study that correlates the impact of network conditions on the quality perceived by end users viewing HAS-based streamed content.
3 Methodology
Video content type could make a difference in the perceived quality. An action movie was chosen for this study because the special effects in an action movie allow for easier detection of visual artifacts; an artifact is an anomaly that appears in the visual representation of digital graphics and imagery. Using the action movie content, preliminary studies were conducted to determine the appropriate levels of packet loss for the actual study. One requirement in selecting the packet loss rates was that no buffering message should appear on the screen during the study. When a packet loss rate of 12 % was used, a buffering message appeared; therefore, the worst packet loss setting had to be below 12 %. Expert viewers from the University’s Digital Production group watched the movie clip at various packet loss levels below 12 % to determine settings at which artifacts could be detected.
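As a concrete illustration, the following sketch shows how a packet loss rate could be imposed with netem, the Linux network emulator named in our earlier study; the network interface and the 8 % rate are placeholder assumptions, not the actual Table 1 settings.

```python
# Sketch: imposing a packet loss rate with the Linux netem emulator.
# The interface name and the 8% rate are placeholders; the study's
# actual Table 1 settings are not reproduced here. Requires root.
import subprocess

IFACE = "eth0"  # assumed network interface

def set_packet_loss(percent: float) -> None:
    """Attach (or update) a netem qdisc that drops `percent` of packets."""
    # "replace" adds the qdisc if absent and updates it if present.
    subprocess.run(
        ["tc", "qdisc", "replace", "dev", IFACE, "root",
         "netem", "loss", f"{percent}%"],
        check=True)

def clear_impairment() -> None:
    """Remove the netem qdisc, restoring normal forwarding."""
    subprocess.run(["tc", "qdisc", "del", "dev", IFACE, "root"], check=True)

if __name__ == "__main__":
    set_packet_loss(8.0)  # placeholder rate below the 12% stall threshold
    # ... stream and record the clip under impairment ...
    clear_impairment()
```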
3.1 Experiment Setup
The action video clip used in the study was 10 min in length. For the actual experiment, three settings were considered based on the findings from the preliminary study. See Table 1 for the packet loss settings used in the study.
3.2 Study Design
A between-subject approach was taken in which each user viewed only one setting. To have more control over the video content, each setting described in Table 1 was pre-recorded: for each setting, the video was played with the appropriate beginning and ending quality, and screen capture software, Fraps (see Note 1), was used to capture the video. Therefore, every participant scheduled to view the Setting 1 video saw the exact same video clip; the same is true for Settings 2 and 3.
A total of 56 experiments were performed, which resulted in 112 survey responses. To test the hypothesis related to online vs. on-demand content, half of the participants were told by the experimenter that they were watching a free online movie, while the other half were told they were watching a paid on-demand style movie. No more than four users participated in any one experiment; the group size was kept small to preserve a fairly straight viewing angle for each user. The distance between the viewers and the 32-inch 1080p HD TV on which the video content was displayed was approximately 6 feet, a typical viewing distance in a home setting.
3.3 Experimental Procedure
Prior to watching the movie clip, participants completed a pre-survey questionnaire that collected demographic information as well as information about their movie watching behavior. Following the movie, participants completed a post-survey questionnaire in which they used a 5-point Likert scale to assess their “Level of Frustration”, “Video Clarity”, and “Distortion” for the beginning and end of the video clip as well as for their overall experience. Video clarity was defined as the crispness of the image, while distortion was defined as the presence of artifacts in the image. Participants were also asked about sound distortion.
There were two additional questions that were very important to our stakeholders. We asked users how much they would be willing to pay for a movie with the current video quality and how likely they would be to switch service providers if they continuously received service at this level. These responses were recorded and analyzed to determine whether there was a correlation between willingness to pay and whether the user was told they were watching an online movie versus an on-demand movie.
Finally, there was an oral question: each user was asked whether they would prefer to have poor quality at the beginning or at the end of a movie. There was also an option to indicate that any degradation in quality is unacceptable.
4 Results and Analysis
4.1 Calculating Mean Opinion Scores (MOS)
The responses to the post-survey questions were used to calculate Mean Opinion Score (MOS) values, which quantify the user’s QoE. Participant responses were first categorized by on-demand or online perception. Next, they were categorized by setting (Bad to Good, Good to Bad, and Bad to OK), as illustrated in Fig. 1.
Within each of these categories, the responses for the reported level of frustration at the beginning of the video, the responses for the video clarity at the beginning of the video, and the responses for the distortion at the beginning of the video were each averaged, creating three MOS values. These three values were then averaged to create the overall MOS for the beginning of the video. The same process was used to calculate the MOS for the end of the video, as well as for the overall opinion (see Fig. 1). The MOS calculation breakdown can be seen in Fig. 2. The process shown in Fig. 2 was repeated for each setting, and MOS values were likewise calculated for each setting under the online condition.
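As a minimal sketch of this aggregation, the following computes a composite MOS from three sets of 5-point Likert responses; the sample responses are hypothetical, and we assume the frustration scale is reverse-coded so that 5 is best on all three questions.

```python
# Sketch of the MOS aggregation described above: each question's
# 5-point responses are averaged, then the three means are averaged.
# Sample responses are hypothetical; frustration is assumed to be
# reverse-coded so that 5 is best on every scale.
from statistics import mean

def mos(frustration: list, clarity: list, distortion: list) -> float:
    """Average each question's responses, then average the three means."""
    return mean([mean(frustration), mean(clarity), mean(distortion)])

if __name__ == "__main__":
    # Hypothetical responses for the beginning of the video (1 = worst, 5 = best):
    beginning = mos(frustration=[2, 3, 2], clarity=[3, 3, 4], distortion=[2, 2, 3])
    print(round(beginning, 2))  # 2.67
```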
4.2 Interpreting Mean Opinion Scores
Based on the results, the Bad to OK setting has the highest overall MOS, so starting with low video quality and then improving it slightly satisfied most of the users. The Bad to OK overall MOS results support our first hypothesis: in the presence of network congestion, video quality should start low and eventually move to a somewhat higher, but still sub-par, level. Both settings that involved the two extremes (Good to Bad and Bad to Good) resulted in lower MOS values, which also supports our first hypothesis that the content providers’ current method of switching between extremes is not beneficial. The responses to the two extreme settings likely result from the fact that users get to see the “best quality”: they know the quality they should have been receiving the entire time, and the drastic change is far more visible than a modest one. Since the user’s expectation is set low by low video quality in the beginning, users actually prefer a sub-par (but slightly better) quality for the rest of the video over a jump to the best quality, which supports our hypothesis. The results for the on-demand and online studies are illustrated in Figs. 3 and 4, respectively.
When comparing Figs. 3 and 4, it is also apparent that the overall MOS for all three settings is lower for the online study than for the on-demand study. This supports our second hypothesis that users’ perceived level of quality can be influenced by setting a level of expectation based on the method used to provide the content.
The MOS results for the “Willingness to Pay” metric, seen in Fig. 5, also support this hypothesis: the MOS for each setting is lower in the online study than in the on-demand study.
When asked whether they preferred degradation in quality at the beginning or the end of the video, a plurality (45 %) reported that any degradation in quality is unacceptable.
Of the participants who would accept degradation, most preferred that it occur at the beginning. This supports the hypothesis that starting at poor quality and improving slightly over time is beneficial.
5 Conclusion
Our analysis supported our hypothesis that users have different levels of expectation based on the method used to deliver the video content. When users watch online movie content such as Netflix or Hulu, they have a lower level of expectation than when they watch on-demand content. This preset expectation influences the way they perceive and assess the video quality as well as the amount they are willing to spend on the content. Our results also suggest that when a stream is starting and network conditions impair quality, the system should begin at a reduced quality and improve over time. Content providers can still satisfy customers if they start with a low quality stream and improve the quality marginally over time. This could lead to better bandwidth management for cable providers and conserve network resources.
Notes
- 1. More details about the screen capture software are available at: www.fraps.com.
References
1. Global Internet phenomena report. Sandvine Corporation, 2H (2012). http://www.sandvine.com/downloads/-documents/Phenomena_2H_2012/Sandvine_Global_Internet_Phenomena_Report_2H_2012.pdf
2. 3GPP TS 26.247 version 10.1.0 Release 10: Transparent end-to-end packet switched streaming service (PSS); progressive download and dynamic adaptive streaming over HTTP. 3GPP, January 2012
3. ISO/IEC: Information technology — MPEG systems technologies — part 6: dynamic adaptive streaming over HTTP (DASH), January 2011
4. Dobrian, F., Awan, A., Joseph, D., Ganjam, A., Zhan, J., Sekar, V., Stoica, I., Zhang, H.: Understanding the impact of video quality on user engagement. In: Proceedings of ACM SIGCOMM 2011, August 2011
5. Cranley, N., Perry, P., Murphy, L.: User perception of adapting video quality. Int. J. Hum. Comput. Stud. 64(8), 637–647 (2006)
6. Muller, C., Timmerer, C.: A testbed for the dynamic adaptive streaming over HTTP featuring session mobility. In: Proceedings of ACM MMSys, February 2011
7. Jiang, J., Sekar, V., Zhang, H.: Improving fairness, efficiency, and stability in HTTP-based adaptive video streaming with FESTIVE. In: Proceedings of ACM CoNEXT 2012, December 2012
8. Mok, R., Luo, X., Chan, E., Chang, R.: QDASH: a QoE-aware DASH system. In: Proceedings of ACM MMSys, February 2012
9. Huang, T., Handigol, N., Heller, B., McKeown, N., Johari, R.: Confused, timid, and unstable: picking a video streaming rate is hard. In: Proceedings of ACM IMC 2012, November 2012
10. Martin, J., Fu, Y., et al.: Characterizing Netflix bandwidth consumption. In: Proceedings of IEEE CCNC, January 2013
11. Martin, J., Fu, Y., Hong, G.: On the efficacy of the dynamic adaptive streaming over HTTP (DASH) protocol – extended version. Technical report (2013). http://www.cs.clemson.edu/~jmarty/papers/EfficacyDASHExtended.pdf
12. Balachandran, A., Sekar, V., Akella, A., Seshan, S., Stoica, I., Zhang, H.: A quest for an Internet video quality-of-experience metric. In: Proceedings of ACM HotNets 2012, October 2012
13. Oyman, O., Singh, S.: Quality of experience for HTTP adaptive streaming services. IEEE Communications Magazine, April 2012
14. Akhshabi, S., Begen, A., Dovrolis, C.: An experimental evaluation of rate-adaptation algorithms in adaptive streaming over HTTP. In: Proceedings of ACM MMSys, February 2011
15. Liu, C., Bouazizi, I., Gabbouj, M.: Rate adaptation for adaptive HTTP streaming. In: Proceedings of ACM MMSys, February 2011
16. Lederer, S., Muller, C., Timmerer, C.: Dynamic adaptive streaming over HTTP dataset. In: Proceedings of ACM MMSys, February 2012
17. Akhshabi, S., Anantakrishnan, L., Dovrolis, C., Begen, A.: What happens when HTTP adaptive streaming players compete for bandwidth. In: Proceedings of ACM NOSSDAV 2012, June 2012
18. Muller, C., Lederer, S., Timmerer, C.: An evaluation of dynamic adaptive streaming over HTTP in vehicular environments. In: Proceedings of ACM MoVid 2012, February 2012
19. Wang, Z., Bovik, A., Sheikh, H., et al.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 14(12), 2117–2128 (2005)
20. Xia, J., Shi, Y., Teunissen, K., Heynderickx, I.: Perceivable artifacts in compressed video and their relation to video quality. Signal Process. Image Commun. 24, 548–556 (2009)
21. Agboma, F., Liotta, A.: Addressing user expectations in mobile content delivery. Mob. Inf. Syst. 3(3), 1 (2007)
Acknowledgements
This material is based upon work supported by CableLabs, Inc. Opinions or points of views expressed in this document are those of the authors and do not necessarily reflect the official position of, or a position endorsed by CableLabs, Inc.