DOI: 10.1145/3487552.3487819
research-article
Open access

The shape of view: an alert system for video viewership anomalies

Published: 02 November 2021

Abstract

Internet video providers rely on alerting workflows to identify and remedy incidents that can impact users (e.g., outages or buggy players). There is growing evidence of the need for viewership-based analytics: detecting and diagnosing incidents that manifest through changes in viewership patterns but not in other (e.g., QoE) metrics. However, both detection and diagnosis of viewership anomalies are challenging due to the contextual nature of anomalies, the non-stationarity of viewership, and complex dependencies between the structure of events and how they impact different subpopulations of viewers. We present Proteas, an alerting framework for video viewership anomalies that tackles these challenges. Proteas builds on key spatiotemporal structural insights. First, across different subpopulations of viewers and days of the week, we find that the shape of the viewership curve remains invariant over multiple weeks, thus enabling anomaly detection. Second, we use the hierarchy of viewership groups to produce compact alerts. Finally, we find that common anomalies manifest with spatiotemporal signatures, which enables us to classify anomalies to produce actionable alerts. We evaluate Proteas using 3 months of real viewership data (including the onset of the COVID-19 pandemic) and show that Proteas is accurate, with a true positive rate of over 80% and an average precision of over 86% (i.e., few false positives), and does not miss any major events. In addition, we find that approximately half of Proteas's alerts refer to events not caught by other alerting workflows, thus adding value to operators' existing toolkit.
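
To make the first insight concrete, here is a minimal sketch of shape-based detection. It is not the Proteas implementation, which the abstract does not describe. The unit-sum scaling, the per-time-slot median over past weeks, the cosine distance, and the 0.05 threshold are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
from statistics import median


def shape(curve):
    """Scale a viewership curve to unit sum so that only its shape remains."""
    total = sum(curve)
    if total <= 0:
        raise ValueError("curve must have positive total viewership")
    return [v / total for v in curve]


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return 1.0 - dot / (norm_a * norm_b)


def reference_shape(past_curves):
    """Per-time-slot median of the normalized curves from earlier weeks."""
    shapes = [shape(c) for c in past_curves]
    return [median(slot) for slot in zip(*shapes)]


def is_shape_anomaly(today, past_curves, threshold=0.05):
    """Flag today's curve if its shape drifts from the reference shape.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    dist = cosine_distance(shape(today), reference_shape(past_curves))
    return dist > threshold, dist


# Toy data: the same day of the week over three past weeks.
# Volumes differ but the shape is identical.
past = [
    [10, 20, 40, 20, 10],
    [20, 40, 80, 40, 20],
    [15, 30, 60, 30, 15],
]
normal_day = [30, 60, 120, 60, 30]  # higher volume, same shape
dip_day = [30, 60, 5, 60, 30]       # viewership collapses mid-day

normal_flag, normal_dist = is_shape_anomaly(normal_day, past)
dip_flag, dip_dist = is_shape_anomaly(dip_day, past)
```

In this toy data the higher-volume day is not flagged, because only the shape is compared, while the day with a mid-day dip is flagged. A real deployment would still have to choose the grouping of viewers, the reference window, and the threshold.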


Cited By

  • (2023) Don't Forget the User. Proceedings of the 22nd ACM Workshop on Hot Topics in Networks, pages 109-116. DOI: 10.1145/3626111.3630095. Online publication date: 28-Nov-2023.
  • (2022) Temporal Correlation of Internet Observatories and Outposts. 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 247-254. DOI: 10.1109/IPDPSW55747.2022.00054. Online publication date: May-2022.

    Published In

    IMC '21: Proceedings of the 21st ACM Internet Measurement Conference
    November 2021
    768 pages
    ISBN:9781450391290
    DOI:10.1145/3487552
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    In-Cooperation

    • USENIX Assoc

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Conference

    IMC '21: ACM Internet Measurement Conference
    November 2 - 4, 2021
    Virtual Event

    Acceptance Rates

    Overall Acceptance Rate 277 of 1,083 submissions, 26%

    Article Metrics

    • Downloads (Last 12 months): 199
    • Downloads (Last 6 weeks): 24
    Reflects downloads up to 18 Feb 2025