Abstract
Existing website fingerprinting techniques are ineffective on video streaming traffic when the encrypted traffic contains multiple streams. This paper presents a deep learning-based method for identifying multiple video sources within a single encrypted tunnel. The core contribution is a novel feature, inspired by natural language processing (NLP), that allows existing NLP techniques to be applied to source identification. The paper describes the feature extraction method and builds a large dataset of video streaming and web traffic to verify its effectiveness. Experiments with several NLP methods show that the proposed method performs well on both binary and multilabel traffic classification problems. The results demonstrate that the method can overcome the challenges posed by mixed-traffic tunnels.
Availability of data and material
No.
Code availability
No.
Funding
None.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Shi, Y., Feng, D., Cheng, Y. et al. A natural language-inspired multilabel video streaming source identification method based on deep neural networks. SIViP 15, 1161–1168 (2021). https://doi.org/10.1007/s11760-020-01844-8
DOI: https://doi.org/10.1007/s11760-020-01844-8