Active learning for P2P traffic identification

Peer-to-Peer Networking and Applications

Abstract

Many works have proposed machine-learning methods for P2P traffic identification, but these methods require a large and representative set of labeled samples. To overcome the sample labeling problem, we present a new P2P traffic identification approach based on active learning, called P2PTIAL. P2PTIAL is composed of two parts: a support vector machine as the learner and distance-based uncertainty selection. To improve the effectiveness of P2PTIAL, we add a filtering policy and a balanced policy. First, we use support vector data description (SVDD) theory to filter out unlabeled samples that contribute little to active learning, which saves computation cost and storage space. Second, we use the pre-labeled information of the unlabeled samples to develop a balanced policy that keeps the learning balanced. Lastly, we support our design with extensive simulation experiments, and our results show that P2PTIAL is feasible.
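
As a rough illustration of two ideas named in the abstract (distance-based uncertainty selection and a balanced policy built on pre-labels), the sketch below assumes a linear SVM whose weights `w` and bias `b` have already been trained. The function names, the even split between the two pre-labels, and the fill-up rule are assumptions made for illustration only; the abstract does not specify them, and the SVDD filtering step is left out.

```python
import math


def decision_value(w, b, x):
    """Linear SVM decision function f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def select_queries(w, b, unlabeled, k):
    """Return indices of up to k unlabeled samples to send for labeling.

    Uncertainty: samples closest to the separating hyperplane come first.
    Balance (assumed rule): split the picks as evenly as possible between
    the two pre-labels, i.e. the signs the current learner predicts.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    scored = []
    for idx, x in enumerate(unlabeled):
        f = decision_value(w, b, x)
        pre_label = 1 if f >= 0 else -1
        scored.append((abs(f) / norm, idx, pre_label))
    scored.sort()  # smallest distance to the hyperplane first

    pos = [s for s in scored if s[2] == 1]
    neg = [s for s in scored if s[2] == -1]
    half = k // 2
    chosen = pos[:half] + neg[:k - half]

    # If one side ran short, fill up with the next most uncertain samples.
    if len(chosen) < k:
        taken = {s[1] for s in chosen}
        leftover = [s for s in scored if s[1] not in taken]
        chosen += leftover[:k - len(chosen)]
    return sorted(s[1] for s in chosen)


if __name__ == "__main__":
    # Hypothetical 2-D flow features; the hyperplane is x0 = 0.
    points = [(0.1, 0.0), (0.2, 0.0), (-0.5, 0.0), (3.0, 0.0)]
    print(select_queries([1.0, 0.0], 0.0, points, 2))
```

With these toy points, picking purely by distance would query the two samples pre-labeled positive, whereas the balanced split queries one sample from each side of the hyperplane.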

Acknowledgments

The authors would like to thank the anonymous referees for their very valuable comments. This work was supported in part by the National Science Foundation of China under Grants 60973140, 61170276, 61300170, 61373135, and 71371012; the Key Project for Outstanding Young Talents in Higher Education Institutions of Anhui Province of China under Grant 2013 SQRL034ZD; the Natural Science Foundation of the Higher Education Institutions of Jiangsu Province of China under Grant 12 KJA520003; and the Innovation Fund for Technology based Enterprise of Jiangsu Province of China under Grant BC2013027.

Author information

Corresponding author

Correspondence to San-Min Liu.

About this article

Cite this article

Liu, SM., Sun, ZX. Active learning for P2P traffic identification. Peer-to-Peer Netw. Appl. 8, 733–740 (2015). https://doi.org/10.1007/s12083-014-0281-3

  • DOI: https://doi.org/10.1007/s12083-014-0281-3