research-article

Enhanced Machine Learning Sketches for Network Measurements

Published: 01 April 2023

Abstract

Network monitoring and management require accurate statistics on a variety of flow-level metrics, such as flow sizes, top-$k$ flows, and the number of flows. Arguably, sketches are currently the best technique for measuring these metrics. While a significant amount of work has already been done on sketching techniques, there is still considerable room for improvement because the accuracy of existing sketches varies with the changing characteristics of network traffic. In this paper, we propose using machine learning to improve the accuracy of sketches, and present a generic machine learning framework that reduces the dependence of sketch accuracy on network traffic characteristics. We further present three case studies in which we apply our machine learning framework to sketches for measuring three flow-level network metrics: flow sizes, top-$k$ flows, and the number of flows. We implemented and extensively evaluated this framework for these three metrics using both real-world and synthetic traffic traces. To the best of our knowledge, this is the first work that uses machine learning to reduce the dependence of sketching techniques on the characteristics of network traffic. We have released all our traces and implementation code on GitHub.
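
For readers unfamiliar with sketches, the following is a minimal illustration of a Count-Min sketch, a standard sketch for estimating flow sizes. It is not the machine learning framework proposed in the paper. The class name, the use of BLAKE2b as the hash function, and the width and depth values are assumptions chosen for the example. The sketch never underestimates a flow's size, but hash collisions can inflate the estimate, and how often that happens depends on the traffic.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-Min sketch: `depth` rows of `width` counters."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row: int, flow_id: str) -> int:
        # One hash function per row, obtained by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            flow_id.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, flow_id: str, count: int = 1) -> None:
        # Each packet increments one counter in every row.
        for r in range(self.depth):
            self.rows[r][self._index(r, flow_id)] += count

    def estimate(self, flow_id: str) -> int:
        # Collisions only add to counters, so the minimum is the tightest bound.
        return min(self.rows[r][self._index(r, flow_id)] for r in range(self.depth))


# Hypothetical traffic: one large flow and two small ones.
sketch = CountMinSketch(width=64, depth=3)
for flow_id in ["flow-A"] * 100 + ["flow-B"] * 5 + ["flow-C"]:
    sketch.update(flow_id)

print(sketch.estimate("flow-A"))  # at least 100; larger only if collisions occur
```

With a fixed memory budget, many concurrent flows cause more collisions and larger overestimates, which is one way the accuracy of a sketch comes to depend on traffic characteristics.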

Published In

IEEE Transactions on Computers, Volume 72, Issue 4
April 2023
323 pages

Publisher

IEEE Computer Society

United States
