Article

Robust representation for domain adaptation in network security

Authors:

Michal Sofka

ECMLPKDD'15: Proceedings of the 2015th European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part III

Pages 116 - 132

Published: 07 September 2015

Abstract

The goal of domain adaptation is to handle training and testing data sets whose joint distributions of observations and labels differ. This problem arises in many practical situations, for example when a malware detector is trained on data labeled at a certain point in time, but the malware later evolves to evade detection. We address the problem by introducing a new representation which ensures that the conditional distribution of the observations given the labels is the same in the training and testing data. The representation is computed for bags of samples (network traffic logs) and is designed to be invariant under shifting and scaling of the feature values extracted from the logs, and under permutation and size changes of the bags. This invariance is achieved by relying on a self-similarity matrix computed for each bag. Our experiments show that the representation is effective for training a detector of malicious traffic in large corporate networks. Compared to the case without domain adaptation, the recall of the detector improves from 0.81 to 0.88 and the precision from 0.998 to 0.999.
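The invariance properties described above can be illustrated with a minimal sketch. It assumes a single scalar feature per sample, uses pairwise absolute differences as the self-similarity measure, normalizes the matrix by its largest entry, and summarizes it as a normalized histogram. The helper names (`self_similarity`, `normalize`, `bag_histogram`) and the bin count are hypothetical choices made for illustration; this is not the exact construction used in the paper.

```python
from typing import List, Sequence


def self_similarity(values: Sequence[float]) -> List[List[float]]:
    # Pairwise absolute differences |x_i - x_j| for one feature within a bag.
    # Adding the same constant to every value leaves them unchanged (shift invariance).
    return [[abs(a - b) for b in values] for a in values]


def normalize(matrix: List[List[float]]) -> List[List[float]]:
    # Divide by the largest entry, so multiplying every value by the same
    # nonzero constant leaves the result unchanged (scale invariance).
    peak = max(max(row) for row in matrix)
    if peak == 0:
        return [[0.0 for _ in row] for row in matrix]
    return [[v / peak for v in row] for row in matrix]


def bag_histogram(values: Sequence[float], bins: int = 4) -> List[float]:
    # Hypothetical bag descriptor: a histogram of the off-diagonal entries of
    # the normalized self-similarity matrix. A histogram ignores sample order
    # (permutation invariance); dividing by the number of entries puts bags
    # of different sizes on the same scale.
    m = normalize(self_similarity(values))
    counts = [0] * bins
    for i, row in enumerate(m):
        for j, v in enumerate(row):
            if i == j:
                continue  # skip the trivial zero diagonal
            counts[min(int(v * bins), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


if __name__ == "__main__":
    base = bag_histogram([1.0, 2.0, 4.0, 8.0])
    print(base)                                              # four bins summing to 1
    print(bag_histogram([11.0, 12.0, 14.0, 18.0]) == base)  # shifted: True
    print(bag_histogram([2.0, 4.0, 8.0, 16.0]) == base)     # scaled: True
    print(bag_histogram([8.0, 1.0, 4.0, 2.0]) == base)      # permuted: True
```

Shifting, scaling by a nonzero factor, or reordering the bag yields the same descriptor, because each operation either leaves the pairwise differences unchanged or multiplies them all by the same factor, which the normalization removes. Adding or removing samples generally changes the histogram somewhat; in this sketch the normalization only makes bags of different sizes comparable rather than strictly identical.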

Cited By

Bartos, K., Sofka, M., Franc, V.: Learning invariant representation for malicious network traffic detection. In: Proceedings of the Twenty-second European Conference on Artificial Intelligence, pp. 1132-1139 (2016). DOI: 10.3233/978-1-61499-672-9-1132. Online publication date: 29 Aug 2016
https://dl.acm.org/doi/10.3233/978-1-61499-672-9-1132

Index Terms

Security and privacy

Index terms have been assigned to the content through auto-classification.

Published In

ECMLPKDD'15: Proceedings of the 2015th European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part III

September 2015

340 pages

ISBN:9783319234601

Editors:
Albert Bifet, Huawei Noah's Ark Lab, Shatin, Hong Kong SAR
Michael May, Siemens AG Corporate Technology, München, Germany
Bianca Zadrozny, IBM Research Brazil, Rio de Janeiro, Brazil
Ricard Gavalda, Universitat Politècnica de Catalunya, Barcelona, Spain
Dino Pedreschi, Università di Pisa, Pisa, Italy

Sponsors

Huawei Technologies Co. Ltd.
Zalando
U.S. Office of Naval Research Global (ONRGlobal)
BNP PARIBAS
Amazon.com

Publisher

Springer

Gewerbestrasse 11 CH-6330, Cham (ZG), Switzerland

Qualifiers

Article

Bibliometrics

Total citations: 1
Total downloads: 0 (last 12 months: 0; last 6 weeks: 0)
Reflects downloads up to 03 Oct 2024