Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
article
Free access

Temporal sequence learning and data reduction for anomaly detection

Published: 01 August 1999 Publication History

Abstract

The anomaly-detection problem can be formulated as one of learning to characterize the behaviors of an individual, system, or network in terms of temporal sequences of discrete data. We present an approach on the basis of instance-based learning (IBL) techniques. To cast the anomaly-detection task in an IBL framework, we employ an approach that transforms temporal sequences of discrete, unordered observations into a metric space via a similarity measure that encodes intra-attribute dependencies. Classification boundaries are selected from an a posteriori characterization of valid user behaviors, coupled with a domain heuristic. An empirical evaluation of the approach on user command data demonstrates that we can accurately differentiate the profiled user from alternative users when the available features encode sufficient information. Furthermore, we demonstrate that the system detects anomalous conditions quickly — an important quality for reducing potential damage by a malicious user. We present several techniques for reducing data storage requirements of the user profile, including instance-selection methods and clustering. As empirical evaluation shows that a new greedy clustering algorithm reduces the size of the user model by 70%, with only a small loss in accuracy.

References

[1]
AHA, D. W., KIBLER, D., AND ALBERT, M. K. 1991. Instance-based learning algorithms. Mach. Learn. 6, 1 (Jan. 1991), 37-66.
[2]
ANGLUIN, D. 1987. Learning regular sets from queries and counterexamples. Inf. Comput. 75, 2 (Nov. 1, 1987), 87-106.
[3]
ASLAM, J. A. AND RIVEST, R. L. 1990. Inferring graphs from walks. In Proceedings of the 3rd Annual Workshop on Computational Learning Theory (COLT '90, Rochester, NY, Aug. 6-8), M. Fulk and J. Case, Eds. Morgan Kaufmann Publishers Inc., San Francisco, CA, 359-370.
[4]
BALASUBRAMANIYAN, J. S., GARCIA-FERNANDEZ, J. O., ISACOFF, D., SPAFFORD, E., AND ZAMBONI, D. 1998. An architecture for intrusion detection using autonomous agents. Tech. Rep. COAST TR 98/05. Purdue University, West Lafayette, IN.
[5]
BOLLOB S, B., DAS, G., GUNOPULOS, D., AND MANNILA, H. 1997. Time-series similarity problems and well-separated geometric sets. In Proceedings of the 13th Annual Symposium on Computational Geometry (Nice, France, June 4-6, 1997), J.-D. Boissonnat, Ed. ACM Press, New York, NY, 454-456.
[6]
CASELLA, G. AND BERGER, R. L. 1990. Statistical Inference. Brooks-Cole, CA.
[7]
CHARNIAK, E. 1997. Statistical techniques for natural language parsing. AI Mag. 18, 4, 33-43.
[8]
CHENOWETH, T. AND OBRADOVIC, Z. 1996. A multi-component nonlinear prediction system for the S&P 500 index. Neurocomputing 10, 3, 275-290.
[9]
CORMEN, T. H., LEISERSON, C. E., AND RIVEST, R. L. 1990. Introduction to Algorithms. MIT Press, Cambridge, MA.
[10]
DAS, G., GUNOPULOS, D., AND MANNILA, H. 1997. Finding similar time series. In Proceedings of the ACM SIGMOD International Workshop on Data Mining and Knowledge Discovery (Aug.), R. Ng, Ed. ACM Press, New York, NY.
[11]
DENNING, D. E. 1987. An intrusion-detection model. IEEE Trans. Softw. Eng. 13, 2, 222-232.
[12]
DOMINGOS, P. 1995. Rule induction and instance-based learning: A unified approach. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (AAAI-95, Montreal, Que., Canada). Morgan Kaufmann, San Mateo, CA, 1226-1232.
[13]
FARMER, D. AND VENEMA, W. 1995. SATAN overview (Security Administrator Tool for Analyzing Networks).
[14]
FREUND, Y. AND SCHAPIRE, R. E. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 1, 119-139.
[15]
FUKUNAGA, Z. 1990. Introduction to Statistical Pattern Recognition. 2ND Academic Press Prof., Inc., San Diego, CA.
[16]
GORDON, S. 1996. Current computer virus threats, countermeasures, and strategic solutions, White paper. McAfee Associates.
[17]
HEBERLEIN, L. T., DIAS, G. V., LEVITT, K. N., MUKHERJEE, B., WOOD, J., AND WOLBER, D. 1990. A network security monitor. In Proceedings of the IEEE Symposium on Research in Security and Privacy (Oakland, CA). IEEE Computer Society Press, Los Alamitos, CA, 296-30304.
[18]
IBA, G.A. 1979. Learning disjunctive concepts from examples. Master's Thesis. MIT Press, Cambridge, MA.
[19]
KUMAR, S. 1995. Classification and detection of computer intrusions. Ph.D. Dissertation. Purdue University, West Lafayette, IN.
[20]
KUMAR, S. AND SPAFFORD, E. 1994. An application of pattern matching in intrusion detection. Tech. Rep. CSD-TR-94-013. Purdue University, West Lafayette, IN.
[21]
LANE, T. 1999. Hidden Markov models for human/computer interface modeling. In Proceedings of the IJCAI-99 Workshop on Learning About Users. 35-44.
[22]
LANE, T. AND BRODLEY, C. E. 1997a. An application of machine learning to anomaly detection. In Proceedings of the 20th National Conference on National Information Systems Security. Vol.1 (Baltimore, MD). National Institute of Standards and Technology, Gaithersburg, MD, 366-380.
[23]
LANE, T. AND BRODLEY, C. E. 1997b. Sequence matching and learning in anomaly detection for computer security. In Proceedings of the AAAI-97 Workshop on AI Approaches to Fraud Detection and Risk Management (AAAI-97). 43-49.
[24]
LANE, T. AND BRODLEY, C. E. 1998. Approaches to online learning and concept drift for user identification in computer security. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining. 259-263.
[25]
LUNT, T. F. AND JAGANNATHAN, R. 1988. A prototype real-time intrusion-detection expert system. In Proceedings of the IEEE Symposium on Research in Security and Privacy. IEEE Computer Society Press, Los Alamitos, CA, 59-66.
[26]
MOON, T. K. 1996. The expectation-maximization algorithm. IEEE Trans. Signal Process. 44, 1, 47-59.
[27]
NORTON, S.W. 1994. Learning to recognize promoter sequences in E. coli by modeling uncertainty in the training data. In Proceedings of the 12th National Conference on Artificial Intelligence (vol. 1) (AAAI '94, Seattle, WA, July 31-Aug. 4, 1994), B. Hayes-Roth and R. E. Korf, Eds. Amer. Assn. for Artificial Intelligence, Menlo Park, CA, 657-663.
[28]
OPPENHEIM, A. V. AND SCHAFER, R.W. 1989. Discrete-Time Signal Processing. Prentice-Hall, Inc., Upper Saddle River, NJ.
[29]
PORRAS, P. AND NEUMANN, P. 1997. EMERALD: Event monitoring enabling responses to anomalous live disturbances. In Proceedings of the 20th National Conference on National Information Systems Security. Vol.1 (Baltimore, MD). National Institute of Standards and Technology, Gaithersburg, MD, 353-365.
[30]
PROVOST, F. AND FAWCETT, T. 1998. Robust classification systems for imprecise environments. In Proceedings of the 15th National Conference on Artificial Intelligence: Innovative Applications of Artificial Intelligence (AAAI '98/IAAI '98, July 26-30, 1998, Madison, WI), J. Mostow, C. Rich, and B. Buchanan, Eds. Amer. Assn. for Artificial Intelligence, Menlo Park, CA, 706-713.
[31]
QUINLAN, J. R. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco, CA.
[32]
RABINER, L. AND JUANG, B.-H. 1993. Fundamentals of Speech Recognition. Prentice-Hall signal processing series. Prentice-Hall, Inc., Upper Saddle River, NJ.
[33]
RABINER, L. R. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77, 2 (Feb.).
[34]
RIPLEY, B. D. 1996. Pattern Recognition and Neural Networks. Cambridge University Press, New York, NY.
[35]
RIVEST, R. L. AND SCHAPIRE, R. E. 1989. Inference of finite automata using homing sequences. In Proceedings of the 21st Annual ACM Symposium on Theoretical Computing. 411-420.
[36]
SALZBERG, S. 1991. A nearest hyperrectangle learning method. Mach. Learn. 6, 3 (May 1991), 251-276.
[37]
SALZBERG, S. 1995. Locating protein coding regions in human DNA using a decision tree algorithm. J. Comput. Biology 2, 3, 473-485.
[38]
SCHAFFER, C. 1994. Cross-validation, stacking, and bi-level methods for stacking: Metamethods for classification learning. In Selecting Models from Data: Artificial Intelligence and Statistics, P. Cheeseman and W. Oldford, Eds. Springer-Verlag, Vienna, Austria.
[39]
SMAHA, S. E. 1988. Haystack: An intrusion detection system. In Proceedings of the Fourth Conference on Aerospace Computer Security Applications. 37-44.
[40]
SRIKANT, R. AND AGRAWAL, R. 1996. Mining sequential patterns: Generalizations and performance improvements. In Proceedings of the Fifth International Conference on Extending Database Technology (Avignon, France).
[41]
STANIFORD-CHEN, S., CHEUNG, S., CRAWFORD, R., DILGER, M., FRANK, J., HOAGLAND, J., LEVITT, K., WEE, C., YIP, R., AND ZERKLE, D. 1996. GrIDS--a graph-based intrusion detection system for large networks. In Proceedings of the 19th Conference on National Information Systems Security (Oct.). National Institute of Standards and Technology, Gaithersburg, MD.
[42]
WILSON, D. R. AND MARTINEZ, T. R. 1999. Reduction techniques for exemplar-based learning algorithms. Mach. Learn. 34.

Cited By

View all
  • (2024)Anomaly Detection in IoT: Recent Advances, AI and ML Perspectives and ApplicationsAnomaly Detection - Recent Advances, AI and ML Perspectives and Applications10.5772/intechopen.111944Online publication date: 17-Jan-2024
  • (2024)Mouse Dynamics Behavioral Biometrics: A SurveyACM Computing Surveys10.1145/364031156:6(1-33)Online publication date: 24-Jan-2024
  • (2023)Anomaly Detection of High-Frequency Sensing Data in Transportation Infrastructure Monitoring System Based on Fine-Tuned ModelIEEE Sensors Journal10.1109/JSEN.2023.325450623:8(8630-8638)Online publication date: 15-Apr-2023
  • Show More Cited By

Index Terms

  1. Temporal sequence learning and data reduction for anomaly detection

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Transactions on Information and System Security
    ACM Transactions on Information and System Security  Volume 2, Issue 3
    Aug. 1999
    122 pages
    ISSN:1094-9224
    EISSN:1557-7406
    DOI:10.1145/322510
    Issue’s Table of Contents

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 August 1999
    Published in TISSEC Volume 2, Issue 3

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. anomaly detection
    2. clustering
    3. data reduction
    4. empirical evaluation
    5. instance based learning
    6. machine learning
    7. user profiling

    Qualifiers

    • Article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)208
    • Downloads (Last 6 weeks)17
    Reflects downloads up to 09 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Anomaly Detection in IoT: Recent Advances, AI and ML Perspectives and ApplicationsAnomaly Detection - Recent Advances, AI and ML Perspectives and Applications10.5772/intechopen.111944Online publication date: 17-Jan-2024
    • (2024)Mouse Dynamics Behavioral Biometrics: A SurveyACM Computing Surveys10.1145/364031156:6(1-33)Online publication date: 24-Jan-2024
    • (2023)Anomaly Detection of High-Frequency Sensing Data in Transportation Infrastructure Monitoring System Based on Fine-Tuned ModelIEEE Sensors Journal10.1109/JSEN.2023.325450623:8(8630-8638)Online publication date: 15-Apr-2023
    • (2023)Machine Learning Techniques for Anomaly Detection Application DomainsParadigms of Smart and Intelligent Communication, 5G and Beyond10.1007/978-981-99-0109-8_8(129-147)Online publication date: 24-May-2023
    • (2023)Anomaly Detection in Sliding Windows Using Dissimilarity Metrics in Time Series Data4th International Conference on Artificial Intelligence and Applied Mathematics in Engineering10.1007/978-3-031-31956-3_54(640-651)Online publication date: 27-May-2023
    • (2022)Recent Advances in Artificial Intelligence and Tactical Autonomy: Current Status, Challenges, and PerspectivesSensors10.3390/s2224991622:24(9916)Online publication date: 16-Dec-2022
    • (2022)An Analytical Investigation of Anomaly Detection Methods Based on Sequence to Sequence Model in Satellite Power SubsystemSensors10.3390/s2205181922:5(1819)Online publication date: 25-Feb-2022
    • (2022)A Pattern Dictionary Method for Anomaly DetectionEntropy10.3390/e2408109524:8(1095)Online publication date: 9-Aug-2022
    • (2022)An adaptive estimation method with exploration and exploitation modes for non-stationary environmentsPattern Recognition10.1016/j.patcog.2022.108702129:COnline publication date: 1-Sep-2022
    • (2022)A comparative study on outlier detection techniques for noisy production data from unconventional shale reservoirsJournal of Natural Gas Science and Engineering10.1016/j.jngse.2022.104720105(104720)Online publication date: Sep-2022
    • Show More Cited By

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Get Access

    Login options

    Full Access

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media