DOI: 10.5555/3001460.3001495

A linear method for deviation detection in large databases

Published: 02 August 1996 Publication History

Abstract

We describe the problem of finding deviations in large databases. Normally, explicit information outside the data, such as integrity constraints or predefined patterns, is used for deviation detection. In contrast, we approach the problem from inside the data, using the implicit redundancy of the data.
We give a formal description of the problem and present a linear algorithm for detecting deviations. Our solution simulates a mechanism familiar to human beings: after seeing a series of similar data, an element disturbing the series is considered an exception. We also present experimental results from applying this algorithm to real-life datasets, showing its effectiveness.
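
The intuition in the abstract (an element that disturbs a series of similar data is an exception) can be illustrated with a small single-pass sketch. This is not the algorithm from the paper, whose details the abstract does not give. It assumes numeric data, uses the running population variance as a stand-in measure of dissimilarity, and flags the element whose arrival raises that variance the most. The function name `most_disturbing_element` is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def most_disturbing_element(values: Sequence[float]) -> Optional[Tuple[int, float]]:
    """Return (index, value) of the element whose arrival raises the running
    variance the most, or None if no element raises it at all.

    Illustrative assumption: variance stands in for "dissimilarity";
    the abstract does not specify the measure used in the paper.
    """
    n = 0
    mean = 0.0
    m2 = 0.0            # running sum of squared deviations (Welford's update)
    prev_var = 0.0
    best_jump = 0.0
    best = None
    for i, x in enumerate(values):
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        var = m2 / n    # population variance of the prefix seen so far
        jump = var - prev_var
        # A single element has no series to disturb, so start comparing at n == 2.
        if n > 1 and jump > best_jump:
            best_jump = jump
            best = (i, x)
        prev_var = var
    return best


readings = [10, 11, 10, 12, 11, 95, 10, 11]
print(most_disturbing_element(readings))
```

Each element is visited once with constant work per element, so the sketch runs in time linear in the input size. Its result depends on the order in which the elements are seen.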



    Published In

    KDD'96: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining
    August 1996
    387 pages

    Sponsors

    • AAAI: American Association for Artificial Intelligence

    Publisher

    AAAI Press

    Author Tags

    1. data mining
    2. deviation
    3. error
    4. exception
    5. knowledge discovery

    Cited By

    • (2019) Uni-Detect. Proceedings of the 2019 International Conference on Management of Data, pp. 811-828. DOI: 10.1145/3299869.3319855. Online publication date: 25-Jun-2019
    • (2018) Auto-Detect. Proceedings of the 2018 International Conference on Management of Data, pp. 1377-1392. DOI: 10.1145/3183713.3196889. Online publication date: 27-May-2018
    • (2017) Local search methods for k-means with outliers. Proceedings of the VLDB Endowment 10:7, pp. 757-768. DOI: 10.14778/3067421.3067425. Online publication date: 1-Mar-2017
    • (2017) Bio-inspired algorithm for outliers detection. Multimedia Tools and Applications 76:24, pp. 25659-25677. DOI: 10.1007/s11042-017-4443-1. Online publication date: 1-Dec-2017
    • (2016) OPTIMA. Journal of Network and Systems Management 24:4, pp. 859-883. DOI: 10.1007/s10922-015-9362-8. Online publication date: 1-Oct-2016
    • (2012) Fast and reliable anomaly detection in categorical data. Proceedings of the 21st ACM international conference on Information and knowledge management, pp. 415-424. DOI: 10.1145/2396761.2396816. Online publication date: 29-Oct-2012
    • (2012) Dealing with dishonest recommendation. Ad Hoc Networks 10:8, pp. 1603-1618. DOI: 10.1016/j.adhoc.2011.07.014. Online publication date: 1-Nov-2012
    • (2008) Detecting outliers in categorical record databases based on attribute associations. Proceedings of the 10th Asia-Pacific web conference on Progress in WWW research and development, pp. 111-123. DOI: 10.5555/1791734.1791750. Online publication date: 26-Apr-2008
    • (2006) Online outlier detection in sensor data using non-parametric models. Proceedings of the 32nd international conference on Very large data bases, pp. 187-198. DOI: 10.5555/1182635.1164145. Online publication date: 1-Sep-2006
    • (2005) A New Algorithm for Finding Minimal Sample Uniques for Use in Statistical Disclosure Assessment. Proceedings of the Fifth IEEE International Conference on Data Mining, pp. 290-297. DOI: 10.1109/ICDM.2005.10. Online publication date: 27-Nov-2005
