DOI: 10.1145/347090.347123

Efficient clustering of high-dimensional data sets with application to reference matching

Published: 01 August 2000

    References

    [1]
    H. Akaike. On entropy maximization principle. Applications of Statistics, pages 27-41, 1977.
    [2]
    M. R. Anderberg. Cluster Analysis for Applications. Academic Press, 1973.
    [3]
    P. S. Bradley, U. Fayyad, and C. Reina. Scaling clustering algorithms to large databases. In Proc. 4th International Conf. on Knowledge Discovery and Data Mining (KDD-98). AAAI Press, August 1998.
    [4]
    I. P. Fellegi and A. B. Sunter. A theory for record linkage. Journal of the American Statistical Association, 64:1183-1210, 1969.
    [5]
    J. H. Friedman, J. L. Bentley, and R. A. Finkel. An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Software, 3(3):209-226, 1977.
    [6]
    C. L. Giles, K. D. Bollacker, and S. Lawrence. CiteSeer: An automatic citation indexing system. In Digital Libraries 98 - Third ACM Conference on Digital Libraries, 1998.
    [7]
    M. Hernandez and S. Stolfo. The merge/purge problem for large databases. In Proceedings of the 1995 ACM SIGMOD, May 1995.
    [8]
    H. Hirsh. Integrating multiple sources of information in text classification using WHIRL. In Snowbird Learning Conference, April 2000.
    [9]
    J. Hylton. Identifying and merging related bibliographic records. MIT LCS Masters Thesis, 1996.
    [10]
    B. Kilss and W. Alvey, editors. Record Linkage Techniques-1985, 1985. Statistics of Income Division, Internal Revenue Service Publication 1299-2-96. Available from http://www.fcsm.gov/.
    [11]
    A. McCallum, K. Nigam, J. Rennie, and K. Seymore. Automating the construction of internet portals with machine learning. Information Retrieval, 2000. To appear.
    [12]
    A. K. McCallum. Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering. http://www.cs.cmu.edu/~mccallum/bow, 1996.
    [13]
    A. Monge and C. Elkan. The field-matching problem: algorithm and applications. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, August 1996.
    [14]
    A. Monge and C. Elkan. An efficient domain-independent algorithm for detecting approximately duplicate database records. In Proceedings of the SIGMOD 1997 Workshop on Data Mining and Knowledge Discovery, May 1997.
    [15]
    A. Moore. Very fast EM-based mixture model clustering using multiresolution kd-trees. In Advances in Neural Information Processing Systems 11, 1999.
    [16]
    H. B. Newcombe, J. M. Kennedy, S. J. Axford, and A. P. James. Automatic linkage of vital records. Science, 130:954-959, 1959.
    [17]
    S. Omohundro. Five balltree construction algorithms. Technical report 89-063, International Computer Science Institute, Berkeley, California, 1989.
    [18]
    K. Rose. Deterministic annealing for clustering, compression, classification, regression, and related optimization problems. Proceedings of the IEEE, 86(11):2210-2239, 1998.
    [19]
    G. Salton and C. Buckley. Term weighting approaches in automatic text retrieval. Information Processing and Management, 24(5):513-523, 1988.
    [20]
    M. Sankaran, S. Suresh, M. Wong, and D. Nesamoney. Method for incremental aggregation of dynamically increasing database data sets. U.S. Patent 5,794,246, 1998.
    [21]
    D. Sankoff and J. B. Kruskal. Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley, 1983.
    [22]
    J. W. Tukey and J. O. Pedersen. Method and apparatus for information access employing overlapping clusters. U.S. Patent 5,787,422, 1998.
    [23]
    T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: An efficient data clustering method for very large databases. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pages 103-114, 1996.

    Published In

    KDD '00: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2000
    537 pages
    ISBN:1581132336
    DOI:10.1145/347090

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Article

    Conference

    KDD00

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 519
    • Downloads (Last 6 weeks): 67
    Reflects downloads up to 12 Aug 2024

    Cited By

    • (2024) Beyond Supervision. Recent Trends and Future Direction for Data Analytics (170-196). DOI: 10.4018/979-8-3693-3609-0.ch007. Online publication date: 12-Jul-2024
    • (2024) A Secure Parallel Pattern Mining System for Medical Internet of Things. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 21:4 (631-643). DOI: 10.1109/TCBB.2022.3233803. Online publication date: Jul-2024
    • (2024) Sample Efficient Reinforcement Learning Using Graph-Based Memory Reconstruction. IEEE Transactions on Artificial Intelligence, 5:2 (751-762). DOI: 10.1109/TAI.2023.3268612. Online publication date: Feb-2024
    • (2024) The Combined Method for Detecting Anomalies in the Enterprise Telecommunication Networks. 2024 International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA) (1-4). DOI: 10.1109/HORA61326.2024.10550600. Online publication date: 23-May-2024
    • (2024) Abnormal Electricity Detection of Users Based on Improved Canopy-Kmeans and Isolation Forest Algorithms. IEEE Access, 12 (99110-99121). DOI: 10.1109/ACCESS.2024.3429304. Online publication date: 2024
    • (2024) AI-enabled Prediction of Sim Racing Performance using Telemetry Data. Computers in Human Behavior Reports (100414). DOI: 10.1016/j.chbr.2024.100414. Online publication date: Apr-2024
    • (2024) Scalable decision fusion algorithm for enabling decentralized computation in distributed, big data clustering problems. International Journal of Machine Learning and Cybernetics, 15:9 (3803-3827). DOI: 10.1007/s13042-024-02121-7. Online publication date: 8-Apr-2024
    • (2024) Image Classification Based on Improved Unsupervised Clustering Algorithm. Computer Science and Education. Computer Science and Technology (147-161). DOI: 10.1007/978-981-97-0730-0_14. Online publication date: 26-Feb-2024
    • (2024) Prediction of hydrogel swelling states using machine learning methods. Engineering Reports. DOI: 10.1002/eng2.12893. Online publication date: 2-May-2024
    • (2023) Travel Characteristics Identification Method for Expressway Passenger Cars Based on Electronic Toll Collection Data. Sustainability, 15:15 (11619). DOI: 10.3390/su151511619. Online publication date: 27-Jul-2023
