DOI: 10.3115/1219840.1219917

Randomized algorithms and NLP: using locality sensitive hash function for high speed noun clustering

Published: 25 June 2005

Abstract

In this paper, we explore the power of randomized algorithms to address the challenge of working with very large amounts of data. We apply these algorithms to generate noun similarity lists from 70 million pages. We reduce the running time from quadratic to practically linear in the number of elements.
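The abstract does not spell out the hashing step. As a minimal sketch, assuming the random-hyperplane scheme of Charikar (2002), which appears in the reference list, each noun's context vector can be reduced to a short bit signature: each bit records which side of a random hyperplane the vector falls on, and two vectors at angle θ disagree on a given bit with probability θ/π, so the Hamming distance between signatures estimates the cosine similarity. Candidate neighbours are then found by sorting randomly permuted signatures and comparing each item only with a few items adjacent to it in sorted order, instead of with every other item. All function names and parameters below (`random_hyperplanes`, `signature`, `estimated_cosine`, `candidate_pairs`, `num_perms`, `window`) are hypothetical, the toy 3-dimensional vectors stand in for real sparse, high-dimensional context vectors, and this is an illustration of the technique, not the authors' implementation.

```python
import math
import random
from typing import Dict, List, Sequence, Set, Tuple

Signature = Tuple[int, ...]


def random_hyperplanes(dim: int, num_bits: int, seed: int = 0) -> List[List[float]]:
    """Draw num_bits Gaussian random vectors; each is the normal of a random hyperplane."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec: Sequence[float], planes: List[List[float]]) -> Signature:
    """One bit per hyperplane: 1 if vec lies on its non-negative side, else 0."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0 for plane in planes
    )


def estimated_cosine(sig_a: Signature, sig_b: Signature) -> float:
    """Each bit differs with probability theta/pi, so theta ~= pi * hamming / num_bits."""
    hamming = sum(a != b for a, b in zip(sig_a, sig_b))
    return math.cos(math.pi * hamming / len(sig_a))


def candidate_pairs(
    sigs: Dict[str, Signature], num_perms: int, window: int, seed: int = 1
) -> Set[Tuple[str, str]]:
    """Permute bit positions, sort the permuted signatures, and pair each item
    only with the `window` items that follow it in sorted order (hypothetical helper)."""
    rng = random.Random(seed)
    names = list(sigs)
    num_bits = len(next(iter(sigs.values())))
    pairs: Set[Tuple[str, str]] = set()
    for _ in range(num_perms):
        perm = list(range(num_bits))
        rng.shuffle(perm)
        # Sorting brings signatures that share long permuted prefixes next to each other.
        order = sorted(names, key=lambda n: tuple(sigs[n][i] for i in perm))
        for i, name in enumerate(order):
            for other in order[i + 1 : i + 1 + window]:
                pairs.add((min(name, other), max(name, other)))
    return pairs


if __name__ == "__main__":
    # Toy "context vectors" (assumed data, for illustration only).
    vectors = {
        "a": [1.0, 0.0, 0.0],
        "a_copy": [2.0, 0.0, 0.0],
        "b": [1.0, 1.0, 0.0],
        "c": [0.0, 0.0, 1.0],
        "d": [-1.0, 0.0, 0.0],
    }
    planes = random_hyperplanes(dim=3, num_bits=2000)
    sigs = {name: signature(vec, planes) for name, vec in vectors.items()}
    # Should be roughly cos(45 degrees), i.e. about 0.7; the exact value depends on the seed.
    print(round(estimated_cosine(sigs["a"], sigs["b"]), 2))
    for pair in sorted(candidate_pairs(sigs, num_perms=3, window=1)):
        print(pair)
```

With n items, k signature bits, q permutations and a window of B neighbours, computing signatures is linear in n, sorting costs roughly O(q·n log n), and the windowed comparisons cost O(q·n·B), rather than the O(n²) of comparing every pair. Exact cosine scores then need to be computed only for the candidate pairs.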

References

[1] Banko, M. and Brill, E. 2001. Mitigating the paucity of data problem. In Proceedings of HLT 2001. San Diego, CA.
[2] Box, G. E. P. and Muller, M. E. 1958. Ann. Math. Stat. 29, 610--611.
[3] Broder, A. 1997. On the Resemblance and Containment of Documents. In Proceedings of the Compression and Complexity of Sequences.
[4] Cavnar, W. B. and Trenkle, J. M. 1994. N-Gram-Based Text Categorization. In Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval, Las Vegas, NV, UNLV Publications/Reprographics, 161--175.
[5] Charikar, M. 2002. Similarity Estimation Techniques from Rounding Algorithms. In Proceedings of the 34th Annual ACM Symposium on Theory of Computing.
[6] Church, K. and Hanks, P. 1989. Word association norms, mutual information, and lexicography. In Proceedings of ACL-89. pp. 76--83. Vancouver, Canada.
[7] Curran, J. and Moens, M. 2002. Scaling context space. In Proceedings of ACL-02. pp. 231--238. Philadelphia, PA.
[8] Goemans, M. X. and Williamson, D. P. 1995. Improved Approximation Algorithms for Maximum Cut and Satisfiability Problems Using Semidefinite Programming. JACM 42(6):1115--1145.
[9] Hindle, D. 1990. Noun classification from predicate-argument structures. In Proceedings of ACL-90. pp. 268--275. Pittsburgh, PA.
[10] Lin, D. 1998. Automatic retrieval and clustering of similar words. In Proceedings of COLING/ACL-98. pp. 768--774. Montreal, Canada.
[11] Indyk, P. and Motwani, R. 1998. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the 30th STOC. pp. 604--613.
[12] Kolcz, A., Chowdhury, A., and Alspector, J. 2004. Improved robustness of signature-based near-replica detection via lexicon randomization. In Proceedings of ACM-SIGKDD 2004.
[13] Lin, D. 1994. Principar - an efficient, broad-coverage, principle-based parser. In Proceedings of COLING-94. pp. 42--48. Kyoto, Japan.
[14] Pantel, P. and Lin, D. 2002. Discovering Word Senses from Text. In Proceedings of SIGKDD-02. pp. 613--619. Edmonton, Canada.
[15] Rabin, M. O. 1981. Fingerprinting by random polynomials. Center for Research in Computing Technology, Harvard University, Report TR-15-81.
[16] Salton, G. and McGill, M. J. 1983. Introduction to Modern Information Retrieval. McGraw Hill.

Published In

ACL '05: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
June 2005
657 pages
  • General Chair: Kevin Knight

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Article

Acceptance Rates

ACL '05 paper acceptance rate: 77 of 423 submissions (18%)
Overall acceptance rate: 85 of 443 submissions (19%)

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 80
  • Downloads (last 6 weeks): 8
Reflects downloads up to 25 Dec 2024.

Cited By

  • (2020) Fast Distributed kNN Graph Construction Using Auto-tuned Locality-sensitive Hashing. ACM Transactions on Intelligent Systems and Technology 11:6 (1-18). DOI: 10.1145/3408889. Online publication date: 12-Oct-2020
  • (2019) An improved method of locality-sensitive hashing for scalable instance matching. Knowledge and Information Systems 58:2 (275-294). DOI: 10.1007/s10115-018-1199-5. Online publication date: 1-Feb-2019
  • (2018) Liquid Silicon-Monona. ACM SIGPLAN Notices 53:2 (214-228). DOI: 10.1145/3296957.3173167. Online publication date: 19-Mar-2018
  • (2018) ClassiNet -- Predicting Missing Features for Short-Text Classification. ACM Transactions on Knowledge Discovery from Data 12:5 (1-29). DOI: 10.1145/3201578. Online publication date: 27-Jun-2018
  • (2018) Liquid Silicon-Monona. Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (214-228). DOI: 10.1145/3173162.3173167. Online publication date: 19-Mar-2018
  • (2018) Use of locality sensitive hashing (LSH) algorithm to match Web of Science and Scopus. Scientometrics 116:2 (1229-1245). DOI: 10.1007/s11192-017-2569-6. Online publication date: 1-Aug-2018
  • (2017) A systematic review and comparative analysis of cross-document coreference resolution methods and tools. Computing 99:4 (313-349). DOI: 10.1007/s00607-016-0490-0. Online publication date: 1-Apr-2017
  • (2016) Query-Directed Probing LSH for Cosine Similarity. Proceedings of the Fifth International Conference on Network, Communication and Computing (171-177). DOI: 10.1145/3033288.3033318. Online publication date: 17-Dec-2016
  • (2016) LazyLSH. Proceedings of the 2016 International Conference on Management of Data (2023-2037). DOI: 10.1145/2882903.2882930. Online publication date: 26-Jun-2016
  • (2015) Weighted Similarity Estimation in Data Streams. Proceedings of the 24th ACM International on Conference on Information and Knowledge Management (1051-1060). DOI: 10.1145/2806416.2806515. Online publication date: 17-Oct-2015
