DOI: 10.1145/1321440.1321536
research-article

Weakly-supervised discovery of named entities using web search queries

Published: 06 November 2007

Abstract

A seed-based framework for textual information extraction allows for weakly supervised extraction of named entities from anonymized Web search queries. The extraction is guided by a small set of seed named entities, without any need for handcrafted extraction patterns or domain-specific knowledge, allowing for the acquisition of named entities pertaining to various classes of interest to Web search users. Inherently noisy search queries are shown to be a highly valuable, albeit little explored, resource for Web-based named entity discovery.
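
The seed-driven idea described in the abstract can be illustrated with a toy sketch. It is not the paper's actual method: the function name `extract_candidates`, the prefix/suffix template representation, and the count-based scoring are all assumptions made for illustration. The sketch learns query templates from queries that contain a seed, then reuses those templates to pull candidate entities out of other queries.

```python
from collections import Counter


def extract_candidates(queries, seeds, top_k=5):
    """Toy seed-based extraction over a query log (illustrative only)."""
    seeds = {s.lower() for s in seeds}
    queries = [q.lower().strip() for q in queries]

    # Step 1: collect (prefix, suffix) templates from queries containing a seed.
    templates = Counter()
    for q in queries:
        for seed in seeds:
            start = q.find(seed)
            if start == -1:
                continue
            prefix, suffix = q[:start], q[start + len(seed):]
            # Require word boundaries around the seed, and some context.
            boundary_ok = (not prefix or prefix.endswith(" ")) and (
                not suffix or suffix.startswith(" ")
            )
            if boundary_ok and (prefix or suffix):
                templates[(prefix, suffix)] += 1

    # Step 2: apply the templates to every query to find new fillers.
    scores = Counter()
    for q in queries:
        for (prefix, suffix), weight in templates.items():
            if (
                q.startswith(prefix)
                and q.endswith(suffix)
                and len(q) > len(prefix) + len(suffix)
            ):
                filler = q[len(prefix):len(q) - len(suffix)]
                if filler not in seeds:
                    # Weight each match by how many seeds supported the template.
                    scores[filler] += weight

    return [name for name, _ in scores.most_common(top_k)]


# Hypothetical query log and seed set, invented for this example.
queries = [
    "hotels in paris",
    "hotels in london",
    "hotels in madrid",
    "paris weather",
    "madrid weather",
    "cheap flights",
]
candidates = extract_candidates(queries, ["paris", "london"])
print(candidates)
```

A realistic system would need to cope with noise in the queries, for example by comparing candidates to seeds through a similarity measure over their templates rather than raw counts; the scoring here is deliberately simplistic.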


Published In

CIKM '07: Proceedings of the sixteenth ACM conference on information and knowledge management
November 2007
1048 pages
ISBN:9781595938039
DOI:10.1145/1321440
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. knowledge acquisition
  2. named entities
  3. query logs
  4. unstructured text
  5. weakly supervised information extraction

Qualifiers

  • Research-article

Conference

CIKM07

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 3
  • Downloads (last 6 weeks): 0
Reflects downloads up to 30 Aug 2024

Cited By

  • (2022) A Meta Path Based Method for Entity Set Expansion in Knowledge Graph. IEEE Transactions on Big Data 8:3 (616-629). DOI: 10.1109/TBDATA.2018.2805366. Online publication date: 1-Jun-2022
  • (2021) Investigating Clinical Named Entity Recognition Approaches for Information Extraction from EMR. Tracking and Preventing Diseases with Artificial Intelligence (153-175). DOI: 10.1007/978-3-030-76732-7_7. Online publication date: 15-Jul-2021
  • (2020) Noise-Resilient Reconstruction of Panoramas and 3D Scenes Using Robot-Mounted Unsynchronized Commodity RGB-D Cameras. ACM Transactions on Graphics 39:5 (1-15). DOI: 10.1145/3389412. Online publication date: 1-Jul-2020
  • (2020) PatternRank+NN. ACM Transactions on the Web 14:3 (1-15). DOI: 10.1145/3386042. Online publication date: 3-May-2020
  • (2020) Capturing Subjective First-Person View Shots with Drones for Automated Cinematography. ACM Transactions on Graphics 39:5 (1-14). DOI: 10.1145/3378673. Online publication date: 10-Aug-2020
  • (2020) Query Segmentation and Tagging. Query Understanding for Search Engines (43-67). DOI: 10.1007/978-3-030-58334-7_3. Online publication date: 2-Dec-2020
  • (2019) Strategic Attack & Defense in Security Diffusion Games. ACM Transactions on Intelligent Systems and Technology 11:1 (1-35). DOI: 10.1145/3357605. Online publication date: 10-Dec-2019
  • (2019) Graph-based Recommendation Meets Bayes and Similarity Measures. ACM Transactions on Intelligent Systems and Technology 11:1 (1-26). DOI: 10.1145/3356882. Online publication date: 14-Dec-2019
  • (2019) CAD-Base. ACM Transactions on Design Automation of Electronic Systems 24:4 (1-30). DOI: 10.1145/3315574. Online publication date: 18-Apr-2019
  • (2019) Adaptive Test for RF/Analog Circuit Using Higher Order Correlations among Measurements. ACM Transactions on Design Automation of Electronic Systems 24:4 (1-16). DOI: 10.1145/3308566. Online publication date: 26-Jun-2019
