
A search result clustering method using informatively named entities

Published: 04 November 2005

Abstract

Clustering search results helps the user get an overview of the information returned. In this paper, we treat the clustering task as indexing the search results. Here, an index means a structured list of labels that makes it easier for the user to comprehend both the labels and the search results. To realize this goal, we make three proposals. The first is to use Named Entity (NE) extraction for term extraction. The second is a new label selection criterion based on a term's importance in the search results and the relation between terms and search queries. The third is label categorization using the category information of labels, which is generated by NE extraction. We implement a prototype system based on these proposals and find that it offers much higher performance than existing methods; we focus on news articles in this paper.
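
The three proposals can be illustrated with a minimal sketch. The abstract does not give the actual scoring formula or the NE tagger used, so everything below is an assumption made for illustration: the sample results, the pre-extracted (term, category) pairs, the function name build_index, and the weighted sum of "importance" (share of results mentioning a term) and "query relatedness" (share of those results that also mention the query) are all hypothetical and are not the authors' method.

```python
from collections import defaultdict

# Hypothetical input: each search result carries the named entities that an
# NE extractor is assumed to have produced, as (term, category) pairs.
RESULTS = [
    {"text": "Toyota opens new plant in Texas",
     "entities": [("Toyota", "ORGANIZATION"), ("Texas", "LOCATION")]},
    {"text": "Toyota and Honda report record sales in Texas",
     "entities": [("Toyota", "ORGANIZATION"), ("Honda", "ORGANIZATION"),
                  ("Texas", "LOCATION")]},
    {"text": "Honda recalls vehicles in Japan",
     "entities": [("Honda", "ORGANIZATION"), ("Japan", "LOCATION")]},
    {"text": "Toyota president visits Japan",
     "entities": [("Toyota", "ORGANIZATION"), ("Japan", "LOCATION")]},
]


def build_index(results, query, alpha=0.5, top_k=2):
    """Return {category: [labels]} as a toy 'index' over the search results.

    Assumed scoring (not from the paper):
      importance  = fraction of results that mention the term
      relatedness = fraction of those results that also mention the query
      score       = alpha * importance + (1 - alpha) * relatedness
    """
    doc_freq = defaultdict(int)    # results mentioning the term
    query_hits = defaultdict(int)  # ...that also mention the query
    category = {}
    q = query.lower()

    for result in results:
        mentions_query = q in result["text"].lower()
        for term, cat in set(result["entities"]):
            if term.lower() == q:
                continue  # the query itself is not a useful label
            doc_freq[term] += 1
            category[term] = cat
            if mentions_query:
                query_hits[term] += 1

    n = len(results)
    scores = {
        term: alpha * (doc_freq[term] / n)
        + (1 - alpha) * (query_hits[term] / doc_freq[term])
        for term in doc_freq
    }

    # Group labels by NE category, best-scoring first (ties broken by name).
    grouped = defaultdict(list)
    for term in sorted(scores, key=lambda t: (-scores[t], t)):
        if len(grouped[category[term]]) < top_k:
            grouped[category[term]].append(term)
    return dict(grouped)


if __name__ == "__main__":
    for cat, labels in build_index(RESULTS, "toyota").items():
        print(cat, labels)
```

In this toy setup, labels are grouped under their NE category, which is one simple way to read "a structured label list"; a real system would substitute an actual NE tagger and the selection criterion defined in the paper.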


Published In

WIDM '05: Proceedings of the 7th annual ACM international workshop on Web information and data management
November 2005
96 pages
ISBN:1595931945
DOI:10.1145/1097047
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. named entity
  2. search result clustering

Qualifiers

  • Article

Conference

CIKM05