DOI: 10.5555/2835865.2835899

Discovering and comparing topic hierarchies

Published: 12 April 2000

Abstract

Hierarchies have been used for organization, summarization, and access to information, yet a lingering issue is how best to construct them. In this paper, our goal is to automatically create domain-specific hierarchies that can be used for browsing a document set and locating relevant documents. We examine methods of automatically generating hierarchies and of evaluating them. To this end, we compare and contrast two methods of generating topic hierarchies from the text of documents: subsumption hierarchies, which use subsumption relations found within document sets, and lexical hierarchies, which use words that occur frequently within phrases. Our evaluation shows that subsumption hierarchies divide documents into smaller groups, allowing one to find all relevant documents without looking at as many non-relevant documents. However, such hierarchies are more likely to contain no path to a relevant document.
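
The abstract does not spell out how a subsumption relation is detected. As a rough illustration only, the sketch below uses one widely cited formulation, from Sanderson and Croft (1999): term x subsumes term y when P(x|y) >= 0.8 and P(y|x) < 1, with probabilities estimated from document co-occurrence. The function name `subsumption_pairs`, the 0.8 default, and the toy documents are assumptions made for this example, and it may differ from the procedure actually used in the paper.

```python
from itertools import permutations


def subsumption_pairs(doc_terms, threshold=0.8):
    """Return (parent, child) term pairs where parent subsumes child.

    doc_terms maps a document id to the list of terms it contains.
    Assumed test: P(parent | child) >= threshold and P(child | parent) < 1.
    """
    # Map each term to the set of document ids that contain it.
    postings = {}
    for doc_id, terms in doc_terms.items():
        for term in set(terms):
            postings.setdefault(term, set()).add(doc_id)

    pairs = []
    # Check every ordered pair of distinct terms, in sorted order.
    for x, y in permutations(sorted(postings), 2):
        both = len(postings[x] & postings[y])
        p_x_given_y = both / len(postings[y])
        p_y_given_x = both / len(postings[x])
        if p_x_given_y >= threshold and p_y_given_x < 1.0:
            pairs.append((x, y))
    return pairs


# Toy collection (hypothetical data, not from the paper).
docs = {
    1: ["retrieval", "indexing"],
    2: ["retrieval", "clustering"],
    3: ["retrieval", "clustering"],
    4: ["retrieval"],
}
pairs = subsumption_pairs(docs)
print(pairs)
```

In this toy collection "retrieval" appears in every document that contains "clustering" or "indexing", but not the other way round, so it would sit above both terms in the resulting hierarchy.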


Published In

RIAO '00: Content-Based Multimedia Information Access - Volume 1
April 2000
922 pages

Publisher

LE CENTRE DE HAUTES ETUDES INTERNATIONALES D'INFORMATIQUE DOCUMENTAIRE

Paris, France

Qualifiers

  • Research-article
