Scatter/Gather: a cluster-based approach to browsing large document collections

Article
Free access
Published: 01 June 1992
DOI: 10.1145/133160.133214

Abstract

Document clustering has not been well received as an information retrieval tool. Objections to its use fall into two main categories: first, that clustering is too slow for large corpora (with running time often quadratic in the number of documents); and second, that clustering does not appreciably improve retrieval.
We argue that these problems arise only when clustering is used in an attempt to improve conventional search techniques. However, looking at clustering as an information access tool in its own right obviates these objections, and provides a powerful new access paradigm. We present a document browsing technique that employs document clustering as its primary operation. We also present fast (linear time) clustering algorithms which support this interactive browsing paradigm.
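The abstract names two technical ideas without spelling them out: clustering used as the browsing operation itself, and clustering fast enough (linear in the number of documents) to run interactively. The Python sketch below illustrates both under assumptions that are ours, not the paper's stated method. Documents are unit-normalised bag-of-words vectors compared by cosine similarity. A random sample of about √(k·n) documents is split into k groups with single-link clustering, chosen here only because a Prim's-algorithm spanning tree does that in O(s²) time for s sampled documents. Every document is then assigned to the nearest group centroid. Sampling √(k·n) documents keeps both phases at roughly k·n similarity evaluations, compared with the n(n-1)/2 pairs an all-pairs method must compare. The helper names (`vectorize`, `cluster`, `scatter`, `gather`) and the six-document corpus are hypothetical.

```python
# Hedged sketch: the sample-then-assign clustering and this scatter/gather loop
# are illustrative assumptions, not the algorithms published in the paper.
import math
import random
from collections import Counter

def vectorize(text):
    """Bag-of-words vector: term counts scaled to unit length."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()}

def cosine(u, v):
    """Dot product of two unit vectors, i.e. their cosine similarity."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def centroid(vectors):
    """Sum of the vectors, rescaled to unit length."""
    total = Counter()
    for v in vectors:
        total.update(v)
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}

def single_link(vectors, k):
    """Single-link split into at most k groups: Prim's maximum-similarity
    spanning tree (O(s^2) similarities), minus its k-1 weakest edges."""
    s = len(vectors)
    if s == 0:
        return []
    in_tree, best, parent = [False] * s, [-math.inf] * s, [-1] * s
    best[0] = 0.0
    edges = []  # (similarity, node, neighbour already in the tree)
    for _ in range(s):
        i = max((j for j in range(s) if not in_tree[j]), key=lambda j: best[j])
        in_tree[i] = True
        if parent[i] >= 0:
            edges.append((best[i], i, parent[i]))
        for j in range(s):
            if not in_tree[j]:
                sim = cosine(vectors[i], vectors[j])
                if sim > best[j]:
                    best[j], parent[j] = sim, i
    edges.sort(reverse=True)
    root = list(range(s))

    def find(x):
        while root[x] != x:
            x = root[x]
        return x

    for _, a, b in edges[: max(0, s - k)]:  # keep the s-k strongest edges
        root[find(a)] = find(b)
    groups = {}
    for i in range(s):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def cluster(vectors, k, seed=0):
    """Cluster a random sample of ~sqrt(k*n) vectors, then assign every vector
    to the nearest sample-cluster centroid: ~k*n similarities in total."""
    n = len(vectors)
    s = min(n, max(k, math.isqrt(k * n)))
    sample = random.Random(seed).sample(range(n), s)
    groups = single_link([vectors[i] for i in sample], k)
    centers = [centroid([vectors[sample[i]] for i in g]) for g in groups]
    labels = [max(range(len(centers)), key=lambda c: cosine(v, centers[c]))
              for v in vectors]
    clusters = [[i for i in range(n) if labels[i] == c] for c in range(len(centers))]
    return [members for members in clusters if members]

def scatter(vectors, members, k):
    """Scatter: cluster the current subset; label each cluster by top terms."""
    view = []
    for group in cluster([vectors[i] for i in members], k):
        ids = [members[i] for i in group]
        c = centroid([vectors[i] for i in ids])
        view.append((ids, sorted(c, key=lambda t: (-c[t], t))[:3]))
    return view

def gather(view, picks):
    """Gather: pool the documents of the clusters the user picked."""
    return sorted(i for p in picks for i in view[p][0])

docs = ["document clustering for retrieval", "clustering large document collections",
        "fast document clustering retrieval", "pasta recipe with tomato sauce",
        "tomato pasta cooking recipe", "cooking a quick pasta recipe"]
vecs = [vectorize(d) for d in docs]
view = scatter(vecs, list(range(len(docs))), k=2)
subset = gather(view, [0])  # next step: scatter(vecs, subset, k=2)
```

Each call to `scatter` re-clusters only the currently gathered subset, so a session alternates `scatter` and `gather` until the subset is small enough to read directly. Because the sample is random, cluster membership can change with `seed`; a real system would likely need more careful seeding and a refinement pass.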

Published In

SIGIR '92: Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
June 1992
352 pages
ISBN: 0897915232
DOI: 10.1145/133160
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

SIGIR '92
Sponsor:
  • SIGIR
  • Royal School of Lib.

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 213
  • Downloads (last 6 weeks): 44
Reflects downloads up to 12 Sep 2024

Cited By

  • (2024) Applying Data Mining Techniques for Phrase Extraction in Document Collections. Computer Science, Engineering and Technology 2:3 (44-46). DOI: 10.46632/cset/2/3/6. Online publication date: 6-Sep-2024.
  • (2024) Prompting for Discovery: Flexible Sense-Making for AI Art-Making with Dreamsheets. Proceedings of the CHI Conference on Human Factors in Computing Systems (1-17). DOI: 10.1145/3613904.3642858. Online publication date: 11-May-2024.
  • (2024) Marco: Supporting Business Document Workflows via Collection-Centric Information Foraging with Large Language Models. Proceedings of the CHI Conference on Human Factors in Computing Systems (1-20). DOI: 10.1145/3613904.3641969. Online publication date: 11-May-2024.
  • (2024) How Generative AI Was Mentioned in Social Media and Academic Field? A Text Mining Based on Internet Text Data. IEEE Access 12 (43940-43947). DOI: 10.1109/ACCESS.2024.3379010. Online publication date: 2024.
  • (2024) Web Search Engine Results Page Viewing Formats for Different Search Tasks. International Journal of Human–Computer Interaction (1-16). DOI: 10.1080/10447318.2024.2376358. Online publication date: 29-Jul-2024.
  • (2024) A novel text clustering model based on topic modelling and social network analysis. Chaos, Solitons & Fractals 181 (114633). DOI: 10.1016/j.chaos.2024.114633. Online publication date: Apr-2024.
  • (2023) Sensecape: Enabling Multilevel Exploration and Sensemaking with Large Language Models. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (1-18). DOI: 10.1145/3586183.3606756. Online publication date: 29-Oct-2023.
  • (2023) Academic information retrieval using citation clusters: in-depth evaluation based on systematic reviews. Scientometrics 128:5 (2895-2921). DOI: 10.1007/s11192-023-04681-x. Online publication date: 21-Mar-2023.
  • (2022) Leveraging Regulative Learning Facilitators to Foster Student Agency and Knowledge (Co-)Construction Activities in CSCL Environments. International Journal of Online Pedagogy and Course Design 12:1 (1-15). DOI: 10.4018/IJOPCD.293209. Online publication date: 21-Jan-2022.
  • (2022) An Algebraic Approach to Clustering and Classification with Support Vector Machines. Mathematics 10:1 (128). DOI: 10.3390/math10010128. Online publication date: 1-Jan-2022.
