Abstract
We investigate the problem of segmenting text by topic. Applications of this task include topic tracking in broadcast speech data and topic identification in full-text databases. Researchers have tackled similar problems before, but with different goals. This study focuses on data in which segments are relatively small and the sentences within a segment have relatively few words in common, which makes the problem challenging. We present a segmentation method that uses a query expansion technique to find common features for the topic segments. Experiments with the technique show that it can be effective.
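The abstract does not describe the procedure in detail, so the following is only an illustrative sketch of the general idea and not the authors' method. It assumes a simple co-occurrence-based expansion over a small background collection (a stand-in for whatever query expansion technique the paper uses), cosine similarity between adjacent expanded sentences, and a fixed threshold for placing boundaries. The names `expand`, `segment`, `background`, `top_k`, and `threshold` are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def expand(tokens, background, top_k=5):
    """Treat a sentence as a query and add terms that co-occur with its
    words in the background passages (an assumed, simplified expansion)."""
    query = set(tokens)
    scores = Counter()
    for passage in background:
        words = set(tokenize(passage))
        overlap = len(query & words)
        if overlap:
            for w in words - query:
                scores[w] += overlap
    # Sort by score, then alphabetically, so ties are broken deterministically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    expanded = Counter(tokens)
    for w, _ in ranked[:top_k]:
        expanded[w] += 1
    return expanded


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def segment(sentences, background, threshold=0.2, top_k=5):
    """Return indices i where a topic boundary is placed before sentence i."""
    vectors = [expand(tokenize(s), background, top_k) for s in sentences]
    return [
        i
        for i in range(1, len(vectors))
        if cosine(vectors[i - 1], vectors[i]) < threshold
    ]


# Toy data (invented for illustration): adjacent sentences share no words.
background = [
    "stocks shares market investors trading",
    "market investors bank rates",
    "storm rain flooding coast winds",
    "winds coast hurricane rain",
]
sentences = [
    "stocks fell sharply",
    "bank rates rose",
    "storm hits town",
    "hurricane warning issued",
]

print(segment(sentences, background))  # [2]
```

In this toy example no two sentences share a word, so without expansion (`top_k=0`) every adjacent pair looks dissimilar and a boundary is placed everywhere. With expansion, the two finance sentences gain shared terms such as "market" and "investors", the two weather sentences gain "rain", "coast", and "winds", and only the boundary between the two topics remains.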
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ponte, J.M., Croft, W.B. (1997). Text segmentation by topic. In: Peters, C., Thanos, C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 1997. Lecture Notes in Computer Science, vol 1324. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0026725
DOI: https://doi.org/10.1007/BFb0026725
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63554-3
Online ISBN: 978-3-540-69597-4
eBook Packages: Springer Book Archive