TIARA: Interactive, Topic-Based Visual Text Summarization and Analysis

Published: 01 February 2012

Abstract

We are building an interactive visual text analysis tool that helps users analyze large collections of text. Unlike existing work in visual text analytics, which focuses either on developing sophisticated text analytic techniques or on inventing novel text visualization metaphors, ours tightly integrates state-of-the-art text analytics with interactive visualization to maximize the value of both. In this article, we present two aspects of our work. First, we introduce an enhanced, LDA-based topic analysis technique that automatically derives a set of topics to summarize a collection of documents and the evolution of their content over time. Second, to help users understand the complex summarization results produced by this technique, we present the design and development of a time-based visualization of the results. Furthermore, we provide users with a set of rich interaction tools that help them interpret the visualized results in context and examine the text collection from multiple perspectives. As a result, our work offers three unique contributions. First, we present an enhanced topic modeling technique that provides users with a time-sensitive and more meaningful text summary. Second, we develop an effective visual metaphor that transforms abstract and often complex text summarization results into a comprehensible visual representation. Third, we offer users flexible visual interaction tools as alternatives that compensate for the deficiencies of current text summarization techniques. We have applied our work to a number of text corpora, and our evaluation shows promise, especially in support of complex text analyses.
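As a rough illustration of how topic analysis results can feed a time-based visualization, the sketch below aggregates per-document topic proportions into per-period topic strengths and then stacks them into bands, as a stacked graph would draw them. It is not the authors' implementation: the function names, the input layout, and the simple summing rule are all assumptions made for illustration, and the topic model itself (for example, LDA) is assumed to have already produced the per-document distributions.

```python
from collections import defaultdict


def topic_strength_over_time(doc_topics, doc_times, num_topics):
    """Sum per-document topic proportions within each time bin.

    doc_topics: list of per-document topic distributions (hypothetical input,
                e.g. the output of a topic model; each row sums to 1).
    doc_times:  list of time-bin labels, one per document (e.g. "2001-05").
    Returns (bins, layers), where layers[k][i] is the strength of topic k
    in the i-th time bin and bins is sorted in ascending order.
    """
    totals = defaultdict(lambda: [0.0] * num_topics)
    for dist, t in zip(doc_topics, doc_times):
        for k, p in enumerate(dist):
            totals[t][k] += p
    bins = sorted(totals)
    layers = [[totals[t][k] for t in bins] for k in range(num_topics)]
    return bins, layers


def stack_layers(layers):
    """Convert per-topic strengths into (lower, upper) bounds per time bin.

    Each topic is drawn on top of the previous one, starting from a flat
    zero baseline (an assumption; other baselines are possible).
    """
    num_bins = len(layers[0]) if layers else 0
    baseline = [0.0] * num_bins
    bands = []
    for layer in layers:
        upper = [b + v for b, v in zip(baseline, layer)]
        bands.append(list(zip(baseline, upper)))
        baseline = upper
    return bands


# Toy data: three documents, two topics, two monthly bins.
doc_topics = [[0.5, 0.5], [1.0, 0.0], [0.25, 0.75]]
doc_times = ["2001-01", "2001-01", "2001-02"]
bins, layers = topic_strength_over_time(doc_topics, doc_times, num_topics=2)
bands = stack_layers(layers)
```

Each entry of `bands` gives the lower and upper edge of one topic's layer in each time bin, which is the geometry a stacked graph needs. A real system would likely smooth the layers and choose their order and baseline more carefully.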

    Published In

    ACM Transactions on Intelligent Systems and Technology, Volume 3, Issue 2
    February 2012
    455 pages
    ISSN: 2157-6904
    EISSN: 2157-6912
    DOI: 10.1145/2089094

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 February 2012
    Accepted: 01 March 2011
    Revised: 01 March 2011
    Received: 01 July 2010
    Published in TIST Volume 3, Issue 2

    Author Tags

    1. Text analytics
    2. interactive text visualization
    3. stacked graph
    4. text summarization
    5. text trend chart
    6. topic model

    Qualifiers

    • Research-article
    • Research
    • Refereed
