
PhC: Multiresolution Visualization and Exploration of Text Corpora with Parallel Hierarchical Coordinates

Published: 01 February 2012

Abstract

The high-dimensional nature of textual data complicates the design of visualization tools that support exploration of large document corpora. In this article, we first argue that the Parallel Coordinates (PC) technique can help users discern patterns in document collections: PC maps multidimensional vectors onto a 2D space in such a way that elements with similar values are represented as similar poly-lines or curves in the visualization space. The inherent reduction in dimensionality during the mapping from multidimensional points to 2D lines, however, may result in visual complications. For instance, the lines that correspond to clusters of objects that are separate in the multidimensional space may overlap each other in the 2D space; the resulting increase in the number of crossings makes it hard to distinguish the individual document clusters. Such line crossings and overly dense regions are significant sources of visual clutter, so avoiding them may make the visualization easier to interpret. We note that visual clutter can be significantly reduced by adjusting the resolution of the individual term-coordinates, that is, by clustering the corresponding values. Such reductions in resolution, however, lead to a certain degree of information loss, and thus the appropriate resolution for each term-coordinate has to be selected carefully. We therefore propose a controlled clutter reduction approach, called Parallel hierarchical Coordinates (or PhC), for reducing the visual clutter in PC-based visualizations of text corpora. We define visual clutter and information loss measures and provide extensive evaluations showing that PhC provides significant visual gains (i.e., reductions in visual clutter of multiple orders of magnitude) with small information loss during visualization and exploration of document collections.
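The trade-off described in the abstract can be illustrated with a small Python sketch. It is not the PhC algorithm or the measures defined in the article: the crossing count, the equal-width binning used to coarsen each term-coordinate, and the mean-displacement loss are all simplifying assumptions chosen for illustration, and the function names (`count_crossings`, `coarsen`, `information_loss`) and the toy data are hypothetical.

```python
from itertools import combinations


def count_crossings(rows):
    """Count crossings between distinct poly-lines on adjacent axes.

    Two poly-lines cross between axes j and j+1 when their vertical
    order flips. Identical rows are drawn as one line, so they are
    de-duplicated first. (Assumed clutter proxy, not the paper's measure.)
    """
    lines = sorted(set(tuple(r) for r in rows))
    n_axes = len(lines[0])
    crossings = 0
    for j in range(n_axes - 1):
        for a, b in combinations(lines, 2):
            if (a[j] - b[j]) * (a[j + 1] - b[j + 1]) < 0:
                crossings += 1
    return crossings


def coarsen(rows, bins):
    """Lower the resolution of every axis by snapping values in [0, 1]
    to the centers of `bins` equal-width intervals (a stand-in for
    clustering the values of each term-coordinate)."""
    def snap(v):
        k = min(int(v * bins), bins - 1)
        return (k + 0.5) / bins
    return [[snap(v) for v in row] for row in rows]


def information_loss(rows, coarse):
    """Mean absolute displacement of values caused by coarsening
    (assumed loss proxy, not the paper's measure)."""
    total = sum(abs(v - c)
                for r, cr in zip(rows, coarse)
                for v, c in zip(r, cr))
    return total / (len(rows) * len(rows[0]))


# Toy corpus: six documents, three term weights each, two clear clusters.
docs = [
    [0.10, 0.90, 0.20],
    [0.15, 0.85, 0.25],
    [0.12, 0.95, 0.22],
    [0.80, 0.20, 0.70],
    [0.85, 0.15, 0.75],
    [0.82, 0.25, 0.72],
]

coarse = coarsen(docs, bins=2)
print("crossings at full resolution:", count_crossings(docs))
print("crossings after coarsening:  ", count_crossings(coarse))
print("information loss:            ", round(information_loss(docs, coarse), 3))
```

On this toy data the six original lines collapse into two bundles, so the crossing count drops sharply while each value moves by less than a quarter of the axis range. Choosing the number of bins per axis is the resolution decision the abstract says must be made carefully.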




        Published In

        ACM Transactions on Intelligent Systems and Technology, Volume 3, Issue 2
        February 2012
        455 pages
        ISSN:2157-6904
        EISSN:2157-6912
        DOI:10.1145/2089094

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 01 February 2012
        Accepted: 01 April 2011
        Revised: 01 February 2011
        Received: 01 July 2010
        Published in TIST Volume 3, Issue 2


        Author Tags

        1. Document set visualization
        2. clutter reduction
        3. parallel coordinates

        Qualifiers

        • Research-article
        • Research
        • Refereed
