Abstract
Similarity-based exploration of multi-dimensional data sets is difficult: most techniques do not perform well on large data sets, particularly in handling the clutter that invariably arises as data sets grow. In this paper, we introduce the Visual SuperTree (VST), a method for building a multi-scale similarity tree that handles large data sets at interactive rates while maintaining most of the accuracy and data organization capabilities of other available methods. The VST is built on top of a clustered multi-level configuration of the data, which allows the user to quickly explore data sets by similarity. The method is shown to be useful for both unlabeled and labeled data, and it is capable of revealing both external and internal cluster structures. We demonstrate its application on artificial and real data sets, showing additional advantages of the approach when exploring data that can be summarized meaningfully.
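To make the idea of a "clustered multi-level configuration" more concrete, the following is a minimal sketch, not the VST construction itself. The abstract does not specify the clustering or tree-building steps, so recursive k-means is swapped in here purely for illustration. All names (`Node`, `build_hierarchy`, `leaf_size`, `k`) are hypothetical, and the choice of the member closest to the centroid as a cluster representative is an assumption.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """One cluster in the multi-level hierarchy (hypothetical structure)."""
    members: list           # indices of the points in this cluster
    representative: int     # index of the member closest to the centroid
    children: list = field(default_factory=list)


def centroid(points, idx):
    """Coordinate-wise mean of the points listed in idx."""
    dim = len(points[0])
    return [sum(points[i][d] for i in idx) / len(idx) for d in range(dim)]


def closest_to_centroid(points, idx):
    """Index of the member nearest to the cluster centroid."""
    c = centroid(points, idx)
    return min(idx, key=lambda i: math.dist(points[i], c))


def kmeans(points, idx, k, rng, iters=20):
    """Plain Lloyd's k-means restricted to the points listed in idx."""
    centers = [points[i] for i in rng.sample(idx, k)]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for i in idx:
            nearest = min(range(k),
                          key=lambda c: math.dist(points[i], centers[c]))
            groups[nearest].append(i)
        # Keep the previous center when a group ends up empty.
        centers = [centroid(points, g) if g else centers[c]
                   for c, g in enumerate(groups)]
    return [g for g in groups if g]


def build_hierarchy(points, idx=None, k=3, leaf_size=5, rng=None):
    """Recursively split the data until clusters hold at most leaf_size points."""
    rng = rng or random.Random(0)
    idx = list(range(len(points))) if idx is None else idx
    node = Node(members=idx, representative=closest_to_centroid(points, idx))
    if len(idx) > leaf_size:
        groups = kmeans(points, idx, min(k, len(idx)), rng)
        if len(groups) > 1:  # stop when the split separates nothing
            node.children = [build_hierarchy(points, g, k, leaf_size, rng)
                             for g in groups]
    return node
```

Under these assumptions, calling `build_hierarchy(points)` on a list of coordinate lists returns a root `Node`. Drawing only the representatives of one level at a time gives a coarse overview that can be expanded on demand, which is one way a multi-scale view can limit clutter on large data sets.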
Acknowledgements
We would like to thank the reviewers for their helpful suggestions.
Funding
This work was funded by the São Paulo Research Foundation (FAPESP), Grant 2011/18838-5, and the National Council for Scientific and Technological Development (CNPq), Grants 307411/2016-8 and 310299/2018-7.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Supplementary material 1 (mp4 9879 KB)
Cite this article
da Silva, R.R.O., Paiva, J.G.S., Telles, G.P. et al. The Visual SuperTree: similarity-based multi-scale visualization. Vis Comput 35, 1067–1080 (2019). https://doi.org/10.1007/s00371-019-01696-5
DOI: https://doi.org/10.1007/s00371-019-01696-5