
embComp: Visual Interactive Comparison of Vector Embeddings

Published: 01 August 2022

Abstract

This article introduces embComp, a novel approach for comparing two embeddings that capture the similarity between objects, such as word and document embeddings. We survey scenarios where comparing these embedding spaces is useful. From those scenarios, we derive common tasks, introduce visual analysis methods that support these tasks, and combine them into a comprehensive system. One of embComp's central features is a set of overview visualizations based on metrics for measuring differences in the local structure around objects. Summarizing these local metrics over the embeddings provides global overviews of similarities and differences. Detail views allow comparing the local structure around selected objects and relating this local information to the global views. Integrating and connecting all of these components, embComp supports a range of analysis workflows that help understand similarities and differences between embedding spaces. We assess our approach by applying it in several use cases, including understanding corpora differences via word vector embeddings and understanding algorithmic differences in generating embeddings.


Cited By

  • (2025) A General Framework for Comparing Embedding Visualizations Across Class-Label Hierarchies. IEEE Transactions on Visualization and Computer Graphics, 31(1), 283–293. DOI: 10.1109/TVCG.2024.3456370. Online publication date: 1-Jan-2025.
  • (2024) ModalChorus: Visual Probing and Alignment of Multi-Modal Embeddings via Modal Fusion Map. IEEE Transactions on Visualization and Computer Graphics, 31(1), 294–304. DOI: 10.1109/TVCG.2024.3456387. Online publication date: 9-Sep-2024.
  • (2024) LLM Comparator: Interactive Analysis of Side-by-Side Evaluation of Large Language Models. IEEE Transactions on Visualization and Computer Graphics, 31(1), 503–513. DOI: 10.1109/TVCG.2024.3456354. Online publication date: 10-Sep-2024.
  • (2024) Visual Analytics for Machine Learning: A Data Perspective Survey. IEEE Transactions on Visualization and Computer Graphics, 30(12), 7637–7656. DOI: 10.1109/TVCG.2024.3357065. Online publication date: 1-Dec-2024.
  • (2022) Emblaze: Illuminating Machine Learning Representations through Interactive Comparison of Embedding Spaces. Proceedings of the 27th International Conference on Intelligent User Interfaces, 418–432. DOI: 10.1145/3490099.3511137. Online publication date: 22-Mar-2022.
  • (2022) Embedding Comparator: Visualizing Differences in Global Structure and Local Neighborhoods via Small Multiples. Proceedings of the 27th International Conference on Intelligent User Interfaces, 746–766. DOI: 10.1145/3490099.3511122. Online publication date: 22-Mar-2022.
  • VERB: Visualizing and Interpreting Bias Mitigation Techniques Geometrically for Word Representations. ACM Transactions on Interactive Intelligent Systems. DOI: 10.1145/3604433.


Published In

IEEE Transactions on Visualization and Computer Graphics, Volume 28, Issue 8, Aug. 2022, 261 pages

Publisher: IEEE Educational Activities Department, United States

          Qualifiers

          • Research-article


