
embComp: Visual Interactive Comparison of Vector Embeddings

Published: 01 August 2022

Abstract

This article introduces embComp, a novel approach for comparing two embeddings that capture the similarity between objects, such as word and document embeddings. We survey scenarios where comparing these embedding spaces is useful. From those scenarios, we derive common tasks, introduce visual analysis methods that support these tasks, and combine them into a comprehensive system. One of embComp's central features is a set of overview visualizations based on metrics for measuring differences in the local structure around objects. Summarizing these local metrics over the embeddings provides global overviews of similarities and differences. Detail views allow comparing the local structure around selected objects and relating this local information to the global views. Integrating and connecting all of these components, embComp supports a range of analysis workflows that help understand similarities and differences between embedding spaces. We assess our approach by applying it in several use cases, including understanding corpora differences via word vector embeddings and understanding algorithmic differences in generating embeddings.


Cited By

  • (2025) A General Framework for Comparing Embedding Visualizations Across Class-Label Hierarchies. IEEE Transactions on Visualization and Computer Graphics, 31(1), 283–293. DOI: 10.1109/TVCG.2024.3456370. Online publication date: 1-Jan-2025.
  • (2024) ModalChorus: Visual Probing and Alignment of Multi-Modal Embeddings via Modal Fusion Map. IEEE Transactions on Visualization and Computer Graphics, 31(1), 294–304. DOI: 10.1109/TVCG.2024.3456387. Online publication date: 9-Sep-2024.
  • (2024) LLM Comparator: Interactive Analysis of Side-by-Side Evaluation of Large Language Models. IEEE Transactions on Visualization and Computer Graphics, 31(1), 503–513. DOI: 10.1109/TVCG.2024.3456354. Online publication date: 10-Sep-2024.
  • (2024) Visual Analytics for Machine Learning: A Data Perspective Survey. IEEE Transactions on Visualization and Computer Graphics, 30(12), 7637–7656. DOI: 10.1109/TVCG.2024.3357065. Online publication date: 1-Dec-2024.
  • (2022) Emblaze: Illuminating Machine Learning Representations through Interactive Comparison of Embedding Spaces. Proceedings of the 27th International Conference on Intelligent User Interfaces, 418–432. DOI: 10.1145/3490099.3511137. Online publication date: 22-Mar-2022.
  • (2022) Embedding Comparator: Visualizing Differences in Global Structure and Local Neighborhoods via Small Multiples. Proceedings of the 27th International Conference on Intelligent User Interfaces, 746–766. DOI: 10.1145/3490099.3511122. Online publication date: 22-Mar-2022.
  • VERB: Visualizing and Interpreting Bias Mitigation Techniques Geometrically for Word Representations. ACM Transactions on Interactive Intelligent Systems. DOI: 10.1145/3604433.


Published In

IEEE Transactions on Visualization and Computer Graphics, Volume 28, Issue 8, Aug. 2022, 261 pages

Publisher: IEEE Educational Activities Department, United States

          Qualifiers

          • Research-article


