Emblaze: Illuminating Machine Learning Representations through Interactive Comparison of Embedding Spaces

Authors:

Venkatesh Sivaraman,

Adam Perer

IUI '22: Proceedings of the 27th International Conference on Intelligent User Interfaces

Pages 418 - 432

https://doi.org/10.1145/3490099.3511137

Published: 22 March 2022

Abstract

Modern machine learning techniques commonly rely on complex, high-dimensional embedding representations to capture underlying structure in the data and improve performance. In order to characterize model flaws and choose a desirable representation, model builders often need to compare across multiple embedding spaces, a challenging analytical task supported by few existing tools. We first interviewed nine embedding experts in a variety of fields to characterize the diverse challenges they face and techniques they use when analyzing embedding spaces. Informed by these perspectives, we developed a novel system called Emblaze that integrates embedding space comparison within a computational notebook environment. Emblaze uses an animated, interactive scatter plot with a novel Star Trail augmentation to enable visual comparison. It also employs novel neighborhood analysis and clustering procedures to dynamically suggest groups of points with interesting changes between spaces. Through a series of case studies with ML experts, we demonstrate how interactive comparison with Emblaze can help gain new insights into embedding space structure.
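
As a rough illustration of what a neighborhood analysis between two embedding spaces can look like, the sketch below scores each point by how much its set of k nearest neighbors differs between two spaces. This is not the procedure Emblaze implements, which the abstract does not specify; the function names `knn_indices` and `neighborhood_change`, the Euclidean metric, the Jaccard-based score, and the toy data are all assumptions made for the example.

```python
import numpy as np


def knn_indices(X, k):
    """Return, for each row of X, the indices of its k nearest rows (Euclidean)."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    # Squared pairwise distances via ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def neighborhood_change(A, B, k=5):
    """Per-point score in [0, 1]: 1 minus the Jaccard overlap of kNN sets in A and B.

    A and B are two embeddings of the same items, with rows aligned by item.
    """
    na, nb = knn_indices(A, k), knn_indices(B, k)
    change = np.empty(len(na))
    for i in range(len(na)):
        sa, sb = set(na[i].tolist()), set(nb[i].tolist())
        change[i] = 1.0 - len(sa & sb) / len(sa | sb)
    return change


# Toy data (hypothetical): six items embedded in two 1-D spaces.
# In B, items 1 and 2 swap places with items 3 and 4.
A = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
B = np.array([[0.0], [10.0], [11.0], [1.0], [2.0], [12.0]])

scores = neighborhood_change(A, B, k=2)
most_changed = np.argsort(-scores)  # items ordered from most to least changed
```

Points with high scores are candidates for closer inspection; grouping them, as the abstract describes, would need an additional clustering step that is not sketched here.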

Index Terms

Emblaze: Illuminating Machine Learning Representations through Interactive Comparison of Embedding Spaces
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
2. Human-centered computing
  1. Human computer interaction (HCI)

Index terms have been assigned to the content through auto-classification.

Published In

IUI '22: Proceedings of the 27th International Conference on Intelligent User Interfaces

March 2022

888 pages

ISBN: 9781450391443

DOI: 10.1145/3490099

Copyright © 2022 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

IUI '22

IUI '22: 27th International Conference on Intelligent User Interfaces

March 22 - 25, 2022

Helsinki, Finland

Acceptance Rates

Overall Acceptance Rate 746 of 2,811 submissions, 27%
