DOI: 10.1145/3490099.3511137
research-article
Open access

Emblaze: Illuminating Machine Learning Representations through Interactive Comparison of Embedding Spaces

Published: 22 March 2022

Abstract

Modern machine learning techniques commonly rely on complex, high-dimensional embedding representations to capture underlying structure in the data and improve performance. In order to characterize model flaws and choose a desirable representation, model builders often need to compare across multiple embedding spaces, a challenging analytical task supported by few existing tools. We first interviewed nine embedding experts in a variety of fields to characterize the diverse challenges they face and techniques they use when analyzing embedding spaces. Informed by these perspectives, we developed a novel system called Emblaze that integrates embedding space comparison within a computational notebook environment. Emblaze uses an animated, interactive scatter plot with a novel Star Trail augmentation to enable visual comparison. It also employs novel neighborhood analysis and clustering procedures to dynamically suggest groups of points with interesting changes between spaces. Through a series of case studies with ML experts, we demonstrate how interactive comparison with Emblaze can help gain new insights into embedding space structure.
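
The abstract describes neighborhood analysis only at a high level. As an illustration of the general idea, and not Emblaze's actual procedure, the following minimal sketch scores each point by how much its k-nearest-neighbor set differs between two embedding spaces. The function names, the Jaccard-based score, and the toy data are assumptions made for this example.

```python
import math


def knn(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors (Euclidean)."""
    result = []
    for i, p in enumerate(points):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        result.append(set(order[:k]))
    return result


def neighborhood_change(space_a, space_b, k=2):
    """Per-point score in [0, 1]: 1 minus the Jaccard similarity of the k-NN sets."""
    nn_a, nn_b = knn(space_a, k), knn(space_b, k)
    return [1 - len(a & b) / len(a | b) for a, b in zip(nn_a, nn_b)]


# Hypothetical toy data: the same seven points in two spaces.
# Point 3 sits in the first cluster in space_a and in the second in space_b.
space_a = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10)]
space_b = [(0, 0), (0, 1), (1, 0), (11, 11), (10, 10), (10, 11), (11, 10)]

changes = neighborhood_change(space_a, space_b, k=2)
most_changed = max(range(len(changes)), key=lambda i: changes[i])
```

Points with high scores are candidates for closer inspection. A real tool would likely use an approximate nearest-neighbor index and group points with similar changes rather than ranking them individually.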

      Published In

      IUI '22: Proceedings of the 27th International Conference on Intelligent User Interfaces
      March 2022
      888 pages
      ISBN:9781450391443
      DOI:10.1145/3490099
      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. animation
      2. dimensionality reduction
      3. embedding space comparison
      4. machine learning
      5. visualization

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      IUI '22
      Acceptance Rates

      Overall Acceptance Rate 746 of 2,811 submissions, 27%

      Article Metrics

      • Downloads (Last 12 months): 303
      • Downloads (Last 6 weeks): 28
      Reflects downloads up to 01 Sep 2024

      Cited By

      • (2024) VideoMap: Supporting Video Exploration, Brainstorming, and Prototyping in the Latent Space. Proceedings of the 16th Conference on Creativity & Cognition (311-327). DOI: 10.1145/3635636.3656192. Online publication date: 23-Jun-2024
      • (2024) SuperNOVA: Design Strategies and Opportunities for Interactive Visualization in Computational Notebooks. Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems (1-17). DOI: 10.1145/3613905.3650848. Online publication date: 11-May-2024
      • (2024) Design Concerns for Integrated Scripting and Interactive Visualization in Notebook Environments. IEEE Transactions on Visualization and Computer Graphics 30:9 (6572-6585). DOI: 10.1109/TVCG.2024.3354561. Online publication date: Sep-2024
      • (2023) VA + Embeddings STAR: A State‐of‐the‐Art Report on the Use of Embeddings in Visual Analytics. Computer Graphics Forum 42:3 (539-571). DOI: 10.1111/cgf.14859. Online publication date: 27-Jun-2023
      • (2023) Visual Comparison of Text Sequences Generated by Large Language Models. 2023 IEEE Visualization in Data Science (VDS) (11-20). DOI: 10.1109/VDS60365.2023.00007. Online publication date: 15-Oct-2023
      • (2023) AttentionViz: A Global View of Transformer Attention. IEEE Transactions on Visualization and Computer Graphics 30:1 (262-272). DOI: 10.1109/TVCG.2023.3327163. Online publication date: 26-Oct-2023
      • (2023) Polyphony: an Interactive Transfer Learning Framework for Single-Cell Data Analysis. IEEE Transactions on Visualization and Computer Graphics 29:1 (591-601). DOI: 10.1109/TVCG.2022.3209408. Online publication date: Jan-2023
      • (2022) BiaScope: Visual Unfairness Diagnosis for Graph Embeddings. 2022 IEEE Visualization in Data Science (VDS) (27-36). DOI: 10.1109/VDS57266.2022.00008. Online publication date: Oct-2022
      • (2022) Visual Comparison of Language Model Adaptation. IEEE Transactions on Visualization and Computer Graphics (1-11). DOI: 10.1109/TVCG.2022.3209458. Online publication date: 2022
      • (2022) ConceptExplainer: Interactive Explanation for Deep Neural Networks from a Concept Perspective. IEEE Transactions on Visualization and Computer Graphics (1-11). DOI: 10.1109/TVCG.2022.3209384. Online publication date: 2022
