research-article
Open access

Neo: Generalizing Confusion Matrix Visualization to Hierarchical and Multi-Output Labels

Published: 29 April 2022

Abstract

The confusion matrix, a ubiquitous visualization for helping people evaluate machine learning models, is a tabular layout that compares predicted class labels against actual class labels over all data instances. We conduct formative research with machine learning practitioners at Apple and find that conventional confusion matrices do not support the more complex data structures found in modern applications, such as hierarchical and multi-output labels. To express such variations of confusion matrices, we design an algebra that models confusion matrices as probability distributions. Based on this algebra, we develop Neo, a visual analytics system that enables practitioners to flexibly author and interact with hierarchical and multi-output confusion matrices, visualize derived metrics, renormalize confusions, and share matrix specifications. Finally, we demonstrate Neo's utility with three model evaluation scenarios that help people better understand model performance and reveal hidden confusions.
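To make the idea of treating a confusion matrix as a probability distribution concrete, the following is a minimal Python sketch. It is not Neo's algebra or API. The function names (`joint_distribution`, `normalize_by_actual`, `collapse`), the toy labels, and the `parent` hierarchy mapping are all assumptions introduced only for illustration. The sketch builds a joint distribution over (actual, predicted) pairs, renormalizes it per actual class, and collapses child labels into their parents.

```python
from collections import Counter


def joint_distribution(actual, predicted):
    """Return {(actual, predicted): probability} over all data instances."""
    counts = Counter(zip(actual, predicted))
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def normalize_by_actual(joint):
    """Renormalize so each actual class sums to 1, i.e. P(predicted | actual)."""
    row_totals = Counter()
    for (a, _), prob in joint.items():
        row_totals[a] += prob
    return {(a, p): prob / row_totals[a] for (a, p), prob in joint.items()}


def collapse(joint, parent):
    """Merge child labels into their parent labels (hypothetical hierarchy)."""
    merged = Counter()
    for (a, p), prob in joint.items():
        merged[(parent.get(a, a), parent.get(p, p))] += prob
    return dict(merged)


# Toy data, invented for this example only.
actual = ["apple", "apple", "orange", "carrot", "carrot", "orange"]
predicted = ["apple", "orange", "orange", "carrot", "apple", "orange"]
parent = {"apple": "fruit", "orange": "fruit", "carrot": "vegetable"}

joint = joint_distribution(actual, predicted)   # cells sum to 1
conditional = normalize_by_actual(joint)        # each actual class sums to 1
coarse = collapse(joint, parent)                # fruit vs. vegetable view
```

In this toy data, the apple/orange confusion is visible in `joint` but disappears in `coarse`, where both labels fall under `fruit`. That is one way a single level of a hierarchy can hide confusions that another level reveals.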

Published In

CHI '22: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems
April 2022
10459 pages
ISBN: 9781450391573
DOI: 10.1145/3491102
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

  • Best Paper

Author Tags

  1. Confusion matrices
  2. interactive systems
  3. machine learning
  4. model evaluation
  5. visual analytics

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CHI '22: CHI Conference on Human Factors in Computing Systems
April 29 - May 5, 2022
New Orleans, LA, USA

Acceptance Rates

Overall Acceptance Rate 6,199 of 26,314 submissions, 24%
