DOI: 10.1145/3491102.3501823
Research article
Open access

Neo: Generalizing Confusion Matrix Visualization to Hierarchical and Multi-Output Labels

Published: 29 April 2022
Abstract

    The confusion matrix, a ubiquitous visualization for helping people evaluate machine learning models, is a tabular layout that compares predicted class labels against actual class labels over all data instances. We conduct formative research with machine learning practitioners at Apple and find that conventional confusion matrices do not support the more complex data structures found in modern applications, such as hierarchical and multi-output labels. To express such variations of confusion matrices, we design an algebra that models confusion matrices as probability distributions. Based on this algebra, we develop Neo, a visual analytics system that enables practitioners to flexibly author and interact with hierarchical and multi-output confusion matrices, visualize derived metrics, renormalize confusions, and share matrix specifications. Finally, we demonstrate Neo's utility with three model evaluation scenarios that help people better understand model performance and reveal hidden confusions.
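
To make "confusion matrices as probability distributions" concrete, here is a minimal Python sketch. It is not Neo's algebra or its specification format, neither of which is described on this page. It assumes only that a confusion matrix is stored as a joint distribution P(actual, predicted) over labelled instances. Under that assumption, rolling a label hierarchy up one level, reducing a multi-output label to a single output, and renormalizing rows can all be written as small operations on the distribution. The function names (`joint_distribution`, `project`, `normalize_by_actual`), the `PARENT` hierarchy, and the example labels are all hypothetical and chosen for illustration.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable, Tuple

# Hypothetical representation: joint distribution P(actual, predicted),
# keyed by (actual, predicted) label pairs. Not Neo's internal format.
Joint = Dict[Tuple[Hashable, Hashable], float]


def joint_distribution(actual: Iterable[Hashable], predicted: Iterable[Hashable]) -> Joint:
    """Count (actual, predicted) pairs and divide by the number of instances."""
    counts = Counter(zip(actual, predicted))
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def project(joint: Joint, relabel: Callable[[Hashable], Hashable]) -> Joint:
    """Apply `relabel` to both axes and sum the mass that lands in each new cell.

    With a child -> parent lookup this rolls a hierarchy up one level; with a
    tuple-index lookup it marginalizes a multi-output label to one output.
    """
    out: Joint = {}
    for (a, p), mass in joint.items():
        key = (relabel(a), relabel(p))
        out[key] = out.get(key, 0.0) + mass
    return out


def normalize_by_actual(joint: Joint) -> Joint:
    """Renormalize each row: P(predicted | actual) = P(actual, predicted) / P(actual)."""
    row_mass: Dict[Hashable, float] = {}
    for (a, _), mass in joint.items():
        row_mass[a] = row_mass.get(a, 0.0) + mass
    return {(a, p): mass / row_mass[a] for (a, p), mass in joint.items()}


# Hypothetical two-level hierarchy and predictions, invented for illustration.
PARENT = {"cat": "animal", "dog": "animal", "car": "vehicle", "truck": "vehicle"}
actual = ["cat", "cat", "dog", "dog", "car", "car", "truck", "truck"]
predicted = ["cat", "dog", "dog", "dog", "car", "truck", "truck", "cat"]

leaf = joint_distribution(actual, predicted)
coarse = project(leaf, PARENT.__getitem__)
print(coarse)
# {('animal', 'animal'): 0.5, ('vehicle', 'vehicle'): 0.375, ('vehicle', 'animal'): 0.125}
print(normalize_by_actual(coarse))
# {('animal', 'animal'): 1.0, ('vehicle', 'vehicle'): 0.75, ('vehicle', 'animal'): 0.25}

# Hypothetical multi-output labels as (toxic, obscene) tuples; keep only "toxic".
multi = joint_distribution([(1, 0), (1, 1), (0, 0), (0, 0)], [(1, 1), (0, 1), (0, 0), (1, 0)])
print(project(multi, lambda label: label[0]))
# {(1, 1): 0.25, (1, 0): 0.25, (0, 0): 0.25, (0, 1): 0.25}
```

In this sketch, the hierarchical roll-up and the multi-output marginal are the same operation: summing probability mass over a relabeling. That is one reason a distribution view can be convenient. The example also shows how the level of aggregation changes what is visible. The cat/dog and car/truck confusions at the leaf level disappear at the parent level. The truck-to-cat error survives as a vehicle-to-animal confusion, which row renormalization reports as 25% of actual vehicles.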

    Supplementary Material

    Video Preview (MP4): 3491102.3501823-video-preview.mp4
    Talk Video (MP4): 3491102.3501823-talk-video.mp4




    Published In

    CHI '22: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems
    April 2022
    10459 pages
    ISBN:9781450391573
    DOI:10.1145/3491102
    This work is licensed under a Creative Commons Attribution International 4.0 License.


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Badges

    • Best Paper

    Author Tags

    1. Confusion matrices
    2. interactive systems
    3. machine learning
    4. model evaluation
    5. visual analytics

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    CHI '22
    CHI '22: CHI Conference on Human Factors in Computing Systems
    April 29 - May 5, 2022
    New Orleans, LA, USA

    Acceptance Rates

    Overall Acceptance Rate 6,199 of 26,314 submissions, 24%


    Article Metrics

    • Downloads (last 12 months): 1,375
    • Downloads (last 6 weeks): 135
    Reflects downloads up to 27 Jul 2024

