
Interpretable Embedding and Visualization of Compressed Data

Published: 20 February 2023

Abstract

Traditional embedding methodologies, also known as dimensionality reduction techniques, assume the availability of exact pairwise distances between the high-dimensional objects to be embedded in a lower-dimensional space. In this article, we propose an embedding that overcomes this limitation and can operate on pairwise distances represented as ranges of lower and upper bounds. Such bounds are typically estimated when objects are compressed in a lossy manner, so our approach is highly applicable to big compressed datasets. Our methodology can preserve multiple aspects of the original data relationships (distances, correlations, and object scores/ranks), whereas existing techniques typically preserve only distances. Comparative experiments with prevalent embedding methodologies (ISOMAP, t-SNE, MDS, UMAP) illustrate that our approach faithfully preserves multiple object relationships, even in the presence of inexact distance information. The resulting visualization is also easily interpretable.
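
To make the setting concrete, below is a minimal sketch in Python (not the authors' algorithm) of an embedding that consumes interval rather than exact distances: each embedded pairwise distance is unpenalized while it lies inside its [lower, upper] bound, and incurs a quadratic hinge penalty once it leaves the interval, minimized by plain gradient descent. The function embed_with_bounds, the hinge-style loss, and the 20% bound widening in the toy example are all illustrative assumptions, not the paper's formulation.

import numpy as np

def embed_with_bounds(L, U, dim=2, lr=0.01, iters=2000, seed=0):
    """L, U: (n, n) symmetric lower/upper bounds on pairwise distances."""
    rng = np.random.default_rng(seed)
    n = L.shape[0]
    X = rng.standard_normal((n, dim))            # random initial layout
    for _ in range(iters):
        diff = X[:, None, :] - X[None, :, :]     # (n, n, dim) difference vectors
        d = np.linalg.norm(diff, axis=-1)        # embedded pairwise distances
        np.fill_diagonal(d, 1.0)                 # avoid division by zero below
        # Hinge residual: zero while L <= d <= U, signed violation otherwise.
        r = np.where(d < L, d - L, np.where(d > U, d - U, 0.0))
        np.fill_diagonal(r, 0.0)
        # Gradient of 0.5 * sum(r**2) with respect to each point.
        grad = ((r / d)[:, :, None] * diff).sum(axis=1)
        X -= lr * grad
    return X

# Toy usage: widen exact distances D into +/-20% intervals, mimicking the
# bounds a lossy compressor might report.
D = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
X = embed_with_bounds(0.8 * D, 1.2 * D)
print(X)

Because the penalty vanishes anywhere inside the bounds, an optimizer of this kind can exploit the slack that lossy compression introduces, rather than overfitting a single point estimate of each distance.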


Cited By

  • (2023) Laplacian-based Cluster-Contractive t-SNE for High-Dimensional Data Visualization. ACM Transactions on Knowledge Discovery from Data 18, 1 (2023), 1–22. DOI: 10.1145/3612932. Online publication date: 6 September 2023.

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 17, Issue 2
February 2023
355 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3572847

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 February 2023
Online AM (Accepted Manuscript): 18 May 2022
Accepted: 08 May 2022
Received: 30 January 2022
Published in TKDD Volume 17, Issue 2


Author Tags

  1. Dimensionality reduction
  2. data embedding
  3. compressed data

Qualifiers

  • Research-article

Funding Sources

  • Ministry of Science and Technology of China
  • Anhui Dept. of Science and Technology
  • Toward Interpretable Machine Learning

