Joint geometric and topological analysis of hierarchical datasets

L Aloni, O Bobrowski, R Talmon - … 2021, Bilbao, Spain, September 13–17 …, 2021 - Springer
Machine Learning and Knowledge Discovery in Databases. Research Track …, 2021, Springer
Abstract
In a world abundant with diverse data arising from complex acquisition techniques, there is a growing need for new data analysis methods. In this paper, we focus on high-dimensional data organized into several hierarchical datasets. We assume that each dataset consists of complex samples and that every sample has a distinct irregular structure modeled by a graph. The main novelty of this work lies in combining two complementary and powerful data-analytic approaches: topological data analysis (TDA) and geometric manifold learning. Geometry primarily captures local information, while topology inherently provides global descriptors. Based on this combination, we present a method for building an informative representation of hierarchical datasets. At the finer (sample) level, we devise a new metric between samples, based on manifold learning, that facilitates quantitative structural analysis. At the coarser (dataset) level, we employ TDA to extract qualitative structural information from the datasets. We showcase the applicability and advantages of our method on simulated data and on a corpus of hyperspectral images. We show that an ensemble of hyperspectral images exhibits a hierarchical structure that fits the considered setting well. In addition, we show that our new method yields superior classification results compared to state-of-the-art methods.
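The abstract does not specify the sample-level metric or the exact TDA construction used at the dataset level, so the following is only a generic sketch of the coarse-level idea, not the authors' pipeline. It assumes that a symmetric pairwise distance matrix between the samples of one dataset has already been computed (for instance, by some sample-level metric), and it uses the 0-dimensional persistent homology of the Vietoris-Rips filtration as one example of a topological descriptor. The function name `h0_persistence` and the toy distance matrix are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def h0_persistence(dist: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Hypothetical helper: 0-dimensional persistence pairs (birth, death)
    of the Vietoris-Rips filtration on a symmetric distance matrix.

    Illustrative stand-in for a dataset-level TDA descriptor; the paper's
    actual construction is not given in the abstract.
    """
    n = len(dist)
    parent = list(range(n))  # union-find forest: one component per sample

    def find(i: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Every pairwise edge, sorted by length: the order in which edges
    # enter the Rips filtration as the scale parameter grows.
    edges = sorted(
        (dist[i][j], i, j) for i in range(n) for j in range(i + 1, n)
    )
    pairs: List[Tuple[float, float]] = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge: all vertices are born at scale 0,
            # so one component dies at this edge length.
            parent[ri] = rj
            pairs.append((0.0, float(length)))
    if n > 0:
        # The last surviving component never dies.
        pairs.append((0.0, math.inf))
    return pairs


if __name__ == "__main__":
    # Toy (made-up) distances: samples {0, 1} and {2, 3} form two groups.
    D = [
        [0.0, 1.0, 5.0, 6.0],
        [1.0, 0.0, 5.5, 6.5],
        [5.0, 5.5, 0.0, 1.2],
        [6.0, 6.5, 1.2, 0.0],
    ]
    print(h0_persistence(D))
    # [(0.0, 1.0), (0.0, 1.2), (0.0, 5.0), (0.0, inf)]
```

In this toy case, the one finite bar that lives much longer than the others (dying at 5.0) indicates two well-separated groups of samples. This is the kind of qualitative, global structure that a topological summary exposes and that purely local geometric descriptors do not show directly. The merge-by-shortest-edge loop is Kruskal's minimum spanning tree algorithm, which is why H0 persistence of a Rips filtration coincides with single-linkage clustering.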