-
Pyramidal Reservoir Graph Neural Network
Authors:
Filippo Maria Bianchi,
Claudio Gallicchio,
Alessio Micheli
Abstract:
We propose a deep Graph Neural Network (GNN) model that alternates two types of layers. The first type is inspired by Reservoir Computing (RC) and generates new vertex features by iterating a non-linear map until it converges to a fixed point. The second type of layer implements graph pooling operations, that gradually reduce the support graph and the vertex features, and further improve the compu…
▽ More
We propose a deep Graph Neural Network (GNN) model that alternates two types of layers. The first type is inspired by Reservoir Computing (RC) and generates new vertex features by iterating a non-linear map until it converges to a fixed point. The second type of layer implements graph pooling operations, that gradually reduce the support graph and the vertex features, and further improve the computational efficiency of the RC-based GNN. The architecture is, therefore, pyramidal. In the last layer, the features of the remaining vertices are combined into a single vector, which represents the graph embedding. Through a mathematical derivation introduced in this paper, we show formally how graph pooling can reduce the computational complexity of the model and speed-up the convergence of the dynamical updates of the vertex features. Our proposed approach to the design of RC-based GNNs offers an advantageous and principled trade-off between accuracy and complexity, which we extensively demonstrate in experiments on a large set of graph datasets.
△ Less
Submitted 10 April, 2021;
originally announced April 2021.
-
Deep Image Translation with an Affinity-Based Change Prior for Unsupervised Multimodal Change Detection
Authors:
Luigi Tommaso Luppino,
Michael Kampffmeyer,
Filippo Maria Bianchi,
Gabriele Moser,
Sebastiano Bruno Serpico,
Robert Jenssen,
Stian Normann Anfinsen
Abstract:
Image translation with convolutional neural networks has recently been used as an approach to multimodal change detection. Existing approaches train the networks by exploiting supervised information of the change areas, which, however, is not always available. A main challenge in the unsupervised problem setting is to avoid that change pixels affect the learning of the translation function. We pro…
▽ More
Image translation with convolutional neural networks has recently been used as an approach to multimodal change detection. Existing approaches train the networks by exploiting supervised information of the change areas, which, however, is not always available. A main challenge in the unsupervised problem setting is to avoid that change pixels affect the learning of the translation function. We propose two new network architectures trained with loss functions weighted by priors that reduce the impact of change pixels on the learning objective. The change prior is derived in an unsupervised fashion from relational pixel information captured by domain-specific affinity matrices. Specifically, we use the vertex degrees associated with an absolute affinity difference matrix and demonstrate their utility in combination with cycle consistency and adversarial training. The proposed neural networks are compared with state-of-the-art algorithms. Experiments conducted on three real datasets show the effectiveness of our methodology.
△ Less
Submitted 8 March, 2021; v1 submitted 13 January, 2020;
originally announced January 2020.
-
Hierarchical Representation Learning in Graph Neural Networks with Node Decimation Pooling
Authors:
Filippo Maria Bianchi,
Daniele Grattarola,
Lorenzo Livi,
Cesare Alippi
Abstract:
In graph neural networks (GNNs), pooling operators compute local summaries of input graphs to capture their global properties, and they are fundamental for building deep GNNs that learn hierarchical representations. In this work, we propose the Node Decimation Pooling (NDP), a pooling operator for GNNs that generates coarser graphs while preserving the overall graph topology. During training, the…
▽ More
In graph neural networks (GNNs), pooling operators compute local summaries of input graphs to capture their global properties, and they are fundamental for building deep GNNs that learn hierarchical representations. In this work, we propose the Node Decimation Pooling (NDP), a pooling operator for GNNs that generates coarser graphs while preserving the overall graph topology. During training, the GNN learns new node representations and fits them to a pyramid of coarsened graphs, which is computed offline in a pre-processing stage. NDP consists of three steps. First, a node decimation procedure selects the nodes belonging to one side of the partition identified by a spectral algorithm that approximates the \maxcut{} solution. Afterwards, the selected nodes are connected with Kron reduction to form the coarsened graph. Finally, since the resulting graph is very dense, we apply a sparsification procedure that prunes the adjacency matrix of the coarsened graph to reduce the computational cost in the GNN. Notably, we show that it is possible to remove many edges without significantly altering the graph structure. Experimental results show that NDP is more efficient compared to state-of-the-art graph pooling operators while reaching, at the same time, competitive performance on a significant variety of graph classification tasks.
△ Less
Submitted 20 April, 2024; v1 submitted 24 October, 2019;
originally announced October 2019.
-
Snow avalanche segmentation in SAR images with Fully Convolutional Neural Networks
Authors:
Filippo Maria Bianchi,
Jakob Grahn,
Markus Eckerstorfer,
Eirik Malnes,
Hannah Vickers
Abstract:
Knowledge about frequency and location of snow avalanche activity is essential for forecasting and mapping of snow avalanche hazard. Traditional field monitoring of avalanche activity has limitations, especially when surveying large and remote areas. In recent years, avalanche detection in Sentinel-1 radar satellite imagery has been developed to improve monitoring. However, the current state-of-th…
▽ More
Knowledge about frequency and location of snow avalanche activity is essential for forecasting and mapping of snow avalanche hazard. Traditional field monitoring of avalanche activity has limitations, especially when surveying large and remote areas. In recent years, avalanche detection in Sentinel-1 radar satellite imagery has been developed to improve monitoring. However, the current state-of-the-art detection algorithms, based on radar signal processing techniques, are still much less accurate than human experts. To reduce this gap, we propose a deep learning architecture for detecting avalanches in Sentinel-1 radar images. We trained a neural network on 6,345 manually labelled avalanches from 117 Sentinel-1 images, each one consisting of six channels that include backscatter and topographical information. Then, we tested our trained model on a new SAR image. Comparing to the manual labelling (the gold standard), we achieved an F1 score above 66\%, while the state-of-the-art detection algorithm sits at an F1 score of only 38\%. A visual inspection of the results generated by our deep learning model shows that only small avalanches are undetected, while some avalanches that were originally not labelled by the human expert are discovered.
△ Less
Submitted 6 November, 2020; v1 submitted 11 October, 2019;
originally announced October 2019.
-
Unsupervised Image Regression for Heterogeneous Change Detection
Authors:
Luigi T. Luppino,
Filippo M. Bianchi,
Gabriele Moser,
Stian N. Anfinsen
Abstract:
Change detection in heterogeneous multitemporal satellite images is an emerging and challenging topic in remote sensing. In particular, one of the main challenges is to tackle the problem in an unsupervised manner. In this paper we propose an unsupervised framework for bitemporal heterogeneous change detection based on the comparison of affinity matrices and image regression. First, our method qua…
▽ More
Change detection in heterogeneous multitemporal satellite images is an emerging and challenging topic in remote sensing. In particular, one of the main challenges is to tackle the problem in an unsupervised manner. In this paper we propose an unsupervised framework for bitemporal heterogeneous change detection based on the comparison of affinity matrices and image regression. First, our method quantifies the similarity of affinity matrices computed from co-located image patches in the two images. This is done to automatically identify pixels that are likely to be unchanged. With the identified pixels as pseudo-training data, we learn a transformation to map the first image to the domain of the other image, and vice versa. Four regression methods are selected to carry out the transformation: Gaussian process regression, support vector regression, random forest regression, and a recently proposed kernel regression method called homogeneous pixel transformation. To evaluate the potentials and limitations of our framework, and also the benefits and disadvantages of each regression method, we perform experiments on two real data sets. The results indicate that the comparison of the affinity matrices can already be considered a change detection method by itself. However, image regression is shown to improve the results obtained by the previous step alone and produces accurate change detection maps despite of the heterogeneity of the multitemporal input data. Notably, the random forest regression approach excels by achieving similar accuracy as the other methods, but with a significantly lower computational cost and with fast and robust tuning of hyperparameters.
△ Less
Submitted 7 September, 2019;
originally announced September 2019.
-
Time series cluster kernels to exploit informative missingness and incomplete label information
Authors:
Karl Øyvind Mikalsen,
Cristina Soguero-Ruiz,
Filippo Maria Bianchi,
Arthur Revhaug,
Robert Jenssen
Abstract:
The time series cluster kernel (TCK) provides a powerful tool for analysing multivariate time series subject to missing data. TCK is designed using an ensemble learning approach in which Bayesian mixture models form the base models. Because of the Bayesian approach, TCK can naturally deal with missing values without resorting to imputation and the ensemble strategy ensures robustness to hyperparam…
▽ More
The time series cluster kernel (TCK) provides a powerful tool for analysing multivariate time series subject to missing data. TCK is designed using an ensemble learning approach in which Bayesian mixture models form the base models. Because of the Bayesian approach, TCK can naturally deal with missing values without resorting to imputation and the ensemble strategy ensures robustness to hyperparameters, making it particularly well suited for unsupervised learning.
However, TCK assumes missing at random and that the underlying missingness mechanism is ignorable, i.e. uninformative, an assumption that does not hold in many real-world applications, such as e.g. medicine. To overcome this limitation, we present a kernel capable of exploiting the potentially rich information in the missing values and patterns, as well as the information from the observed data. In our approach, we create a representation of the missing pattern, which is incorporated into mixed mode mixture models in such a way that the information provided by the missing patterns is effectively exploited. Moreover, we also propose a semi-supervised kernel, capable of taking advantage of incomplete label information to learn more accurate similarities.
Experiments on benchmark data, as well as a real-world case study of patients described by longitudinal electronic health record data who potentially suffer from hospital-acquired infections, demonstrate the effectiveness of the proposed methods.
△ Less
Submitted 10 July, 2019;
originally announced July 2019.
-
Spectral Clustering with Graph Neural Networks for Graph Pooling
Authors:
Filippo Maria Bianchi,
Daniele Grattarola,
Cesare Alippi
Abstract:
Spectral clustering (SC) is a popular clustering technique to find strongly connected communities on a graph. SC can be used in Graph Neural Networks (GNNs) to implement pooling operations that aggregate nodes belonging to the same cluster. However, the eigendecomposition of the Laplacian is expensive and, since clustering results are graph-specific, pooling methods based on SC must perform a new…
▽ More
Spectral clustering (SC) is a popular clustering technique to find strongly connected communities on a graph. SC can be used in Graph Neural Networks (GNNs) to implement pooling operations that aggregate nodes belonging to the same cluster. However, the eigendecomposition of the Laplacian is expensive and, since clustering results are graph-specific, pooling methods based on SC must perform a new optimization for each new sample. In this paper, we propose a graph clustering approach that addresses these limitations of SC. We formulate a continuous relaxation of the normalized minCUT problem and train a GNN to compute cluster assignments that minimize this objective. Our GNN-based implementation is differentiable, does not require to compute the spectral decomposition, and learns a clustering function that can be quickly evaluated on out-of-sample graphs. From the proposed clustering method, we design a graph pooling operator that overcomes some important limitations of state-of-the-art graph pooling techniques and achieves the best performance in several supervised and unsupervised tasks.
△ Less
Submitted 29 December, 2020; v1 submitted 30 June, 2019;
originally announced July 2019.
-
Noisy multi-label semi-supervised dimensionality reduction
Authors:
Karl Øyvind Mikalsen,
Cristina Soguero-Ruiz,
Filippo Maria Bianchi,
Robert Jenssen
Abstract:
Noisy labeled data represent a rich source of information that often are easily accessible and cheap to obtain, but label noise might also have many negative consequences if not accounted for. How to fully utilize noisy labels has been studied extensively within the framework of standard supervised machine learning over a period of several decades. However, very little research has been conducted…
▽ More
Noisy labeled data represent a rich source of information that often are easily accessible and cheap to obtain, but label noise might also have many negative consequences if not accounted for. How to fully utilize noisy labels has been studied extensively within the framework of standard supervised machine learning over a period of several decades. However, very little research has been conducted on solving the challenge posed by noisy labels in non-standard settings. This includes situations where only a fraction of the samples are labeled (semi-supervised) and each high-dimensional sample is associated with multiple labels. In this work, we present a novel semi-supervised and multi-label dimensionality reduction method that effectively utilizes information from both noisy multi-labels and unlabeled data. With the proposed Noisy multi-label semi-supervised dimensionality reduction (NMLSDR) method, the noisy multi-labels are denoised and unlabeled data are labeled simultaneously via a specially designed label propagation algorithm. NMLSDR then learns a projection matrix for reducing the dimensionality by maximizing the dependence between the enlarged and denoised multi-label space and the features in the projected space. Extensive experiments on synthetic data, benchmark datasets, as well as a real-world case study, demonstrate the effectiveness of the proposed algorithm and show that it outperforms state-of-the-art multi-label feature extraction algorithms.
△ Less
Submitted 20 February, 2019;
originally announced February 2019.
-
Deep Divergence-Based Approach to Clustering
Authors:
Michael Kampffmeyer,
Sigurd Løkse,
Filippo M. Bianchi,
Lorenzo Livi,
Arnt-Børre Salberg,
Robert Jenssen
Abstract:
A promising direction in deep learning research consists in learning representations and simultaneously discovering cluster structure in unlabeled data by optimizing a discriminative loss function. As opposed to supervised deep learning, this line of research is in its infancy, and how to design and optimize suitable loss functions to train deep neural networks for clustering is still an open ques…
▽ More
A promising direction in deep learning research consists in learning representations and simultaneously discovering cluster structure in unlabeled data by optimizing a discriminative loss function. As opposed to supervised deep learning, this line of research is in its infancy, and how to design and optimize suitable loss functions to train deep neural networks for clustering is still an open question. Our contribution to this emerging field is a new deep clustering network that leverages the discriminative power of information-theoretic divergence measures, which have been shown to be effective in traditional clustering. We propose a novel loss function that incorporates geometric regularization constraints, thus avoiding degenerate structures of the resulting clustering partition. Experiments on synthetic benchmarks and real datasets show that the proposed network achieves competitive performance with respect to other state-of-the-art methods, scales well to large datasets, and does not require pre-training steps.
△ Less
Submitted 13 February, 2019;
originally announced February 2019.
-
Graph Neural Networks with convolutional ARMA filters
Authors:
Filippo Maria Bianchi,
Daniele Grattarola,
Lorenzo Livi,
Cesare Alippi
Abstract:
Popular graph neural networks implement convolution operations on graphs based on polynomial spectral filters. In this paper, we propose a novel graph convolutional layer inspired by the auto-regressive moving average (ARMA) filter that, compared to polynomial ones, provides a more flexible frequency response, is more robust to noise, and better captures the global graph structure. We propose a gr…
▽ More
Popular graph neural networks implement convolution operations on graphs based on polynomial spectral filters. In this paper, we propose a novel graph convolutional layer inspired by the auto-regressive moving average (ARMA) filter that, compared to polynomial ones, provides a more flexible frequency response, is more robust to noise, and better captures the global graph structure. We propose a graph neural network implementation of the ARMA filter with a recursive and distributed formulation, obtaining a convolutional layer that is efficient to train, localized in the node space, and can be transferred to new graphs at test time. We perform a spectral analysis to study the filtering effect of the proposed ARMA layer and report experiments on four downstream tasks: semi-supervised node classification, graph signal classification, graph classification, and graph regression. Results show that the proposed ARMA layer brings significant improvements over graph neural networks based on polynomial filters.
△ Less
Submitted 24 January, 2021; v1 submitted 4 January, 2019;
originally announced January 2019.
-
The Deep Kernelized Autoencoder
Authors:
Michael Kampffmeyer,
Sigurd Løkse,
Filippo M. Bianchi,
Robert Jenssen,
Lorenzo Livi
Abstract:
Autoencoders learn data representations (codes) in such a way that the input is reproduced at the output of the network. However, it is not always clear what kind of properties of the input data need to be captured by the codes. Kernel machines have experienced great success by operating via inner-products in a theoretically well-defined reproducing kernel Hilbert space, hence capturing topologica…
▽ More
Autoencoders learn data representations (codes) in such a way that the input is reproduced at the output of the network. However, it is not always clear what kind of properties of the input data need to be captured by the codes. Kernel machines have experienced great success by operating via inner-products in a theoretically well-defined reproducing kernel Hilbert space, hence capturing topological properties of input data. In this paper, we enhance the autoencoder's ability to learn effective data representations by aligning inner products between codes with respect to a kernel matrix. By doing so, the proposed kernelized autoencoder allows learning similarity-preserving embeddings of input data, where the notion of similarity is explicitly controlled by the user and encoded in a positive semi-definite kernel matrix. Experiments are performed for evaluating both reconstruction and kernel alignment performance in classification tasks and visualization of high-dimensional data. Additionally, we show that our method is capable to emulate kernel principal component analysis on a denoising task, obtaining competitive results at a much lower computational cost.
△ Less
Submitted 23 July, 2018; v1 submitted 19 July, 2018;
originally announced July 2018.
-
An Unsupervised Multivariate Time Series Kernel Approach for Identifying Patients with Surgical Site Infection from Blood Samples
Authors:
Karl Øyvind Mikalsen,
Cristina Soguero-Ruiz,
Filippo Maria Bianchi,
Arthur Revhaug,
Robert Jenssen
Abstract:
A large fraction of the electronic health records consists of clinical measurements collected over time, such as blood tests, which provide important information about the health status of a patient. These sequences of clinical measurements are naturally represented as time series, characterized by multiple variables and the presence of missing data, which complicate analysis. In this work, we pro…
▽ More
A large fraction of the electronic health records consists of clinical measurements collected over time, such as blood tests, which provide important information about the health status of a patient. These sequences of clinical measurements are naturally represented as time series, characterized by multiple variables and the presence of missing data, which complicate analysis. In this work, we propose a surgical site infection detection framework for patients undergoing colorectal cancer surgery that is completely unsupervised, hence alleviating the problem of getting access to labelled training data. The framework is based on powerful kernels for multivariate time series that account for missing data when computing similarities. Our approach show superior performance compared to baselines that have to resort to imputation techniques and performs comparable to a supervised classification baseline.
△ Less
Submitted 21 March, 2018;
originally announced March 2018.
-
Time series kernel similarities for predicting Paroxysmal Atrial Fibrillation from ECGs
Authors:
Filippo Maria Bianchi,
Lorenzo Livi,
Alberto Ferrante,
Jelena Milosevic,
Miroslaw Malek
Abstract:
We tackle the problem of classifying Electrocardiography (ECG) signals with the aim of predicting the onset of Paroxysmal Atrial Fibrillation (PAF). Atrial fibrillation is the most common type of arrhythmia, but in many cases PAF episodes are asymptomatic. Therefore, in order to help diagnosing PAF, it is important to design procedures for detecting and, more importantly, predicting PAF episodes.…
▽ More
We tackle the problem of classifying Electrocardiography (ECG) signals with the aim of predicting the onset of Paroxysmal Atrial Fibrillation (PAF). Atrial fibrillation is the most common type of arrhythmia, but in many cases PAF episodes are asymptomatic. Therefore, in order to help diagnosing PAF, it is important to design procedures for detecting and, more importantly, predicting PAF episodes. We propose a method for predicting PAF events whose first step consists of a feature extraction procedure that represents each ECG as a multi-variate time series. Successively, we design a classification framework based on kernel similarities for multi-variate time series, capable of handling missing data. We consider different approaches to perform classification in the original space of the multi-variate time series and in an embedding space, defined by the kernel similarity measure. We achieve a classification accuracy comparable with state of the art methods, with the additional advantage of detecting the PAF onset up to 15 minutes in advance.
△ Less
Submitted 4 April, 2018; v1 submitted 21 January, 2018;
originally announced January 2018.
-
Learning compressed representations of blood samples time series with missing data
Authors:
Filippo Maria Bianchi,
Karl Øyvind Mikalsen,
Robert Jenssen
Abstract:
Clinical measurements collected over time are naturally represented as multivariate time series (MTS), which often contain missing data. An autoencoder can learn low dimensional vectorial representations of MTS that preserve important data characteristics, but cannot deal explicitly with missing data. In this work, we propose a new framework that combines an autoencoder with the Time series Cluste…
▽ More
Clinical measurements collected over time are naturally represented as multivariate time series (MTS), which often contain missing data. An autoencoder can learn low dimensional vectorial representations of MTS that preserve important data characteristics, but cannot deal explicitly with missing data. In this work, we propose a new framework that combines an autoencoder with the Time series Cluster Kernel (TCK), a kernel that accounts for missingness patterns in MTS. Via kernel alignment, we incorporate TCK in the autoencoder to improve the learned representations in presence of missing data. We consider a classification problem of MTS with missing values, representing blood samples of patients with surgical site infection. With our approach, rather than with a standard autoencoder, we learn representations in low dimensions that can be classified better.
△ Less
Submitted 20 October, 2017;
originally announced October 2017.
-
Time Series Cluster Kernel for Learning Similarities between Multivariate Time Series with Missing Data
Authors:
Karl Øyvind Mikalsen,
Filippo Maria Bianchi,
Cristina Soguero-Ruiz,
Robert Jenssen
Abstract:
Similarity-based approaches represent a promising direction for time series analysis. However, many such methods rely on parameter tuning, and some have shortcomings if the time series are multivariate (MTS), due to dependencies between attributes, or the time series contain missing data. In this paper, we address these challenges within the powerful context of kernel methods by proposing the robu…
▽ More
Similarity-based approaches represent a promising direction for time series analysis. However, many such methods rely on parameter tuning, and some have shortcomings if the time series are multivariate (MTS), due to dependencies between attributes, or the time series contain missing data. In this paper, we address these challenges within the powerful context of kernel methods by proposing the robust \emph{time series cluster kernel} (TCK). The approach taken leverages the missing data handling properties of Gaussian mixture models (GMM) augmented with informative prior distributions. An ensemble learning approach is exploited to ensure robustness to parameters by combining the clustering results of many GMM to form the final kernel.
We evaluate the TCK on synthetic and real data and compare to other state-of-the-art techniques. The experimental results demonstrate that the TCK is robust to parameter choices, provides competitive results for MTS without missing data and outstanding results for missing data.
△ Less
Submitted 29 June, 2017; v1 submitted 3 April, 2017;
originally announced April 2017.
-
Spectral Clustering using PCKID - A Probabilistic Cluster Kernel for Incomplete Data
Authors:
Sigurd Løkse,
Filippo Maria Bianchi,
Arnt-Børre Salberg,
Robert Jenssen
Abstract:
In this paper, we propose PCKID, a novel, robust, kernel function for spectral clustering, specifically designed to handle incomplete data. By combining posterior distributions of Gaussian Mixture Models for incomplete data on different scales, we are able to learn a kernel for incomplete data that does not depend on any critical hyperparameters, unlike the commonly used RBF kernel. To evaluate ou…
▽ More
In this paper, we propose PCKID, a novel, robust, kernel function for spectral clustering, specifically designed to handle incomplete data. By combining posterior distributions of Gaussian Mixture Models for incomplete data on different scales, we are able to learn a kernel for incomplete data that does not depend on any critical hyperparameters, unlike the commonly used RBF kernel. To evaluate our method, we perform experiments on two real datasets. PCKID outperforms the baseline methods for all fractions of missing values and in some cases outperforms the baseline methods with up to 25 percentage points.
△ Less
Submitted 23 February, 2017;
originally announced February 2017.
-
Deep Kernelized Autoencoders
Authors:
Michael Kampffmeyer,
Sigurd Løkse,
Filippo Maria Bianchi,
Robert Jenssen,
Lorenzo Livi
Abstract:
In this paper we introduce the deep kernelized autoencoder, a neural network model that allows an explicit approximation of (i) the mapping from an input space to an arbitrary, user-specified kernel space and (ii) the back-projection from such a kernel space to input space. The proposed method is based on traditional autoencoders and is trained through a new unsupervised loss function. During trai…
▽ More
In this paper we introduce the deep kernelized autoencoder, a neural network model that allows an explicit approximation of (i) the mapping from an input space to an arbitrary, user-specified kernel space and (ii) the back-projection from such a kernel space to input space. The proposed method is based on traditional autoencoders and is trained through a new unsupervised loss function. During training, we optimize both the reconstruction accuracy of input samples and the alignment between a kernel matrix given as prior and the inner products of the hidden representations computed by the autoencoder. Kernel alignment provides control over the hidden representation learned by the autoencoder. Experiments have been performed to evaluate both reconstruction and kernel alignment performance. Additionally, we applied our method to emulate kPCA on a denoising task obtaining promising results.
△ Less
Submitted 8 February, 2017;
originally announced February 2017.