-
The ROAD to discovery: machine learning-driven anomaly detection in radio astronomy spectrograms
Authors:
Michael Mesarcik,
Albert-Jan Boonstra,
Marco Iacobelli,
Elena Ranguelova,
Cees de Laat,
Rob van Nieuwpoort
Abstract:
As radio telescopes increase in sensitivity and flexibility, so do their complexity and data rates. For this reason, automated system health management approaches are becoming increasingly critical to ensure nominal telescope operations. We propose a new machine learning anomaly detection framework both for classifying commonly occurring anomalies in radio telescopes and for detecting unknown rare anomalies that the system has potentially not yet seen. To evaluate our method, we present a dataset consisting of 7050 autocorrelation-based spectrograms from the Low Frequency Array (LOFAR) telescope and assign 10 different labels relating to system-wide anomalies from the perspective of telescope operators. These include electronic failures, miscalibration, solar storms, and network and compute hardware errors, among many more. We demonstrate how a novel Self-Supervised Learning (SSL) paradigm, which utilises both context prediction and reconstruction losses, is effective in learning the normal behaviour of the LOFAR telescope. We present the Radio Observatory Anomaly Detector (ROAD), a framework that combines SSL-based anomaly detection with supervised classification, thereby enabling both the classification of commonly occurring anomalies and the detection of unseen anomalies. We demonstrate that our system is real-time in the context of the LOFAR data processing pipeline, requiring <1 ms to process a single spectrogram. Furthermore, ROAD obtains an anomaly detection F-2 score of 0.92 while maintaining a false positive rate of ~2%, as well as a mean per-class classification F-2 score of 0.89, outperforming other related works.
Submitted 3 July, 2023;
originally announced July 2023.
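The ROAD abstract reports F-2 scores and describes combining an SSL anomaly detector with a supervised classifier. The sketch below illustrates both ideas under stated assumptions. `f_beta` is the standard F-beta formula, where beta = 2 weights recall more heavily than precision. `combined_label` is a hypothetical fusion rule and not ROAD's published logic: an anomaly score gates a classifier, and low classifier confidence is reported as an unseen anomaly. The thresholds `threshold` and `min_confidence` are illustrative values we chose. The class names are examples taken from the abstract.

```python
import numpy as np


def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from confusion counts: (1+b^2)TP / ((1+b^2)TP + b^2 FN + FP)."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom > 0 else 0.0


def combined_label(anomaly_score, class_probs, class_names,
                   threshold=0.5, min_confidence=0.8):
    """Hypothetical fusion rule: an SSL anomaly score gates a supervised classifier.

    threshold and min_confidence are illustrative assumptions, not values from the paper.
    """
    if anomaly_score < threshold:
        return "normal"
    best = int(np.argmax(class_probs))
    if class_probs[best] >= min_confidence:
        return class_names[best]   # a known, commonly occurring anomaly
    return "unknown anomaly"       # anomalous, but unlike any labelled class


classes = ["solar storm", "electronic failure", "miscalibration"]  # example labels
print(combined_label(0.1, [0.2, 0.7, 0.1], classes))    # normal
print(combined_label(0.9, [0.05, 0.9, 0.05], classes))  # electronic failure
print(combined_label(0.9, [0.4, 0.35, 0.25], classes))  # unknown anomaly
# With beta = 2, extra false negatives cost more than the same number of false positives:
print(round(f_beta(tp=90, fp=30, fn=10), 3))  # 0.865
print(round(f_beta(tp=90, fp=10, fn=30), 3))  # 0.776
```

Weighting recall this way suits system health monitoring. There, a missed failure is usually costlier than an extra alert for an operator to dismiss.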
-
Learning to detect RFI in radio astronomy without seeing it
Authors:
Michael Mesarcik,
Albert-Jan Boonstra,
Elena Ranguelova,
Rob V. van Nieuwpoort
Abstract:
Radio Frequency Interference (RFI) corrupts astronomical measurements, thus affecting the performance of radio telescopes. To address this problem, supervised segmentation models have been proposed as candidate solutions to RFI detection. However, the unavailability of large labelled datasets, due to the prohibitive cost of annotation, makes these solutions unusable. To overcome these shortcomings, we focus on the inverse problem: training models only on uncontaminated emissions, thereby learning to discriminate RFI from all known astronomical signals and system noise. We use Nearest-Latent-Neighbours (NLN), an algorithm that utilises both the reconstructions and the latent distances to the nearest neighbours in the latent space of generative autoencoding models for novelty detection. The uncontaminated regions are selected using weak labels in the form of RFI flags (generated by classical RFI flagging methods), which are available from most radio astronomical data archives at no additional cost. We evaluate performance on two independent datasets, one simulated from the HERA telescope and another consisting of real observations from the LOFAR telescope. Additionally, we provide a small expert-labelled LOFAR dataset (i.e., strong labels) for evaluation of our and other methods. Performance is measured using AUROC, AUPRC and the maximum F1-score for a fixed threshold. For the simulated HERA dataset, we outperform the current state of the art by approximately 1% in AUROC and 3% in AUPRC. Furthermore, for the LOFAR dataset our algorithm offers a 4% increase in both AUROC and AUPRC, at the cost of degraded F1-score performance, without any manual labelling.
Submitted 11 October, 2022; v1 submitted 1 July, 2022;
originally announced July 2022.
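The abstract above describes selecting uncontaminated training regions from weak labels, meaning the RFI flags produced by classical flaggers. The following is a minimal sketch of that selection step under our own assumptions: a 2-D (time, frequency) spectrogram, a boolean flag mask of the same shape, and non-overlapping square tiles. The function name `clean_patches` and the tile size of 32 are hypothetical choices and are not taken from the paper.

```python
import numpy as np


def clean_patches(spectrogram: np.ndarray, flags: np.ndarray, patch: int = 32) -> np.ndarray:
    """Tile a (time, freq) spectrogram into non-overlapping patch x patch tiles and
    keep only the tiles in which no pixel is flagged as RFI (weak labels)."""
    if spectrogram.shape != flags.shape:
        raise ValueError("spectrogram and flags must have the same shape")
    n_time, n_freq = spectrogram.shape
    kept = []
    for t in range(0, n_time - patch + 1, patch):
        for f in range(0, n_freq - patch + 1, patch):
            if not flags[t:t + patch, f:f + patch].any():
                kept.append(spectrogram[t:t + patch, f:f + patch])
    if not kept:
        return np.empty((0, patch, patch), dtype=spectrogram.dtype)
    return np.stack(kept)


rng = np.random.default_rng(0)
spec = rng.normal(size=(64, 64))        # toy spectrogram
flags = np.zeros((64, 64), dtype=bool)  # stand-in for flags from a classical flagger
flags[5, 40] = True                     # one flagged pixel, inside the top-right tile
train = clean_patches(spec, flags)
print(train.shape)  # (3, 32, 32)
```

Training only on the surviving tiles means the model never sees flagged RFI. At inference, anything it reconstructs poorly becomes a candidate for RFI.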
-
Improving Novelty Detection using the Reconstructions of Nearest Neighbours
Authors:
Michael Mesarcik,
Elena Ranguelova,
Albert-Jan Boonstra,
Rob V. van Nieuwpoort
Abstract:
We show that using nearest neighbours in the latent space of autoencoders (AE) significantly improves the performance of semi-supervised novelty detection in both single- and multi-class contexts. Autoencoding methods detect novelty by learning to differentiate between the non-novel training class(es) and all other unseen classes. Our method harnesses a combination of the reconstructions of the nearest neighbours and the latent-neighbour distances of a given input's latent representation. We demonstrate that our nearest-latent-neighbours (NLN) algorithm is memory and time efficient, does not require significant data augmentation, and does not rely on pre-trained networks. Furthermore, we show that the NLN algorithm is easily applicable to multiple datasets without modification. Additionally, the proposed algorithm is agnostic to autoencoder architecture and reconstruction error method. We validate our method across several standard datasets for a variety of autoencoding architectures, such as vanilla, adversarial and variational autoencoders, using either reconstruction, residual or feature-consistent losses. The results show that the NLN algorithm grants up to a 17% increase in Area Under the Receiver Operating Characteristic curve (AUROC) performance for the multi-class case and 8% for single-class novelty detection.
Submitted 28 January, 2022; v1 submitted 11 November, 2021;
originally announced November 2021.
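The NLN idea in the abstract above combines two signals for each test input. The first is how well the reconstructions of its nearest latent neighbours match the input. The second is how far away those neighbours lie in latent space. The sketch below is one plausible way to compute such a score. The linear weighting `alpha`, the Euclidean metric and `k = 5` are our assumptions and not necessarily the paper's exact formulation. A rank-2 PCA encoder/decoder stands in for a trained autoencoder, and `encode`/`decode` are hypothetical helpers.

```python
import numpy as np


def nln_scores(x_test, z_test, z_train, recon_train, k=5, alpha=0.5):
    """Sketch of a nearest-latent-neighbours novelty score (higher = more novel).

    x_test: (n, d) inputs; z_test: (n, m) their latents;
    z_train: (N, m) training latents; recon_train: (N, d) decoder outputs for z_train.
    alpha (assumed) weights reconstruction error against latent distance.
    """
    # Euclidean distance from every test latent to every training latent: (n, N)
    dists = np.linalg.norm(z_test[:, None, :] - z_train[None, :, :], axis=-1)
    nn_idx = np.argsort(dists, axis=1)[:, :k]            # (n, k) neighbour indices
    nn_dist = np.take_along_axis(dists, nn_idx, axis=1)  # (n, k) neighbour distances
    nn_recon = recon_train[nn_idx]                       # (n, k, d) neighbour reconstructions
    # Mean squared error between the input and each neighbour's reconstruction
    recon_err = np.mean((nn_recon - x_test[:, None, :]) ** 2, axis=(1, 2))
    return alpha * recon_err + (1 - alpha) * nn_dist.mean(axis=1)


# Toy stand-in for a trained autoencoder: a rank-2 linear (PCA) encoder/decoder.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 8))
train = rng.normal(size=(200, 2)) @ W + 0.01 * rng.normal(size=(200, 8))
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
V = vt[:2]  # (2, 8) principal directions


def encode(x):
    return (x - mean) @ V.T


def decode(z):
    return z @ V + mean


z_train = encode(train)
recon_train = decode(z_train)
normal = rng.normal(size=(20, 2)) @ W    # lies on the training subspace
novel = 3.0 * rng.normal(size=(20, 8))   # spreads over all 8 dimensions
normal_scores = nln_scores(normal, encode(normal), z_train, recon_train)
novel_scores = nln_scores(novel, encode(novel), z_train, recon_train)
print(normal_scores.mean() < novel_scores.mean())  # expected: True
```

Only the encoder, decoder and stored training latents are needed, so no data augmentation or pre-trained network is involved. This is consistent with the abstract's claims about the approach.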
-
Deep Learning Assisted Data Inspection for Radio Astronomy
Authors:
Michael Mesarcik,
Albert-Jan Boonstra,
Christiaan Meijer,
Walter Jansen,
Elena Ranguelova,
Rob V. van Nieuwpoort
Abstract:
Modern radio telescopes combine thousands of receivers, long-distance networks, large-scale compute hardware, and intricate software. Due to this complexity, failures occur relatively frequently. In this work we propose a novel use of unsupervised deep learning to diagnose system health for modern radio telescopes. The model is a convolutional Variational Autoencoder (VAE) that enables the projection of high-dimensional time-frequency data to a low-dimensional prescriptive space. Using this projection, telescope operators are able to visually inspect failures, thereby maintaining system health. We have trained and evaluated the performance of the VAE quantitatively in controlled experiments on simulated data from HERA. Moreover, we present a qualitative assessment of the model trained and tested on real LOFAR data. Through the use of a naive SVM classifier on the projected synthesised data, we show that there is a trade-off between the dimensionality of the projection and the number of compounded features in a given spectrogram. The VAE and SVM combination scores between 65% and 90% accuracy, depending on the number of features in a given input. Finally, we show the prototype system-health-diagnostic web framework that integrates the evaluated model. The system is currently undergoing testing at the ASTRON observatory.
Submitted 27 May, 2020;
originally announced May 2020.
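The VAE in the abstract above projects spectrograms into a low-dimensional space. Two ingredients distinguish a VAE from a plain autoencoder: sampling the latent through the reparameterization trick, and the closed-form KL term that regularises the latent space towards a standard normal. The snippet below sketches both in NumPy. The encoder itself is omitted, and the `mu`/`log_var` values are hypothetical outputs for a 2-D latent space, not values from the paper.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Draw z ~ N(mu, diag(exp(log_var))) as mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(mu ** 2 + np.exp(log_var) - 1.0 - log_var, axis=-1)


# Hypothetical encoder outputs for a batch of 3 spectrograms, 2-D latent space.
mu = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, -1.0]])
log_var = np.zeros_like(mu)                   # unit variance in every dimension
kl = kl_to_standard_normal(mu, log_var)       # equals 0.5 * ||mu||^2 here: 0.0, 0.5, 2.5
z = reparameterize(mu, log_var, np.random.default_rng(0))
print(kl, z.shape)
```

For visual inspection, the posterior means `mu` are a natural choice of 2-D coordinates to plot for operators. A downstream classifier such as the SVM mentioned above would then be trained on these projected points. Both uses are our illustration of the workflow rather than the paper's exact procedure.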