-
Unsupervised representation learning with Hebbian synaptic and structural plasticity in brain-like feedforward neural networks
Authors:
Naresh Ravichandran,
Anders Lansner,
Pawel Herman
Abstract:
Neural networks that can capture key principles underlying brain computation offer exciting new opportunities for developing artificial intelligence and brain-like computing algorithms. Such networks remain biologically plausible while leveraging localized forms of synaptic learning rules and modular network architecture found in the neocortex. Compared to backprop-driven deep learning approaches, they provide more suitable models for deploying on neuromorphic hardware and have greater potential for scalability on large-scale computing clusters. The development of such brain-like neural networks depends on having a learning procedure that can build effective internal representations from data. In this work, we introduce and evaluate a brain-like neural network model capable of unsupervised representation learning. It builds on the Bayesian Confidence Propagation Neural Network (BCPNN), which has earlier been implemented as abstract as well as biophysically detailed recurrent attractor neural networks explaining various cortical associative memory phenomena. Here we developed a feedforward BCPNN model to perform representation learning by incorporating a range of brain-like attributes derived from neocortical circuits such as cortical columns, divisive normalization, Hebbian synaptic plasticity, structural plasticity, sparse activity, and sparse patchy connectivity. The model was tested on a diverse set of popular machine learning benchmarks: grayscale images (MNIST, Fashion-MNIST), RGB natural images (SVHN, CIFAR-10), QSAR (MUV, HIV), and malware detection (EMBER). The performance of the model when using a linear classifier to predict the class labels fared competitively with conventional multi-layer perceptrons and other state-of-the-art brain-like neural networks.
Submitted 7 June, 2024;
originally announced June 2024.
-
Spiking representation learning for associative memories
Authors:
Naresh Ravichandran,
Anders Lansner,
Pawel Herman
Abstract:
Networks of interconnected neurons communicating through spiking signals offer the bedrock of neural computations. Our brains' spiking neural networks have the computational capacity to achieve complex pattern recognition and cognitive functions effortlessly. However, solving real-world problems with artificial spiking neural networks (SNNs) has proved to be difficult for a variety of reasons. Crucially, scaling SNNs to large networks and processing large-scale real-world datasets have been challenging, especially when compared to their non-spiking deep learning counterparts. The critical operation that is needed of SNNs is the ability to learn distributed representations from data and use these representations for perceptual, cognitive and memory operations. In this work, we introduce a novel SNN that performs unsupervised representation learning and associative memory operations leveraging Hebbian synaptic and activity-dependent structural plasticity coupled with neuron-units modelled as Poisson spike generators with sparse firing (~1 Hz mean and ~100 Hz maximum firing rate). Crucially, the architecture of our model derives from the neocortical columnar organization and combines feedforward projections for learning hidden representations and recurrent projections for forming associative memories. We evaluated the model on properties relevant for attractor-based associative memories such as pattern completion, perceptual rivalry, distortion resistance, and prototype extraction.
Submitted 5 June, 2024;
originally announced June 2024.
-
Self-Supervised Learning of Time Series Representation via Diffusion Process and Imputation-Interpolation-Forecasting Mask
Authors:
Zineb Senane,
Lele Cao,
Valentin Leonhard Buchner,
Yusuke Tashiro,
Lei You,
Pawel Herman,
Mats Nordahl,
Ruibo Tu,
Vilhelm von Ehrenheim
Abstract:
Time Series Representation Learning (TSRL) focuses on generating informative representations for various Time Series (TS) modeling tasks. Traditional Self-Supervised Learning (SSL) methods in TSRL fall into four main categories: reconstructive, adversarial, contrastive, and predictive, each with a common challenge of sensitivity to noise and intricate data nuances. Recently, diffusion-based methods have shown advanced generative capabilities. However, they primarily target specific application scenarios like imputation and forecasting, leaving a gap in leveraging diffusion models for generic TSRL. Our work, Time Series Diffusion Embedding (TSDE), bridges this gap as the first diffusion-based SSL TSRL approach. TSDE segments TS data into observed and masked parts using an Imputation-Interpolation-Forecasting (IIF) mask. It applies a trainable embedding function, featuring dual-orthogonal Transformer encoders with a crossover mechanism, to the observed part. We train a reverse diffusion process conditioned on the embeddings, designed to predict noise added to the masked part. Extensive experiments demonstrate TSDE's superiority in imputation, interpolation, forecasting, anomaly detection, classification, and clustering. We also conduct an ablation study, present embedding visualizations, and compare inference speed, further substantiating TSDE's efficiency and validity in learning representations of TS data.
Submitted 17 June, 2024; v1 submitted 9 May, 2024;
originally announced May 2024.
-
Benchmarking Hebbian learning rules for associative memory
Authors:
Anders Lansner,
Naresh B Ravichandran,
Pawel Herman
Abstract:
Associative memory or content addressable memory is an important component function in computer science and information processing and is a key concept in cognitive and computational brain science. Many different neural network architectures and learning rules have been proposed to model associative memory of the brain while investigating key functions like pattern completion and rivalry, noise reduction, and storage capacity. A less investigated but important function is prototype extraction where the training set comprises pattern instances generated by distorting prototype patterns and the task of the trained network is to recall the correct prototype pattern given a new instance. In this paper we characterize these different aspects of associative memory performance and benchmark six different learning rules on storage capacity and prototype extraction. We consider only models with Hebbian plasticity that operate on sparse distributed representations with unit activities in the interval [0,1]. We evaluate both non-modular and modular network architectures and compare performance when trained and tested on different kinds of sparse random binary pattern sets, including correlated ones. We show that covariance learning has a robust but low storage capacity under these conditions and that the Bayesian Confidence Propagation learning rule (BCPNN) is superior with a good margin in all cases except one, reaching a three times higher composite score than the second best learning rule tested.
Submitted 30 December, 2023;
originally announced January 2024.
-
Beyond Gut Feel: Using Time Series Transformers to Find Investment Gems
Authors:
Lele Cao,
Gustaf Halvardsson,
Andrew McCornack,
Vilhelm von Ehrenheim,
Pawel Herman
Abstract:
This paper addresses the growing application of data-driven approaches within the Private Equity (PE) industry, particularly in sourcing investment targets (i.e., companies) for Venture Capital (VC) and Growth Capital (GC). We present a comprehensive review of the relevant approaches and propose a novel approach leveraging a Transformer-based Multivariate Time Series Classifier (TMTSC) for predicting the success likelihood of any candidate company. The objective of our research is to optimize sourcing performance for VC and GC investments by formally defining the sourcing problem as a multivariate time series classification task. We consecutively introduce the key components of our implementation which collectively contribute to the successful application of TMTSC in VC/GC sourcing: input features, model architecture, optimization target, and investor-centric data processing. Our extensive experiments on two real-world investment tasks, benchmarked against three popular baselines, demonstrate the effectiveness of our approach in improving decision making within the VC and GC industry.
Submitted 14 June, 2024; v1 submitted 28 September, 2023;
originally announced September 2023.
-
Spiking neural networks with Hebbian plasticity for unsupervised representation learning
Authors:
Naresh Ravichandran,
Anders Lansner,
Pawel Herman
Abstract:
We introduce a novel spiking neural network model for learning distributed internal representations from data in an unsupervised procedure. We achieved this by transforming the non-spiking feedforward Bayesian Confidence Propagation Neural Network (BCPNN) model, employing an online correlation-based Hebbian-Bayesian learning and rewiring mechanism, shown previously to perform representation learning, into a spiking neural network with Poisson statistics and low firing rate comparable to in vivo cortical pyramidal neurons. We evaluated the representations learned by our spiking model using a linear classifier and show performance close to the non-spiking BCPNN, and competitive with other Hebbian-based spiking networks when trained on MNIST and F-MNIST machine learning benchmarks.
Submitted 10 May, 2023; v1 submitted 5 May, 2023;
originally announced May 2023.
-
Hebbian fast plasticity and working memory
Authors:
Anders Lansner,
Florian Fiebig,
Pawel Herman
Abstract:
Theories and models of working memory (WM) have been dominated by the persistent activity hypothesis since at least the mid-1990s. The past decade has seen rising concerns about the shortcomings of sustained activity as the mechanism for short-term maintenance of WM information in the light of accumulating experimental evidence for so-called activity-silent WM and the fundamental difficulty in explaining robust multi-item WM. In consequence, alternative theories are now explored, mostly in the direction of fast synaptic plasticity as the underlying mechanism. The question of non-Hebbian vs Hebbian synaptic plasticity emerges naturally in this context. In this review we focus on fast Hebbian plasticity and trace the origins of WM theories and models building on this form of associative learning.
Submitted 13 April, 2023;
originally announced April 2023.
-
Metaheuristic conditional neural network for harvesting skyrmionic metastable states
Authors:
Qichen Xu,
I. P. Miranda,
Manuel Pereiro,
Filipp N. Rybakov,
Danny Thonig,
Erik Sjöqvist,
Pavel Bessarab,
Anders Bergman,
Olle Eriksson,
Pawel Herman,
Anna Delin
Abstract:
We present a metaheuristic conditional neural-network-based method aimed at identifying physically interesting metastable states in a potential energy surface of high rugosity. To demonstrate how this method works, we identify and analyze spin textures with topological charge $Q$ ranging from 1 to $-13$ (where antiskyrmions have $Q<0$) in the Pd/Fe/Ir(111) system, which we model using a classical atomistic spin Hamiltonian based on parameters computed from density functional theory. To facilitate the harvest of relevant spin textures, we make use of the newly developed Segment Anything Model (SAM). Spin textures with $Q$ ranging from $-3$ to $-6$ are further analyzed using finite-temperature spin-dynamics simulations. We observe that for temperatures up to around 20\,K, lifetimes longer than 200\,ps are predicted, and that when these textures decay, new topological spin textures are formed. We also find that the relative stability of the spin textures depends linearly on the topological charge, but only when comparing the most stable antiskyrmions for each topological charge. In general, the number of holes (i.e., non-self-intersecting curves that define closed domain walls in the structure) in the spin texture is an important predictor of stability -- the more holes, the less stable the texture. Methods for systematic identification and characterization of complex metastable skyrmionic textures -- such as the one demonstrated here -- are highly relevant for advancements in the field of topological spintronics.
Submitted 29 May, 2023; v1 submitted 5 March, 2023;
originally announced March 2023.
-
Genetic-tunneling driven energy optimizer for spin systems
Authors:
Qichen Xu,
Zhuanglin Shen,
Manuel Pereiro,
Pawel Herman,
Olle Eriksson,
Anna Delin
Abstract:
A long-standing and difficult problem in, e.g., condensed matter physics is how to find the ground state of a complex many-body system where the potential energy surface has a large number of local minima. Spin systems containing complex and/or topological textures, for example spin spirals or magnetic skyrmions, are prime examples of such systems. We propose here a genetic-tunneling-driven variance-controlled optimization approach, and apply it to two-dimensional magnetic skyrmionic systems. The approach combines a local energy-minimizer backend and a metaheuristic global search frontend. The algorithm is naturally concurrent, resulting in short user execution time. We find that the method performs significantly better than simulated annealing (SA). Specifically, we demonstrate that for the Pd/Fe/Ir(111) system, our method correctly and efficiently identifies the experimentally observed spin spiral, skyrmion lattice and ferromagnetic ground states as a function of external magnetic field. To our knowledge, no other optimization method has until now succeeded in doing this. We envision that our findings will pave the way for evolutionary computing in mapping out phase diagrams for spin systems in general.
Submitted 27 February, 2023; v1 submitted 31 December, 2022;
originally announced January 2023.
-
Brain-like combination of feedforward and recurrent network components achieves prototype extraction and robust pattern recognition
Authors:
Naresh Balaji Ravichandran,
Anders Lansner,
Pawel Herman
Abstract:
Associative memory has been a prominent candidate for the computation performed by the massively recurrent neocortical networks. Attractor networks implementing associative memory have offered mechanistic explanation for many cognitive phenomena. However, attractor memory models are typically trained using orthogonal or random patterns to avoid interference between memories, which makes them unfeasible for naturally occurring complex correlated stimuli like images. We approach this problem by combining a recurrent attractor network with a feedforward network that learns distributed representations using an unsupervised Hebbian-Bayesian learning rule. The resulting network model incorporates many known biological properties: unsupervised learning, Hebbian plasticity, sparse distributed activations, sparse connectivity, columnar and laminar cortical architecture, etc. We evaluate the synergistic effects of the feedforward and recurrent network components in complex pattern recognition tasks on the MNIST handwritten digits dataset. We demonstrate that the recurrent attractor component implements associative memory when trained on the feedforward-driven internal (hidden) representations. The associative memory is also shown to perform prototype extraction from the training data and make the representations robust to severely distorted input. We argue that several aspects of the proposed integration of feedforward and recurrent computations are particularly attractive from a machine learning perspective.
Submitted 3 September, 2022; v1 submitted 30 June, 2022;
originally announced June 2022.
-
A Long Short-term Memory Based Recurrent Neural Network for Interventional MRI Reconstruction
Authors:
Ruiyang Zhao,
Zhao He,
Tao Wang,
Suhao Qiu,
Pawel Herman,
Yanle Hu,
Chencheng Zhang,
Dinggang Shen,
Bomin Sun,
Guang-Zhong Yang,
Yuan Feng
Abstract:
Interventional magnetic resonance imaging (i-MRI) for surgical guidance could help visualize the interventional process such as deep brain stimulation (DBS), improving the surgery performance and patient outcome. Different from retrospective reconstruction in conventional dynamic imaging, i-MRI for DBS has to acquire and reconstruct the interventional images sequentially online. Here we proposed a convolutional long short-term memory (Conv-LSTM) based recurrent neural network (RNN), or ConvLR, to reconstruct interventional images with golden-angle radial sampling. By using an initializer and Conv-LSTM blocks, the priors from the pre-operative reference image and intra-operative frames were exploited for reconstructing the current frame. Data consistency for radial sampling was implemented by a soft-projection method. To improve the reconstruction accuracy, an adversarial learning strategy was adopted. A set of interventional images based on the pre-operative and post-operative MR images were simulated for algorithm validation. Results showed that, with only 10 radial spokes, ConvLR provided the best performance compared with state-of-the-art methods, giving an acceleration of up to 40-fold. The proposed algorithm has the potential to achieve real-time i-MRI for DBS and can be used for general-purpose MR-guided intervention.
Submitted 12 April, 2022; v1 submitted 28 March, 2022;
originally announced March 2022.
-
Semi-supervised learning with Bayesian Confidence Propagation Neural Network
Authors:
Naresh Balaji Ravichandran,
Anders Lansner,
Pawel Herman
Abstract:
Learning internal representations from data using no or few labels is useful for machine learning research, as it allows using massive amounts of unlabeled data. In this work, we use the Bayesian Confidence Propagation Neural Network (BCPNN) model developed as a biologically plausible model of the cortex. Recent work has demonstrated that these networks can learn useful internal representations from data using local Bayesian-Hebbian learning rules. In this work, we show how such representations can be leveraged in a semi-supervised setting by introducing and comparing different classifiers. We also evaluate and compare such networks with other popular semi-supervised classifiers.
Submitted 29 June, 2021;
originally announced June 2021.
-
StreamBrain: An HPC Framework for Brain-like Neural Networks on CPUs, GPUs and FPGAs
Authors:
Artur Podobas,
Martin Svedin,
Steven W. D. Chien,
Ivy B. Peng,
Naresh Balaji Ravichandran,
Pawel Herman,
Anders Lansner,
Stefano Markidis
Abstract:
The modern deep learning method based on backpropagation has surged in popularity and has been used in multiple domains and application areas. At the same time, there are other -- less-known -- machine learning algorithms with a mature and solid theoretical foundation whose performance remains unexplored. One such example is the brain-like Bayesian Confidence Propagation Neural Network (BCPNN). In this paper, we introduce StreamBrain -- a framework that allows neural networks based on BCPNN to be practically deployed in High-Performance Computing systems. StreamBrain is a domain-specific language (DSL), similar in concept to existing machine learning (ML) frameworks, and supports backends for CPUs, GPUs, and even FPGAs. We empirically demonstrate that StreamBrain can train the well-known ML benchmark dataset MNIST within seconds, and we are the first to demonstrate BCPNN on STL-10 size networks. We also show how StreamBrain can be used to train with custom floating-point formats and illustrate the impact of using different bfloat variations on BCPNN using FPGAs.
Submitted 9 June, 2021;
originally announced June 2021.
-
Automatic Particle Trajectory Classification in Plasma Simulations
Authors:
Stefano Markidis,
Ivy Peng,
Artur Podobas,
Itthinat Jongsuebchoke,
Gabriel Bengtsson,
Pawel Herman
Abstract:
Numerical simulations of plasma flows are crucial for advancing our understanding of microscopic processes that drive the global plasma dynamics in fusion devices, space, and astrophysical systems. Identifying and classifying particle trajectories allows us to determine specific on-going acceleration mechanisms, shedding light on essential plasma processes.
Our overall goal is to provide a general workflow for exploring particle trajectory space and automatically classifying particle trajectories from plasma simulations in an unsupervised manner. We combine pre-processing techniques, such as Fast Fourier Transform (FFT), with Machine Learning methods, such as Principal Component Analysis (PCA), k-means clustering algorithms, and silhouette analysis. We demonstrate our workflow by classifying electron trajectories in a magnetic reconnection problem. Our method successfully recovers existing results from previous literature without a priori knowledge of the underlying system.
Our workflow can be applied to analyzing particle trajectories in different phenomena, from magnetic reconnection and shocks to magnetospheric flows. The workflow has no dependence on any physics model and can identify particle trajectories and acceleration mechanisms that were not detected before.
Submitted 11 October, 2020;
originally announced October 2020.
-
Brain-like approaches to unsupervised learning of hidden representations -- a comparative study
Authors:
Naresh Balaji Ravichandran,
Anders Lansner,
Pawel Herman
Abstract:
Unsupervised learning of hidden representations has been one of the most vibrant research directions in machine learning in recent years. In this work we study the brain-like Bayesian Confidence Propagating Neural Network (BCPNN) model, recently extended to extract sparse distributed high-dimensional representations. The usefulness and class-dependent separability of the hidden representations when trained on MNIST and Fashion-MNIST datasets are studied using an external linear classifier and compared with other unsupervised learning methods that include restricted Boltzmann machines and autoencoders.
Submitted 16 April, 2021; v1 submitted 6 May, 2020;
originally announced May 2020.
-
Learning representations in Bayesian Confidence Propagation neural networks
Authors:
Naresh Balaji Ravichandran,
Anders Lansner,
Pawel Herman
Abstract:
Unsupervised learning of hierarchical representations has been one of the most vibrant research directions in deep learning during recent years. In this work we study biologically inspired unsupervised strategies in neural networks based on local Hebbian learning. We propose new mechanisms to extend the Bayesian Confidence Propagating Neural Network (BCPNN) architecture, and demonstrate their capability for unsupervised learning of salient hidden representations when tested on the MNIST dataset.
Submitted 27 March, 2020;
originally announced March 2020.
-
Automated classification of plasma regions using 3D particle energy distributions
Authors:
Vyacheslav Olshevsky,
Yuri V. Khotyaintsev,
Ahmad Lalti,
Andrey Divin,
Gian Luca Delzanno,
Sven Anderzen,
Pawel Herman,
Steven W. D. Chien,
Levon Avanov,
Andrew P. Dimmock,
Stefano Markidis
Abstract:
We investigate the properties of the ion sky maps produced by the Dual Ion Spectrometers (DIS) from the Fast Plasma Investigation (FPI). We have trained a convolutional neural network classifier to predict four regions crossed by the MMS on the dayside magnetosphere: solar wind, ion foreshock, magnetosheath, and magnetopause using solely DIS spectrograms. The accuracy of the classifier is >98%. We use the classifier to detect mixed plasma regions, in particular to find the bow shock regions. A similar approach can be used to identify the magnetopause crossings and reveal regions prone to magnetic reconnection. Data processing through the trained classifier is fast and efficient and thus can be used for classification for the whole MMS database.
Submitted 21 September, 2021; v1 submitted 15 August, 2019;
originally announced August 2019.
-
Characterizing Deep-Learning I/O Workloads in TensorFlow
Authors:
Steven W. D. Chien,
Stefano Markidis,
Chaitanya Prasad Sishtla,
Luis Santos,
Pawel Herman,
Sai Narasimhamurthy,
Erwin Laure
Abstract:
The performance of Deep-Learning (DL) computing frameworks relies on the performance of data ingestion and checkpointing. In fact, during the training, a considerably high number of relatively small files are first loaded and pre-processed on CPUs and then moved to the accelerator for computation. In addition, checkpointing and restart operations are carried out to allow DL computing frameworks to restart quickly from a checkpoint. Because of this, I/O affects the performance of DL applications. In this work, we characterize the I/O performance and scaling of TensorFlow, an open-source programming framework developed by Google and specifically designed for solving DL problems. To measure TensorFlow I/O performance, we first design a micro-benchmark to measure TensorFlow reads, and then use a TensorFlow mini-application based on AlexNet to measure the performance cost of I/O and checkpointing in TensorFlow. To improve the checkpointing performance, we design and implement a burst buffer. We find that increasing the number of threads increases TensorFlow bandwidth by a maximum of 2.3x and 7.8x on our benchmark environments. The use of the TensorFlow prefetcher results in a complete overlap of computation on the accelerator and the input pipeline on the CPU, eliminating the effective cost of I/O on the overall performance. The use of a burst buffer to checkpoint to a fast small-capacity storage and asynchronously copy the checkpoints to a slower large-capacity storage resulted in a performance improvement of 2.6x with respect to checkpointing directly to slower storage on our benchmark environment.
Submitted 6 October, 2018;
originally announced October 2018.