Search | arXiv e-print repository

arXiv:2407.18917 [pdf, other]

Spatio-temporal Structure of Excitation and Inhibition Emerges in Spiking Neural Networks with and without Biologically Plausible Constraints

Authors: Balázs Mészáros, James Knight, Thomas Nowotny

Abstract: We present a Spiking Neural Network (SNN) model that incorporates learnable synaptic delays using Dilated Convolution with Learnable Spacings (DCLS). We train this model on the Raw Heidelberg Digits keyword spotting benchmark using Backpropagation Through Time with surrogate gradients. Analysing the spatio-temporal structure of synaptic interactions in the network we observe that after training ex… ▽ More We present a Spiking Neural Network (SNN) model that incorporates learnable synaptic delays using Dilated Convolution with Learnable Spacings (DCLS). We train this model on the Raw Heidelberg Digits keyword spotting benchmark using Backpropagation Through Time with surrogate gradients. Analysing the spatio-temporal structure of synaptic interactions in the network we observe that after training excitation and inhibition are grouped together both in space and time. To further enhance the efficiency and biological realism of our model, we implemented a dynamic pruning strategy that combines DEEP R for connection removal and RigL for connection reintroduction, ensuring that the network maintains optimal connectivity throughout training. Additionally, we incorporated Dale's Principle, enforcing each neuron to be exclusively excitatory or inhibitory -- aligning our model closer to biological neural networks. We observed that, after training, the spatio-temporal patterns of excitation and inhibition appeared in the more biologically plausible model as well. Our research demonstrates the potential of integrating learnable delays, dynamic pruning, and biological constraints to develop efficient SNN models for temporal data processing. Furthermore, our results enhance the understanding of spatio-temporal dynamics in SNNs -- suggesting that the spatio-temporal features which emerge from training are robust to both pruning and rewiring processes -- providing a solid foundation for future work in neuromorphic computing applications. △ Less

Submitted 7 July, 2024; originally announced July 2024.

Comments: 10 pages, 4 figures, submitted to Frontiers in Computational Neuroscience

arXiv:2402.14300 [pdf, other]

A Simple Framework Uniting Visual In-context Learning with Masked Image Modeling to Improve Ultrasound Segmentation

Authors: Yuyue Zhou, Banafshe Felfeliyan, Shrimanti Ghosh, Jessica Knight, Fatima Alves-Pereira, Christopher Keen, Jessica Küpper, Abhilash Rakkunedeth Hareendranathan, Jacob L. Jaremko

Abstract: Conventional deep learning models deal with images one-by-one, requiring costly and time-consuming expert labeling in the field of medical imaging, and domain-specific restriction limits model generalizability. Visual in-context learning (ICL) is a new and exciting area of research in computer vision. Unlike conventional deep learning, ICL emphasizes the model's ability to adapt to new tasks based… ▽ More Conventional deep learning models deal with images one-by-one, requiring costly and time-consuming expert labeling in the field of medical imaging, and domain-specific restriction limits model generalizability. Visual in-context learning (ICL) is a new and exciting area of research in computer vision. Unlike conventional deep learning, ICL emphasizes the model's ability to adapt to new tasks based on given examples quickly. Inspired by MAE-VQGAN, we proposed a new simple visual ICL method called SimICL, combining visual ICL pairing images with masked image modeling (MIM) designed for self-supervised learning. We validated our method on bony structures segmentation in a wrist ultrasound (US) dataset with limited annotations, where the clinical objective was to segment bony structures to help with further fracture detection. We used a test set containing 3822 images from 18 patients for bony region segmentation. SimICL achieved an remarkably high Dice coeffient (DC) of 0.96 and Jaccard Index (IoU) of 0.92, surpassing state-of-the-art segmentation and visual ICL models (a maximum DC 0.86 and IoU 0.76), with SimICL DC and IoU increasing up to 0.10 and 0.16. This remarkably high agreement with limited manual annotations indicates SimICL could be used for training AI models even on small US datasets. This could dramatically decrease the human expert time required for image labeling compared to conventional approaches, and enhance the real-world use of AI assistance in US image analysis. △ Less

Submitted 8 March, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

arXiv:2401.08991 [pdf, other]

Knight Watch: Ubiquitous Computing Enhancements To Sleep Quality With Acoustic Analysis

Authors: Andrew Ajemian, John Knight, Tommy Nguyen, John O'Connor

Abstract: This project introduces a wearable, non-intrusive device for snoring detection and remediation, designed to be placed under or alongside a pillow. The device uses sensors and machine learning algorithms to detect snoring and employs gentle vibrations to prompt positional changes, thereby reducing snoring episodes. The device is capable of connecting via an API to a cloud-based platform for the ana… ▽ More This project introduces a wearable, non-intrusive device for snoring detection and remediation, designed to be placed under or alongside a pillow. The device uses sensors and machine learning algorithms to detect snoring and employs gentle vibrations to prompt positional changes, thereby reducing snoring episodes. The device is capable of connecting via an API to a cloud-based platform for the analysis of snoring sleep patterns and environmental context. The paper details the development from concept to prototype, emphasizing the technical challenges, solutions, and alignment with ubiquitous computing in sleep quality improvement. △ Less

Submitted 17 January, 2024; originally announced January 2024.

Comments: 6 Pages, 3 Figures

arXiv:2309.09490 [pdf, other]

Self-supervised TransUNet for Ultrasound regional segmentation of the distal radius in children

Authors: Yuyue Zhou, Jessica Knight, Banafshe Felfeliyan, Christopher Keen, Abhilash Rakkunedeth Hareendranathan, Jacob L. Jaremko

Abstract: Supervised deep learning offers great promise to automate analysis of medical images from segmentation to diagnosis. However, their performance highly relies on the quality and quantity of the data annotation. Meanwhile, curating large annotated datasets for medical images requires a high level of expertise, which is time-consuming and expensive. Recently, to quench the thirst for large data sets… ▽ More Supervised deep learning offers great promise to automate analysis of medical images from segmentation to diagnosis. However, their performance highly relies on the quality and quantity of the data annotation. Meanwhile, curating large annotated datasets for medical images requires a high level of expertise, which is time-consuming and expensive. Recently, to quench the thirst for large data sets with high-quality annotation, self-supervised learning (SSL) methods using unlabeled domain-specific data, have attracted attention. Therefore, designing an SSL method that relies on minimal quantities of labeled data has far-reaching significance in medical images. This paper investigates the feasibility of deploying the Masked Autoencoder for SSL (SSL-MAE) of TransUNet, for segmenting bony regions from children's wrist ultrasound scans. We found that changing the embedding and loss function in SSL-MAE can produce better downstream results compared to the original SSL-MAE. In addition, we determined that only pretraining TransUNet embedding and encoder with SSL-MAE does not work as well as TransUNet without SSL-MAE pretraining on downstream segmentation tasks. △ Less

Submitted 18 September, 2023; originally announced September 2023.

arXiv:2308.08576 [pdf]

Artistic control over the glitch in AI-generated motion capture

Authors: Jamal Knight, Andrew Johnston, Adam Berry

Abstract: Artificial intelligence (AI) models are prevalent today and provide a valuable tool for artists. However, a lesser-known artifact that comes with AI models that is not always discussed is the glitch. Glitches occur for various reasons; sometimes, they are known, and sometimes they are a mystery. Artists who use AI models to generate art might not understand the reason for the glitch but often want… ▽ More Artificial intelligence (AI) models are prevalent today and provide a valuable tool for artists. However, a lesser-known artifact that comes with AI models that is not always discussed is the glitch. Glitches occur for various reasons; sometimes, they are known, and sometimes they are a mystery. Artists who use AI models to generate art might not understand the reason for the glitch but often want to experiment and explore novel ways of augmenting the output of the glitch. This paper discusses some of the questions artists have when leveraging the glitch in AI art production. It explores the unexpected positive outcomes produced by glitches in the specific context of motion capture and performance art. △ Less

Submitted 15 August, 2023; originally announced August 2023.

arXiv:2304.04640 [pdf, other]

NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems

Authors: Jason Yik, Korneel Van den Berghe, Douwe den Blanken, Younes Bouhadjar, Maxime Fabre, Paul Hueber, Denis Kleyko, Noah Pacik-Nelson, Pao-Sheng Vincent Sun, Guangzhi Tang, Shenqi Wang, Biyan Zhou, Soikat Hasan Ahmed, George Vathakkattil Joseph, Benedetto Leto, Aurora Micheli, Anurag Kumar Mishra, Gregor Lenz, Tao Sun, Zergham Ahmed, Mahmoud Akl, Brian Anderson, Andreas G. Andreou, Chiara Bartolozzi, Arindam Basu , et al. (73 additional authors not shown)

Abstract: Neuromorphic computing shows promise for advancing computing efficiency and capabilities of AI applications using brain-inspired principles. However, the neuromorphic research field currently lacks standardized benchmarks, making it difficult to accurately measure technological advancements, compare performance with conventional methods, and identify promising future research directions. Prior neu… ▽ More Neuromorphic computing shows promise for advancing computing efficiency and capabilities of AI applications using brain-inspired principles. However, the neuromorphic research field currently lacks standardized benchmarks, making it difficult to accurately measure technological advancements, compare performance with conventional methods, and identify promising future research directions. Prior neuromorphic computing benchmark efforts have not seen widespread adoption due to a lack of inclusive, actionable, and iterative benchmark design and guidelines. To address these shortcomings, we present NeuroBench: a benchmark framework for neuromorphic computing algorithms and systems. NeuroBench is a collaboratively-designed effort from an open community of nearly 100 co-authors across over 50 institutions in industry and academia, aiming to provide a representative structure for standardizing the evaluation of neuromorphic approaches. The NeuroBench framework introduces a common set of tools and systematic methodology for inclusive benchmark measurement, delivering an objective reference framework for quantifying neuromorphic approaches in both hardware-independent (algorithm track) and hardware-dependent (system track) settings. In this article, we present initial performance baselines across various model architectures on the algorithm track and outline the system track benchmark tasks and guidelines. NeuroBench is intended to continually expand its benchmarks and features to foster and track the progress made by the research community. △ Less

Submitted 17 January, 2024; v1 submitted 10 April, 2023; originally announced April 2023.

Comments: Updated from whitepaper to full perspective article preprint

arXiv:2212.01232 [pdf, other]

Loss shaping enhances exact gradient learning with EventProp in Spiking Neural Networks

Authors: Thomas Nowotny, James P. Turner, James C. Knight

Abstract: Event-based machine learning promises more energy-efficient AI on future neuromorphic hardware. Here, we investigate how the recently discovered Eventprop algorithm for gradient descent on exact gradients in spiking neural networks can be scaled up to challenging keyword recognition benchmarks. We implemented Eventprop in the GPU-enhanced Neural Networks framework and used it for training recurren… ▽ More Event-based machine learning promises more energy-efficient AI on future neuromorphic hardware. Here, we investigate how the recently discovered Eventprop algorithm for gradient descent on exact gradients in spiking neural networks can be scaled up to challenging keyword recognition benchmarks. We implemented Eventprop in the GPU-enhanced Neural Networks framework and used it for training recurrent spiking neural networks on the Spiking Heidelberg Digits and Spiking Speech Commands datasets. We found that learning depended strongly on the loss function and extended Eventprop to a wider class of loss functions to enable effective training. When combined with the right additional mechanisms from the machine learning toolbox, Eventprop networks achieved state-of-the-art performance on Spiking Heidelberg Digits and good accuracy on Spiking Speech Commands. This work is a significant step towards a low-power neuromorphic alternative to current machine learning paradigms. △ Less

Submitted 2 June, 2024; v1 submitted 2 December, 2022; originally announced December 2022.

Comments: 23 pages, 7 figures, 5 tables

MSC Class: 68T05 (Primary) 68T10; 68T07 (Secondary) ACM Class: I.2; I.2.6; I.5.1

arXiv:2205.07554 [pdf, other]

doi 10.1051/0004-6361/202243311

Towards on-sky adaptive optics control using reinforcement learning

Authors: J. Nousiainen, C. Rajani, M. Kasper, T. Helin, S. Y. Haffert, C. Vérinaud, J. R. Males, K. Van Gorkom, L. M. Close, J. D. Long, A. D. Hedglen, O. Guyon, L. Schatz, M. Kautz, J. Lumbres, A. Rodack, J. M. Knight, K. Miller

Abstract: The direct imaging of potentially habitable Exoplanets is one prime science case for the next generation of high contrast imaging instruments on ground-based extremely large telescopes. To reach this demanding science goal, the instruments are equipped with eXtreme Adaptive Optics (XAO) systems which will control thousands of actuators at a framerate of kilohertz to several kilohertz. Most of the… ▽ More The direct imaging of potentially habitable Exoplanets is one prime science case for the next generation of high contrast imaging instruments on ground-based extremely large telescopes. To reach this demanding science goal, the instruments are equipped with eXtreme Adaptive Optics (XAO) systems which will control thousands of actuators at a framerate of kilohertz to several kilohertz. Most of the habitable exoplanets are located at small angular separations from their host stars, where the current XAO systems' control laws leave strong residuals.Current AO control strategies like static matrix-based wavefront reconstruction and integrator control suffer from temporal delay error and are sensitive to mis-registration, i.e., to dynamic variations of the control system geometry. We aim to produce control methods that cope with these limitations, provide a significantly improved AO correction and, therefore, reduce the residual flux in the coronagraphic point spread function. We extend previous work in Reinforcement Learning for AO. The improved method, called PO4AO, learns a dynamics model and optimizes a control neural network, called a policy. We introduce the method and study it through numerical simulations of XAO with Pyramid wavefront sensing for the 8-m and 40-m telescope aperture cases. We further implemented PO4AO and carried out experiments in a laboratory environment using MagAO-X at the Steward laboratory. PO4AO provides the desired performance by improving the coronagraphic contrast in numerical simulations by factors 3-5 within the control region of DM and Pyramid WFS, in simulation and in the laboratory. The presented method is also quick to train, i.e., on timescales of typically 5-10 seconds, and the inference time is sufficiently small (< ms) to be used in real-time control for XAO with currently available hardware even for extremely large telescopes. △ Less

Submitted 16 May, 2022; originally announced May 2022.

Journal ref: A&A 664, A71 (2022)

arXiv:2110.15866 [pdf, other]

Towards Comparative Physical Interpretation of Spatial Variability Aware Neural Networks: A Summary of Results

Authors: Jayant Gupta, Carl Molnar, Gaoxiang Luo, Joe Knight, Shashi Shekhar

Abstract: Given Spatial Variability Aware Neural Networks (SVANNs), the goal is to investigate mathematical (or computational) models for comparative physical interpretation towards their transparency (e.g., simulatibility, decomposability and algorithmic transparency). This problem is important due to important use-cases such as reusability, debugging, and explainability to a jury in a court of law. Challe… ▽ More Given Spatial Variability Aware Neural Networks (SVANNs), the goal is to investigate mathematical (or computational) models for comparative physical interpretation towards their transparency (e.g., simulatibility, decomposability and algorithmic transparency). This problem is important due to important use-cases such as reusability, debugging, and explainability to a jury in a court of law. Challenges include a large number of model parameters, vacuous bounds on generalization performance of neural networks, risk of overfitting, sensitivity to noise, etc., which all detract from the ability to interpret the models. Related work on either model-specific or model-agnostic post-hoc interpretation is limited due to a lack of consideration of physical constraints (e.g., mass balance) and properties (e.g., second law of geography). This work investigates physical interpretation of SVANNs using novel comparative approaches based on geographically heterogeneous features. The proposed approach on feature-based physical interpretation is evaluated using a case-study on wetland mapping. The proposed physical interpretation improves the transparency of SVANN models and the analytical results highlight the trade-off between model transparency and model performance (e.g., F1-score). We also describe an interpretation based on geographically heterogeneous processes modeled as partial differential equations (PDEs). △ Less

Submitted 29 October, 2021; originally announced October 2021.

Comments: Submission to SIGSPATIAL 2021 peer review process. The document contains 12 pages (including 2 pages of appendix)

arXiv:2109.06346 [pdf, other]

Physics Driven Domain Specific Transporter Framework with Attention Mechanism for Ultrasound Imaging

Authors: Arpan Tripathi, Abhilash Rakkunedeth, Mahesh Raveendranatha Panicker, Jack Zhang, Naveenjyote Boora, Jessica Knight, Jacob Jaremko, Yale Tung Chen, Kiran Vishnu Narayan, Kesavadas C

Abstract: Most applications of deep learning techniques in medical imaging are supervised and require a large number of labeled data which is expensive and requires many hours of careful annotation by experts. In this paper, we propose an unsupervised, physics driven domain specific transporter framework with an attention mechanism to identify relevant key points with applications in ultrasound imaging. The… ▽ More Most applications of deep learning techniques in medical imaging are supervised and require a large number of labeled data which is expensive and requires many hours of careful annotation by experts. In this paper, we propose an unsupervised, physics driven domain specific transporter framework with an attention mechanism to identify relevant key points with applications in ultrasound imaging. The proposed framework identifies key points that provide a concise geometric representation highlighting regions with high structural variation in ultrasound videos. We incorporate physics driven domain specific information as a feature probability map and use the radon transform to highlight features in specific orientations. The proposed framework has been trained on130 Lung ultrasound (LUS) videos and 113 Wrist ultrasound (WUS) videos and validated on 100 Lung ultrasound (LUS) videos and 58 Wrist ultrasound (WUS) videos acquired from multiple centers across the globe. Images from both datasets were independently assessed by experts to identify clinically relevant features such as A-lines, B-lines and pleura from LUS and radial metaphysis, radial epiphysis and carpal bones from WUS videos. The key points detected from both datasets showed high sensitivity (LUS = 99\% , WUS = 74\%) in detecting the image landmarks identified by experts. Also, on employing for classification of the given lung image into normal and abnormal classes, the proposed approach, even with no prior training, achieved an average accuracy of 97\% and an average F1-score of 95\% respectively on the task of co-classification with 3 fold cross-validation. With the purely unsupervised nature of the proposed approach, we expect the key point detection approach to increase the applicability of ultrasound in various examination performed in emergency and point of care. △ Less

Submitted 13 September, 2021; originally announced September 2021.

Comments: 11 pages,18 figures(including supplementary material)

arXiv:1910.08526 [pdf, other]

A flexible integer linear programming formulation for scheduling clinician on-call service in hospitals

Authors: David Landsman, Huiting Ma, Jesse Knight, Kevin Gough, Sharmistha Mishra

Abstract: Scheduling of personnel in a hospital environment is vital to improving the service provided to patients and balancing the workload assigned to clinicians. Many approaches have been tried and successfully applied to generate efficient schedules in such settings. However, due to the computational complexity of the scheduling problem in general, most approaches resort to heuristics to find a non-opt… ▽ More Scheduling of personnel in a hospital environment is vital to improving the service provided to patients and balancing the workload assigned to clinicians. Many approaches have been tried and successfully applied to generate efficient schedules in such settings. However, due to the computational complexity of the scheduling problem in general, most approaches resort to heuristics to find a non-optimal solution in a reasonable amount of time. We designed an integer linear programming formulation to find an optimal schedule in a clinical division of a hospital. Our formulation mitigates issues related to computational complexity by minimizing the set of constraints, yet retains sufficient flexibility so that it can be adapted to a variety of clinical divisions. We then conducted a case study for our approach using data from the Infectious Diseases division at St. Michael's Hospital in Toronto, Canada. We analyzed and compared the results of our approach to manually-created schedules at the hospital, and found improved adherence to departmental constraints and clinician preferences. We used simulated data to examine the sensitivity of the runtime of our linear program for various parameters and observed reassuring results, signifying the practicality and generalizability of our approach in different real-world scenarios. △ Less

Submitted 18 October, 2019; originally announced October 2019.

arXiv:1904.03257 [pdf, ps, other]

MLSys: The New Frontier of Machine Learning Systems

Authors: Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood , et al. (44 additional authors not shown)

Abstract: Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a ne… ▽ More Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a new systems machine learning research community at the intersection of the traditional systems and ML communities, focused on topics such as hardware systems for ML, software systems for ML, and ML optimized for metrics beyond predictive accuracy. To do this, we describe a new conference, MLSys, that explicitly targets research at the intersection of systems and machine learning with a program committee split evenly between experts in systems and ML, and an explicit focus on topics at the intersection of the two. △ Less

Submitted 1 December, 2019; v1 submitted 29 March, 2019; originally announced April 2019.

arXiv:1904.00682 [pdf, other]

doi 10.1109/TMI.2019.2905770

Standardized Assessment of Automatic Segmentation of White Matter Hyperintensities and Results of the WMH Segmentation Challenge

Authors: Hugo J. Kuijf, J. Matthijs Biesbroek, Jeroen de Bresser, Rutger Heinen, Simon Andermatt, Mariana Bento, Matt Berseth, Mikhail Belyaev, M. Jorge Cardoso, Adrià Casamitjana, D. Louis Collins, Mahsa Dadar, Achilleas Georgiou, Mohsen Ghafoorian, Dakai Jin, April Khademi, Jesse Knight, Hongwei Li, Xavier Lladó, Miguel Luna, Qaiser Mahmood, Richard McKinley, Alireza Mehrtash, Sébastien Ourselin, Bo-yong Park , et al. (19 additional authors not shown)

Abstract: Quantification of cerebral white matter hyperintensities (WMH) of presumed vascular origin is of key importance in many neurological research studies. Currently, measurements are often still obtained from manual segmentations on brain MR images, which is a laborious procedure. Automatic WMH segmentation methods exist, but a standardized comparison of the performance of such methods is lacking. We… ▽ More Quantification of cerebral white matter hyperintensities (WMH) of presumed vascular origin is of key importance in many neurological research studies. Currently, measurements are often still obtained from manual segmentations on brain MR images, which is a laborious procedure. Automatic WMH segmentation methods exist, but a standardized comparison of the performance of such methods is lacking. We organized a scientific challenge, in which developers could evaluate their method on a standardized multi-center/-scanner image dataset, giving an objective comparison: the WMH Segmentation Challenge (https://wmh.isi.uu.nl/). Sixty T1+FLAIR images from three MR scanners were released with manual WMH segmentations for training. A test set of 110 images from five MR scanners was used for evaluation. Segmentation methods had to be containerized and submitted to the challenge organizers. Five evaluation metrics were used to rank the methods: (1) Dice similarity coefficient, (2) modified Hausdorff distance (95th percentile), (3) absolute log-transformed volume difference, (4) sensitivity for detecting individual lesions, and (5) F1-score for individual lesions. Additionally, methods were ranked on their inter-scanner robustness. Twenty participants submitted their method for evaluation. This paper provides a detailed analysis of the results. In brief, there is a cluster of four methods that rank significantly better than the other methods, with one clear winner. The inter-scanner robustness ranking shows that not all methods generalize to unseen scanners. The challenge remains open for future submissions and provides a public platform for method evaluation. △ Less

Submitted 1 April, 2019; originally announced April 2019.

Comments: Accepted for publication in IEEE Transactions on Medical Imaging

arXiv:1810.09958 [pdf, other]

ISA Mapper: A Compute and Hardware Agnostic Deep Learning Compiler

Authors: Matthew Sotoudeh, Anand Venkat, Michael Anderson, Evangelos Georganas, Alexander Heinecke, Jason Knight

Abstract: Domain specific accelerators present new challenges and opportunities for code generation onto novel instruction sets, communication fabrics, and memory architectures. In this paper we introduce an intermediate representation (IR) which enables both deep learning computational kernels and hardware capabilities to be described in the same IR. We then formulate and apply instruction mapping to det… ▽ More Domain specific accelerators present new challenges and opportunities for code generation onto novel instruction sets, communication fabrics, and memory architectures. In this paper we introduce an intermediate representation (IR) which enables both deep learning computational kernels and hardware capabilities to be described in the same IR. We then formulate and apply instruction mapping to determine the possible ways a computation can be performed on a hardware system. Next, our scheduler chooses a specific mapping and determines the data movement and computation order. In order to manage the large search space of mappings and schedules, we developed a flexible framework that allows heuristics, cost models, and potentially machine learning to facilitate this search problem. With this system, we demonstrate the automated extraction of matrix multiplication kernels out of recent deep learning kernels such as depthwise-separable convolution. In addition, we demonstrate two to five times better performance on DeepBench sized GEMMs and GRU RNN execution when compared to state-of-the-art (SOTA) implementations on new hardware and up to 85% of the performance for SOTA implementations on existing hardware. △ Less

Submitted 12 October, 2018; originally announced October 2018.

arXiv:1801.08058 [pdf, other]

Intel nGraph: An Intermediate Representation, Compiler, and Executor for Deep Learning

Authors: Scott Cyphers, Arjun K. Bansal, Anahita Bhiwandiwalla, Jayaram Bobba, Matthew Brookhart, Avijit Chakraborty, Will Constable, Christian Convey, Leona Cook, Omar Kanawi, Robert Kimball, Jason Knight, Nikolay Korovaiko, Varun Kumar, Yixing Lao, Christopher R. Lishka, Jaikrishnan Menon, Jennifer Myers, Sandeep Aswath Narayana, Adam Procter, Tristan J. Webb

Abstract: The Deep Learning (DL) community sees many novel topologies published each year. Achieving high performance on each new topology remains challenging, as each requires some level of manual effort. This issue is compounded by the proliferation of frameworks and hardware platforms. The current approach, which we call "direct optimization", requires deep changes within each framework to improve the tr… ▽ More The Deep Learning (DL) community sees many novel topologies published each year. Achieving high performance on each new topology remains challenging, as each requires some level of manual effort. This issue is compounded by the proliferation of frameworks and hardware platforms. The current approach, which we call "direct optimization", requires deep changes within each framework to improve the training performance for each hardware backend (CPUs, GPUs, FPGAs, ASICs) and requires $\mathcal{O}(fp)$ effort; where $f$ is the number of frameworks and $p$ is the number of platforms. While optimized kernels for deep-learning primitives are provided via libraries like Intel Math Kernel Library for Deep Neural Networks (MKL-DNN), there are several compiler-inspired ways in which performance can be further optimized. Building on our experience creating neon (a fast deep learning library on GPUs), we developed Intel nGraph, a soon to be open-sourced C++ library to simplify the realization of optimized deep learning performance across frameworks and hardware platforms. Initially-supported frameworks include TensorFlow, MXNet, and Intel neon framework. Initial backends are Intel Architecture CPUs (CPU), the Intel(R) Nervana Neural Network Processor(R) (NNP), and NVIDIA GPUs. Currently supported compiler optimizations include efficient memory management and data layout abstraction. In this paper, we describe our overall architecture and its core components. In the future, we envision extending nGraph API support to a wider range of frameworks, hardware (including FPGAs and ASICs), and compiler optimizations (training versus inference optimizations, multi-node and multi-device scaling via efficient sub-graph partitioning, and HW-specific compounding of operations). △ Less

Submitted 29 January, 2018; v1 submitted 24 January, 2018; originally announced January 2018.

Showing 1–15 of 15 results for author: Knight, J