Search | arXiv e-print repository

Part-based Quantitative Analysis for Heatmaps

Authors: Osman Tursun, Sinan Kalkan, Simon Denman, Sridha Sridharan, Clinton Fookes

Abstract: Heatmaps have been instrumental in helping understand deep network decisions, and are a common approach for Explainable AI (XAI). While significant progress has been made in enhancing the informativeness and accessibility of heatmaps, heatmap analysis is typically very subjective and limited to domain experts. As such, developing automatic, scalable, and numerical analysis methods to make heatmap-based XAI more objective, end-user friendly, and cost-effective is vital. In addition, there is a need for comprehensive evaluation metrics to assess heatmap quality at a granular level.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2403.15717 [pdf, other]

Ev-Edge: Efficient Execution of Event-based Vision Algorithms on Commodity Edge Platforms

Authors: Shrihari Sridharan, Surya Selvam, Kaushik Roy, Anand Raghunathan

Abstract: Event cameras have emerged as a promising sensing modality for autonomous navigation systems, owing to their high temporal resolution, high dynamic range and negligible motion blur. To process the asynchronous temporal event streams from such sensors, recent research has shown that a mix of Artificial Neural Networks (ANNs), Spiking Neural Networks (SNNs) as well as hybrid SNN-ANN algorithms are necessary to achieve high accuracies across a range of perception tasks. However, we observe that executing such workloads on commodity edge platforms which feature heterogeneous processing elements such as CPUs, GPUs and neural accelerators results in inferior performance. This is due to the mismatch between the irregular nature of event streams and diverse characteristics of algorithms on the one hand and the underlying hardware platform on the other. We propose Ev-Edge, a framework that contains three key optimizations to boost the performance of event-based vision systems on edge platforms: (1) an Event2Sparse Frame converter directly transforms raw event streams into sparse frames, enabling the use of sparse libraries with minimal encoding overheads; (2) a Dynamic Sparse Frame Aggregator merges sparse frames at runtime by trading off the temporal granularity of events and computational demand, thereby improving hardware utilization; (3) a Network Mapper maps concurrently executing tasks to different processing elements while also selecting layer precision by considering both compute and communication overheads.
On several state-of-the-art networks for a range of autonomous navigation tasks, Ev-Edge achieves 1.28x-2.05x improvements in latency and 1.23x-2.15x in energy over an all-GPU implementation on the NVIDIA Jetson Xavier AGX platform for single-task execution scenarios. Ev-Edge also achieves 1.43x-1.81x latency improvements over round-robin scheduling methods in multi-task execution scenarios.

Submitted 23 March, 2024; originally announced March 2024.

arXiv:2401.02634 [pdf, other]

AG-ReID.v2: Bridging Aerial and Ground Views for Person Re-identification

Authors: Huy Nguyen, Kien Nguyen, Sridha Sridharan, Clinton Fookes

Abstract: Aerial-ground person re-identification (Re-ID) presents unique challenges in computer vision, stemming from the distinct differences in viewpoints, poses, and resolutions between high-altitude aerial and ground-based cameras. Existing research predominantly focuses on ground-to-ground matching, with aerial matching less explored due to a dearth of comprehensive datasets. To address this, we introduce AG-ReID.v2, a dataset specifically designed for person Re-ID in mixed aerial and ground scenarios. This dataset comprises 100,502 images of 1,615 unique individuals, each annotated with matching IDs and 15 soft attribute labels. Data were collected from diverse perspectives using a UAV, stationary CCTV, and smart glasses-integrated camera, providing a rich variety of intra-identity variations. Additionally, we have developed an explainable attention network tailored for this dataset. This network features a three-stream architecture that efficiently processes pairwise image distances, emphasizes key top-down features, and adapts to variations in appearance due to altitude differences. Comparative evaluations demonstrate the superiority of our approach over existing baselines. We plan to release the dataset and algorithm source code publicly, aiming to advance research in this specialized field of computer vision. For access, please visit https://github.com/huynguyen792/AG-ReID.v2.

Submitted 7 April, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

Comments: 13 pages, Accepted by TIFS 2023

arXiv:2312.15364 [pdf, other]

WildScenes: A Benchmark for 2D and 3D Semantic Segmentation in Large-scale Natural Environments

Authors: Kavisha Vidanapathirana, Joshua Knights, Stephen Hausler, Mark Cox, Milad Ramezani, Jason Jooste, Ethan Griffiths, Shaheer Mohamed, Sridha Sridharan, Clinton Fookes, Peyman Moghadam

Abstract: Recent progress in semantic scene understanding has primarily been enabled by the availability of semantically annotated bi-modal (camera and lidar) datasets in urban environments. However, such annotated datasets are also needed for natural, unstructured environments to enable semantic perception for applications, including conservation, search and rescue, environment monitoring, and agricultural automation. Therefore, we introduce WildScenes, a bi-modal benchmark dataset consisting of multiple large-scale traversals in natural environments, including semantic annotations in high-resolution 2D images and dense 3D lidar point clouds, and accurate 6-DoF pose information. The data is (1) trajectory-centric with accurate localization and globally aligned point clouds, (2) calibrated and synchronized to support bi-modal inference, and (3) containing different natural environments over 6 months to support research on domain adaptation. Our 3D semantic labels are obtained via an efficient automated process that transfers the human-annotated 2D labels from multiple views into 3D point clouds, thus circumventing the need for expensive and time-consuming human annotation in 3D. We introduce benchmarks on 2D and 3D semantic segmentation and evaluate a variety of recent deep-learning techniques to demonstrate the challenges in semantic segmentation in natural environments.
We propose train-val-test splits for standard benchmarks as well as domain adaptation benchmarks and utilize an automated split generation technique to ensure the balance of class label distributions. The data, evaluation scripts and pretrained models will be released upon acceptance at https://csiro-robotics.github.io/WildScenes.

Submitted 23 December, 2023; originally announced December 2023.

Comments: Under review. The first 3 authors contributed equally

arXiv:2309.12631 [pdf, other]

Learning the eigenstructure of quantum dynamics using classical shadows

Authors: Atithi Acharya, Siddhartha Saha, Shagesh Sridharan, Yanis Bahroun, Anirvan M. Sengupta

Abstract: Learning dynamics from repeated observation of the time evolution of an open quantum system, namely, the problem of quantum process tomography, is an important task. This task is difficult in general but, with some additional constraints, could be tractable. This motivates us to look at the problem of Lindblad operator discovery from observations. We point out that for moderate size Hilbert spaces, low Kraus rank of the channel, and short time steps, the eigenvalues of the Choi matrix corresponding to the channel have a special structure. We use the least-square method for the estimation of a channel where, for fixed inputs, we estimate the outputs by classical shadows. The resultant noisy estimate of the channel can then be denoised by diagonalizing the nominal Choi matrix, truncating some eigenvalues, and altering it to a genuine Choi matrix. This processed Choi matrix is then compared to the original one. We see that as the number of samples increases, our reconstruction becomes more accurate. We also use tools from random matrix theory to understand the effect of estimation noise in the eigenspectrum of the estimated Choi matrix.

Submitted 22 September, 2023; originally announced September 2023.
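
The denoising step described in the abstract above (diagonalize the noisy Choi matrix, truncate eigenvalues, restore a valid Choi matrix) can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function name `denoise_choi`, the choice to keep the `rank` largest eigenvalues, and the rescaling of the trace to the input dimension `dim` (unnormalised Choi convention) are all assumptions made for illustration. Fixing the trace is only a necessary condition for trace preservation; the full partial-trace constraint is not enforced here.

```python
import numpy as np

def denoise_choi(choi_noisy, rank, dim):
    """Project a noisy Choi estimate onto a low-rank PSD matrix (hypothetical helper).

    choi_noisy : (dim*dim, dim*dim) complex array, noisy estimate
    rank       : number of eigenvalues to keep (assumed Kraus rank)
    dim        : Hilbert-space dimension of the channel input
    """
    # Keep only the Hermitian part; estimation noise can break Hermiticity.
    herm = 0.5 * (choi_noisy + choi_noisy.conj().T)
    # eigh returns real eigenvalues in ascending order for Hermitian input.
    evals, evecs = np.linalg.eigh(herm)
    # Truncate: keep the `rank` largest eigenvalues and clip negatives to zero.
    keep = np.argsort(evals)[-rank:]
    kept = np.clip(evals[keep], 0.0, None)
    total = kept.sum()
    if total <= 0.0:
        raise ValueError("no positive eigenvalues left after truncation")
    # Rescale so that the trace equals dim (assumed unnormalised convention).
    kept = kept * (dim / total)
    vecs = evecs[:, keep]
    # Rebuild V diag(kept) V^dagger.
    return (vecs * kept) @ vecs.conj().T

# Usage: noisy estimate of the identity channel on one qubit.
rng = np.random.default_rng(0)
omega = np.array([1, 0, 0, 1], dtype=complex)      # unnormalised |00> + |11>
choi_true = np.outer(omega, omega.conj())           # rank 1, trace 2
noise = 0.05 * (rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))
choi_clean = denoise_choi(choi_true + noise, rank=1, dim=2)
```

The result is Hermitian, positive semidefinite, and of rank at most `rank` by construction.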

arXiv:2309.09431 [pdf, other]

FactoFormer: Factorized Hyperspectral Transformers with Self-Supervised Pretraining

Authors: Shaheer Mohamed, Maryam Haghighat, Tharindu Fernando, Sridha Sridharan, Clinton Fookes, Peyman Moghadam

Abstract: Hyperspectral images (HSIs) contain rich spectral and spatial information. Motivated by the success of transformers in the field of natural language processing and computer vision where they have shown the ability to learn long range dependencies within input data, recent research has focused on using transformers for HSIs. However, current state-of-the-art hyperspectral transformers only tokenize the input HSI sample along the spectral dimension, resulting in the under-utilization of spatial information. Moreover, transformers are known to be data-hungry and their performance relies heavily on large-scale pretraining, which is challenging due to limited annotated hyperspectral data. Therefore, the full potential of HSI transformers has not been fully realized. To overcome these limitations, we propose a novel factorized spectral-spatial transformer that incorporates factorized self-supervised pretraining procedures, leading to significant improvements in performance. The factorization of the inputs allows the spectral and spatial transformers to better capture the interactions within the hyperspectral data cubes. Inspired by masked image modeling pretraining, we also devise efficient masking strategies for pretraining each of the spectral and spatial transformers. We conduct experiments on six publicly available datasets for HSI classification task and demonstrate that our model achieves state-of-the-art performance in all the datasets. The code for our model will be made available at https://github.com/csiro-robotics/factoformer.

Submitted 3 January, 2024; v1 submitted 17 September, 2023; originally announced September 2023.

Comments: Accepted to IEEE Transactions on Geoscience and Remote Sensing in December 2023

arXiv:2308.08731 [pdf, other]

Learning Through Guidance: Knowledge Distillation for Endoscopic Image Classification

Authors: Harshala Gammulle, Yubo Chen, Sridha Sridharan, Travis Klein, Clinton Fookes

Abstract: Endoscopy plays a major role in identifying any underlying abnormalities within the gastrointestinal (GI) tract. There are multiple GI tract diseases that are life-threatening, such as precancerous lesions and other intestinal cancers. In the usual process, a diagnosis is made by a medical expert, which can be prone to human errors, and the accuracy of the test is also entirely dependent on the expert's level of experience. Deep learning, specifically Convolutional Neural Networks (CNNs) which are designed to perform automatic feature learning without any prior feature engineering, has recently reported great benefits for GI endoscopy image analysis. Previous research has developed models that focus only on improving performance; as such, the majority of introduced models contain complex deep network architectures with a large number of parameters that require longer training times. However, there is a lack of focus on developing lightweight models which can run in low-resource environments, which are typically encountered in medical clinics. We investigate three KD-based learning frameworks, response-based, feature-based, and relation-based mechanisms, and introduce a novel multi-head attention-based feature fusion mechanism to support relation-based learning.
Compared to the existing relation-based methods that follow simplistic aggregation techniques of multi-teacher response/feature-based knowledge, we adopt the multi-head attention technique to provide flexibility towards localising and transferring important details from each teacher to better guide the student. We perform extensive evaluations on two widely used public datasets, KVASIR-V2 and Hyper-KVASIR, and our experimental results signify the merits of our proposed relation-based framework in achieving an improved lightweight model (only 51.8k trainable parameters) that can run in a resource-limited environment.

Submitted 16 August, 2023; originally announced August 2023.
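
For readers unfamiliar with the response-based knowledge distillation (KD) mentioned in the abstract above, a minimal NumPy sketch of the generic temperature-softened loss follows. It shows only the standard textbook formulation, not the paper's multi-head attention or relation-based framework; the function names and the default temperature `T=4.0` are assumptions chosen for illustration.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the last axis (numerically stabilised)."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def response_kd_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on softened outputs, averaged over the batch.

    The T**2 factor is the usual scaling that keeps gradient magnitudes
    comparable across temperatures.
    """
    p_teacher = softmax(teacher_logits, T)
    log_p_teacher = np.log(p_teacher)
    log_p_student = np.log(softmax(student_logits, T))
    kl = (p_teacher * (log_p_teacher - log_p_student)).sum(axis=-1)
    return (T ** 2) * kl.mean()

# Usage: a batch of two samples with three classes (made-up logits).
teacher = np.array([[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]])
student = np.array([[1.0, 0.2, -0.5], [0.0, 0.9, 0.4]])
loss = response_kd_loss(student, teacher)
```

The loss is zero when the student reproduces the teacher's logits and positive otherwise; in practice it is usually combined with an ordinary cross-entropy term on the ground-truth labels.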

arXiv:2308.04638 [pdf, other]

GeoAdapt: Self-Supervised Test-Time Adaptation in LiDAR Place Recognition Using Geometric Priors

Authors: Joshua Knights, Stephen Hausler, Sridha Sridharan, Clinton Fookes, Peyman Moghadam

Abstract: LiDAR place recognition approaches based on deep learning suffer from significant performance degradation when there is a shift between the distribution of training and test datasets, often requiring re-training the networks to achieve peak performance. However, obtaining accurate ground truth data for new training data can be prohibitively expensive, especially in complex or GPS-deprived environments. To address this issue we propose GeoAdapt, which introduces a novel auxiliary classification head to generate pseudo-labels for re-training on unseen environments in a self-supervised manner. GeoAdapt uses geometric consistency as a prior to improve the robustness of our generated pseudo-labels against domain shift, improving the performance and reliability of our Test-Time Adaptation approach. Comprehensive experiments show that GeoAdapt significantly boosts place recognition performance across moderate to severe domain shifts, and is competitive with fully supervised test-time adaptation approaches. Our code is available at https://github.com/csiro-robotics/GeoAdapt.

Submitted 28 November, 2023; v1 submitted 8 August, 2023; originally announced August 2023.

Comments: Accepted to IEEE Robotics and Automation Letters (RA-L) November 2023

arXiv:2308.02427 [pdf, other]

Unlocking the Potential of Similarity Matching: Scalability, Supervision and Pre-training

Authors: Yanis Bahroun, Shagesh Sridharan, Atithi Acharya, Dmitri B. Chklovskii, Anirvan M. Sengupta

Abstract: While effective, the backpropagation (BP) algorithm exhibits limitations in terms of biological plausibility, computational cost, and suitability for online learning. As a result, there has been a growing interest in developing alternative biologically plausible learning approaches that rely on local learning rules. This study focuses on the primarily unsupervised similarity matching (SM) framework, which aligns with observed mechanisms in biological systems and offers online, localized, and biologically plausible algorithms. i) To scale SM to large datasets, we propose an implementation of Convolutional Nonnegative SM using PyTorch. ii) We introduce a localized supervised SM objective reminiscent of canonical correlation analysis, facilitating stacking SM layers. iii) We leverage the PyTorch implementation for pre-training architectures such as LeNet and compare the evaluation of features against BP-trained models. This work combines biologically plausible algorithms with computational efficiency opening multiple avenues for further explorations.

Submitted 2 August, 2023; originally announced August 2023.

arXiv:2307.03388 [pdf, other]

General-Purpose Multimodal Transformer meets Remote Sensing Semantic Segmentation

Authors: Nhi Kieu, Kien Nguyen, Sridha Sridharan, Clinton Fookes

Abstract: The advent of high-resolution multispectral/hyperspectral sensors, LiDAR DSM (Digital Surface Model) information and many others has provided us with an unprecedented wealth of data for Earth Observation. Multimodal AI seeks to exploit those complementary data sources, particularly for complex tasks like semantic segmentation. While specialized architectures have been developed, they are highly complicated via significant effort in model design, and require considerable re-engineering whenever a new modality emerges. Recent trends in general-purpose multimodal networks have shown great potential to achieve state-of-the-art performance across multiple multimodal tasks with one unified architecture. In this work, we investigate the performance of PerceiverIO, one in the general-purpose multimodal family, in the remote sensing semantic segmentation domain. Our experiments reveal that this ostensibly universal network struggles with object scale variation in remote sensing images and fails to detect the presence of cars from a top-down view. To address these issues, even with extreme class imbalance issues, we propose a spatial and volumetric learning component. Specifically, we design a UNet-inspired module that employs 3D convolution to encode vital local information and learn cross-modal features simultaneously, while reducing network computational burden via the cross-attention mechanism of PerceiverIO.
The effectiveness of the proposed component is validated through extensive experiments comparing it with other methods such as 2D convolution, and dual local module (i.e., the combination of Conv2D 1x1 and Conv2D 3x3 inspired by UNetFormer). The proposed method achieves competitive results with specialized architectures like UNetFormer and SwinUNet, showing its potential to minimize network architecture engineering with a minimal compromise on the performance.

Submitted 7 July, 2023; originally announced July 2023.

Comments: Accepted to CVPR Workshop on Multimodal Learning for Earth and Environment 2023

arXiv:2305.14516 [pdf, other]

Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

Authors: Srinivas Sridharan, Taekyung Heo, Louis Feng, Zhaodong Wang, Matt Bergeron, Wenyin Fu, Shengbao Zheng, Brian Coutinho, Saeed Rashidi, Changhai Man, Tushar Krishna

Abstract: Benchmarking and co-design are essential for driving optimizations and innovation around ML models, ML software, and next-generation hardware. Full workload benchmarks, e.g. MLPerf, play an essential role in enabling fair comparison across different software and hardware stacks especially once systems are fully designed and deployed. However, the pace of AI innovation demands a more agile methodology to benchmark creation and usage by simulators and emulators for future system co-design. We propose Chakra, an open graph schema for standardizing workload specification capturing key operations and dependencies, also known as Execution Trace (ET). In addition, we propose a complementary set of tools/capabilities to enable collection, generation, and adoption of Chakra ETs by a wide range of simulators, emulators, and benchmarks. For instance, we use generative AI models to learn latent statistical properties across thousands of Chakra ETs and use these models to synthesize Chakra ETs. These synthetic ETs can obfuscate key proprietary information and also target future what-if scenarios. As an example, we demonstrate an end-to-end proof-of-concept that converts PyTorch ETs to Chakra ETs and uses this to drive an open-source training system simulator (ASTRA-sim). Our end-goal is to build a vibrant industry-wide ecosystem of agile benchmarks and tools to drive future AI system co-design.

Submitted 26 May, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

arXiv:2305.11394 [pdf, other]

Remembering What Is Important: A Factorised Multi-Head Retrieval and Auxiliary Memory Stabilisation Scheme for Human Motion Prediction

Authors: Tharindu Fernando, Harshala Gammulle, Sridha Sridharan, Simon Denman, Clinton Fookes

Abstract: Humans exhibit complex motions that vary depending on the task that they are performing, the interactions they engage in, as well as subject-specific preferences. Therefore, forecasting future poses based on the history of the previous motions is a challenging task. This paper presents an innovative auxiliary-memory-powered deep neural network framework for the improved modelling of historical knowledge. Specifically, we disentangle subject-specific, task-specific, and other auxiliary information from the observed pose sequences and utilise these factorised features to query the memory. A novel Multi-Head knowledge retrieval scheme leverages these factorised feature embeddings to perform multiple querying operations over the historical observations captured within the auxiliary memory. Moreover, our proposed dynamic masking strategy makes this feature disentanglement process dynamic. Two novel loss functions are introduced to encourage diversity within the auxiliary memory while ensuring the stability of the memory contents, such that it can locate and store salient information that can aid the long-term prediction of future motion, irrespective of data imbalances or the diversity of the input data distribution. With extensive experiments conducted on two public benchmarks, Human3.6M and CMU-Mocap, we demonstrate that these design choices collectively allow the proposed approach to outperform the current state-of-the-art methods by significant margins: > 17% on the Human3.6M dataset and > 9% on the CMU-Mocap dataset.

Submitted 18 May, 2023; originally announced May 2023.

arXiv:2305.01074 [pdf, other]

doi 10.1109/TNNLS.2023.3321432

Physical Adversarial Attacks for Surveillance: A Survey

Authors: Kien Nguyen, Tharindu Fernando, Clinton Fookes, Sridha Sridharan

Abstract: Modern automated surveillance techniques are heavily reliant on deep learning methods. Despite the superior performance, these learning systems are inherently vulnerable to adversarial attacks - maliciously crafted inputs that are designed to mislead, or trick, models into making incorrect predictions. An adversary can physically change their appearance by wearing adversarial t-shirts, glasses, or hats or by specific behavior, to potentially avoid various forms of detection, tracking and recognition of surveillance systems; and obtain unauthorized access to secure properties and assets. This poses a severe threat to the security and safety of modern surveillance systems. This paper reviews recent attempts and findings in learning and designing physical adversarial attacks for surveillance applications. In particular, we propose a framework to analyze physical adversarial attacks and provide a comprehensive survey of physical adversarial attacks on four key surveillance tasks: detection, identification, tracking, and action recognition under this framework. Furthermore, we review and analyze strategies to defend against the physical adversarial attacks and the methods for evaluating the strengths of the defense. The insights in this paper present an important step in building resilience within surveillance systems to physical adversarial attacks.

Submitted 14 October, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

Comments: This paper has been accepted for publication in T-NNLS

arXiv:2304.02202 [pdf, other]

Towards Self-Explainability of Deep Neural Networks with Heatmap Captioning and Large-Language Models

Authors: Osman Tursun, Simon Denman, Sridha Sridharan, Clinton Fookes

Abstract: Heatmaps are widely used to interpret deep neural networks, particularly for computer vision tasks, and the heatmap-based explainable AI (XAI) techniques are a well-researched topic. However, most studies concentrate on enhancing the quality of the generated heatmap or discovering alternate heatmap generation techniques, and little effort has been devoted to making heatmap-based XAI automatic, interactive, scalable, and accessible. To address this gap, we propose a framework that includes two modules: (1) context modelling and (2) reasoning. We propose a template-based image captioning approach for context modelling to create text-based contextual information from the heatmap and input data. The reasoning module leverages a large language model to provide explanations in combination with specialised knowledge. Our qualitative experiments demonstrate the effectiveness of our framework and heatmap captioning approach. The code for the proposed template-based heatmap captioning approach will be publicly available.

Submitted 4 April, 2023; originally announced April 2023.

arXiv:2303.14006 [pdf, other]

ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale

Authors: William Won, Taekyung Heo, Saeed Rashidi, Srinivas Sridharan, Sudarshan Srinivasan, Tushar Krishna

Abstract: As deep learning models and input data are scaling at an unprecedented rate, it is inevitable to move towards distributed training platforms to fit the model and increase training throughput. State-of-the-art approaches and techniques, such as wafer-scale nodes, multi-dimensional network topologies, disaggregated memory systems, and parallelization strategies, have been actively adopted by emerging distributed training systems. This results in a complex SW/HW co-design stack of distributed training, necessitating a modeling/simulation infrastructure for design-space exploration. In this paper, we extend the open-source ASTRA-sim infrastructure and endow it with the capabilities to model state-of-the-art and emerging distributed training models and platforms. More specifically, (i) we enable ASTRA-sim to support arbitrary model parallelization strategies via a graph-based training-loop implementation, (ii) we implement a parameterizable multi-dimensional heterogeneous topology generation infrastructure with analytical performance estimates enabling simulating target systems at scale, and (iii) we enhance the memory system modeling to support accurate modeling of in-network collective communication and disaggregated memory systems. With such capabilities, we run comprehensive case studies targeting emerging distributed models and platforms. This infrastructure lets system designers swiftly traverse the complex co-design stack and give meaningful insights when designing and deploying distributed training platforms at scale.

Submitted 24 March, 2023; originally announced March 2023.

arXiv:2303.13894 [pdf, ps, other]

Polynomial correspondences expressible as maps of $d$-tuples

Authors: Shrihari Sridharan, Subith G., Atma Ram Tiwari

Abstract: In this paper, we consider polynomial correspondences $f (x, y)$ in $\mathbb{C}[x, y]$ of degree $d \ge 2$ in both the variables and obtain necessary and sufficient conditions in order that the equation $f (x, y) = 0$ can be expressed as $φ(x) = ψ(y)$, where $φ$ and $ψ$ are fractional degree $d$ rational maps in the Riemann sphere. In the absence of involutions that played a vital role towards characterising quadratic correspondences ($d = 2$), we employ certain elementary ideas from theory of equations and matrices to achieve our results. We further explore certain symmetry conditions on the matrix of coefficients of correspondences that satisfy the above factorisation. We conclude this short note with a few examples.

Submitted 24 March, 2023; originally announced March 2023.

arXiv:2303.08597 [pdf, other]

Aerial-Ground Person Re-ID

Authors: Huy Nguyen, Kien Nguyen, Sridha Sridharan, Clinton Fookes

Abstract: Person re-ID matches persons across multiple non-overlapping cameras. Despite the increasing deployment of airborne platforms in surveillance, existing person re-ID benchmarks focus on ground-ground matching, with very limited effort on aerial-aerial matching. We propose a new benchmark dataset, AG-ReID, which performs person re-ID matching in a new setting: across aerial and ground cameras. Our dataset contains 21,983 images of 388 identities and 15 soft attributes for each identity. The data was collected by a UAV flying at altitudes between 15 and 45 meters and a ground-based CCTV camera on a university campus. Our dataset presents a novel elevated-viewpoint challenge for person re-ID due to the significant difference in person appearance across these cameras. We propose an explainable algorithm to guide the person re-ID model's training with soft attributes to address this challenge. Experiments demonstrate the efficacy of our method on the aerial-ground person re-ID task. The dataset will be published and the baseline codes will be open-sourced at https://github.com/huynguyen792/AG-ReID to facilitate research in this area.

Submitted 14 August, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: Published on IEEE International Conference on Multimedia and Expo 2023 (ICME2023)

arXiv:2303.07470 [pdf, other]

X-Former: In-Memory Acceleration of Transformers

Authors: Shrihari Sridharan, Jacob R. Stevens, Kaushik Roy, Anand Raghunathan

Abstract: Transformers have achieved great success in a wide variety of natural language processing (NLP) tasks due to the attention mechanism, which assigns an importance score for every word relative to other words in a sequence. However, these models are very large, often reaching hundreds of billions of parameters, and therefore require a large number of DRAM accesses. Hence, traditional deep neural network (DNN) accelerators such as GPUs and TPUs face limitations in processing Transformers efficiently. In-memory accelerators based on non-volatile memory promise to be an effective solution to this challenge, since they provide high storage density while performing massively parallel matrix vector multiplications within memory arrays. However, attention score computations, which are frequently used in Transformers (unlike CNNs and RNNs), require matrix vector multiplications (MVM) where both operands change dynamically for each input. As a result, conventional NVM-based accelerators incur high write latency and write energy when used for Transformers, and further suffer from the low endurance of most NVM technologies. To address these challenges, we present X-Former, a hybrid in-memory hardware accelerator that consists of both NVM and CMOS processing elements to execute transformer workloads efficiently. To improve the hardware utilization of X-Former, we also propose a sequence blocking dataflow, which overlaps the computations of the two processing elements and reduces execution time. Across several benchmarks, we show that X-Former achieves up to 85x and 7.5x improvements in latency and energy over an NVIDIA GeForce GTX 1060 GPU and up to 10.7x and 4.6x improvements in latency and energy over a state-of-the-art in-memory NVM accelerator.

Submitted 13 March, 2023; originally announced March 2023.

arXiv:2301.04122 [pdf, other]

Mystique: Enabling Accurate and Scalable Generation of Production AI Benchmarks

Authors: Mingyu Liang, Wenyin Fu, Louis Feng, Zhongyi Lin, Pavani Panakanti, Shengbao Zheng, Srinivas Sridharan, Christina Delimitrou

Abstract: Building large AI fleets to support the rapidly growing DL workloads is an active research topic for modern cloud providers. Generating accurate benchmarks plays an essential role in designing the fast-paced software and hardware solutions in this space. Two fundamental challenges to make this scalable are (i) workload representativeness and (ii) the ability to quickly incorporate changes to the fleet into the benchmarks. To overcome these issues, we propose Mystique, an accurate and scalable framework for production AI benchmark generation. It leverages the PyTorch execution trace (ET), a new feature that captures the runtime information of AI models at the granularity of operators, in a graph format, together with their metadata. By sourcing fleet ETs, we can build AI benchmarks that are portable and representative. Mystique is scalable, due to its lightweight data collection, in terms of runtime overhead and instrumentation effort. It is also adaptive because ET composability allows flexible control on benchmark creation. We evaluate our methodology on several production AI models, and show that benchmarks generated with Mystique closely resemble original AI models, both in execution time and system-level metrics. We also showcase the portability of the generated benchmarks across platforms, and demonstrate several use cases enabled by the fine-grained composability of the execution trace.

Submitted 11 April, 2023; v1 submitted 16 December, 2022; originally announced January 2023.

Comments: Accepted to ISCA 2023

arXiv:2211.12732 [pdf, other]

Wild-Places: A Large-Scale Dataset for Lidar Place Recognition in Unstructured Natural Environments

Authors: Joshua Knights, Kavisha Vidanapathirana, Milad Ramezani, Sridha Sridharan, Clinton Fookes, Peyman Moghadam

Abstract: Many existing datasets for lidar place recognition are solely representative of structured urban environments, and have recently been saturated in performance by deep learning based approaches. Natural and unstructured environments present many additional challenges for the tasks of long-term localisation but these environments are not represented in currently available datasets. To address this we introduce Wild-Places, a challenging large-scale dataset for lidar place recognition in unstructured, natural environments. Wild-Places contains eight lidar sequences collected with a handheld sensor payload over the course of fourteen months, containing a total of 63K undistorted lidar submaps along with accurate 6DoF ground truth. Our dataset contains multiple revisits both within and between sequences, allowing for both intra-sequence (i.e. loop closure detection) and inter-sequence (i.e. re-localisation) place recognition. We also benchmark several state-of-the-art approaches to demonstrate the challenges that this dataset introduces, particularly the case of long-term place recognition due to natural environments changing over time. Our dataset and code will be available at https://csiro-robotics.github.io/Wild-Places.

Submitted 2 March, 2023; v1 submitted 23 November, 2022; originally announced November 2022.

Comments: Equal contribution from first two authors. Accepted to ICRA2023. Website link: https://csiro-robotics.github.io/Wild-Places/

arXiv:2211.08565 [pdf, other]

Using Auxiliary Information for Person Re-Identification -- A Tutorial Overview

Authors: Tharindu Fernando, Clinton Fookes, Sridha Sridharan, Dana Michalski

Abstract: Person re-identification (re-id) is a pivotal task within an intelligent surveillance pipeline and there exist numerous re-id frameworks that achieve satisfactory performance in challenging benchmarks. However, these systems struggle to generate acceptable results when there are significant differences between the camera views, illumination conditions, or occlusions. This result can be attributed to the deficiency that exists within many recently proposed re-id pipelines where they are predominantly driven by appearance-based features and little attention is paid to other auxiliary information that could aid the re-id. In this paper, we systematically review the current State-Of-The-Art (SOTA) methods in both uni-modal and multi-modal person re-id. Extending beyond a conceptual framework, we illustrate how the existing SOTA methods can be extended to support this additional auxiliary information and quantitatively evaluate the utility of such auxiliary feature information, ranging from logos printed on the objects carried by the subject or printed on the clothes worn by the subject, through to his or her behavioural trajectories. To the best of our knowledge, this is the first work that explores the fusion of multiple information to generate a more discriminant person descriptor and the principal aim of this paper is to provide a thorough theoretical analysis regarding the implementation of such a framework. In addition, using model interpretation techniques, we validate the contributions from different combinations of the auxiliary information versus the original features that the SOTA person re-id models extract. We outline the limitations of the proposed approaches and propose future research directions that could be pursued to advance the area of multi-modal person re-id.

Submitted 15 November, 2022; originally announced November 2022.

Comments: Preprint Submitted to Pattern Recognition

arXiv:2210.04432 [pdf, other]

doi 10.1109/LRA.2023.3255560

Spectral Geometric Verification: Re-Ranking Point Cloud Retrieval for Metric Localization

Authors: Kavisha Vidanapathirana, Peyman Moghadam, Sridha Sridharan, Clinton Fookes

Abstract: In large-scale metric localization, an incorrect result during retrieval will lead to an incorrect pose estimate or loop closure. Re-ranking methods propose to take into account all the top retrieval candidates and re-order them to increase the likelihood of the top candidate being correct. However, state-of-the-art re-ranking methods are inefficient when re-ranking many potential candidates due to their need for resource intensive point cloud registration between the query and each candidate. In this work, we propose an efficient spectral method for geometric verification (named SpectralGV) that does not require registration. We demonstrate how the optimal inter-cluster score of the correspondence compatibility graph of two point clouds represents a robust fitness score measuring their spatial consistency. This score takes into account the subtle geometric differences between structurally similar point clouds and therefore can be used to identify the correct candidate among potential matches retrieved by global similarity search. SpectralGV is deterministic, robust to outlier correspondences, and can be computed in parallel for all potential candidates. We conduct extensive experiments on 5 large-scale datasets to demonstrate that SpectralGV outperforms other state-of-the-art re-ranking methods and show that it consistently improves the recall and pose estimation of 3 state-of-the-art metric localization architectures while having a negligible effect on their runtime. The open-source implementation and trained models are available at: https://github.com/csiro-robotics/SpectralGV.

Submitted 6 March, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

Comments: Accepted for publication in IEEE RA-L (2023)

arXiv:2207.10898 [pdf, other]

Impact of RoCE Congestion Control Policies on Distributed Training of DNNs

Authors: Tarannum Khan, Saeed Rashidi, Srinivas Sridharan, Pallavi Shurpali, Aditya Akella, Tushar Krishna

Abstract: RDMA over Converged Ethernet (RoCE) has gained significant traction for datacenter networks due to its compatibility with conventional Ethernet-based fabric. However, the RDMA protocol is efficient only on (nearly) lossless networks, emphasizing the vital role of congestion control on RoCE networks. Unfortunately, the native RoCE congestion control scheme, based on Priority Flow Control (PFC), suffers from many drawbacks such as unfairness, head-of-line-blocking, and deadlock. Therefore, in recent years many schemes have been proposed to provide additional congestion control for RoCE networks to minimize PFC drawbacks. However, these schemes are proposed for general datacenter environments. In contrast to the general datacenters that are built using commodity hardware and run general-purpose workloads, high-performance distributed training platforms deploy high-end accelerators and network components and exclusively run training workloads using collective communication libraries (All-Reduce, All-To-All). Furthermore, these platforms usually have a private network, separating their communication traffic from the rest of the datacenter traffic. Scalable topology-aware collective algorithms are inherently designed to avoid incast patterns and balance traffic optimally. These distinct features necessitate revisiting previously proposed congestion control schemes for general-purpose datacenter environments. In this paper, we thoroughly analyze some of the SOTA RoCE congestion control schemes vs. PFC when running on distributed training platforms. Our results indicate that previously proposed RoCE congestion control schemes have little impact on the end-to-end performance of training workloads, motivating the necessity of designing an optimized, yet low-overhead, congestion control scheme based on the characteristics of distributed training platforms and workloads.

Submitted 22 July, 2022; originally announced July 2022.

arXiv:2207.01769 [pdf, other]

SESS: Saliency Enhancing with Scaling and Sliding

Authors: Osman Tursun, Simon Denman, Sridha Sridharan, Clinton Fookes

Abstract: High-quality saliency maps are essential in several machine learning application areas including explainable AI and weakly supervised object detection and segmentation. Many techniques have been developed to generate better saliency using neural networks. However, they are often limited to specific saliency visualisation methods or saliency issues. We propose a novel saliency enhancing approach called SESS (Saliency Enhancing with Scaling and Sliding). It is a method- and model-agnostic extension to existing saliency map generation methods. With SESS, existing saliency approaches become robust to scale variance, multiple occurrences of target objects, and the presence of distractors, and generate less noisy and more discriminative saliency maps. SESS improves saliency by fusing saliency maps extracted from multiple patches at different scales from different areas, and combines these individual maps using a novel fusion scheme that incorporates channel-wise weights and spatial weighted average. To improve efficiency, we introduce a pre-filtering step that can exclude uninformative saliency maps while still enhancing overall results. We evaluate SESS on object recognition and detection benchmarks where it achieves significant improvement. The code is released publicly to enable researchers to verify performance and further development. Code is available at: https://github.com/neouyghur/SESS

Submitted 4 July, 2022; originally announced July 2022.

Comments: This paper will be presented at ECCV2022

arXiv:2204.01952 [pdf, other]

doi 10.1109/TGRS.2023.3268606

Towards On-Board Panoptic Segmentation of Multispectral Satellite Images

Authors: Tharindu Fernando, Clinton Fookes, Harshala Gammulle, Simon Denman, Sridha Sridharan

Abstract: With tremendous advancements in low-power embedded computing devices and remote sensing instruments, the traditional satellite image processing pipeline which includes an expensive data transfer step prior to processing data on the ground is being replaced by on-board processing of captured data. This paradigm shift enables critical and time-sensitive analytic intelligence to be acquired in a timely manner on-board the satellite itself. However, at present, the on-board processing of multi-spectral satellite images is limited to classification and segmentation tasks. Extending this processing to its next logical level, in this paper we propose a lightweight pipeline for on-board panoptic segmentation of multi-spectral satellite images. Panoptic segmentation offers major economic and environmental insights, ranging from yield estimation from agricultural lands to intelligence for complex military applications. Nevertheless, the on-board intelligence extraction raises several challenges due to the loss of temporal observations and the need to generate predictions from a single image sample. To address this challenge, we propose a multimodal teacher network based on a cross-modality attention-based fusion strategy to improve the segmentation accuracy by exploiting data from multiple modes. We also propose an online knowledge distillation framework to transfer the knowledge learned by this multi-modal teacher network to a uni-modal student which receives only a single frame input, and is more appropriate for an on-board environment. We benchmark our approach against existing state-of-the-art panoptic segmentation models using the PASTIS multi-spectral panoptic segmentation dataset considering an on-board processing setting. Our evaluations demonstrate a substantial increase in accuracy metrics compared to the existing state-of-the-art models.

Submitted 4 April, 2022; originally announced April 2022.

arXiv:2203.00807 [pdf, other]

InCloud: Incremental Learning for Point Cloud Place Recognition

Authors: Joshua Knights, Peyman Moghadam, Milad Ramezani, Sridha Sridharan, Clinton Fookes

Abstract: Place recognition is a fundamental component of robotics, and has seen tremendous improvements through the use of deep learning models in recent years. Networks can experience significant drops in performance when deployed in unseen or highly dynamic environments, and require additional training on the collected data. However, naively fine-tuning on new training distributions can cause severe degradation of performance on previously visited domains, a phenomenon known as catastrophic forgetting. In this paper we address the problem of incremental learning for point cloud place recognition and introduce InCloud, a structure-aware distillation-based approach which preserves the higher-order structure of the network's embedding space. We introduce several challenging new benchmarks on four popular and large-scale LiDAR datasets (Oxford, MulRan, In-house and KITTI) showing broad improvements in point cloud place recognition performance over a variety of network architectures. To the best of our knowledge, this work is the first to effectively apply incremental learning for point cloud place recognition. Data pre-processing, training and evaluation code for this paper can be found at https://github.com/csiro-robotics/InCloud.

Submitted 29 November, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

Comments: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022)

arXiv:2201.03080 [pdf, other]

The State of Aerial Surveillance: A Survey

Authors: Kien Nguyen, Clinton Fookes, Sridha Sridharan, Yingli Tian, Feng Liu, Xiaoming Liu, Arun Ross

Abstract: The rapid emergence of airborne platforms and imaging sensors is enabling new forms of aerial surveillance due to their unprecedented advantages in scale, mobility, deployment and covert observation capabilities. This paper provides a comprehensive overview of human-centric aerial surveillance tasks from a computer vision and pattern recognition perspective. It aims to provide readers with an in-depth systematic review and technical analysis of the current state of aerial surveillance tasks using drones, UAVs and other airborne platforms. The main object of interest is humans, where single or multiple subjects are to be detected, identified, tracked, re-identified and have their behavior analyzed. More specifically, for each of these four tasks, we first discuss unique challenges in performing these tasks in an aerial setting compared to a ground-based setting. We then review and analyze the aerial datasets publicly available for each task, and delve deep into the approaches in the aerial literature and investigate how they presently address the aerial challenges. We conclude the paper with discussion on the missing gaps and open research questions to inform future research avenues.

Submitted 12 January, 2022; v1 submitted 9 January, 2022; originally announced January 2022.

arXiv:2112.00289 [pdf, other]

Point Cloud Segmentation Using Sparse Temporal Local Attention

Authors: Joshua Knights, Peyman Moghadam, Clinton Fookes, Sridha Sridharan

Abstract: Point clouds are a key modality used for perception in autonomous vehicles, providing the means for a robust geometric understanding of the surrounding environment. However, despite the sensor outputs from autonomous vehicles being naturally temporal in nature, there is still limited exploration of exploiting point cloud sequences for 3D semantic segmentation. In this paper we propose a novel Sparse Temporal Local Attention (STELA) module which aggregates intermediate features from a local neighbourhood in previous point cloud frames to provide a rich temporal context to the decoder. Using the sparse local neighbourhood enables our approach to gather features more flexibly than those which directly match point features, and more efficiently than those which perform expensive global attention over the whole point cloud frame. We achieve a competitive mIoU of 64.3% on the SemanticKitti dataset, and demonstrate significant improvement over the single-frame baseline in our ablation studies.

Submitted 2 December, 2021; v1 submitted 1 December, 2021; originally announced December 2021.

Comments: 8 pages, 3 figures. Published at the Australasian Conference on Robotics and Automation (ACRA) 2021

arXiv:2110.04478 [pdf, other]

doi 10.1145/3470496.3527382

Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models

Authors: Saeed Rashidi, William Won, Sudarshan Srinivasan, Srinivas Sridharan, Tushar Krishna

Abstract: Distributed training is a solution to reduce DNN training time by splitting the task across multiple NPUs (e.g., GPU/TPU). However, distributed training adds communication overhead between the NPUs in order to synchronize the gradients and/or activation, depending on the parallelization strategy. In next-generation platforms for training at scale, NPUs will be connected through multi-dimensional networks with diverse, heterogeneous bandwidths. This work identifies a looming challenge of keeping all network dimensions busy and maximizing the network BW within the hybrid environment if we leverage scheduling techniques for collective communication on systems today. We propose Themis, a novel collective scheduling scheme that dynamically schedules collectives (divided into chunks) to balance the communication loads across all dimensions, further improving the network BW utilization. Our results show that on average, Themis can improve the network BW utilization of the single All-Reduce by 1.72X (2.70X max), and improve the end-to-end training iteration performance of real workloads such as ResNet-152, GNMT, DLRM, and Transformer-1T by 1.49X (2.25X max), 1.30X (1.78X max), 1.30X (1.77X max), and 1.25X (1.53X max), respectively.

Submitted 7 July, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

arXiv:2109.13495 [pdf, ps, other]

Dynamics of Products of Matrices in Max Algebra

Authors: S. Jayaraman, Y. K. Prajapaty, S. Sridharan

Abstract: The aim of this manuscript is to understand the dynamics of matrix products in a max algebra. A consequence of the Perron-Frobenius theorem on periodic points of a nonnegative matrix is generalized to a max algebra setting. The same is then studied for a finite product associated to a $p$-lettered word on $N$ letters arising from a finite collection of nonnegative matrices, with each member having its maximum circuit geometric mean at most 1.

Submitted 28 September, 2021; originally announced September 2021.

MSC Class: 15A80; 15B34; 37H12

arXiv:2109.08336 [pdf, other]

doi 10.1109/ICRA46639.2022.9811753

LoGG3D-Net: Locally Guided Global Descriptor Learning for 3D Place Recognition

Authors: Kavisha Vidanapathirana, Milad Ramezani, Peyman Moghadam, Sridha Sridharan, Clinton Fookes

Abstract: Retrieval-based place recognition is an efficient and effective solution for re-localization within a pre-built map, or global data association for Simultaneous Localization and Mapping (SLAM). The accuracy of such an approach is heavily dependent on the quality of the extracted scene-level representation. While end-to-end solutions - which learn a global descriptor from input point clouds - have demonstrated promising results, such approaches are limited in their ability to enforce desirable properties at the local feature level. In this paper, we introduce a local consistency loss to guide the network towards learning local features which are consistent across revisits, hence leading to more repeatable global descriptors resulting in an overall improvement in 3D place recognition performance. We formulate our approach in an end-to-end trainable architecture called LoGG3D-Net. Experiments on two large-scale public benchmarks (KITTI and MulRan) show that our method achieves mean $F1_{max}$ scores of $0.939$ and $0.968$ on KITTI and MulRan respectively, achieving state-of-the-art performance while operating in near real-time. The open-source implementation is available at: https://github.com/csiro-robotics/LoGG3D-Net.

Submitted 16 February, 2022; v1 submitted 16 September, 2021; originally announced September 2021.

Comments: Accepted - ICRA 2022
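
Note on the $F1_{max}$ metric quoted in the abstract above: it is commonly read as the largest F1 score obtained while sweeping the decision threshold on descriptor distances. The sketch below illustrates that reading only. The function name `f1_max`, the toy distances and the revisit labels are hypothetical, and the paper's exact evaluation protocol may differ.

```python
import numpy as np

def f1_max(distances, is_revisit):
    """Largest F1 over all distance thresholds (hypothetical helper).

    distances  : distance from each query to its best match
    is_revisit : True where the query really is a revisit
    """
    distances = np.asarray(distances, dtype=float)
    is_revisit = np.asarray(is_revisit, dtype=bool)
    best = 0.0
    # Each observed distance is tried as a candidate threshold.
    for threshold in np.unique(distances):
        predicted = distances <= threshold
        tp = np.sum(predicted & is_revisit)
        fp = np.sum(predicted & ~is_revisit)
        fn = np.sum(~predicted & is_revisit)
        denom = 2 * tp + fp + fn
        if denom > 0:
            best = max(best, 2 * tp / denom)  # F1 = 2TP / (2TP + FP + FN)
    return float(best)

# Toy data (assumed): the two revisits have the smallest distances.
score = f1_max([0.1, 0.2, 0.8, 0.9], [True, True, False, False])
```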

arXiv:2108.08995 [pdf, other]

Discriminative Domain-Invariant Adversarial Network for Deep Domain Generalization

Authors: Mohammad Mahfujur Rahman, Clinton Fookes, Sridha Sridharan

Abstract: Domain generalization approaches aim to learn a domain invariant prediction model for unknown target domains from multiple training source domains with different distributions. Significant efforts have recently been committed to broad domain generalization, which is a challenging and topical problem in machine learning and computer vision communities. Most previous domain generalization approaches assume that the conditional distribution remains the same across the source domains and learn a domain invariant model by minimizing the marginal distributions. However, the assumption of a stable conditional distribution of the training source domains does not really hold in practice. The hyperplane learned from the source domains will easily misclassify samples scattered at the boundary of clusters or far from their corresponding class centres. To address the above two drawbacks, we propose a discriminative domain-invariant adversarial network (DDIAN) for domain generalization. The discriminativeness of the features is guaranteed through a discriminative feature module and domain-invariant features are guaranteed through the global domain and local sub-domain alignment modules. Extensive experiments on several benchmarks show that DDIAN achieves better prediction on unseen target data during training compared to state-of-the-art domain generalization approaches.

Submitted 20 August, 2021; originally announced August 2021.

Comments: This manuscript is submitted to Computer Vision and Image Understanding (CVIU)

arXiv:2108.03786 [pdf, other]

Multi-Slice Net: A novel light weight framework for COVID-19 Diagnosis

Authors: Harshala Gammulle, Tharindu Fernando, Sridha Sridharan, Simon Denman, Clinton Fookes

Abstract: This paper presents a novel lightweight COVID-19 diagnosis framework using CT scans. Our system utilises a novel two-stage approach to generate robust and efficient diagnoses across heterogeneous patient level inputs. We use a powerful backbone network as a feature extractor to capture discriminative slice-level features. These features are aggregated by a lightweight network to obtain a patient level diagnosis. The aggregation network is carefully designed to have a small number of trainable parameters while also possessing sufficient capacity to generalise to diverse variations within different CT volumes and to adapt to noise introduced during the data acquisition. We achieve a significant performance increase over the baselines when benchmarked on the SPGC COVID-19 Radiomics Dataset, despite having only 2.5 million trainable parameters and requiring only 0.623 seconds on average to process a single patient's CT volume using an Nvidia-GeForce RTX 2080 GPU.

Submitted 8 August, 2021; originally announced August 2021.

Comments: IEEE International Conference on Autonomous Systems 2021

arXiv:2106.15835 [pdf, other]

Robust and Interpretable Temporal Convolution Network for Event Detection in Lung Sound Recordings

Authors: Tharindu Fernando, Sridha Sridharan, Simon Denman, Houman Ghaemmaghami, Clinton Fookes

Abstract: This paper proposes a novel framework for lung sound event detection, segmenting continuous lung sound recordings into discrete events and performing recognition on each event. Exploiting the lightweight nature of Temporal Convolution Networks (TCNs) and their superior results compared to their recurrent counterparts, we propose a lightweight, yet robust, and completely interpretable framework for lung sound event detection. We propose the use of a multi-branch TCN architecture and exploit a novel fusion strategy to combine the resultant features from these branches. This not only allows the network to retain the most salient information across different temporal granularities and disregard irrelevant information, but also allows our network to process recordings of arbitrary length. Results: The proposed method is evaluated on multiple public and in-house benchmarks of irregular and noisy recordings of the respiratory auscultation process for the identification of numerous auscultation events including inhalation, exhalation, crackles, wheeze, stridor, and rhonchi. We exceed the state-of-the-art results in all evaluations. Furthermore, we empirically analyse the effect of the proposed multi-branch TCN architecture and the feature fusion strategy and provide quantitative and qualitative evaluations to illustrate their efficiency. Moreover, we provide an end-to-end model interpretation pipeline that interprets the operations of all the components of the proposed framework.
Our analysis of different feature fusion strategies shows that the proposed feature concatenation method leads to better suppression of non-informative features, which drastically reduces the classifier overhead resulting in a robust lightweight network. The lightweight nature of our model allows it to be deployed in end-user devices such as smartphones, and it has the ability to generate predictions in real-time.

Submitted 30 June, 2021; originally announced June 2021.

Comments: preprint submitted to JBHI

arXiv:2106.05599 [pdf, other]

doi 10.1016/j.ijleo.2021.168280

Gated InGaAs Detector Characterization with Sub-Picosecond Weak Coherent Pulses

Authors: Gautam Kumar Shaw, Shyam Sridharan, Anil Prabhakar

Abstract: We propose and demonstrate a method to characterize a gated InGaAs single-photon detector (SPD). Ultrashort weak coherent pulses, from a mode-locked sub-picosecond pulsed laser, were used to measure photon counts, at varying arrival times relative to the start of the SPD gate voltage. The uneven detection probabilities within the gate window were used to estimate the afterpulse probability with respect to various detector parameters: excess bias, width of gate window and hold-off time. We estimated a lifetime of 2.1 microseconds for the half-life of trapped carriers, using a power-law fit to the decay in afterpulse probability. Finally, we quantify the timing jitter of the SPD using a time to digital converter with a resolution of 55 ps.

Submitted 11 July, 2022; v1 submitted 10 June, 2021; originally announced June 2021.

Comments: 15 pages, 10 figures

arXiv:2104.13780 [pdf, other]

Semantic Consistency and Identity Mapping Multi-Component Generative Adversarial Network for Person Re-Identification

Authors: Amena Khatun, Simon Denman, Sridha Sridharan, Clinton Fookes

Abstract: In a real world environment, person re-identification (Re-ID) is a challenging task due to variations in lighting conditions, viewing angles, pose and occlusions. Despite recent performance gains, current person Re-ID algorithms still suffer heavily when encountering these variations. To address this problem, we propose a semantic consistency and identity mapping multi-component generative adversarial network (SC-IMGAN) which provides style adaptation from one to many domains. To ensure that transformed images are as realistic as possible, we propose novel identity mapping and semantic consistency losses to maintain identity across the diverse domains. For the Re-ID task, we propose a joint verification-identification quartet network which is trained with generated and real images, followed by an effective quartet loss for verification. Our proposed method outperforms state-of-the-art techniques on six challenging person Re-ID datasets: CUHK01, CUHK03, VIPeR, PRID2011, iLIDS and Market-1501.

Submitted 28 April, 2021; originally announced April 2021.

Comments: Accepted in WACV 2020

Journal ref: WACV, 2020

arXiv:2104.13773 [pdf, other]

Pose-driven Attention-guided Image Generation for Person Re-Identification

Authors: Amena Khatun, Simon Denman, Sridha Sridharan, Clinton Fookes

Abstract: Person re-identification (re-ID) concerns the matching of subject images across different camera views in a multi camera surveillance system. One of the major challenges in person re-ID is pose variations across the camera network, which significantly affects the appearance of a person. Existing development data lack adequate pose variations to carry out effective training of person re-ID systems. To solve this issue, in this paper we propose an end-to-end pose-driven attention-guided generative adversarial network, to generate multiple poses of a person. We propose to attentively learn and transfer the subject pose through an attention mechanism. A semantic-consistency loss is proposed to preserve the semantic information of the person during pose transfer. To ensure fine image details are realistic after pose translation, an appearance discriminator is used while a pose discriminator is used to ensure the pose of the transferred images will exactly be the same as the target pose. We show that by incorporating the proposed approach in a person re-identification framework, realistic pose transferred images and state-of-the-art re-identification results can be achieved.

Submitted 28 April, 2021; originally announced April 2021.

Comments: Submitted to Pattern Recognition

arXiv:2104.13725 [pdf, other]

Preserving Semantic Consistency in Unsupervised Domain Adaptation Using Generative Adversarial Networks

Authors: Mohammad Mahfujur Rahman, Clinton Fookes, Sridha Sridharan

Abstract: Unsupervised domain adaptation seeks to mitigate the distribution discrepancy between source and target domains, given labeled samples of the source domain and unlabeled samples of the target domain. Generative adversarial networks (GANs) have demonstrated significant improvement in domain adaptation by producing images which are domain specific for training. However, most of the existing GAN based techniques for unsupervised domain adaptation do not consider semantic information during domain matching, hence these methods degrade the performance when the source and target domain data are semantically different. In this paper, we propose an end-to-end novel semantic consistent generative adversarial network (SCGAN). This network can achieve source to target domain matching by capturing semantic information at the feature level and producing images for unsupervised domain adaptation from both the source and the target domains. We demonstrate the robustness of our proposed method which exceeds the state-of-the-art performance in unsupervised domain adaptation settings by performing experiments on digit and object classification tasks.

Submitted 28 April, 2021; originally announced April 2021.

Comments: Submitted to Pattern Recognition Letters

arXiv:2104.13581 [pdf, other]

Deep Domain Generalization with Feature-norm Network

Authors: Mohammad Mahfujur Rahman, Clinton Fookes, Sridha Sridharan

Abstract: In this paper, we tackle the problem of training with multiple source domains with the aim to generalize to new domains at test time without an adaptation step. This is known as domain generalization (DG). Previous works on DG assume identical categories or label space across the source domains. In the case of category shift among the source domains, previous methods on DG are vulnerable to negative transfer due to the large mismatch among label spaces, decreasing the target classification accuracy. To tackle the aforementioned problem, we introduce an end-to-end feature-norm network (FNN) which is robust to negative transfer as it does not need to match the feature distribution among the source domains. We also introduce a collaborative feature-norm network (CFNN) to further improve the generalization capability of FNN. The CFNN matches the predictions of the next most likely categories for each training sample which increases each network's posterior entropy. We apply the proposed FNN and CFNN networks to the problem of DG for image classification tasks and demonstrate significant improvement over the state-of-the-art.

Submitted 28 April, 2021; originally announced April 2021.

Comments: Submitted to Pattern Recognition

arXiv:2104.07240 [pdf, other]

doi 10.1109/ICIP42928.2021.9506223

Learning Regional Attention over Multi-resolution Deep Convolutional Features for Trademark Retrieval

Authors: Osman Tursun, Simon Denman, Sridha Sridharan, Clinton Fookes

Abstract: Large-scale trademark retrieval is an important content-based image retrieval task. A recent study shows that off-the-shelf deep features aggregated with Regional-Maximum Activation of Convolutions (R-MAC) achieve state-of-the-art results. However, R-MAC suffers in the presence of background clutter/trivial regions and scale variance, and discards important spatial information. We introduce three simple but effective modifications to R-MAC to overcome these drawbacks. First, we propose the use of both sum and max pooling to minimise the loss of spatial information. We also employ domain-specific unsupervised soft-attention to eliminate background clutter and unimportant regions. Finally, we add multi-resolution inputs to enhance the scale-invariance of R-MAC. We evaluate these three modifications on the million-scale METU dataset. Our results show that all modifications bring non-trivial improvements, and surpass previous state-of-the-art results.

Submitted 15 April, 2021; originally announced April 2021.

arXiv:2104.05158 [pdf, other]

Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models

Authors: Dheevatsa Mudigere, Yuchen Hao, Jianyu Huang, Zhihao Jia, Andrew Tulloch, Srinivas Sridharan, Xing Liu, Mustafa Ozdal, Jade Nie, Jongsoo Park, Liang Luo, Jie Amy Yang, Leon Gao, Dmytro Ivchenko, Aarti Basant, Yuxi Hu, Jiyan Yang, Ehsan K. Ardestani, Xiaodong Wang, Rakesh Komuravelli, Ching-Hsiang Chu, Serhat Yilmaz, Huayu Li, Jiyuan Qian, Zhuobo Feng, et al. (28 additional authors not shown)

Abstract: Deep learning recommendation models (DLRMs) are used across many business-critical services at Facebook and are the single largest AI application in terms of infrastructure demand in its data-centers. In this paper we discuss the SW/HW co-designed solution for high-performance distributed training of large-scale DLRMs. We introduce a high-performance scalable software stack based on PyTorch and pair it with the new evolution of Zion platform, namely ZionEX. We demonstrate the capability to train very large DLRMs with up to 12 Trillion parameters and show that we can attain 40X speedup in terms of time to solution over previous systems. We achieve this by (i) designing the ZionEX platform with dedicated scale-out network, provisioned with high bandwidth, optimal topology and efficient transport (ii) implementing an optimized PyTorch-based training stack supporting both model and data parallelism (iii) developing sharding algorithms capable of hierarchical partitioning of the embedding tables along row, column dimensions and load balancing them across multiple workers; (iv) adding high-performance core operators while retaining flexibility to support optimizers with fully deterministic updates (v) leveraging reduced precision communications, multi-level memory hierarchy (HBM+DDR+SSD) and pipelining. Furthermore, we develop and briefly comment on distributed data ingestion and other supporting services that are required for the robust and efficient end-to-end training in production environments.

Submitted 26 February, 2023; v1 submitted 11 April, 2021; originally announced April 2021.

arXiv:2102.04016 [pdf, other]

An Efficient Framework for Zero-Shot Sketch-Based Image Retrieval

Authors: Osman Tursun, Simon Denman, Sridha Sridharan, Ethan Goan, Clinton Fookes

Abstract: Recently, Zero-shot Sketch-based Image Retrieval (ZS-SBIR) has attracted the attention of the computer vision community due to its real-world applications, and the more realistic and challenging setting than found in SBIR. ZS-SBIR inherits the main challenges of multiple computer vision problems including content-based Image Retrieval (CBIR), zero-shot learning and domain adaptation. The majority of previous studies using deep neural networks have achieved improved results through either projecting sketches and images into a common low-dimensional space or transferring knowledge from seen to unseen classes. However, those approaches are trained with complex frameworks composed of multiple deep convolutional neural networks (CNNs) and are dependent on category-level word labels. This increases the requirements on training resources and datasets. In comparison, we propose a simple and efficient framework that does not require high computational training resources, and can be trained on datasets without semantic categorical labels. Furthermore, at training and inference stages our method only uses a single CNN. In this work, a pre-trained ImageNet CNN (e.g., ResNet50) is fine-tuned with three proposed learning objectives: domain-aware quadruplet loss, semantic classification loss, and semantic knowledge preservation loss. The domain-aware quadruplet and semantic classification losses are introduced to learn discriminative, semantic and domain invariant features through considering ZS-SBIR as an object detection and verification problem. ...

Submitted 8 February, 2021; originally announced February 2021.

arXiv:2101.11239 [pdf, other]

Im2Mesh GAN: Accurate 3D Hand Mesh Recovery from a Single RGB Image

Authors: Akila Pemasiri, Kien Nguyen Thanh, Sridha Sridharan, Clinton Fookes

Abstract: This work addresses hand mesh recovery from a single RGB image. In contrast to most of the existing approaches where the parametric hand models are employed as the prior, we show that the hand mesh can be learned directly from the input image. We propose a new type of GAN called Im2Mesh GAN to learn the mesh through end-to-end adversarial training. By interpreting the mesh as a graph, our model is able to capture the topological relationship among the mesh vertices. We also introduce a 3D surface descriptor into the GAN architecture to further capture the associated 3D features. We experiment with two approaches, where one can reap the benefits of coupled groundtruth data availability of images and the corresponding meshes, while the other combats the more challenging problem of mesh estimation without the corresponding groundtruth. Through extensive evaluations we demonstrate that the proposed method outperforms the state-of-the-art.

Submitted 27 January, 2021; originally announced January 2021.

arXiv:2012.02364 [pdf, other]

Deep Learning for Medical Anomaly Detection -- A Survey

Authors: Tharindu Fernando, Harshala Gammulle, Simon Denman, Sridha Sridharan, Clinton Fookes

Abstract: Machine learning-based medical anomaly detection is an important problem that has been extensively studied. Numerous approaches have been proposed across various medical application domains and we observe several similarities across these distinct applications. Despite this comparability, we observe a lack of structured organisation of these diverse research applications such that their advantages and limitations can be studied. The principal aim of this survey is to provide a thorough theoretical analysis of popular deep learning techniques in medical anomaly detection. In particular, we contribute a coherent and systematic review of state-of-the-art techniques, comparing and contrasting their architectural differences as well as training algorithms. Furthermore, we provide a comprehensive overview of deep model interpretation strategies that can be used to interpret model decisions. In addition, we outline the key limitations of existing deep medical anomaly detection techniques and propose key research directions for further investigation.

Submitted 13 April, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

Comments: Preprint submitted to ACM Computing Surveys

arXiv:2011.14497 [pdf, other]

doi 10.1109/ICRA48506.2021.9560915

Locus: LiDAR-based Place Recognition using Spatiotemporal Higher-Order Pooling

Authors: Kavisha Vidanapathirana, Peyman Moghadam, Ben Harwood, Muming Zhao, Sridha Sridharan, Clinton Fookes

Abstract: Place Recognition enables the estimation of a globally consistent map and trajectory by providing non-local constraints in Simultaneous Localisation and Mapping (SLAM). This paper presents Locus, a novel place recognition method using 3D LiDAR point clouds in large-scale environments. We propose a method for extracting and encoding topological and temporal information related to components in a scene and demonstrate how the inclusion of this auxiliary information in place description leads to more robust and discriminative scene representations. Second-order pooling along with a non-linear transform is used to aggregate these multi-level features to generate a fixed-length global descriptor, which is invariant to the permutation of input features. The proposed method outperforms state-of-the-art methods on the KITTI dataset. Furthermore, Locus is demonstrated to be robust across several challenging situations such as occlusions and viewpoint changes in 3D LiDAR point clouds. The open-source implementation is available at: https://github.com/csiro-robotics/locus .

Submitted 7 April, 2021; v1 submitted 29 November, 2020; originally announced November 2020.

Comments: ICRA 2021. Implementation available at: https://github.com/csiro-robotics/locus
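
Note on the pooling step described in the abstract above: second-order pooling turns a variable-size set of feature vectors into a fixed-length descriptor that does not depend on the order of the inputs. The sketch below shows one common form, under stated assumptions: averaging outer products and applying an element-wise signed square root are choices made here for illustration, and the function name `second_order_pool` and the random features are hypothetical. It is not taken from the Locus implementation.

```python
import numpy as np

def second_order_pool(features):
    """Pool an (N, d) feature set into a d*d descriptor (illustrative only)."""
    features = np.asarray(features, dtype=float)
    # Average of outer products f f^T over all N features: a (d, d) matrix.
    pooled = features.T @ features / features.shape[0]
    # Assumed non-linear transform: element-wise signed square root.
    pooled = np.sign(pooled) * np.sqrt(np.abs(pooled))
    return pooled.reshape(-1)

rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 4))            # 50 hypothetical 4-D features
desc = second_order_pool(feats)
shuffled = second_order_pool(feats[rng.permutation(50)])
fewer = second_order_pool(feats[:7])        # a different number of inputs
```

Here `desc` and `shuffled` agree up to floating-point rounding, and `fewer` has the same length as `desc`, which is the fixed-length, permutation-invariant behaviour the abstract refers to.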

arXiv:2011.12288 [pdf]

doi 10.14445/22312803/IJCTT-V68I11P105

Machine Learning (ML) In a 5G Standalone (SA) Self Organizing Network (SON)

Authors: Srinivasan Sridharan

Abstract: Machine learning (ML) is included in Self-organizing Networks (SONs) that are key drivers for enhancing the Operations, Administration, and Maintenance (OAM) activities. The 5G Standalone (SA) system is one of the 5G communication tracks that transforms 4G networking to next-generation technology that is based on mobile applications. The research's main aim is to provide an overview of machine learning (ML) in 5G standalone core networks. 5G Standalone is considered a key enabler by the service providers as it improves the efficacy of the throughput that edges the network. It also assists in advancing new cellular use cases like ultra-reliable low latency communications (URLLC) that supports combinations of frequencies.

Submitted 24 November, 2020; originally announced November 2020.

Comments: 5G, Machine learning (ML), Self-organizing Networks (SONs), 5G Standalone, Artificial Intelligence (AI)

arXiv:2011.11198 [pdf, other]

Complex-valued Iris Recognition Network

Authors: Kien Nguyen, Clinton Fookes, Sridha Sridharan, Arun Ross

Abstract: In this work, we design a fully complex-valued neural network for the task of iris recognition. Unlike the problem of general object recognition, where real-valued neural networks can be used to extract pertinent features, iris recognition depends on the extraction of both phase and magnitude information from the input iris texture in order to better represent its biometric content. This necessitates the extraction and processing of phase information that cannot be effectively handled by a real-valued neural network. In this regard, we design a fully complex-valued neural network that can better capture the multi-scale, multi-resolution, and multi-orientation phase and amplitude features of the iris texture. We show a strong correspondence of the proposed complex-valued iris recognition network with Gabor wavelets that are used to generate the classical IrisCode; however, the proposed method enables a new capability of automatic complex-valued feature learning that is tailored for iris recognition. We conduct experiments on three benchmark datasets - ND-CrossSensor-2013, CASIA-Iris-Thousand and UBIRIS.v2 - and show the benefit of the proposed network for the task of iris recognition. We exploit visualization schemes to convey how the complex-valued network, when compared to standard real-valued networks, extracts fundamentally different features from the iris texture.

Submitted 15 February, 2022; v1 submitted 22 November, 2020; originally announced November 2020.

Comments: This paper has been accepted for publication in T-PAMI

arXiv:2011.09581 [pdf, other]

Patient-independent Epileptic Seizure Prediction using Deep Learning Models

Authors: Theekshana Dissanayake, Tharindu Fernando, Simon Denman, Sridha Sridharan, Clinton Fookes

Abstract: Objective: Epilepsy is one of the most prevalent neurological diseases among humans and can lead to severe brain injuries, strokes, and brain tumors. Early detection of seizures can help to mitigate injuries, and can be used to aid the treatment of patients with epilepsy. The purpose of a seizure prediction system is to successfully identify the pre-ictal brain stage, which occurs before a seizure event. Patient-independent seizure prediction models are designed to offer accurate performance across multiple subjects within a dataset, and have been identified as a real-world solution to the seizure prediction problem. However, little attention has been given to designing such models to adapt to the high inter-subject variability in EEG data. Methods: We propose two patient-independent deep learning architectures with different learning strategies that can learn a global function utilizing data from multiple subjects. Results: Proposed models achieve state-of-the-art performance for seizure prediction on the CHB-MIT-EEG dataset, demonstrating 88.81% and 91.54% accuracy respectively. Conclusions: The Siamese model trained on the proposed learning strategy is able to learn patterns related to patient variations in data while predicting seizures. Significance: Our models show superior performance for patient-independent seizure prediction, and the same architecture can be used as a patient-specific classifier after model adaptation.
We are the first study that employs model interpretation to understand classifier behavior for the task of seizure prediction, and we also show that the MFCC feature map utilized by our models contains predictive biomarkers related to interictal and pre-ictal brain states.

Submitted 18 November, 2020; originally announced November 2020.

arXiv:2011.06207 [pdf, other]

Domain Generalization in Biosignal Classification

Authors: Theekshana Dissanayake, Tharindu Fernando, Simon Denman, Houman Ghaemmaghami, Sridha Sridharan, Clinton Fookes

Abstract: Objective: When training machine learning models, we often assume that the training data and evaluation data are sampled from the same distribution. However, this assumption is violated when the model is evaluated on another unseen but similar database, even if that database contains the same classes. This problem is caused by domain shift and can be solved using two approaches: domain adaptation and domain generalization. Simply put, domain adaptation methods can access data from unseen domains during training, whereas in domain generalization the unseen data is not available during training. Hence, domain generalization concerns models that perform well on inaccessible, domain-shifted data. Method: Our proposed domain generalization method represents an unseen domain using a set of known basis domains, after which we classify the unseen domain using classifier fusion. To demonstrate our system, we employ a collection of heart sound databases that contain normal and abnormal sounds (classes). Results: Our proposed classifier fusion method achieves accuracy gains of up to 16% for four completely unseen domains. Conclusion: Recognizing the complexity induced by the inherent temporal nature of biosignal data, the two-stage method proposed in this study is able to effectively simplify the whole process of domain generalization while demonstrating good results on unseen domains and the adopted basis domains. Significance: To the best of our knowledge, this is the first study that investigates domain generalization for biosignal data. Our proposed learning strategy can be used to effectively learn domain-relevant features while being aware of the class differences in the data.

Submitted 12 November, 2020; originally announced November 2020.

arXiv:2011.05438 [pdf, other]

Fast & Slow Learning: Incorporating Synthetic Gradients in Neural Memory Controllers

Authors: Tharindu Fernando, Simon Denman, Sridha Sridharan, Clinton Fookes

Abstract: Neural Memory Networks (NMNs) have received increased attention in recent years compared to deep architectures that use a constrained memory. Despite their new appeal, the success of NMNs hinges on the ability of the gradient-based optimiser to perform incremental training of the NMN controllers, determining how to leverage their high capacity for knowledge retrieval. This means that while excellent performance can be achieved when the training data is consistent and well distributed, rare data samples are hard to learn from as the controllers fail to incorporate them effectively during model training. Drawing inspiration from the human cognition process, in particular the utilisation of neuromodulators in the human brain, we propose to decouple the learning process of the NMN controllers to allow them to achieve flexible, rapid adaptation in the presence of new information. This trait is highly beneficial for meta-learning tasks where the memory controllers must quickly grasp abstract concepts in the target domain, and adapt stored knowledge. This allows the NMN controllers to quickly determine which memories are to be retained and which are to be erased, and swiftly adapt their strategy to the new task at hand. Through both quantitative and qualitative evaluations on multiple public benchmarks, including classification and regression tasks, we demonstrate the utility of the proposed approach. Our evaluations not only highlight the ability of the proposed NMN architecture to outperform the current state-of-the-art methods, but also provide insights on how the proposed augmentations help achieve such superior results. In addition, we demonstrate the practical implications of the proposed learning strategy, where the feedback path can be shared among multiple neural memory networks as a mechanism for knowledge sharing.

Submitted 10 November, 2020; originally announced November 2020.

Showing 1–50 of 374 results for author: Sridharan, S