Collective Decision-Making on Task Allocation Feasibility

Authors: Samratul Fuady, Danesh Tarapore, Shoaib Ehsan, Mohammad D. Soorati

Abstract: Robot swarms offer the potential to bring several advantages to real-world applications, but deploying them presents challenges in ensuring feasibility across diverse environments. Assessing the feasibility of new tasks for swarms is crucial to ensure the effective utilisation of resources, as well as to provide awareness of the suitability of a swarm solution for a particular task. In this paper, we introduce the concept of distributed feasibility, where the swarm collectively assesses the feasibility of task allocation based on local observations and interactions. We apply Direct Modulation of Majority-based Decisions as our collective decision-making strategy and show that, in a homogeneous setting, the swarm is able to collectively decide whether a given setup has a high or low feasibility as long as the robot-to-task ratio is not near one.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 3 Pages, 3 Figures, Accepted to ICRA 2024 Workshop "Breaking Swarm Stereotypes"

arXiv:2405.02297 [pdf, other]

Employing Universal Voting Schemes for Improved Visual Place Recognition Performance

Authors: Maria Waheed, Michael Milford, Xiaojun Zhai, Maria Fasli, Klaus McDonald-Maier, Shoaib Ehsan

Abstract: Visual Place Recognition has been the subject of many endeavours utilizing different ensemble approaches to improve VPR performance. Ideas like multi-process fusion, Fly-Inspired Voting Units, SwitchHit or Switch-Fuse involve combining different VPR techniques together, utilizing different strategies. However, a major aspect often common to many of these strategies is voting. Voting is an extremely relevant topic to explore in terms of its application and significance for any ensemble VPR setup. This paper analyses several voting schemes to maximise the place detection accuracy of a VPR ensemble setup and determine the optimal voting schemes for selection. We take inspiration from a variety of voting schemes that are widely employed in fields such as politics and sociology, and it is evident via empirical data that the selection of the voting method influences the results drastically. The paper tests a wide variety of voting schemes to present the improvement in the VPR results for several data sets. We aim to determine whether a single optimal voting scheme exists or, much like in other fields of research, the selection of a voting technique is relative to its application and environment. We propose a ranking of these different voting methods from best to worst which allows for better selection. We present our results in terms of each voting method's performance bounds in the form of radar charts, PR curves that showcase the differences in performance, and a comparison methodology using a McNemar test variant to determine the statistical significance of the differences. This test is performed to further confirm the reliability of the outcomes and to draw comparisons for a better and more informed selection of a voting technique.

Submitted 8 March, 2024; originally announced May 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2305.05705

arXiv:2401.08263 [pdf, other]

Multi-Technique Sequential Information Consistency For Dynamic Visual Place Recognition In Changing Environments

Authors: Bruno Arcanjo, Bruno Ferrarini, Michael Milford, Klaus D. McDonald-Maier, Shoaib Ehsan

Abstract: Visual place recognition (VPR) is an essential component of robot navigation and localization systems that allows them to identify a place using only image data. VPR is challenging due to the significant changes in a place's appearance driven by different daily illumination, seasonal weather variations and diverse viewpoints. Currently, no single VPR technique excels in every environmental condition, each exhibiting unique benefits and shortcomings, and therefore combining multiple techniques can achieve more reliable VPR performance. Present multi-method approaches either rely on online ground-truth information, which is often not available, or on brute-force technique combination, potentially lowering performance with high variance technique sets. Addressing these shortcomings, we propose a VPR system dubbed Multi-Sequential Information Consistency (MuSIC) which leverages sequential information to select the most cohesive technique on an online per-frame basis. For each technique in a set, MuSIC computes their respective sequential consistencies by analysing the frame-to-frame continuity of their top match candidates, which are then directly compared to select the optimal technique for the current query image. The use of sequential information to select between VPR methods results in an overall VPR performance increase across different benchmark datasets, while avoiding the need for extra ground-truth of the runtime environment.

Submitted 16 January, 2024; originally announced January 2024.

Comments: arXiv admin note: text overlap with arXiv:2303.14247

arXiv:2312.12995 [pdf, other]

Aggregating Multiple Bio-Inspired Image Region Classifiers For Effective And Lightweight Visual Place Recognition

Authors: Bruno Arcanjo, Bruno Ferrarini, Maria Fasli, Michael Milford, Klaus D. McDonald-Maier, Shoaib Ehsan

Abstract: Visual place recognition (VPR) enables autonomous systems to localize themselves within an environment using image information. While VPR techniques built upon a Convolutional Neural Network (CNN) backbone dominate state-of-the-art VPR performance, their high computational requirements make them unsuitable for platforms equipped with low-end hardware. Recently, a lightweight VPR system based on multiple bio-inspired classifiers, dubbed DrosoNets, has been proposed, achieving great computational efficiency at the cost of reduced absolute place retrieval performance. In this work, we propose a novel multi-DrosoNet localization system, dubbed RegionDrosoNet, with significantly improved VPR performance, while preserving a low-computational profile. Our approach relies on specializing distinct groups of DrosoNets on differently sliced partitions of the original image, increasing extrinsic model differentiation. Furthermore, we introduce a novel voting module to combine the outputs of all DrosoNets into the final place prediction which considers multiple top reference candidates from each DrosoNet. RegionDrosoNet outperforms other lightweight VPR techniques when dealing with both appearance changes and viewpoint variations. Moreover, it competes with computationally expensive methods on some benchmark datasets at a small fraction of their online inference time.

Submitted 20 December, 2023; originally announced December 2023.

arXiv:2312.09028 [pdf, other]

Design Space Exploration of Low-Bit Quantized Neural Networks for Visual Place Recognition

Authors: Oliver Grainge, Michael Milford, Indu Bodala, Sarvapali D. Ramchurn, Shoaib Ehsan

Abstract: Visual Place Recognition (VPR) is a critical task for performing global re-localization in visual perception systems. It requires the ability to accurately recognize a previously visited location under variations such as illumination, occlusion, appearance and viewpoint. In the case of robotic systems and augmented reality, the target devices for deployment are battery-powered edge devices. Therefore, whilst the accuracy of VPR methods is important, so too are memory consumption and latency. Recent works have focused on the recall@1 metric as a performance measure, with limited focus on resource utilization. This has resulted in methods that use deep learning models too large to deploy on low-powered edge devices. We hypothesize that these large models are highly over-parameterized and can be optimized to satisfy the constraints of a low-powered embedded system whilst maintaining high recall performance. Our work studies the impact of compact convolutional network architecture design in combination with full-precision and mixed-precision post-training quantization on VPR performance. Importantly, we not only measure performance via the recall@1 score but also measure memory consumption and latency. We characterize the design implications on memory, latency and recall scores and provide a number of design recommendations for VPR systems under these resource limitations.

Submitted 14 December, 2023; originally announced December 2023.

arXiv:2305.05776 [pdf, other]

Visual Place Recognition with Low-Resolution Images

Authors: Mihnea-Alexandru Tomita, Bruno Ferrarini, Michael Milford, Klaus McDonald-Maier, Shoaib Ehsan

Abstract: Images incorporate a wealth of information from a robot's surroundings. With the widespread availability of compact cameras, visual information has become increasingly popular for addressing the localisation problem, which is then termed Visual Place Recognition (VPR). While many applications use high-resolution cameras and high-end systems to achieve optimal place-matching performance, low-end commercial systems face limitations due to resource constraints and relatively low-resolution and low-quality cameras. In this paper, we analyse the effects of image resolution on the accuracy and robustness of well-established handcrafted VPR pipelines. Handcrafted designs have low computational demands and can adapt to flexible image resolutions, making them a suitable approach to scale to any image source and to operate under resource limitations. This paper aims to help academic researchers and companies in the hardware and software industry co-design VPR solutions and expand the use of VPR algorithms in commercial products.

Submitted 9 May, 2023; originally announced May 2023.

Comments: The paper has been accepted for presentation at the Active Methods in Autonomous Navigation Workshop, part of the 2023 International Conference on Robotics and Automation (ICRA)

arXiv:2305.05705 [pdf, other]

An Evaluation and Ranking of Different Voting Schemes for Improved Visual Place Recognition

Authors: Maria Waheed, Michael Milford, Xiaojun Zhai, Klaus McDonald-Maier, Shoaib Ehsan

Abstract: Visual Place Recognition has recently seen a surge of endeavours utilizing different ensemble approaches to improve VPR performance. Ideas like multi-process fusion or switching involve combining different VPR techniques together, utilizing different strategies. One major aspect often common to many of these strategies is voting. Voting is widely used in many ensemble methods, so it is potentially a relevant subject to explore in terms of its application and significance for improving VPR performance. This paper examines and analyses in detail a variety of voting schemes to evaluate which voting technique is optimal for an ensemble VPR setup. We take inspiration from a variety of voting schemes that exist and are widely employed in other research fields such as politics and sociology. The idea is inspired by the observation that different voting methods result in different outcomes for the same type of data and that each voting scheme is utilized for specific cases in different academic fields. Some of these voting schemes include Condorcet voting, Borda Count and Plurality voting. Voting employed in any context requires that a fair system be established that outputs the best and most favourable results, which in our case would involve improving VPR performance. We evaluate some of these voting techniques in a standardized testing of different VPR techniques, using a variety of VPR data sets. We aim to determine whether a single optimal voting scheme exists or, much like in other fields of research, the selection of a voting technique is relative to its application and environment. We also aim to propose a ranking of these different voting methods from best to worst according to our results, as this will allow for better selection of voting schemes.

Submitted 9 May, 2023; originally announced May 2023.
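
Illustrative note on the entry above: the abstract names Plurality voting and Borda Count and observes that different voting methods can give different outcomes on the same data. The following is a minimal sketch of that observation only, not the paper's implementation. It assumes each VPR technique returns a ranked list of candidate place IDs; the function names, place labels and toy rankings are hypothetical.

```python
from collections import Counter

def plurality(rankings):
    """Each technique casts one vote for its top-ranked candidate."""
    votes = Counter(ranking[0] for ranking in rankings)
    return votes.most_common(1)[0][0]

def borda(rankings):
    """Each technique gives n-1 points to its first choice, n-2 to its second, and so on."""
    scores = Counter()
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - 1 - position
    return max(scores, key=scores.get)

# Hypothetical rankings from seven techniques over three candidate places.
rankings = (
    [["A", "B", "C"]] * 3    # three techniques prefer A, then B, then C
    + [["B", "C", "A"]] * 2  # two prefer B, then C, then A
    + [["C", "B", "A"]] * 2  # two prefer C, then B, then A
)

print(plurality(rankings))  # A has the most first-place votes (3 of 7)
print(borda(rankings))      # B has the highest Borda score (9 vs 6 and 6)
```

In this toy case the two schemes disagree: A wins on first-place votes alone, while B wins once lower-ranked preferences are counted. Tie handling is left unspecified in this sketch.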

arXiv:2305.05256 [pdf, other]

Patch-DrosoNet: Classifying Image Partitions With Fly-Inspired Models For Lightweight Visual Place Recognition

Authors: Bruno Arcanjo, Bruno Ferrarini, Michael Milford, Klaus D. McDonald-Maier, Shoaib Ehsan

Abstract: Visual place recognition (VPR) enables autonomous systems to localize themselves within an environment using image information. While Convolutional Neural Networks (CNNs) currently dominate state-of-the-art VPR performance, their high computational requirements make them unsuitable for platforms with budget or size constraints. This has spurred the development of lightweight algorithms, such as DrosoNet, which employs a voting system based on multiple bio-inspired units. In this paper, we present a novel training approach for DrosoNet, wherein separate models are trained on distinct regions of a reference image, allowing them to specialize in the visual features of that specific section. Additionally, we introduce a convolutional-like prediction method, in which each DrosoNet unit generates a set of place predictions for each portion of the query image. These predictions are then combined using the previously introduced voting system. Our approach significantly improves upon the VPR performance of previous work while maintaining an extremely compact and lightweight algorithm, making it suitable for resource-constrained platforms.

Submitted 9 May, 2023; originally announced May 2023.

arXiv:2303.14247 [pdf, other]

A-MuSIC: An Adaptive Ensemble System For Visual Place Recognition In Changing Environments

Authors: Bruno Arcanjo, Bruno Ferrarini, Michael Milford, Klaus D. McDonald-Maier, Shoaib Ehsan

Abstract: Visual place recognition (VPR) is an essential component of robot navigation and localization systems that allows them to identify a place using only image data. VPR is challenging due to the significant changes in a place's appearance under different illumination throughout the day, with seasonal weather and when observed from different viewpoints. Currently, no single VPR technique excels in every environmental condition, each exhibiting unique benefits and shortcomings. As a result, VPR systems combining multiple techniques achieve more reliable VPR performance in changing environments, at the cost of higher computational loads. Addressing this shortcoming, we propose an adaptive VPR system dubbed Adaptive Multi-Self Identification and Correction (A-MuSIC). We start by developing a method to collect information on the runtime performance of a VPR technique by analysing the frame-to-frame continuity of matched queries. We then demonstrate how to operate the method on a static ensemble of techniques, generating data on which techniques are contributing the most in the current environment. A-MuSIC uses the collected information both to select a minimal subset of techniques and to decide when a re-selection is required during navigation. A-MuSIC matches or beats state-of-the-art VPR performance across all tested benchmark datasets while maintaining its computational load on par with individual techniques.

Submitted 24 March, 2023; originally announced March 2023.

arXiv:2303.00714 [pdf, other]

A Complementarity-Based Switch-Fuse System for Improved Visual Place Recognition

Authors: Maria Waheed, Sania Waheed, Michael Milford, Klaus McDonald-Maier, Shoaib Ehsan

Abstract: Recently, several fusion and switching based approaches have been presented to solve the problem of Visual Place Recognition. In spite of these systems demonstrating a significant boost in VPR performance, they each have their own set of limitations. The multi-process fusion systems usually involve employing brute force and running all available VPR techniques simultaneously, while the switching method attempts to negate this practice by only selecting the best suited VPR technique for a given query image. But switching does fail at times when no suitable technique can be identified. An innovative solution would be an amalgamation of the two otherwise discrete approaches to combine their competitive advantages while negating their shortcomings. The proposed Switch-Fuse system is an interesting way to combine both the robustness of switching VPR techniques based on complementarity and the force of fusing the carefully selected techniques to significantly improve performance. Our system holds a structure superior to the basic fusion methods as, instead of simply fusing all or any random techniques, it is structured to first select the best possible VPR techniques for fusion, according to the query image. The system combines two significant processes, switching and fusing VPR techniques, which together as a hybrid model substantially improve performance on all major VPR data sets, as illustrated using PR curves.

Submitted 1 March, 2023; originally announced March 2023.

Comments: arXiv admin note: text overlap with arXiv:2203.00591

arXiv:2302.13314 [pdf, other]

Data-Efficient Sequence-Based Visual Place Recognition with Highly Compressed JPEG Images

Authors: Mihnea-Alexandru Tomita, Bruno Ferrarini, Michael Milford, Klaus McDonald-Maier, Shoaib Ehsan

Abstract: Visual Place Recognition (VPR) is a fundamental task that allows a robotic platform to successfully localise itself in the environment. For decentralised VPR applications where the visual data has to be transmitted between several agents, the communication channel may restrict the localisation process when limited bandwidth is available. JPEG is an image compression standard that can employ high compression ratios to facilitate lower data transmission for VPR applications. However, when applying high levels of JPEG compression, both the image clarity and size are drastically reduced. In this paper, we incorporate sequence-based filtering in a number of well-established, learnt and non-learnt VPR techniques to overcome the performance loss resulting from introducing high levels of JPEG compression. The sequence length that enables 100% place matching performance is reported and an analysis of the amount of data required for each VPR technique to perform the transfer on the entire spectrum of JPEG compression is provided. Moreover, the time required by each VPR technique to perform place matching is investigated, on both uniformly and non-uniformly JPEG compressed data. The results show that it is beneficial to use a highly compressed JPEG dataset with an increased sequence length, as similar levels of VPR performance are reported at a significantly reduced bandwidth. The results presented in this paper also emphasize that there is a trade-off between the amount of data transferred and the total time required to perform VPR. Our experiments also suggest that it is often favourable to compress the query images to the same quality as the map, as more efficient place matching can be performed. The experiments are conducted on several VPR datasets, under mild to extreme JPEG compression.

Submitted 26 February, 2023; originally announced February 2023.

arXiv:2210.00834 [pdf, other]

Merging Classification Predictions with Sequential Information for Lightweight Visual Place Recognition in Changing Environments

Authors: Bruno Arcanjo, Bruno Ferrarini, Michael Milford, Klaus D. McDonald-Maier, Shoaib Ehsan

Abstract: Low-overhead visual place recognition (VPR) is a highly active research topic. Mobile robotics applications often operate under low-end hardware, and even more hardware capable systems can still benefit from freeing up onboard system resources for other navigation tasks. This work addresses lightweight VPR by proposing a novel system based on the combination of binary-weighted classifier networks with a one-dimensional convolutional network, dubbed merger. Recent work in fusing multiple VPR techniques has mainly focused on increasing VPR performance, with computational efficiency not being highly prioritized. In contrast, we design our technique prioritizing low inference times, taking inspiration from the machine learning literature where the efficient combination of classifiers is a heavily researched topic. Our experiments show that the merger achieves inference times as low as 1 millisecond, being significantly faster than other well-established lightweight VPR techniques, while achieving comparable or superior VPR performance on several visual changes such as seasonal variations and viewpoint lateral shifts.

Submitted 3 October, 2022; originally announced October 2022.

arXiv:2209.08343 [pdf, other]

Data Efficient Visual Place Recognition Using Extremely JPEG-Compressed Images

Authors: Mihnea-Alexandru Tomita, Bruno Ferrarini, Michael Milford, Klaus McDonald-Maier, Shoaib Ehsan

Abstract: Visual Place Recognition (VPR) is the ability of a robotic platform to correctly interpret visual stimuli from its on-board cameras in order to determine whether it is currently located in a previously visited place, despite different viewpoint, illumination and appearance changes. JPEG is a widely used image compression standard that is capable of significantly reducing the size of an image at the cost of image clarity. For applications where several robotic platforms are simultaneously deployed, the visual data gathered must be transmitted remotely between each robot. Hence, JPEG compression can be employed to drastically reduce the amount of data transmitted over a communication channel, as working with limited bandwidth for VPR can prove to be a challenging task. However, the effects of JPEG compression on the performance of current VPR techniques have not been previously studied. For this reason, this paper presents an in-depth study of JPEG compression in VPR related scenarios. We use a selection of well-established VPR techniques on well-established benchmark datasets with various amounts of compression applied. We show that by introducing compression, the VPR performance is drastically reduced, especially in the higher spectrum of compression. Moreover, this paper demonstrates how fine-tuning a CNN can be utilised as an optimisation method for JPEG compressed data to perform more consistently with the image transformations detected in extremely JPEG compressed images.

Submitted 1 March, 2023; v1 submitted 17 September, 2022; originally announced September 2022.

Comments: The paper is currently under review. 8 pages, 8 figures

arXiv:2203.00591 [pdf, other]

SwitchHit: A Probabilistic, Complementarity-Based Switching System for Improved Visual Place Recognition in Changing Environments

Authors: Maria Waheed, Michael Milford, Klaus McDonald-Maier, Shoaib Ehsan

Abstract: Visual place recognition (VPR), a fundamental task in computer vision and robotics, is the problem of identifying a place mainly based on visual information. Viewpoint and appearance changes, such as those due to weather and seasonal variations, make this task challenging. Currently, there is no universal VPR technique that can work in all types of environments, on a variety of robotic platforms, and under a wide range of viewpoint and appearance changes. Recent work has shown the potential of combining different VPR methods intelligently by evaluating complementarity for some specific VPR datasets to achieve better performance. This, however, requires ground truth information (correct matches) which is not available when a robot is deployed in a real-world scenario. Moreover, running multiple VPR techniques in parallel may be prohibitive for resource-constrained embedded platforms. To overcome these limitations, this paper presents a probabilistic complementarity based switching VPR system, SwitchHit. Our proposed system consists of multiple VPR techniques; however, it does not simply run all techniques at once, but rather predicts the probability of a correct match for an incoming query image and dynamically switches to another complementary technique if the probability of correctly matching the query is below a certain threshold. This innovative use of multiple VPR techniques allows our system to be more efficient and robust than other combined VPR approaches employing brute force and running multiple VPR techniques at once. This makes it more suitable for resource-constrained embedded systems and achieves an overall performance superior to what any individual VPR method in the system could have achieved running independently.

Submitted 1 March, 2022; originally announced March 2022.
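
Illustrative note on the entry above: the abstract describes predicting the probability of a correct match and switching to another technique when that probability falls below a threshold. The following is a minimal sketch of that control flow only, not the SwitchHit implementation. The technique interface, the fixed ordering of techniques, the threshold value and the fallback to the most confident attempt are all assumptions made for illustration; the abstract does not specify how the probability is estimated or how the next technique is chosen.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interface: a technique maps a query image path to
# (matched place ID, estimated probability that the match is correct).
Technique = Callable[[str], Tuple[int, float]]

def switch_match(
    query: str,
    techniques: List[Tuple[str, Technique]],
    threshold: float = 0.5,
) -> Optional[Tuple[str, int, float]]:
    """Run techniques one at a time, stopping at the first whose estimated
    probability of a correct match reaches the threshold. If none does,
    return the most confident attempt (an assumption of this sketch)."""
    best = None
    for name, match in techniques:
        place_id, p_correct = match(query)
        if best is None or p_correct > best[2]:
            best = (name, place_id, p_correct)
        if p_correct >= threshold:
            return (name, place_id, p_correct)
    return best

# Stub techniques with fixed outputs, standing in for real VPR methods.
techniques = [
    ("tech_a", lambda q: (12, 0.30)),
    ("tech_b", lambda q: (15, 0.80)),
    ("tech_c", lambda q: (15, 0.95)),
]

result = switch_match("query.png", techniques, threshold=0.5)
print(result)  # tech_b is accepted; tech_c is never run
```

Because the loop returns as soon as one technique clears the threshold, later techniques are not executed, which is the source of the computational saving over running every technique on every query.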

arXiv:2202.12375 [pdf, other]

Highly-Efficient Binary Neural Networks for Visual Place Recognition

Authors: Bruno Ferrarini, Michael Milford, Klaus D. McDonald-Maier, Shoaib Ehsan

Abstract: VPR is a fundamental task for autonomous navigation as it enables a robot to localize itself in the workspace when a known location is detected. Although accuracy is an essential requirement for a VPR technique, computational and energy efficiency are no less important for real-world applications. CNN-based techniques achieve state-of-the-art VPR performance but are computationally intensive and energy demanding. Binary neural networks (BNN) have been recently proposed to address VPR efficiently. Although a typical BNN is an order of magnitude more efficient than a CNN, its processing time and energy usage can be further improved. In a typical BNN, the first convolution is not completely binarized for the sake of accuracy. Consequently, the first layer is the slowest network stage, requiring a large share of the entire computational effort. This paper presents a class of BNNs for VPR that combines depthwise separable factorization and binarization to replace the first convolutional layer to improve computational and energy efficiency. Our best model achieves state-of-the-art VPR performance while spending considerably less time and energy to process an image than a BNN using a non-binary convolution as a first stage.

Submitted 24 February, 2022; originally announced February 2022.

Comments: 8 pages, 10 figures, 2 tables

Journal ref: Published and Presented at IROS 2022, Kyoto

arXiv:2201.01588 [pdf, other]

Using Machine Learning for Anomaly Detection on a System-on-Chip under Gamma Radiation

Authors: Eduardo Weber Wachter, Server Kasap, Sefki Kolozali, Xiaojun Zhai, Shoaib Ehsan, Klaus McDonald-Maier

Abstract: The emergence of new nanoscale technologies has imposed significant challenges to designing reliable electronic systems in radiation environments. Some radiation effects, such as Total Ionizing Dose (TID) effects, often cause permanent damage to such nanoscale electronic devices, and current state-of-the-art technologies to tackle TID make use of expensive radiation-hardened devices. This paper focuses on a novel and different approach: using machine learning algorithms on consumer-grade Field Programmable Gate Arrays (FPGAs) to tackle TID effects and monitor the devices so that they can be replaced before they stop working. This raises the research challenge of anticipating when a board will fail completely due to TID effects. We observed internal measurements of the FPGA boards under gamma radiation and used three different anomaly detection machine learning (ML) algorithms to detect anomalies in the sensor measurements in a gamma-radiated environment. The statistical results show a highly significant relationship between the gamma radiation exposure levels and the board measurements. Moreover, our anomaly detection results have shown that a One-Class Support Vector Machine with Radial Basis Function Kernel has an average Recall score of 0.95. Also, all anomalies can be detected before the boards stop working.

Submitted 5 January, 2022; originally announced January 2022.

arXiv:2109.11002 [pdf, other]

A Benchmark Comparison of Visual Place Recognition Techniques for Resource-Constrained Embedded Platforms

Authors: Rose Power, Mubariz Zaffar, Bruno Ferrarini, Michael Milford, Klaus McDonald-Maier, Shoaib Ehsan

Abstract: Visual Place Recognition (VPR) has been a subject of significant research over the last 15 to 20 years. VPR is a fundamental task for autonomous navigation as it enables self-localization within an environment. Although robots are often equipped with resource-constrained hardware, the computational requirements of and effects on VPR techniques have received little attention. In this work, we present a hardware-focused benchmark evaluation of a number of state-of-the-art VPR techniques on public datasets. We consider popular single board computers, including ODroid, UP and Raspberry Pi 3, in addition to a commodity desktop and laptop for reference. We present our analysis based on several key metrics, including place-matching accuracy, image encoding time, descriptor matching time and memory needs. Key questions addressed include: (1) How does the performance accuracy of a VPR technique change with processor architecture? (2) How does power consumption vary for different VPR techniques and embedded platforms? (3) How much does descriptor size matter in comparison to today's embedded platforms' storage? (4) How does the performance of a high-end platform relate to an on-board low-end embedded platform for VPR? The extensive analysis and results in this work serve not only as a benchmark for the VPR community, but also provide useful insights for real-world adoption of VPR applications.

Submitted 22 September, 2021; originally announced September 2021.

arXiv:2109.10986 [pdf, other]

An Efficient and Scalable Collection of Fly-inspired Voting Units for Visual Place Recognition in Changing Environments

Authors: Bruno Arcanjo, Bruno Ferrarini, Michael Milford, Klaus D. McDonald-Maier, Shoaib Ehsan

Abstract: State-of-the-art visual place recognition performance is currently being achieved utilizing deep-learning-based approaches. Despite the recent efforts in designing lightweight convolutional-neural-network-based models, these can still be too expensive for the most hardware-restricted robot applications. Low-overhead VPR techniques would not only enable platforms equipped with low-end, cheap hardware but also reduce computation on more powerful systems, allowing these resources to be allocated for other navigation tasks. In this work, our goal is to provide an algorithm of extreme compactness and efficiency while achieving state-of-the-art robustness to appearance changes and small point-of-view variations. Our first contribution is DrosoNet, an exceptionally compact model inspired by the odor processing abilities of the fruit fly, Drosophila melanogaster. Our second and main contribution is a voting mechanism that leverages multiple small and efficient classifiers to achieve more robust and consistent VPR compared to a single one. We use DrosoNet as the baseline classifier for the voting mechanism and evaluate our models on five benchmark datasets, assessing moderate to extreme appearance changes and small to moderate viewpoint variations. We then compare the proposed algorithms to state-of-the-art methods, both in terms of precision-recall AUC results and computational efficiency.

Submitted 22 September, 2021; originally announced September 2021.

arXiv:2103.01994 [pdf, other]

doi 10.1109/ACCESS.2022.3196389

Sequence-Based Filtering for Visual Route-Based Navigation: Analysing the Benefits, Trade-offs and Design Choices

Authors: Mihnea-Alexandru Tomită, Mubariz Zaffar, Michael Milford, Klaus McDonald-Maier, Shoaib Ehsan

Abstract: Visual Place Recognition (VPR) is the ability to correctly recall a previously visited place using visual information under environmental, viewpoint and appearance changes. An emerging trend in VPR is the use of sequence-based filtering methods on top of single-frame-based place matching techniques for route-based navigation. The combination leads to varying levels of potential place matching performance boosts at increased computational costs. This raises a number of interesting research questions: How does performance boost (due to sequential filtering) vary along the entire spectrum of single-frame-based matching methods? How does sequence matching length affect the performance curve? Which specific combinations provide a good trade-off between performance and computation? However, there is a lack of previous work looking at these important questions, and most of the sequence-based filtering work to date has been used without a systematic approach. To bridge this research gap, this paper conducts an in-depth investigation of the relationship between the performance of single-frame-based place matching techniques and the use of sequence-based filtering on top of those methods. It analyzes individual trade-offs, properties and limitations for different combinations of single-frame-based and sequential techniques. A number of state-of-the-art VPR methods and widely used public datasets are utilized to present the findings, which contain a number of meaningful insights for the VPR community.

Submitted 2 March, 2021; originally announced March 2021.

Comments: 7 pages, currently under-review

arXiv:2102.12728 [pdf, other]

Scene Retrieval for Contextual Visual Mapping

Authors: William H. B. Smith, Michael Milford, Klaus D. McDonald-Maier, Shoaib Ehsan

Abstract: Visual navigation localizes a query place image against a reference database of place images, also known as a 'visual map'. Localization accuracy requirements for specific areas of the visual map, 'scene classes', vary according to the context of the environment and task. State-of-the-art visual mapping is unable to reflect these requirements by explicitly targeting scene classes for inclusion in the map. Four different scene classes, including pedestrian crossings and stations, are identified in each of the Nordland and St. Lucia datasets. Instead of re-training separate scene classifiers, which struggle with these overlapping scene classes, we make our first contribution: defining the problem of 'scene retrieval'. Scene retrieval extends image retrieval to classification of scenes defined at test time by associating a single query image to reference images of scene classes. Our second contribution is a triplet-trained convolutional neural network (CNN) to address this problem, which increases scene classification accuracy by up to 7% against state-of-the-art networks pre-trained for scene recognition. Our third contribution is an algorithm, 'DMC', that combines our scene classification with distance and memorability for visual mapping. Our analysis shows that DMC includes 64% more images of our chosen scene classes in a visual map than just using distance interval mapping. State-of-the-art visual place descriptors AMOS-Net, Hybrid-Net and NetVLAD are finally used to show that DMC improves scene class localization accuracy by a mean of 3% and localization accuracy of the remaining map images by a mean of 10% across both datasets.

Submitted 25 February, 2021; originally announced February 2021.

Comments: 8-page paper on visual place recognition and scene classification

arXiv:2102.08416 [pdf, other]

Improving Visual Place Recognition Performance by Maximising Complementarity

Authors: Maria Waheed, Michael Milford, Klaus D. McDonald-Maier, Shoaib Ehsan

Abstract: Visual place recognition (VPR) is the problem of recognising a previously visited location using visual information. Many attempts to improve the performance of VPR methods have been made in the literature. One approach that has received attention recently is multi-process fusion, where different VPR methods run in parallel and their outputs are combined in an effort to achieve better performance. Multi-process fusion, however, does not have a well-defined criterion for selecting and combining different VPR methods from a wide range of available options. To the best of our knowledge, this paper investigates the complementarity of state-of-the-art VPR methods systematically for the first time and identifies those combinations which can result in better performance. The paper presents a well-defined framework which acts as a sanity check to find the complementarity between two techniques by utilising a McNemar's test-like approach. The framework allows estimation of upper and lower complementarity bounds for the VPR techniques to be combined, along with an estimate of maximum VPR performance that may be achieved. Based on this framework, results are presented for eight state-of-the-art VPR methods on ten widely used VPR datasets, showing the potential of different combinations of techniques for achieving better performance.

Submitted 16 February, 2021; originally announced February 2021.

arXiv:2010.00716 [pdf, other]

doi 10.1109/TRO.2022.3148908

Binary Neural Networks for Memory-Efficient and Effective Visual Place Recognition in Changing Environments

Authors: Bruno Ferrarini, Michael Milford, Klaus D. McDonald-Maier, Shoaib Ehsan

Abstract: Visual place recognition (VPR) is a robot's ability to determine whether a place was visited before using visual data. While conventional hand-crafted methods for VPR fail under extreme environmental appearance changes, those based on convolutional neural networks (CNNs) achieve state-of-the-art performance but result in heavy runtime processes and model sizes that demand a large amount of memory. Hence, CNN-based approaches are unsuitable for resource-constrained platforms, such as small robots and drones. In this paper, we take a multi-step approach of decreasing the precision of model parameters, combining it with network depth reduction and fewer neurons in the classifier stage to propose a new class of highly compact models that drastically reduces the memory requirements and computational effort while maintaining state-of-the-art VPR performance. To the best of our knowledge, this is the first attempt to propose binary neural networks for solving the visual place recognition problem effectively under changing conditions and with significantly reduced resource requirements. Our best-performing binary neural network, dubbed FloppyNet, achieves comparable VPR performance when considered against its full-precision and deeper counterparts while consuming 99% less memory and increasing the inference speed seven times.

Submitted 23 January, 2022; v1 submitted 1 October, 2020; originally announced October 2020.

Journal ref: IEEE Transactions on Robotics, 2022

arXiv:2009.13454 [pdf, other]

ConvSequential-SLAM: A Sequence-based, Training-less Visual Place Recognition Technique for Changing Environments

Authors: Mihnea-Alexandru Tomită, Mubariz Zaffar, Michael Milford, Klaus McDonald-Maier, Shoaib Ehsan

Abstract: Visual Place Recognition (VPR) is the ability to correctly recall a previously visited place under changing viewpoints and appearances. A large number of handcrafted and deep-learning-based VPR techniques exist, where the former suffer from appearance changes and the latter have significant computational needs. In this paper, we present a new handcrafted VPR technique that achieves state-of-the-art place matching performance under challenging conditions. Our technique combines the best of two existing training-less VPR techniques, SeqSLAM and CoHOG, which are robust to conditional and viewpoint changes, respectively. This blend, namely ConvSequential-SLAM, utilises sequential information and block-normalisation to handle appearance changes, while using regional-convolutional matching to achieve viewpoint-invariance. We analyse content-overlap between query frames to find a minimum sequence length, while also re-using the image entropy information for environment-based sequence length tuning. State-of-the-art performance is reported in contrast to 8 contemporary VPR techniques on 4 public datasets. Qualitative insights and an ablation study on sequence length are also provided.

Submitted 28 September, 2020; originally announced September 2020.

Comments: 10 pages, currently under-review

arXiv:2005.08135 [pdf, other]

doi 10.1007/s11263-021-01469-5

VPR-Bench: An Open-Source Visual Place Recognition Evaluation Framework with Quantifiable Viewpoint and Appearance Change

Authors: Mubariz Zaffar, Sourav Garg, Michael Milford, Julian Kooij, David Flynn, Klaus McDonald-Maier, Shoaib Ehsan

Abstract: Visual Place Recognition (VPR) is the process of recognising a previously visited place using visual information, often under varying appearance conditions and viewpoint changes and with computational constraints. VPR is related to the concepts of localisation, loop closure and image retrieval, and is a critical component of many autonomous navigation systems ranging from autonomous vehicles to drones and computer vision systems. While the concept of place recognition has been around for many years, VPR research has grown rapidly as a field over the past decade due to improving camera hardware and its potential for deep learning-based techniques, and has become a widely studied topic in both the computer vision and robotics communities. This growth, however, has led to fragmentation and a lack of standardisation in the field, especially concerning performance evaluation. Moreover, the notion of viewpoint and illumination invariance of VPR techniques has largely been assessed qualitatively and hence ambiguously in the past. In this paper, we address these gaps through a new comprehensive open-source framework for assessing the performance of VPR techniques, dubbed "VPR-Bench". VPR-Bench (Open-sourced at: https://github.com/MubarizZaffar/VPR-Bench) introduces two much-needed capabilities for VPR researchers: firstly, it contains a benchmark of 12 fully-integrated datasets and 10 VPR techniques, and secondly, it integrates a comprehensive variation-quantified dataset for quantifying viewpoint and illumination invariance. We apply and analyse popular evaluation metrics for VPR from both the computer vision and robotics communities, and discuss how these different metrics complement and/or replace each other, depending upon the underlying applications and system requirements.

Submitted 1 October, 2021; v1 submitted 16 May, 2020; originally announced May 2020.

Comments: Accepted version of our IJCV paper

Journal ref: International Journal of Computer Vision. 2021 May 7:1-39

arXiv:1909.08153 [pdf, other]

CAMAL: Context-Aware Multi-layer Attention framework for Lightweight Environment Invariant Visual Place Recognition

Authors: Ahmad Khaliq, Shoaib Ehsan, Michael Milford, Klaus McDonald-Maier

Abstract: In the last few years, Deep Convolutional Neural Networks (D-CNNs) have shown state-of-the-art (SOTA) performance for Visual Place Recognition (VPR), a pivotal component of long-term intelligent robotic vision (vision-aware localization and navigation systems). The notable generalization power of D-CNNs, gained by training on large-scale place datasets, and the persistent image regions they learn are found to be robust for specific place recognition under changing conditions and camera viewpoints. However, for determining the approximate location of resource-constrained mobile robots, lightweight VPR techniques are preferred over computation- and power-intensive D-CNN-based VPR algorithms. This paper presents a computation- and energy-efficient CAMAL framework that captures place-specific multi-layer convolutional attentions efficient for environment-invariant VPR. At 4x lower power consumption, evaluation of the proposed VPR framework on challenging benchmark place recognition datasets reveals better or comparable Area under Precision-Recall (AUC-PR) curves, with approximately 4x improved image retrieval performance over contemporary VPR methodologies.

Submitted 13 August, 2020; v1 submitted 17 September, 2019; originally announced September 2019.

Comments: under-review

arXiv:1908.00258 [pdf, other]

Visual Place Recognition for Aerial Robotics: Exploring Accuracy-Computation Trade-off for Local Image Descriptors

Authors: Bruno Ferrarini, Maria Waheed, Sania Waheed, Shoaib Ehsan, Michael Milford, Klaus D. McDonald-Maier

Abstract: Visual Place Recognition (VPR) is a fundamental yet challenging task for small Unmanned Aerial Vehicles (UAVs). The core reasons are the extreme viewpoint changes and the limited computational power onboard a UAV, which restricts the applicability of robust but computation-intensive state-of-the-art VPR methods. In this context, a viable approach is to use local image descriptors for performing VPR, as these can be computed relatively efficiently without the need for any special hardware, such as a GPU. However, the choice of a local feature descriptor is not trivial and calls for a detailed investigation, as there is a trade-off between VPR accuracy and the required computational effort. To fill this research gap, this paper examines the performance of several state-of-the-art local feature descriptors, from both accuracy and computational perspectives, specifically for VPR applications utilizing standard aerial datasets. The presented results confirm that a trade-off between accuracy and computational effort is inevitable while executing VPR on resource-constrained hardware.

Submitted 1 August, 2019; originally announced August 2019.

Journal ref: NASA/ESA Conference on Adaptive Hardware and Systems (AHS 2019)

arXiv:1905.02025 [pdf, other]

DisplaceNet: Recognising Displaced People from Images by Exploiting Dominance Level

Authors: Grigorios Kalliatakis, Shoaib Ehsan, Maria Fasli, Klaus McDonald-Maier

Abstract: Every year millions of men, women and children are forced to leave their homes and seek refuge from wars, human rights violations, persecution, and natural disasters. The number of forcibly displaced people grew at a record rate of 44,400 every day throughout 2017, raising the cumulative total to 68.5 million at the year's end, overtaking the total population of the United Kingdom. Up to 85% of the forcibly displaced find refuge in low- and middle-income countries, calling for increased humanitarian assistance worldwide. To reduce the amount of manual labour required for human-rights-related image analysis, we introduce DisplaceNet, a novel model which infers potentially displaced people from images by integrating the control level of the situation and a conventional convolutional neural network (CNN) classifier into one framework for image classification. Experimental results show that DisplaceNet achieves up to 4% gain in coverage (the proportion of a data set for which a classifier is able to produce a prediction) over the sole use of a CNN classifier. Our dataset, code and trained models will be available online at https://github.com/GKalliatakis/DisplaceNet.

Submitted 3 May, 2019; originally announced May 2019.

Comments: To be published in CVPR Workshop on Computer Vision for Global Challenges (CV4GC). arXiv admin note: substantial text overlap with arXiv:1902.03817

arXiv:1904.07967 [pdf, other]

Are State-of-the-art Visual Place Recognition Techniques any Good for Aerial Robotics?

Authors: Mubariz Zaffar, Ahmad Khaliq, Shoaib Ehsan, Michael Milford, Kostas Alexis, Klaus McDonald-Maier

Abstract: Visual Place Recognition (VPR) has seen significant advances at the frontiers of matching performance and computational superiority over the past few years. However, these evaluations are performed for ground-based mobile platforms and cannot be generalized to aerial platforms. The degree of viewpoint variation experienced by aerial robots is complex, with their processing power and on-board memory limited by payload size and battery ratings. Therefore, in this paper, we collect 8 state-of-the-art VPR techniques that have been previously evaluated for ground-based platforms and compare them on 2 recently proposed aerial place recognition datasets with three prime focuses: (a) matching performance, (b) processing power consumption, and (c) projected memory requirements. This gives a bird's-eye view of the applicability of contemporary VPR research to aerial robotics and lays down the nature of challenges for aerial-VPR.

Submitted 22 May, 2019; v1 submitted 16 April, 2019; originally announced April 2019.

Comments: IEEE ICRA 2019 Workshop on Aerial Robotics; 8 pages, 7 figures

arXiv:1904.04555 [pdf, other]

doi 10.1007/978-3-030-20205-7_8

Assessing Capsule Networks With Biased Data

Authors: Bruno Ferrarini, Shoaib Ehsan, Adrien Bartoli, Aleš Leonardis, Klaus D. McDonald-Maier

Abstract: Machine-learning-based methods achieve impressive results in object classification and detection. Utilizing representative data of the visual world during the training phase is crucial to achieve good performance with such data-driven approaches. However, it is not always possible to access bias-free datasets; thus, robustness to biased data is a desirable property for a learning system. Capsule Networks have been introduced recently and their tolerance to biased data has received little attention. This paper aims to fill this gap and proposes two experimental scenarios to assess the tolerance to imbalanced training data and to determine the generalization performance of a model with unfamiliar affine transformations of the images. This paper assesses dynamic routing and EM routing based Capsule Networks and proposes a comparison with Convolutional Neural Networks in the two tested scenarios. The presented results provide new insights into the behaviour of capsule networks.

Submitted 9 April, 2019; originally announced April 2019.

Comments: 15 pages, 4 figures, 2 tables, Capsule Networks, Evaluation, Biased Data

MSC Class: 00B25

Journal ref: Scandinavian Conference on Image Analysis. Springer, Cham, 2019

arXiv:1903.09107 [pdf, other]

Levelling the Playing Field: A Comprehensive Comparison of Visual Place Recognition Approaches under Changing Conditions

Authors: Mubariz Zaffar, Ahmad Khaliq, Shoaib Ehsan, Michael Milford, Klaus McDonald-Maier

Abstract: In recent years there has been significant improvement in the capability of Visual Place Recognition (VPR) methods, building on the success of both hand-crafted and learnt visual features, temporal filtering and usage of semantic scene information. The wide range of approaches and the relatively recent growth in interest in the field has meant that a wide range of datasets and assessment methodologies have been proposed, often with a focus only on precision-recall type metrics, making comparison difficult. In this paper we present a comprehensive approach to evaluating the performance of 10 state-of-the-art recently-developed VPR techniques, which utilizes three standardized metrics: (a) Matching Performance, (b) Matching Time, and (c) Memory Footprint. Together this analysis provides an up-to-date and widely encompassing snapshot of the various strengths and weaknesses of contemporary approaches to the VPR problem. The aim of this work is to help move this particular research field towards a more mature and unified approach to the problem, enabling better comparison and hence more progress to be made in future research.

Submitted 29 April, 2019; v1 submitted 21 March, 2019; originally announced March 2019.

Comments: ICRA 2019 Workshop on Database Generation and Benchmarking of SLAM Algorithms for Robotics and VR/AR

arXiv:1902.03817 [pdf, other]

GET-AID: Visual Recognition of Human Rights Abuses via Global Emotional Traits

Authors: Grigorios Kalliatakis, Shoaib Ehsan, Maria Fasli, Klaus D. McDonald-Maier

Abstract: In the era of social media and big data, the use of visual evidence to document conflict and human rights abuse has become an important element for human rights organizations and advocates. In this paper, we address the task of detecting two types of human rights abuses in challenging, everyday photos: (1) child labour, and (2) displaced populations. We propose a novel model that is driven by a human-centric approach. Our hypothesis is that the emotional state of a person (how positive or pleasant an emotion is, and the level of control the person has over the situation) provides powerful cues for perceiving potential human rights violations. To exploit these cues, our model learns to predict global emotional traits over a given image based on the joint analysis of every detected person and the whole scene. By integrating these predictions with a data-driven convolutional neural network (CNN) classifier, our system efficiently infers potential human rights abuses in a clean, end-to-end system we call GET-AID (from Global Emotional Traits for Abuse IDentification). Extensive experiments are performed to verify our method on the recently introduced subset of the Human Rights Archive (HRA) dataset (2 violation categories with the same number of positive and negative samples), where we show quantitatively compelling results. Compared with previous works and the sole use of a CNN classifier, this paper improves the coverage by up to 23.73% for child labour and 57.21% for displaced populations. Our dataset, code and trained models are available online at https://github.com/GKalliatakis/GET-AID.

Submitted 11 February, 2019; originally announced February 2019.

Comments: 10 pages, 6 figures

arXiv:1811.03529 [pdf, other]

Memorable Maps: A Framework for Re-defining Places in Visual Place Recognition

Authors: Mubariz Zaffar, Shoaib Ehsan, Michael Milford, Klaus Mcdonald Maier

Abstract: This paper presents a cognition-inspired agnostic framework for building a map for Visual Place Recognition. This framework draws inspiration from human-memorability, utilizes the traditional image entropy concept and computes the static content in an image; thereby presenting a tri-folded criterion to assess the 'memorability' of an image for visual place recognition. A dataset namely 'ESSEX3IN1' is created, composed of highly confusing images from indoor, outdoor and natural scenes for analysis. When used in conjunction with state-of-the-art visual place recognition methods, the proposed framework provides significant performance boost to these techniques, as evidenced by results on ESSEX3IN1 and other public datasets.

Submitted 21 March, 2019; v1 submitted 8 November, 2018; originally announced November 2018.

Comments: 13 pages, 25 figures, 1 table

arXiv:1811.03032 [pdf, other]

A Holistic Visual Place Recognition Approach using Lightweight CNNs for Significant ViewPoint and Appearance Changes

Authors: Ahmad Khaliq, Shoaib Ehsan, Zetao Chen, Michael Milford, Klaus McDonald-Maier

Abstract: This paper presents a lightweight visual place recognition approach, capable of achieving high performance with low computational cost, and feasible for mobile robotics under significant viewpoint and appearance changes. Results on several benchmark datasets confirm an average boost of 13% in accuracy, and 12x average speedup relative to state-of-the-art methods.

Submitted 27 October, 2019; v1 submitted 7 November, 2018; originally announced November 2018.

Comments: Conditionally Accepted as short paper at IEEE Transactions on Robotics (T-RO)

arXiv:1808.00588 [pdf, other]

Weather Classification: A new multi-class dataset, data augmentation approach and comprehensive evaluations of Convolutional Neural Networks

Authors: Jose Carlos Villarreal Guerra, Zeba Khanam, Shoaib Ehsan, Rustam Stolkin, Klaus McDonald-Maier

Abstract: Weather conditions often disrupt the proper functioning of transportation systems. Present systems either deploy an array of sensors or use an in-vehicle camera to predict weather conditions. These solutions have resulted in incremental cost and limited scope. To ensure smooth operation of all transportation services in all-weather conditions, a reliable detection system is necessary to classify weather in the wild. The challenge involved in solving this problem is that weather conditions are diverse in nature and there is an absence of discriminate features among various weather conditions. The existing works to solve this problem have been scene specific and have targeted classification of two categories of weather. In this paper, we have created a new open source dataset consisting of images depicting three classes of weather, i.e., rain, snow and fog, called RFS Dataset. A novel algorithm has also been proposed which has used super pixel delimiting masks as a form of data augmentation, leading to reasonable results with respect to ten Convolutional Neural Network architectures.

Submitted 1 August, 2018; originally announced August 2018.

arXiv:1807.02098 [pdf, other]

MAT-CNN-SOPC: Motionless Analysis of Traffic Using Convolutional Neural Networks on System-On-a-Programmable-Chip

Authors: Somdip Dey, Grigorios Kalliatakis, Sangeet Saha, Amit Kumar Singh, Shoaib Ehsan, Klaus McDonald-Maier

Abstract: Intelligent Transportation Systems (ITS) have become an important pillar in modern "smart city" framework which demands intelligent involvement of machines. Traffic load recognition can be categorized as an important and challenging issue for such systems. Recently, Convolutional Neural Network (CNN) models have drawn considerable amount of interest in many areas such as weather classification, human rights violation detection through images, due to its accurate prediction capabilities. This work tackles real-life traffic load recognition problem on System-On-a-Programmable-Chip (SOPC) platform and coin it as MAT-CNN-SOPC, which uses an intelligent re-training mechanism of the CNN with known environments. The proposed methodology is capable of enhancing the efficacy of the approach by 2.44x in comparison to the state-of-art and proven through experimental analysis. We have also introduced a mathematical equation, which is capable of quantifying the suitability of using different CNN models over the other for a particular application based implementation.

Submitted 14 August, 2018; v1 submitted 5 July, 2018; originally announced July 2018.

Comments: 6 pages, 3 figures, 2 tables

ACM Class: I.4; I.2.1; C.1.4

Journal ref: 2018 NASA/ESA Conference on Adaptive Hardware and Systems (AHS 2018)

arXiv:1807.01605 [pdf]

doi 10.1109/AHS.2018.8541483

Sensors, SLAM and Long-term Autonomy: A Review

Authors: Mubariz Zaffar, Shoaib Ehsan, Rustam Stolkin, Klaus McDonald Maier

Abstract: Simultaneous Localization and Mapping, commonly known as SLAM, has been an active research area in the field of Robotics over the past three decades. For solving the SLAM problem, every robot is equipped with either a single sensor or a combination of similar/different sensors. This paper attempts to review, discuss, evaluate and compare these sensors. Keeping an eye on future, this paper also assesses the characteristics of these sensors against factors critical to the long-term autonomy challenge.

Submitted 4 July, 2018; originally announced July 2018.

Comments: 6 pages, 7 figures

arXiv:1805.04714 [pdf, other]

Exploring object-centric and scene-centric CNN features and their complementarity for human rights violations recognition in images

Authors: Grigorios Kalliatakis, Shoaib Ehsan, Ales Leonardis, Klaus McDonald-Maier

Abstract: Identifying potential abuses of human rights through imagery is a novel and challenging task in the field of computer vision, that will enable to expose human rights violations over large-scale data that may otherwise be impossible. While standard databases for object and scene categorisation contain hundreds of different classes, the largest available dataset of human rights violations contains only 4 classes. Here, we introduce the 'Human Rights Archive Database' (HRA), a verified-by-experts repository of 3050 human rights violations photographs, labelled with human rights semantic categories, comprising a list of the types of human rights abuses encountered at present. With the HRA dataset and a two-phase transfer learning scheme, we fine-tuned the state-of-the-art deep convolutional neural networks (CNNs) to provide human rights violations classification CNNs (HRA-CNNs). We also present extensive experiments refined to evaluate how well object-centric and scene-centric CNN features can be combined for the task of recognising human rights abuses. With this, we show that HRA database poses a challenge at a higher level for the well studied representation learning methods, and provide a benchmark in the task of human rights violations recognition in visual context. We expect this dataset can help to open up new horizons on creating systems able of recognising rich information about human rights violations. Our dataset, codes and trained models are available online at https://github.com/GKalliatakis/Human-Rights-Archive-CNNs.

Submitted 12 May, 2018; originally announced May 2018.

Comments: 19 pages, 13 figures; Submitted to PLOS ONE

arXiv:1712.06508 [pdf, ps, other]

doi 10.1103/PhysRevMaterials.2.023802

Assessment of the GLLB-SC potential for solid-state properties and attempts for improvement

Authors: Fabien Tran, Sohaib Ehsan, Peter Blaha

Abstract: Based on the work of Gritsenko et al. (GLLB) [Phys. Rev. A 51, 1944 (1995)], the method of Kuisma et al. [Phys. Rev. B 82, 115106 (2010)] to calculate the band gap in solids was shown to be much more accurate than the common local density approximation (LDA) and generalized gradient approximation (GGA). The main feature of the GLLB-SC potential (SC stands for solid and correlation) is to lead to a nonzero derivative discontinuity that can be conveniently calculated and then added to the Kohn-Sham band gap for a comparison with the experimental band gap. In this work, a thorough comparison of GLLB-SC with other methods, e.g., the modified Becke-Johnson (mBJ) potential [F. Tran and P. Blaha, Phys. Rev. Lett. 102, 226401 (2009)], for electronic, magnetic, and density-related properties is presented. It is shown that for the band gap, GLLB-SC does not perform as well as mBJ for systems with a small band gap and strongly correlated systems, but is on average of similar accuracy as hybrid functionals. The results on itinerant metals indicate that GLLB-SC overestimates significantly the magnetic moment (much more than mBJ does), but leads to excellent results for the electric field gradient, for which mBJ is in general not recommended. In the aim of improving the results, variants of the GLLB-SC potential are also tested.

Submitted 18 December, 2017; originally announced December 2017.

Journal ref: Phys. Rev. Materials 2, 023802 (2018)

arXiv:1711.03874 [pdf]

Material Classification in the Wild: Do Synthesized Training Data Generalise Better than Real-World Training Data?

Authors: Grigorios Kalliatakis, Anca Sticlaru, George Stamatiadis, Shoaib Ehsan, Ales Leonardis, Juergen Gall, Klaus D. McDonald-Maier

Abstract: We question the dominant role of real-world training images in the field of material classification by investigating whether synthesized data can generalise more effectively than real-world data. Experimental results on three challenging real-world material databases show that the best performing pre-trained convolutional neural network (CNN) architectures can achieve up to 91.03% mean average precision when classifying materials in cross-dataset scenarios. We demonstrate that synthesized data achieve an improvement on mean average precision when used as training data and in conjunction with pre-trained CNN architectures, which spans from ~5% to ~19% across three widely used material databases of real-world images.

Submitted 9 November, 2017; originally announced November 2017.

Comments: accepted for publication in VISAPP 2018. arXiv admin note: text overlap with arXiv:1703.04101

arXiv:1709.08202 [pdf, other]

Performance Characterization of Image Feature Detectors in Relation to the Scene Content Utilizing a Large Image Database

Authors: Bruno Ferrarini, Shoaib Ehsan, Ales Leonardis, Naveed Ur Rehman, Klaus D. McDonald-Maier

Abstract: Selecting the most suitable local invariant feature detector for a particular application has rendered the task of evaluating feature detectors a critical issue in vision research. Although the literature offers a variety of comparison works focusing on performance evaluation of image feature detectors under several types of image transformations, the influence of the scene content on the performance of local feature detectors has received little attention so far. This paper aims to bridge this gap with a new framework for determining the type of scenes which maximize and minimize the performance of detectors in terms of repeatability rate. The results are presented for several state-of-the-art feature detectors that have been obtained using a large image database of 20482 images under JPEG compression, uniform light and blur changes with 539 different scenes captured from real-world scenarios. These results provide new insights into the behavior of feature detectors.

Submitted 13 October, 2017; v1 submitted 24 September, 2017; originally announced September 2017.

Comments: Extended version of the conference paper available at http://ieeexplore.ieee.org/abstract/document/7314191/?reload=true

arXiv:1703.10501 [pdf, other]

A Paradigm Shift: Detecting Human Rights Violations Through Web Images

Authors: Grigorios Kalliatakis, Shoaib Ehsan, Klaus D. McDonald-Maier

Abstract: The growing presence of devices carrying digital cameras, such as mobile phones and tablets, combined with ever improving internet networks have enabled ordinary citizens, victims of human rights abuse, and participants in armed conflicts, protests, and disaster situations to capture and share via social media networks images and videos of specific events. This paper discusses the potential of images in human rights context including the opportunities and challenges they present. This study demonstrates that real-world images have the capacity to contribute complementary data to operational human rights monitoring efforts when combined with novel computer vision approaches. The analysis is concluded by arguing that if images are to be used effectively to detect and identify human rights violations by rights advocates, greater attention to gathering task-specific visual concepts from large-scale web images is required.

Submitted 30 March, 2017; originally announced March 2017.

Comments: Position paper, 8 pages, 3 figures

arXiv:1703.04103 [pdf, other]

Detection of Human Rights Violations in Images: Can Convolutional Neural Networks help?

Authors: Grigorios Kalliatakis, Shoaib Ehsan, Maria Fasli, Ales Leonardis, Juergen Gall, Klaus D. McDonald-Maier

Abstract: After setting the performance benchmarks for image, video, speech and audio processing, deep convolutional networks have been core to the greatest advances in image recognition tasks in recent times. This raises the question of whether there is any benefit in targeting these remarkable deep architectures with the unattempted task of recognising human rights violations through digital images. Under this perspective, we introduce a new, well-sampled human rights-centric dataset called Human Rights Understanding (HRUN). We conduct a rigorous evaluation on a common ground by combining this dataset with different state-of-the-art deep convolutional architectures in order to achieve recognition of human rights violations. Experimental results on the HRUN dataset have shown that the best performing CNN architectures can achieve up to 88.10% mean average precision. Additionally, our experiments demonstrate that increasing the size of the training samples is crucial for achieving an improvement on mean average precision principally when utilising very deep networks.

Submitted 16 March, 2017; v1 submitted 12 March, 2017; originally announced March 2017.

Comments: In Proceedings of the 12th International Conference on Computer Vision Theory and Applications (VISAPP 2017), 8 pages

arXiv:1703.04101 [pdf, other]

Evaluating Deep Convolutional Neural Networks for Material Classification

Authors: Grigorios Kalliatakis, Georgios Stamatiadis, Shoaib Ehsan, Ales Leonardis, Juergen Gall, Anca Sticlaru, Klaus D. McDonald-Maier

Abstract: Determining the material category of a surface from an image is a demanding task in perception that is drawing increasing attention. Following the recent remarkable results achieved for image classification and object detection utilising Convolutional Neural Networks (CNNs), we empirically study material classification of everyday objects employing these techniques. More specifically, we conduct a rigorous evaluation of how state-of-the-art CNN architectures compare on a common ground over widely used material databases. Experimental results on three challenging material databases show that the best performing CNN architectures can achieve up to 94.99% mean average precision when classifying materials.

Submitted 16 March, 2017; v1 submitted 12 March, 2017; originally announced March 2017.

Comments: In Proceedings of the 12th International Conference on Computer Vision Theory and Applications (VISAPP 2017), 7 pages

arXiv:1702.02089 [pdf]

A Statistical Model for Ideal Team Selection for A National Cricket Squad

Authors: Sadia Tasnim Swarna, Shamim Ehsan, Md. Saiful Islam

Abstract: Cricket is a game played between two teams which consists of eleven players each. Nowadays cricket game is becoming more and more popular in Bangladesh and other South Asian Countries. Before a match people are very enthusiastic about team squads and "Which players are playing today?", "How well will MR. X perform today?" are the million dollar questions before a big match. This article will propose a method using statistical data analysis for recommending a national team squad. Recent match scorecards for domestic and international matches played by a specific team in recent years are used to recommend the ideal squad. Impact point or rating points of all players in different conditions are calculated and the best ones from different categories are chosen to form optimal line-ups. To evaluate the efficiency of impact point system, it will be tested with real time match data to see how much accuracy it gives.

Submitted 27 January, 2017; originally announced February 2017.

Comments: 6 pages

arXiv:1701.08156 [pdf]

A Comprehensive Survey on Bengali Phoneme Recognition

Authors: Sadia Tasnim Swarna, Shamim Ehsan, Md. Saiful Islam, Marium E Jannat

Abstract: Hidden Markov model based various phoneme recognition methods for Bengali language is reviewed. Automatic phoneme recognition for Bengali language using multilayer neural network is reviewed. Usefulness of multilayer neural network over single layer neural network is discussed. Bangla phonetic feature table construction and enhancement for Bengali speech recognition is also discussed. Comparison among these methods is discussed.

Submitted 26 April, 2018; v1 submitted 27 January, 2017; originally announced January 2017.

Comments: 7 pages, reference added in phoneme recognition methods

arXiv:1605.06094 [pdf, ps, other]

Automatic Selection of the Optimal Local Feature Detector

Authors: Bruno Ferrarini, Shoaib Ehsan, Naveed Ur Rehman, Ales Leonardis, Klaus D. McDonald-Maier

Abstract: A large number of different feature detectors has been proposed so far. Any existing approach presents strengths and weaknesses, which make a detector optimal only for a limited range of applications. A tool capable of selecting the optimal feature detector in relation to the operating conditions is presented in this paper. The input images are quickly analyzed to determine what type of image transformation is applied to them and at which amount. Finally, the detector that is expected to obtain the highest repeatability under such conditions, is chosen to extract features from the input images. The efficiency and the good accuracy in determining the optimal feature detector for any operating condition, make the proposed tool suitable to be utilized in real visual applications. A large number of different local feature detectors have been proposed in the last few years. However, each feature detector has its own strengths and weaknesses that limit its use to a specific range of applications. In this paper is presented a tool capable of quickly analysing input images to determine which type and amount of transformation is applied to them and then selecting the optimal feature detector, which is expected to perform the best. The results show that the performance and the fast execution time render the proposed tool suitable for real-world vision applications.

Submitted 19 May, 2016; originally announced May 2016.

Comments: pre-print version

arXiv:1605.05791 [pdf]

A Generic Framework for Assessing the Performance Bounds of Image Feature Detectors

Authors: Shoaib Ehsan, Adrian F. Clark, Ales Leonardis, Naveed ur Rehman, Klaus D. McDonald-Maier

Abstract: Since local feature detection has been one of the most active research areas in computer vision during the last decade, a large number of detectors have been proposed. The interest in feature-based applications continues to grow and has thus rendered the task of characterizing the performance of various feature detection methods an important issue in vision research. Inspired by the good practices of electronic system design, a generic framework based on the repeatability measure is presented in this paper that allows assessment of the upper and lower bounds of detector performance and finds statistically significant performance differences between detectors as a function of image transformation amount by introducing a new variant of McNemar's test in an effort to design more reliable and effective vision systems. The proposed framework is then employed to establish operating and guarantee regions for several state-of-the-art detectors and to identify their statistical performance differences for three specific image transformations: JPEG compression, uniform light changes and blurring. The results are obtained using a newly acquired, large image database (20482 images) with 539 different scenes. These results provide new insights into the behaviour of detectors and are also useful from the vision systems design perspective.

Submitted 18 May, 2016; originally announced May 2016.

Comments: Journal version

arXiv:1510.05157 [pdf]

Performance Characterization of Image Feature Detectors in Relation to the Scene Content Utilizing a Large Image Database

Authors: Bruno Ferrarini, Shoaib Ehsan, Naveed Ur Rehman, Klaus D. McDonald-Maier

Abstract: Selecting the most suitable local invariant feature detector for a particular application has rendered the task of evaluating feature detectors a critical issue in vision research. No state-of-the-art image feature detector works satisfactorily under all types of image transformations. Although the literature offers a variety of comparison works focusing on performance evaluation of image feature detectors under several types of image transformation, the influence of the scene content on the performance of local feature detectors has received little attention so far. This paper aims to bridge this gap with a new framework for determining the type of scenes, which maximize and minimize the performance of detectors in terms of repeatability rate. Several state-of-the-art feature detectors have been assessed utilizing a large database of 12936 images generated by applying uniform light and blur changes to 539 scenes captured from the real world. The results obtained provide new insights into the behaviour of feature detectors.

Submitted 17 October, 2015; originally announced October 2015.

Comments: IWSSIP 2015

arXiv:1510.05156 [pdf]

Assessing The Performance Bounds Of Local Feature Detectors: Taking Inspiration From Electronics Design Practices

Authors: Shoaib Ehsan, Adrian F. Clark, Bruno Ferrarini, Naveed Ur Rehman, Klaus D. McDonald-Maier

Abstract: Since local feature detection has been one of the most active research areas in computer vision, a large number of detectors have been proposed. This has rendered the task of characterizing the performance of various feature detection methods an important issue in vision research. Inspired by the good practices of electronic system design, a generic framework based on the improved repeatability measure is presented in this paper that allows assessment of the upper and lower bounds of detector performance in an effort to design more reliable and effective vision systems. This framework is then employed to establish operating and guarantee regions for several state-of-the-art detectors for JPEG compression and uniform light changes. The results are obtained using a newly acquired, large image database (15092 images) with 539 different scenes. These results provide new insights into the behavior of detectors and are also useful from the vision systems design perspective.

Submitted 17 October, 2015; originally announced October 2015.

Comments: IWSSIP 2015

arXiv:1510.05145 [pdf]

doi 10.3390/s130810876

Rapid Online Analysis of Local Feature Detectors and Their Complementarity

Authors: Shoaib Ehsan, Adrian F. Clark, Klaus D. McDonald-Maier

Abstract: A vision system that can assess its own performance and take appropriate actions online to maximize its effectiveness would be a step towards achieving the long-cherished goal of imitating humans. This paper proposes a method for performing an online performance analysis of local feature detectors, the primary stage of many practical vision systems. It advocates the spatial distribution of local image features as a good performance indicator and presents a metric that can be calculated rapidly, concurs with human visual assessments and is complementary to existing offline measures such as repeatability. The metric is shown to provide a measure of complementarity for combinations of detectors, correctly reflecting the underlying principles of individual detectors. Qualitative results on well-established datasets for several state-of-the-art detectors are presented based on the proposed measure. Using a hypothesis testing approach and a newly-acquired, larger image database, statistically-significant performance differences are identified. Different detector pairs and triplets are examined quantitatively and the results provide a useful guideline for combining detectors in applications that require a reasonable spatial distribution of image features. A principled framework for combining feature detectors in these applications is also presented. Timing results reveal the potential of the metric for online applications.

Submitted 17 October, 2015; originally announced October 2015.

Journal ref: Sensors 2013, 13, 10876-10907

Showing 1–50 of 56 results for author: Ehsan, S