Search Results (90)

Search Parameters:
Keywords = visual place recognition

25 pages, 5315 KiB  
Article
Adaptive Feature Refinement and Weighted Similarity for Deep Loop Closure Detection in Appearance Variation
by Zhuolin Peng, Rujun Song, Hang Yang, Ying Li, Jiazhen Lin, Zhuoling Xiao and Bo Yan
Appl. Sci. 2024, 14(14), 6276; https://doi.org/10.3390/app14146276 - 18 Jul 2024
Viewed by 117
Abstract
Loop closure detection (LCD), also known as place recognition, is a crucial component of visual simultaneous localization and mapping (vSLAM) systems, aiding in the reduction of cumulative localization errors on a global scale. However, changes in environmental appearance and differing viewpoints pose significant challenges to the accuracy of the LCD algorithm. Addressing this issue, this paper presents a novel end-to-end framework (MetricNet) for LCD to enhance detection performance in complex scenes with distinct appearance variations. Focusing on deep features with high distinguishability, an attention-based Channel Weighting Module (CWM) is designed to adaptively detect salient regions of interest. In addition, a patch-by-patch Similarity Measurement Module (SMM) is incorporated to steer the network for handling challenging situations that tend to cause perceptual aliasing. Experiments on three typical datasets have demonstrated MetricNet’s appealing detection performance and generalization ability compared to many state-of-the-art learning-based methods, where the mean average precision is increased by up to 11.92%, 18.10%, and 5.33%, respectively. Moreover, the detection results on additional open datasets with apparent viewpoint variations and the odometry dataset for localization problems have also revealed the dependability of MetricNet under different adaptation scenarios. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
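
The channel-weighting idea described in this abstract can be pictured as a small attention block that re-weights feature-map channels by their global context. The PyTorch sketch below is only a generic illustration (the squeeze-and-excitation style, the reduction ratio, and the module name are assumptions, not MetricNet's actual CWM design):

```python
import torch
import torch.nn as nn

class ChannelWeighting(nn.Module):
    """Generic channel-attention block: re-weights channels using global context."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global spatial average
        self.fc = nn.Sequential(                     # excitation: per-channel weights in (0, 1)
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                  # emphasize salient channels

feats = torch.randn(2, 256, 28, 28)                   # e.g. a CNN feature map
weighted = ChannelWeighting(256)(feats)
print(weighted.shape)                                  # torch.Size([2, 256, 28, 28])
```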

21 pages, 1906 KiB  
Article
BinVPR: Binary Neural Networks towards Real-Valued for Visual Place Recognition
by Junshuai Wang, Junyu Han, Ruifang Dong and Jiangming Kan
Sensors 2024, 24(13), 4130; https://doi.org/10.3390/s24134130 - 25 Jun 2024
Viewed by 610
Abstract
Visual Place Recognition (VPR) aims to determine whether a robot or visual navigation system is located in a previously visited place using visual information. It is an essential technology and a challenging problem in the computer vision and robotics communities. Recently, numerous works have demonstrated that the performance of Convolutional Neural Network (CNN)-based VPR is superior to that of traditional methods. However, with a huge number of parameters, large memory storage is necessary for these CNN models. It is a great challenge for mobile robot platforms equipped with limited resources. Fortunately, Binary Neural Networks (BNNs) can reduce memory consumption by converting weights and activation values from 32-bit into 1-bit. But current BNNs always suffer from vanishing gradients and a marked drop in accuracy. Therefore, this work proposed a BinVPR model to handle this issue. The solution is twofold. Firstly, a feature restoration strategy was explored to add features into the latter convolutional layers to further solve the gradient-vanishing problem during the training process. Moreover, we identified two principles to address gradient vanishing: restoring basic features and restoring basic features from higher to lower layers. Secondly, considering that the marked drop in accuracy results from gradient mismatch during backpropagation, this work optimized the combination of binarized activation and binarized weight functions in the Larq framework, and the best combination was obtained. The performance of BinVPR was validated on public datasets. The experimental results show that it outperforms state-of-the-art BNN-based approaches and full-precision networks of AlexNet and ResNet in terms of both recognition accuracy and model size. It is worth mentioning that BinVPR achieves the same accuracy with only 1% and 4.6% of the model sizes of AlexNet and ResNet, respectively. Full article
(This article belongs to the Section Navigation and Positioning)
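
The 1-bit weights and activations mentioned in this abstract are typically obtained with a sign function whose gradient is approximated by a straight-through estimator (STE). The sketch below shows that mechanism in PyTorch; it is a generic illustration, not the authors' Larq-based implementation, and the clipping range is an assumption:

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Forward: sign(x) in {-1, +1}. Backward: straight-through estimator,
    passing gradients only where |x| <= 1 (the usual hard-tanh clip)."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)

x = torch.randn(4, requires_grad=True)
y = BinarizeSTE.apply(x)
y.sum().backward()
print(y, x.grad)   # binary outputs; gradients pass only inside the clip range
```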

16 pages, 1345 KiB  
Article
A Haptic Braille Keyboard Layout for Smartphone Applications
by Georgios Voutsakelis, Nikolaos Tzimos, Georgios Kokkonis and Sotirios Kontogiannis
Electronics 2024, 13(12), 2408; https://doi.org/10.3390/electronics13122408 - 20 Jun 2024
Viewed by 484
Abstract
Though most people are capable of performing many tasks regardless of cognitive or physical challenges, some individuals, especially those with visual impairments, must rely on others to perform even basic tasks. The chance of them interacting with a computing device is minimal, except for speech recognition technology, which is quite complicated. Additionally, it has become apparent that mainstream devices are gaining more acceptance among people with vision problems compared to traditional assistive devices. To address this, we developed the Haptic Braille Keyboard Android application to help vision-impaired users interact more easily with devices such as smartphones and tablets. The academic novelty of the application lies in its customization capabilities, which maximize the Quality of Experience for the user. The application allows users to place the Braille buttons in their desired layout for convenience. Users can move and position the virtual buttons on the screen to create a layout for text entry based on the Braille writing system. For this purpose, we conducted extensive testing and experimentation to determine which of the two commonly used Braille layouts is most user-friendly. This work can help visually impaired users interact with smartphones and tablets more easily and independently, making communication less challenging. Full article
(This article belongs to the Special Issue Haptic Systems and the Tactile Internet: Design and Applications)

17 pages, 4647 KiB  
Article
Fine Segmentation of Chinese Character Strokes Based on Coordinate Awareness and Enhanced BiFPN
by Henghui Mo and Linjing Wei
Sensors 2024, 24(11), 3480; https://doi.org/10.3390/s24113480 - 28 May 2024
Viewed by 485
Abstract
Considering the complex structure of Chinese characters, particularly the connections and intersections between strokes, there are challenges in low accuracy of Chinese character stroke extraction and recognition, as well as unclear segmentation. This study builds upon the YOLOv8n-seg model to propose the YOLOv8n-seg-CAA-BiFPN Chinese character stroke fine segmentation model. The proposed Coordinate-Aware Attention mechanism (CAA) divides the backbone network input feature map into four parts, applying different weights for horizontal, vertical, and channel attention to compute and fuse key information, thus capturing the contextual regularity of closely arranged stroke positions. The network’s neck integrates an enhanced weighted bi-directional feature pyramid network (BiFPN), enhancing the fusion effect for features of strokes of various sizes. The Shape-IoU loss function is adopted in place of the traditional CIoU loss function, focusing on the shape and scale of stroke bounding boxes to optimize the bounding box regression process. Finally, the Grad-CAM++ technique is used to generate heatmaps of segmentation predictions, facilitating the visualization of effective features and a deeper understanding of the model’s focus areas. Trained and tested on the public Chinese character stroke datasets CCSE-Kai and CCSE-HW, the model achieves an average accuracy of 84.71%, an average recall rate of 83.65%, and a mean average precision of 80.11%. Compared to the original YOLOv8n-seg and existing mainstream segmentation models like SegFormer, BiSeNetV2, and Mask R-CNN, the average accuracy improved by 3.50%, 4.35%, 10.56%, and 22.05%, respectively; the average recall rates improved by 4.42%, 9.32%, 15.64%, and 24.92%, respectively; and the mean average precision improved by 3.11%, 4.15%, 8.02%, and 19.33%, respectively. The results demonstrate that the YOLOv8n-seg-CAA-BiFPN network can accurately achieve Chinese character stroke segmentation. Full article
(This article belongs to the Section Sensor Networks)
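
The "enhanced weighted bi-directional feature pyramid" mentioned in this abstract fuses features from several levels with learnable, non-negative weights that are normalized before summation (the fast normalized fusion of the original BiFPN). A minimal PyTorch sketch of that fusion step, assuming the inputs have already been resized to a common resolution and channel count:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    """Fast normalized fusion: out = sum_i (w_i * x_i) / (sum_i w_i + eps), w_i >= 0."""
    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, inputs):
        w = F.relu(self.weights)                      # keep the fusion weights non-negative
        w = w / (w.sum() + self.eps)                  # normalize so the weights sum to ~1
        return sum(wi * x for wi, x in zip(w, inputs))

# Three feature maps already aligned to the same shape (e.g. after up/down-sampling).
p3, p4, p5 = (torch.randn(1, 64, 40, 40) for _ in range(3))
fused = WeightedFusion(3)([p3, p4, p5])
print(fused.shape)                                    # torch.Size([1, 64, 40, 40])
```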

15 pages, 6287 KiB  
Article
Research on Improved Road Visual Navigation Recognition Method Based on DeepLabV3+ in Pitaya Orchard
by Lixue Zhu, Wenqian Deng, Yingjie Lai, Xiaogeng Guo and Shiang Zhang
Agronomy 2024, 14(6), 1119; https://doi.org/10.3390/agronomy14061119 - 24 May 2024
Cited by 1 | Viewed by 520
Abstract
Traditional DeepLabV3+ image semantic segmentation methods face challenges in pitaya orchard environments characterized by multiple interference factors, complex image backgrounds, high computational complexity, and extensive memory consumption. This paper introduces an improved visual navigation path recognition method for pitaya orchards. Initially, DeepLabV3+ utilizes a lightweight MobileNetV2 as its primary feature extraction backbone, which is augmented with a Pyramid Split Attention (PSA) module placed after the Atrous Spatial Pyramid Pooling (ASPP) module. This improvement enhances the spatial feature representation of feature maps, thereby sharpening the segmentation boundaries. Additionally, an Efficient Channel Attention Network (ECANet) mechanism is integrated with the lower-level features of MobileNetV2 to reduce computational complexity and refine the clarity of target boundaries. The paper also designs a navigation path extraction algorithm, which fits the road mask regions segmented by the model to achieve precise navigation path recognition. Experimental findings show that the enhanced DeepLabV3+ model achieved a Mean Intersection over Union (MIoU) and average pixel accuracy of 95.79% and 97.81%, respectively. These figures represent increases of 0.59 and 0.41 percentage points when contrasted with the original model. Furthermore, the model’s memory consumption is reduced by 85.64%, 84.70%, and 85.06% when contrasted with the Pyramid Scene Parsing Network (PSPNet), U-Net, and Fully Convolutional Network (FCN) models, respectively. This reduction makes the proposed model more efficient while maintaining high segmentation accuracy, thus supporting enhanced operational efficiency in practical applications. The test results for navigation path recognition accuracy reveal that the angle error between the navigation centerline extracted using the least squares method and the manually fitted centerline is less than 5°. Additionally, the average deviation between the road centerlines extracted under three different lighting conditions and the actual road centerline is only 2.66 pixels, with an average image recognition time of 0.10 s. This performance suggests that the study can provide an effective reference for visual navigation in smart agriculture. Full article
(This article belongs to the Special Issue The Applications of Deep Learning in Smart Agriculture)
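
The navigation path extraction step described in this abstract can be pictured as follows: take the segmented road mask, compute the horizontal centre of the road pixels in each image row, and fit a line to those centres by least squares. A minimal NumPy sketch of that idea (the row-wise centroid strategy and the first-order fit are assumptions, not the paper's exact algorithm):

```python
import numpy as np

def fit_centerline(road_mask: np.ndarray):
    """road_mask: (H, W) binary array, 1 = road. Returns (slope, intercept)
    of the least-squares line x = slope * y + intercept through row centroids."""
    rows, centers = [], []
    for y in range(road_mask.shape[0]):
        xs = np.flatnonzero(road_mask[y])
        if xs.size:                                # skip rows with no road pixels
            rows.append(y)
            centers.append(xs.mean())              # horizontal centre of the road in this row
    return np.polyfit(rows, centers, deg=1)        # least-squares first-order fit

# Toy mask: a vertical road band that drifts to the right further down the image.
mask = np.zeros((100, 120), dtype=np.uint8)
for y in range(100):
    mask[y, 40 + y // 5 : 60 + y // 5] = 1
slope, intercept = fit_centerline(mask)
print(round(slope, 3), round(intercept, 1))
```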

20 pages, 5360 KiB  
Article
An Appearance-Semantic Descriptor with Coarse-to-Fine Matching for Robust VPR
by Jie Chen, Wenbo Li, Pengshuai Hou, Zipeng Yang and Haoyu Zhao
Sensors 2024, 24(7), 2203; https://doi.org/10.3390/s24072203 - 29 Mar 2024
Viewed by 603
Abstract
In recent years, semantic segmentation has made significant progress in visual place recognition (VPR) by using semantic information that is relatively invariant to appearance and viewpoint, demonstrating great potential. However, in some extreme scenarios, there may be semantic occlusion and semantic sparsity, which can lead to confusion when relying solely on semantic information for localization. Therefore, this paper proposes a novel VPR framework that employs a coarse-to-fine image matching strategy, combining semantic and appearance information to improve algorithm performance. First, we construct SemLook global descriptors using semantic contours, which can preliminarily screen images to enhance the accuracy and real-time performance of the algorithm. Based on this, we introduce SemLook local descriptors for fine screening, combining robust appearance information extracted by deep learning with semantic information. These local descriptors can address issues such as semantic overlap and sparsity in urban environments, further improving the accuracy of the algorithm. Through this refined screening process, we can effectively handle the challenges of complex image matching in urban environments and obtain more accurate results. The performance of SemLook descriptors is evaluated on three public datasets (Extended-CMU Season, Robot-Car Seasons v2, and SYNTHIA) and compared with six state-of-the-art VPR algorithms (HOG, CoHOG, AlexNet_VPR, Region VLAD, Patch-NetVLAD, Forest). In the experimental comparison, considering both real-time performance and evaluation metrics, the SemLook descriptors are found to outperform the other six algorithms. Evaluation metrics include the area under the curve (AUC) based on the precision–recall curve, Recall@100%Precision, and Precision@100%Recall. On the Extended-CMU Season dataset, SemLook descriptors achieve a 100% AUC value, and on the SYNTHIA dataset, they achieve a 99% AUC value, demonstrating outstanding performance. The experimental results indicate that introducing global descriptors for initial screening and utilizing local descriptors combining both semantic and appearance information for precise matching can effectively address the issue of location recognition in scenarios with semantic ambiguity or sparsity. This algorithm enhances descriptor performance, making it more accurate and robust in scenes with variations in appearance and viewpoint. Full article
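
The coarse-to-fine matching strategy described in this abstract boils down to: rank the database with cheap global descriptors, keep a handful of candidates, then re-score only those candidates with more expensive local matching. A schematic NumPy sketch under those assumptions (the cosine-similarity global stage and the placeholder local scorer are illustrative, not the SemLook descriptors themselves):

```python
import numpy as np

def coarse_to_fine(query_glob, db_glob, local_score, top_k=10):
    """query_glob: (D,), db_glob: (N, D) L2-normalized global descriptors.
    local_score(i) -> float is the fine (local-descriptor) score for candidate i."""
    sims = db_glob @ query_glob                    # coarse stage: cosine similarity
    candidates = np.argsort(-sims)[:top_k]         # keep the best-k candidates
    fine = [(i, local_score(i)) for i in candidates]
    return max(fine, key=lambda t: t[1])           # best match after fine re-scoring

rng = np.random.default_rng(0)
db = rng.normal(size=(100, 64)); db /= np.linalg.norm(db, axis=1, keepdims=True)
q = db[42] + 0.05 * rng.normal(size=64); q /= np.linalg.norm(q)
best_idx, best_score = coarse_to_fine(q, db, local_score=lambda i: float(db[i] @ q))
print(best_idx)   # expected: 42
```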

20 pages, 4429 KiB  
Article
A Novel LIBS Sensor for Sample Examinations on a Crime Scene
by Violeta Lazic, Fabrizio Andreoli, Salvatore Almaviva, Marco Pistilli, Ivano Menicucci, Christian Ulrich, Frank Schnürer and Roberto Chirico
Sensors 2024, 24(5), 1469; https://doi.org/10.3390/s24051469 - 24 Feb 2024
Viewed by 1336
Abstract
In this work, we present a compact LIBS sensor developed for the characterization of samples on a crime scene, following the requirements of law enforcement agencies involved in the project. The sensor operates both in a tabletop mode, for bench measurements of swabbed materials or collected fragments, and in a handheld mode, where the sensor head is pointed directly at targets at the scene. The sensor head is connected via an umbilical to an instrument box that can be battery-powered and also contains a color camera for sample visualization, illumination LEDs, and a pointing system for bringing the target into focus. Here we describe the sensor’s architecture and functionalities, the optimization of the acquisition parameters, and the results of some LIBS measurements. On nano-plotted traces on a silica wafer and under optimized conditions, the detection limits for most of the elements, in terms of absolute element mass, were found to be below 10 picograms. We also show results obtained on some representative materials, such as fingerprints, swabbed soil and gunshot residue, varnishes on metal, and coated plastics. The latter, solid samples were used to evaluate the depth profiling capabilities of the instrument, where the recognition of all four car paint layers was achieved. Full article
(This article belongs to the Special Issue Recent Trends and Advances in Laser Spectroscopy and Sensing)

18 pages, 41901 KiB  
Article
SVS-VPR: A Semantic Visual and Spatial Information-Based Hierarchical Visual Place Recognition for Autonomous Navigation in Challenging Environmental Conditions
by Saba Arshad and Tae-Hyoung Park
Sensors 2024, 24(3), 906; https://doi.org/10.3390/s24030906 - 30 Jan 2024
Viewed by 868
Abstract
Robust visual place recognition (VPR) enables mobile robots to identify previously visited locations. For this purpose, the extracted visual information and the place matching method play a significant role. In this paper, we critically review the existing VPR methods and group them into three major categories based on the visual information used, i.e., handcrafted features, deep features, and semantics. Focusing on the benefits of convolutional neural networks (CNNs) and semantics, and the limitations of existing research, we propose a robust appearance-based place recognition method, termed SVS-VPR, which is implemented as a hierarchical model consisting of two major components: global scene-based and local feature-based matching. The global scene semantics are extracted and compared with pre-visited images to filter the match candidates while reducing the search space and computational cost. The local feature-based matching involves the extraction of robust local features from a CNN possessing invariant properties against environmental conditions and a place matching method utilizing semantic, visual, and spatial information. SVS-VPR is evaluated on publicly available benchmark datasets using the true positive detection rate, recall at 100% precision, and area under the curve. Experimental findings demonstrate that SVS-VPR surpasses several state-of-the-art deep learning-based methods, boosting robustness against significant changes in viewpoint and appearance while maintaining efficient matching time performance. Full article
(This article belongs to the Section Sensing and Imaging)
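
The evaluation metrics mentioned in this abstract (recall at 100% precision and area under the precision–recall curve) can be computed directly from match scores and ground-truth labels. A small sketch with scikit-learn, using synthetic scores purely for illustration:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc

rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=200)                      # 1 = true place match
scores = labels * 0.6 + rng.random(200) * 0.5              # synthetic similarity scores

precision, recall, _ = precision_recall_curve(labels, scores)
pr_auc = auc(recall, precision)                            # area under the PR curve
recall_at_100p = recall[precision == 1.0].max()            # recall at 100% precision
print(f"AUC = {pr_auc:.3f}, Recall@100%Precision = {recall_at_100p:.3f}")
```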

13 pages, 839 KiB  
Article
Contextual Patch-NetVLAD: Context-Aware Patch Feature Descriptor and Patch Matching Mechanism for Visual Place Recognition
by Wenyuan Sun, Wentang Chen, Runxiang Huang and Jing Tian
Sensors 2024, 24(3), 855; https://doi.org/10.3390/s24030855 - 28 Jan 2024
Cited by 1 | Viewed by 926
Abstract
The goal of visual place recognition (VPR) is to determine the location of a query image by identifying its place in a collection of image databases. Visual sensor technologies are crucial for visual place recognition as they allow for precise identification and location of query images within a database. Global descriptor-based VPR methods face the challenge of accurately capturing the local specific regions within a scene; consequently, it leads to an increasing probability of confusion during localization in such scenarios. To tackle feature extraction and feature matching challenges in VPR, we propose a modified patch-NetVLAD strategy that includes two new modules: a context-aware patch descriptor and a context-aware patch matching mechanism. Firstly, we propose a context-driven patch feature descriptor to overcome the limitations of global and local descriptors in visual place recognition. This descriptor aggregates features from each patch’s surrounding neighborhood. Secondly, we introduce a context-driven feature matching mechanism that utilizes cluster and saliency context-driven weighting rules to assign higher weights to patches that are less similar to densely populated or locally similar regions for improved localization performance. We further incorporate both of these modules into the patch-NetVLAD framework, resulting in a new approach called contextual patch-NetVLAD. Experimental results are provided to show that our proposed approach outperforms other state-of-the-art methods to achieve a Recall@10 score of 99.82 on Pittsburgh30k, 99.82 on FMDataset, and 97.68 on our benchmark dataset. Full article
(This article belongs to the Special Issue Vision Sensors: Image Processing Technologies and Applications)
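
The context-aware patch descriptor described in this abstract aggregates each patch's descriptor with those of its surrounding neighbourhood. One simple way to picture this is averaging over a k x k window of neighbouring patch descriptors; the PyTorch sketch below shows only that aggregation, with the window size and averaging scheme as assumptions rather than the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def contextual_patch_descriptors(patch_desc: torch.Tensor, k: int = 3) -> torch.Tensor:
    """patch_desc: (B, D, H, W) grid of per-patch descriptors.
    Returns descriptors where each patch is averaged with its k x k neighbourhood,
    then re-normalized to unit length."""
    ctx = F.avg_pool2d(patch_desc, kernel_size=k, stride=1, padding=k // 2)
    return F.normalize(ctx, dim=1)                 # L2-normalize along the descriptor dim

grid = torch.randn(1, 128, 14, 14)                 # e.g. 14 x 14 patches, 128-D each
ctx_desc = contextual_patch_descriptors(grid)
print(ctx_desc.shape)                               # torch.Size([1, 128, 14, 14])
```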

15 pages, 2566 KiB  
Article
A Low-Cost Inertial Measurement Unit Motion Capture System for Operation Posture Collection and Recognition
by Mingyue Yin, Jianguang Li and Tiancong Wang
Sensors 2024, 24(2), 686; https://doi.org/10.3390/s24020686 - 21 Jan 2024
Viewed by 1444
Abstract
In factories, human posture recognition facilitates human–machine collaboration, human risk management, and workflow improvement. Compared to optical sensors, inertial sensors have the advantages of portability and resistance to obstruction, making them suitable for factories. However, existing product-level inertial sensing solutions are generally expensive. This paper proposes a low-cost human motion capture system based on the BMI160, a type of six-axis inertial measurement unit (IMU). The collected data are transmitted over Wi-Fi and processed to obtain the rotation angles of the human joints around the XYZ axes and their displacements along the XYZ directions; these are then combined with the hierarchical relationship of the human skeleton to calculate the real-time human posture. Furthermore, a digital human model was established in Unity3D to synchronously visualize and present human movements. We simulated assembly operations in a virtual reality environment for human posture data collection and posture recognition experiments. Six inertial sensors were placed on the chest, waist, knee joints, and ankle joints of both legs. A total of 16,067 labeled samples were obtained for posture recognition model training, and the accumulated displacements and rotation angles of the six joints in the three directions were used as input features. The bi-directional long short-term memory (BiLSTM) model was used to identify seven common operation postures: standing, slightly bending, deep bending, half-squatting, squatting, sitting, and supine, with an average accuracy of 98.24%. According to the experimental results, the proposed method could be used to develop a low-cost and effective solution to human posture recognition for factory operations. Full article
(This article belongs to the Special Issue Advanced Sensors for Real-Time Monitoring Applications ‖)
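
The BiLSTM posture classifier described in this abstract takes, at each time step, the accumulated displacements and rotation angles of the six joints in three directions (6 x 3 x 2 = 36 features, a derivation from the abstract rather than a stated figure) and outputs one of seven posture classes. A minimal PyTorch sketch of such a model; the layer sizes are assumptions, only the input/output dimensions follow the text:

```python
import torch
import torch.nn as nn

class PostureBiLSTM(nn.Module):
    """Bi-directional LSTM over an IMU sequence, classifying from the last time step."""
    def __init__(self, n_features=36, hidden=64, n_classes=7):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)    # 2x: forward + backward states

    def forward(self, x):                               # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])                    # logits for the 7 postures

seq = torch.randn(8, 50, 36)                            # 8 windows of 50 time steps
logits = PostureBiLSTM()(seq)
print(logits.shape)                                      # torch.Size([8, 7])
```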

38 pages, 7240 KiB  
Article
Challenges of Engineering Applications of Descriptive Geometry
by Zsuzsa Balajti
Symmetry 2024, 16(1), 50; https://doi.org/10.3390/sym16010050 - 29 Dec 2023
Cited by 1 | Viewed by 1889
Abstract
Descriptive geometry has indispensable applications in many engineering activities. A summary of these is provided in the first chapter of this paper, preceded by a brief introduction into the methods of representation and mathematical recognition related to our research area, such as projection perpendicular to a single plane, projection images created by perpendicular projection onto two mutually perpendicular image planes, but placed on one plane, including the research of curves and movements, visual representation and perception relying on a mathematical approach, and studies on toothed driving pairs and tool geometry in order to place the development presented here among them. As a result of the continuous variability of the technological environment according to various optimization aspects, the engineering activities must also be continuously adapted to the changes, for which an appropriate approach and formulation are required from the practitioners of descriptive geometry, and can even lead to improvement in the field of descriptive geometry. The imaging procedures are always based on the methods and theorems of descriptive geometry. Our aim was to examine the spatial variation in the wear of the tool edge and the machining of the components of toothed drive pairs using two cameras. Resolving contradictions in spatial geometry reconstruction research is a constant challenge, to which a possible answer in many cases is the searching for the right projection direction, and positioning cameras appropriately. A special method of enumerating the possible infinite viewpoints for the reconstruction of tool surface edge curves is presented in the second part of this paper. In the case of the monitoring the shape geometry, taking into account the interchangeability of the projection directions, i.e., the property of symmetry, all images made from two perpendicular directions were taken into account. The procedure for determining the correct directions in a mathematically exact way is also presented through examples. A new criterion was formulated for the tested tooth edge of the hob to take into account the shading of the tooth next to it. The analysis and some of the results of the Monge mapping, suitable for the solution of a mechanical engineering task to be solved in a specific technical environment, namely defining the conditions for camera placements that ensure reconstructibility are also presented. Taking physical shadowing into account, conclusions can be drawn about the degree of distortion of the machined surface from the spatial deformation of the edge curve of the tool reconstructed with correctly positioned cameras. Full article

20 pages, 4464 KiB  
Article
Feature-Based Place Recognition Using Forward-Looking Sonar
by Ana Rita Gaspar and Aníbal Matos
J. Mar. Sci. Eng. 2023, 11(11), 2198; https://doi.org/10.3390/jmse11112198 - 19 Nov 2023
Cited by 1 | Viewed by 1119
Abstract
Some structures in the harbour environment need to be inspected regularly. However, these scenarios present a major challenge for the accurate estimation of a vehicle’s position and subsequent recognition of similar images. In these scenarios, visibility can be poor, making place recognition a difficult task as the visual appearance of a local feature can be compromised. Under these operating conditions, imaging sonars are a promising solution. The quality of the captured images is affected by some factors but they do not suffer from haze, which is an advantage. Therefore, a purely acoustic approach for unsupervised recognition of similar images based on forward-looking sonar (FLS) data is proposed to solve the perception problems in harbour facilities. To simplify the variation of environment parameters and sensor configurations, and given the need for online data for these applications, a harbour scenario was recreated using the Stonefish simulator. Therefore, experiments were conducted with preconfigured user trajectories to simulate inspections in the vicinity of structures. The place recognition approach performs better than the results obtained from optical images. The proposed method provides a good compromise in terms of distinctiveness, achieving 87.5% recall considering appropriate constraints and assumptions for this task given its impact on navigation success. That is, it is based on a similarity threshold of 0.3 and 12 consistent features to consider only effective loops. The behaviour of FLS is the same regardless of the environment conditions and thus this work opens new horizons for the use of these sensors as a great aid for underwater perception, namely, to avoid degradation of navigation performance in muddy conditions. Full article
(This article belongs to the Special Issue Underwater Engineering and Image Processing)
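
The acceptance rule quoted in this abstract (a similarity of at least 0.3 and at least 12 consistent features) can be sketched as a simple decision on top of any feature matcher. The OpenCV-based example below uses ORB with brute-force Hamming matching and defines similarity as the fraction of matched keypoints; that similarity definition and the choice of ORB are assumptions for illustration, not the descriptors used in the paper:

```python
import cv2
import numpy as np

def is_loop(img_a, img_b, sim_thresh=0.3, min_matches=12):
    """Return True if two sonar frames are judged to show the same place."""
    orb = cv2.ORB_create()
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return False
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)   # mutually consistent matches
    matches = matcher.match(des_a, des_b)
    similarity = len(matches) / max(1, min(len(kp_a), len(kp_b)))
    return similarity >= sim_thresh and len(matches) >= min_matches

a = np.random.randint(0, 255, (256, 256), dtype=np.uint8)
print(is_loop(a, a))   # identical frames: expected to be accepted when keypoints are found
```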

11 pages, 2348 KiB  
Article
Visual Place Recognition of Robots via Global Features of Scan-Context Descriptors with Dictionary-Based Coding
by Minying Ye and Kanji Tanaka
Appl. Sci. 2023, 13(15), 9040; https://doi.org/10.3390/app13159040 - 7 Aug 2023
Viewed by 1210
Abstract
Self-localization is a crucial requirement for visual robot place recognition. Particularly, the 3D point cloud obtained from 3D laser rangefinders (LRF) is applied to it. The critical part is the efficiency and accuracy of place recognition of visual robots based on the 3D point cloud. The current solution is converting the 3D point clouds to 2D images, and then processing these with a convolutional neural network (CNN) classification. Although the popular scan-context descriptor obtained from the 3D data can retain parts of the 3D point cloud characteristics, its accuracy is slightly low. This is because the scan-context image under the adjacent label inclines to be confusing. This study reclassifies the image according to the CNN global features through image feature extraction. In addition, the dictionary-based coding is leveraged to construct the retrieval dataset. The experiment was conducted on the North-Campus-Long-Term (NCLT) dataset under four-seasons conditions. The results show that the proposed method is superior compared to the other methods without real-time Global Positioning System (GPS) information. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
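
The scan-context descriptor mentioned in this abstract converts a 3D scan into a 2D image by binning points into rings (radial bins) and sectors (angular bins) around the sensor and keeping, for example, the maximum point height in each bin. A compact NumPy sketch of that conversion (bin counts and maximum range are assumptions):

```python
import numpy as np

def scan_context(points, n_rings=20, n_sectors=60, max_range=80.0):
    """points: (N, 3) array of x, y, z. Returns an (n_rings, n_sectors) image whose
    cells hold the maximum z of the points in each ring/sector bin (empty bins stay 0)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.hypot(x, y)
    theta = np.arctan2(y, x) + np.pi                       # angle in [0, 2*pi]
    keep = r < max_range
    ring = np.minimum((r[keep] / max_range * n_rings).astype(int), n_rings - 1)
    sector = np.minimum((theta[keep] / (2 * np.pi) * n_sectors).astype(int), n_sectors - 1)
    desc = np.zeros((n_rings, n_sectors))
    np.maximum.at(desc, (ring, sector), z[keep])           # max height per bin
    return desc

cloud = np.random.uniform(-50, 50, size=(10000, 3))
print(scan_context(cloud).shape)                            # (20, 60)
```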

25 pages, 6279 KiB  
Article
Optimizing Appearance-Based Localization with Catadioptric Cameras: Small-Footprint Models for Real-Time Inference on Edge Devices
by Marta Rostkowska and Piotr Skrzypczyński
Sensors 2023, 23(14), 6485; https://doi.org/10.3390/s23146485 - 18 Jul 2023
Cited by 2 | Viewed by 924
Abstract
This paper considers the task of appearance-based localization: visual place recognition from omnidirectional images obtained from catadioptric cameras. The focus is on designing an efficient neural network architecture that accurately and reliably recognizes indoor scenes on distorted images from a catadioptric camera, even in self-similar environments with few discernible features. As the target application is the global localization of a low-cost service mobile robot, the proposed solutions are optimized toward being small-footprint models that provide real-time inference on edge devices, such as Nvidia Jetson. We compare several design choices for the neural network-based architecture of the localization system and then demonstrate that the best results are achieved with embeddings (global descriptors) yielded by exploiting transfer learning and fine tuning on a limited number of catadioptric images. We test our solutions on two small-scale datasets collected using different catadioptric cameras in the same office building. Next, we compare the performance of our system to state-of-the-art visual place recognition systems on the publicly available COLD Freiburg and Saarbrücken datasets that contain images collected under different lighting conditions. Our system compares favourably to the competitors both in terms of the accuracy of place recognition and the inference time, providing a cost- and energy-efficient means of appearance-based localization for an indoor service robot. Full article
(This article belongs to the Special Issue Sensors for Robots II)
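
The transfer-learning-plus-fine-tuning recipe summarized in this abstract amounts to taking a backbone pretrained on a large dataset, replacing its classification head with an embedding layer, fine-tuning on a small set of catadioptric images, and then matching places by comparing embeddings. A generic torchvision sketch of that recipe (the ResNet-18 backbone, the embedding size, and cosine-similarity matching are assumptions, not the small-footprint models evaluated in the paper; loading the pretrained weights requires a network connection):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

# Pretrained backbone; the final classifier is replaced by a small embedding head.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Linear(backbone.fc.in_features, 128)      # 128-D place embedding

def embed(images: torch.Tensor) -> torch.Tensor:
    """images: (B, 3, 224, 224), normalized. Returns L2-normalized embeddings."""
    backbone.eval()
    with torch.no_grad():
        return F.normalize(backbone(images), dim=1)

query, reference = torch.randn(1, 3, 224, 224), torch.randn(5, 3, 224, 224)
scores = embed(query) @ embed(reference).T                  # cosine similarities
print(int(scores.argmax()))                                  # index of the best-matching place
```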

15 pages, 6974 KiB  
Article
Video-Based Human Activity Recognition Using Deep Learning Approaches
by Guilherme Augusto Silva Surek, Laio Oriel Seman, Stefano Frizzo Stefenon, Viviana Cocco Mariani and Leandro dos Santos Coelho
Sensors 2023, 23(14), 6384; https://doi.org/10.3390/s23146384 - 13 Jul 2023
Cited by 14 | Viewed by 8457
Abstract
Due to its capacity to gather vast, high-level data about human activity from wearable or stationary sensors, human activity recognition substantially impacts people’s day-to-day lives. Multiple people and things may be seen acting in the video, dispersed throughout the frame in various places. Because of this, modeling the interactions between many entities in spatial dimensions is necessary for visual reasoning in the action recognition task. The main aim of this paper is to evaluate and map the current scenario of human actions in red, green, and blue videos, based on deep learning models. A residual network (ResNet) and a vision transformer architecture (ViT) with a semi-supervised learning approach are evaluated. The DINO (self-DIstillation with NO labels) is used to enhance the potential of the ResNet and ViT. The evaluated benchmark is the human motion database (HMDB51), which tries to better capture the richness and complexity of human actions. The obtained results for video classification with the proposed ViT are promising based on performance metrics and results from the recent literature. The results obtained using a bi-dimensional ViT with long short-term memory demonstrated great performance in human action recognition when applied to the HMDB51 dataset. The mentioned architecture presented 96.7 ± 0.35% and 41.0 ± 0.27% in terms of accuracy (mean ± standard deviation values) in the train and test phases of the HMDB51 dataset, respectively. Full article
(This article belongs to the Section Sensing and Imaging)
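
The "bi-dimensional ViT with long short-term memory" combination described in this abstract can be pictured as: encode each video frame independently with an image backbone, then run an LSTM over the per-frame features and classify the sequence. A minimal PyTorch sketch of that pattern (the tiny CNN stand-in for the ViT encoder, the feature size, and the 51 HMDB51 classes wired in here are assumptions for illustration):

```python
import torch
import torch.nn as nn

class FrameLSTMClassifier(nn.Module):
    """Per-frame encoder followed by an LSTM over time and a sequence-level classifier."""
    def __init__(self, feat_dim=256, hidden=128, n_classes=51):
        super().__init__()
        self.encoder = nn.Sequential(                    # stand-in for a ViT/CNN frame encoder
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, video):                            # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        feats = self.encoder(video.flatten(0, 1)).view(b, t, -1)   # encode every frame
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])                     # class logits from the last step

clip = torch.randn(2, 16, 3, 112, 112)                   # 2 clips of 16 frames
print(FrameLSTMClassifier()(clip).shape)                  # torch.Size([2, 51])
```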
