
Search Results (1,730)

Search Parameters:
Keywords = Vision Transformer

10 pages, 619 KiB  
Systematic Review
The Current Role of Single-Site Robotic Approach in Liver Resection: A Systematic Review
by Simone Guadagni, Annalisa Comandatore, Niccolò Furbetta, Gregorio Di Franco, Bianca Bechini, Filippo Vagelli, Niccolò Ramacciotti, Matteo Palmeri, Giulio Di Candio, Elisa Giovannetti and Luca Morelli
Life 2024, 14(7), 894; https://doi.org/10.3390/life14070894 - 19 Jul 2024
Abstract
Background: Liver resection is a critical surgical procedure for treating various hepatic pathologies. Minimally invasive approaches have gradually gained importance, and, in recent years, the introduction of robotic surgery has transformed the surgical landscape, providing potential advantages such as enhanced precision and stable ergonomic vision. Among robotic techniques, the single-site approach has garnered increasing attention due to its potential to minimize surgical trauma and improve cosmetic outcomes. However, the full extent of its utility and efficacy in liver resection has yet to be thoroughly explored. Methods: We conducted a comprehensive systematic review to evaluate the current role of the single-site robotic approach in liver resection. A detailed search of PubMed was performed to identify relevant studies published up to January 2024. Eligible studies were critically appraised, and data concerning surgical outcomes, perioperative parameters, and post-operative complications were extracted and analyzed. Results: Our review synthesizes evidence from six studies, encompassing a total of seven cases undergoing robotic single-site hepatic resection (SSHR) using various versions of the da Vinci© system. Specifically, the procedures included five left lateral segmentectomies, one right hepatectomy, and one caudate lobe resection. We provide a summary of the surgical techniques, indications, selection criteria, and outcomes associated with this approach. Conclusion: The single-site robotic approach represents an option among the minimally invasive approaches in liver surgery. However, although its feasibility has been demonstrated, further studies are needed to elucidate its optimal utilization, long-term outcomes, and comparative effectiveness against other techniques. This systematic review provides valuable insights into the current state of single-site robotic liver resection and underscores the need for continued research in this rapidly evolving field.
(This article belongs to the Special Issue Robot-Assisted Surgery: New Trends and Solutions)

19 pages, 10494 KiB  
Article
RT-DETR-Tomato: Tomato Target Detection Algorithm Based on Improved RT-DETR for Agricultural Safety Production
by Zhimin Zhao, Shuo Chen, Yuheng Ge, Penghao Yang, Yunkun Wang and Yunsheng Song
Appl. Sci. 2024, 14(14), 6287; https://doi.org/10.3390/app14146287 - 19 Jul 2024
Abstract
The detection of tomatoes is of vital importance for enhancing production efficiency, with image recognition-based tomato detection methods being the primary approach. However, these methods face challenges such as difficulty in extracting small targets, low detection accuracy, and slow processing speeds. Therefore, this paper proposes an improved RT-DETR-Tomato model for efficient tomato detection under complex environmental conditions. The model mainly consists of a Swin Transformer block, a BiFormer module, path merging, multi-scale convolutional layers, and fully connected layers. In this proposed model, Swin Transformer is chosen as the new backbone network to replace ResNet50 because of its superior ability to capture broader global dependency relationships and contextual information. Meanwhile, a lightweight BiFormer block is adopted in Swin Transformer to reduce computational complexity through content-aware flexible computation allocation. Experimental results show that the average accuracy of the final RT-DETR-Tomato model is substantially improved over the original model, while training time is markedly reduced, demonstrating better environmental adaptability. In the future, the RT-DETR-Tomato model can be integrated with intelligent patrol and picking robots, enabling precise identification of crops and ensuring the safety of crops and the smooth progress of agricultural production.

21 pages, 3747 KiB  
Article
ViT-PSO-SVM: Cervical Cancer Predication Based on Integrating Vision Transformer with Particle Swarm Optimization and Support Vector Machine
by Abdulaziz AlMohimeed, Mohamed Shehata, Nora El-Rashidy, Sherif Mostafa, Amira Samy Talaat and Hager Saleh
Bioengineering 2024, 11(7), 729; https://doi.org/10.3390/bioengineering11070729 - 18 Jul 2024
Abstract
Cervical cancer (CCa) is the fourth most prevalent and common cancer affecting women worldwide, with increasing incidence and mortality rates. Hence, early detection of CCa plays a crucial role in improving outcomes. Non-invasive imaging procedures with good diagnostic performance are desirable and have the potential to lessen the degree of intervention associated with the gold standard, biopsy. Recently, artificial intelligence-based diagnostic models such as Vision Transformers (ViT) have shown promising performance in image classification tasks, rivaling or surpassing traditional convolutional neural networks (CNNs). This paper studies the effect of applying a ViT to predict CCa using different image benchmark datasets. A newly developed approach (ViT-PSO-SVM) is presented for boosting the results of the ViT by integrating it with particle swarm optimization (PSO) and a support vector machine (SVM). First, the proposed framework extracts features with the Vision Transformer. Then, PSO is used to reduce the complexity of the extracted features and optimize the feature representation. Finally, the softmax classification layer is replaced with an SVM classification model to precisely predict CCa. The models are evaluated using two benchmark cervical cell image datasets, namely SipakMed and Herlev, with different classification scenarios: two, three, and five classes. The proposed approach achieved 99.112% accuracy and a 99.113% F1-score for SipakMed with two classes, and 97.778% accuracy and a 97.805% F1-score for Herlev with two classes, outperforming other Vision Transformers, CNN models, and pre-trained models. Finally, GradCAM is used as an explainable artificial intelligence (XAI) tool to visualize and understand the regions of a given image that are important for a model’s prediction. The obtained experimental results demonstrate the feasibility and efficacy of the developed ViT-PSO-SVM approach and hold the promise of providing a robust, reliable, accurate, and non-invasive diagnostic tool that will lead to improved healthcare outcomes worldwide.
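As context for the optimization step in pipelines like ViT-PSO-SVM: particle swarm optimization maintains a population of candidate solutions pulled toward each particle’s personal best and the swarm’s global best. The sketch below is a generic, illustrative PSO minimizing a toy sphere function in pure Python; the hyperparameters and function names are assumptions, not taken from the paper:

```python
import random

def pso(fitness, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    # Initialize particle positions and velocities in [-1, 1]
    pos = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # personal bests
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest_pos, gbest_val = pbest[g][:], pbest_val[g]  # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Inertia + cognitive pull (personal best) + social pull (global best)
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest_pos, gbest_val = pos[i][:], val
    return gbest_pos, gbest_val

random.seed(42)  # for reproducibility of this toy run
# Toy objective: sphere function, minimum 0 at the origin
best_x, best_f = pso(lambda x: sum(v * v for v in x), dim=3)
```

In the paper’s setting the fitness would instead score a candidate feature representation by downstream SVM performance; the sphere function merely stands in for that objective.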

16 pages, 3596 KiB  
Article
Strategies and Actions’ Definition for the New Territorial Government Plan of Voghera, Italy: Towards a Healthier City
by Roberto De Lotto, Caterina Pietra, Matilde Sessi and Elisabetta Venco
Urban Sci. 2024, 8(3), 87; https://doi.org/10.3390/urbansci8030087 - 17 Jul 2024
Abstract
Cities require flexible and participatory planning methodologies to address complex and evolving urban challenges. In Italy, the legislative framework is defined at the regional level according to national laws and jurisprudence. The Lombardy region introduced the Planning Document (PD) in the Territorial Government Plan (PGT) as a strategic tool capable of adapting to changes. Based on a strategic vision, this document guides concrete actions for urban transformation, actively involving stakeholders, including citizens. The PD’s role is to translate the political program into an urban planning design, encompassing both technical and political dimensions. The political aspect is usually emphasized in the strategic component of the document. Following the formation process of the whole city plan, the authors define the key strategies in the PD for a healthier urban future for the city of Voghera. What emerges is a balanced urban development that combines economic growth, environmental preservation, and community well-being. In the paper, the authors synthesize Voghera’s PD as an example of strategic planning that interacts with practical planning actions and guides both public and private decisions about the city’s development toward a healthier city.

41 pages, 33915 KiB  
Article
Four Transformer-Based Deep Learning Classifiers Embedded with an Attention U-Net-Based Lung Segmenter and Layer-Wise Relevance Propagation-Based Heatmaps for COVID-19 X-ray Scans
by Siddharth Gupta, Arun K. Dubey, Rajesh Singh, Mannudeep K. Kalra, Ajith Abraham, Vandana Kumari, John R. Laird, Mustafa Al-Maini, Neha Gupta, Inder Singh, Klaudija Viskovic, Luca Saba and Jasjit S. Suri
Diagnostics 2024, 14(14), 1534; https://doi.org/10.3390/diagnostics14141534 - 16 Jul 2024
Abstract
Background: Diagnosing lung diseases accurately is crucial for proper treatment. Convolutional neural networks (CNNs) have advanced medical image processing, but challenges remain in their accurate explainability and reliability. This study combines U-Net with attention and Vision Transformers (ViTs) to enhance lung disease segmentation and classification. We hypothesize that Attention U-Net will enhance segmentation accuracy and that ViTs will improve classification performance. The explainability methodologies will shed light on model decision-making processes, aiding in clinical acceptance. Methodology: A comparative approach was used to evaluate deep learning models for segmenting and classifying lung illnesses using chest X-rays. The Attention U-Net model is used for segmentation, and architectures consisting of four CNNs and four ViTs were investigated for classification. Methods like Gradient-weighted Class Activation Mapping plus plus (Grad-CAM++) and Layer-wise Relevance Propagation (LRP) provide explainability by identifying crucial areas influencing model decisions. Results: The results support the conclusion that ViTs are outstanding at identifying lung disorders. Attention U-Net obtained a Dice Coefficient of 98.54% and a Jaccard Index of 97.12%. ViTs outperformed CNNs in classification tasks by 9.26%, reaching an accuracy of 98.52% with MobileViT. An 8.3% increase in accuracy was seen when moving from raw data classification to segmented image classification. Techniques like Grad-CAM++ and LRP provided insights into the decision-making processes of the models. Conclusions: This study highlights the benefits of integrating Attention U-Net and ViTs for analyzing lung diseases, demonstrating their importance in clinical settings. Emphasizing explainability clarifies deep learning processes, enhancing confidence in AI solutions and potentially improving clinical acceptance for better healthcare results.
(This article belongs to the Special Issue Artificial Intelligence in Biomedical Image Analysis—2nd Edition)
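The Dice coefficient and Jaccard index cited for Attention U-Net are overlap ratios between predicted and ground-truth segmentation masks. A minimal set-based sketch with hypothetical toy masks (not actual X-ray data):

```python
def dice_and_jaccard(pred, target):
    # pred, target: sets of pixel coordinates labelled as foreground (e.g., lung)
    inter = len(pred & target)
    dice = 2 * inter / (len(pred) + len(target))   # 2|A∩B| / (|A| + |B|)
    jaccard = inter / len(pred | target)           # |A∩B| / |A∪B|
    return dice, jaccard

# Toy 2x2 masks sharing two of three pixels each
pred = {(0, 0), (0, 1), (1, 0)}
target = {(0, 0), (0, 1), (1, 1)}
d, j = dice_and_jaccard(pred, target)  # d = 2/3, j = 1/2
```

Note that Dice is always at least as large as Jaccard for the same masks (Dice = 2J/(1+J)), which matches the paper’s 98.54% vs. 97.12% pairing.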

16 pages, 1341 KiB  
Article
DSCEH: Dual-Stream Correlation-Enhanced Deep Hashing for Image Retrieval
by Yulin Yang, Huizhen Chen, Rongkai Liu, Shuning Liu, Yu Zhan, Chao Hu and Ronghua Shi
Mathematics 2024, 12(14), 2221; https://doi.org/10.3390/math12142221 - 16 Jul 2024
Abstract
Deep hashing is widely used for large-scale image-retrieval tasks to speed up the retrieval process. Current deep hashing methods are mainly based on a Convolutional Neural Network (CNN) or Vision Transformer (ViT). They use only local or global features for low-dimensional mapping and only a similarity loss function to optimize the correlation between pairwise or triplet images; the effectiveness of such methods is therefore limited. In this paper, we propose a dual-stream correlation-enhanced deep hashing framework (DSCEH), which uses both the local and global features of the image for low-dimensional mapping and optimizes the correlation of images from the model architecture. DSCEH consists of two main steps: model training and deep-hash-based retrieval. During the training phase, a dual-network structure comprising a CNN and a ViT is employed for feature extraction. Subsequently, feature fusion is achieved through a concatenation operation, followed by similarity evaluation based on the class token acquired from the ViT to establish edge relationships. A Graph Convolutional Network is then utilized to enhance correlation optimization between images, resulting in the generation of high-quality hash codes. This stage yields an optimized hash model for image retrieval. In the retrieval stage, all images within the database and the to-be-retrieved images are first mapped to hash codes using the aforementioned hash model. The retrieval results are then determined based on the Hamming distance between the hash codes. We conduct experiments on three datasets: CIFAR-10, MSCOCO, and NUSWIDE. Experimental results show the superior performance of DSCEH, which enables fast and accurate image retrieval.
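The Hamming-distance ranking described for the retrieval stage can be sketched in a few lines. The hash codes and image IDs below are toy values, not outputs of the DSCEH model:

```python
def hamming_distance(a: int, b: int) -> int:
    # Popcount of XOR gives the number of differing bits between two hash codes
    return bin(a ^ b).count("1")

def retrieve(query_code, database, top_k=3):
    # database: list of (image_id, hash_code); rank ascending by Hamming distance
    ranked = sorted(database, key=lambda item: hamming_distance(query_code, item[1]))
    return [image_id for image_id, _ in ranked[:top_k]]

# Toy 8-bit codes; real deep-hash codes are typically 16-64 bits
db = [("img_a", 0b10110010), ("img_b", 0b10110011), ("img_c", 0b01001101)]
nearest = retrieve(0b10110010, db, top_k=2)  # ["img_a", "img_b"]
```

This is why hashing speeds retrieval up: the distance is a single XOR plus popcount per candidate, far cheaper than comparing high-dimensional float features.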

35 pages, 8466 KiB  
Article
Comprehensive Evaluation of the Development Level of China’s Characteristic Towns under the Perspective of an Urban–Rural Integration Development Strategy
by Xuekelaiti Haiyirete, Qian Xu, Jian Wang, Xinjie Liu and Kui Zeng
Land 2024, 13(7), 1069; https://doi.org/10.3390/land13071069 - 16 Jul 2024
Abstract
With the advancement of urbanization and the continuous deepening of reforms in urban–rural systems, China’s urbanization process has entered a new era of urban–rural integration. Currently, as a global “new green revolution” gains momentum, numerous countries are deeply integrating the concept of sustainable development into new urban planning. Against this backdrop, urban planners worldwide are committed to building green, livable, and smart cities that can meet the needs of the present generation without compromising the ability of future generations to meet their needs, thus achieving the vision of harmonious coexistence between humanity and nature. Characteristic towns, leveraging their resource advantages, play a significant role in achieving sustainable regional economic development. They serve as valuable references for China’s urban transformation and upgrading, as well as for promoting rural urbanization, and are crucial avenues for advancing China’s urban–rural integration development strategy. Evaluating the development level of characteristic towns is a necessary step in their progress and a strong guarantee for promoting their construction and development. Therefore, effectively evaluating the social benefits of characteristic towns is paramount. This study constructs an evaluation model based on grey rough set theory and the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS). Firstly, an evaluation index system for the development level of characteristic towns is established. Then, the grey relational analysis method and rough set theory are used to reduce the index attributes, while conditional information entropy theory is introduced to determine the weights of the reduced indicators. Finally, the TOPSIS model is applied to evaluate the development level of characteristic towns. Through empirical research, eight characteristic towns in Zhejiang Province, China, were assessed and ranked, verifying the effectiveness and feasibility of the proposed model.
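For readers unfamiliar with TOPSIS, the core computation ranks alternatives by relative closeness to an ideal solution. The following is a generic, illustrative implementation; the town scores, weights, and criteria are hypothetical, not the study’s data:

```python
def topsis(matrix, weights, benefit):
    # matrix: alternatives x criteria; benefit[j] is True when larger is better
    m, n = len(matrix), len(matrix[0])
    # Vector-normalize each column, then apply criterion weights
    norms = [sum(matrix[i][j] ** 2 for i in range(m)) ** 0.5 for j in range(n)]
    v = [[weights[j] * matrix[i][j] / norms[j] for j in range(n)] for i in range(m)]
    # Ideal (best) and anti-ideal (worst) value per criterion
    ideal = [max(v[i][j] for i in range(m)) if benefit[j]
             else min(v[i][j] for i in range(m)) for j in range(n)]
    worst = [min(v[i][j] for i in range(m)) if benefit[j]
             else max(v[i][j] for i in range(m)) for j in range(n)]
    scores = []
    for i in range(m):
        d_pos = sum((v[i][j] - ideal[j]) ** 2 for j in range(n)) ** 0.5
        d_neg = sum((v[i][j] - worst[j]) ** 2 for j in range(n)) ** 0.5
        # Relative closeness: 1 = ideal alternative, 0 = anti-ideal
        scores.append(d_neg / (d_pos + d_neg))
    return scores

# Hypothetical towns scored on two benefit criteria and one cost criterion
towns = [[8.0, 6.0, 2.0], [6.0, 9.0, 4.0], [9.0, 5.0, 5.0]]
scores = topsis(towns, weights=[0.4, 0.4, 0.2], benefit=[True, True, False])
```

In the study, the weights would come from the conditional-information-entropy step rather than being set by hand as here.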

22 pages, 15279 KiB  
Article
Reconstruction of OFDM Signals Using a Dual Discriminator CGAN with BiLSTM and Transformer
by Yuhai Li, Youchen Fan, Shunhu Hou, Yufei Niu, You Fu and Hanzhe Li
Sensors 2024, 24(14), 4562; https://doi.org/10.3390/s24144562 - 14 Jul 2024
Abstract
Communication signal reconstruction technology represents a critical area of research within communication countermeasures and signal processing. Considering the intricacy and suboptimal reconstruction performance of traditional OFDM signal reconstruction methods, a dual-discriminator CGAN model incorporating BiLSTM and Transformer is proposed. When reconstructing OFDM signals using a traditional CNN network, it is challenging to extract intricate temporal information. Therefore, a BiLSTM network is incorporated into the first discriminator to capture timing details of the IQ (In-phase and Quadrature-phase) sequence and constellation map information of the AP (Amplitude and Phase) sequence. Following the addition of fixed position coding, these data are fed into a core network built on the Transformer Encoder for further learning. Simultaneously, to capture the correlation between the two IQ signals, the ViT (Vision Transformer) concept is incorporated into the second discriminator. The IQ sequence is treated as a single-channel two-dimensional image and segmented through Conv2d into pixel blocks containing the IQ sequence; fixed position coding is added, and the result is sent to the Transformer core network for learning. The generator network transforms input noise data into a dimensional space aligned with the IQ signal and embedding vector dimensions, and appends identical position encoding information to the IQ sequence before sending it to the Transformer network. The experimental results demonstrate that, under commonly utilized OFDM modulation formats such as BPSK, QPSK, and 16QAM, the time series waveform, constellation diagram, and spectral diagram exhibit high-quality reconstruction. Our algorithm achieves improved signal quality while managing complexity compared to other reconstruction methods.
(This article belongs to the Special Issue Computer Vision Recognition and Communication Sensing System)
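The "fixed position coding" mentioned repeatedly above is typically the sinusoidal positional encoding of the original Transformer. A minimal sketch (the dimensions are illustrative, and the paper’s exact encoding scheme is an assumption here):

```python
import math

def positional_encoding(seq_len, d_model):
    # Sinusoidal fixed position codes: even dims get sin, odd dims get cos,
    # with wavelengths increasing geometrically across the embedding axis
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

# Tiny example: 4 positions, 8-dimensional embeddings
pe = positional_encoding(seq_len=4, d_model=8)
```

Because the codes are fixed functions of position, the discriminators can add them to IQ/AP embeddings without any learned parameters.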

46 pages, 2893 KiB  
Review
A Comprehensive Review of AI Diagnosis Strategies for Age-Related Macular Degeneration (AMD)
by Aya A. Abd El-Khalek, Hossam Magdy Balaha, Ashraf Sewelam, Mohammed Ghazal, Abeer T. Khalil, Mohy Eldin A. Abo-Elsoud and Ayman El-Baz
Bioengineering 2024, 11(7), 711; https://doi.org/10.3390/bioengineering11070711 - 13 Jul 2024
Abstract
The rapid advancement of computational infrastructure has led to unprecedented growth in machine learning, deep learning, and computer vision, fundamentally transforming the analysis of retinal images. By utilizing a wide array of visual cues extracted from retinal fundus images, sophisticated artificial intelligence models have been developed to diagnose various retinal disorders. This paper concentrates on the detection of Age-Related Macular Degeneration (AMD), a significant retinal condition, by offering an exhaustive examination of recent machine learning and deep learning methodologies. Additionally, it discusses potential obstacles and constraints associated with implementing this technology in the field of ophthalmology. Through a systematic review, this research aims to assess the efficacy of machine learning and deep learning techniques in discerning AMD from different modalities, given their demonstrated promise in diagnosing AMD and other retinal disorders. Organized around prevalent datasets and imaging techniques, the paper initially outlines assessment criteria, image preprocessing methodologies, and learning frameworks before conducting a thorough investigation of diverse approaches for AMD detection. Drawing insights from the analysis of more than 30 selected studies, the conclusion underscores current research trajectories, major challenges, and future prospects in AMD diagnosis, providing a valuable resource for both scholars and practitioners in the domain.
(This article belongs to the Section Biosignal Processing)

22 pages, 3609 KiB  
Article
Feature-Model-Based In-Process Measurement of Machining Precision Using Computer Vision
by Zhimeng Li, Weiwen Liao, Long Zhang, Yuxiang Ren, Guangming Sun and Yicun Sang
Appl. Sci. 2024, 14(14), 6094; https://doi.org/10.3390/app14146094 - 12 Jul 2024
Abstract
In-process measurement of machining precision is of great importance to advanced manufacturing and is an essential technology for realizing compensation machining. Given the cost-effectiveness and repeatability of computer vision, replacing traditional manual measurement with computer vision measurement has become a trend. In this paper, an in-process measurement method is proposed to improve precision and reduce the cost of machining-precision measurement. Firstly, a universal feature-model framework for machining parts is established to analyze the CAD model and provide standard information on the machining features. Secondly, a window generator is proposed to adaptively crop the image of the machining part according to the size of features. Then, the edges of machining features are automatically detected based on regions of interest (ROIs) from the cropped image. Finally, machining precision is measured by applying a Hough transform to the detected edges. To verify the effectiveness of the proposed method, a series of in-process measurement experiments were carried out on machined parts with various features and on sheet metal parts, including dimensional accuracy, straightness, and roundness measurement tests under the same part conditions. The best measurement accuracies of this method for dimensional accuracy, straightness, and roundness were 99%, 97%, and 96%, respectively. For comparison, precision measurement experiments were conducted under the same conditions using the Canny edge detection algorithm, the sub-pixel edge detection algorithm, and the Otsu–Canny edge detection algorithm. Experimental results show that the feature-model-based in-process measurement of machining precision using computer vision demonstrates superiority and effectiveness among the various measurement methods.
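The Hough transform used in the final measurement step accumulates votes in (theta, rho) space so that collinear edge pixels reinforce a single bin. A pure-Python sketch on a synthetic vertical edge (the image size and edge points are made up for illustration):

```python
import math

def hough_lines(points, n_theta=180):
    # Each edge point (x, y) votes for every line x*cos(t) + y*sin(t) = rho
    acc = {}
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            acc[(t, rho)] = acc.get((t, rho), 0) + 1
    # The strongest bin corresponds to the dominant straight edge
    (t, rho), votes = max(acc.items(), key=lambda kv: kv[1])
    return math.pi * t / n_theta, rho, votes

# Synthetic vertical edge at x = 10: expect theta ~ 0 and rho ~ 10
edge = [(10, y) for y in range(50)]
theta, rho, votes = hough_lines(edge)
```

Once the dominant line (or circle, for roundness) parameters are recovered this way, dimensional quantities follow from the geometry of the fitted primitives.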

3 pages, 130 KiB  
Editorial
Recent Advances in Computer Vision: Technologies and Applications
by Mingliang Gao, Guofeng Zou, Yun Li and Xiangyu Guo
Electronics 2024, 13(14), 2734; https://doi.org/10.3390/electronics13142734 - 12 Jul 2024
Abstract
Computer vision plays a pivotal role in modern society, transforming fields such as healthcare, transportation, entertainment, and manufacturing by enabling machines to interpret and understand visual information, revolutionizing industries, and enhancing daily life [...]
(This article belongs to the Special Issue Recent Advances in Computer Vision: Technologies and Applications)
40 pages, 5912 KiB  
Article
ConVision Benchmark: A Contemporary Framework to Benchmark CNN and ViT Models
by Shreyas Bangalore Vijayakumar, Krishna Teja Chitty-Venkata, Kanishk Arya and Arun K. Somani
AI 2024, 5(3), 1132-1171; https://doi.org/10.3390/ai5030056 - 11 Jul 2024
Abstract
Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have shown remarkable performance in computer vision tasks, including object detection and image recognition. These models have evolved significantly in architecture, efficiency, and versatility. Concurrently, deep-learning frameworks have diversified, with versions that often complicate reproducibility and unified benchmarking. We propose ConVision Benchmark, a comprehensive framework in PyTorch, to standardize the implementation and evaluation of state-of-the-art CNN and ViT models. This framework addresses common challenges such as version mismatches and inconsistent validation metrics. As a proof of concept, we performed an extensive benchmark analysis on a COVID-19 dataset, encompassing nearly 200 CNN and ViT models in which DenseNet-161 and MaxViT-Tiny achieved exceptional accuracy with a peak performance of around 95%. Although we primarily used the COVID-19 dataset for image classification, the framework is adaptable to a variety of datasets, enhancing its applicability across different domains. Our methodology includes rigorous performance evaluations, highlighting metrics such as accuracy, precision, recall, F1 score, and computational efficiency (FLOPs, MACs, CPU, and GPU latency). The ConVision Benchmark facilitates a comprehensive understanding of model efficacy, aiding researchers in deploying high-performance models for diverse applications.
(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)
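The accuracy, precision, recall, and F1 metrics the benchmark reports are standard confusion-matrix quantities. A self-contained sketch with hypothetical labels (not the COVID-19 dataset):

```python
def classification_metrics(y_true, y_pred, positive=1):
    # Count true positives, false positives, and false negatives for one class
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy binary run: 2 TP, 1 FP, 1 FN, 1 TN
m = classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

A benchmark framework like the one described would compute these per model (and per class, macro-averaged for multi-class) alongside latency and FLOP counts.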

12 pages, 3550 KiB  
Article
Measurement Approach for the Pose of Flanges in Cabin Assemblies through Distributed Vision
by Xiaojie Ma, Jieyu Zhang, Tianchao Miao, Fawen Xie and Zhongqiu Geng
Sensors 2024, 24(14), 4484; https://doi.org/10.3390/s24144484 - 11 Jul 2024
Abstract
The relative rotation angle between two cabins should be obtained automatically and precisely during automated assembly processes for spacecraft and aircraft. This paper introduces a method to solve this problem based on distributed vision, in which two groups of cameras take images of mating features, such as dowel pins and holes, from oblique directions. The relative rotation between the mating flanges of the two cabins is then calculated. The key point is the registration of the distributed cameras, so a simple and practical registration process is designed. It is assumed that there are rigid and scaling transformations among the world coordinate systems (WCS) of each camera; therefore, rigid-correct and scaling-correct matrices are adopted to register the cameras. An auxiliary registration device with known features is designed and moved in the cameras’ field of view (FOV) to obtain the matrix parameters, so that each camera acquires traces of every feature. The parameters can be solved using a genetic algorithm based on the known geometric relationships between the trajectories on the registration device. A prototype was built to verify the method: the precision reaches 0.02° in a measuring space of 340 mm.
(This article belongs to the Section Sensing and Imaging)
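The assumed rigid-plus-scaling relation between camera world coordinate systems is, restricted to a plane, a similarity transform, which has a closed-form least-squares fit when 2D points are treated as complex numbers. This generic sketch uses made-up correspondences, not the paper’s registration-device data (the paper itself solves its parameters with a genetic algorithm):

```python
def fit_similarity_2d(src, dst):
    # Estimate c (scale-rotation) and t (translation) with dst ≈ c*src + t,
    # treating 2D points as complex numbers (least-squares closed form)
    a = [complex(x, y) for x, y in src]
    b = [complex(x, y) for x, y in dst]
    ma = sum(a) / len(a)   # centroid of source points
    mb = sum(b) / len(b)   # centroid of destination points
    num = sum((bi - mb) * (ai - ma).conjugate() for ai, bi in zip(a, b))
    den = sum(abs(ai - ma) ** 2 for ai in a)
    c = num / den          # abs(c) is the scale, its angle the rotation
    t = mb - c * ma
    return c, t

# Points rotated 90 degrees (c = 2i: scale 2, rotation pi/2), shifted by (1, 0)
src = [(0, 0), (1, 0), (0, 1)]
dst = [(1, 0), (1, 2), (-1, 0)]
c, t = fit_similarity_2d(src, dst)
```

With noiseless correspondences the fit is exact; with real camera traces it returns the least-squares similarity, which is the 2D analogue of the rigid-plus-scaling correction the abstract describes.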

27 pages, 5461 KiB  
Essay
BAFormer: A Novel Boundary-Aware Compensation UNet-like Transformer for High-Resolution Cropland Extraction
by Zhiyong Li, Youming Wang, Fa Tian, Junbo Zhang, Yijie Chen and Kunhong Li
Remote Sens. 2024, 16(14), 2526; https://doi.org/10.3390/rs16142526 - 10 Jul 2024
Abstract
Utilizing deep learning for semantic segmentation of cropland from remote sensing imagery has become a crucial technique in land surveys. Cropland is highly heterogeneous and fragmented, and existing methods often suffer from inaccurate boundary segmentation. This paper introduces a UNet-like boundary-aware compensation model (BAFormer). Cropland boundaries typically exhibit rapid transitions in pixel values and texture features, often appearing as high-frequency features in remote sensing images. To enhance the recognition of these high-frequency features represented by cropland boundaries, the proposed BAFormer integrates a Feature Adaptive Mixer (FAM) and develops a Depthwise Large Kernel Multi-Layer Perceptron model (DWLK-MLP) to enrich global and local cropland boundary features, respectively. Specifically, FAM enhances boundary awareness by adaptively acquiring high-frequency features through the complementary advantages of convolution and self-attention, while DWLK-MLP further supplements boundary position information using a large receptive field. The efficacy of BAFormer has been evaluated on the Vaihingen, Potsdam, LoveDA, and Mapcup datasets. It demonstrates high performance, achieving mIoU scores of 84.5%, 87.3%, 53.5%, and 83.1% on these datasets, respectively. Notably, BAFormer-T (the lightweight model) surpasses other lightweight models on the Vaihingen dataset with scores of 91.3% F1 and 84.1% mIoU.

14 pages, 11264 KiB  
Article
Robust BEV 3D Object Detection for Vehicles with Tire Blow-Out
by Dongsheng Yang, Xiaojie Fan, Wei Dong, Chaosheng Huang and Jun Li
Sensors 2024, 24(14), 4446; https://doi.org/10.3390/s24144446 - 9 Jul 2024
Abstract
The bird’s-eye view (BEV) method, a vision-centric representation-based perception task, is essential and promising for future Autonomous Vehicle perception. It is fusion-friendly, intuitive, amenable to end-to-end optimization, and cheaper than LiDAR. The performance of existing BEV methods, however, deteriorates in a tire blow-out situation, because these methods rely heavily on accurate camera calibration, which noisy camera parameters during a blow-out can invalidate. It is therefore extremely unsafe to use existing BEV methods in a tire blow-out situation. In this paper, we propose a geometry-guided auto-resizable kernel transformer (GARKT) method designed especially for vehicles with tire blow-out. Specifically, we establish a camera deviation model for vehicles with tire blow-out. We then use geometric priors to attain the prior position in perspective view with auto-resizable kernels. The resizable perception areas are encoded and flattened to generate the BEV representation. GARKT achieves a nuScenes detection score (NDS) of 0.439 on a newly created blow-out dataset based on nuScenes, and the NDS remains 0.431 when the tire is completely flat, which is much more robust than other transformer-based BEV methods. Moreover, the GARKT method runs in almost real time, at about 20.5 fps on one GPU.
(This article belongs to the Section Vehicular Sensing)
