Search Results (260)

Search Parameters:
Keywords = monocular detection

27 pages, 30735 KiB  
Article
A Cloud Detection System for UAV Sense and Avoid: Analysis of a Monocular Approach in Simulation and Flight Tests
by Adrian Dudek and Peter Stütz
Drones 2025, 9(1), 55; https://doi.org/10.3390/drones9010055 - 15 Jan 2025
Viewed by 402
Abstract
In order to contribute to the operation of unmanned aerial vehicles (UAVs) according to visual flight rules (VFR), this article proposes a monocular approach for cloud detection using an electro-optical sensor. Cloud avoidance is motivated by several factors, including improving visibility for collision prevention and reducing the risks of icing and turbulence. The described workflow is based on parallelized detection, tracking and triangulation of features with prior segmentation of clouds in the image. As output, the system generates a cloud occupancy grid of the aircraft’s vicinity, which can be used for cloud avoidance calculations afterwards. The proposed methodology was tested in simulation and flight experiments. With the aim of developing cloud segmentation methods, datasets were created, one of which was made publicly available and features 5488 labeled, augmented cloud images from a real flight experiment. The trained segmentation models based on the YOLOv8 framework are able to separate clouds from the background even under challenging environmental conditions. For a performance analysis of the subsequent cloud position estimation stage, calculated and actual cloud positions are compared and feature evaluation metrics are applied. The investigations demonstrate the functionality of the approach, even if challenges become apparent under real flight conditions. Full article
(This article belongs to the Special Issue Flight Control and Collision Avoidance of UAVs)
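As an aside for readers of this entry, the detect-track-triangulate workflow described above ultimately reduces to recovering a 3D point from the same cloud feature observed at two camera poses. The Python sketch below is an editor's illustration, not the authors' code: it assumes the 3x4 projection matrices are already available (e.g., from the UAV's navigation solution), and all numbers are illustrative.

```python
import numpy as np
import cv2

def triangulate_feature(P1, P2, pt1, pt2):
    """Triangulate one tracked image feature from two camera poses.

    P1, P2 : 3x4 projection matrices (K @ [R | t]) at the two timestamps.
    pt1, pt2 : pixel coordinates (u, v) of the same cloud feature.
    Returns the 3D point in the reference frame of the projection matrices.
    """
    pts4d = cv2.triangulatePoints(P1, P2,
                                  np.float32(pt1).reshape(2, 1),
                                  np.float32(pt2).reshape(2, 1))
    return (pts4d[:3] / pts4d[3]).ravel()

# Toy example: camera translated 50 m along x between the two frames.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-50.0], [0], [0]])])
print(triangulate_feature(P1, P2, (400, 200), (350, 200)))
```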

17 pages, 22331 KiB  
Article
Depth Estimation Based on MMwave Radar and Camera Fusion with Attention Mechanisms and Multi-Scale Features for Autonomous Driving Vehicles
by Zhaohuan Zhu, Feng Wu, Wenqing Sun, Quanying Wu, Feng Liang and Wuhan Zhang
Electronics 2025, 14(2), 300; https://doi.org/10.3390/electronics14020300 - 13 Jan 2025
Viewed by 478
Abstract
Autonomous driving vehicles have strong path planning and obstacle avoidance capabilities, which provide great support to avoid traffic accidents. Autonomous driving has become a research hotspot worldwide. Depth estimation is a key technology in autonomous driving as it provides an important basis for accurately detecting traffic objects and avoiding collisions in advance. However, the current difficulties in depth estimation include insufficient estimation accuracy, difficulty in acquiring depth information using monocular vision, and an important challenge of fusing multiple sensors for depth estimation. To enhance depth estimation performance in complex traffic environments, this study proposes a depth estimation method in which point clouds and images obtained from MMwave radar and cameras are fused. Firstly, a residual network is established to extract the multi-scale features of the MMwave radar point clouds and the corresponding image obtained simultaneously from the same location. Correlations between the radar points and the image are established by fusing the extracted multi-scale features. A semi-dense depth estimation is achieved by assigning the depth value of the radar point to the most relevant image region. Secondly, a bidirectional feature fusion structure with additional fusion branches is designed to enhance the richness of the feature information. The information loss during the feature fusion process is reduced, and the robustness of the model is enhanced. Finally, parallel channel and position attention mechanisms are used to enhance the feature representation of the key areas in the fused feature map, the interference of irrelevant areas is suppressed, and the depth estimation accuracy is enhanced. The experimental results on the public dataset nuScenes show that, compared with the baseline model, the proposed method reduces the mean absolute error (MAE) by 4.7–6.3% and the root mean square error (RMSE) by 4.2–5.2%. Full article
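For context on the error figures quoted above, MAE and RMSE for depth estimation are normally computed only over pixels that carry valid ground-truth depth, since projected LiDAR or radar ground truth is sparse. A minimal sketch, not taken from the paper:

```python
import numpy as np

def depth_errors(pred, gt):
    """Mean absolute error and RMSE over pixels with valid ground-truth depth."""
    valid = gt > 0                      # sparse ground truth: zeros mark no data
    diff = pred[valid] - gt[valid]
    mae = np.mean(np.abs(diff))
    rmse = np.sqrt(np.mean(diff ** 2))
    return mae, rmse

# Toy check with a 2x3 depth map (metres); zeros mark missing ground truth.
gt = np.array([[10.0, 0.0, 25.0], [5.0, 40.0, 0.0]])
pred = np.array([[11.0, 7.0, 24.0], [5.5, 42.0, 9.0]])
print(depth_errors(pred, gt))   # (1.125, 1.25)
```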

23 pages, 6144 KiB  
Article
Based on the Geometric Characteristics of Binocular Imaging for Yarn Remaining Detection
by Ke Le and Yanhong Yuan
Sensors 2025, 25(2), 339; https://doi.org/10.3390/s25020339 - 9 Jan 2025
Viewed by 318
Abstract
The automated detection of yarn margins is crucial for ensuring the continuity and quality of production in textile workshops. Traditional methods rely on workers visually inspecting the yarn margin to determine the timing of replacement; these methods fail to provide real-time data and cannot meet the precise scheduling requirements of modern production. The complex environmental conditions in textile workshops, combined with the cylindrical shape and repetitive textural features of yarn bobbins, limit the application of traditional visual solutions. Therefore, we propose a visual measurement method based on the geometric characteristics of binocular imaging: First, all contours in the image are extracted, and the distance sequence between the contours and the centroid is extracted. This sequence is then matched with a predefined template to identify the contour information of the yarn bobbin. Additionally, four equations for the tangent line from the camera optical center to the edge points of the yarn bobbin contour are established, and the angle bisectors of each pair of tangents are found. By solving the system of equations for these two angle bisectors, their intersection point is determined, giving the radius of the yarn bobbin. This method overcomes the limitations of monocular vision systems, which lack depth information and suffer from size measurement errors due to the insufficient repeat positioning accuracy when patrolling back and forth. Next, to address the self-occlusion issues and matching difficulties during binocular system measurements caused by the yarn bobbin surface’s repetitive texture, an imaging model is established based on the yarn bobbin’s cylindrical characteristics. This avoids pixel-by-pixel matching in binocular vision and enables the accurate measurement of the remaining yarn margin. The experimental data show that the measurement method exhibits high precision within the recommended working distance range, with an average error of only 0.68 mm. Full article
(This article belongs to the Section Sensing and Imaging)
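The contour-identification step described above can be pictured as matching each contour's centroid-distance signature against a stored template. The sketch below is an editor's illustration, not the authors' implementation: it assumes a pre-binarized image and a pre-computed template signature, and it ignores rotation normalization of the signature's starting point.

```python
import cv2
import numpy as np

def centroid_distance_signature(contour, n_samples=64):
    """Distance from contour points to the contour centroid,
    resampled to a fixed length and scale-normalised."""
    m = cv2.moments(contour)
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
    pts = contour.reshape(-1, 2).astype(np.float64)
    d = np.hypot(pts[:, 0] - cx, pts[:, 1] - cy)
    idx = np.linspace(0, len(d) - 1, n_samples).astype(int)
    sig = d[idx]
    return sig / (sig.max() + 1e-9)

def best_matching_contour(binary_img, template_sig, min_area=100):
    """Return the contour whose signature is closest to the template."""
    contours, _ = cv2.findContours(binary_img, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    candidates = [c for c in contours if cv2.contourArea(c) > min_area]
    if not candidates:
        return None
    scores = [np.linalg.norm(centroid_distance_signature(c) - template_sig)
              for c in candidates]
    return candidates[int(np.argmin(scores))]
```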

29 pages, 10852 KiB  
Article
Resource-Exploration-Oriented Lunar Rocks Monocular Detection and 3D Pose Estimation
by Jiayu Suo, Hongfeng Long, Yuebo Ma, Yuhao Zhang, Zhen Liang, Chuan Yan and Rujin Zhao
Aerospace 2025, 12(1), 4; https://doi.org/10.3390/aerospace12010004 - 25 Dec 2024
Viewed by 381
Abstract
Lunar in situ resource utilization is a core goal in lunar exploration, with accurate lunar rock pose estimation being essential. To address the challenges posed by the lack of texture features and extreme lighting conditions, this study proposes the Simulation-YOLO-Hourglass-Transformer (SYHT) method. The method enhances accuracy and robustness in complex lunar environments, demonstrating strong adaptability and excellent performance, particularly in conditions of extreme lighting and scarce texture. This approach provides valuable insights for object pose estimation in lunar exploration tasks and lays the foundation for lunar resource development. First, the YOLO-Hourglass-Transformer (YHT) network is used to extract keypoint information from each rock and generate the corresponding 3D pose. Then, a lunar surface imaging physics simulation model is employed to generate simulated lunar rock data for testing the method. The experimental results show that the SYHT method performs exceptionally well on simulated lunar rock data, achieving a mean per-joint position error (MPJPE) of 37.93 mm and a percentage of correct keypoints (PCK) of 99.94%, significantly outperforming existing methods. Finally, transfer learning experiments on real-world datasets validate its strong generalization capability, highlighting its effectiveness for lunar rock pose estimation in both simulated and real lunar environments. Full article
(This article belongs to the Section Astronautics & Space Science)
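For reference, the two metrics reported above (MPJPE and PCK) follow directly from predicted and ground-truth keypoints. A minimal sketch; the units and threshold are illustrative, not the paper's evaluation code:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error, in the same units as the inputs (e.g. mm)."""
    return np.mean(np.linalg.norm(pred - gt, axis=-1))

def pck(pred, gt, threshold):
    """Percentage of keypoints whose error is below the given threshold."""
    errors = np.linalg.norm(pred - gt, axis=-1)
    return 100.0 * np.mean(errors < threshold)

# Toy example: 4 keypoints of one rock, coordinates in millimetres.
gt = np.array([[0, 0, 0], [100, 0, 0], [0, 100, 0], [0, 0, 100]], float)
pred = gt + np.array([[5, 0, 0], [0, 10, 0], [0, 0, 20], [30, 0, 0]], float)
print(mpjpe(pred, gt), pck(pred, gt, threshold=25.0))   # 16.25 75.0
```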

15 pages, 6614 KiB  
Article
Advancing Forest Plot Surveys: A Comparative Study of Visual vs. LiDAR SLAM Technologies
by Tianshuo Guan, Yuchen Shen, Yuankai Wang, Peidong Zhang, Rui Wang and Fei Yan
Forests 2024, 15(12), 2083; https://doi.org/10.3390/f15122083 - 26 Nov 2024
Viewed by 689
Abstract
Forest plot surveys are vital for monitoring forest resource growth, contributing to their sustainable development. The accuracy and efficiency of these surveys are paramount, making technological advancements such as Simultaneous Localization and Mapping (SLAM) crucial. This study investigates the application of SLAM technology, utilizing LiDAR (Light Detection and Ranging) and monocular cameras, to enhance forestry plot surveys. Conducted in three 32 × 32 m plots within the Tibet Autonomous Region of China, the research compares the efficacy of LiDAR-based and visual SLAM algorithms in estimating tree parameters such as diameter at breast height (DBH), tree height, and position, alongside their adaptability to forest environments. The findings revealed that both types of algorithms achieved high precision in DBH estimation, with LiDAR SLAM presenting a root mean square error (RMSE) range of 1.4 to 1.96 cm and visual SLAM showing a slightly higher precision, with an RMSE of 0.72 to 0.85 cm. In terms of tree position accuracy, all three methods produce usable tree location measurements: LiDAR SLAM accurately represents the relative positions of trees, while the traditional and visual SLAM systems exhibit slight positional offsets for individual trees. However, discrepancies arose in tree height estimation accuracy, where visual SLAM exhibited a bias range from −0.55 to 0.19 m and an RMSE of 1.36 to 2.34 m, while LiDAR SLAM had a broader bias range and higher RMSE, especially for trees over 25 m, attributed to scanning angle limitations and branch occlusion. Moreover, the study highlights the comprehensive point cloud data generated by LiDAR SLAM, useful for calculating extensive tree parameters such as volume and carbon storage and for Tree Information Modeling (TIM) through digital twin technology. In contrast, the sparser data from visual SLAM limits its use to basic parameter estimation. These insights underscore the effectiveness and precision of SLAM-based approaches in forestry plot surveys while also indicating the distinct advantages and suitability of each method for different forest environments. The findings advocate for tailored survey strategies, aligned with specific forest conditions and requirements, enhancing the application of SLAM technology in forestry management and conservation efforts. Full article
(This article belongs to the Special Issue Integrated Measurements for Precision Forestry)
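As an illustration of how DBH is commonly derived from a SLAM point cloud (not necessarily the procedure used in this study), one can slice a single tree's points around 1.3 m above ground and fit a circle to the horizontal cross-section; the slice width below is an assumption.

```python
import numpy as np

def dbh_from_slice(points_xyz, breast_height=1.3, slice_halfwidth=0.05):
    """Estimate diameter at breast height from one tree's point cloud.

    points_xyz : (N, 3) array with z measured from the ground (metres).
    A thin slice around 1.3 m is projected to the xy-plane and a circle is
    fitted algebraically (Kasa fit); the circle's diameter approximates DBH.
    """
    z = points_xyz[:, 2]
    mask = np.abs(z - breast_height) < slice_halfwidth
    xy = points_xyz[mask, :2]
    if len(xy) < 10:
        return None
    x, y = xy[:, 0], xy[:, 1]
    # Solve x^2 + y^2 + a*x + b*y + c = 0 in least squares.
    A = np.column_stack([x, y, np.ones_like(x)])
    rhs = -(x ** 2 + y ** 2)
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    radius = np.sqrt((a ** 2 + b ** 2) / 4 - c)
    return 2.0 * radius
```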

27 pages, 28012 KiB  
Article
A Model Development Approach Based on Point Cloud Reconstruction and Mapping Texture Enhancement
by Boyang You and Barmak Honarvar Shakibaei Asli
Big Data Cogn. Comput. 2024, 8(11), 164; https://doi.org/10.3390/bdcc8110164 - 20 Nov 2024
Viewed by 796
Abstract
To address the challenge of rapid geometric model development in the digital twin industry, this paper presents a comprehensive pipeline for constructing 3D models from images using monocular vision imaging principles. Firstly, a structure-from-motion (SFM) algorithm generates a 3D point cloud from photographs. The feature detection methods scale-invariant feature transform (SIFT), speeded-up robust features (SURF), and KAZE are compared across six datasets, with SIFT proving the most effective (matching rate higher than 0.12). Using K-nearest-neighbor matching and random sample consensus (RANSAC), refined feature point matching and 3D spatial representation are achieved via epipolar geometry. Then, the Poisson surface reconstruction algorithm converts the point cloud into a mesh model. Additionally, texture images are enhanced by leveraging a visual geometry group (VGG) network-based deep learning approach. Content images from a dataset provide geometric contours via higher-level VGG layers, while textures from style images are extracted using the lower-level layers. These are fused to create texture-transferred images, and the image quality assessment (IQA) metrics SSIM and PSNR are used to evaluate the texture-enhanced images. Finally, texture mapping integrates the enhanced textures with the mesh model, improving the scene representation with enhanced texture. The method presented in this paper surpassed a LiDAR-based reconstruction approach by 20% in terms of point cloud density and number of model facets, while the hardware cost was only 1% of that associated with LiDAR. Full article
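The matching stage described above, SIFT detection followed by k-nearest-neighbor matching and RANSAC filtering consistent with epipolar geometry, can be sketched with OpenCV roughly as follows. This is an illustrative sketch, not the paper's pipeline; the ratio threshold is an assumption.

```python
import cv2
import numpy as np

def match_pair(img1_path, img2_path, ratio=0.75):
    """SIFT detection, k-NN matching with Lowe's ratio test, RANSAC filtering."""
    img1 = cv2.imread(img1_path, cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread(img2_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(des1, des2, k=2)
    good = [m for m, n in knn if m.distance < ratio * n.distance]

    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])
    # The fundamental matrix with RANSAC rejects matches that violate epipolar geometry.
    F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
    inliers = inlier_mask.ravel().astype(bool)
    return F, pts1[inliers], pts2[inliers]
```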

23 pages, 32729 KiB  
Article
PLC-Fusion: Perspective-Based Hierarchical and Deep LiDAR Camera Fusion for 3D Object Detection in Autonomous Vehicles
by Husnain Mushtaq, Xiaoheng Deng, Fizza Azhar, Mubashir Ali and Hafiz Husnain Raza Sherazi
Information 2024, 15(11), 739; https://doi.org/10.3390/info15110739 - 19 Nov 2024
Viewed by 1172
Abstract
Accurate 3D object detection is essential for autonomous driving, yet traditional LiDAR models often struggle with sparse point clouds. We propose perspective-aware hierarchical vision transformer-based LiDAR-camera fusion (PLC-Fusion) for 3D object detection to address this. This efficient, multi-modal 3D object detection framework integrates LiDAR and camera data for improved performance. First, our method enhances LiDAR data by projecting them onto a 2D plane, enabling the extraction of object perspective features from a probability map via the Object Perspective Sampling (OPS) module. It incorporates a lightweight perspective detector, consisting of interconnected 2D and monocular 3D sub-networks, to extract image features and generate object perspective proposals by predicting and refining top-scored 3D candidates. Second, it leverages two independent transformers—CamViT for 2D image features and LidViT for 3D point cloud features. These ViT-based representations are fused via the Cross-Fusion module for hierarchical and deep representation learning, improving performance and computational efficiency. These mechanisms enhance the utilization of semantic features in a region of interest (ROI) to obtain more representative point features, leading to a more effective fusion of information from both LiDAR and camera sources. PLC-Fusion outperforms existing methods, achieving a mean average precision (mAP) of 83.52% and 90.37% for 3D and BEV detection, respectively. Moreover, PLC-Fusion maintains a competitive inference time of 0.18 s. Our model addresses computational bottlenecks by eliminating the need for dense BEV searches and global attention mechanisms while improving detection range and precision. Full article
(This article belongs to the Special Issue Emerging Research in Object Tracking and Image Segmentation)
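The Cross-Fusion idea of letting two transformer feature streams attend to each other can be illustrated with a toy PyTorch block. This is a simplified sketch under assumed token shapes, not the PLC-Fusion architecture itself.

```python
import torch
import torch.nn as nn

class CrossFusionBlock(nn.Module):
    """Toy cross-attention fusion of camera tokens and LiDAR tokens."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.img_to_pc = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.pc_to_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, img_tokens, pc_tokens):
        # Each stream queries the other, then the two views are concatenated.
        img_fused, _ = self.pc_to_img(img_tokens, pc_tokens, pc_tokens)
        pc_fused, _ = self.img_to_pc(pc_tokens, img_tokens, img_tokens)
        fused = torch.cat([img_fused.mean(1), pc_fused.mean(1)], dim=-1)
        return self.proj(fused)

block = CrossFusionBlock()
img_tokens = torch.randn(2, 196, 256)   # e.g. image patch features (CamViT-like)
pc_tokens = torch.randn(2, 512, 256)    # e.g. point-cloud features (LidViT-like)
print(block(img_tokens, pc_tokens).shape)   # torch.Size([2, 256])
```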

20 pages, 1837 KiB  
Article
A Monocular Ranging Method for Ship Targets Based on Unmanned Surface Vessels in a Shaking Environment
by Zimu Wang, Xiunan Li, Peng Chen, Dan Luo, Gang Zheng and Xin Chen
Remote Sens. 2024, 16(22), 4220; https://doi.org/10.3390/rs16224220 - 12 Nov 2024
Viewed by 893
Abstract
Aiming to address errors in the estimation of the position and attitude of an unmanned vessel, especially during vibration, where the rapid loss of feature point information hinders continuous attitude estimation and global trajectory mapping, this paper improves the monocular ORB-SLAM framework based on the characteristics of the marine environment. In general, we extract the location area of the artificial sea target in the video, build a virtual feature set for it, and filter out the background features. When shaking occurs, GNSS information is combined and the target feature set is used to complete the map reconstruction task. Specifically, the sea target area of interest is first detected by YOLOv5, and the feature extraction and matching method is optimized in the front-end tracking stage to adapt to the sea environment. In the keyframe selection and local map optimization stages, the feature set is refined to further improve the positioning accuracy and provide more accurate position and attitude information about the unmanned platform. We use GNSS information to provide the scale and world coordinates for the map. Finally, the target distance is measured by the beam ranging method. In this paper, marine unmanned platform data, GNSS, and AIS position data are autonomously collected, and experiments are carried out using the proposed marine ranging system. Experimental results show that the maximum measurement error of this method is 9.2%, and the average error is 4.7%. Full article
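The "target feature set" idea above, keeping only features that fall inside detected ship regions and discarding background (sea) features, can be sketched as follows. This is an illustrative simplification, not the authors' code; the box format is assumed to be (x1, y1, x2, y2) as produced by a detector such as YOLOv5.

```python
import cv2
import numpy as np

def target_features(frame_gray, boxes, n_features=1000):
    """Keep only ORB features that fall inside detected ship bounding boxes.

    boxes : list of (x1, y1, x2, y2) detections.
    Returns the retained keypoints and descriptors (the 'target feature set').
    """
    orb = cv2.ORB_create(nfeatures=n_features)
    keypoints, descriptors = orb.detectAndCompute(frame_gray, None)
    if descriptors is None:
        return [], None
    keep = []
    for i, kp in enumerate(keypoints):
        u, v = kp.pt
        if any(x1 <= u <= x2 and y1 <= v <= y2 for x1, y1, x2, y2 in boxes):
            keep.append(i)
    kept_kp = [keypoints[i] for i in keep]
    kept_des = descriptors[keep] if keep else None
    return kept_kp, kept_des
```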

7 pages, 1849 KiB  
Proceeding Paper
Inverse Perspective Mapping Correction for Aiding Camera-Based Autonomous Driving Tasks
by Norbert Markó, Péter Kőrös and Miklós Unger
Eng. Proc. 2024, 79(1), 67; https://doi.org/10.3390/engproc2024079067 - 7 Nov 2024
Viewed by 388
Abstract
Inverse perspective mapping (IPM) is a crucial technique in camera-based autonomous driving, transforming the perspective view captured by the camera into a bird’s-eye view. This can be beneficial for accurate environmental perception, path planning, obstacle detection, and navigation. IPM faces challenges such as distortion and inaccuracies due to varying road inclinations and intrinsic camera properties. Herein, we revealed inaccuracies inherent in our current IPM approach so proper correction techniques can be applied later. We aimed to explore correction possibilities to enhance the accuracy of IPM and examine other methods that could be used as a benchmark or even a replacement, such as stereo vision and deep learning-based monocular depth estimation methods. With this work, we aimed to provide an analysis and direction for working with IPM. Full article
(This article belongs to the Proceedings of The Sustainable Mobility and Transportation Symposium 2024)
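For readers unfamiliar with IPM, the basic transform is a planar homography that maps four known ground-plane points to a rectangle in the bird's-eye image. A minimal OpenCV sketch follows; the destination size and point ordering are assumptions, and the flat-road limitation it exhibits is exactly the error source the paper discusses.

```python
import cv2
import numpy as np

def birds_eye_view(frame, src_pts, dst_size=(400, 600)):
    """Warp a camera frame to a bird's-eye view with a planar homography.

    src_pts : four pixel corners of a ground-plane rectangle (e.g. lane markers),
              ordered top-left, top-right, bottom-right, bottom-left.
    The mapping is exact only for a flat road; changes in inclination or
    intrinsics introduce the distortions that need correction.
    """
    w, h = dst_size
    dst_pts = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    H = cv2.getPerspectiveTransform(np.float32(src_pts), dst_pts)
    return cv2.warpPerspective(frame, H, dst_size)
```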

18 pages, 5155 KiB  
Article
Strabismus Detection in Monocular Eye Images for Telemedicine Applications
by Wattanapong Kurdthongmee, Lunla Udomvej, Arsanchai Sukkuea, Piyadhida Kurdthongmee, Chitchanok Sangeamwong and Chayanid Chanakarn
J. Imaging 2024, 10(11), 284; https://doi.org/10.3390/jimaging10110284 - 7 Nov 2024
Viewed by 963
Abstract
This study presents a novel method for the early detection of strabismus, a common eye misalignment disorder, with an emphasis on its application in telemedicine. The technique leverages synchronized eye movements to estimate the pupil location of one eye based on the other, achieving close alignment in non-strabismic cases. Regression models for each eye are developed using advanced machine learning algorithms, and significant discrepancies between estimated and actual pupil positions indicate the presence of strabismus. This approach provides a non-invasive, efficient solution for early detection and bridges the gap between basic research and clinical care by offering an accessible, machine learning-based tool that facilitates timely intervention and improved outcomes in diverse healthcare settings. The potential for pediatric screening is discussed as a possible direction for future research. Full article
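The core idea above, regressing one pupil's position from the other eye's features and flagging large discrepancies, can be sketched as below. This is an editor's illustration with synthetic data and a hypothetical pixel threshold, not the study's trained models or feature set.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical training data: features of the left eye region (e.g. landmark
# coordinates) and the observed right-pupil position for non-strabismic eyes.
rng = np.random.default_rng(0)
X_left = rng.normal(size=(200, 8))
y_right_pupil = X_left[:, :2] * 3.0 + rng.normal(scale=0.1, size=(200, 2))

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_left, y_right_pupil)

def strabismus_flag(left_features, right_pupil_observed, threshold_px=5.0):
    """Flag a sample when the observed pupil deviates from the prediction."""
    predicted = model.predict(left_features.reshape(1, -1))[0]
    deviation = np.linalg.norm(predicted - right_pupil_observed)
    return deviation > threshold_px, deviation
```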

18 pages, 13017 KiB  
Article
DeployFusion: A Deployable Monocular 3D Object Detection with Multi-Sensor Information Fusion in BEV for Edge Devices
by Fei Huang, Shengshu Liu, Guangqian Zhang, Bingsen Hao, Yangkai Xiang and Kun Yuan
Sensors 2024, 24(21), 7007; https://doi.org/10.3390/s24217007 - 31 Oct 2024
Viewed by 824
Abstract
To address the challenges of suboptimal remote detection and significant computational burden in existing multi-sensor information fusion 3D object detection methods, a novel approach based on Bird’s-Eye View (BEV) is proposed. This method utilizes an enhanced lightweight EdgeNeXt feature extraction network, incorporating residual branches to address network degradation caused by the excessive depth of STDA encoding blocks. Meanwhile, deformable convolution is used to expand the receptive field and reduce computational complexity. The feature fusion module constructs a two-stage fusion network to optimize the fusion and alignment of multi-sensor features. This network aligns image features to supplement environmental information with point cloud features, thereby obtaining the final BEV features. Additionally, a Transformer decoder that emphasizes global spatial cues is employed to process the BEV feature sequence, enabling precise detection of distant small objects. Experimental results demonstrate that this method surpasses the baseline network, with improvements of 4.5% in the NuScenes detection score and 5.5% in average precision for detected objects. Finally, the model is converted and accelerated using TensorRT tools for deployment on mobile devices, achieving an inference time of 138 ms per frame on the Jetson Orin NX embedded platform, thus enabling real-time 3D object detection. Full article
(This article belongs to the Special Issue AI-Driving for Autonomous Vehicles)

25 pages, 11107 KiB  
Article
Joint Optimization of the 3D Model and 6D Pose for Monocular Pose Estimation
by Liangchao Guo, Lin Chen, Qiufu Wang, Zhuo Zhang and Xiaoliang Sun
Drones 2024, 8(11), 626; https://doi.org/10.3390/drones8110626 - 30 Oct 2024
Viewed by 612
Abstract
The autonomous landing of unmanned aerial vehicles (UAVs) relies on a precise relative 6D pose between platforms. Existing model-based monocular pose estimation methods need an accurate 3D model of the target and cannot handle its absence. This paper adopts the multi-view geometry constraints within the monocular image sequence to solve the problem and introduces a novel approach to monocular pose estimation that jointly optimizes the target’s 3D model and the relative 6D pose. We propose to represent the target’s 3D model using a set of sparse 3D landmarks. The 2D landmarks are detected in the input image by a trained neural network. Based on the 2D–3D correspondences, the initial pose estimate is obtained by solving the PnP problem. To achieve joint optimization, this paper builds the objective function on the minimization of the reprojection error, with the correction values of the 3D landmarks and the 6D pose as the parameters to be solved. By solving this optimization problem, the joint optimization of the target’s 3D model and the 6D pose is realized. In addition, a sliding window combined with a keyframe extraction strategy is adopted to speed up the algorithm processing. Experimental results on synthetic and real image sequences show that the proposed method achieves real-time, online, high-precision monocular pose estimation in the absence of an accurate 3D model via the joint optimization of the target’s 3D model and pose. Full article
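The initialization step described above, solving PnP from 2D–3D landmark correspondences and then measuring the reprojection error that the joint optimization subsequently minimizes, looks roughly like this in OpenCV. It is a sketch under the assumption of undistorted images, not the authors' implementation.

```python
import cv2
import numpy as np

def initial_pose_and_residual(landmarks_3d, landmarks_2d, K):
    """Initial 6D pose from 2D-3D correspondences plus the mean reprojection
    error that a joint model/pose refinement would go on to minimise.

    landmarks_3d : (N, 3) float array of model landmarks.
    landmarks_2d : (N, 2) float array of detected image landmarks.
    K            : 3x3 camera intrinsic matrix.
    """
    dist = np.zeros(5)                              # assume undistorted images
    ok, rvec, tvec = cv2.solvePnP(landmarks_3d, landmarks_2d, K, dist,
                                  flags=cv2.SOLVEPNP_EPNP)
    projected, _ = cv2.projectPoints(landmarks_3d, rvec, tvec, K, dist)
    residual = np.linalg.norm(projected.reshape(-1, 2) - landmarks_2d, axis=1)
    return rvec, tvec, residual.mean()
```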

16 pages, 3470 KiB  
Article
YOLOv8-Based Estimation of Estrus in Sows Through Reproductive Organ Swelling Analysis Using a Single Camera
by Iyad Almadani, Mohammed Abuhussein and Aaron L. Robinson
Digital 2024, 4(4), 898-913; https://doi.org/10.3390/digital4040044 - 27 Oct 2024
Viewed by 984
Abstract
Accurate and efficient estrus detection in sows is crucial in modern agricultural practices to ensure optimal reproductive health and successful breeding outcomes. A non-contact method using computer vision to detect a change in a sow’s vulva size holds great promise for automating and enhancing this critical process. However, achieving precise and reliable results depends heavily on maintaining a consistent camera distance during image capture. Variations in camera distance can lead to erroneous estrus estimations, potentially resulting in missed breeding opportunities or false positives. To address this challenge, we propose a robust six-step methodology, accompanied by three stages of evaluation. First, we carefully annotated masks around the vulva to ensure an accurate pixel perimeter calculation of its shape. Next, we meticulously identified keypoints on the sow’s vulva, which enabled precise tracking and analysis of its features. We then harnessed the power of machine learning to train our model using annotated images, which facilitated keypoint detection and segmentation with the state-of-the-art YOLOv8 algorithm. By identifying the keypoints, we performed precise calculations of the Euclidean distances: first, between each labium (horizontal distance), and second, between the clitoris and the perineum (vertical distance). Additionally, by segmenting the vulva’s size, we gained valuable insights into its shape, which helped with performing precise perimeter measurements. Equally important was our effort to calibrate the camera using monocular depth estimation. This calibration helped establish a functional relationship between the measurements on the image (such as the distances between the labia and from the clitoris to the perineum, and the vulva perimeter) and the depth distance to the camera, which enabled accurate adjustments and calibration for our analysis. Lastly, we present a classification method for distinguishing between estrus and non-estrus states in subjects based on the pixel width, pixel length, and perimeter measurements. The method calculated the Euclidean distances between a new data point and reference points from two datasets: “estrus data” and “not estrus data”. Using custom distance functions, we computed the distances for each measurement dimension and aggregated them to determine the overall similarity. The classification process involved identifying the three nearest neighbors of the datasets and employing a majority voting mechanism to assign a label. A new data point was classified as “estrus” if the majority of the nearest neighbors were labeled as estrus; otherwise, it was classified as “non-estrus”. This method provided a robust approach for automated classification, which aided in more accurate and efficient detection of the estrus states. To validate our approach, we propose three evaluation stages. In the first stage, we calculated the Mean Squared Error (MSE) between the ground truth keypoints of the labia distance and the distance between the predicted keypoints, and we performed the same calculation for the distance between the clitoris and perineum. Then, we provided a quantitative analysis and performance comparison, including a comparison between our previous U-Net model and our new YOLOv8 segmentation model. This comparison focused on each model’s performance in terms of accuracy and speed, which highlighted the advantages of our new approach. Lastly, we evaluated the estrus–not-estrus classification model by defining the confusion matrix. 
By using this comprehensive approach, we significantly enhanced the accuracy of estrus detection in sows while effectively mitigating human errors and resource wastage. The automation and optimization of this critical process hold the potential to revolutionize estrus detection in agriculture, which will contribute to improved reproductive health management and elevate breeding outcomes to new heights. Through extensive evaluation and experimentation, our research aimed to demonstrate the transformative capabilities of computer vision techniques, paving the way for more advanced and efficient practices in the agricultural domain. Full article
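The 3-nearest-neighbor majority vote over (pixel width, pixel length, perimeter) measurements described above can be sketched as follows; the distance aggregation and the reference values are illustrative, not the paper's calibrated data.

```python
import numpy as np

def classify_estrus(sample, estrus_refs, not_estrus_refs, k=3):
    """k-nearest-neighbour majority vote over (width, length, perimeter) vectors.

    sample, *_refs : arrays of pixel width, pixel length and perimeter values.
    Per-dimension absolute differences are aggregated into a single distance.
    """
    refs = np.vstack([estrus_refs, not_estrus_refs])
    labels = np.array([1] * len(estrus_refs) + [0] * len(not_estrus_refs))
    dists = np.abs(refs - sample).sum(axis=1)        # aggregated custom distance
    nearest = labels[np.argsort(dists)[:k]]
    return "estrus" if nearest.sum() > k // 2 else "non-estrus"

# Illustrative reference measurements (pixels).
estrus_refs = np.array([[120, 95, 400], [130, 100, 420]])
not_estrus_refs = np.array([[90, 70, 300], [95, 75, 310], [88, 68, 295]])
print(classify_estrus(np.array([125, 98, 410]), estrus_refs, not_estrus_refs))  # estrus
```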

20 pages, 31052 KiB  
Article
Spatiotemporal Information, Near-Field Perception, and Service for Tourists by Distributed Camera and BeiDou Positioning System in Mountainous Scenic Areas
by Kuntao Shi, Changming Zhu, Junli Li, Xin Zhang, Fan Yang, Kun Zhang and Qian Shen
ISPRS Int. J. Geo-Inf. 2024, 13(10), 370; https://doi.org/10.3390/ijgi13100370 - 20 Oct 2024
Cited by 1 | Viewed by 944
Abstract
The collaborative use of camera near-field sensors for monitoring the number and status of tourists is a crucial aspect of smart scenic spot management. This paper proposes a near-field perception technical system that achieves dynamic and accurate detection of tourist targets in mountainous scenic areas, addressing the challenges of real-time passive perception and safety management of tourists. The technical framework involves the following steps: Firstly, real-time video stream signals are collected from multiple cameras to create a distributed perception network. Then, the YOLOX network model is enhanced with the CBAM module and ASFF method to improve the dynamic recognition of preliminary tourist targets in complex scenes. Additionally, the BYTE target dynamic tracking algorithm is employed to address the issue of target occlusion in mountainous scenic areas, thereby enhancing the accuracy of model detection. Finally, the video target monocular spatial positioning algorithm is utilized to determine the actual geographic location of tourists based on the image coordinates. The algorithm was deployed in the Tianmeng Scenic Area of Yimeng Mountain in Shandong Province, and the results demonstrate that this technical system effectively assists in accurately perceiving and spatially positioning tourists in mountainous scenic spots. The system demonstrates an overall accuracy in tourist perception of over 90%, with spatial positioning errors of less than 1.0 m and a root mean square error (RMSE) of less than 1.14 m. This provides auxiliary technical support and effective data support for passive real-time dynamic precise perception and safety management of regional tourist targets in mountainous scenic areas with no or weak satellite navigation signals. Full article
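Monocular spatial positioning of a detected person is often done by back-projecting the foot-point pixel onto a ground plane. The sketch below is a simplified flat-ground model, not the deployed algorithm, and assumes known intrinsics, camera height, and pitch; mountainous terrain would need a terrain model instead of a plane.

```python
import numpy as np

def pixel_to_ground(u, v, K, cam_height, pitch_rad):
    """Back-project a foot-point pixel onto a flat ground plane.

    Simplified model: the camera sits cam_height metres above the ground and
    is rotated only by a downward pitch. Returns (lateral offset, forward
    distance) in metres, or None if the ray does not hit the ground.
    """
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    c, s = np.cos(pitch_rad), np.sin(pitch_rad)
    # Rotate the ray from the pitched camera frame into a level frame
    # (x right, y down, z forward).
    R = np.array([[1, 0, 0], [0, c, s], [0, -s, c]])
    ray = R @ ray_cam
    if ray[1] <= 0:                      # ray must point downwards
        return None
    scale = cam_height / ray[1]
    ground = ray * scale                 # intersection with the ground plane
    return ground[0], ground[2]
```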

11 pages, 2023 KiB  
Article
Monocular 3D Multi-Person Pose Estimation for On-Site Joint Flexion Assessment: A Case of Extreme Knee Flexion Detection
by Guihai Yan, Haofeng Yan, Zhidong Yao, Zhongliang Lin, Gang Wang, Changyong Liu and Xincong Yang
Sensors 2024, 24(19), 6187; https://doi.org/10.3390/s24196187 - 24 Sep 2024
Viewed by 933
Abstract
Work-related musculoskeletal disorders (WMSDs) represent a significant health challenge for workers in construction environments, often arising from prolonged exposure to ergonomic risks associated with manual labor, awkward postures, and repetitive motions. These conditions not only lead to diminished worker productivity but also incur substantial economic costs for employers and healthcare systems alike. Thus, there is an urgent need for effective tools to assess and mitigate these ergonomic risks. This study proposes a novel monocular 3D multi-person pose estimation method designed to enhance ergonomic risk assessments in construction environments. Leveraging advanced computer vision and deep learning techniques, this approach accurately captures and analyzes the spatial dynamics of workers’ postures, with a focus on detecting extreme knee flexion, a critical indicator of work-related musculoskeletal disorders (WMSDs). A pilot study conducted on an actual construction site demonstrated the method’s feasibility and effectiveness, achieving an accurate detection rate for extreme flexion incidents that closely aligned with supervisory observations and worker self-reports. The proposed monocular approach enables universal applicability and enhances ergonomic analysis through 3D pose estimation and group pose recognition for timely interventions. Future efforts will focus on improving robustness and integration with health monitoring to reduce WMSDs and promote worker health. Full article
(This article belongs to the Section Sensing and Imaging)
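Once 3D joints are available, extreme knee flexion can be flagged from the hip-knee-ankle angle. A minimal sketch follows; the 120° threshold is illustrative, not the study's criterion.

```python
import numpy as np

def knee_flexion_deg(hip, knee, ankle):
    """Knee flexion from three 3D joints: 180 degrees minus the hip-knee-ankle
    angle, so a straight leg gives roughly 0 and a deep squat a large value."""
    thigh = np.asarray(hip, float) - np.asarray(knee, float)
    shank = np.asarray(ankle, float) - np.asarray(knee, float)
    cos_angle = np.dot(thigh, shank) / (np.linalg.norm(thigh) * np.linalg.norm(shank))
    joint_angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return 180.0 - joint_angle

def is_extreme_flexion(hip, knee, ankle, threshold_deg=120.0):
    """Flag a pose as extreme knee flexion; the threshold is illustrative."""
    return knee_flexion_deg(hip, knee, ankle) > threshold_deg

print(knee_flexion_deg([0, 1.0, 0], [0, 0.5, 0], [0, 0.0, 0]))    # ~0 (straight leg)
print(knee_flexion_deg([0, 1.0, 0], [0, 0.5, 0], [0, 0.9, 0.3]))  # ~143 (deep flexion)
```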
