Search Results (413)

Search Parameters:
Keywords = graphics processing units (GPU)

22 pages, 9154 KiB  
Article
Turbulent Flow Through Sluice Gate and Weir Using Smoothed Particle Hydrodynamics: Evaluation of Turbulence Models, Boundary Conditions, and 3D Effects
by Efstathios Chatzoglou and Antonios Liakopoulos
Water 2025, 17(2), 152; https://doi.org/10.3390/w17020152 - 8 Jan 2025
Viewed by 408
Abstract
Understanding flow dynamics around hydraulic structures is essential for optimizing water management systems and predicting flow behavior in real-world applications. In this study, we simulate a 3D flow control system featuring a sluice gate and a weir, commonly used in hydraulic engineering. The focus is on accurately incorporating modified dynamic boundary conditions (mDBCs) and viscosity treatment to improve the simulation of complex, turbulent flows. We assess the performance of the Smoothed Particle Hydrodynamics (SPH) method in handling these challenging conditions, especially since boundary conditions and applicability to industry are two of the SPH method’s grand challenges. Simulations were conducted on a Graphics Processing Unit (GPU) using the DualSPHysics code. The results were compared to theoretical predictions and experimental data found in the literature. Key hydraulic characteristics, including 3D flow effects, hydraulic jump formation, and turbulent behavior, are examined. The combination of mDBCs with the Laminar plus sub-particle scale turbulence model achieved the correct simulation results. The findings demonstrate agreement between simulations, theoretical predictions, and experimental results. This work provides a reliable framework for analyzing turbulent flows in hydraulic structures and can be used as reference data or a prototype for larger-scale simulations in both research and engineering design, particularly in contexts requiring robust and precise flow control and/or environmental management.
(This article belongs to the Special Issue Hydrodynamic Science Experiments and Simulations)

17 pages, 3121 KiB  
Article
Real-Time Radar Classification Based on Software-Defined Radio Platforms: Enhancing Processing Speed and Accuracy with Graphics Processing Unit Acceleration
by Seckin Oncu, Mehmet Karakaya, Yaser Dalveren, Ali Kara and Mohammad Derawi
Sensors 2024, 24(23), 7776; https://doi.org/10.3390/s24237776 - 4 Dec 2024
Viewed by 683
Abstract
This paper presents a comprehensive evaluation of real-time radar classification using software-defined radio (SDR) platforms. The transition from analog to digital technologies, facilitated by SDR, has revolutionized radio systems, offering unprecedented flexibility and reconfigurability through software-based operations. This advancement complements the role of radar signal parameters, encapsulated in the pulse description words (PDWs), which play a pivotal role in electronic support measure (ESM) systems, enabling the detection and classification of threat radars. This study proposes an SDR-based radar classification system that achieves real-time operation with enhanced processing speed. Employing the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm as a robust classifier, the system harnesses Graphical Processing Unit (GPU) parallelization for efficient radio frequency (RF) parameter extraction. The experimental results highlight the efficiency of this approach, demonstrating a notable improvement in processing speed while operating at a sampling rate of up to 200 MSps and achieving an accuracy of 89.7% for real-time radar classification.
(This article belongs to the Section Radar Sensors)
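
The abstract above names DBSCAN as the classifier but gives no features or parameters. As a rough illustration of how density-based clustering can separate emitters, the following minimal sketch clusters hypothetical two-feature PDW-like points. The feature values, `eps`, and `min_pts` are assumptions chosen for illustration, not values from the paper, and the code runs on the CPU rather than a GPU.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point (-1 means noise)."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbors(i):
        # Brute-force neighborhood query; a point is its own neighbor.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nb)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:
                queue.extend(nb_j)  # j is a core point: keep expanding
    return labels

# Hypothetical, already-normalized (frequency, pulse width) pairs:
# two emitters plus one stray pulse.
PDWS = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1),
        (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
        (9.0, 1.0)]

labels = dbscan(PDWS, eps=0.3, min_pts=3)
print(labels)
```

With these assumed values the two tight groups receive distinct cluster labels and the isolated point is marked as noise. The neighborhood query is the part that parallelizes naturally, since each point's distances can be computed independently.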

23 pages, 8542 KiB  
Article
Graphics Processing Unit-Accelerated Propeller Computational Fluid Dynamics Using AmgX: Performance Analysis Across Mesh Types and Hardware Configurations
by Yue Zhu, Jin Gan, Yongshui Lin and Weiguo Wu
J. Mar. Sci. Eng. 2024, 12(12), 2134; https://doi.org/10.3390/jmse12122134 - 22 Nov 2024
Viewed by 572
Abstract
Computational fluid dynamics (CFD) has become increasingly prevalent in marine and offshore engineering, with enhancing simulation efficiency emerging as a critical challenge. This study systematically evaluates the application of graphics processing unit (GPU) acceleration technology in CFD simulation of propeller open water performance. Numerical simulations of the VP1304 propeller model were performed using OpenFOAM v2312 integrated with the NVIDIA AmgX library. The research compared GPU acceleration performance against conventional CPU methods across various hardware configurations and mesh types (tetrahedral, hexahedral-dominant, and polyhedral). Results demonstrate that GPU acceleration significantly improved computational efficiency, with tetrahedral meshes achieving over 400% speedup in a 4-GPU configuration, while polyhedral meshes reached over 500% speedup with a fixed mesh count. Among the mesh types, hexahedral-dominant meshes performed best in capturing flow field details. The study also found that GPU acceleration does not compromise simulation accuracy, but its effectiveness is closely related to mesh type and hardware configuration. Notably, GPUs demonstrate more significant advantages when handling large-scale problems. These findings have important practical implications for improving propeller design processes and shortening product development cycles.
(This article belongs to the Section Ocean Engineering)
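
A percentage speedup such as "over 400%" can be read in two ways, and the abstract does not say which is meant. The short sketch below shows both readings for a pair of hypothetical wall-clock times; the timings are assumptions for illustration and are not taken from the study.

```python
def speedup_ratio(t_baseline, t_accelerated):
    """Classic speedup: how many times faster the accelerated run is."""
    return t_baseline / t_accelerated

def percent_improvement(t_baseline, t_accelerated):
    """Speedup expressed as a percentage gain over the baseline."""
    return (speedup_ratio(t_baseline, t_accelerated) - 1.0) * 100.0

# Hypothetical timings in seconds (not from the paper).
t_cpu, t_gpu = 1000.0, 200.0

ratio = speedup_ratio(t_cpu, t_gpu)       # 5.0, i.e. 5x faster
gain = percent_improvement(t_cpu, t_gpu)  # 400.0, i.e. 400% faster
print(ratio, gain)
```

Under these assumed timings, a run that is 5 times faster is "400% faster" as a gain but "500%" if the ratio itself is quoted as a percentage, so the convention matters when comparing reported figures.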

20 pages, 5217 KiB  
Article
A Real-Time Signal Measurement System Using FPGA-Based Deep Learning Accelerators and Microwave Photonic
by Longlong Zhang, Tong Zhou, Jie Yang, Yin Li, Zhiwen Zhang, Xiang Hu and Yuanxi Peng
Remote Sens. 2024, 16(23), 4358; https://doi.org/10.3390/rs16234358 - 22 Nov 2024
Viewed by 673
Abstract
Deep learning techniques have been widely investigated as an effective method for signal measurement in recent years. However, most existing deep learning-based methods are still difficult to deploy on embedded platforms and perform poorly in real-time applications. To address this, this paper develops two accelerators, as the core of the signal measurement system, for intelligent signal processing. Firstly, by introducing the idea of an automated framework, we propose a simple deep neural network (DNN)-based hardware structure, which automatically maps algorithms to hardware modules, supports configurable parameters, and has the advantage of low latency, with an average inference time of only 3.5 μs. Subsequently, another accelerator is designed with the efficient hardware structure of the long short-term memory (LSTM) + DNN model, demonstrating outstanding performance with a classification accuracy of 98.82%, mean absolute error (MAE) of 0.27°, and root mean square error (RMSE) of 0.392° after model compression. Moreover, parallel optimization strategies are exploited to further reduce latency and support simultaneous frequency and direction measurement tasks. Finally, we test the actual collected signal data on the XCVU13P field programmable gate array (FPGA). The results show that inference time is reduced by 28–31% for the DNN model and 71–73% for the LSTM + DNN model compared to running on a graphics processing unit (GPU). In addition, the parallel strategies further decrease the delay by 23.9% and 37.5% when processing continuous data. The FPGA-based and deep learning-assisted hardware accelerators significantly improve real-time performance and provide a promising solution for signal measurement.

20 pages, 3466 KiB  
Article
Symmetric Tridiagonal Eigenvalue Solver Across CPU Graphics Processing Unit (GPU) Nodes
by Erika Hernández-Rubio, Alberto Estrella-Cruz, Amilcar Meneses-Viveros, Jorge Alberto Rivera-Rivera, Liliana Ibeth Barbosa-Santillán and Sergio Víctor Chapa-Vergara
Appl. Sci. 2024, 14(22), 10716; https://doi.org/10.3390/app142210716 - 19 Nov 2024
Viewed by 649
Abstract
In this work, an improved and scalable implementation of Cuppen’s algorithm for diagonalizing symmetric tridiagonal matrices is presented. This approach uses a hybrid-heterogeneous parallelization technique, taking advantage of GPU and CPU in a distributed hardware architecture. Cuppen’s algorithm is a theoretical concept and a powerful tool in various scientific and engineering applications. It is a key player in matrix diagonalization, finding its use in Functional Density Theory (FDT) and Spectral Clustering. This highly efficient and numerically stable algorithm computes eigenvalues and eigenvectors of symmetric tridiagonal matrices, making it a crucial component in many computational methods. One of the challenges in parallelizing algorithms for GPUs is their limited memory capacity. However, we overcome this limitation by utilizing multiple nodes with both CPUs and GPUs. This enables us to solve subproblems that fit within the memory of each device in parallel and subsequently combine these subproblems to obtain the complete solution. The hybrid-heterogeneous approach proposed in this work outperforms the state-of-the-art libraries and also maintains a high degree of accuracy in terms of orthogonality and quality of eigenvectors. Furthermore, the sequential version of the algorithm with our approach in this work demonstrates superior performance and potential for practical use. In the experiments carried out, it was possible to verify that the performance of the implementation that was carried out scales by 2× using two graphic cards in the same node. Notably, Symmetric Tridiagonal Eigenvalue Solvers are fundamental to solving more general eigenvalue problems. Additionally, the divide-and-conquer approach employed in this implementation can be extended to singular value solvers. 
Given the wide range of eigenvalue problems encountered in scientific and engineering domains, this work is essential in advancing computational methods for efficient and accurate matrix diagonalization.

34 pages, 1063 KiB  
Review
A Survey on Design Space Exploration Approaches for Approximate Computing Systems
by Sepide Saeedi, Ali Piri, Bastien Deveautour, Ian O’Connor, Alberto Bosio, Alessandro Savino and Stefano Di Carlo
Electronics 2024, 13(22), 4442; https://doi.org/10.3390/electronics13224442 - 13 Nov 2024
Viewed by 910
Abstract
Approximate Computing (AxC) has emerged as a promising paradigm to enhance performance and energy efficiency by allowing a controlled trade-off between accuracy and resource consumption. It is extensively adopted across various abstraction levels, from software to architecture and circuit levels, employing diverse methodologies. The primary objective of AxC is to reduce energy consumption for executing error-resilient applications, accepting controlled and inherently acceptable output quality degradation. However, harnessing AxC poses several challenges, including identifying segments within a design amenable to approximation and selecting suitable AxC techniques to fulfill accuracy and performance criteria. This survey provides a comprehensive review of recent methodologies proposed for performing Design Space Exploration (DSE) to find the most suitable AxC techniques, focusing on both hardware and software implementations. DSE is a crucial design process where system designs are modeled, evaluated, and optimized for various extra-functional system behaviors such as performance, power consumption, energy efficiency, and accuracy. A systematic literature review was conducted to identify papers that ascribe their DSE algorithms, excluding those relying on exhaustive search methods. This survey aims to detail the state-of-the-art DSE methodologies that efficiently select AxC techniques, offering insights into their applicability across different hardware platforms and use-case domains. For this purpose, papers were categorized based on the type of search algorithm used, with Machine Learning (ML) and Evolutionary Algorithms (EAs) being the predominant approaches. Further categorization is based on the target hardware, including Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), general-purpose Central Processing Units (CPUs), and Graphics Processing Units (GPUs). 
A notable observation was that most studies targeted image processing applications due to their tolerance for accuracy loss. By providing an overview of techniques and methods outlined in existing literature pertaining to the DSE of AxC designs, this survey elucidates the current trends and challenges in optimizing approximate designs.

19 pages, 5545 KiB  
Article
Edge Computing for AI-Based Brain MRI Applications: A Critical Evaluation of Real-Time Classification and Segmentation
by Khuhed Memon, Norashikin Yahya, Mohd Zuki Yusoff, Rabani Remli, Aida-Widure Mustapha Mohd Mustapha, Hilwati Hashim, Syed Saad Azhar Ali and Shahabuddin Siddiqui
Sensors 2024, 24(21), 7091; https://doi.org/10.3390/s24217091 - 4 Nov 2024
Viewed by 1290
Abstract
Medical imaging plays a pivotal role in diagnostic medicine with technologies like Magnetic Resonance Imaging (MRI), Computed Tomography (CT), Positron Emission Tomography (PET), and ultrasound scans being widely used to assist radiologists and medical experts in reaching a concrete diagnosis. Given the recent massive uplift in the storage and processing capabilities of computers, and the publicly available big data, Artificial Intelligence (AI) has also started contributing to improving diagnostic radiology. Edge computing devices and handheld gadgets can serve as useful tools to process medical data in remote areas with limited network and computational resources. In this research, the capabilities of multiple platforms are evaluated for the real-time deployment of diagnostic tools. MRI classification and segmentation applications developed in previous studies are used for testing the performance using different hardware and software configurations. Cost–benefit analysis is carried out using a workstation with an NVIDIA Graphics Processing Unit (GPU), Jetson Xavier NX, Raspberry Pi 4B, and Android phone, using MATLAB, Python, and Android Studio. The mean computational times for the classification app on the PC, Jetson Xavier NX, and Raspberry Pi are 1.2074, 3.7627, and 3.4747 s, respectively. On the low-cost Android phone, this time is observed to be 0.1068 s using the Dynamic Range Quantized TFLite version of the baseline model, with slight degradation in accuracy. For the segmentation app, the times are 1.8241, 5.2641, 6.2162, and 3.2023 s, respectively, when using JPEG inputs. The Jetson Xavier NX and Android phone stand out as the best platforms due to their compact size, fast inference times, and affordability.
(This article belongs to the Section Biomedical Sensors)

27 pages, 4031 KiB  
Article
Polarization Characteristics of Massive HVI Debris Clouds Using an Improved Monte Carlo Ray Tracing Method for Remote Sensing Applications
by Guangsen Liu, Peng Rao, Yao Li and Wen Sun
Remote Sens. 2024, 16(16), 2925; https://doi.org/10.3390/rs16162925 - 9 Aug 2024
Viewed by 1071
Abstract
As a signature phenomenon of massive hypervelocity impacts (HVIs) in space, debris clouds provide critical optical information for satellite remote sensing and the assessment of large-scale impacts. However, studies of the optical scattering properties of debris clouds remain limited, and existing vector radiative transfer (VRT) methods struggle to accurately simulate the optical characteristics of these complex scatterers. To address this gap, this paper presents an improved Monte Carlo VRT program (PGS–MC) for multicomponent polydisperse scatterers to precisely evaluate the radiation and polarization characteristics of complex scatterers. Based on the Monte Carlo ray tracing (MCRT) method, our program introduces a particle grouping strategy (PGS) to further emphasize the importance of accounting for optical property discrepancies between different materials and particle sizes, thus significantly improving the fidelity of VRT simulations. Moreover, our program, developed using the compute unified device architecture (CUDA), can be run in parallel on graphics processing units (GPUs), which effectively reduces the computational time. The validation results indicated that the developed PGS–MC program can accurately and efficiently simulate the polarization of complex 3D scatterers. A further investigation showed that the polarization characteristics of debris clouds are highly sensitive to parameters such as the angle between the incident and detection directions, number density, particle size distribution, debris material, and wavelength. In addition, the polarization imaging of debris clouds offers distinct advantages over intensity imaging. This study offers guidance for analyzing the VRT properties of massive HVI debris clouds. Additionally, it provides a practical tool and concrete ideas for modeling the polarization characteristics of various complex scatterers, such as aircraft contrails and clouds.

19 pages, 48324 KiB  
Article
An Efficient and Accurate Ground-Based Synthetic Aperture Radar (GB-SAR) Real-Time Imaging Scheme Based on Parallel Processing Mode and Architecture
by Yunxin Tan, Guangju Li, Chun Zhang and Weiming Gan
Electronics 2024, 13(16), 3138; https://doi.org/10.3390/electronics13163138 - 8 Aug 2024
Viewed by 1263
Abstract
When performing high-resolution imaging with ground-based synthetic aperture radar (GB-SAR) systems, the data collected and processed are vast and complex, imposing higher demands on the real-time performance and processing efficiency of the imaging system. Yet very few studies have been conducted on real-time processing methods for GB-SAR monitoring data. This paper proposes a real-time imaging scheme based on parallel processing models, optimizing each step of the traditional ωK imaging algorithm in parallel. Several parallel optimization schemes are proposed for the computationally intensive and complex interpolation part, including dynamic parallelism, the Group-Nstream processing model, and the Fthread-Group-Nstream processing model. The Fthread-Group-Nstream processing model utilizes Fthread, Group, and Nstream for the finer-grained processing of monitoring data, reducing the impact of the nested depth on the algorithm’s performance in dynamic parallelism and alleviating the issue of serial execution within the Group-Nstream processing model. This scheme has been successfully applied in a synthetic aperture radar imaging system, achieving excellent imaging results and accuracy. The speedup ratio can reach 52.14, and the relative errors in amplitude and phase are close to 0, validating the effectiveness and practicality of the proposed schemes. This paper addresses the lack of research on the real-time processing of GB-SAR monitoring data, providing a reliable monitoring method for GB-SAR deformation monitoring.
(This article belongs to the Topic Radar Signal and Data Processing with Applications)

24 pages, 8434 KiB  
Article
A Fast Inverse Synthetic Aperture Radar Imaging Scheme Combining GPU-Accelerated Shooting and Bouncing Ray and Back Projection Algorithm under Wide Bandwidths and Angles
by Jiongming Chen, Pengju Yang, Rong Zhang and Rui Wu
Electronics 2024, 13(15), 3062; https://doi.org/10.3390/electronics13153062 - 2 Aug 2024
Viewed by 1027
Abstract
Inverse synthetic aperture radar (ISAR) imaging techniques are frequently used in target classification and recognition applications, due to their capability to produce high-resolution images of moving targets. In order to meet the demand of ISAR imaging for electromagnetic calculation with high efficiency and accuracy, a novel accelerated shooting and bouncing ray (SBR) method is presented by combining a Graphics Processing Unit (GPU) and Bounding Volume Hierarchies (BVH) tree structure. To overcome the problem of unfocused images produced by a Fourier-based ISAR procedure under wide-angle and wide-bandwidth conditions, an efficient parallel back projection (BP) imaging algorithm is developed by utilizing the GPU acceleration technique. The presented GPU-accelerated SBR is validated by comparison with the RL-GO method in commercial software FEKO v2020. For ISAR images, it is clearly indicated that strong scattering centers as well as target profiles can be observed under large observation azimuth angles, Δφ=90°, and wide bandwidths, 3 GHz. It is also indicated that ISAR imaging is heavily sensitive to observation angles. In addition, obvious sidelobes can be observed because the phase history of the electromagnetic wave is distorted by multipole scattering. Simulation results confirm the feasibility and efficiency of our scheme by combining GPU-accelerated SBR with the BP algorithm for fast ISAR imaging simulation under wide-angle and wide-bandwidth conditions.
(This article belongs to the Special Issue Microwave Imaging and Applications)

34 pages, 525 KiB  
Review
A Review of Recent Hardware and Software Advances in GPU-Accelerated Edge-Computing Single-Board Computers (SBCs) for Computer Vision
by Umair Iqbal, Tim Davies and Pascal Perez
Sensors 2024, 24(15), 4830; https://doi.org/10.3390/s24154830 - 25 Jul 2024
Cited by 2 | Viewed by 2469
Abstract
Computer Vision (CV) has become increasingly important for Single-Board Computers (SBCs) due to their widespread deployment in addressing real-world problems. Specifically, in the context of smart cities, there is an emerging trend of developing end-to-end video analytics solutions designed to address urban challenges such as traffic management, disaster response, and waste management. However, deploying CV solutions on SBCs presents several pressing challenges (e.g., limited computation power, inefficient energy management, and real-time processing needs) hindering their use at scale. Graphical Processing Units (GPUs) and software-level developments have emerged recently to address these challenges and enable the elevated performance of SBCs; however, this is still an active area of research. There is a gap in the literature for a comprehensive review of such recent and rapidly evolving advancements on both software and hardware fronts. The presented review provides a detailed overview of the existing GPU-accelerated edge-computing SBCs and software advancements including algorithm optimization techniques, packages, development frameworks, and hardware deployment specific packages. This review provides a subjective comparative analysis based on critical factors to help applied Artificial Intelligence (AI) researchers in demonstrating the existing state of the art and selecting the best suited combinations for their specific use-case. At the end, the paper also discusses potential limitations of the existing SBCs and highlights the future research directions in this domain.

28 pages, 11142 KiB  
Article
Real-Time Registration of Unmanned Aerial Vehicle Hyperspectral Remote Sensing Images Using an Acousto-Optic Tunable Filter Spectrometer
by Hong Liu, Bingliang Hu, Xingsong Hou, Tao Yu, Zhoufeng Zhang, Xiao Liu, Jiacheng Liu and Xueji Wang
Drones 2024, 8(7), 329; https://doi.org/10.3390/drones8070329 - 17 Jul 2024
Viewed by 1239
Abstract
Differences in field of view may occur during unmanned aerial remote sensing imaging applications with acousto-optic tunable filter (AOTF) spectral imagers using zoom lenses. These differences may stem from image size deformation caused by the zoom lens, image drift caused by AOTF wavelength switching, and drone platform jitter. However, they can be addressed using hyperspectral image registration. This article proposes a new coarse-to-fine remote sensing image registration framework based on feature and optical flow theory, comparing its performance with that of existing registration algorithms using the same dataset. The proposed method increases the structural similarity index by 5.2 times, reduces the root mean square error by 3.1 times, and increases the mutual information by 1.9 times. To meet the real-time processing requirements of the AOTF spectrometer in remote sensing, a development environment using VS2023+CUDA+OPENCV was established to improve the demons registration algorithm. The registration algorithm for the central processing unit+graphics processing unit (CPU+GPU) achieved an acceleration ratio of ~30 times compared to that of a CPU alone. Finally, the real-time registration effect of spectral data during flight was verified. The proposed method demonstrates that AOTF hyperspectral imagers can be used in real-time remote sensing applications on unmanned aerial vehicles.

14 pages, 3312 KiB  
Article
NXRouting: A GPU-Enhanced CAD Tool for European Radiation-Hardened FPGAs
by Andrea Portaluri, Sarah Azimi, Andrea Saracino, Luca Sterpone, Alp Kilic and Damien Dupuis
Electronics 2024, 13(14), 2803; https://doi.org/10.3390/electronics13142803 - 16 Jul 2024
Viewed by 795
Abstract
Field Programmable Gate Arrays (FPGAs) have witnessed an increase in space applications in recent years, mainly due to their cost-effective high performance and flexibility. However, the susceptibility of these devices to radiation-induced effects when working in such an environment is well known. When common mitigation techniques are not sufficient to ensure the correct completion of a task, radiation-hardened FPGAs represent one of the most effective solutions. NanoXplore, in this context, is the first European developer of rad-hard FPGAs, which embed intrinsic high complexity in their architectures, preventing the user from using or developing custom placement and routing algorithms. In this paper, we overcame these issues by proposing the first tool tailored to NanoXplore devices, which allows the exploration of NanoXplore device architectures and routing of points through a Python interface. We developed a model that reflects the one used by the vendor, allowing the user to extract information about routes, nets and additional logic that is otherwise unavailable. The tool also performs routing of points in the programmable logic, computing the optimal path. An implementation of the router on a Graphics Processing Unit (GPU) is proposed to exploit the highly parallelizable nature of the problem. Finally, routing timing analyses on different benchmarks have been performed, improving the routing routine time.
(This article belongs to the Section Computer Science & Engineering)

37 pages, 9513 KiB  
Article
Parallel Implicit Solvers for 2D Numerical Models on Structured Meshes
by Yaoxin Zhang, Mohammad Z. Al-Hamdan and Xiaobo Chao
Mathematics 2024, 12(14), 2184; https://doi.org/10.3390/math12142184 - 12 Jul 2024
Viewed by 741
Abstract
This paper presents the parallelization of two widely used implicit numerical solvers for the solution of partial differential equations on structured meshes, namely, the ADI (Alternating-Direction Implicit) solver for tridiagonal linear systems and the SIP (Strongly Implicit Procedure) solver for penta-diagonal systems. Both solvers were parallelized using CUDA (Compute Unified Device Architecture) Fortran on GPGPUs (General-Purpose Graphics Processing Units). The parallel ADI solver (P-ADI) is based on the Parallel Cyclic Reduction (PCR) algorithm, while the parallel SIP solver (P-SIP) uses the wave front (WF) method following a diagonal line calculation strategy. To map the solution schemes onto the hierarchical block-thread framework of CUDA on the GPU, the P-ADI solver adopted two mapping methods, one block thread with iterations (OBM-it) and multi-block threads (MBMs), while the P-SIP solver also used two mappings: a conventional mapping using effective WF lines (WF-e), with matrix coefficients and solution variables defined on the original computational mesh, and a newly proposed mapping using the entire WF mesh (WF-all), on which matrix coefficients and solution variables are defined. Both the P-ADI and the P-SIP have been integrated into a two-dimensional (2D) hydrodynamic model, the CCHE2D (Center of Computational Hydroscience and Engineering) model, developed by the National Center for Computational Hydroscience and Engineering at the University of Mississippi. This study compared these two parallel solvers and their efficiency for the first time, using examples and applications in complex geometries, which can provide valuable guidance for future uses of these two parallel implicit solvers in computational fluid dynamics (CFD).
Both parallel solvers demonstrated higher efficiency than their serial counterparts on the CPU (Central Processing Unit): a speedup ratio of 3.73 to 4.98 for flow simulations and 2.166 to 3.648 for sediment transport simulations. In general, the P-ADI solver is faster than, but not as stable as, the P-SIP solver; for the P-SIP solver, the newly developed mapping method WF-all significantly improved on the conventional mapping method WF-e.
(This article belongs to the Special Issue Mathematical Modeling and Numerical Simulation in Fluids)
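The Parallel Cyclic Reduction (PCR) algorithm named in the abstract above can be illustrated with a short serial sketch. This is not the authors' CUDA Fortran code: the function name `pcr_solve`, the array layout, and the sample system are assumptions chosen for illustration, and no pivoting is done, so the sketch is only suitable for well-conditioned (for example diagonally dominant) systems. The inner loop over `i` has no dependencies between iterations, which is the property that lets each row be assigned to its own GPU thread.

```python
def pcr_solve(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] by PCR.

    Requires a[0] == 0 and c[-1] == 0. Serial illustration only.
    """
    n = len(b)
    a, b, c, d = (list(map(float, v)) for v in (a, b, c, d))
    stride = 1
    while stride < n:
        na, nc = [0.0] * n, [0.0] * n
        nb, nd = b[:], d[:]
        for i in range(n):  # every i is independent: one thread per row
            lo, hi = i - stride, i + stride
            if lo >= 0:
                # Eliminate x[i - stride] using equation lo.
                alpha = -a[i] / b[lo]
                na[i] = alpha * a[lo]
                nb[i] += alpha * c[lo]
                nd[i] += alpha * d[lo]
            if hi < n:
                # Eliminate x[i + stride] using equation hi.
                gamma = -c[i] / b[hi]
                nc[i] = gamma * c[hi]
                nb[i] += gamma * a[hi]
                nd[i] += gamma * d[hi]
        a, b, c, d = na, nb, nc, nd
        stride *= 2  # each row now couples to rows 2*stride away
    # All off-diagonal couplings are gone; each row is b[i]*x[i] = d[i].
    return [d[i] / b[i] for i in range(n)]


# Sample system built from the known solution x = [1, 2, 3, 4].
x = pcr_solve(
    a=[0, -1, -1, -1],
    b=[4, 4, 4, 4],
    c=[-1, -1, -1, 0],
    d=[2, 4, 6, 13],
)
print(x)
```

The number of reduction steps grows with the logarithm of the system size, while each step touches every row; that trade of extra arithmetic for independence between rows is what makes the method attractive on many-core hardware.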

17 pages, 878 KiB  
Article
Efficient Parallel Processing of R-Tree on GPUs
by Jian Nong, Xi He, Jia Chen and Yanyan Liang
Mathematics 2024, 12(13), 2115; https://doi.org/10.3390/math12132115 - 5 Jul 2024
Viewed by 1165
Abstract
The R-tree is an important multi-dimensional data structure widely employed in many applications for storing and querying spatial data. As GPUs emerge as powerful computing hardware platforms, a GPU-based parallel R-tree becomes the key to efficiently porting R-tree-related applications to GPUs. However, traditional tree-based data structures can hardly be ported directly to GPUs, and it is also a great challenge to develop highly efficient parallel tree-based data structures on GPUs. The difficulty mostly lies in designing tree-based data structures and related operations that facilitate parallel processing on many-core architectures. We summarize our contributions as follows: (i) design a GPU-friendly data structure to store spatial data; (ii) present two parallel R-tree construction algorithms and one parallel R-tree query algorithm that take the hardware characteristics of GPUs into consideration; and (iii) port the vector map overlay system from CPU to GPU to demonstrate the feasibility of the parallel R-tree. Experimental results show that our parallel R-tree on GPU is efficient and practical. Compared with the traditional CPU-based sequential vector map overlay system, our vector map overlay system based on the parallel R-tree achieves a nearly 10-fold speedup.
(This article belongs to the Special Issue Recent Advances of Mathematics in Industrial Engineering)
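The abstract above does not specify its GPU-friendly layout, so the following is only a sketch of one common approach: storing the tree in flat arrays instead of linked nodes, and traversing it with an explicit stack instead of recursion. The names `FlatRTree`, `build_flat_rtree`, and `query`, the fanout of 4, and the simple sort-and-pack construction are all assumptions made for illustration; they are not the construction or query algorithms of the paper.

```python
from typing import List, NamedTuple, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


class FlatRTree(NamedTuple):
    box: List[Rect]    # bounding box of each node
    first: List[int]   # index of first child (or first item for a leaf)
    count: List[int]   # number of children (or items)
    leaf: List[bool]   # True if the node points into `items`
    items: List[int]   # rectangle indices, grouped leaf by leaf
    root: int


def union(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def overlaps(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def build_flat_rtree(rects, fanout=4):
    if not rects:
        raise ValueError("need at least one rectangle")
    # Pack bottom-up: sort by x-centre, then group `fanout` entries per node.
    items = sorted(range(len(rects)), key=lambda i: rects[i][0] + rects[i][2])
    box, first, count, leaf = [], [], [], []
    for start in range(0, len(items), fanout):
        group = items[start:start + fanout]
        box.append(union([rects[i] for i in group]))
        first.append(start)
        count.append(len(group))
        leaf.append(True)
    level_start, level_end = 0, len(box)
    while level_end - level_start > 1:
        for start in range(level_start, level_end, fanout):
            n = min(fanout, level_end - start)
            box.append(union(box[start:start + n]))
            first.append(start)
            count.append(n)
            leaf.append(False)
        level_start, level_end = level_end, len(box)
    return FlatRTree(box, first, count, leaf, items, len(box) - 1)


def query(tree, rects, window):
    """Return sorted indices of rectangles overlapping `window`."""
    hits, stack = [], [tree.root]
    while stack:  # explicit stack: no recursion, as on a GPU thread
        node = stack.pop()
        if not overlaps(tree.box[node], window):
            continue
        lo, hi = tree.first[node], tree.first[node] + tree.count[node]
        if tree.leaf[node]:
            hits.extend(i for i in tree.items[lo:hi]
                        if overlaps(rects[i], window))
        else:
            stack.extend(range(lo, hi))
    return sorted(hits)


# Ten unit squares along the diagonal, queried with a small window.
squares = [(float(i), float(i), i + 1.0, i + 1.0) for i in range(10)]
tree = build_flat_rtree(squares)
print(query(tree, squares, (2.5, 2.5, 4.5, 4.5)))
```

Because children of a node sit next to each other in the arrays, a group of threads can read them with contiguous memory accesses, and many query windows can be processed independently, one per thread.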