Search Results (385)

Search Parameters:
Keywords = GPU parallelization

20 pages, 6530 KiB  
Article
Machine Learning Analysis Using the Black Oil Model and Parallel Algorithms in Oil Recovery Forecasting
by Bazargul Matkerim, Aksultan Mukhanbet, Nurislam Kassymbek, Beimbet Daribayev, Maksat Mustafin and Timur Imankulov
Algorithms 2024, 17(8), 354; https://doi.org/10.3390/a17080354 - 14 Aug 2024
Viewed by 298
Abstract
The accurate forecasting of oil recovery factors is crucial for the effective management and optimization of oil production processes. This study explores the application of machine learning methods, specifically focusing on parallel algorithms, to enhance traditional reservoir simulation frameworks using black oil models. This research involves four main steps: collecting a synthetic dataset, preprocessing it, modeling and predicting the oil recovery factors with various machine learning techniques, and evaluating the model's performance. The analysis was carried out on a synthetic dataset containing parameters such as porosity, pressure, and the viscosity of oil and gas. By utilizing parallel computing, particularly GPUs, this study demonstrates significant improvements in processing efficiency and prediction accuracy. While maintaining an R² value of around 0.97, data parallelism sped up the learning process by up to 10.54 times, and neural network training was accelerated almost 8-fold when running on a GPU. These findings underscore the potential of parallel machine learning algorithms to revolutionize decision-making processes in reservoir management, offering faster and more precise predictive tools. This work not only contributes to computational sciences and reservoir engineering but also opens new avenues for the integration of advanced machine learning and parallel computing methods in optimizing oil recovery.
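The paper itself lists no code; as a loose CUDA sketch of the data parallelism it credits for the training speedup, the kernel below assigns one training sample per thread while accumulating the gradient of a plain linear model. The model, names, and sizes are illustrative assumptions, not details taken from the paper.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Data parallelism over training samples: each thread computes the
// prediction error of one row and adds its contribution to a shared
// MSE gradient. The linear model and all sizes are illustrative.
__global__ void gradKernel(const float* X, const float* y, const float* w,
                           float* grad, int n, int d) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float pred = 0.f;
    for (int j = 0; j < d; ++j) pred += w[j] * X[i * d + j];
    float err = pred - y[i];
    for (int j = 0; j < d; ++j)
        atomicAdd(&grad[j], 2.f * err * X[i * d + j] / n);
}

int main() {
    const int n = 1024, d = 8;
    float *X, *y, *w, *g;
    cudaMallocManaged((void**)&X, n * d * sizeof(float));
    cudaMallocManaged((void**)&y, n * sizeof(float));
    cudaMallocManaged((void**)&w, d * sizeof(float));
    cudaMallocManaged((void**)&g, d * sizeof(float));
    for (int i = 0; i < n * d; ++i) X[i] = 0.01f * (i % 100);
    for (int i = 0; i < n; ++i) y[i] = 1.f;
    for (int j = 0; j < d; ++j) { w[j] = 0.f; g[j] = 0.f; }
    gradKernel<<<(n + 255) / 256, 256>>>(X, y, w, g, n, d);
    cudaDeviceSynchronize();
    printf("grad[0] = %f\n", g[0]);  // gradient for one full batch
    return 0;
}
```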
14 pages, 2217 KiB  
Article
Power Consumption Comparison of GPU Linear Solvers for Cellular Potts Model Simulations
by Pasquale De Luca, Ardelio Galletti and Livia Marcellino
Appl. Sci. 2024, 14(16), 7028; https://doi.org/10.3390/app14167028 - 10 Aug 2024
Viewed by 458
Abstract
Power consumption is a significant challenge in the sustainability of computational science. The growing energy demands of increasingly complex simulations and algorithms lead to substantial resource use, which conflicts with global sustainability goals. This paper investigates the energy efficiency of different parallel implementations of a Cellular Potts model, which models cellular behavior through Hamiltonian energy minimization techniques, leveraging modern GPU architectures. By evaluating alternative solvers, it demonstrates that specific methods can significantly enhance computational efficiency and reduce energy use compared to traditional approaches. The results confirm notable improvements in execution time and energy consumption. In particular, the experiments show a reduction in power consumption of up to 53%, providing a pathway towards more sustainable high-performance computing practices for complex biological simulations.
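The paper benchmarks GPU linear solvers without reproducing them; as a minimal, hedged stand-in for what such a solver looks like on a GPU, here is one Jacobi sweep in CUDA, with one thread per unknown. The solver choice and the toy diagonally dominant system are assumptions for illustration only.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One Jacobi sweep for a dense system A x = b:
// xNew[i] = (b[i] - sum_{j != i} A[i][j] x[j]) / A[i][i].
// A toy stand-in for a GPU linear solver, not the paper's code.
__global__ void jacobiSweep(const float* A, const float* b,
                            const float* x, float* xNew, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float sigma = 0.f;
    for (int j = 0; j < n; ++j)
        if (j != i) sigma += A[i * n + j] * x[j];
    xNew[i] = (b[i] - sigma) / A[i * n + i];
}

int main() {
    const int n = 256;
    float *A, *b, *x, *xNew;
    cudaMallocManaged((void**)&A, n * n * sizeof(float));
    cudaMallocManaged((void**)&b, n * sizeof(float));
    cudaMallocManaged((void**)&x, n * sizeof(float));
    cudaMallocManaged((void**)&xNew, n * sizeof(float));
    for (int i = 0; i < n; ++i) {            // diagonally dominant test matrix
        for (int j = 0; j < n; ++j) A[i * n + j] = (i == j) ? 4.f : 0.01f;
        b[i] = 1.f; x[i] = 0.f;
    }
    for (int it = 0; it < 50; ++it) {        // fixed iteration count for brevity
        jacobiSweep<<<(n + 127) / 128, 128>>>(A, b, x, xNew, n);
        cudaDeviceSynchronize();
        float* t = x; x = xNew; xNew = t;    // ping-pong buffers
    }
    printf("x[0] = %f\n", x[0]);
    return 0;
}
```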
19 pages, 48324 KiB  
Article
An Efficient and Accurate Ground-Based Synthetic Aperture Radar (GB-SAR) Real-Time Imaging Scheme Based on Parallel Processing Mode and Architecture
by Yunxin Tan, Guangju Li, Chun Zhang and Weiming Gan
Electronics 2024, 13(16), 3138; https://doi.org/10.3390/electronics13163138 - 8 Aug 2024
Viewed by 409
Abstract
When performing high-resolution imaging with ground-based synthetic aperture radar (GB-SAR) systems, the data collected and processed are vast and complex, imposing higher demands on the real-time performance and processing efficiency of the imaging system. Yet a very limited number of studies have been conducted on real-time processing methods for GB-SAR monitoring data. This paper proposes a real-time imaging scheme based on parallel processing models, optimizing each step of the traditional ωK imaging algorithm in parallel. Several parallel optimization schemes are proposed for the computationally intensive and complex interpolation part, including dynamic parallelism, the Group-Nstream processing model, and the Fthread-Group-Nstream processing model. The Fthread-Group-Nstream processing model utilizes Fthread, Group, and Nstream for the finer-grained processing of monitoring data, reducing the impact of nesting depth on the algorithm's performance under dynamic parallelism and alleviating the issue of serial execution within the Group-Nstream processing model. This scheme has been successfully applied in a synthetic aperture radar imaging system, achieving excellent imaging results and accuracy. The speedup ratio can reach 52.14, and the relative errors in amplitude and phase are close to 0, validating the effectiveness and practicality of the proposed schemes. This paper addresses the lack of research on the real-time processing of GB-SAR monitoring data, providing a reliable monitoring method for GB-SAR deformation monitoring.
(This article belongs to the Topic Radar Signal and Data Processing with Applications)
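To give a flavor of the interpolation stage that the paper parallelizes, the hedged CUDA sketch below resamples a signal with one thread per output sample using plain linear interpolation on real data. The paper's actual schemes (dynamic parallelism, Group-Nstream, Fthread-Group-Nstream) are far more elaborate and operate on complex SAR data; everything here is an illustrative assumption.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Per-sample parallel resampling, loosely analogous to the interpolation
// step of the omega-K algorithm. Real-valued linear interpolation only;
// the actual algorithm uses higher-order kernels on complex data.
__global__ void resample(const float* in, const float* newIdx,
                         float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float pos = newIdx[i];                   // fractional source index
    int k = (int)pos;
    if (k < 0 || k >= n - 1) { out[i] = 0.f; return; }
    float frac = pos - k;
    out[i] = (1.f - frac) * in[k] + frac * in[k + 1];
}

int main() {
    const int n = 4096;
    float *in, *idx, *out;
    cudaMallocManaged((void**)&in, n * sizeof(float));
    cudaMallocManaged((void**)&idx, n * sizeof(float));
    cudaMallocManaged((void**)&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) { in[i] = (float)i; idx[i] = i * 0.9f; }
    resample<<<(n + 255) / 256, 256>>>(in, idx, out, n);
    cudaDeviceSynchronize();
    printf("out[100] = %f\n", out[100]);     // sample at fractional index 90.0
    return 0;
}
```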
24 pages, 8434 KiB  
Article
A Fast Inverse Synthetic Aperture Radar Imaging Scheme Combining GPU-Accelerated Shooting and Bouncing Ray and Back Projection Algorithm under Wide Bandwidths and Angles
by Jiongming Chen, Pengju Yang, Rong Zhang and Rui Wu
Electronics 2024, 13(15), 3062; https://doi.org/10.3390/electronics13153062 - 2 Aug 2024
Viewed by 383
Abstract
Inverse synthetic aperture radar (ISAR) imaging techniques are frequently used in target classification and recognition applications due to their capability to produce high-resolution images of moving targets. To meet the demand of ISAR imaging for electromagnetic calculation with high efficiency and accuracy, a novel accelerated shooting and bouncing ray (SBR) method is presented by combining a Graphics Processing Unit (GPU) with a Bounding Volume Hierarchy (BVH) tree structure. To overcome the problem of unfocused images produced by Fourier-based ISAR procedures under wide-angle and wide-bandwidth conditions, an efficient parallel back projection (BP) imaging algorithm is developed by utilizing the GPU acceleration technique. The presented GPU-accelerated SBR is validated by comparison with the RL-GO method in the commercial software FEKO v2020. The ISAR images clearly indicate that strong scattering centers as well as target profiles can be observed under large observation azimuth angles, Δφ = 90°, and wide bandwidths, 3 GHz. They also indicate that ISAR imaging is highly sensitive to observation angles. In addition, obvious sidelobes can be observed, owing to distortion of the electromagnetic wave's phase history caused by multipole scattering. Simulation results confirm the feasibility and efficiency of our scheme combining GPU-accelerated SBR with the BP algorithm for fast ISAR imaging simulation under wide-angle and wide-bandwidth conditions.
(This article belongs to the Special Issue Microwave Imaging and Applications)
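Back projection parallelizes naturally with one thread per image pixel, each summing echo contributions over all pulses. The CUDA sketch below shows only that decomposition, with real-valued data and nearest-bin range lookup; the paper's BP operates on complex range profiles with phase compensation, so treat this as a hedged toy.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One thread per image pixel: the classic parallel decomposition of
// back projection. Magnitude-only accumulation with nearest range bin;
// geometry and sizes are illustrative, not from the paper.
__global__ void backProject(const float* echoes, const float* antX,
                            float* image, int nPulses, int nBins,
                            int nx, int ny, float dr, float pixSize) {
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    if (px >= nx || py >= ny) return;
    float x = px * pixSize, y = py * pixSize;
    float acc = 0.f;
    for (int p = 0; p < nPulses; ++p) {
        float r = sqrtf((x - antX[p]) * (x - antX[p]) + y * y);
        int bin = (int)(r / dr + 0.5f);      // nearest range bin
        if (bin < nBins) acc += echoes[p * nBins + bin];
    }
    image[py * nx + px] = acc;
}

int main() {
    const int nPulses = 64, nBins = 512, nx = 128, ny = 128;
    float *echoes, *antX, *image;
    cudaMallocManaged((void**)&echoes, nPulses * nBins * sizeof(float));
    cudaMallocManaged((void**)&antX, nPulses * sizeof(float));
    cudaMallocManaged((void**)&image, nx * ny * sizeof(float));
    for (int i = 0; i < nPulses * nBins; ++i) echoes[i] = 1e-3f;
    for (int p = 0; p < nPulses; ++p) antX[p] = 0.1f * p;
    dim3 block(16, 16), grid((nx + 15) / 16, (ny + 15) / 16);
    backProject<<<grid, block>>>(echoes, antX, image,
                                 nPulses, nBins, nx, ny, 0.3f, 0.15f);
    cudaDeviceSynchronize();
    printf("image[0] = %f\n", image[0]);
    return 0;
}
```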
17 pages, 10939 KiB  
Article
Application of Multibody Dynamics and Bonded-Particle GPU Discrete Element Method in Modelling of a Gyratory Crusher
by Youwei Xiong, Wei Chen, Tao Ou, Guoyan Zhao and Dongling Wu
Minerals 2024, 14(8), 774; https://doi.org/10.3390/min14080774 - 29 Jul 2024
Viewed by 349
Abstract
The gyratory crusher is one of the most important mineral processing assets in the comminution circuit, and its production performance directly impacts the circuit throughput. Because it has a higher energy utilisation rate for rock breakage than semi-autogenous (SAG/AG) milling, it is common practice in operations to promote and optimise primary crushing so that downstream capacity can be enhanced. This study aims to develop a discrete element modelling (DEM) and multibody dynamics (MBD) cosimulation framework to optimise the performance of the gyratory crusher. An MBD model was initially established to simulate the gyratory crusher's drivetrain system. A GPU-based DEM was also developed with a parallel bond model incorporated to simulate particle breakage behaviour. Coupling the MBD and GPU-based DEM produced a cosimulation framework based on the Functional Mock-up Interface. An industrial-scale gyratory crusher was selected to test the developed numerical framework, and results indicated that the developed method was capable of modelling normal and choked working conditions. The outcome of this study enables more realistic gyratory crusher improvement and optimisation strategies for enhanced production.
(This article belongs to the Section Mineral Processing and Extractive Metallurgy)
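As a rough illustration of the per-particle parallelism underlying GPU DEM, the sketch below computes linear spring contact forces with one CUDA thread per particle over a naive all-pairs loop. Production DEM codes, presumably including the one in this study, use neighbor grids and bonded-particle breakage models; all names and parameters here are illustrative assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Naive O(N^2) contact check: one thread per particle sums repulsive
// spring forces from all overlapping neighbors. Only the per-particle
// parallelism is representative; real DEM uses neighbor search.
__global__ void contactForces(const float3* pos, float3* force,
                              int n, float radius, float k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float3 f = make_float3(0.f, 0.f, 0.f);
    for (int j = 0; j < n; ++j) {
        if (j == i) continue;
        float dx = pos[i].x - pos[j].x;
        float dy = pos[i].y - pos[j].y;
        float dz = pos[i].z - pos[j].z;
        float dist = sqrtf(dx * dx + dy * dy + dz * dz);
        float overlap = 2.f * radius - dist;
        if (overlap > 0.f && dist > 1e-6f) { // linear repulsive spring
            float s = k * overlap / dist;
            f.x += s * dx; f.y += s * dy; f.z += s * dz;
        }
    }
    force[i] = f;
}

int main() {
    const int n = 1024;
    float3 *pos, *force;
    cudaMallocManaged((void**)&pos, n * sizeof(float3));
    cudaMallocManaged((void**)&force, n * sizeof(float3));
    for (int i = 0; i < n; ++i)
        pos[i] = make_float3(0.01f * (i % 32), 0.01f * (i / 32), 0.f);
    contactForces<<<(n + 255) / 256, 256>>>(pos, force, n, 0.006f, 100.f);
    cudaDeviceSynchronize();
    printf("force[0].x = %f\n", force[0].x);
    return 0;
}
```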
20 pages, 305 KiB  
Article
Revisiting Database Indexing for Parallel and Accelerated Computing: A Comprehensive Study and Novel Approaches
by Maryam Abbasi, Marco V. Bernardo, Paulo Váz, José Silva and Pedro Martins
Information 2024, 15(8), 429; https://doi.org/10.3390/info15080429 - 24 Jul 2024
Viewed by 497
Abstract
While the importance of indexing strategies for optimizing query performance in database systems is widely acknowledged, the impact of rapidly evolving hardware architectures on indexing techniques has been an underexplored area. As modern computing systems increasingly leverage parallel processing capabilities, multi-core CPUs, and specialized hardware accelerators, traditional indexing approaches may not fully capitalize on these advancements. This comprehensive experimental study investigates the effects of hardware-conscious indexing strategies tailored for contemporary and emerging hardware platforms. Through rigorous experimentation on a real-world database environment using the industry-standard TPC-H benchmark, this research evaluates the performance implications of indexing techniques specifically designed to exploit parallelism, vectorization, and hardware-accelerated operations. By examining approaches such as cache-conscious B-Tree variants, SIMD-optimized hash indexes, and GPU-accelerated spatial indexing, the study provides valuable insights into the potential performance gains and trade-offs associated with these hardware-aware indexing methods. The findings reveal that hardware-conscious indexing strategies can significantly outperform their traditional counterparts, particularly in data-intensive workloads and large-scale database deployments. Our experiments show improvements ranging from 32.4% to 48.6% in query execution time, depending on the specific technique and hardware configuration. However, the study also highlights the complexity of implementing and tuning these techniques, as they often require intricate code optimizations and a deep understanding of the underlying hardware architecture. Additionally, this research explores the potential of machine learning-based indexing approaches, including reinforcement learning for index selection and neural network-based index advisors. While these techniques show promise, with performance improvements of up to 48.6% in certain scenarios, their effectiveness varies across different query types and data distributions. By offering a comprehensive analysis and practical recommendations, this research contributes to the ongoing pursuit of database performance optimization in the era of heterogeneous computing. The findings inform database administrators, developers, and system architects on effective indexing practices tailored for modern hardware, while also paving the way for future research into adaptive indexing techniques that can dynamically leverage hardware capabilities based on workload characteristics and resource availability.
(This article belongs to the Special Issue Advances in High Performance Computing and Scalable Software)
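As one concrete, hedged example of a GPU index probe of the general kind the study evaluates, the CUDA sketch below answers a batch of point lookups with one thread per query, each binary-searching a sorted key array. It is a generic illustration, not one of the paper's benchmarked techniques.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One thread per lookup: a batch of point queries probes a sorted key
// array via binary search. A generic GPU index-probe sketch.
__global__ void probe(const int* keys, int nKeys,
                      const int* queries, int* found, int nQ) {
    int q = blockIdx.x * blockDim.x + threadIdx.x;
    if (q >= nQ) return;
    int lo = 0, hi = nKeys - 1, ans = -1;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        if (keys[mid] == queries[q]) { ans = mid; break; }
        if (keys[mid] < queries[q]) lo = mid + 1; else hi = mid - 1;
    }
    found[q] = ans;                          // index of key, or -1 if absent
}

int main() {
    const int nKeys = 1 << 20, nQ = 1 << 16;
    int *keys, *queries, *found;
    cudaMallocManaged((void**)&keys, nKeys * sizeof(int));
    cudaMallocManaged((void**)&queries, nQ * sizeof(int));
    cudaMallocManaged((void**)&found, nQ * sizeof(int));
    for (int i = 0; i < nKeys; ++i) keys[i] = 2 * i;   // sorted even keys
    for (int i = 0; i < nQ; ++i) queries[i] = 4 * i;
    probe<<<(nQ + 255) / 256, 256>>>(keys, nKeys, queries, found, nQ);
    cudaDeviceSynchronize();
    printf("found[10] = %d\n", found[10]);   // expect 20
    return 0;
}
```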
29 pages, 1832 KiB  
Article
A Parallel Compression Pipeline for Improving GPU Virtualization Data Transfers
by Cristian Peñaranda, Carlos Reaño and Federico Silla
Sensors 2024, 24(14), 4649; https://doi.org/10.3390/s24144649 - 17 Jul 2024
Viewed by 352
Abstract
GPUs are commonly used to accelerate the execution of applications in domains such as deep learning. Deep learning applications are applied to an increasing variety of scenarios, with edge computing being one of them. However, edge devices present severe computing power and energy limitations. In this context, the use of remote GPU virtualization solutions is an efficient way to address these concerns. Nevertheless, the limited network bandwidth might be an issue. This limitation can be alleviated by leveraging on-the-fly compression within the communication layer of remote GPU virtualization solutions. In this way, data exchanged with the remote GPU is transparently compressed before being transmitted, thus increasing effective network bandwidth. In this paper, we present the implementation of a parallel compression pipeline designed to be used within remote GPU virtualization solutions. A thorough performance analysis shows that network bandwidth can be increased by a factor of up to 2.
(This article belongs to the Section Internet of Things)
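The paper's pipeline compresses data on the fly inside the communication layer of a remote GPU virtualization solution; the CUDA sketch below mirrors only the overlap idea, pipelining per-chunk transfers and a placeholder kernel across streams. The chunking scheme and sizes are assumptions for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Chunked pipeline: each chunk is copied host-to-device, "processed",
// and copied back in its own stream, so transfers of one chunk can
// overlap compute of another. Only the overlap pattern is representative.
__global__ void transform(float* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.f;                  // placeholder for real work
}

int main() {
    const int nChunks = 4, chunk = 1 << 20;
    float* h; cudaMallocHost((void**)&h, nChunks * chunk * sizeof(float)); // pinned
    float* d; cudaMalloc((void**)&d, nChunks * chunk * sizeof(float));
    for (int i = 0; i < nChunks * chunk; ++i) h[i] = 1.f;
    cudaStream_t s[nChunks];
    for (int c = 0; c < nChunks; ++c) cudaStreamCreate(&s[c]);
    for (int c = 0; c < nChunks; ++c) {      // all stages of chunk c queue on s[c]
        float *hp = h + c * chunk, *dp = d + c * chunk;
        cudaMemcpyAsync(dp, hp, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, s[c]);
        transform<<<(chunk + 255) / 256, 256, 0, s[c]>>>(dp, chunk);
        cudaMemcpyAsync(hp, dp, chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, s[c]);
    }
    cudaDeviceSynchronize();
    printf("h[0] = %f\n", h[0]);             // expect 2.0
    for (int c = 0; c < nChunks; ++c) cudaStreamDestroy(s[c]);
    return 0;
}
```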
25 pages, 1173 KiB  
Article
Parallel Implementation of K-Best Quadrature Amplitude Modulation Detection for Massive Multiple Input Multiple Output Systems
by Bhargav Gokalgandhi, Jonathan Ling, Zoran Latinović, Dragan Samardzija and Ivan Seskar
Electronics 2024, 13(14), 2775; https://doi.org/10.3390/electronics13142775 - 15 Jul 2024
Viewed by 448
Abstract
Massive MIMO (Multiple Input Multiple Output) systems impose significant processing burdens along with strict latency requirements. The combination of large-scale antenna arrays and wide bandwidth requirements for next-generation wireless systems creates an exponential increase in frontend-to-backend data. Balancing processing latency and reliability is critical for baseband processing tasks such as QAM detection. While linear detection algorithms have low computational complexity, their use in Massive MIMO scenarios suffers heavy degradation in error performance. Nonlinear detection methods such as Maximum Likelihood and Sphere Decoding have good error performance, but they suffer from high, variable, and uncontrollable computational complexity. For such cases, the K-best QAM detection algorithm can provide the required control over system performance while maintaining near-ML error performance. In this paper, hard-output as well as soft-output K-best QAM detection is implemented on a CPU by utilizing multiple cores combined with vector processing. Similarly, hard-output detection is implemented on a GPU by leveraging the SIMD (Single Instruction, Multiple Data) architecture and warp-based execution model. The processing time per bit and the energy consumption per bit are compared for the CPU and GPU implementations across QAM constellation densities and MIMO array sizes. The GPU implementation shows up to a 5× improvement in processing latency per bit and up to a 120× improvement in energy consumption per bit over the CPU implementation for typical QAM constellations such as 4, 16, and 64 QAM. The GPU implementation also shows up to a 125× improvement over the CPU implementation in energy consumption per bit for larger MIMO configurations such as 24 × 24 and 32 × 32. Finally, the soft-output detector is combined with an LDPC (Low-Density Parity Check) decoder to obtain the FER (Frame Error Rate) performance for the CPU implementation. The FER is then combined with frame processing latency to form a Goodput metric that demonstrates the latency and reliability tradeoff.
(This article belongs to the Section Microwave and Wireless Communications)
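The hedged sketch below keeps only the paper's "one detection problem per thread" GPU mapping: each thread runs an exhaustive maximum-likelihood search over a toy real-valued 2×2 BPSK system, rather than the paper's K-best tree search over large QAM constellations. All values are illustrative.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One thread per received vector: exhaustive ML search over a tiny
// real-valued 2x2 / BPSK system. Illustrates the per-symbol GPU
// mapping only; the paper prunes the search tree with K-best.
__global__ void mlDetect(const float* y, const float* H,
                         int* bits, int nVec) {
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= nVec) return;
    const float cand[2] = {-1.f, 1.f};
    float best = 1e30f; int bestIdx = 0;
    for (int a = 0; a < 2; ++a)
        for (int b = 0; b < 2; ++b) {        // 4 candidate symbol vectors
            float r0 = y[2*v]   - (H[0]*cand[a] + H[1]*cand[b]);
            float r1 = y[2*v+1] - (H[2]*cand[a] + H[3]*cand[b]);
            float metric = r0*r0 + r1*r1;    // Euclidean distance to y
            if (metric < best) { best = metric; bestIdx = 2*a + b; }
        }
    bits[v] = bestIdx;                       // 2 detected bits per vector
}

int main() {
    const int nVec = 1 << 16;
    float *y, *H; int* bits;
    cudaMallocManaged((void**)&y, 2 * nVec * sizeof(float));
    cudaMallocManaged((void**)&H, 4 * sizeof(float));
    cudaMallocManaged((void**)&bits, nVec * sizeof(int));
    H[0] = 1.f; H[1] = 0.2f; H[2] = 0.1f; H[3] = 1.f;  // fixed toy channel
    for (int i = 0; i < nVec; ++i) { y[2*i] = 1.1f; y[2*i+1] = -0.9f; }
    mlDetect<<<(nVec + 255) / 256, 256>>>(y, H, bits, nVec);
    cudaDeviceSynchronize();
    printf("bits[0] = %d\n", bits[0]);       // expect 2, i.e. (+1, -1)
    return 0;
}
```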
37 pages, 9513 KiB  
Article
Parallel Implicit Solvers for 2D Numerical Models on Structured Meshes
by Yaoxin Zhang, Mohammad Z. Al-Hamdan and Xiaobo Chao
Mathematics 2024, 12(14), 2184; https://doi.org/10.3390/math12142184 - 12 Jul 2024
Viewed by 387
Abstract
This paper presents the parallelization of two widely used implicit numerical solvers for the solution of partial differential equations on structured meshes, namely, the ADI (Alternating-Direction Implicit) solver for tridiagonal linear systems and the SIP (Strongly Implicit Procedure) solver for penta-diagonal systems. Both solvers were parallelized using CUDA (Compute Unified Device Architecture) Fortran on GPGPUs (General-Purpose Graphics Processing Units). The parallel ADI solver (P-ADI) is based on the Parallel Cyclic Reduction (PCR) algorithm, while the parallel SIP solver (P-SIP) uses the wave front (WF) method, following a diagonal-line calculation strategy. To map the solution schemes onto the hierarchical block-thread framework of CUDA on the GPU, the P-ADI solver adopted two mapping methods, one block thread with iterations (OBM-it) and multi-block threads (MBMs), while the P-SIP solver also used two mappings: a conventional mapping using effective WF lines (WF-e), with matrix coefficients and solution variables defined on the original computational mesh, and a newly proposed mapping using the whole WF mesh (WF-all), on which matrix coefficients and solution variables are defined. Both the P-ADI and the P-SIP have been integrated into a two-dimensional (2D) hydrodynamic model, the CCHE2D (Center of Computational Hydroscience and Engineering) model, developed by the National Center for Computational Hydroscience and Engineering at the University of Mississippi. This study is the first to compare these two parallel solvers and their efficiency using examples and applications in complex geometries, providing valuable guidance for future uses of these two parallel implicit solvers in computational fluid dynamics (CFD). Both parallel solvers demonstrated higher efficiency than their serial counterparts on the CPU (Central Processing Unit): speedup ratios of 3.73~4.98 for flow simulations and 2.166~3.648 for sediment transport simulations. In general, the P-ADI solver is faster than, but not as stable as, the P-SIP solver; and for the P-SIP solver, the newly developed mapping method WF-all significantly improved on the conventional mapping method WF-e.
(This article belongs to the Special Issue Mathematical Modeling and Numerical Simulation in Fluids)
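The parallel ADI solver in the paper is built on Parallel Cyclic Reduction (PCR); the sketch below shows one textbook PCR reduction step in CUDA (not the paper's CUDA Fortran code or its OBM-it/MBMs mappings). Each equation of a tridiagonal system absorbs its neighbors at the current stride, so after log₂ n steps every unknown decouples.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One PCR step: equation i absorbs equations i-s and i+s, doubling the
// stride of its off-diagonal terms. After log2(n) steps the system is
// diagonal and x[i] = d[i] / b[i]. Textbook PCR, not the paper's code.
__global__ void pcrStep(const float* a, const float* b, const float* c,
                        const float* d, float* a2, float* b2,
                        float* c2, float* d2, int n, int s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int im = i - s, ip = i + s;
    float al = (im >= 0) ? -a[i] / b[im] : 0.f;   // eliminate x[i-s]
    float be = (ip < n)  ? -c[i] / b[ip] : 0.f;   // eliminate x[i+s]
    a2[i] = (im >= 0) ? al * a[im] : 0.f;
    c2[i] = (ip < n)  ? be * c[ip] : 0.f;
    b2[i] = b[i] + ((im >= 0) ? al * c[im] : 0.f)
                 + ((ip < n)  ? be * a[ip] : 0.f);
    d2[i] = d[i] + ((im >= 0) ? al * d[im] : 0.f)
                 + ((ip < n)  ? be * d[ip] : 0.f);
}

int main() {
    const int n = 1024;                      // power of two for simplicity
    float *a, *b, *c, *d, *a2, *b2, *c2, *d2;
    cudaMallocManaged((void**)&a, n * sizeof(float));
    cudaMallocManaged((void**)&b, n * sizeof(float));
    cudaMallocManaged((void**)&c, n * sizeof(float));
    cudaMallocManaged((void**)&d, n * sizeof(float));
    cudaMallocManaged((void**)&a2, n * sizeof(float));
    cudaMallocManaged((void**)&b2, n * sizeof(float));
    cudaMallocManaged((void**)&c2, n * sizeof(float));
    cudaMallocManaged((void**)&d2, n * sizeof(float));
    for (int i = 0; i < n; ++i) {            // -x[i-1] + 2x[i] - x[i+1] = 1
        a[i] = (i > 0) ? -1.f : 0.f;
        c[i] = (i < n - 1) ? -1.f : 0.f;
        b[i] = 2.f; d[i] = 1.f;
    }
    for (int s = 1; s < n; s *= 2) {
        pcrStep<<<(n + 255) / 256, 256>>>(a, b, c, d, a2, b2, c2, d2, n, s);
        cudaDeviceSynchronize();
        float* t;
        t = a; a = a2; a2 = t;  t = b; b = b2; b2 = t;
        t = c; c = c2; c2 = t;  t = d; d = d2; d2 = t;
    }
    printf("x[0] = %f\n", d[0] / b[0]);      // expect n/2 = 512 here
    return 0;
}
```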
17 pages, 878 KiB  
Article
Efficient Parallel Processing of R-Tree on GPUs
by Jian Nong, Xi He, Jia Chen and Yanyan Liang
Mathematics 2024, 12(13), 2115; https://doi.org/10.3390/math12132115 - 5 Jul 2024
Viewed by 601
Abstract
The R-tree is an important multi-dimensional data structure widely employed in many applications for storing and querying spatial data. As GPUs emerge as powerful computing hardware platforms, a GPU-based parallel R-tree becomes the key to efficiently porting R-tree-related applications to GPUs. However, traditional tree-based data structures can hardly be directly ported to GPUs, and it is also a great challenge to develop highly efficient parallel tree-based data structures on GPUs. The difficulty mostly lies in the design of tree-based data structures and related operations in the context of many-core architectures that can facilitate parallel processing. We summarize our contributions as follows: (i) we design a GPU-friendly data structure to store spatial data; (ii) we present two parallel R-tree construction algorithms and one parallel R-tree query algorithm that take the hardware characteristics of GPUs into consideration; and (iii) we port the vector map overlay system from CPU to GPU to demonstrate the feasibility of the parallel R-tree. Experimental results show that our parallel R-tree on the GPU is efficient and practical. Compared with the traditional CPU-based sequential vector map overlay system, our vector map overlay system based on the parallel R-tree achieves a nearly 10-fold speedup.
(This article belongs to the Special Issue Recent Advances of Mathematics in Industrial Engineering)
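The pointer-free, array-based layout that makes tree structures GPU-friendly can be gestured at with the hedged CUDA sketch below: nodes occupy a flat array, and each thread answers one point query using an explicit traversal stack. The layout, the tiny hand-built tree, and the query semantics are illustrative assumptions, not the paper's data structure.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

struct Rect { float xmin, ymin, xmax, ymax; };
// Flat, pointer-free node: children (or entries) are a contiguous range,
// the usual trick for making tree structures traversable on GPUs.
struct Node { Rect box; int first, count, leaf; };

__global__ void pointQuery(const Node* nodes, const Rect* entries,
                           const float2* pts, int* hit, int nPts) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= nPts) return;
    float px = pts[t].x, py = pts[t].y;
    int stack[32], top = 0, result = -1;
    stack[top++] = 0;                        // start at the root node
    while (top > 0) {
        Node nd = nodes[stack[--top]];
        if (px < nd.box.xmin || px > nd.box.xmax ||
            py < nd.box.ymin || py > nd.box.ymax) continue;
        if (nd.leaf) {
            for (int e = nd.first; e < nd.first + nd.count; ++e)
                if (px >= entries[e].xmin && px <= entries[e].xmax &&
                    py >= entries[e].ymin && py <= entries[e].ymax)
                    result = e;              // record a containing entry
        } else {
            for (int c = nd.first; c < nd.first + nd.count; ++c)
                stack[top++] = c;            // push child node indices
        }
    }
    hit[t] = result;
}

int main() {
    Node* nodes; Rect* entries; float2* pts; int* hit;
    cudaMallocManaged((void**)&nodes, 3 * sizeof(Node));
    cudaMallocManaged((void**)&entries, 4 * sizeof(Rect));
    cudaMallocManaged((void**)&pts, 2 * sizeof(float2));
    cudaMallocManaged((void**)&hit, 2 * sizeof(int));
    entries[0] = {0, 0, 1, 1}; entries[1] = {1, 0, 2, 1};
    entries[2] = {0, 1, 1, 2}; entries[3] = {1, 1, 2, 2};
    nodes[1] = {{0, 0, 2, 1}, 0, 2, 1};      // leaf over entries 0-1
    nodes[2] = {{0, 1, 2, 2}, 2, 2, 1};      // leaf over entries 2-3
    nodes[0] = {{0, 0, 2, 2}, 1, 2, 0};      // root over nodes 1-2
    pts[0] = make_float2(0.5f, 0.5f); pts[1] = make_float2(1.5f, 1.5f);
    pointQuery<<<1, 32>>>(nodes, entries, pts, hit, 2);
    cudaDeviceSynchronize();
    printf("hit[0]=%d hit[1]=%d\n", hit[0], hit[1]);  // expect 0 and 3
    return 0;
}
```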
17 pages, 22049 KiB  
Communication
Optical Imaging Model Based on GPU-Accelerated Monte Carlo Simulation for Deep-Sea Luminescent Objects
by Qing Han, Mengnan Sun, Bing Zheng and Min Fu
Remote Sens. 2024, 16(13), 2429; https://doi.org/10.3390/rs16132429 - 2 Jul 2024
Viewed by 681
Abstract
Modeling and simulating the underwater optical imaging process can assist in optimizing the configuration of underwater optical imaging technology. Based on the Monte Carlo (MC) method, we propose an optical imaging model tailored for deep-sea luminescent objects. Employing GPU parallel acceleration expedites MC simulation and ray tracing, achieving a three-order-of-magnitude speedup over a CPU-based program. A deep-sea single-lens imaging system is constructed in the model, composed of a luminescent object, water medium, double-convex lens, aperture diaphragm, and sensor. The image of the luminescent object passing through the imaging system is generated using the forward ray-tracing method. This model enables an intuitive analysis of how the inherent optical properties of water and imaging device parameters, such as sensor size, lens focal length, field of view (FOV), and camera position, affect imaging outcomes in the deep-sea environment.
(This article belongs to the Special Issue Advanced Techniques for Water-Related Remote Sensing)
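As a minimal illustration of GPU-parallel Monte Carlo photon transport, one photon per thread, the CUDA sketch below random-walks photons through a homogeneous scattering and absorbing slab using cuRAND. The 1D geometry and coefficients are invented for illustration; the paper's model traces rays through a full single-lens imaging system.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// One photon per thread: sample exponential free paths and new
// directions until the photon is absorbed or leaves the slab.
// A generic MC transport sketch, not the paper's imaging model.
__global__ void photonWalk(unsigned long long seed, float mu_s, float mu_a,
                           float depth, int* escaped, int nPhotons) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nPhotons) return;
    curandState st;
    curand_init(seed, i, 0, &st);
    float z = 0.f, dirz = 1.f;
    float mu_t = mu_s + mu_a;                // total attenuation coefficient
    for (int step = 0; step < 1000; ++step) {
        float path = -logf(curand_uniform(&st)) / mu_t;  // free path length
        z += dirz * path;
        if (z >= depth) { atomicAdd(escaped, 1); return; }  // exits the slab
        if (z < 0.f) return;                 // back-scattered out of the slab
        if (curand_uniform(&st) < mu_a / mu_t) return;      // absorbed
        dirz = 2.f * curand_uniform(&st) - 1.f;  // isotropic z-direction cosine
    }
}

int main() {
    const int nPhotons = 1 << 20;
    int* escaped;
    cudaMallocManaged((void**)&escaped, sizeof(int));
    *escaped = 0;
    photonWalk<<<(nPhotons + 255) / 256, 256>>>(1234ULL, 0.5f, 0.05f,
                                                5.f, escaped, nPhotons);
    cudaDeviceSynchronize();
    printf("escaped %d of %d photons\n", *escaped, nPhotons);
    return 0;
}
```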
42 pages, 7686 KiB  
Article
Parallel GPU-Acceleration of Metaphorless Optimization Algorithms: Application for Solving Large-Scale Nonlinear Equation Systems
by Bruno Silva, Luiz Guerreiro Lopes and Fábio Mendonça
Appl. Sci. 2024, 14(12), 5349; https://doi.org/10.3390/app14125349 - 20 Jun 2024
Viewed by 490
Abstract
Traditional population-based metaheuristic algorithms are effective in solving complex real-world problems but require careful strategy selection and parameter tuning. Metaphorless population-based optimization algorithms have gained importance due to their simplicity and efficiency. However, research on their applicability for solving large systems of nonlinear equations is still incipient. This paper presents a review and detailed description of the main metaphorless optimization algorithms, including the Jaya and enhanced Jaya (EJAYA) algorithms, the three Rao algorithms, the best-worst-play (BWP) algorithm, and the new max–min greedy interaction (MaGI) algorithm. It also presents improved GPU-based massively parallel versions of these algorithms using a more efficient parallelization strategy; in particular, a novel GPU-accelerated implementation of the MaGI algorithm is proposed. The GPU-accelerated versions of the metaphorless algorithms were implemented in the Julia programming language. Both high-end professional-grade GPUs and a powerful consumer-oriented GPU were used for testing, along with a set of hard, large-scale nonlinear equation system problems to gauge the speedup gains from the parallelizations. The computational experiments produced substantial speedup gains, ranging from 33.9× to 561.8×, depending on the test parameters and the GPU used for testing. This highlights the efficiency of the proposed GPU-accelerated versions of the metaphorless algorithms considered.
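The Jaya update rule that this family of algorithms shares is trivially data-parallel across the population, which is what the paper exploits. The hedged CUDA sketch below applies one Jaya move with one thread per candidate; the objective function, greedy selection, and the paper's Julia implementation details are omitted, and all sizes are illustrative.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// One thread per candidate solution: the Jaya move
//   x' = x + r1*(best - |x|) - r2*(worst - |x|)
// applied independently across the population. Objective evaluation
// and greedy replacement are omitted; sizes are illustrative.
__global__ void jayaMove(float* pop, const float* best, const float* worst,
                         int nPop, int dim, unsigned long long seed) {
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= nPop) return;
    curandState st;
    curand_init(seed, p, 0, &st);
    for (int j = 0; j < dim; ++j) {
        float x = pop[p * dim + j];
        float r1 = curand_uniform(&st), r2 = curand_uniform(&st);
        pop[p * dim + j] = x + r1 * (best[j] - fabsf(x))
                             - r2 * (worst[j] - fabsf(x));
    }
}

int main() {
    const int nPop = 4096, dim = 32;
    float *pop, *best, *worst;
    cudaMallocManaged((void**)&pop, nPop * dim * sizeof(float));
    cudaMallocManaged((void**)&best, dim * sizeof(float));
    cudaMallocManaged((void**)&worst, dim * sizeof(float));
    for (int i = 0; i < nPop * dim; ++i) pop[i] = 0.5f;
    for (int j = 0; j < dim; ++j) { best[j] = 1.f; worst[j] = -1.f; }
    jayaMove<<<(nPop + 255) / 256, 256>>>(pop, best, worst, nPop, dim, 7ULL);
    cudaDeviceSynchronize();
    printf("pop[0] = %f\n", pop[0]);
    return 0;
}
```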
20 pages, 6487 KiB  
Article
UAV Swarm Cooperative Dynamic Target Search: A MAPPO-Based Discrete Optimal Control Method
by Dexing Wei, Lun Zhang, Quan Liu, Hao Chen and Jian Huang
Drones 2024, 8(6), 214; https://doi.org/10.3390/drones8060214 - 22 May 2024
Viewed by 826
Abstract
Unmanned aerial vehicles (UAVs) are commonly employed in pursuit and rescue missions, where the target's trajectory is unknown. Traditional methods, such as evolutionary algorithms and ant colony optimization, can generate a search route in a given scenario; however, when the scene changes, the solution must be recalculated. In contrast, more advanced deep reinforcement learning methods can train an agent that can be directly applied to a similar task without recalculation. Nevertheless, there are several challenges when the agent learns how to search for unknown dynamic targets. In this search task, the rewards are random and sparse, which makes learning difficult. In addition, because the agent must adapt to various scenario settings, more interactions between the agent and the environment are required than in typical reinforcement learning tasks. These challenges increase the difficulty of training agents. To address these issues, we propose the OC-MAPPO method, which combines optimal control (OC) and Multi-Agent Proximal Policy Optimization (MAPPO) with GPU parallelization. The optimal control model provides the agent with continuous and stable rewards, and through parallelized models, the agent can interact with the environment and collect data more rapidly. Experimental results demonstrate that the proposed method helps the agent learn faster, with a 26.97% increase in the success rate compared to genetic algorithms.
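The abstract credits GPU parallelization of agent-environment interaction for faster data collection; the hedged CUDA sketch below steps many instances of an invented toy 2D pursuit environment with one thread each, the usual pattern for GPU-parallel rollout collection. Neither the environment nor the reward is from the paper's optimal-control model.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One thread per environment instance: thousands of copies of a toy 2D
// pursuit environment advance one timestep in parallel. The dynamics,
// actions, and reward are illustrative stand-ins.
struct EnvState { float ax, ay, tx, ty; };   // agent and target positions

__global__ void stepEnvs(EnvState* envs, const float2* actions,
                         float* rewards, int nEnvs) {
    int e = blockIdx.x * blockDim.x + threadIdx.x;
    if (e >= nEnvs) return;
    EnvState s = envs[e];
    s.ax += actions[e].x; s.ay += actions[e].y;  // apply agent action
    s.tx += 0.01f; s.ty += 0.01f;                // target drifts (dynamic)
    float dx = s.tx - s.ax, dy = s.ty - s.ay;
    rewards[e] = -sqrtf(dx * dx + dy * dy);      // dense shaped reward
    envs[e] = s;
}

int main() {
    const int nEnvs = 8192;
    EnvState* envs; float2* actions; float* rewards;
    cudaMallocManaged((void**)&envs, nEnvs * sizeof(EnvState));
    cudaMallocManaged((void**)&actions, nEnvs * sizeof(float2));
    cudaMallocManaged((void**)&rewards, nEnvs * sizeof(float));
    for (int e = 0; e < nEnvs; ++e) {
        envs[e] = {0.f, 0.f, 1.f, 1.f};
        actions[e] = make_float2(0.05f, 0.05f);
    }
    stepEnvs<<<(nEnvs + 255) / 256, 256>>>(envs, actions, rewards, nEnvs);
    cudaDeviceSynchronize();
    printf("reward[0] = %f\n", rewards[0]);
    return 0;
}
```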
17 pages, 4364 KiB  
Article
Modular Dynamic Phasor Modeling and Simulation of Renewable Integrated Power Systems
by Shirosh Peiris, Shaahin Filizadeh and Dharshana Muthumuni
Energies 2024, 17(11), 2480; https://doi.org/10.3390/en17112480 - 22 May 2024
Viewed by 758
Abstract
This paper presents a dynamic-phasor-based, average-value modeling method for power systems with extensive converter-tied subsystems. In the proposed approach, the overall system model is constructed using modular functions, interfacing both conventional and converter-tied resources. Model validation is performed against detailed Electro-Magnetic Transient (EMT) simulations. The analytical capabilities offered by the proposed modeling method are demonstrated on a modified IEEE 9-bus system. A Graphics Processing Unit (GPU)-based parallel computing approach for the solution of the resulting model is presented and exemplified on a modified IEEE 118-bus system, showing significant improvements in computing efficiency over EMT solvers. A co-simulation approach using a Central Processing Unit (CPU) and a GPU is also presented and exemplified using a modified version of the IEEE 118-bus system, demonstrating the model's suitability for parallelization.
(This article belongs to the Section A: Sustainable Energy)
15 pages, 1092 KiB  
Article
Performance Evaluation of Parallel Graphs Algorithms Utilizing Graphcore IPU
by Paweł Gepner, Bartłomiej Kocot, Marcin Paprzycki, Maria Ganzha, Leonid Moroz and Tomasz Olas
Electronics 2024, 13(11), 2011; https://doi.org/10.3390/electronics13112011 - 21 May 2024
Viewed by 713
Abstract
Recent years have been characterized by increasing interest in graph computations. This trend can be related to the large number of potential application areas. Moreover, the increasing computational capabilities of modern computers have allowed the theory of graph algorithms to be turned into explorations of the best methods for their practical realization. These factors, in turn, brought about ideas like the creation of a hardware component dedicated to graph computation, i.e., the Graphcore Intelligence Processing Unit (IPU). Interestingly, Graphcore systems are a hardware implementation of the Bulk Synchronous Parallel (BSP) paradigm, which had seemed a mostly theoretical concept from the end of the last century. In this context, the question that has to be addressed experimentally is as follows: how good are Graphcore systems in comparison with the standard systems that can be used to run graph algorithms, i.e., CPUs and GPUs? To provide a partial response to this broad question, in this contribution, the PageRank, Single Source Shortest Path, and Breadth-First Search algorithms are used to compare the performance of IPU-deployed algorithms to other parallel architectures. The obtained results clearly show that the Graphcore IPU outperforms other devices for the studied algorithms and, currently, provides best-in-class execution time results for a range of graph sizes and densities.
(This article belongs to the Special Issue Recent Advances of Cloud, Edge, and Parallel Computing)
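To make the comparison concrete, here is a hedged CUDA sketch of one BSP-style PageRank superstep over a CSR graph: one vertex per thread, with a global barrier between supersteps. It only illustrates the BSP pattern the IPU embodies; the paper's implementations use the IPU's own programming model, not CUDA.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One BSP-style PageRank superstep in CSR form: every vertex pulls rank
// from its in-neighbors in parallel; a global barrier separates steps.
__global__ void pagerankStep(const int* rowPtr, const int* colIdx,
                             const int* outDeg, const float* rank,
                             float* next, int nV, float damp) {
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= nV) return;
    float sum = 0.f;
    for (int e = rowPtr[v]; e < rowPtr[v + 1]; ++e)   // in-edges of v
        sum += rank[colIdx[e]] / outDeg[colIdx[e]];
    next[v] = (1.f - damp) / nV + damp * sum;
}

int main() {
    // Tiny 3-vertex cycle 0->1->2->0 stored as in-edge CSR.
    const int nV = 3;
    int *rowPtr, *colIdx, *outDeg; float *rank, *next;
    cudaMallocManaged((void**)&rowPtr, 4 * sizeof(int));
    cudaMallocManaged((void**)&colIdx, 3 * sizeof(int));
    cudaMallocManaged((void**)&outDeg, 3 * sizeof(int));
    cudaMallocManaged((void**)&rank, 3 * sizeof(float));
    cudaMallocManaged((void**)&next, 3 * sizeof(float));
    int hRow[4] = {0, 1, 2, 3}, hCol[3] = {2, 0, 1};
    for (int i = 0; i < 4; ++i) rowPtr[i] = hRow[i];
    for (int i = 0; i < 3; ++i) { colIdx[i] = hCol[i]; outDeg[i] = 1;
                                  rank[i] = 1.f / nV; }
    for (int it = 0; it < 20; ++it) {        // supersteps with barriers
        pagerankStep<<<1, 32>>>(rowPtr, colIdx, outDeg, rank, next, nV, 0.85f);
        cudaDeviceSynchronize();
        float* t = rank; rank = next; next = t;
    }
    printf("rank[0] = %f\n", rank[0]);       // ~0.333 for the symmetric cycle
    return 0;
}
```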