ISBN (digital): 9798350351019
ISBN (print): 9798350351026
Computational electromagnetics methods for analysing nonlinear systems, such as the harmonic balance (HB) method, are computationally expensive, especially when dealing with a large number of frequency points. In this paper, we propose a fast parallel algorithm for the HB method to accelerate electromagnetic simulation. The new algorithm parallelizes the construction of the nonlinear Jacobian matrix on a graphics processing unit (GPU). We present the formulation of the parallel HB method and then describe its implementation on a mixed GPU-CPU platform. Experimental results from several industrial cases show that the new parallel algorithm achieves a $3 \times$ speedup over the conventional HB method while maintaining similar accuracy, with the GPU-accelerated part running about 10 times faster than its CPU counterpart.
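As a rough consistency check on the two reported figures, one can ask what share of the conventional runtime the accelerated part would need to occupy. This is only a sketch: it assumes Amdahl's law applies and that a single fraction $p$ of the conventional runtime is sped up by $10 \times$, neither of which the abstract states.

$$
S = \frac{1}{(1 - p) + p/10} = 3
\;\Longrightarrow\; 1 - 0.9\,p = \tfrac{1}{3}
\;\Longrightarrow\; p = \tfrac{20}{27} \approx 0.74
$$

Under that assumption, roughly three quarters of the original runtime would lie in the GPU-accelerated part.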
Since the advent of parallel algorithms in the C++17 Standard Template Library (STL), the STL has become a viable framework for creating performance-portable applications. Given multiple existing implementations of th...
This paper presents a reexamination of the research paper titled "Communication-Avoiding parallel algorithms for TRSM" by Wicky et al. We focus on the communication bandwidth cost analysis presented in the o...
ISBN (digital): 9798350387131
ISBN (print): 9798350387148
The default data structure for storing sparse graphs is Compressed Sparse Row (CSR), which enables efficient algorithms but is not designed to accommodate changes to the graph. Since many real-world graphs are dynamic (i.e., they change over time), there has been significant work towards developing dynamic-graph data structures that can support fast algorithms as well as updates to the graph. This paper introduces Batch-parallel Compressed Sparse Row (BP-CSR), a batch-parallel data structure optimized for storing and processing dynamic graphs based on the Packed Memory Array (PMA). At a high level, BP-CSR extends Packed Compressed Sparse Row (PCSR, HPEC '18), a serial dynamic-graph data structure built on a PMA. However, since the original PCSR runs only on one thread, it cannot take advantage of the parallelism available in multithreaded machines. In contrast, BP-CSR is built on the batch-parallel Packed Memory Array data structure (PPoPP '24) and can support fast parallel algorithms and updates. The empirical evaluation demonstrates that BP-CSR supports fast parallel updates with minimal cost to algorithm performance. Specifically, BP-CSR performs up to 420 million inserts per second. Across a suite of 10 graph algorithms and 10 input graphs, BP-CSR incurs a $1.05 \times$ slowdown on average and about a $1.5 \times$ slowdown at most compared to CSR, a classical static graph representation. Furthermore, the empirical results show that BP-CSR outperforms existing tree-based and PMA-based dynamic-graph data structures on both algorithms and updates.
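To illustrate why static CSR "is not designed to accommodate changes", here is a minimal Python sketch of a plain CSR layout. It is not BP-CSR, PCSR, or a PMA; the function names (`build_csr`, `neighbors`, `insert_edge`) are hypothetical and chosen only for this example. Neighbor lookup is a contiguous slice, but a single insertion shifts every later entry and touches every later offset.

```python
import bisect
from typing import List, Tuple


def build_csr(num_vertices: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build plain CSR arrays (offsets, targets) from a directed edge list."""
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1          # count out-degree of each source
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]   # prefix sum gives start positions
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()       # next free slot for each vertex
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    for v in range(num_vertices):      # keep each neighbor list sorted
        lo, hi = offsets[v], offsets[v + 1]
        targets[lo:hi] = sorted(targets[lo:hi])
    return offsets, targets


def neighbors(offsets: List[int], targets: List[int], v: int) -> List[int]:
    """Neighbors of v are one contiguous slice, which is what makes CSR fast to scan."""
    return targets[offsets[v]:offsets[v + 1]]


def insert_edge(offsets: List[int], targets: List[int], src: int, dst: int) -> None:
    """Insert one edge in place; cost is linear in the graph size, not the degree."""
    pos = bisect.bisect_left(targets, dst, offsets[src], offsets[src + 1])
    targets.insert(pos, dst)           # shifts every entry after pos
    for v in range(src + 1, len(offsets)):
        offsets[v] += 1                # every later vertex's range moves by one
```

The linear cost in `insert_edge` is the motivation for gap-based layouts such as the PMA mentioned in the abstract, which leave empty slots so that most updates touch only a small region.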
ISBN (digital): 9798331523527
ISBN (print): 9798331523534
Recently, the frequency of extreme disasters has increased, leading to a higher risk of cascading failures in power systems. To address the problems faced by overload-dominant cascading fault analysis, this paper proposes a cascading fault path evolution analysis method based on GPU parallel computing. Case studies confirm that the overall computation time is reduced by about 50%. The method provides a scientific basis for defense strategies after disaster warnings, thus reducing disaster losses.
ISBN (digital): 9781713899310
ISBN (print): 9798350350562
Discrete Event System Specification (DEVS) is a formalism for modeling and simulating discrete event systems. Most DEVS-based simulators are implemented as sequential programs. However, simulating large-scale complex models in a sequential simulator is impractical (if possible at all), as simulations may take a long time to execute. A common technique to speed up simulations is parallel execution of the simulator. Most parallel discrete-event simulation efforts focus on logical process approaches, resulting in complex simulation architectures. Recent parallelization efforts lean towards executing the simulators on multicore architectures. Despite promising results, they are limited by the number of CPU cores. In this work, we propose an algorithm to accelerate the execution of DEVS simulations on Graphics Processing Unit (GPU) architectures. We show different case studies where the proposed algorithm achieved speedups of up to 12.29 and 16.53 compared to a sequential version.
ISBN (digital): 9798331517199
ISBN (print): 9798331517205
This paper proposes a parallel acoustic characterization simulation method for near-surface targets. The method is based on the Shooting and Bouncing Ray (SBR) method and allows high-precision simulation of near-surface target acoustic characteristics. Furthermore, it accelerates the simulation of complex-shaped near-surface targets by using a Graphics Processing Unit (GPU) platform. By employing an equivalent decomposition of near-surface target acoustic field echoes, the task scale exceeds 10 billion acoustic beam lines on a multi-GPU cluster, thereby enabling rapid and precise simulation of the acoustic characteristics of large near-surface targets. This approach has significant potential for a wide range of applications.
ISBN (digital): 9798331517090
ISBN (print): 9798331517106
This paper proposes a parallel wavelet transform method for thermal radiation remote sensing brightness temperature data provided by the FY-2C satellite. The method uses a thread pool to manage multiple threads and to execute the functions in a task queue asynchronously, thereby parallelizing the wavelet transform. Experiments show that the parallel algorithm halves the transform time compared with the common serial transform method, and that its advantage becomes more pronounced for large data volumes.
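The thread-pool-plus-task-queue pattern described above can be sketched in Python as follows. This is an illustration under assumptions, not the paper's implementation: the abstract does not name the wavelet, so a single-level unnormalized Haar step is used as a stand-in, and `haar_step` and `parallel_haar` are hypothetical names. The input rows stand for independent slices of a data grid.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def haar_step(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One level of an unnormalized Haar transform: pairwise averages and differences."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail


def parallel_haar(rows: List[List[float]], workers: int = 4) -> List[Tuple[List[float], List[float]]]:
    """Submit one task per row to a thread pool and collect results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # submit() enqueues each task; worker threads pull tasks from the pool's queue
        futures = [pool.submit(haar_step, row) for row in rows]
        # result() blocks until the corresponding task has finished
        return [future.result() for future in futures]
```

In standard CPython builds the global interpreter lock prevents pure-Python arithmetic from running in parallel across threads, so this sketch shows the task structure rather than the reported halving of transform time; a real speedup would need the per-row work to run in native code or in separate processes.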
ISBN (digital): 9798350387131
ISBN (print): 9798350387148
Subgraph matching, also known as motif finding, is a fundamental problem in graph analysis with extensive applications. However, identifying subgraphs in large-scale graphs is challenging because the problem is NP-hard. In addition to the time complexity, previous solutions often suffer from excessive memory usage when dealing with large-scale graphs. This issue is exacerbated in shared-memory systems, where memory is more limited than in distributed settings. Therefore, achieving a balance between execution time and memory efficiency is vital in such environments. In this paper, we present a query-agnostic shared-memory parallel algorithm that incorporates ordering in set intersection, resulting in an 8% reduction in enumeration time for large graphs. Our approach also achieves memory usage reductions ranging from 2× to 8.2× compared to state-of-the-art techniques, while maintaining comparable runtime performance on large datasets. Extensive experiments with various query and graph datasets demonstrate improved scalability and effective workload balancing of our approach compared to other methods.
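The abstract does not say which ordering is applied in set intersection. One common heuristic, shown here purely as an assumed illustration, is to intersect candidate sets from smallest to largest so that the running result shrinks as early as possible. The function name `ordered_intersection` is hypothetical.

```python
from typing import Iterable, Set


def ordered_intersection(candidate_sets: Iterable[Set[int]]) -> Set[int]:
    """Intersect sets smallest-first; the running result never exceeds the smallest set."""
    ordered = sorted(candidate_sets, key=len)
    if not ordered:
        return set()
    result = set(ordered[0])   # copy so the caller's set is not mutated
    for other in ordered[1:]:
        result &= other
        if not result:         # an empty result cannot grow again, so stop early
            break
    return result
```

In subgraph enumeration, the sets would typically be the adjacency lists of already-matched query vertices, and the intersection gives the candidates for the next vertex to match.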
ISBN (digital): 9798331515669
ISBN (print): 9798331515676
A novel parallel mechanism for rapid digital-domain self-interference cancellation (SIC) in full-duplex (FD) integrated sensing and communication (ISAC) systems is proposed. The processing delay is minimized by employing a parallel cancellation architecture and substituting filtered sampling symbols with known modulation symbols, thus enabling effective and timely SIC for radar sensing. The proposed parallel SIC technique is presented through comprehensive system modeling, algorithm definition, feasibility assessment, numerical simulations, and experimental validations. The analysis shows that the proposed algorithm, with its high convergence speed, can effectively eliminate self-interference under severe conditions of self-interference and high-frequency variations, thereby enhancing the SIC capabilities of the full-duplex ISAC platform and contributing to the improvement of sensing performance.