Search Results - Inner Mongolia University Library

IEEE Asia-Pacific Conference on Antennas and Propagation (APCAP)

Authors: Zhengzhuo Wang, Yanliang Sha, Lingyun Ouyang, Quan Chen, Jianguo Hu. Affiliations: School of Microelectronics Science and Technology, Sun Yat-sen University, Zhuhai, China; School of Microelectronics, Southern University of Science and Technology, Shenzhen, China

ISBN: (digital) 9798350351019

ISBN: (print) 9798350351026

Computational electromagnetics methods for analysing nonlinear systems, such as the harmonic balance (HB) method, are computationally expensive, especially when dealing with a large number of frequency points. In this paper, we propose a fast parallel algorithm for the HB method to accelerate electromagnetic simulation. The new algorithm parallelizes the construction of the nonlinear Jacobian matrix, using a graphics processing unit (GPU) to speed up electromagnetic simulation. We present the formulations of the parallel HB method, and subsequently provide its implementation details on a mixed GPU and CPU platform. Experimental results from several industrial cases illustrate that the new parallel algorithm leads to a $3 \times$ speedup compared to the conventional HB method while maintaining similar accuracy, where the GPU-accelerated part is about 10 times faster than its CPU counterpart.

Keywords: Jacobian matrices; Accuracy; Graphics processing units; Harmonic analysis; Computational electromagnetics; Central Processing Unit; Parallel architectures; Parallel algorithms; Electromagnetics; Nonlinear systems

pSTL-Bench: A Micro-Benchmark Suite for Assessing Scalability of C++ Parallel STL Implementations

arXiv, 2024

Authors: Laso, Ruben; Krupitza, Diego; Hunold, Sascha. Affiliation: Faculty of Informatics, TU Wien, Vienna, Austria

Since the advent of parallel algorithms in the C++17 Standard Template Library (STL), the STL has become a viable framework for creating performance-portable applications. Given multiple existing implementations of the parallel algorithms, a systematic, quantitative performance comparison is essential for choosing the appropriate implementation for a particular hardware configuration. In this work, we introduce a specialized set of micro-benchmarks to assess the scalability of the parallel algorithms in the STL. By selecting different backends, our micro-benchmarks can be used on multi-core systems and GPUs. Using the suite, in a case study on AMD and Intel CPUs and NVIDIA GPUs, we were able to identify substantial performance disparities among different implementations, including GCC+TBB, GCC+HPX, Intel’s compiler with TBB, or NVIDIA’s compiler with OpenMP and CUDA. © 2024, CC BY.

Keywords: Parallel algorithms

A Reexamination of the Communication Bandwidth Cost Analysis of A Parallel Recursive Algorithm for Solving Triangular Systems of Linear Equations

arXiv, 2024

Authors: Tang, Yuan. Affiliation: School of Computer Science, School of Software, Fudan University, Shanghai, China

This paper presents a reexamination of the research paper titled "Communication-Avoiding Parallel Algorithms for TRSM" by Wicky et al. We focus on the communication bandwidth cost analysis presented in the original work and identify potential issues that require clarification or revision. In particular, we address inconsistencies and miscalculations found in the analysis, especially in the categorization of costs into three scenarios based on the relationship between matrix dimensions and processor count. Our findings contribute to the ongoing discourse in the field and pave the way for further improvements in this area of research. Copyright © 2024, The Authors. All rights reserved.

Keywords: Parallel algorithms

Batch-Parallel Compressed Sparse Row: A Locality-Optimized Dynamic-Graph Representation

IEEE Conference on High Performance Extreme Computing (HPEC)

Authors: Brian Wheatman, Randal Burns, Helen Xu. Affiliations: University of Chicago, Chicago, IL; Department of Computer Science, Johns Hopkins University, Baltimore, MD; School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA

ISBN: (digital) 9798350387131

ISBN: (print) 9798350387148

The default data structure for storing sparse graphs is Compressed Sparse Row (CSR), which enables efficient algorithms but is not designed to accommodate changes to the graph. Since many real-world graphs are dynamic (i.e., they change over time), there has been significant work towards developing dynamic-graph data structures that can support fast algorithms as well as updates to the graph. This paper introduces Batch-parallel Compressed Sparse Row (BP-CSR), a batch-parallel data structure optimized for storing and processing dynamic graphs based on the Packed Memory Array (PMA). At a high level, Batch-parallel Compressed Sparse Row extends Packed Compressed Sparse Row (PCSR, HPEC '18), a serial dynamic-graph data structure built on a PMA. However, since the original PCSR runs only on one thread, it cannot take advantage of the parallelism available in multithreaded machines. In contrast, Batch-parallel Compressed Sparse Row is built on the batch-parallel Packed Memory Array data structure (PPoPP '24) and can support fast parallel algorithms and updates. The empirical evaluation demonstrates that Batch-parallel Compressed Sparse Row supports fast parallel updates with minimal cost to algorithm performance. Specifically, Batch-parallel Compressed Sparse Row performs up to 420 million inserts per second. Across a suite of 10 graph algorithms and 10 input graphs, Batch-parallel Compressed Sparse Row incurs $1.05 \times$ slowdown on average and about $1.5 \times$ slowdown at most compared to Compressed Sparse Row (CSR), a classical static graph representation. Furthermore, the empirical results show that Batch-parallel Compressed Sparse Row outperforms existing tree-based and PMA-based dynamic-graph data structures on both algorithms and updates.

Keywords: Costs; Heuristic algorithms; Instruction sets; Throughput; Arrays; Parallel algorithms
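
The abstract above contrasts static CSR with dynamic-graph structures. As a rough illustration of why plain CSR "is not designed to accommodate changes", here is a minimal Python sketch of the classical static layout only; it is not BP-CSR, PCSR, or a PMA. The function names (`build_csr`, `neighbors`, `insert_edge_static`) are hypothetical and chosen for this example.

```python
from typing import List, Tuple


def build_csr(num_vertices: int, edges: List[Tuple[int, int]]):
    """Build static CSR arrays (offsets, targets) for a directed graph."""
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1          # count out-degree of each vertex
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]   # prefix sum gives start positions
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()       # next free slot for each vertex
    for src, dst in sorted(edges):
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbors(offsets: List[int], targets: List[int], v: int) -> List[int]:
    """Out-neighbors of v are one contiguous slice, hence the good locality."""
    return targets[offsets[v]:offsets[v + 1]]


def insert_edge_static(offsets: List[int], targets: List[int],
                       src: int, dst: int) -> None:
    """Naive insert: appends dst to src's slice and shifts everything after it.

    This costs time linear in the number of edges, which is the limitation
    that dynamic-graph structures aim to avoid.
    """
    targets.insert(offsets[src + 1], dst)
    for v in range(src + 1, len(offsets)):
        offsets[v] += 1


offsets, targets = build_csr(3, [(0, 1), (0, 2), (1, 2), (2, 0)])
insert_edge_static(offsets, targets, 1, 0)
```

Reads stay a single slice per vertex, while each naive insert touches every later offset, which is the trade-off the abstract describes.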

Research on Cascading Fault Path Analysis of Power System Based on GPU Parallel Computing

IEEE Conference on Energy Internet and Energy System Integration (EI2)

Authors: Jing Xu, Dai Cui, Zhengwen Li, Guangyu Zhu, Chenyang Zhao, Yiheng Bian, Gengfeng Li. Affiliations: Liaoning Electric Power Grid, State Grid Corporation of China, Shenyang, China; State Key Laboratory of Electrical Insulation and Power Equipment, Xi'an Jiaotong University, Xi'an, China

ISBN: (digital) 9798331523527

ISBN: (print) 9798331523534

Recently, the frequency of extreme disasters has increased, leading to a higher risk of cascading failures in power systems. To solve the problems faced by overload-dominant cascading fault analysis, this paper proposes a cascading fault path evolution analysis method based on GPU parallel computing. Case studies confirm that the overall computation time is reduced by about 50%. This method provides a scientific basis for defense strategies after disaster warnings, thus reducing disaster losses.

Keywords: Correlation; Disasters; Power system protection; Graphics processing units; Voltage; System integration; Security; Indexes; Power system faults; Parallel algorithms

Flattened Parallel DEVS Simulations on GPU Architectures

Annual Modeling and Simulation Conference (ANNSIM)

Authors: Guillermo G. Trabes, Alonso Inostrosa-Psijas, Verónica Gil-Costa, Gabriel A. Wainer. Affiliations: Advanced Real-Time Simulation Laboratory, Carleton University, Canada; Escuela de Ingeniería Informática, Universidad de Valparaíso, Chile; Universidad Nacional de San Luis, Argentina

ISBN: (digital) 9781713899310

ISBN: (print) 9798350350562

Discrete Event System Specification (DEVS) is a formalism for modeling and simulating discrete-event systems. Most DEVS-based simulators are implemented as sequential programs. However, simulating large-scale complex models in a sequential simulator is impractical, if possible at all, as simulations may take a long time to execute. A common technique to speed up simulations is the parallel execution of the simulator. Most parallel discrete-event simulation efforts focus on logical process approaches, resulting in complex simulation architectures. Recent parallelization efforts lean towards executing the simulators on multicore architectures. Despite promising results, they are limited by the number of CPU cores. In this work, we propose an algorithm to accelerate the execution of DEVS simulations on Graphics Processing Unit (GPU) architectures. We show different case studies where the proposed algorithm achieved speedups of up to 12.29× and 16.53× compared to a sequential version.

Keywords: Computers; Protocols; Multicore processing; Simulation; Graphics processing units; Computer architecture; Hardware; Discrete-event systems; Central Processing Unit; Parallel algorithms

Parallel Simulation of Acoustic Characteristics for Near-Surface Targets

IEEE International Conference on Control Science and Systems Engineering (CCSSE)

Authors: Hao Li, Jincheng Hu, Ziyu Ji, Xianyun Wu. Affiliation: School of Telecommunications Engineering, Xidian University, Xi'an, China

ISBN: (digital) 9798331517199

ISBN: (print) 9798331517205

This paper proposes a parallel acoustic characterization simulation method for near-surface targets. This method is based on the Shooting and Bouncing Ray (SBR) method and allows for the high-precision simulation of near-surface target acoustic characteristics. Furthermore, it accelerates the simulation of complex-shaped near-surface targets by utilizing a Graphics Processing Unit (GPU) platform. By employing an equivalent decomposition of near-surface target acoustic field echoes, a task scale exceeding 10 billion acoustic beam lines is reached on a multi-GPU cluster, thereby enabling the rapid and precise simulation of the acoustic characteristics of large near-surface targets. This approach has significant potential for a wide range of applications.

Keywords: Shape; Scalability; Loading; Graphics processing units; Systems engineering and theory; Load management; Acoustics; Acoustic beams; Time-domain analysis; Parallel algorithms

Parallel Wavelet Transform Method Based on Remotely Sensed Bright Temperature Data from Thermal Radiation

Asian Conference on Artificial Intelligence Technology (ACAIT)

Authors: Congxin Wei, Shouwei Na. Affiliation: School of Computer and Communication, Lanzhou University of Technology, Lanzhou, China

ISBN: (digital) 9798331517090

ISBN: (print) 9798331517106

This paper proposes a parallel wavelet transform method for thermal radiation remote sensing bright temperature data provided by the FY-2C satellite. The method uses a thread pool to manage multiple threads and to execute the functions in the task queue asynchronously, thereby parallelizing the wavelet transform. Experiments show that the parallel algorithm halves the transform time compared with the common serial transform method, and that its advantage is more pronounced when transforming large data volumes.

Keywords: Wavelet transforms; Temperature sensors; Temperature distribution; Satellites; Thermal management; Parallel algorithms; Artificial intelligence; Remote sensing; Faces
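
The thread-pool pattern described in this abstract can be sketched roughly as follows. The abstract does not name the wavelet family, the programming language, or how the data are partitioned, so everything here is an assumption for illustration: a single-level unnormalized Haar step, one task per data row, and the hypothetical names `haar_step` and `parallel_haar`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def haar_step(row: List[float]) -> List[float]:
    """One Haar level: pairwise averages followed by pairwise half-differences."""
    if len(row) % 2:
        raise ValueError("row length must be even")
    pairs = range(0, len(row), 2)
    averages = [(row[i] + row[i + 1]) / 2 for i in pairs]
    details = [(row[i] - row[i + 1]) / 2 for i in pairs]
    return averages + details


def parallel_haar(rows: List[List[float]], workers: int = 4) -> List[List[float]]:
    """Queue one task per row on a thread pool and collect results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(haar_step, rows))


result = parallel_haar([[4.0, 2.0, 6.0, 8.0], [1.0, 1.0, 3.0, 5.0]])
```

In CPython, pure-Python arithmetic like this is serialized by the global interpreter lock, so the sketch shows the task-queue structure rather than a real speedup; the halved transform time reported in the abstract belongs to the authors' implementation, not to this code.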

MESM: A Query-Agnostic and Memory-Efficient Parallel Subgraph Matching Algorithm

IEEE Conference on High Performance Extreme Computing (HPEC)

Authors: Shubhashish Kar, Shaikh Arifuzzaman. Affiliation: Department of Computer Science, University of Nevada, Las Vegas, Las Vegas, USA

ISBN: (digital) 9798350387131

ISBN: (print) 9798350387148

Subgraph matching, also known as motif finding, is a fundamental problem in graph analysis with extensive applications. However, identifying subgraphs in large-scale graphs is challenging due to its NP-Hard complexity. In addition to the time complexity, previous solutions often suffer from excessive memory usage when dealing with large-scale graphs. This issue is exacerbated in shared-memory systems, where memory is more limited compared to distributed settings. Therefore, achieving a balance between execution time and memory efficiency is vital in such environments. In this paper, we present a query-agnostic shared-memory parallel algorithm that incorporates ordering in set intersection, resulting in an 8% reduction in enumeration time for large graphs. Our approach also achieves memory usage reductions ranging from 2× to 8.2× compared to state-of-the-art techniques, while maintaining comparable runtime performance on large datasets. Extensive experiments with various query and graph datasets demonstrate improved scalability and effective workload balancing of our approach compared to other methods.

Keywords: Runtime; Scalability; High performance computing; Memory management; Distance measurement; Complexity theory; Parallel algorithms; Time complexity
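
The abstract mentions "ordering in set intersection" without giving details. One common reading, shown below purely as an assumption and not as the MESM algorithm, is to intersect candidate neighbor lists starting with the shortest one, so that intermediate results stay small. The helper names `intersect_sorted` and `intersect_many` are hypothetical.

```python
from typing import List


def intersect_sorted(a: List[int], b: List[int]) -> List[int]:
    """Merge-style intersection of two ascending lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def intersect_many(lists: List[List[int]]) -> List[int]:
    """Intersect several sorted lists, processing them shortest first."""
    if not lists:
        raise ValueError("need at least one list")
    ordered = sorted(lists, key=len)
    result = ordered[0]
    for other in ordered[1:]:
        if not result:        # empty intermediate result: stop early
            break
        result = intersect_sorted(result, other)
    return result


common = intersect_many([[1, 2, 3, 5, 8], [2, 5, 9], [0, 2, 4, 5, 6, 7]])
```

Starting from the shortest list bounds every intermediate result by that list's length, which is one way an ordering choice can cut enumeration work.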

A Parallel Mechanism for Fast Digital SIC in Full-Duplex ISAC systems

IEEE International Conference on Signal, Information and Data Processing (ICSIDP)

Authors: Jie Yang, Changhao Du, Yingshen Zhu, Hang Ruan, Zhongshan Zhang. Affiliations: School of Cyberspace Science and Technology, Beijing Institute of Technology, Beijing, China; Beijing Institute of Radio Measurement, Beijing, China

ISBN: (digital) 9798331515669

ISBN: (print) 9798331515676

A novel parallel mechanism for rapid digital-domain self-interference cancellation (SIC) in full-duplex (FD) integrated sensing and communication (ISAC) systems is proposed. The processing delay is minimized by employing a parallel cancellation architecture and substituting filtered sampling symbols with known modulation symbols, thus enabling effective and timely SIC for radar sensing. The proposed parallel SIC technique is presented through comprehensive system modeling, algorithm definition, feasibility assessment, numerical simulations, and experimental validations. The analysis shows that the proposed algorithm, with its high convergence speed, can effectively eliminate self-interference under severe conditions of self-interference and high-frequency variations, thereby enhancing the SIC capabilities of the full-duplex ISAC platform and contributing to the improvement of sensing performance.

Keywords: Interference cancellation; Symbols; Modulation; Full-duplex system; Radar; Systems modeling; Integrated sensing and communication; Numerical simulation; Parallel algorithms; Convergence
