HIL (Hardware-in-the-Loop) is an efficient and convenient tool for the testing and verification of electric drive systems, which require high reliability and safety. With the application of high-frequency SiC inverters, the ...
This paper proposes a novel hybrid parallel algorithm with multiple improvement strategies. The whole population is divided into three subpopulations, and each subpopulation executes the butterfly optimization algorithm, gre...
Improving parallel computing efficiency is of great significance for the development of and research on particle simulation. This article first compares the communication efficiency of MSMPI and MPICH2. Th...
The development of computationally efficient algorithms and the improvement of their software implementation are urgent issues that require continuous attention due to the ongoing development of computer system archit...
Parallelism patterns (e.g., map or reduce) have proven to be effective tools for parallelizing high-performance applications. In this article, we study the recursive registration of a series of electron microscopy images - a time-consuming and imbalanced computation necessary for nanoscale microscopy analysis. We show that by translating the image registration into a specific instance of the prefix scan, we can convert this seemingly sequential problem into a parallel computation that scales to over a thousand cores. We analyze a variety of scan algorithms that behave similarly for common low-compute operators and propose a novel work-stealing procedure for a hierarchical prefix scan. Our evaluation shows that by identifying a suitable and well-optimized prefix scan algorithm, we reduce the time-to-solution for a series of 4,096 images spanning ten seconds of microscopy acquisition from over 10 hours to less than 3 minutes (using 1024 Intel Haswell cores), enabling the derivation of material properties at the nanoscale for long microscopy image series.
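The key idea above - that composing pairwise registrations is associative, so cumulative transforms form a prefix scan - can be illustrated with a minimal sketch. The operator and data here are hypothetical stand-ins (2D translations composed by addition), not the paper's registration operator or scan implementation:

```python
# Inclusive prefix scan under an arbitrary associative operator.
# Registering image i back to image 0 composes the pairwise transforms
# T_{0<-1}, T_{1<-2}, ..., T_{i-1<-i}; associativity is what lets the
# scan be parallelized (e.g., via a hierarchical / work-stealing scheme).

def inclusive_scan(items, op):
    """Sequential reference scan: out[i] = items[0] op items[1] op ... op items[i]."""
    out = []
    acc = None
    for x in items:
        acc = x if acc is None else op(acc, x)
        out.append(acc)
    return out

# Pairwise transforms modeled as 2D translations (dx, dy); composition is addition.
compose = lambda a, b: (a[0] + b[0], a[1] + b[1])

pairwise = [(1, 0), (0, 2), (3, 1)]            # hypothetical drift between frames
cumulative = inclusive_scan(pairwise, compose)
print(cumulative)                               # [(1, 0), (1, 2), (4, 3)]
```

A parallel scan (Blelloch-style or hierarchical, as in the article) computes the same `cumulative` list in O(log n) depth by exploiting the associativity of `op`.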
Unit commitment (UC) is an important problem solved on a daily basis within a strict time limit. While hourly UC is currently used, it may not be flexible enough to accommodate the growing variability of demand and the increasing penetration of intermittent renewables. Sub-hourly UC is therefore recommended. This, however, will significantly increase problem complexity even under the deterministic setting because of the considerable increase in the number of intervals, leading to a drastic increase in the numbers of system coupling constraints and binary variables as compared to hourly UC. Consequently, existing methods may not be able to obtain good solutions within the time limit for large problems. In this paper, deterministic sub-hourly UC is considered with an innovative exploitation of "soft constraints": constraints that do not need to be strictly satisfied but whose violations are penalized by predetermined coefficients. This, in conjunction with our recent "Surrogate Absolute Value Lagrangian Relaxation" approach where the "relaxed problem" is not required to be fully optimized, facilitates the formation and resolution of a new type of subproblem where soft system coupling constraints (e.g., reserve and transmission capacity constraints) are not relaxed. This then leads to a drastic reduction in the number of multipliers, decreased computational requirements, and improved solution quality. To further enhance the speed, a parallel version is developed. Testing results based on the Polish system demonstrate the effectiveness and robustness of both the sequential and parallel versions at finding high-quality solutions within the time limit.
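The "soft constraint" device described above can be made concrete with a small sketch. The function name, penalty coefficient, and numbers below are hypothetical illustrations of the general idea (a penalized objective), not the paper's formulation:

```python
# Soft-constraint sketch: a coupling constraint g(x) <= 0 (e.g., a reserve
# requirement) is not enforced exactly; any positive violation is penalized
# in the objective with a predetermined coefficient.

def penalized_cost(base_cost, violations, penalty=1000.0):
    """base_cost plus penalty times the sum of positive constraint violations."""
    return base_cost + penalty * sum(max(0.0, v) for v in violations)

# Feasible point: no violation, so the cost is unchanged.
print(penalized_cost(500.0, [-2.0, 0.0]))   # 500.0
# Infeasible point: a 1.5 MW reserve shortfall is penalized.
print(penalized_cost(480.0, [1.5]))         # 1980.0
```

Because the soft constraints move into the objective rather than being relaxed with multipliers, the number of Lagrange multipliers to update shrinks, which is the source of the computational savings the abstract describes.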
A topical task in petroleum geophysics is solving the problem of multicomponent multiphase flow in a porous medium. At the same time, the development of effective parallel algorithms is an urgent task for modeling p...
Tensor computations are important mathematical operations for applications that rely on multidimensional data. The tensor-vector multiplication (TVM) is the most memory-bound tensor contraction in this class of operations. This article proposes an open-source TVM algorithm that is much simpler and more efficient than previous approaches, making it suitable for integration into the most popular BLAS libraries available today. Our algorithm has been written from scratch and features unit-stride memory accesses, cache awareness, mode obliviousness, full vectorization and multi-threading, as well as NUMA awareness for non-hierarchically stored dense tensors. Numerical experiments are carried out on tensors up to order 10 and various compilers and hardware architectures equipped with traditional DDR and high-bandwidth memory (HBM). For large tensors the average performance of the TVM ranges between 62% and 76% of the theoretical bandwidth for NUMA systems with DDR memory and remains independent of the contraction mode. On NUMA systems with HBM the TVM exhibits some mode dependency but manages to reach performance figures close to peak values. Finally, the higher-order power method is benchmarked with the proposed TVM kernel and delivers on average between 58% and 69% of the theoretical bandwidth for large tensors.
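For readers unfamiliar with the operation, a mode-k TVM contracts one axis of a tensor with a vector. The NumPy reference below only defines the contraction itself; the article's contribution is a hand-tuned, cache- and NUMA-aware kernel, which this sketch does not attempt to reproduce:

```python
# Reference mode-k tensor-vector multiplication (TVM):
# result[i_0, ..., i_{k-1}, i_{k+1}, ...] = sum_j tensor[..., j, ...] * vec[j],
# where j runs over axis `mode`. This is the memory-bound contraction the
# article optimizes; np.tensordot here is just a correctness reference.
import numpy as np

def tvm(tensor, vec, mode):
    """Contract `tensor` with `vec` along axis `mode`."""
    return np.tensordot(tensor, vec, axes=([mode], [0]))

T = np.arange(24, dtype=float).reshape(2, 3, 4)   # order-3 tensor
v = np.ones(3)                                     # contraction vector
out = tvm(T, v, mode=1)                            # shape (2, 4): axis 1 contracted
print(out.shape)  # (2, 4)
```

"Mode obliviousness" in the abstract means the optimized kernel sustains the same bandwidth regardless of which axis `mode` selects, even though the memory stride of that axis differs.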
ISBN (print): 9798350308600
Counting and finding triangles in graphs is often used in real-world analytics to characterize cohesiveness and identify communities in graphs. In this paper, we propose the novel concept of a cover-edge set that can be used to find triangles more efficiently. We use a breadth-first search (BFS) to quickly generate a compact cover-edge set. Novel sequential and parallel triangle counting algorithms are presented that employ cover-edge sets. The sequential algorithm avoids unnecessary triangle-checking operations, and the parallel algorithm is communication-efficient. The parallel algorithm can asymptotically reduce communication on massive graphs such as real social networks and synthetic graphs from the Graph500 benchmark. In our estimate from massive-scale Graph500 graphs, our new parallel algorithm can reduce the communication on a scale-36 graph by 1156x and on a scale-42 graph by 2368x.
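As context for what the cover-edge idea improves on, here is the standard baseline that counts triangles by intersecting the neighbor sets of each edge's endpoints. This is the naive scheme whose redundant checks the paper's cover-edge set prunes; it is not the paper's algorithm:

```python
# Baseline sequential triangle counting via neighbor-set intersection.
# Every triangle {u, v, w} is discovered once from each of its three edges,
# hence the final division by 3.

def count_triangles(edges):
    """Count triangles in an undirected simple graph given as an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    count = 0
    for u, v in edges:
        count += len(adj[u] & adj[v])   # common neighbors w close a triangle
    return count // 3

# The complete graph K4 contains 4 triangles.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(count_triangles(k4))  # 4
```

In a distributed setting, each intersection may require exchanging neighbor lists between machines, which is why reducing the set of edges that must be checked translates directly into the communication savings the abstract reports.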
Purpose String indexes such as the suffix array (sa) and the closely related longest common prefix (lcp) array are fundamental objects in bioinformatics and have a wide variety of applications. Despite their importance in practice, few scalable parallel algorithms for constructing these are known, and the existing algorithms can be highly non-trivial to implement and *** In this paper we present caps-sa, a simple and scalable parallel algorithm for constructing these string indexes inspired by samplesort and utilizing an LCP-informed mergesort. Due to its design, caps-sa has excellent memory-locality and thus incurs fewer cache misses and achieves strong performance on modern multicore systems with deep cache *** We show that despite its simple design, caps-sa outperforms existing state-of-the-art parallel sa and lcp-array construction algorithms on modern hardware. Finally, motivated by applications in modern aligners where the query strings have bounded lengths, we introduce the notion of a bounded-context sa and show that caps-sa can easily be extended to exploit this structure to obtain further speedups. We make our code publicly available at https://***/jamshed/CaPS-SA.
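To make the two objects concrete, the sketch below builds the suffix array and lcp array naively (quadratic time). It only illustrates what caps-sa computes; the paper's samplesort-based, LCP-informed parallel construction is far more scalable than this reference:

```python
# Naive reference construction of the suffix array (sa) and lcp array.

def suffix_array(s):
    """Indices of the suffixes of s in lexicographic order."""
    return sorted(range(len(s)), key=lambda i: s[i:])

def lcp_array(s, sa):
    """lcp[i] = length of the longest common prefix of the suffixes at
    sa[i-1] and sa[i]; lcp[0] is 0 by convention."""
    def lcp(a, b):
        n = 0
        while a + n < len(s) and b + n < len(s) and s[a + n] == s[b + n]:
            n += 1
        return n
    return [0] + [lcp(sa[i - 1], sa[i]) for i in range(1, len(sa))]

sa = suffix_array("banana")
print(sa)                       # [5, 3, 1, 0, 4, 2]
print(lcp_array("banana", sa))  # [0, 1, 3, 0, 0, 2]
```

The lcp values are exactly what an LCP-informed mergesort exploits: when merging two sorted runs of suffixes, known common-prefix lengths let comparisons skip characters already known to match.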