Scientific parallel programming has become mainstream in recent years with the introduction of high-performance graphics processing units (GPUs) that are specifically designed for numerical processing. In addition, freely available programming tools have made it possible for anyone who wants to leverage the processing power of GPUs to do so relatively easily. This article provides an introduction to parallel programming using GPUs, with numerical examples demonstrating the speedup that can be obtained in a microwave engineering problem. All programming tools used in the article can be obtained free of charge from online resources. This accessibility is a tremendous benefit to engineers, students, and enthusiasts.
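As a rough illustration of the kind of element-wise numerical kernel such an article accelerates, the sketch below offloads a simple field update to a GPU with OpenMP target directives in C++. The array names and the update coefficient are hypothetical, and OpenMP offload is only a stand-in for whichever free GPU toolchain the article actually uses; with a compiler that lacks offloading support, the loop simply runs on the CPU.

```cpp
#include <vector>
#include <cstdio>

int main() {
    const int n = 1 << 20;
    std::vector<float> e(n, 1.0f), h(n, 2.0f);   // hypothetical field arrays
    const float c = 0.5f;                        // hypothetical update coefficient

    float* pe = e.data();
    float* ph = h.data();

    // Offload the element-wise update to the GPU (falls back to the host
    // if the compiler was not built with offloading support).
    #pragma omp target teams distribute parallel for map(tofrom: pe[0:n]) map(to: ph[0:n])
    for (int i = 0; i < n; ++i)
        pe[i] += c * ph[i];

    std::printf("e[0] = %f\n", pe[0]);
    return 0;
}
```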
An overview of developments in PARCS (Parallel Asynchronous Recursive Control Space) technology is provided. The concept of the control space is considered: a modeling apparatus with which the logical structure of the investigated problem (system) is described and in which dynamic changes to it are reflected. A PARCS model is proposed whose application allows flexible and unified adaptation to emerging programming technologies. PARCS extensions of the following programming languages and platforms are considered: PASCAL, C, FORTRAN, MODULA2, Java, CUDA, OpenCL, PYTHON, .NET, and GO/PYTHON.
The implementation of parallel applications is always a challenge and involves many distinct design decisions. The paper presents issues of parallel processing in .NET applications that work with popular Database Management Systems (DBMSes). Four design dilemmas are addressed: how efficient the auto-parallelism implemented in the .NET TPL library is, how popular DBMSes differ in serving parallel requests, what the optimal size of data chunks in the data-parallelism scheme is, and how TPL auto-parallelism behaves in public clouds. They are analyzed in the context of a typical, practical business case originating from IT solutions dedicated to energy-market participants. The paper presents the results of experiments conducted in controlled on-premises and cloud environments. The experiments allowed us to compare the performance of TPL auto-parallelism with a wide range of manually set numbers of worker threads. They also helped to evaluate four DBMSes, Oracle, MySQL, PostgreSQL, and MSSQL, in the scenario of serving parallel queries. Finally, they showed the impact of data chunk sizes on overall performance.
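The paper's experiments are built on the C#/.NET TPL, which is not reproduced here; as a rough analog in C++, the sketch below shows the data-parallelism scheme the paper studies: the input is split into chunks of a configurable size (one of the four dilemmas) and each chunk is handed to its own task. The record type, the per-chunk work, and the chunk size are all hypothetical stand-ins for the paper's database/business logic.

```cpp
#include <future>
#include <numeric>
#include <vector>
#include <algorithm>
#include <cstdio>

// Process one chunk of records; the "work" here is a stand-in for the
// per-record processing the paper performs against a DBMS.
static double process_chunk(const std::vector<double>& data, std::size_t begin, std::size_t end) {
    return std::accumulate(data.begin() + begin, data.begin() + end, 0.0);
}

int main() {
    std::vector<double> records(1'000'000, 1.0);   // hypothetical input data
    const std::size_t chunk = 100'000;             // chunk size: the tuning knob studied in the paper

    std::vector<std::future<double>> tasks;
    for (std::size_t i = 0; i < records.size(); i += chunk) {
        const std::size_t end = std::min(i + chunk, records.size());
        tasks.push_back(std::async(std::launch::async, process_chunk,
                                   std::cref(records), i, end));
    }

    double total = 0.0;
    for (auto& t : tasks) total += t.get();
    std::printf("total = %f\n", total);
    return 0;
}
```

Smaller chunks expose more parallelism but add scheduling overhead, which is exactly the trade-off the paper measures against auto-parallelism.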
Over the past three decades, the numerical manifold method (NMM) has attracted many researchers from the geotechnical community because it unifies the solutions of continuous and discontinuous problems in the same framework. However, due to the lack of ready-made preprocessing tools, the development of the three-dimensional NMM (3DNMM) is still limited. A practical strategy to generate the discretized models for a 3DNMM analysis is proposed. In the proposed strategy, regular hexahedral meshes are uniformly deployed to construct the mathematical cover system. The physical meshes, including the joints, material interfaces, and problem-domain boundaries, are adopted to cut the mathematical cover system into the physical cover system and manifold elements (MEs). To improve the efficiency of the proposed strategy, the Intel Threading Building Blocks (TBB) parallel library for CPU parallelization is employed. Typical examples are adopted to validate the proposed strategy. The results show that the proposed strategy can effectively generate the discretized 3D models of some geotechnical problems for 3DNMM analysis. The proposed strategy deserves further investigation.
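The abstract names Intel TBB as the CPU parallelization layer for the model-generation step. The sketch below shows that pattern only in outline, assuming a hypothetical cut_cell() routine that intersects one hexahedral mathematical cell with the physical meshes; the actual 3DNMM data structures are not reproduced here.

```cpp
#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>
#include <vector>
#include <cstdio>

struct ManifoldElement { int cell_id = -1; /* hypothetical cutting result */ };

// Hypothetical stand-in for intersecting one hexahedral mathematical cell
// with the joints, material interfaces, and problem-domain boundaries.
static ManifoldElement cut_cell(std::size_t cell_id) {
    return ManifoldElement{static_cast<int>(cell_id)};
}

int main() {
    const std::size_t num_cells = 100000;          // illustrative mesh size
    std::vector<ManifoldElement> elements(num_cells);

    // Each cell is cut independently of the others, so the loop parallelizes cleanly.
    tbb::parallel_for(tbb::blocked_range<std::size_t>(0, num_cells),
                      [&](const tbb::blocked_range<std::size_t>& r) {
                          for (std::size_t i = r.begin(); i != r.end(); ++i)
                              elements[i] = cut_cell(i);
                      });

    std::printf("generated %zu manifold elements\n", elements.size());
    return 0;
}
```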
ISBN (print): 9798350364613; 9798350364606
A performance-portable application can run on a variety of different hardware platforms, achieving an acceptable level of performance without requiring significant rewriting for each platform. Several performance-portable programming models are now suitable for high-performance scientific application development, including OpenMP and Kokkos. Chapel is a parallel programming language that supports the productive development of high-performance scientific applications and has recently added support for GPU architectures through native code generation. Using three mini-apps (BabelStream, miniBUDE, and TeaLeaf), we evaluate the Chapel language's performance portability across various CPU and GPU platforms. In our evaluation, we replicate and build on previous studies of performance portability using mini-apps, comparing Chapel against OpenMP, Kokkos, and the vendor programming models CUDA and HIP. We find that Chapel achieves comparable performance portability to OpenMP and Kokkos and identify several implementation issues that limit Chapel's performance portability on certain platforms.
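For a sense of what a mini-app such as BabelStream actually times, the sketch below shows a triad-style streaming kernel in C++ with OpenMP, one of the baselines Chapel is compared against in the study. The array size and scalar are illustrative, and this is not code from the paper or from BabelStream itself.

```cpp
#include <vector>
#include <cstdio>

int main() {
    const long long n = 1 << 22;              // illustrative array size
    const double scalar = 0.4;
    std::vector<double> a(n, 0.0), b(n, 1.0), c(n, 2.0);

    // BabelStream-style "triad" kernel: bandwidth-bound and trivially parallel,
    // which is exactly the behavior such mini-apps measure across programming models.
    #pragma omp parallel for
    for (long long i = 0; i < n; ++i)
        a[i] = b[i] + scalar * c[i];

    std::printf("a[0] = %f\n", a[0]);
    return 0;
}
```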
ISBN (print): 9798400717932
Remote Memory Access (RMA) programming models enable processes running on a distributed-memory computer to access and manipulate the memory of other processes directly. Such one-sided communication has the benefit that the receiving process is not actively involved in the communication compared to the classical two-sided message-passing model. The three programming models MPI RMA, OpenSHMEM, and GASPI provide such a communication scheme. However, RMA models require the developer to synchronize the accesses with corresponding API calls correctly. Concurrent modifications of the same (remote) memory location due to wrong or missing synchronization lead to data races. Such data races are undefined behavior and may result in non-deterministic failures of the program execution. This paper presents RMASanitizer, an on-the-fly race detector for MPI RMA, OpenSHMEM, and GASPI applications. It relies on a generalized race detection model independent of the concrete RMA programming model. RMASanitizer combines a dynamic on-the-fly analysis with a static analysis at compile-time that detects and instruments only relevant memory accesses. It is implemented as part of the MPI correctness checking framework MUST which we extended with support for OpenSHMEM and GASPI. We show that RMASanitizer can detect races in MPI RMA, OpenSHMEM, and GASPI applications with an accuracy of over 95 percent by running it on the data race benchmark suite RMARaceBench. On proxy applications, the slowdown for the execution with up to 700 processes ranges from 1.1x to 30x, depending on the application, showing that our tool is applicable in practice.
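To make the synchronization problem concrete, here is a minimal MPI RMA example (not taken from the paper): a fence epoch brackets a one-sided MPI_Put, and dropping either fence while still accessing the window memory would produce exactly the kind of data race a tool like RMASanitizer is built to detect.

```cpp
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int local = 0;                       // window memory exposed to remote puts
    MPI_Win win;
    MPI_Win_create(&local, sizeof(int), sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);               // open the access/exposure epoch
    if (rank == 0 && size > 1) {
        int value = 42;
        // One-sided write into rank 1's window; rank 1 issues no matching receive.
        MPI_Put(&value, 1, MPI_INT, /*target_rank=*/1, /*target_disp=*/0, 1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);               // close the epoch; without this fence,
                                         // reading `local` below would race with the put
    if (rank == 1)
        std::printf("rank 1 received %d\n", local);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```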
ISBN (print): 9783031800832; 9783031800849
The high-performance computing (HPC) community has recently seen a substantial diversification of hardware platforms and their associated programming models. From traditional multicore processors to highly specialized accelerators, vendors and tool developers sustain the relentless progress of these architectures. In the context of scientific programming, it is fundamental to consider performance portability frameworks, i.e., software tools that allow programmers to write code once and run it on different computer architectures without sacrificing performance. We report here on the benefits and challenges of performance portability using a field-line tracing simulation and a particle-in-cell code, two representative computational plasma physics applications relevant to magnetically confined nuclear-fusion energy research. For these applications we report performance results obtained on four HPC platforms with server-class CPUs from Intel (Xeon) and AMD (EPYC), and high-end GPUs from Nvidia and AMD, including the latest Nvidia H100 GPU and the novel AMD Instinct MI300A APU. Our results show that both Kokkos and OpenMP are powerful tools to achieve performance portability and decent "out-of-the-box" performance, even on the very latest hardware platforms. For our applications, Kokkos provided performance portability across the broadest range of hardware architectures from different vendors.
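Both applications in the study rely on Kokkos (and OpenMP) for portability. The sketch below is not taken from either code; it only illustrates the basic Kokkos pattern such codes depend on: a single parallel_for whose body is compiled for whichever backend (CPU threads, CUDA, HIP) Kokkos was configured with at build time. The particle-push kernel, names, and sizes are hypothetical.

```cpp
#include <Kokkos_Core.hpp>
#include <cstdio>

int main(int argc, char* argv[]) {
    Kokkos::initialize(argc, argv);
    {
        const int n = 1 << 20;                       // illustrative number of particles
        Kokkos::View<double*> x("x", n), v("v", n);  // device-resident state
        const double dt = 1.0e-3;                    // illustrative time step

        // One explicit push step; the same source runs on CPU threads or a GPU,
        // depending on the Kokkos backend selected at build time.
        Kokkos::parallel_for("push", n, KOKKOS_LAMBDA(const int i) {
            x(i) += dt * v(i);
        });
        Kokkos::fence();
        std::printf("advanced %d particles\n", n);
    }
    Kokkos::finalize();
    return 0;
}
```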
ISBN (print): 9783031631092; 9783031631108
Hate speech is a threat to democratic values because it incites discrimination, which international law prohibits. To limit the harmful effects of this scourge, scientists often integrate into social network platforms models built with deep learning algorithms that detect and react automatically to messages of a hateful nature. One particularity of these algorithms is that they become more effective as the amount of data used grows. However, sequential execution of these algorithms on large amounts of data can take a very long time. In this paper we first compared three variants of Recurrent Neural Networks (RNNs) for detecting hate messages. We showed that the Long Short-Term Memory (LSTM) network provides better metric performance but requires a longer execution time than the Gated Recurrent Unit (GRU) and the standard RNN. To obtain both good metric performance and reduced execution time, we implemented the training algorithms in parallel. We proposed a parallel implementation based on an implicit aggregation strategy, in contrast to the existing approach, which is based on an explicit aggregation function. The experimental results on an 8-core machine at 2.20 GHz show that better results are obtained with the parallelization strategy we propose. For the parallel implementation of an LSTM using a dataset obtained from Kaggle, we obtained an f-measure of 0.70 and a speedup of 2.2 with our approach, compared to an f-measure of 0.65 and a speedup of 2.19 with an explicit aggregation strategy between workers.
ISBN (print): 9798350326598; 9798350326581
Existing tiled manycore architectures propose to convert abundant silicon resources into general-purpose parallel processors with unmatched computational density and programmability. However, as we approach 100K cores in one chip, conventional manycore architectures struggle to navigate three key axes: scalability, programmability, and density. Many manycores sacrifice programmability for density, or scalability for programmability. In this paper, we explore HammerBlade, which simultaneously achieves scalability, programmability, and density. HammerBlade is a fully open-source RISC-V manycore architecture, which has been silicon-validated with a 2048-core ASIC implementation using a 14/16nm process. We evaluate the system using a suite of parallel benchmarks that captures a broad spectrum of computation and communication patterns.
ISBN (print): 9798350364613; 9798350364606
New algorithms for embedding graphs have reduced the asymptotic complexity of finding low-dimensional representations. One-Hot Graph Encoder Embedding (GEE) uses a single, linear pass over edges and produces an embedding that converges asymptotically to the spectral embedding. The scaling and performance benefits of this approach have been limited by a serial implementation in an interpreted language. We refactor GEE into a parallel program in the Ligra graph engine that maps functions over the edges of the graph and uses lock-free atomic instructions to prevent data races. On a graph with 1.86 edges, this results in a 500 times speedup over the original implementation and a 17 times speedup over a just-in-time compiled version.
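The abstract's key implementation detail is the per-edge update guarded by lock-free atomics. The sketch below is a hypothetical, non-Ligra illustration of that idea in C++: one-hot-style counts accumulated per (vertex, cluster) pair in a single pass over the edges, with a CAS-based atomic add standing in for the atomic instructions Ligra's edgeMap relies on. The tiny graph, labels, and omitted normalization are purely illustrative; this is not the authors' code.

```cpp
#include <atomic>
#include <vector>
#include <cstdio>

struct Edge { int u, v; };

// Atomically add `val` to a double using a compare-and-swap loop,
// mirroring the lock-free updates used to avoid races on shared rows.
static void atomic_add(std::atomic<double>& target, double val) {
    double old = target.load(std::memory_order_relaxed);
    while (!target.compare_exchange_weak(old, old + val, std::memory_order_relaxed))
        ;  // `old` is refreshed on failure; retry until the add lands
}

int main() {
    const int n = 4, k = 2;                               // vertices, clusters (illustrative)
    std::vector<Edge> edges = {{0, 1}, {1, 2}, {2, 3}};   // tiny example graph
    std::vector<int> label = {0, 1, 1, 0};                // per-vertex cluster labels
    std::vector<std::atomic<double>> embed(n * k);        // n x k embedding, flattened
    for (auto& cell : embed) cell.store(0.0);

    // Single pass over edges; each edge contributes to the rows of both endpoints.
    // With a parallel loop (e.g. Ligra's edgeMap or an OpenMP for), the atomic
    // adds prevent data races when two edges touch the same vertex concurrently.
    #pragma omp parallel for
    for (long i = 0; i < (long)edges.size(); ++i) {
        const Edge& e = edges[i];
        atomic_add(embed[e.u * k + label[e.v]], 1.0);
        atomic_add(embed[e.v * k + label[e.u]], 1.0);
    }

    std::printf("embed[0][%d] = %f\n", label[1], embed[0 * k + label[1]].load());
    return 0;
}
```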