Search Results - Inner Mongolia University Library

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2013, vol. 7850, pp. 501-502

Author: Bocchino Jr., Robert L. Carnegie Mellon University United States

ISBN: (print) 9783642369452

In recent years, the research community has made great strides in alias annotations that support parallel programming [1]. Using these techniques, programmers no longer have to guess where aliased mutable state may cause unintended data races or nondeterminism; instead, such problems can simply be eliminated, either at compile time or at runtime. This represents a major advance in the safety and reliability of parallel code. © Springer-Verlag Berlin Heidelberg 2013.

Keywords: parallel programming

Parallel Computing of 3-D FEA Including Matrix Preconditioning for Analysis of Rotating Machines Coupled With Circuit Equations

IEEE TRANSACTIONS ON MAGNETICS, 2021, vol. 57, no. 6, pp. 1-4

Authors: Utsunomiya, Ryouma Yamazaki, Katsumi Chiba Inst Technol Dept Elect & Elect Engn Narashino Chiba 2750016 Japan

In this article, we propose a parallel computing method of 3-D finite-element analysis coupled with circuit equations for characteristic calculation of rotating machines. In the proposed method, the preconditioning part of the matrix solver is parallelized as well as the other parts, in order to obtain a stable solution within a short computational time. The proposed method is applied to the loss calculation of an interior permanent magnet synchronous motor fed by an inverter to clarify the advantages.

Keywords: Eddy currents finite-element methods parallel programming permanent magnet motors

Optimizing the Cray Graph Engine for performant analytics on cluster, SuperDome Flex, Shasta systems and cloud deployment

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2024, vol. 36, no. 10, p. e7982

Authors: Rickett, Christopher D. Maschhoff, Kristyn J. Sukumar, Sreenivas R. Hewlett Packard Enterprise Spring TX 77389 USA

We present updates to the Cray Graph Engine, a high performance in-memory semantic graph database, which enable performant execution across multiple architectures as well as deployment in a container to support cloud and as-a-service graph analytics. This paper discusses the changes required to port and optimize CGE to target multiple architectures, including Cray Shasta systems, large shared-memory machines such as SuperDome Flex (SDF), and cluster environments such as Apollo systems. The porting effort focused primarily on removing dependences on XPMEM and Cray PGAS and replacing these with a simplified PGAS library based upon POSIX shared memory and one-sided MPI, while preserving the existing Coarray-C++ CGE code base. We also discuss the containerization of CGE using Singularity and the techniques required to enable container performance matching native execution. We present early benchmarking results for running CGE on the SDF, Infiniband clusters and Slingshot interconnect-based Shasta systems.

Keywords: Cray Graph Engine graph analytics parallel programming pattern mining pattern search PGAS semantics

Towards Automatic Block Size Tuning for Image Processing Algorithms on CUDA

17th International Conference on Software Technologies (ICSOFT)

Authors: Guerfi, Imene Kriaa, Lobna Saidane, Leila Azouz Univ Manouba Natl Sch Comp Sci ENSI CRISTAL Lab RAMSIS Pole Manouba Tunisia

ISBN: (print) 9789897585883

With the growing amount of data, computational power has become highly required in all fields. To satisfy these requirements, the use of GPUs seems to be the appropriate solution. But one of their major setbacks is their varying architectures, which make writing efficient parallel code very challenging because of the need to master the GPU's low-level design. CUDA offers more flexibility for the programmer to exploit the GPU's power with ease. However, tuning the launch parameters of its kernels, such as block size, remains a daunting task. This parameter requires a deep understanding of the architecture and the execution model to be well-tuned. Particularly, in the Viola-Jones algorithm, the block size is an important factor that improves the execution time, but this optimization aspect is not well explored. This paper aims to offer the first steps toward automatically tuning the block size for any input without having a deep knowledge of the hardware architecture, which ensures the automatic portability of the performance over different GPU architectures. The main idea is to define techniques on how to get the optimum block size to achieve the best performance. We pointed out the impact of using static block size for all input sizes on the overall performance. In light of the findings, we presented two dynamic approaches to select the best block size suitable to the input size. The first one is based on an empirical search; this approach provides the optimal performance; however, it is tough for the programmer, and its deployment is time-consuming. In order to overcome this issue, we proposed a second approach, which is a model that automatically selects a block size. Experimental results show that this model can improve the execution time by up to 2.5x over the static approach.

Keywords: GPU Computing parallel programming Program Optimization Auto-tuning and Face Detection
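
The empirical-search idea in the abstract above (try several block sizes for a given input size and keep the best) can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' method: `pick_block_size`, `grid_size`, the candidate list, and especially `toy_cost` are hypothetical names introduced here. In practice the cost would be a measured kernel execution time rather than the toy formula used below.

```python
def grid_size(n_items, block_size):
    """Number of blocks needed so that every item is covered by a thread."""
    return (n_items + block_size - 1) // block_size


def toy_cost(n_items, block_size):
    """Hypothetical stand-in for a measured kernel time (an assumption).

    It penalizes idle threads in the last block plus a fixed overhead
    of 8 cost units per launched block.
    """
    blocks = grid_size(n_items, block_size)
    idle_threads = blocks * block_size - n_items
    return idle_threads + 8 * blocks


def pick_block_size(n_items, cost, candidates=(32, 64, 128, 256, 512, 1024)):
    """Empirical search: evaluate every candidate and keep the cheapest."""
    return min(candidates, key=lambda b: cost(n_items, b))


if __name__ == "__main__":
    for n in (100, 1000):
        print(n, pick_block_size(n, toy_cost))
```

Even with this toy cost, the selected block size changes with the input size, which is the effect a static block size cannot capture.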

SwarmL: A Language for Programming Fully Distributed Intelligent Building Systems

BUILDINGS, 2023, vol. 13, no. 2, p. 499

Authors: Chen, Wenjie Yang, Qiliang Jiang, Ziyan Xing, Jianchun Zhao, Shuo Zhou, Qizhen Han, Deshuai Feng, Bowei Army Engn Univ PLA Coll Def Engn Nanjing 211101 Peoples R China Tsinghua Univ Bldg Energy Res Ctr Beijing 100084 Peoples R China China Xian Satellite Control Ctr Xian 710043 Peoples R China Rocket Force Univ Engn Coll Combat Support Xian 710025 Peoples R China

Fully distributed intelligent building systems can be used to effectively reduce the complexity of building automation systems and improve the efficiency of the operation and maintenance management because of their self-organization, flexibility, and robustness. However, the parallel computing mode, dynamic network topology, and complex node interaction logic make application development complex, time-consuming, and challenging. To address the development difficulties of fully distributed intelligent building system applications, this paper proposes a user-friendly programming language called SwarmL. Concretely, SwarmL (1) establishes a language model, an overall framework, and an abstract syntax that intuitively describes the static physical objects and dynamic execution mechanisms of a fully distributed intelligent building system, (2) proposes a physical field-oriented variable that adapts the programming model to the distributed architectures by employing a serial programming style in accordance with human thinking to program parallel applications of fully distributed intelligent building systems for reducing programming difficulty, (3) designs a computational scope-based communication mechanism that separates the computational logic from the node interaction logic, thus adapting to dynamically changing network topologies and supporting the generalized development of the fully distributed intelligent building system applications, and (4) implements an integrated development tool that supports program editing and object code generation. To validate SwarmL, an example application of a real scenario and a subject-based experiment are explored. The results demonstrate that SwarmL can effectively reduce the programming difficulty and improve the development efficiency of fully distributed intelligent building system applications. SwarmL enables building users to quickly understand and master the development methods of application tasks in fully distributed intelligent

Keywords: swarm intelligence fully distributed intelligent building system parallel programming domain-specific language

Towards Safer Parallel STL Usage

16th IEEE International Scientific Conference on Informatics, Informatics 2022

Authors: Barth, Benjamin Szalay, Richard Porkolab, Zoltan Eötvös Loránd University Faculty of Informatics Budapest Hungary Eötvös Loránd University Department of Programming Languages and Compilers Budapest Hungary

ISBN: (print) 9798350310344

Effective and safe parallel programming is among the biggest challenges of today's software technology. The C++17 standard introduced parallel STL: a set of overloaded functions taking an additional 'execution policy' parameter in the Algorithms chapter of the Standard library. During the years since its introduction, a few shortcomings of parallel STL have been revealed. While the Standard defines the semantics of the individual algorithms, adherence to their abstract requirements (e.g., absolutely no data races or deadlocks during the evaluation of a predicate or other customisation point) is up to the developer. Experience shows that programmers frequently make mistakes and write erroneous code, which is hard to debug. In this paper, we investigate some of the critical issues of the parallel STL library and suggest improvements to increase its safety. While fully automatic detection of erroneous constructs is computationally infeasible, we introduce a framework with which the user will be able to indicate (axiomatically, based on absolute trust) that an operation has 'safe' properties, e.g., commutativity of certain functors. We implemented a prototype of the proposed framework to demonstrate its usability and effectiveness. © 2022 IEEE.

Keywords: parallel programming
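
Why algebraic properties of a functor matter for parallel algorithms can be seen in a small language-neutral sketch. It is written in Python rather than C++ and does not use the parallel STL or the framework proposed in the paper. `chunked_reduce` is a hypothetical helper that only mimics how a parallel reduction may regroup operands: each chunk is reduced on its own and the partial results are then combined.

```python
from functools import reduce
import operator


def chunked_reduce(op, values, n_chunks):
    """Reduce each chunk independently, then combine the partial results.

    This regrouping is harmless for associative operations such as
    addition, but it changes the answer for operations such as subtraction.
    """
    size = (len(values) + n_chunks - 1) // n_chunks
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    partials = [reduce(op, chunk) for chunk in chunks]
    return reduce(op, partials)


data = [10, 3, 2, 1]

seq_add = reduce(operator.add, data)             # ((10 + 3) + 2) + 1
par_add = chunked_reduce(operator.add, data, 2)  # (10 + 3) + (2 + 1)

seq_sub = reduce(operator.sub, data)             # ((10 - 3) - 2) - 1
par_sub = chunked_reduce(operator.sub, data, 2)  # (10 - 3) - (2 - 1)

if __name__ == "__main__":
    print(seq_add, par_add)  # identical results
    print(seq_sub, par_sub)  # different results
```

Addition gives the same result either way, while subtraction does not. A property of this kind cannot in general be checked automatically, which is why the abstract has the user assert it.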

Flexible task-DAG management in PHAST library: Data-parallel tasks and orchestration support for heterogeneous systems

Authors: Peccerillo, Biagio Bartolini, Sandro Department of Information Engineering and Mathematical Sciences University of Siena Siena Italy

Heterogeneous architectures have proved successful in achieving unprecedented performance and energy efficiency. However, taking advantage of these diverse processing elements is still hard. Programmers need to code through the different approaches suitable for each target architecture and need to decide the distribution of activities on the different resources. The majority of current frameworks focus on either performance or productivity. The former mainly provide low-level target-specific programming interfaces, and the latter offer high-level tools that often fail to achieve high performance. In both cases, the design is usually data-parallel, as task-parallelism is not supported. In this work, we propose a task-based solution within the data-parallel heterogeneous single-source PHAST library. Tasks can be coded in a target-agnostic fashion, can be compiled and parallelized on multi-core CPUs and NVIDIA GPUs automatically, and support the choice of the execution platform at runtime. We evaluate the capabilities of the proposed task-directed acyclic graph support on an extensive set of randomly generated task-based applications with different sizes and characteristics. We compare it against a SYCL implementation in terms of performance and complexity metrics, highlighting that PHAST achieves about 1.56× and 2.60× speedup over SYCL for multi-core CPU and GPU, respectively, while also improving code complexity metrics. © 2020 John Wiley & Sons, Ltd.

Keywords: parallel programming

Improved Efficiency Adaptive Arithmetic Coding for GPU with Lower Shared Memory Usage

4th International Conference on Frontiers Technology of Information and Computer, ICFTIC 2022

Authors: Yan, Songsong Tong, Weiqin School of Computer Engineering and Science Shanghai University Shanghai China

ISBN: (print) 9798350321951

Arithmetic coding (AC) is widely used for lossless data compression, and parallelization of arithmetic coding is relatively simple because all symbols can be encoded independently. On the other hand, parallel adaptive arithmetic coding stores the data model for each data block (the model contains the data structure used to calculate the probability distribution of the data), and these data models need to be frequently accessed and modified during encoding and decoding, so the access latency and computational complexity of the data model are important factors affecting performance. In this paper, we present a new data structure called the partial prefix sum array, which can quickly calculate the probability distribution of the data, so that parallel adaptive arithmetic encoding and decoding can be accelerated. Furthermore, we use coalesced access to global memory, thereby improving the throughput of global memory. Experimental results for 6 files on an NVIDIA Tesla M60 GPU show that our GPU adaptive arithmetic encoding and decoding run 1.61x-2.75x and 1.03x-2.22x faster than the previously presented GPU implementation, respectively. © 2022 IEEE.

Keywords: parallel programming
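
An adaptive model of the kind described above has to answer cumulative-frequency queries and absorb count updates after every symbol. The abstract does not describe the partial prefix sum array in detail, so the sketch below swaps in a different, standard structure, a Fenwick (binary indexed) tree, purely to illustrate the query/update pattern. The class and the `interval` helper are hypothetical names, not taken from the paper.

```python
class FenwickTree:
    """Adaptive frequency model with O(log n) update and prefix-sum query."""

    def __init__(self, n_symbols):
        self.n = n_symbols
        self.tree = [0] * (n_symbols + 1)  # 1-based internal indexing

    def add(self, symbol, delta=1):
        """Increase the count of `symbol` (0-based) by `delta`."""
        i = symbol + 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def cumulative(self, symbol):
        """Total count of all symbols with index strictly below `symbol`."""
        total = 0
        i = symbol
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total


def interval(model, symbol):
    """Return (low, high, total): the coding interval of `symbol` in counts."""
    low = model.cumulative(symbol)
    high = model.cumulative(symbol + 1)
    total = model.cumulative(model.n)
    return low, high, total


if __name__ == "__main__":
    model = FenwickTree(4)
    for s in range(4):   # start every symbol with a count of 1
        model.add(s)
    model.add(2)         # the model adapts as symbol 2 is seen twice
    model.add(2)
    print(interval(model, 2))
```

An arithmetic coder would narrow its current range to the fraction `[low/total, high/total)` for each symbol and then call `add` to adapt the model.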

Efficient Transform Algorithms for Parallel Ultra-Low-Power IoT End Nodes

IEEE EMBEDDED SYSTEMS LETTERS, 2021, vol. 13, no. 4, pp. 210-213

Authors: Mazzoni, Benedetta Benatti, Simone Benini, Luca Tagliavini, Giuseppe Univ Bologna Dept Elect Elect & Informat Engn I-40126 Bologna Italy Swiss Fed Inst Technol Dept Informat Technol & Elect Engn CH-8092 Zurich Switzerland Univ Bologna Dept Comp Sci & Engn I-40126 Bologna Italy

Modern Internet of Things (IoT) end nodes must support computationally intensive workloads at a limited power budget. Parallel ultra-low-power (PULP) architectures are a promising target for this scenario, and the availability of highly optimized software libraries is crucial to exploit parallelism and reduce software development costs. This letter proposes an efficient parallel design of the widely used short-time Fourier transform (STFT) and discrete wavelet transform (DWT) targeting ultra-low-power IoT devices. We address key performance challenges related to fine-grained synchronization and banking conflicts in shared memory. We achieve high throughput (50.95 samples/µs, on average), good parallel speedup (up to 6.79x), and high energy efficiency (up to 172.55 GOp/s/W) on a cluster of eight RISC-V cores optimized for PULP operation.

Keywords: Discrete wavelet transform (DWT) Internet of Things (IoT) parallel programming short-time Fourier transform (STFT)

EBMAN-HP: A parallel model for simulation of sensor-based ebb-and-flow subirrigation systems

AGRICULTURAL WATER MANAGEMENT, 2023, vol. 275

Authors: Naghedifar, Seyed Mohammadreza Ziaei, Ali Naghi Ferdowsi Univ Mashhad FUM Coll Agr Dept Water Sci & Engn Mashhad *** Iran

The ebb-and-flow irrigation system is an efficient closed-loop subirrigation system. In this study, a numerical model (EBMAN-HP) is presented for simulation of all components (variations of water depth in the supply tank and concrete floor/tank) and all phases of flood-floor/bench ebb-and-flow subirrigation systems. The model benefits from a fine-tuned computational algorithm for the hysteresis module. The model can simulate both time-specified and sensor-based irrigation scheduling. Since an ebb-and-flow irrigation system incorporates numerous pots, Richards' equation should be solved for several pots to obtain a sufficient understanding of the whole system. Therefore, the proposed model uses OpenMP parallel programming to reduce the execution time. In addition, a novel parallel TDMA solver is presented that accelerates the computation by breaking a large system of equations into several simultaneously solved portions. The model has been validated and verified against several analytical, numerical and experimental test cases. The results showed that the hysteresis module can completely remove artificial pumping error in two critical test cases. The parallel TDMA solver was shown to reach a speedup of about 90 %. The model was shown to perform faster than Hydrus-1D even in serial mode for coarser grids (about 52 % faster on average over 8 test cases) and similarly to Hydrus-1D for dense grids (about 6 % faster on average over 4 test cases), with perfect agreement (NSE between 0.999 and 1.000 and an average difference in MBE of less than 0.1 % for 12 cases). The parallel model could boost the model's performance to about 500 % using 6 processors. Finally, a comprehensive illustrative example is shown to present almost all capabilities of the model.

Keywords: Richards' equation parallel programming Hysteresis Irrigation
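
For context on the TDMA solver mentioned in the abstract above, here is a minimal sketch of the classic serial tridiagonal matrix algorithm (Thomas algorithm). It is only the textbook baseline and does not reproduce the paper's parallel decomposition into simultaneously solved portions. The function name and argument layout are assumptions made for this illustration. No pivoting is performed, so a well-conditioned (for example diagonally dominant) system is assumed.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the serial Thomas algorithm (TDMA).

    Row i reads: lower[i] * x[i-1] + diag[i] * x[i] + upper[i] * x[i+1] = rhs[i].
    lower[0] and upper[-1] are ignored. No pivoting is performed.
    """
    n = len(diag)
    c = [0.0] * n  # modified upper-diagonal coefficients
    d = [0.0] * n  # modified right-hand side

    # Forward sweep: eliminate the lower diagonal.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


if __name__ == "__main__":
    # Small test system whose exact solution is x = [1, 1, 1].
    print(thomas_solve([0.0, -1.0, -1.0], [2.0, 2.0, 2.0],
                       [-1.0, -1.0, 0.0], [1.0, 0.0, 1.0]))
```

Both sweeps carry a dependency from one row to the next, which is why parallel variants have to restructure the problem, for instance by splitting the system into portions as the abstract describes.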
