Component-oriented programming has been applied to large-scale applications from computational sciences and engineering that have high performance computing (HPC) requirements. However, parallelism continues to be a challenging requirement in the design of CBHPC (Component-Based High Performance Computing) platforms. This paper presents strong evidence for the efficacy and efficiency of HPE (Hash Programming Environment), a CBHPC platform that provides full support for parallel programming, in the development, deployment, and execution of numerical simulation code on cluster computing platforms.
ISBN (digital): 9781665488020
ISBN (print): 9781665488020
Programming parallel architectures from a hierarchical point of view is becoming today's standard, as machines are structured by multiple layers of memory. To handle such architectures, we focus on the MULTI-BSP bridging model. This model extends BSP and proposes a structured way of programming multi-level architectures. In this context of parallel programming, we now need to manage new concerns such as memory coherence, deadlocks, and safe data communication. To do so, we propose a type system for MULTI-ML, an ML-like programming language based on the MULTI-BSP model. This type system introduces data locality using type annotations and effects in order to detect incorrect uses of multi-level architectures. We thus ensure that "Well-typed programs cannot go wrong" on hierarchical architectures.
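The abstract does not show MULTI-ML syntax, but the core idea of rejecting cross-level data access at compile time via locality annotations can be sketched in C++ with a phantom template parameter standing in for the type-and-effect annotation. This is a loose analog, not the paper's type system; all names below are illustrative.

```cpp
// Hypothetical C++ analog of locality annotations: every value carries the
// memory level it lives at, so cross-level accesses fail to compile.
enum class Level { Node, Core };  // two levels of a MULTI-BSP hierarchy

template <Level L, typename T>
struct Local {
    T value;  // data pinned to memory level L
};

// A computation declared to run at core level may only touch core-local data.
template <typename T>
T core_step(const Local<Level::Core, T>& x) {
    return x.value + 1;
}

int main() {
    Local<Level::Core, int> c{41};
    Local<Level::Node, int> n{0};
    core_step(c);      // OK: locality annotations agree
    // core_step(n);   // compile error: node-local data used at core level
    return 0;
}
```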
ISBN (print): 9781665469586
The triangular Shepard interpolation method is an extension of the well-known bivariate Shepard method for interpolating large sets of scattered data. In particular, the classical point-based weight functions are replaced by basis functions built upon a triangulation of the scattered points. As shown in the literature, this method exhibits advantages over other interpolation methods for scattered bivariate data. Nevertheless, as the size of the data set increases, an efficient implementation of the method becomes more and more necessary. In this paper, we present a parallel implementation of the triangular Shepard interpolation method that, besides exploiting the benefits of parallelization itself, introduces a novel approach to the triangulation of the scattered data.
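As a concrete reference point, here is a serial sketch of evaluating a triangular Shepard interpolant at one query point, following the formulation common in the literature (Little's method): each triangle contributes its linear interpolant, weighted by a normalized product of inverse distances to its three vertices. The names and the exponent mu are assumptions for illustration, not code from the paper.

```cpp
#include <cmath>
#include <vector>

struct Point { double x, y, f; };   // scattered datum (x, y, f(x, y))
struct Triangle { int v[3]; };      // vertex indices into the point set

double interpolate(double qx, double qy,
                   const std::vector<Point>& pts,
                   const std::vector<Triangle>& tris,
                   double mu = 2.0) {
    double num = 0.0, den = 0.0;
    // Each triangle's contribution is independent, so a parallel version
    // can split this loop across threads with a reduction over num/den.
    // (Query points coinciding with data points need special handling,
    // omitted here.)
    for (const Triangle& t : tris) {
        double w = 1.0;
        for (int k = 0; k < 3; ++k) {
            const Point& p = pts[t.v[k]];
            double d2 = (qx - p.x) * (qx - p.x) + (qy - p.y) * (qy - p.y);
            w *= std::pow(d2, -mu / 2.0);   // inverse-distance factor
        }
        // Barycentric evaluation of the linear interpolant on triangle t.
        const Point &a = pts[t.v[0]], &b = pts[t.v[1]], &c = pts[t.v[2]];
        double det = (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
        double l1 = ((b.x - qx) * (c.y - qy) - (c.x - qx) * (b.y - qy)) / det;
        double l2 = ((c.x - qx) * (a.y - qy) - (a.x - qx) * (c.y - qy)) / det;
        double lin = l1 * a.f + l2 * b.f + (1.0 - l1 - l2) * c.f;
        num += w * lin;
        den += w;
    }
    return num / den;
}
```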
We introduce SpDISTAL, a compiler for sparse tensor algebra that targets distributed systems. SpDISTAL combines separate descriptions of tensor algebra expressions, sparse data structures, data distribution, and compu...
Frequent subgraph mining (FSM) is a subset of the graph mining domain that is extensively used for graph classification and clustering. Over the past decade, many efficient FSM algorithms have been developed, with improvements generally focused on reducing time complexity by changing the algorithm structure or using parallel programming techniques. FSM algorithms also have high memory consumption, which is another problem that should be solved. In this paper, we propose a new approach called Predictive Dynamic Sized Structure Packing (PDSSP) to minimize the memory needs of FSM algorithms. Our approach redesigns the internal data structures of FSM algorithms without making algorithmic modifications. PDSSP offers two contributions: the first is the Dynamic Sized Integer Type, a newly designed unsigned integer data type, and the second is a data structure packing technique that changes the behavior of the compiler. We examined the effectiveness and efficiency of the PDSSP approach by experimentally embedding it into two state-of-the-art algorithms, gSpan and ***, and compared our implementations to the performance of the originals. Nearly all results show that our proposed implementation consumes less memory at each support level, suggesting that PDSSP extensions could save memory, with peak memory usage decreasing by up to 38% depending on the dataset.
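The abstract does not disclose PDSSP's exact layouts, but the two mechanisms it names correspond to familiar C/C++ techniques: choosing the smallest integer width a field's value range needs, and packing structures so the compiler inserts no alignment padding. The record below is a hypothetical example of the kind of saving involved (sizes assume a typical LP64 platform).

```cpp
#include <cstdint>

// Naive edge record in a subgraph-mining code: 24 bytes after padding.
struct EdgeNaive {
    int      from;      // 4 bytes
    int      to;        // 4 bytes
    long     label;     // 8 bytes on LP64
    bool     forward;   // 1 byte + 7 bytes of alignment padding
};

// Packed variant: widths shrunk to the ranges actually needed (vertex ids
// below 2^16, labels below 2^8 -- purely an example), padding suppressed.
#pragma pack(push, 1)
struct EdgePacked {
    uint16_t from;      // 2 bytes
    uint16_t to;        // 2 bytes
    uint8_t  label;     // 1 byte
    uint8_t  forward;   // 1 byte
};                      // 6 bytes total instead of 24
#pragma pack(pop)

static_assert(sizeof(EdgePacked) == 6, "no padding inserted");
```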
ISBN (print): 9781665497473
The presentation of Peachy Parallel Assignments in several workshops on parallel and distributed computing education aims to promote the reuse of high-quality assignments, both saving precious faculty time and improving the quality of course assignments. Presented assignments are selected competitively: they must have been successfully used in a real classroom, be easy for other instructors to adopt, and be "cool and inspirational" to encourage students to spend time on them and talk about them with others. Winning assignments are also archived on the Peachy Parallel Assignments website. In this installment of Peachy Parallel Assignments, we present three new assignments. The first assignment is to simulate an Abelian sandpile, with grains of sand moving from tall piles to shorter ones; this is a discrete simulation that creates colorful and intricate images. The second assignment is a Big Data problem in which students use the MapReduce paradigm to recreate "Warming Stripes", a visualization of climate data that highlights climate change. The third assignment introduces climate-oriented optimization by asking students to schedule distributed workflows to minimize their carbon footprint.
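For the first assignment, the core update is a toppling sweep; here is a minimal serial sketch. The 4-grain threshold and double-buffered grid are the standard formulation of the Abelian sandpile; the assignment's exact specification may differ. Because each cell reads only the previous grid, the sweep parallelizes directly across cells.

```cpp
#include <utility>
#include <vector>

// One sweep of the Abelian sandpile: a cell holding 4 or more grains
// topples, sending one grain to each of its four neighbors (grains that
// fall off the boundary are lost). Returns whether anything changed.
bool topple_sweep(std::vector<std::vector<int>>& grid) {
    const int n = static_cast<int>(grid.size());
    bool changed = false;
    std::vector<std::vector<int>> next = grid;  // double buffer
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (grid[i][j] >= 4) {              // unstable pile
                next[i][j] -= 4;
                if (i > 0)     next[i - 1][j] += 1;
                if (i < n - 1) next[i + 1][j] += 1;
                if (j > 0)     next[i][j - 1] += 1;
                if (j < n - 1) next[i][j + 1] += 1;
                changed = true;
            }
        }
    }
    grid = std::move(next);
    return changed;  // repeat sweeps until the configuration is stable
}
```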
There has been rapid growth in the field of graphical processing unit (GPU) programming due to the drastic increase in the computing hardware manufacturing. The technology used in these devices is now more affordable ...
ISBN (print): 9781450393393
The wide adoption of SYCL as an open-standard API for accelerating C++ software in domains such as HPC, automotive, artificial intelligence, machine learning, and other areas necessitates efficient compiler and runtime support for a growing number of different platforms. Existing SYCL implementations provide support for various devices like CPUs, GPUs, DSPs, FPGAs, etc., typically via OpenCL or CUDA backends. While accelerators have increased the performance of user applications significantly, employing CPU devices for further performance improvement is beneficial due to the significant presence of CPUs in existing datacenters. SYCL applications on CPUs currently go through an OpenCL backend. Though an OpenCL backend is valuable for supporting accelerators, it may introduce additional overhead for CPUs, since the host and device are the same. Overheads such as runtime compilation of the kernel, transfer of input/output memory to/from the OpenCL device, and invocation of the OpenCL kernel may not be necessary when running on the CPU. While some of these overheads (such as data transfer) can be avoided by modifying the application, doing so can undermine the SYCL application's ability to achieve performance portability on other devices. In this paper, we propose an alternative approach to running SYCL applications on CPUs. We bypass OpenCL and use a CPU-directed compilation flow, along with the integration of Whole Function Vectorization, to generate optimized host and device code together in the same translation unit. We compare the performance of our approach, the CPU-directed compilation flow, against an OpenCL backend for existing SYCL-based applications, with no code modification. We run experiments across various CPU architectures to attest to the efficacy of our proposed approach.
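For context, here is a minimal SYCL 2020 kernel of the kind at issue, written against the standard API and explicitly selecting the CPU device. This is a generic vector-add, not code from the paper; with the proposed CPU-directed flow, the same source would compile to host and device code in one translation unit rather than round-tripping through an OpenCL backend.

```cpp
#include <sycl/sycl.hpp>
#include <vector>

int main() {
    constexpr size_t n = 1024;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    sycl::queue q{sycl::cpu_selector_v};  // target the CPU device
    {
        // Buffers wrap host memory; results copy back on buffer destruction.
        sycl::buffer<float> ba{a.data(), sycl::range<1>{n}};
        sycl::buffer<float> bb{b.data(), sycl::range<1>{n}};
        sycl::buffer<float> bc{c.data(), sycl::range<1>{n}};
        q.submit([&](sycl::handler& h) {
            sycl::accessor xa{ba, h, sycl::read_only};
            sycl::accessor xb{bb, h, sycl::read_only};
            sycl::accessor xc{bc, h, sycl::write_only};
            h.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> i) {
                xc[i] = xa[i] + xb[i];  // one work-item per element
            });
        });
    }  // buffer destructors synchronize and write results back into c
    return c[0] == 3.0f ? 0 : 1;
}
```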
ISBN (print): 9781665409346
While parallel hardware has become common, most typical engineers and scientists tend to follow the traditional single-core processing approach, causing major drawbacks in their developments. In this study, to fully utilize the available computing power, we present practical parallelization approaches for typical engineers. We implement a BRDF estimation algorithm, pursuing parallelism at various levels using CUDA. Experiments with a set of real environmental data show that even a simple parallelization can drastically improve performance: the speedup ranges from 4.93 to 64.10, depending on the parallelization approach and the problem size. A little effort in parallel programming can deliver efficient computing.
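The paper's implementation uses CUDA; as a portable stand-in, the sketch below expresses the same embarrassingly parallel pattern (one independent estimate per sample) with C++17 parallel algorithms. In a CUDA version, the same lambda body would map onto one GPU thread per sample. Sample and estimate_brdf are hypothetical names, not the paper's API.

```cpp
#include <algorithm>
#include <execution>
#include <vector>

struct Sample { float in[3], out[3], radiance; };  // hypothetical record

// Placeholder for the real per-sample estimation work.
float estimate_brdf(const Sample& s) { return 0.5f * s.radiance; }

std::vector<float> estimate_all(const std::vector<Sample>& samples) {
    std::vector<float> result(samples.size());
    // Each sample is independent, so the transform can use all cores.
    std::transform(std::execution::par_unseq,
                   samples.begin(), samples.end(), result.begin(),
                   [](const Sample& s) { return estimate_brdf(s); });
    return result;
}
```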
The graphics processing unit (GPU) is an ideal solution to problems involving parallel data computations. A serial CPU-based program for the dynamic analysis of multi-body systems is rebuilt as a parallel program that exploits the GPU's advantages. We developed an analysis code named GMAP to investigate how the dynamic analysis algorithm for multi-body systems can be implemented with GPU parallel programming. The numerical accuracy of GMAP is compared with the commercial program MSC/ADAMS, and its numerical efficiency is compared with the sequential CPU-based program. Multiple pendulums with bodies and joints, and a net-shaped system with bodies and spring-dampers, are employed for the computer simulations. The simulation results indicate that the accuracy of GMAP's solution is the same as that of ADAMS. For the net-type system with 2370 spring-dampers, GMAP reduces computation time by about 566.7 seconds (a 24.7% improvement). Notably, the larger the system, the better the time efficiency.
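The abstract does not detail GMAP's kernels, but one step of such a simulation that maps naturally onto a GPU is evaluating every spring-damper force independently. The sketch below shows that pattern with a C++ parallel algorithm on a 1-D toy force law; SpringDamper and its fields are hypothetical, not GMAP's data structures.

```cpp
#include <algorithm>
#include <execution>
#include <vector>

struct SpringDamper {
    int i, j;           // indices of the two connected bodies
    float k, c, rest;   // stiffness, damping coefficient, rest length
};

// Per-spring force on a 1-D toy model:
//   f = -k * (x_i - x_j - rest) - c * (v_i - v_j)
// Writing one value per spring keeps the parallel pass free of data races;
// a second (gather) pass then sums the forces acting on each body.
std::vector<float> spring_forces(const std::vector<SpringDamper>& springs,
                                 const std::vector<float>& x,
                                 const std::vector<float>& v) {
    std::vector<float> f(springs.size());
    std::transform(std::execution::par, springs.begin(), springs.end(),
                   f.begin(), [&](const SpringDamper& s) {
                       return -s.k * (x[s.i] - x[s.j] - s.rest)
                              - s.c * (v[s.i] - v[s.j]);
                   });
    return f;
}
```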