ISBN:
(Print) 9781509036820
All many-core systems require fine-grained shared-memory parallelism; however, the most efficient way to extract such parallelism is far from trivial. Fine-grained parallel algorithms face various performance trade-offs related to tasking, accesses to global data structures, and use of shared cache. While programming models provide high-level abstractions, such as data and task parallelism, algorithmic choices remain open on how best to implement irregular algorithms, such as sparse factorizations, while taking these trade-offs into account. In this paper, we compare these performance trade-offs for task and data parallelism on different hardware architectures such as Intel Sandy Bridge, Intel Xeon Phi, and IBM Power8. We do this by comparing the scaling of a new task-parallel incomplete sparse Cholesky factorization called Tacho and a new data-parallel incomplete sparse LU factorization called Basker. Both solvers use the Kokkos programming model and were developed within the ShyLU package of Trilinos. Using these two codes, we demonstrate how high-level programming changes affect performance and overhead costs on multiple multi-/many-core systems. We find that Kokkos provides comparable performance with both parallel_for and task/futures on traditional x86 multicores. However, the choice of which high-level abstraction to use on many-core systems depends on both the architecture and the input matrices.
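As a minimal sketch of the data-parallel abstraction contrasted above (not code from Tacho or Basker; the view names and the axpy kernel body are invented for illustration), a Kokkos parallel_for expresses a flat loop once and lets the selected backend map it to the target architecture; the task/futures abstraction would instead spawn dependent work units through a Kokkos task scheduler:

```cpp
#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int N = 1 << 20;
    Kokkos::View<double*> x("x", N), y("y", N);

    // Data-parallel flat loop: one index range, one body, compiled for
    // whichever backend (OpenMP, CUDA, ...) Kokkos was configured with.
    Kokkos::parallel_for("axpy", N, KOKKOS_LAMBDA(const int i) {
      y(i) = 2.0 * x(i) + y(i);
    });
    Kokkos::fence();  // wait for the kernel to complete
  }
  Kokkos::finalize();
}
```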
This special issue on heterogeneous computing is a follow-on to two well-established workshops in the domain: HCW, the IEEE Heterogeneous Computing Workshop (held in Santa Fe in April 2004, in conjunction with IPDPS), and HeteroPar, the International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Networks (held in Cork in July 2004, in conjunction with ISPDC). Networks of computers are now the most commonly available parallel architecture. Unlike dedicated parallel computer systems, networks are inherently heterogeneous: they consist of diverse computers with different performance characteristics, interconnected via mixed network equipment providing communication links of different speeds and bandwidths. Traditional parallel algorithms and tools are aimed at homogeneous multiprocessors and cannot be used efficiently for parallel computing on heterogeneous networks. New ideas, dedicated algorithms, and tools are needed to use this new type of parallel architecture efficiently.
To deal with users' diverse query requirements on uncertain data, an uncertain-data query semantics for requirement extension, named RU-Topk, is introduced. In high-load application environments, the top-k que...
ISBN:
(Print) 9781467385176
Efficiently finding and computing statistics about "halos" (regions of high density) are essential analysis steps for N-body cosmology simulations. However, in state-of-the-art simulation codes, these analysis operators do not currently take advantage of the shared-memory data parallelism available on multi-core and many-core architectures. The Hardware/Hybrid Accelerated Cosmology Code (HACC) is designed as an MPI+X code, but its analysis operators are parallelized only among MPI ranks, because of the difficulty of porting different X implementations (e.g., OpenMP, CUDA) across all architectures on which it runs. In this paper, we present portable data-parallel algorithms for several variations of halo finding and halo center finding. These are implemented with the PISTON component of the VTK-m framework, which uses NVIDIA's Thrust library to construct data-parallel algorithms that allow a single implementation to be compiled to multiple backends targeting a variety of multi-core and many-core architectures. Finally, we compare the performance of our halo and center finding algorithms against the original HACC implementations on the Moonlight, Stampede, and Titan supercomputers. The portability of Thrust allowed the same code to run efficiently on each of these architectures. On Titan, the performance improvements from our code have enabled halo analysis on a very large data set (8192³ particles across 16,384 nodes of Titan) for which analysis using only the existing CPU algorithms was not feasible.
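To make the data-parallel style concrete, here is a toy sketch (not PISTON or HACC code; the halo ids and masses are invented) that computes per-halo total mass with two Thrust primitives, the kind of single-source implementation Thrust retargets across backends:

```cpp
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>
#include <vector>

int main() {
  // Toy inputs: a halo id per particle and a per-particle mass.
  std::vector<int>   h_id   = {2, 0, 1, 2, 0, 2};
  std::vector<float> h_mass = {1.f, 2.f, 3.f, 4.f, 5.f, 6.f};
  thrust::device_vector<int>   halo_id(h_id.begin(), h_id.end());
  thrust::device_vector<float> mass(h_mass.begin(), h_mass.end());

  // Group particles by halo, then sum masses per halo: two data-parallel
  // primitives that run on whichever backend Thrust is compiled for.
  thrust::sort_by_key(halo_id.begin(), halo_id.end(), mass.begin());
  thrust::device_vector<int>   out_id(halo_id.size());
  thrust::device_vector<float> out_mass(mass.size());
  auto ends = thrust::reduce_by_key(halo_id.begin(), halo_id.end(),
                                    mass.begin(),
                                    out_id.begin(), out_mass.begin());
  // out_id[0..n) now holds the halo ids and out_mass[0..n) their total
  // masses, where n = ends.first - out_id.begin().
  int n_halos = static_cast<int>(ends.first - out_id.begin());
  return n_halos > 0 ? 0 : 1;
}
```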
ISBN:
(Print) 9780769546766
The abundant availability of multi-core computers makes "parallel computers" commonplace and makes teaching Computer Science students to design and develop parallel algorithms an urgent task. Most students recognize the need to develop skills in parallel programming. However, since their Computer Science curricula are mostly taught in terms of sequential computers, introducing a new way of analyzing and solving problems can be difficult. Making students interested in the subject can have a pivotal effect on the learning outcomes. In this short paper, we show several approaches we have been using to excite our students about learning parallel processing in our Concurrent Systems class, where parallel processing and parallel programming are taught. These approaches include showing students algorithms with seemingly impossibly high performance, showing them simple steps to achieve 100% CPU utilization on multi-core computers, combining sequential algorithms they learned in the past to create new parallel algorithms, and challenging them to implement some rather complex parallel algorithms.
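As one hypothetical classroom demonstration of the "100% CPU utilization" step (not taken from the paper; the iteration count is arbitrary), spinning one busy thread per hardware core makes every core light up in a process monitor:

```cpp
#include <atomic>
#include <thread>
#include <vector>

int main() {
  // One busy worker per hardware core; while they run, a process
  // monitor shows every core near 100% utilization.
  unsigned n = std::thread::hardware_concurrency();
  if (n == 0) n = 4;  // fallback when the core count is unknown
  std::atomic<unsigned long long> total{0};
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < n; ++t)
    workers.emplace_back([&total] {
      for (unsigned long long i = 0; i < 200000000ULL; ++i)
        total.fetch_add(1, std::memory_order_relaxed);
    });
  for (auto& w : workers) w.join();
  return static_cast<int>(total.load() & 0xff);
}
```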
ISBN:
(Print) 0818682596
This paper presents the BaLinda model, based on last-in/first-out threads that interact via a shared tuplespace, and discusses the idea of using function-based objects as the basic unit of parallel execution and a hierarchical structure to partition tuplespaces. It is argued that this two-level parallel execution, both within and between objects, is well suited to scalable parallel platforms with shared-memory nodes connected by high-speed networks.
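For readers unfamiliar with tuplespace coordination, the following is a rough single-level sketch (not BaLinda's interface or its hierarchical tuplespaces; the class and method names are invented, after the classic Linda out/in operations):

```cpp
#include <condition_variable>
#include <deque>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>

// Minimal Linda-style tuplespace: out() deposits a tuple, in() blocks
// until one is available and removes it (tuples are plain strings here).
class TupleSpace {
  std::mutex m_;
  std::condition_variable cv_;
  std::deque<std::string> tuples_;
public:
  void out(std::string t) {
    { std::lock_guard<std::mutex> lk(m_); tuples_.push_back(std::move(t)); }
    cv_.notify_one();
  }
  std::string in() {
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [this] { return !tuples_.empty(); });
    std::string t = std::move(tuples_.front());
    tuples_.pop_front();
    return t;
  }
};

int main() {
  TupleSpace ts;
  std::thread producer([&ts] { ts.out("work-item"); });
  std::cout << ts.in() << "\n";  // blocks until the producer deposits
  producer.join();
}
```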
ISBN:
(Print) 9781595936677
Dynamic programming is an efficient technique for solving combinatorial search and optimization problems, and many parallel dynamic programming algorithms exist. The purpose of this paper is to study a family of dynamic programming algorithms in which data dependences appear between non-consecutive stages; in other words, the data dependence is non-uniform. This kind of dynamic programming is typically called nonserial polyadic dynamic programming. Owing to the non-uniform data dependence, it is harder to optimize this problem for parallelism and locality on parallel architectures. In this paper, we address the challenge of exploiting fine-grained parallelism and locality of nonserial polyadic dynamic programming on a multi-core architecture. We present a programming and execution model for multi-core architectures with a memory hierarchy. In the framework of the new model, both parallelism and locality benefit from a data dependence transformation. We propose a parallel pipelined algorithm for filling the dynamic programming matrix by decomposing the computation operators. The new parallel algorithm tolerates memory access latency by using multiple threads and is easily improved with this technique. We formulate and analytically solve the optimization problem of determining the size that minimizes the total execution time. Experiments on a simulator validate the proposed model and show that the fine-grained parallel algorithm achieves sub-linear speedup and potentially high scalability on multi-core architectures.
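The recurrence below is a standard nonserial polyadic example (matrix-chain style), parallelized with the textbook wavefront over anti-diagonals; it is a baseline illustration, not the paper's pipelined, latency-tolerant algorithm, and the weight function is a placeholder:

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Nonserial polyadic DP recurrence:
//   c[i][j] = min over i<k<j of (c[i][k] + c[k][j]) + w(i, j)
// The dependence of cell (i,j) on non-adjacent diagonals is what makes
// the dependence pattern non-uniform. Cells on one anti-diagonal are
// mutually independent, so each diagonal can be filled in parallel.
int main() {
  const int n = 512;
  std::vector<std::vector<long>> c(n, std::vector<long>(n, 0));
  auto w = [](int i, int j) { return static_cast<long>(j - i); }; // placeholder

  for (int d = 2; d < n; ++d) {        // diagonals, in dependence order
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n - d; ++i) {  // independent cells on diagonal d
      int j = i + d;
      long best = std::numeric_limits<long>::max();
      for (int k = i + 1; k < j; ++k)
        best = std::min(best, c[i][k] + c[k][j]);
      c[i][j] = best + w(i, j);
    }
  }
  return static_cast<int>(c[0][n - 1] & 0xff);  // keep the result live
}
```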
ISBN:
(Print) 0818675829
The rise of explicit parallel programming brings new problems: a lack of structure for parallel algorithms and their ad hoc development. We use skeletons to characterize and design parallel algorithms and define a process to refine the designs step by step into programs. This paper introduces a high-level library on top of MPI, derived from the skeleton concept, to achieve better programmability and portability. We conclude with a CFD application that demonstrates the idea.
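As an illustration of the skeleton idea (not the paper's library; the function name skel_map and the even-block-distribution assumption are invented), a "map" skeleton can hide the MPI scatter/compute/gather pattern behind a single call:

```cpp
#include <mpi.h>
#include <vector>

// Toy 'map' skeleton: scatter a block to each rank, apply a user
// function element-wise, gather results back on rank 0.
template <typename F>
std::vector<double> skel_map(const std::vector<double>& in, F f,
                             MPI_Comm comm) {
  int rank, size;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &size);
  int chunk = static_cast<int>(in.size()) / size;  // assumes size divides in.size()

  std::vector<double> local(chunk), out;
  MPI_Scatter(in.data(), chunk, MPI_DOUBLE,
              local.data(), chunk, MPI_DOUBLE, 0, comm);
  for (double& x : local) x = f(x);                // user-supplied 'map' body
  if (rank == 0) out.resize(in.size());
  MPI_Gather(local.data(), chunk, MPI_DOUBLE,
             out.data(), chunk, MPI_DOUBLE, 0, comm);
  return out;
}

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  std::vector<double> data(1024, 1.0);
  auto result = skel_map(data, [](double x) { return 2.0 * x; },
                         MPI_COMM_WORLD);
  MPI_Finalize();
}
```

The point of the skeleton is that the communication structure is written once and verified once; the application programmer supplies only the sequential body.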
FFT is a widely used algorithm whose parallelization is a very important topic. There has been a great deal of work in this field, and many parallel algorithms have been published over several decades. In this paper, an algorithm ...
ISBN:
(Print) 0769522106
There are two main approaches to designing parallel languages. The first holds that parallel computing demands new programming concepts and radical intellectual changes in the way we think about programming, as compared to sequential computing; the design of such a parallel language must therefore present new constructs and new programming methodologies. The second holds that there is no need to reinvent the wheel: serial languages can be extended to support parallelism. The motivation behind this approach is to keep the language as friendly as possible for the programmer, who is the main bridge toward wider acceptance of the new language. In this paper we present a qualitative evaluation of two contemporary parallel languages: OpenMP-C and Unified Parallel C (UPC). Both are explicit parallel programming languages based on the ANSI C standard. OpenMP-C was designed for shared-memory architectures and extends the base language with compiler directives that annotate the original source code. UPC, on the other hand, was designed for distributed shared-memory architectures and extends the base language with new parallel constructs. We deconstruct each parallel language into its basic components, show examples, make a detailed analysis, compare the two, and finally draw some conclusions.
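To show the stylistic difference concretely, here is a rough sketch: an OpenMP-annotated loop in the C/C++ base language, with the approximate UPC counterpart given in a comment (the array name and size are invented; consult the respective specifications for exact semantics):

```cpp
#include <cstdio>

// OpenMP style: an ordinary serial loop, made parallel by a directive;
// the base language itself is unchanged.
//
// The UPC counterpart would instead use a new language construct,
// roughly:
//   shared double a[N];
//   upc_forall (int i = 0; i < N; i++; &a[i]) a[i] = 2.0 * a[i];
// where the fourth clause assigns each iteration to the thread with
// affinity to a[i].
int main() {
  const int N = 1 << 20;
  static double a[N];  // zero-initialized static storage
  #pragma omp parallel for
  for (int i = 0; i < N; ++i)
    a[i] = 2.0 * a[i];
  std::printf("%f\n", a[0]);
}
```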