Search Results - Inner Mongolia University Library

25th IEEE International Parallel and Distributed Processing Symposium, Workshops and PhD Forum, IPDPSW 2011

Author: Zhang, Yongpeng (North Carolina State University, United States)

ISBN: (print) 9780769543857

Emerging accelerator architectures, such as GPUs, have proved successful in providing significant performance gains to various application domains. This is done by exploiting data parallelism in existing algorithms. However, programming in a data-parallel fashion imposes extra burdens on programmers, who are used to writing sequential programs. New programming models and frameworks are needed to reach a balance between programmability, portability and performance. We start from the stream processing domain and propose GStream, a general-purpose, scalable data streaming framework on GPUs. The contributions of GStream are as follows: (1) We provide powerful yet concise language abstractions suitable for describing conventional algorithms as streaming problems. (2) We project these abstractions onto GPUs to fully exploit their inherent massive data parallelism. (3) We demonstrate the viability of streaming on accelerators. Experiments show that the proposed framework provides flexibility, programmability and performance gains for various benchmarks from a collection of domains, including but not limited to data streaming, data-parallel problems, numerical codes and text search. This work lays a foundation for our future work on developing more general data-parallel programming models for many-core architectures. © 2011 IEEE.

Keywords: Benchmarking

A distributed dynamic parallel algorithm for SIFT feature extraction

International Symposium on Parallel Architectures, Algorithms, and Programming

Authors: Jiang, Guiyuan; Zhang, Guiyuan; Zhang, Dakun (School of Computer Science and Software Engineering, Tianjin Polytechnic University, Tianjin 300000, China)

ISBN: (print) 9780769543123

This paper deals with developing efficient algorithms for accelerating SIFT (Scale Invariant Feature Transform) feature extraction in a distributed environment. The proposed distributed dynamic parallel algorithm (DDP-SIFT) uses a special data-parallel approach that divides the Gauss Scale Space by octave, with the aim of obtaining large image blocks, which is of great importance in some applications. To make this approach effective, the steps for building the Gauss Scale Space are changed: only the prerequisite part, which is just 1/13 of the whole pyramid, is produced before task allocation, so the Allocation Data Quantity (ADQ) is reduced by a factor of 13. Data blocks are assembled into tasks maintained in task lists and dynamically allocated to Computing Nodes. A refined-blocking approach is proposed to further improve load balance. Our investigations show that the proposed algorithm achieves remarkable performance in accelerating SIFT feature extraction while pursuing large data blocks. © 2010 IEEE.

Keywords: parallel algorithms

Parallel efficiency and parametric optimization in CASTEP

2011 4th International Symposium on Parallel Architectures, Algorithms and Programming, PAAP 2011

Authors: Chen, Jun; Fu, Liangjie; Yang, Huaming (High Performance Computing Center, Central South University, Changsha 410083, China; School of Resources Processing and Bioengineering, Central South University, Changsha 410083, China)

ISBN: (print) 9780769545752

Parallel efficiency has always been a fundamental research topic in high-performance computing. This paper focuses on parallel computing on a high-performance computing cluster with the CASTEP program, discusses multi-core parallel efficiency in CASTEP, and analyses, in a case study, the influence of the main calculation parameters on total CPU time and memory usage, such as the number of CPU cores (CPUs), cutoff energy, k-points, and supercell size. The paper also discusses in detail how to make better use of limited computing resources under special circumstances. © 2011 IEEE.

Keywords: Program processors

Program transformations and skeletons: Formal derivation of parallel programs

1st Aizu International Symposium on Parallel Algorithms/Architecture Synthesis, AISPAS 1995

Author: Geerling, A. Max (Computing Science Institute, University of Nijmegen, Toernooiveld 1, Nijmegen NL-6525 ED, Netherlands)

ISBN: (print) 081867038X

The paper describes, from a software engineering perspective, a framework for the formal development of parallel algorithms on arbitrary architectures. The algorithms are synthesised in a transformational way, i.e. by applying correctness-preserving rewrite rules to a formal specification. The architectures are modelled by skeletons: higher-order functions that represent elementary computations on a certain architecture. It is shown that the combination of transformational programming and skeletons stimulates the reuse of program derivations. Furthermore, inter-skeleton transformations will provide the means for architecture-independent program development. © 1995 IEEE.

Keywords: Computer software reusability

An Entertaining Approach to Parallel Programming Education

32nd IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Buzek, Emanuel; Krulis, Martin (Charles Univ Prague, Fac Math & Phys, Parallel Architectures Applicat Algorithms Res Gr, Prague, Czech Republic)

ISBN: (print) 9781538655559

Despite the fact that multicore CPUs are present in virtually every personal computer or cell phone, and distributed systems in the form of cloud services are steadily penetrating various domains of our lives, only a minority of programmers and computer science graduates are able to effectively design and develop parallel and distributed applications. Serial thinking is natural to all humans and it is also encouraged by many computer science curricula. Even though leading educational institutions are attempting to rectify this trend by introducing parallel programming courses into their study programs, these courses are often intended for more experienced students in their fourth or fifth year, since mastering modern parallel technologies like OpenMP or CUDA requires a certain level of programming skill. It can be argued that parallel thinking should be taught much sooner, perhaps even before tertiary education. To this end, we have created an educational platform, Parapple, that aims to introduce parallelism and related problems like load balancing or synchronization to inexperienced programmers in an entertaining form. Our platform is web-based, so it can run in any modern browser on all operating systems without installation, and users are required to have only a very basic understanding of structured imperative programming.

Keywords: parallel programming education simple visual web

On a scheme for parallel sorting on heterogeneous clusters

FUTURE GENERATION COMPUTER SYSTEMS, 2002, Vol. 18, No. 3, pp. 353-372

Authors: Cérin, C; Gaudiot, JL (Univ So Calif, Los Angeles, CA 90089, USA; Univ Picardie Jules Verne, Laria, F-80000 Amiens, France)

We discuss parallel sorting algorithms and their implementations suitable for cluster architectures in order to optimize cluster resources. We focus on the time spent in computation and the load balancing properties when processors are running at different speeds, i.e. correlated by a multiplicative constant factor (our weak definition of a heterogeneous platform). One scheme is under study: parallel sorting by sampling (either the regular sampling technique introduced by Shi and Schaeffer [J. Parallel Distrib. Comput. 14 (4) (1992) 361] or the over-partitioning scheme introduced by Li and Sevcik [Parallel sorting by over-partitioning, in: Proceedings of the Sixth Annual Symposium on Parallel Algorithms and Architectures, ACM Press, New York, June 1994]). What is important in the paper is mainly the load balance factor and not necessarily the execution time. It is clear that improved load balance leads to improved execution time. The results presented in the paper demonstrate that load balancing for the case of computers with heterogeneous processing capacity is more challenging than for the homogeneous case. The survey, through the sorting case study, allows us to identify some algorithmic issues and software challenges to master on heterogeneous cluster platforms in order to better utilize them: data decomposition techniques, scheduling and load balancing methods. (C) 2002 Elsevier Science B.V. All rights reserved.

Keywords: performance evaluation and modeling of parallel integer sorting algorithms sorting by regular sampling and by over-partitioning data distribution load balancing strategies BSP programming

Thread-level parallel algorithm for sorting integer sequence on multi-core computers

2011 4th International Symposium on Parallel Architectures, Algorithms and Programming, PAAP 2011

Authors: Cheng, Zhong; Qi, Ke; Liu, Jun; Huang, Yi-Ran (School of Computer and Electronics and Information, Guangxi University, Nanning, Guangxi 530004, China; School of Information and Statistics, Guangxi University for Finance and Economics, Nanning, Guangxi 530004, China)

ISBN: (print) 9780769545752

Based on the characteristics of multi-core architectures and the binary storage property of integer sequences, this paper proposes an efficient thread-level parallel algorithm for sorting integer sequences on multi-core computers. The algorithm divides the input integer sequence into several data blocks in main memory and distributes these blocks into the shared L2 cache and private L1 caches respectively, implements dynamic load balancing among the processing cores, and utilizes data-level parallel SIMD instructions and a thread-binding technique to speed up the sorting procedure. Experimental results show that the algorithm obtains high speedup and good scalability, and that its execution efficiency is not affected by the data distribution of the input integer sequence. © 2011 IEEE.

Keywords: parallel algorithms

Decentralized in-order execution of a sequential task-based code for shared-memory architectures

36th IEEE International Parallel and Distributed Processing Symposium (IEEE IPDPS)

Authors: Castes, Charly; Agullo, Emmanuel; Aumage, Olivier; Saillard, Emmanuelle (Inria, LaBRI, Bordeaux, France; Ecole Polytech Fed Lausanne, Lausanne, Switzerland)

ISBN: (print) 9781665497473

The hardware complexity of modern machines makes the design of adequate programming models crucial for jointly ensuring performance, portability, and productivity in high-performance computing (HPC). Sequential task-based programming models paired with advanced runtime systems allow the programmer to write a sequential algorithm independently of the hardware architecture in a productive and portable manner, and let a third-party software layer, the runtime system, deal with the burden of scheduling a correct, parallel execution of that algorithm to ensure performance. Many HPC algorithms have successfully been implemented following this paradigm, as a testimony of its effectiveness. Developing algorithms that specifically require fine-grained tasks along this model is still considered prohibitive, however, due to per-task management overhead [1], forcing the programmer to resort to a less abstract, and hence more complex, "task+X" model. We thus investigate the possibility of offering a tailored execution model, trading dynamic mapping for efficiency by using a decentralized, conservative in-order execution of the task flow, while preserving the benefits of relying on the sequential task-based programming model. We propose a formal specification of the execution model as well as a prototype implementation, which we assess on a shared-memory multicore architecture with several synthetic workloads. The results show that, provided the programmer supplies a proper task mapping, the pressure on the runtime system is significantly reduced and the execution of fine-grained task flows is much more efficient.

Keywords: Runtime Computational modeling Software algorithms Computer architecture programming Hardware Software

Parallel algorithm of visualization of reservoir numerical simulation based on PEBI grids

2011 4th International Symposium on Parallel Architectures, Algorithms and Programming, PAAP 2011

Authors: Dong, Lanfang; Lu, Detang; Li, Meng (Vision Computing and Visualization Laboratory, School of Computer Science and Technology, University of Science and Technology of China, Hefei, China; Department of Modern Mechanics, University of Science and Technology of China, Hefei, China)

ISBN: (print) 9780769545752

The speed of calculating, tracking and filling the isolines has a direct impact on the performance of user interaction. In this paper, we begin with the serial visualization algorithm and implement a parallel version of it. First, we divide the Delaunay grids generated from the PEBI grids into several regions. The calculation and tracking of isolines and the calculation of saturation are carried out in each region separately. Then the tracking results of each region are integrated for the entire work area. Parallel examples using OpenMP on dual-core and quad-core computers are given at the end of this paper. The experimental results show that parallel processing can greatly reduce the time required for data processing in visualization. © 2011 IEEE.

Keywords: parallel algorithms

Programming with divide-and-conquer skeletons: A case study of FFT

JOURNAL OF SUPERCOMPUTING, 1998, Vol. 12, No. 1-2, pp. 85-97

Author: Gorlatch, S (Univ Passau, D-94030 Passau, Germany)

We demonstrate an approach to parallel programming based on skeletons: parameterized program schemas with efficient implementations over diverse architectures. The contribution of the paper is two-fold: (1) we classify divide-and-conquer (DC) algorithms and provide a family of provably correct parallel implementations for a particular DC skeleton, called DH (distributable homomorphism); (2) we adjust the mathematical specification of the Fast Fourier Transform (FFT) to the DH skeleton and thereby obtain a generic SPMD program, well suited for implementation under MPI. The generic program includes the efficient FFT solutions used in practice (the binary-exchange and the 2D- and 3D-transpose implementations) as special cases.

Keywords: parallel programming skeletons divide-and-conquer Bird-Meertens formalism (BMF) Fast Fourier Transform (FFT)
