ISBN (print): 9798350387117; 9798350387124
Future exascale systems will feature massive parallelism, many-core processors, and heterogeneous architectures. In this scenario, it is increasingly difficult for HPC applications to fully and efficiently utilize the resources in system nodes. Moreover, the increased parallelism exacerbates the effects of existing inefficiencies in current applications. Research has shown that co-scheduling applications to share system nodes, instead of executing each application exclusively, can increase resource utilization and efficiency. Nevertheless, current oversubscription and co-location techniques for sharing nodes have several drawbacks that limit their applicability and make them highly application-dependent. This paper presents co-execution through system-wide scheduling: a novel fine-grained technique to execute multiple HPC applications simultaneously on the same node, outperforming current state-of-the-art approaches. We implement this technique in nOS-V, a lightweight tasking library that supports co-execution through system-wide task scheduling. Moreover, nOS-V can be easily integrated with existing programming models, requiring no changes to user applications. We show how co-execution with nOS-V significantly reduces schedule makespan for several applications in different scenarios, outperforming prior node-sharing techniques.
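The fine-grained node sharing this abstract describes can be illustrated with a toy scheduler. The sketch below is not nOS-V's actual API: it assumes each application is just a list of task durations, and it places every ready task, regardless of which application owns it, on the earliest-free core of a shared node instead of giving each application a static core partition.

```python
import heapq

def co_schedule(apps, num_cores):
    """Greedy system-wide scheduler sketch: tasks from all applications
    feed one shared pool, and each task runs on the core that becomes
    free earliest (fine-grained node sharing)."""
    cores = [0.0] * num_cores          # time at which each core frees up
    heapq.heapify(cores)
    finish = {name: 0.0 for name, _ in apps}
    queues = [(name, list(tasks)) for name, tasks in apps]
    while any(q for _, q in queues):
        # Round-robin over applications so neither one starves.
        for name, q in queues:
            if not q:
                continue
            duration = q.pop(0)
            start = heapq.heappop(cores)   # earliest-free core
            end = start + duration
            heapq.heappush(cores, end)
            finish[name] = max(finish[name], end)
    return finish

# Two hypothetical applications with uneven task durations on a 4-core node.
apps = [("A", [4.0, 4.0, 1.0, 1.0]), ("B", [2.0, 2.0, 2.0, 2.0])]
print(co_schedule(apps, num_cores=4))
```

Because cores are handed to whichever task is ready next, short tasks from one application fill the gaps left by long tasks from the other, which is the intuition behind co-execution improving makespan over exclusive execution.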
Current programming models face challenges in dealing with modern supercomputers' growing parallelism and heterogeneity. Emerging programming models, like the task-based programming model found in the asynchronous...
Programming parallel applications for heterogeneous High Performance Computing platforms is easier when using the task-based programming paradigm, where a Directed Acyclic Graph (DAG) of tasks models the application behavior. The simplicity exists because a runtime, like StarPU, takes care of many activities usually carried out by the application developer, such as task scheduling, load balancing, and memory management. Memory management here refers to the runtime's responsibility for handling memory operations, like copying the necessary data to the location where a given task is scheduled to execute. Poor scheduling or a lack of appropriate information may lead to inadequate memory management by the runtime. Discovering whether an application suffers from memory-related performance problems is complex. Programmers of task-based applications and runtimes would benefit from specialized performance analysis methodologies that check for possible memory management problems. To that end, this work proposes methods and tools to investigate the heterogeneous CPU-GPU-disk memory management of the StarPU runtime, a popular task-based middleware for HPC applications. These methods are based on execution traces collected by the runtime. Such traces provide information about runtime decisions and system performance; however, even a simple application can produce huge amounts of trace data that must be analyzed and converted into useful metrics or visualizations. A methodology specific to task-based applications can lead to a better understanding of memory behavior and possible performance optimizations. The proposed strategies are applied to three different problems: a dense Cholesky solver, a CFD simulation, and a sparse QR factorization. On the dense Cholesky solver, the strategies found a problem in StarPU whose correction led to a 66% performance improvement. On the CFD simulation, the strategies guided the insertion of extra information on the DAG and d...
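As a rough illustration of the kind of trace-based memory analysis this abstract describes, the sketch below aggregates copy events into per-link busy time and achieved bandwidth, so links dominated by data movement stand out. The record format is invented for the example; StarPU's real traces (FXT/Paje) are far richer.

```python
# Hypothetical trace records: (event, link-or-device, start_s, end_s, bytes).
trace = [
    ("copy", "cpu->gpu0", 0.0, 0.4, 32 << 20),
    ("task", "gpu0",      0.4, 1.0, 0),
    ("copy", "gpu0->cpu", 1.0, 1.3, 32 << 20),
    ("task", "cpu",       0.2, 0.9, 0),
]

def memory_pressure(trace):
    """Sum transfer time and bytes per link, then report
    (busy seconds, achieved MiB/s) for each link."""
    stats = {}
    for event, where, start, end, nbytes in trace:
        if event != "copy":
            continue
        t, b = stats.get(where, (0.0, 0))
        stats[where] = (t + (end - start), b + nbytes)
    return {where: (t, b / t / 2**20) for where, (t, b) in stats.items()}

for link, (seconds, mibps) in memory_pressure(trace).items():
    print(f"{link}: {seconds:.2f}s busy, {mibps:.0f} MiB/s")
```

A real methodology layers many such aggregations (residency of data handles, eviction counts, idle time waiting on fetches) on top of the raw events, but the reduction step looks like this.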
The recent version of the Parallel Linear Algebra Software for Multicore Architectures (PLASMA) library is based on tasks with dependencies from the OpenMP standard. The main functionality of the library is presented. Extensive benchmarks target three recent multicore and manycore architectures: Intel Xeon, Intel Xeon Phi, and IBM POWER8 processors.
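The dataflow execution that tasks with dependencies enable can be sketched by deriving a DAG from OpenMP-style depend(in:)/depend(out:) clauses. The task list below models a tile Cholesky factorization on a 3x3 tile grid, the kind of kernel PLASMA provides; the level computation is a simplified, writer-only dependency analysis, not the actual OpenMP runtime logic.

```python
def dependency_levels(tasks):
    """Derive a task DAG from depend(in:)/depend(out:) clauses and return
    each task's earliest parallel step (ASAP level). Simplified analysis:
    a task waits for the last writer of everything it reads or writes."""
    last_writer = {}
    level = {}
    for name, ins, outs in tasks:
        preds = {last_writer[d] for d in ins + outs if d in last_writer}
        level[name] = max((level[p] + 1 for p in preds), default=0)
        for d in outs:
            last_writer[d] = name
    return level

# Tile Cholesky on tiles Aij, expressed as (name, in-deps, out-deps).
tasks = [
    ("potrf0",  ["A00"],               ["A00"]),
    ("trsm10",  ["A00", "A10"],        ["A10"]),
    ("trsm20",  ["A00", "A20"],        ["A20"]),
    ("syrk11",  ["A10", "A11"],        ["A11"]),
    ("gemm21",  ["A20", "A10", "A21"], ["A21"]),
    ("syrk22",  ["A20", "A22"],        ["A22"]),
    ("potrf1",  ["A11"],               ["A11"]),
    ("trsm21",  ["A11", "A21"],        ["A21"]),
    ("syrk22b", ["A21", "A22"],        ["A22"]),
    ("potrf2",  ["A22"],               ["A22"]),
]
print(dependency_levels(tasks))
```

Both TRSM tasks land on the same level, as do the three trailing updates after them: the runtime is free to run each level's tasks concurrently, which is exactly what programming the library over OpenMP task dependencies buys.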
We present a C++ header-only parallel sparse matrix library based on a sparse quadtree representation of matrices using the Chunks and Tasks programming model. The library implements a number of sparse matrix algorithms for distributed-memory parallelization that are able to dynamically exploit data locality to avoid data movement. This is demonstrated for the example of block-sparse matrix-matrix multiplication applied to three sequences of matrices with different nonzero structure, using the CHT-MPI 2.0 runtime library implementation of the Chunks and Tasks model. The runtime library succeeds in dynamically load-balancing the calculation regardless of the sparsity structure. (C) 2022 The Authors. Published by Elsevier B.V.
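The quadtree idea can be illustrated with a minimal sketch, not the library's C++ interface: None encodes an all-zero subtree, a number is a 1x1 leaf, and an internal node is a (nw, ne, sw, se) tuple. Because multiplication recurses only into nonzero subtrees, zero branches prune entire subproblems, which is how the representation exploits sparsity structure dynamically.

```python
def qt_add(a, b):
    """Add two quadtree matrices; None is the all-zero subtree."""
    if a is None:
        return b
    if b is None:
        return a
    if isinstance(a, (int, float)):
        return a + b
    return tuple(qt_add(x, y) for x, y in zip(a, b))

def qt_mult(a, b):
    """Multiply quadtree matrices via 2x2 block recursion,
    skipping any product where either operand subtree is zero."""
    if a is None or b is None:
        return None                      # zero branch: prune recursion
    if isinstance(a, (int, float)):
        return a * b                     # 1x1 leaf
    anw, ane, asw, ase = a
    bnw, bne, bsw, bse = b
    return (qt_add(qt_mult(anw, bnw), qt_mult(ane, bsw)),
            qt_add(qt_mult(anw, bne), qt_mult(ane, bse)),
            qt_add(qt_mult(asw, bnw), qt_mult(ase, bsw)),
            qt_add(qt_mult(asw, bne), qt_mult(ase, bse)))

# 2x2 example: A = [[1, 0], [0, 2]], B = [[0, 3], [4, 0]].
A = (1, None, None, 2)
B = (None, 3, 4, None)
print(qt_mult(A, B))
```

In the real library the leaves are dense chunks rather than scalars and the recursive products become tasks distributed by the Chunks and Tasks runtime, but the pruning logic is the same.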
The constant increase in parallelism available on large-scale distributed computers poses major scalability challenges to many scientific applications. A common strategy to improve scalability is to express algorithms in terms of independent tasks that can be executed concurrently on a runtime system. In this manuscript, we consider a generalization of this approach where task-level speculation is allowed. In this context, a probability is attached to each task which corresponds to the likelihood that the output of the speculative task will be consumed as part of the larger calculation. We consider the problem of optimal resource allocation to each of the possible tasks so as to maximize the total expected computational throughput. The power of this approach is demonstrated by analyzing its application to Parallel Trajectory Splicing, a massively-parallel long-time-dynamics method for atomistic simulations.
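The resource-allocation problem described above can be sketched with a greedy marginal-gain heuristic. Assuming each task i has a consumption probability p_i and a concave parallel speedup curve s(n) (here sqrt, purely illustrative), assigning each core to the task with the largest remaining marginal gain p_i * (s(n+1) - s(n)) maximizes the expected throughput sum_i p_i * s(n_i), since concavity makes the marginal gains decreasing.

```python
import heapq, math

def allocate(probs, cores, speedup=math.sqrt):
    """Greedily assign cores to speculative tasks to maximize
    expected throughput sum_i p_i * speedup(n_i)."""
    alloc = [0] * len(probs)
    # Max-heap (negated) of each task's next marginal gain.
    heap = [(-p * (speedup(1) - speedup(0)), i) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    for _ in range(cores):
        _, i = heapq.heappop(heap)
        alloc[i] += 1
        gain = probs[i] * (speedup(alloc[i] + 1) - speedup(alloc[i]))
        heapq.heappush(heap, (-gain, i))
    return alloc

# Three speculative tasks: one almost certain to be consumed, two unlikely.
print(allocate([0.9, 0.3, 0.3], cores=8))
```

The high-probability task absorbs most of the cores, but the unlikely tasks still receive some: hedging on speculation beats both ignoring it and splitting resources evenly.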
Many applications, ranging from big data analytics to nanostructure design, require the solution of large dense singular value decomposition (SVD) or eigenvalue problems. A first step in the solution methodology for these problems is the reduction of the matrix at hand to condensed form by two-sided orthogonal transformations. This step is standard practice and significantly accelerates the solution process. We present a performance analysis of the main two-sided factorizations used in these reductions (the bidiagonalization, tridiagonalization, and upper Hessenberg factorizations) on heterogeneous systems of multicore CPUs and Xeon Phi coprocessors. We derive a performance model and use it to guide the analysis and to evaluate performance. We develop optimized implementations of these methods that reach up to 80% of the optimal performance bounds. Finally, we describe the heterogeneous multicore and coprocessor development considerations and the techniques that enable us to achieve these high-performance results. This work presents the first highly optimized implementation of these main factorizations for Xeon Phi coprocessors. Compared to the LAPACK versions optimized by Intel for Xeon Phi (in MKL), we achieve up to 50% speedup.
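The performance model the abstract mentions can be sketched in roofline style. This is a generic illustration with rough, assumed flop and byte counts, not the paper's actual model: tridiagonalization costs about 4/3 n^3 flops, roughly half of which are Level-2 BLAS symmetric matrix-vector products that stream the trailing matrix from memory (bandwidth-bound), while the other half are Level-3 BLAS updates (compute-bound).

```python
def tridiag_bound(n, peak_gflops, bw_gbs):
    """Roofline-style attainable-performance bound (sketch) for
    Householder tridiagonalization in double precision."""
    flops = 4.0 / 3.0 * n**3
    level3 = flops / 2                 # compute-bound half (BLAS-3)
    # BLAS-2 half: the SYMV at step k reads ~k^2/2 entries * 8 bytes
    # = 4 k^2 bytes; summed over k = 1..n this is ~ (4/3) n^3 bytes.
    bytes_moved = 4.0 / 3.0 * n**3
    time = level3 / (peak_gflops * 1e9) + bytes_moved / (bw_gbs * 1e9)
    return flops / time / 1e9          # attainable GFLOP/s

# Illustrative, hypothetical machine: 1000 GFLOP/s peak, 200 GB/s memory.
print(f"{tridiag_bound(n=10000, peak_gflops=1000, bw_gbs=200):.0f} GFLOP/s")
```

Even on this hypothetical machine the bound sits far below peak because the memory-bound half dominates the runtime, which is why such reductions are evaluated against model-derived bounds (the 80% figure above) rather than against machine peak.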