The complexity of hardware systems is currently growing faster than the productivity of system designers and programmers. This phenomenon, called the Design Productivity Gap, results in inflating design costs. In this paper, the notion of Design Productivity is precisely defined, together with a metric to assess the Design Productivity of a High-Level Synthesis (HLS) method versus a manual hardware description. The proposed Design Productivity metric evaluates the trade-off between design efficiency and implementation quality. The method is generic enough to compare several HLS methods of different natures, opening opportunities for further progress in Design Productivity. To demonstrate the Design Productivity evaluation method, an HLS compiler based on the CAPH language is compared to manual VHDL writing. The causes that make VHDL lower level than CAPH are discussed. Versions of the sub-pixel interpolation filter from the MPEG HEVC standard are implemented, and a design productivity gain of 2.3× on average is measured for the CAPH HLS method. It results from an average gain in design time of 4.4× and an average loss in quality of 1.9×.
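Taken at face value, the reported figures are consistent with a ratio-style reading of the metric; a minimal worked sketch, assuming the productivity gain is simply the design-time gain divided by the quality loss (the paper gives the precise definition):

```latex
% Hedged sketch: assuming productivity gain = design-time gain / quality loss,
% which reproduces the reported average figures.
\[
  G_{\mathrm{productivity}}
  \;=\; \frac{G_{\mathrm{design\ time}}}{L_{\mathrm{quality}}}
  \;=\; \frac{4.4}{1.9}
  \;\approx\; 2.3
\]
```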
ISBN (print): 9783319269283; 9783319269276
This paper presents the parallel performance achieved by a regional model of numerical weather prediction (NWP), running on thousands of computing cores in a petascale supercomputing system. Good scalability was obtained when running with up to 13,440 cores distributed across 670 nodes. These results enable this application to tackle large computational challenges, such as performing weather forecasts at very high spatial resolution.
Molecular docking is a widely used computational technique for studying structure-based interaction complexes between biological objects at the molecular scale. The purpose of the current work is to develop a set of tools for performing inverse docking, i.e., testing a chemical ligand at large scale against a large dataset of proteins, which has several applications in the field of drug research. We developed different strategies to parallelize and distribute the docking procedure, so as to efficiently exploit the computational performance of multi-core and multi-machine (cluster) environments. The experiments conducted to compare these strategies encourage the search for decomposition strategies, since they improve the execution of inverse docking. (C) 2014 Elsevier B.V. All rights reserved.
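As a rough illustration of the kind of decomposition such strategies rely on, the sketch below distributes the docking of one ligand against many protein targets over worker threads; it is not the paper's tool chain, and dock_score is a hypothetical stand-in for a real docking engine call.

```cpp
// Minimal sketch (not the paper's tools): inverse docking of one ligand against
// many protein targets, decomposed into contiguous slices handled by threads.
#include <algorithm>
#include <cstddef>
#include <future>
#include <string>
#include <thread>
#include <vector>

double dock_score(const std::string& ligand, const std::string& protein) {
    // Placeholder: a real implementation would invoke a docking engine here.
    return static_cast<double>(ligand.size() + protein.size());
}

std::vector<double> inverse_docking(const std::string& ligand,
                                    const std::vector<std::string>& proteins) {
    const unsigned workers = std::max(1u, std::thread::hardware_concurrency());
    std::vector<double> scores(proteins.size());
    std::vector<std::future<void>> tasks;

    // Static decomposition: each worker docks a contiguous slice of the protein set.
    const std::size_t chunk = (proteins.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = w * chunk;
        const std::size_t end = std::min(proteins.size(), begin + chunk);
        if (begin >= end) break;
        tasks.push_back(std::async(std::launch::async, [&, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                scores[i] = dock_score(ligand, proteins[i]);
        }));
    }
    for (auto& t : tasks) t.get();   // wait for all slices to finish
    return scores;
}
```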
This paper introduces SPar, an internal C++ Domain-Specific Language (DSL) that supports the development of classic stream parallel applications. The DSL uses standard C++ attributes to introduce annotations tagging the notable components of stream parallel applications: stream sources and stream processing stages. A set of tools processes SPar code (C++ code annotated with SPar attributes) to generate FastFlow C++ code that exploits the stream parallelism denoted by the SPar annotations while targeting shared-memory multi-core architectures. We outline the main SPar features along with the main implementation techniques and tools. We also show the results of experiments assessing the feasibility of the entire approach as well as SPar's performance and expressiveness.
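A hedged illustration of what such attribute annotations look like is given below; the attribute names (spar::ToStream, spar::Stage, spar::Input, spar::Output, spar::Replicate) follow published SPar examples but should be checked against the SPar documentation, and without the SPar tool chain a standard compiler simply ignores the unknown attributes and runs the loop sequentially.

```cpp
// Illustrative sketch of attribute-annotated stream code in the spirit of SPar.
// Attribute names are taken from published SPar examples and should be treated
// as illustrative; a plain compiler ignores them and runs the code sequentially.
#include <iostream>
#include <string>

std::string read_item(int i)      { return "item " + std::to_string(i); }
std::string filter(std::string s) { return s + " [filtered]"; }
void write_item(const std::string& s) { std::cout << s << '\n'; }

int main() {
    [[spar::ToStream]]                     // marks the stream region (source loop)
    for (int i = 0; i < 10; ++i) {
        std::string item = read_item(i);

        [[spar::Stage, spar::Input(item), spar::Output(item), spar::Replicate(4)]]
        {   // a stateless stage that may be replicated across workers
            item = filter(item);
        }

        [[spar::Stage, spar::Input(item)]]
        {   // final output stage
            write_item(item);
        }
    }
    return 0;
}
```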
ISBN (print): 9781450340137
Performance-analysis tools are indispensable for understanding and optimizing the behavior of parallel programs running on increasingly powerful supercomputers. However, with the size and complexity of hardware and software on the rise, performance data sets are becoming so voluminous that their analysis poses serious challenges. In particular, the search space that must be traversed and the number of individual performance views that must be explored to identify phenomena of interest become too large. To mitigate this problem, we use visual analytics. Specifically, we accelerate the analysis of performance profiles by automatically identifying (1) relevant and (2) similar data subsets and their performance views. We focus on views of the virtual-process topology, showing that their relevance can be well captured with visual-quality metrics and that they can be further assigned to topical groups according to their visual features. A case study demonstrates that our approach helps reduce the search space by up to 80%. Copyright 2015 ACM.
Extreme-scale computing is set to provide the infrastructure for the advances and breakthroughs that will solve some of the hardest problems in science and engineering. However, resilience and energy concerns loom as two of the major challenges for machines at that scale. The number of components that will be assembled in these supercomputers plays a fundamental role in both challenges. First, a large number of parts will substantially increase the failure rate of the system compared to the failure frequency of current machines. Second, those components have to fit within the power envelope of the installation and keep the energy consumption within operational margins. Extreme-scale machines will have to incorporate fault-tolerance mechanisms and honor the energy and power restrictions. Therefore, it is essential to understand how fault tolerance and energy consumption interplay. This paper presents a comparative evaluation and analysis of the energy consumption of three different rollback-recovery protocols: checkpoint/restart, message logging, and parallel recovery. Our experimental evaluation shows that parallel recovery has the lowest execution time and energy consumption. Additionally, we present an analytical model projecting that parallel recovery can reduce energy consumption by more than 37% compared to checkpoint/restart at extreme scale. (C) 2014 Elsevier B.V. All rights reserved.
Finite-difference methods are popular for wave simulation within the seismic exploration community, thanks to their efficiency. However, difficulties arise when encountering complex topography, due to the regular grid pattern of finite-difference schemes. Despite alternatives that can handle the free surface with little effort, such as spectral element or discontinuous Galerkin methods, incorporating a free-surface boundary condition within the finite-difference framework is still appealing, even at the cost of extra algorithmic complexity and a greater demand on computational resources. We present a free-surface boundary treatment within the finite-difference framework, belonging to the family of immersed-boundary methods. Inherently, the presented boundary treatment is separated from the rest of the wave simulation, which makes it easy to integrate into existing finite-difference codes. Specifically, we construct an extrapolation operator for each grid point above the free surface, if requested by the finite-difference stencil, to estimate its fictitious wavefield value at each time step. These operators are constructed only once and remain unchanged for all time steps and source locations. The memory requirement of these operators is significant. Fortunately, grouping together multiple simulations concerning different source locations makes it possible to dilute the memory burden to a negligible level. Additionally, applying these operators incurs numerical noise, which may lead to long-time instabilities. In such a scenario, additional numerical procedures, for instance introducing artificial diffusion, are necessary to control the instability and obtain sensible simulation results. Successful applications of the presented boundary treatment to elastic-wave equations on domains with nontrivial topographies, in 2D and 3D, are presented. Robust and efficient numerical techniques to control high-frequency numerical noise remain to be investigated.
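A minimal sketch of the mechanism described above, assuming a simple operator layout (target ghost point, supporting interior points, and weights); this is an illustration of the idea, not the paper's implementation.

```cpp
// Minimal sketch, not the paper's code: applying precomputed extrapolation
// operators to estimate fictitious wavefield values at grid points above the
// free surface before each finite-difference update. The Operator layout is an
// assumption made for illustration.
#include <cstddef>
#include <vector>

struct Operator {
    std::size_t target;               // index of the ghost point above the surface
    std::vector<std::size_t> support; // interior points used by the extrapolation
    std::vector<double> weight;       // corresponding extrapolation weights
};

// Operators are built once per geometry and reused for every time step and
// every source location, as described in the abstract.
void apply_free_surface(const std::vector<Operator>& ops, std::vector<double>& u) {
    for (const Operator& op : ops) {
        double value = 0.0;
        for (std::size_t k = 0; k < op.support.size(); ++k)
            value += op.weight[k] * u[op.support[k]];
        u[op.target] = value;         // fictitious value fed to the FD stencil
    }
}
```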
ISBN (print): 9783642552243
Selection algorithms find the kth smallest element from a set of elements. Although there are optimal parallel selection algorithms available for theoretical machines, these algorithms are not only difficult to implement but also inefficient in practice. Consequently, scalable applications can only use a few special cases, such as minimum and maximum, where efficient implementations exist. To overcome such limitations, we propose a general parallel selection algorithm that scales even on today's largest supercomputers. Our approach is based on an efficient, unbiased median approximation method, recently introduced as median-of-3 reduction, and Hoare's sequential QuickSelect idea from 1961. The resulting algorithm scales with a time complexity of O(log² n) for n distributed elements while needing only O(1) space. Furthermore, we prove it to be a practical solution by explaining implementation details and showing performance results for up to 458,752 processor cores.
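For reference, the sequential building block named in the abstract, Hoare's QuickSelect with a median-of-3 pivot, can be sketched as below; the paper's contribution is the distributed O(log² n) algorithm built on a median-of-3 reduction, which this single-array sketch does not reproduce.

```cpp
// Sequential building block only: QuickSelect with a median-of-3 pivot choice,
// illustrating the selection idea on a single in-memory array.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the k-th smallest element (k is 0-based) of v.
int quickselect(std::vector<int> v, std::size_t k) {
    assert(k < v.size());
    std::size_t lo = 0, hi = v.size();           // half-open range [lo, hi)
    while (hi - lo > 1) {
        // Median-of-3 pivot from the first, middle and last elements.
        std::size_t mid = lo + (hi - lo) / 2;
        int a = v[lo], b = v[mid], c = v[hi - 1];
        int pivot = std::max(std::min(a, b), std::min(std::max(a, b), c));

        // Three-way partition: [lo, i1) < pivot, [i1, i2) == pivot, [i2, hi) > pivot.
        auto p1 = std::partition(v.begin() + lo, v.begin() + hi,
                                 [pivot](int x) { return x < pivot; });
        auto p2 = std::partition(p1, v.begin() + hi,
                                 [pivot](int x) { return x == pivot; });
        std::size_t i1 = static_cast<std::size_t>(p1 - v.begin());
        std::size_t i2 = static_cast<std::size_t>(p2 - v.begin());

        if (k < i1)      hi = i1;      // k-th element is in the "< pivot" block
        else if (k < i2) return pivot; // k-th element equals the pivot
        else             lo = i2;      // recurse into the "> pivot" block
    }
    return v[lo];
}
```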
We present a simple, work-optimal and synchronization-free solution to the problem of stably merging in parallel two given, ordered arrays of m and n elements into an ordered array of m+n elements. The main contribution is a new, simple, fast and direct algorithm that determines, for any prefix of the stably merged output array, the exact prefixes of each of the two input arrays needed to produce this output prefix. More precisely, for any given index in the resulting, but not yet constructed, output array, representing the desired output prefix, the algorithm computes the indices (called co-ranks) in each of the two input arrays representing the required input prefixes without having to merge the input arrays. The co-ranking algorithm takes O(log min(m,n)) time steps and uses O(1) space. Co-ranking is used in parallel to partition the input arrays into a collection of as many pairs as desired, each pair with exactly the same number of elements. Any stable, sequential merge algorithm can then be used to merge the pairs independently. The result is a perfectly load-balanced, stable, parallel merge algorithm. Co-ranking and sequential merging of pairs can be done without synchronization. Compared to other linear-speedup approaches to the parallel merge problem, the algorithm is considerably simpler and can be up to a factor of two faster. Compared to previous algorithms for solving the co-ranking problem, the new algorithm works for arbitrary output array indices and maintains stability in the presence of repeated elements at no extra space or time cost. When the number of processing elements p does not exceed (m+n)/log min(m,n), the parallel merge algorithm achieves perfect, linear speedup p. Furthermore, it is easy to implement on both shared and distributed memory systems.
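To make the co-ranking idea concrete, here is a sketch of the widely used binary-search formulation of co-rank computation (ties resolved in favour of the first array so the merge stays stable); it illustrates the idea described in the abstract but is not taken from the paper itself.

```cpp
// Given an output position k, compute the co-ranks (i, j) with i + j = k such
// that merging the first i elements of a and the first j elements of b yields
// exactly the first k elements of the stable merge (a-elements first on ties).
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

std::pair<std::size_t, std::size_t>
corank(std::size_t k, const std::vector<int>& a, const std::vector<int>& b) {
    std::size_t i = std::min(k, a.size());   // current guess for the prefix of a
    std::size_t j = k - i;                   // forced by i + j = k
    std::size_t i_low = (k > b.size()) ? k - b.size() : 0;
    std::size_t j_low = (k > a.size()) ? k - a.size() : 0;

    // Binary search over the guess i (and hence j): O(log min(m, n)) iterations.
    while (true) {
        if (i > 0 && j < b.size() && a[i - 1] > b[j]) {
            std::size_t delta = (i - i_low + 1) / 2;   // i too large: move i down, j up
            j_low = j;
            i -= delta;
            j += delta;
        } else if (j > 0 && i < a.size() && b[j - 1] >= a[i]) {
            std::size_t delta = (j - j_low + 1) / 2;   // i too small: move i up, j down
            i_low = i;
            i += delta;
            j -= delta;
        } else {
            return {i, j};   // a[i-1] <= b[j] and b[j-1] < a[i]: valid co-ranks
        }
    }
}
```

Each of p workers can call corank for its own output offset and then run any stable sequential merge on the resulting equal-sized pair, with no synchronization between workers.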
This special issue of Concurrency and Computation: Practice and Experience contains revised and extended versions of selected papers presented at the conference Euro-Par 2013. Euro-Par, the European Conference on Parallel Computing, is an annual series of international conferences dedicated to the promotion and advancement of all aspects of parallel and distributed computing. Euro-Par covers a wide spectrum of topics, from algorithms and theory to software technology and hardware-related issues, with application areas ranging from scientific to mobile and cloud computing. The major part of the Euro-Par audience consists of researchers in academic institutions, government laboratories and industrial organisations.