Search Results - Inner Mongolia University Library

Performance optimizations for scalable implicit RANS calculations with SU2

COMPUTERS & FLUIDS, 2016, vol. 129, pp. 146-158

Authors: Economon, Thomas D.; Mudigere, Dheevatsa; Bansal, Gaurav; Heinecke, Alexander; Palacios, Francisco; Park, Jongsoo; Smelyanskiy, Mikhail; Alonso, Juan J.; Dubey, Pradeep
Affiliations: Stanford Univ, Dept Aeronaut & Astronaut, Stanford, CA 94305, USA; Intel Corp, Parallel Comp Lab, Bangalore, Karnataka, India; Intel Corp, Software & Serv Grp, Hillsboro, OR 97124, USA; Intel Corp, Parallel Comp Lab, Santa Clara, CA 95044, USA; Boeing Co, Adv Concepts Grp, Long Beach, CA 90808, USA

In this paper, we present single- and multi-node optimizations of SU2, a widely-used, open-source Computational Fluid Dynamics application, aimed at improving performance and scalability for implicit Reynolds-averaged Navier-Stokes calculations on unstructured grids. Typical industry-standard implementations are currently limited by unstructured accesses, variable degrees of parallelism, as well as the global synchronizations inherent in traditionally used Krylov linear solvers. Therefore, we rely on aggressive single-node optimizations, such as hierarchical parallelism, dynamic threading, compacted memory layout, and vectorization, along with a communication-friendly agglomeration (geometric) linear multigrid solver. Based on results with the well-known ONERA M6 geometry, our single-core and shared-memory optimizations result in a speedup of 2.6X on the latest 14-core Intel(R) Xeon(TM) E5-2697v3 processor when compared to the baseline SU2 implementation with 14 MPI ranks. In multi-node settings, the hybrid OpenMP+MPI multigrid implementation achieves 2X higher parallel efficiency on 256 nodes over conventional Krylov-based (GMRES) methods. (C) 2016 Elsevier Ltd. All rights reserved.

Keywords: CFD; Multigrid; Parallel algorithms; High performance computing

Speeding up Parallel Combinatorial Optimization Algorithms with Las Vegas Method

10th International Conference on Large-Scale Scientific Computations (LSSC)

Author: Zavalnij, Bogdan
Affiliation: Univ Pecs, Inst Math & Informat, Pecs, Hungary

ISBN: (print) 9783319265209; 9783319265193

In this paper we introduce a new method for speeding up the parallel run times of discrete optimization problems, applicable to a range of different problems. We propose that the Las Vegas method, a variant of the Monte Carlo method, can be used to overcome some special barriers that can occur when dividing such problems. In particular, the maximum clique and k-clique problems are examined, and the new algorithm is presented with the relevant measurements.

Keywords: Las Vegas method; Parallel algorithms; Maximum clique
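
As a generic illustration of the Las Vegas idea named in the abstract above (a randomized algorithm whose answer is always correct while only its running time varies), a minimal sketch using randomized quickselect follows. It is not the clique algorithm from the paper, and the function name `las_vegas_select` is hypothetical.

```python
import random
from typing import List


def las_vegas_select(values: List[int], k: int) -> int:
    """Return the k-th smallest element (0-based) using random pivots.

    The pivot choice is random, so the running time varies between runs,
    but the returned value is always correct: the Las Vegas property.
    """
    if not 0 <= k < len(values):
        raise IndexError("k out of range")
    items = list(values)
    while True:
        pivot = random.choice(items)
        smaller = [v for v in items if v < pivot]
        equal = [v for v in items if v == pivot]
        if k < len(smaller):
            # The answer lies among the elements below the pivot.
            items = smaller
        elif k < len(smaller) + len(equal):
            # The pivot itself is the k-th smallest element.
            return pivot
        else:
            # Skip everything up to and including the pivot value.
            k -= len(smaller) + len(equal)
            items = [v for v in items if v > pivot]
```

By contrast, a Monte Carlo algorithm in the usual sense bounds the running time and accepts a small probability of a wrong answer.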

Scalable, Robust, Fault-Tolerant Parallel QR Factorization

15th International Symposium on Distributed Computing and Applications

Author: Camille Coti
Affiliation: LIPN, CNRS UMR 7030, Université Paris 13, Sorbonne Paris Cité

ISBN: (print) 9781509035946

In this paper, we present QR factorization algorithms that can tolerate process crashes and soft errors (bit flips). Our algorithms take advantage of structural properties of a QR factorization algorithm referred to as "communication-avoiding". We show that, by exploiting these properties, our resilient, robust algorithms modify the communication pattern of the computation but do not add any significant computation to the critical path.

Keywords: Fault tolerance; Fault tolerant systems; Matrix decomposition; Algorithm design and analysis; Robustness; Standards; Parallel algorithms

A Distributed-Memory Package for Dense Hierarchically Semi-Separable Matrix Computations Using Randomization

ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2016, vol. 42, no. 4, pp. 1-35

Authors: Rouet, Francois-Henry; Li, Xiaoye S.; Ghysels, Pieter; Napov, Artem
Affiliations: Lawrence Berkeley Natl Lab, MS 50F-1650, One Cyclotron Rd, Berkeley, CA 94720, USA; Univ Libre Bruxelles, Serv Metrol Nucl, CP 165 84, Ave FD Roosevelt 50, B-1050 Brussels, Belgium

We present a distributed-memory library for computations with dense structured matrices. A matrix is considered structured if its off-diagonal blocks can be approximated by a rank-deficient matrix with low numerical rank. Here, we use Hierarchically Semi-Separable (HSS) representations. Such matrices appear in many applications, for example, finite-element methods, boundary element methods, and so on. Exploiting this structure allows for fast solution of linear systems and/or fast computation of matrix-vector products, which are the two main building blocks of matrix computations. The compression algorithm that we use, which computes the HSS form of an input dense matrix, relies on randomized sampling with a novel adaptive sampling mechanism. We discuss the parallelization of this algorithm and also present the parallelization of structured matrix-vector product, structured factorization, and solution routines. The efficiency of the approach is demonstrated on large problems from different academic and industrial applications, on up to 8,000 cores. This work is part of a more global effort, the STRUctured Matrices PACKage (STRUMPACK) software package for computations with sparse and dense structured matrices. Hence, although useful in their own right, the routines also represent a step in the direction of a distributed-memory sparse solver.

Keywords: Design; Algorithms; Performance; HSS matrices; Randomized sampling; ULV factorization; Parallel algorithms; Distributed-memory

SkyAlign: a portable, work-efficient skyline algorithm for multicore and GPU architectures

VLDB JOURNAL, 2016, vol. 25, no. 6, pp. 817-841

Authors: Bogh, Kenneth S.; Chester, Sean; Assent, Ira
Affiliations: Aarhus Univ, Abogade 34, DK-8200 Aarhus N, Denmark; Norwegian Univ Sci & Technol NTNU, Sem Saelandsvei 7-9, N-7491 Trondheim, Norway

The skyline operator determines points in a multidimensional dataset that offer some optimal trade-off. State-of-the-art CPU skyline algorithms exploit quad-tree partitioning with complex branching to minimise the number of point-to-point comparisons. Branch-phobic GPU skyline algorithms rely on compute throughput rather than partitioning, but fail to match the performance of sequential algorithms. In this paper, we introduce a new skyline algorithm, SkyAlign, that is designed for the GPU, and a GPU-friendly, grid-based tree structure upon which the algorithm relies. The search tree allows us to dramatically reduce the amount of work done by the GPU algorithm by avoiding most point-to-point comparisons at the cost of some compute throughput. This trade-off allows SkyAlign to achieve orders of magnitude faster performance than its predecessors. Moreover, a NUMA-oblivious port of SkyAlign outperforms native multicore state of the art on challenging workloads by an increasing margin as more cores and sockets are utilised.

Keywords: Work-efficiency; GPGPU; Multicore; Parallel algorithms; Skyline operator; Data structures
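
To make the skyline operator described in the abstract above concrete, here is a minimal sketch of the naive quadratic definition, assuming smaller values are better in every dimension. It is not SkyAlign and uses no tree or GPU; the names `dominates` and `skyline` are hypothetical. It only shows the point-to-point comparisons that the algorithms in the abstract try to avoid.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p is no worse than q everywhere and strictly better somewhere.

    Assumption for this sketch: smaller is better in every dimension.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Return the points not dominated by any other point (O(n^2) comparisons)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical data: (price, distance) pairs, lower is better for both.
    hotels = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
    print(skyline(hotels))
```

Here `(3, 3)` and `(4, 4)` drop out because `(2, 2)` is better in both dimensions, while the remaining three points each represent a different trade-off.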

VTK-m: Accelerating the Visualization Toolkit for Massively Threaded Architectures

IEEE COMPUTER GRAPHICS AND APPLICATIONS, 2016, vol. 36, no. 3, pp. 48-58

Authors: Moreland, Kenneth; Sewell, Christopher; Usher, William; Lo, Li-ta; Meredith, Jeremy; Pugmire, David; Kress, James; Schroots, Hendrik; Ma, Kwan-Liu; Childs, Hank; Larsen, Matthew; Chen, Chun-Ming; Maynard, Robert; Geveci, Berk
Affiliations: Sandia Natl Labs, Livermore, CA 94550, USA; Los Alamos Natl Lab, Comp & Computat Sci Div, Los Alamos, NM 87545, USA; Univ Utah, Sci Comp & Imaging Inst, Salt Lake City, UT 84112, USA; Los Alamos Natl Lab, Los Alamos, NM 87545, USA; Oak Ridge Natl Lab, Oak Ridge, TN 37831, USA; Univ Oregon, Comp Sci, Eugene, OR 97403, USA; Intel Corp, Santa Clara, CA 95051, USA; Univ Calif Davis, Comp Sci, Davis, CA 95616, USA; Univ Oregon, Eugene, OR 97403, USA; Lawrence Livermore Natl Lab, Livermore, CA 94550, USA; Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210, USA; Kitware, Clifton Pk, NY 12065, USA; Kitware, Sci Comp, Clifton Pk, NY 12065, USA

One of the most critical challenges for high-performance computing (HPC) scientific visualization is execution on massively threaded processors. Of the many fundamental changes we are seeing in HPC systems, one of the most profound is a reliance on new processor types optimized for execution bandwidth over latency hiding. Our current production scientific visualization software is not designed for these new types of architectures. To address this issue, the VTK-m framework serves as a container for algorithms, provides flexible data representation, and simplifies the design of visualization algorithms on new and future computer architecture.

Keywords: Data Visualization; Algorithm Design And Analysis; Software Engineering; Message Systems; Parallel Processing; Computer Architecture; Computational Modeling; VTK-m Framework; Computer Graphics; High Performance Computing; Visualization Software; Parallel algorithms; Algorithmic Structures; Massively Threaded Processors

Mesh-free data transfer algorithms for partitioned multiphysics problems: Conservation, accuracy, and parallelism

JOURNAL OF COMPUTATIONAL PHYSICS, 2016, vol. 307, pp. 164-188

Author: Slattery, Stuart R.
Affiliation: Oak Ridge Natl Lab, Comp Sci & Math Div, Computat Engn & Energy Sci Grp, 1 Bethel Valley Rd, Oak Ridge, TN 37831, USA

In this paper we analyze and extend mesh-free algorithms for three-dimensional data transfer problems in partitioned multiphysics simulations. We first provide a direct comparison between a mesh-based weighted residual method using the common-refinement scheme and two mesh-free algorithms leveraging compactly supported radial basis functions: one using a spline interpolation and one using a moving least square reconstruction. Through the comparison we assess both the conservation and accuracy of the data transfer obtained from each of the methods. We do so for a varying set of geometries with and without curvature and sharp features and for functions with and without smoothness and with varying gradients. Our results show that the mesh-based and mesh-free algorithms are complementary with cases where each was demonstrated to perform better than the other. We then focus on the mesh-free methods by developing a set of algorithms to parallelize them based on sparse linear algebra techniques. This includes a discussion of fast parallel radius searching in point clouds and restructuring the interpolation algorithms to leverage data structures and linear algebra services designed for large distributed computing environments. The scalability of our new algorithms is demonstrated on a leadership class computing facility using a set of basic scaling studies. These scaling studies show that for problems with reasonable load balance, our new algorithms for both spline interpolation and moving least square reconstruction demonstrate both strong and weak scalability using more than 100,000 MPI processes with billions of degrees of freedom in the data transfer operation. (C) 2015 Elsevier Inc. All rights reserved.

Keywords: Data transfer; Multiphysics; Parallel algorithms; Moving least square; Spline interpolation

Some recent advances in automated analysis

INTERNATIONAL JOURNAL ON SOFTWARE TOOLS FOR TECHNOLOGY TRANSFER, 2016, vol. 18, no. 2, pp. 121-128

Authors: Abraham, Erika; Havelund, Klaus
Affiliations: Rhein Westfal TH Aachen, Aachen, Germany; CALTECH, Jet Prop Lab, Pasadena, CA, USA

Due to the increasing complexity of software systems, there is a growing need for automated and scalable software synthesis and analysis. In the last decade, active research in the formal methods community brought interesting results and valuable tools. However, there are still challenges to face and hard problems that need to be solved. We briefly outline some recent trends, and review some of the latest achievements, introducing six papers selected from the 20th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2014).

Keywords: Analysis; Parallel algorithms; Satisfiability modulo theories; Runtime verification; Probabilistic systems

A parallel edge orientation algorithm for quadrilateral meshes

Authors: Homolya, M.; Ham, D.A.
Affiliations: Grantham Institute, Imperial College London, London SW7 2AZ, United Kingdom; Department of Computing, Imperial College London, London SW7 2AZ, United Kingdom; Department of Mathematics, Imperial College London, London SW7 2AZ, United Kingdom

One approach to achieving correct finite element assembly is to ensure that the local orientation of facets relative to each cell in the mesh is consistent with the global orientation of that facet. Rognes et al. have shown how to achieve this for any mesh composed of simplex elements, and *** contains a serial algorithm for constructing a consistent orientation of any quadrilateral mesh of an orientable manifold. The core contribution of this paper is the extension of this algorithm for distributed memory parallel computers, which facilitates its seamless application as part of a parallel simulation system. Furthermore, our analysis establishes a link between the well-known Union-Find algorithm and the construction of a consistent orientation of a quadrilateral mesh. As a result, existing work on the parallelization of the Union-Find algorithm can be easily adapted to construct further parallel algorithms for mesh orientations. © 2016 SIAM. Published by SIAM.

Keywords: Parallel algorithms
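
For readers unfamiliar with the Union-Find algorithm that the abstract above refers to, a minimal serial sketch of the standard data structure follows, with path compression and union by size. How the paper maps mesh edges and their orientations onto it is not stated in the abstract, so the example only merges plain integer ids; the class and method names are hypothetical.

```python
class UnionFind:
    """Disjoint-set forest over the integers 0..n-1."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))  # each element starts as its own root
        self.size = [1] * n           # component size, valid at roots only

    def find(self, x: int) -> int:
        """Return the representative of x's set, compressing the path."""
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: point every node on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a: int, b: int) -> bool:
        """Merge the sets of a and b; return False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra          # attach the smaller tree to the larger
        self.size[ra] += self.size[rb]
        return True
```

A typical use is to call `union` for every pair of items that must end up in the same class and then compare `find` results to test membership.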

Parallel algorithm to find collision in Merkle-Damgard construction with fixed point for $2^{n/2}/k$ work

2016 International Seminar on Application of Technology for Information and Communication, ISEMANTIC 2016

Authors: Sofu, Risqi Y.S; Windarta, Susila
Affiliation: Sekolah Tinggi Sandi Negara, Bogor, Indonesia

ISBN: (print) 9781509023264

In this paper we provide a method for a collision attack on all n-bit iterated hash functions with the Merkle-Damgard construction using a parallel algorithm, allowing a collision to be found for a 2n block message and k computers with about $2^{n/2}/k$ work. Taking the Davies-Meyer scheme with the SIMECK-32 algorithm as an example, our attack can find a collision for a $2^{32}$ bit total output with 8 computers at $2^{13}$ work for each computer. The result of this research is a plaintext that meets the characteristics of a fixed point, which does not affect the plaintext hash value because the resulting output is the IV value itself. This plaintext is used to construct the collision. The results indicate that the Davies-Meyer scheme as applied is not resistant to the collision attack, because there are three fixed points in the two IV samples used.

Keywords: Parallel algorithms
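
The fixed-point property mentioned in the abstract above can be sketched as follows. In the Davies-Meyer scheme the compression function is h' = E_m(h) XOR h, so choosing h = D_m(0) gives E_m(h) = 0 and therefore h' = h. The cipher below is a hypothetical 32-bit toy Feistel network written only for this illustration; it is not SIMECK-32 and not the authors' attack code.

```python
MASK16 = 0xFFFF


def _round_fn(half: int, round_key: int) -> int:
    # Arbitrary mixing; a Feistel network is invertible for any round function.
    return ((half * 0x9E37 + round_key) ^ (half >> 3)) & MASK16


def _round_keys(key: int, rounds: int = 4) -> list:
    # Toy key schedule: consecutive 16-bit slices of a 64-bit key.
    return [(key >> (16 * i)) & MASK16 for i in range(rounds)]


def encrypt(key: int, block: int) -> int:
    """Encrypt a 32-bit block with the toy cipher."""
    left, right = (block >> 16) & MASK16, block & MASK16
    for k in _round_keys(key):
        left, right = right, left ^ _round_fn(right, k)
    return (left << 16) | right


def decrypt(key: int, block: int) -> int:
    """Invert encrypt() by undoing the rounds in reverse order."""
    left, right = (block >> 16) & MASK16, block & MASK16
    for k in reversed(_round_keys(key)):
        left, right = right ^ _round_fn(left, k), left
    return (left << 16) | right


def davies_meyer(h: int, m: int) -> int:
    """Davies-Meyer compression: the message block m is the cipher key."""
    return encrypt(m, h) ^ h


def fixed_point(m: int) -> int:
    """Chaining value h with davies_meyer(h, m) == h, namely h = D_m(0)."""
    return decrypt(m, 0)
```

For any message block `m`, `davies_meyer(fixed_point(m), m)` returns `fixed_point(m)` again, which matches the abstract's description of an output equal to the IV that was fed in.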
