Search Results - Inner Mongolia University Library

On scalable parallel recursive backtracking

Journal of Parallel and Distributed Computing, 2015, vol. 84, pp. 65-75

Authors: Abu-Khzam, Faisal N.; Daudjee, Khuzaima; Mouawad, Amer E.; Nishimura, Naomi
Affiliations: Lebanese Amer Univ, Dept Comp Sci & Math, Beirut, Lebanon; Univ Waterloo, David R Cheriton Sch Comp Sci, Waterloo, ON N2L 3G1, Canada

Supercomputers are equipped with an increasingly large number of cores to use computational power as a way of solving problems that are otherwise intractable. Unfortunately, getting serial algorithms to run in parallel to take advantage of these computational resources remains a challenge for several application domains. Many parallel algorithms can scale to only hundreds of cores. The limiting factors of such algorithms are usually communication overhead and poor load balancing. Solving NP-hard graph problems to optimality using exact algorithms is an example of an area in which there has so far been limited success in obtaining large scale parallelism. Many of these algorithms use recursive backtracking as their core solution paradigm. In this paper, we propose a lightweight, easy-to-use, scalable approach for transforming almost any recursive backtracking algorithm into a parallel one. Our approach incurs minimal communication overhead and guarantees a load-balancing strategy that is implicit, i.e., does not require any problem-specific knowledge. The key idea behind our approach is the use of efficient traversal operations on an indexed search tree that is oblivious to the problem being solved. We test our approach with parallel implementations of algorithms for the well-known Vertex Cover and Dominating Set problems. On sufficiently hard instances, experimental results show nearly linear speedups for thousands of cores, reducing running times from days to just a few minutes. (C) 2015 Elsevier Inc. All rights reserved.

Keywords: Parallel algorithms; Recursive backtracking; Load balancing; Vertex cover; Dominating set
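To illustrate the paradigm named in the abstract above, here is a minimal serial sketch of recursive backtracking for minimum Vertex Cover. It is not the authors' parallel framework; the function name `min_vertex_cover` and the edge-list input format are assumptions made for illustration only.

```python
def min_vertex_cover(edges, cover=frozenset(), best=None):
    """Return a minimum vertex cover of an undirected graph given as an edge list.

    Hypothetical helper: branches on an uncovered edge (u, v), since every
    cover must contain u or v, and prunes branches that cannot beat `best`.
    """
    # Look for an edge that the current partial cover does not touch.
    for u, v in edges:
        if u not in cover and v not in cover:
            break
    else:
        # No uncovered edge is left: `cover` is a valid vertex cover.
        return cover if best is None or len(cover) < len(best) else best

    # Any extension has at least len(cover) + 1 vertices; prune if that cannot win.
    if best is not None and len(cover) + 1 >= len(best):
        return best

    # Two children in the search tree: take u, or take v.
    for w in (u, v):
        best = min_vertex_cover(edges, cover | {w}, best)
    return best


if __name__ == "__main__":
    path = [(0, 1), (1, 2), (2, 3)]
    print(sorted(min_vertex_cover(path)))
```

Each recursive call corresponds to one node of the search tree, and the two branches are its children. A parallel scheme along the lines the abstract describes would hand different subtrees of this recursion to different cores.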

Parallel Experiments with RARE-BLAS

International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)

Authors: Chemseddine Chohra; Philippe Langlois; David Parello
Affiliations: DALI, Laboratoire d'Informatique Robotique et de Microélectronique de Montpellier, Montpellier, CNRS, France

ISBN: (print) 9781509057085

Numerical reproducibility failures rise in parallel computation because of the non-associativity of floating-point summation. Optimizations on massively parallel systems dynamically modify the floating-point operation order. Hence, numerical results may change from one run to another. We propose to ensure reproducibility by extending as far as possible the IEEE-754 correct rounding property to larger operation sequences. Our RARE-BLAS (Reproducible, Accurately Rounded and Efficient BLAS) benefits from recent accurate and efficient summation algorithms. Solutions for level 1 (asum, dot and nrm2) and level 2 (gemv) routines are provided. We compare their performance to the Intel MKL library and to other existing reproducible algorithms. For both shared and distributed memory parallel systems, we exhibit an extra-cost of 2× in the worst case scenario, which is satisfying for a wide range of applications. For Intel Xeon Phi accelerator a larger extra-cost (4× to 6×) is observed, which is still helpful at least for debugging and validation.

Keywords: Sockets; Instruction sets; Libraries; Bandwidth; Scalability; Debugging; Parallel algorithms
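The reproducibility problem described in the abstract above can be seen in a few lines. The sketch below sums the same four values in two different orders with a plain left-to-right loop, then with Python's `math.fsum`, which returns a correctly rounded sum. `math.fsum` is only a stand-in for the idea of correct rounding; it is not the RARE-BLAS algorithm, and `naive_sum` is a hypothetical helper.

```python
import math


def naive_sum(values):
    """Left-to-right floating-point summation; the result depends on the order."""
    total = 0.0
    for x in values:
        total += x
    return total


# The same multiset of values in two different orders, as a parallel
# reduction might produce from one run to the next.
order_a = [1e16, 1.0, -1e16, 1.0]
order_b = [1.0, 1.0, 1e16, -1e16]

if __name__ == "__main__":
    print(naive_sum(order_a), naive_sum(order_b))  # results differ
    print(math.fsum(order_a), math.fsum(order_b))  # results agree
```

In `order_a` the first `1.0` is lost when it is added to `1e16`, because adjacent doubles near that magnitude are 2 apart. A correctly rounded sum does not depend on the order, which is the property the abstract proposes to extend to BLAS routines.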

FPGA based parallel implementation of morphological filters

International Conference on Microelectronics, Computing and Communications (MicroCom)

Authors: Debasish Mukherjee; Susanta Mukhopadhyay; G. P. Biswas
Affiliations: Department of Computer Science and Engineering, Indian School of Mines, Dhanbad, India

ISBN: (print) 9781467366229

This paper presents a parallel algorithm and its hardware architecture for implementing 2-D gray-scale morphological operations, namely dilation and erosion, using rectangular flat-top structuring elements. The proposed architecture supports parallel extension, whereby throughput and processing frame rate are enhanced. The architecture is fully generic and runtime programmable with respect to image size and structuring element size. The main advantages of the architecture are its low latency, lower internal memory requirements, and higher processing frame rate and throughput, which make it more amenable to real-time applications. Additionally, it makes use of stream processing, which eliminates the need for buffering image data, whereby memory overhead is minimized. The architecture has been synthesized using Xilinx Design Suite 14.2 ISE, prototyped on a Virtex 5 FPGA board, and verified using the Xilinx ISim simulator. The proposed architecture has been tested on gray-scale images of varied geometric dimensions, and the results show satisfactory performance.

Keywords: Computer architecture; Hardware; Parallel algorithms; Field programmable gate arrays; Clocks; Signal generators; Gray-scale
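As a software reference for the two operations named in the abstract above, the sketch below computes gray-scale dilation and erosion with a flat rectangular structuring element: each output pixel is the maximum (dilation) or minimum (erosion) over the window centered on it. This is a plain sequential illustration, not the paper's hardware architecture. The function names and the choice to clip the window at image borders are assumptions.

```python
def _morph(image, se_rows, se_cols, op):
    """Apply `op` (max or min) over a flat se_rows x se_cols window at every pixel.

    `image` is a list of equal-length rows of gray levels. Windows are
    clipped at the image border (an assumption; other border rules exist).
    """
    rows, cols = len(image), len(image[0])
    r0, c0 = se_rows // 2, se_cols // 2  # offset from window origin to its center
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [
                image[i][j]
                for i in range(max(0, r - r0), min(rows, r - r0 + se_rows))
                for j in range(max(0, c - c0), min(cols, c - c0 + se_cols))
            ]
            out_row.append(op(window))
        out.append(out_row)
    return out


def dilate(image, se_rows, se_cols):
    """Gray-scale dilation: local maximum under a flat rectangular element."""
    return _morph(image, se_rows, se_cols, max)


def erode(image, se_rows, se_cols):
    """Gray-scale erosion: local minimum under a flat rectangular element."""
    return _morph(image, se_rows, se_cols, min)


if __name__ == "__main__":
    img = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
    print(dilate(img, 3, 3))
    print(erode(img, 3, 3))
```

Because each output pixel depends only on its own window, the pixels can be computed independently, which is what makes the operation a natural fit for parallel and streaming hardware.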

Parallel Computation of Entries of A^-1

SIAM Journal on Scientific Computing, 2015, vol. 37, no. 2, pp. C268-C284

Authors: Amestoy, Patrick R.; Duff, Iain S.; L'Excellent, Jean-Yves; Rouet, Francois-Henry
Affiliations: Univ Toulouse, INPT ENSEEIHT IRIT, F-31071 Toulouse, France; CERFACS, F-31057 Toulouse, France; Rutherford Appleton Lab, Didcot OX11 0QX, Oxon, England; Univ Lyon, Inria, F-69364 Lyon 07, France; Univ Lyon, Lab LIP, UMR CNRS ENS Lyon Inria 5668, F-69364 Lyon 07, France; Univ Calif Berkeley, Lawrence Berkeley Natl Lab, Berkeley, CA 94720, USA

In this paper, we consider the computation in parallel of several entries of the inverse of a large sparse matrix. We assume that the matrix has already been factorized by a direct method and that the factors are distributed. Entries are efficiently computed by exploiting sparsity of the right-hand sides and the solution vectors in the triangular solution phase. We demonstrate that in this setting, parallelism and computational efficiency are two contrasting objectives. We develop an efficient approach and show its efficiency on a general purpose parallel multifrontal solver.

Keywords: Sparse matrices; Direct methods for linear system and matrix inversion; Parallel algorithms

A General Purpose Branch and Bound Parallel Algorithm

Euromicro Conference on Parallel, Distributed and Network-Based Processing

Authors: Alexandros C. Dimopoulos; Christos Pavlatos; George Papakonstantinou
Affiliations: Harokopio University of Athens, Tavros, Greece; Hellenic Air Force Academy, Athens, Greece; National Technical University of Athens, Zografou, Greece

In this paper, a parallel algorithm for branch and bound applications is proposed. The algorithm is general purpose and can be used to effortlessly parallelize any sequential branch-and-bound-style algorithm that is written in a certain format. It is a distributed dynamic scheduling algorithm, i.e., each node schedules the load of its cores; it can be used with different programming platforms and architectures; and it is a hybrid algorithm (OpenMP, MPI). To demonstrate its validity and efficiency, the proposed algorithm has been implemented and tested with numerous examples, which are described in detail in this paper. A speed-up of about 9 has been achieved for the tested examples on a cluster of three nodes with four cores each.

Keywords: Instruction sets; Clustering algorithms; Arrays; Heuristic algorithms; Partitioning algorithms; Parallel algorithms

Fast Parallel Operations on Search Trees

International Conference on High Performance Computing

Authors: Yaroslav Akhremtsev; Peter Sanders
Affiliations: KIT, Institute of Theoretical Informatics, Karlsruhe, Germany

ISBN: (print) 9781509054121

Using (a, b)-trees as an example, we show how to perform a parallel split with logarithmic latency and parallel join, bulk updates, intersection, union (or merge), and (symmetric) set difference with logarithmic latency and with information theoretically optimal work. We present both asymptotically optimal solutions and simplified versions that perform well in practice - they are several times faster than previous implementations.

Keywords: Vegetation; Data structures; Phase change random access memory; Fuses; Informatics; Electronic mail; Parallel algorithms

Parallel computation of k-nearest neighbor joins using MapReduce

IEEE International Conference on Big Data

Authors: Wooyeol Kim; Younghoon Kim; Kyuseok Shim
Affiliations: Seoul National University; Hanyang University

ISBN: (print) 9781467390064

The k-nearest neighbor (kNN) join has recently attracted considerable attention due to its broad applications. However, processing kNN joins is very expensive due to the quadratic nature of the join operation. Furthermore, since applications increasingly deal with big data, computing kNN joins becomes more challenging. To process such big data, parallel and distributed computing using MapReduce has recently received a lot of attention. In this paper, we propose an efficient parallel algorithm, KNN-MR, to process kNN joins using MapReduce. To reduce not only the computational cost of kNN joins but also the network cost of communicating across machines, we develop a novel vector projection pruning technique, which enables us to identify non-kNN points that are guaranteed not to be included in the result of a kNN join. Our performance study confirms the effectiveness and scalability of the proposed algorithm.

Keywords: Partitioning algorithms; Big data; Algorithm design and analysis; Approximation algorithms; Clustering algorithms; Parallel algorithms; Market research
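To make the "quadratic nature" mentioned in the abstract above concrete, here is a brute-force kNN join sketch: for every point in R it scans all of S, so the work grows with |R| x |S|. It is a single-machine baseline only and implements neither KNN-MR nor its vector projection pruning. The name `knn_join` and the tuple-based point format are assumptions.

```python
import heapq
import math


def knn_join(R, S, k):
    """Brute-force kNN join: map each point r in R to its k nearest points in S.

    Points are tuples of coordinates and distances are Euclidean.
    Every (r, s) pair is examined, hence the quadratic cost.
    """
    return {
        r: heapq.nsmallest(k, S, key=lambda s: math.dist(r, s))
        for r in R
    }


if __name__ == "__main__":
    R = [(0.0, 0.0), (5.0, 5.0)]
    S = [(1.0, 0.0), (5.0, 4.0), (0.0, 2.0), (9.0, 9.0)]
    for r, neighbors in knn_join(R, S, 2).items():
        print(r, "->", neighbors)
```

A pruning rule such as the one the abstract proposes aims to discard points of S that cannot be among the k nearest before distances are computed or data is shipped across machines.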

Parallel N-Body Simulation Based on the PM and P3M Methods Using Multigrid Schemes in Conjunction with Generic Approximate Sparse Inverses

Mathematical Problems in Engineering, 2015, vol. 2015, no. 1, pp. 1-12

Authors: Kyziropoulos, P. E.; Filelis-Papadopoulos, C. K.; Gravvanis, G. A.
Affiliations: Democritus Univ Thrace, Sch Engn, Dept Elect & Comp Engn, GR-67100 Xanthi, Greece

During the last decades, Multigrid methods have been extensively used for solving large sparse linear systems. Considering their efficiency and the convergence behavior, Multigrid methods are used in many scientific fields as solvers or preconditioners. Herewith, we propose two hybrid parallel algorithms for N-Body simulations using the Particle Mesh method and the Particle Particle Particle Mesh method, respectively, based on the V-Cycle Multigrid method in conjunction with Generic Approximate Sparse Inverses. The N-Body problem resides in a three-dimensional torus space, and the bodies are subject only to gravitational forces. In each time step of the above methods, a large sparse linear system is solved to compute the gravity potential at each nodal point in order to interpolate the solution to each body. Then the Velocity Verlet method is used to compute the new position and velocity from the acceleration of each respective body. Moreover, a parallel Multigrid algorithm, with a truncated approach in the levels computed in parallel, is proposed for solving large linear systems. Furthermore, parallel results are provided indicating the efficiency of the proposed Multigrid N-Body scheme. Theoretical estimates for the complexity of the proposed simulation schemes are provided.

Keywords: N-body simulations (Astronomy); Approximation theory; Multigrid methods (Numerical analysis); Gravitation; Particle acceleration; Parallel algorithms
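The time-integration step mentioned in the abstract above can be sketched as follows. This is the textbook Velocity Verlet update for a single scalar coordinate, with the acceleration supplied by a caller-provided function. In the paper the acceleration would come from the multigrid potential solve, which is not reproduced here. The names `velocity_verlet_step` and `accel` are assumptions.

```python
def velocity_verlet_step(x, v, a, accel, dt):
    """Advance position x and velocity v by one time step dt.

    `a` is the acceleration at the current position and `accel(x)` returns
    the acceleration at a given position (a hypothetical callback standing
    in for the force computation).
    """
    x_new = x + v * dt + 0.5 * a * dt * dt   # position update
    a_new = accel(x_new)                     # acceleration at the new position
    v_new = v + 0.5 * (a + a_new) * dt       # velocity uses the average acceleration
    return x_new, v_new, a_new


if __name__ == "__main__":
    # Constant acceleration of 2.0: the scheme reproduces x = v0*t + t^2 exactly.
    x, v, a = 0.0, 1.0, 2.0
    for _ in range(2):
        x, v, a = velocity_verlet_step(x, v, a, lambda _x: 2.0, 0.5)
    print(x, v)
```

Only one acceleration evaluation is needed per step, since `a_new` is reused as `a` in the next call. That matters when each evaluation involves solving a large sparse linear system.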

On the Energy Complexity of Parallel Algorithms

International Conference on Parallel Processing (ICPP)

Authors: Vijay Anand Korthikanti; Gul Agha; Mark Greenstreet
Affiliations: Department of Computer Science, University of Illinois, Urbana-Champaign, USA; Department of Computer Science, University of British Columbia, Canada

For a given algorithm, the energy consumed in executing the algorithm has a nonlinear relationship with performance. In case of parallel algorithms, energy use and performance are functions of the structure of the algorithm. We define the asymptotic energy complexity of algorithms which models the minimum energy required to execute a parallel algorithm for a given execution time as a function of input size. Our methodology provides us with a way of comparing the orders of (minimal) energy required for different algorithms and can be used to define energy complexity classes of parallel algorithms.

Keywords: Parallel algorithms; Complexity theory; Time frequency analysis; Energy consumption; Arrays; Computational modeling; Multicore processing

Louvain community detection with parallel heuristics on GPUs

IEEE International Conference on Intelligent Engineering Systems (INES)

Authors: Richard Forster
Affiliations: Faculty of Informatics, Eotvos Lorand University, Budapest, Hungary

Community detection has become an important operation in numerous graph-based applications. It is used to reveal groups that exist within real-world networks without imposing prior size or cardinality constraints on the set of communities. Despite its potential, the support for parallel computers is rather limited. This is largely because the algorithm is irregular and the underlying heuristics imply a sequential nature. In this paper I present parallelization heuristics for fast community detection using the Louvain method as it is applied on GPUs. The Louvain method is a multi-phase, iterative heuristic for modularity optimization. Originally developed by Blondel et al. (2008), the method has become increasingly popular owing to its ability to detect high-modularity community partitions in a fast and memory-efficient manner. The parallel heuristics used were first introduced by Hao Lu et al. (2015). As the Louvain method is inherently sequential, it limits the possibility of scalable usage. Thanks to the proposed parallel heuristics, I observe how this method can behave on GPUs. For evaluation I implemented the heuristics using CUDA on a GeForce GTX 980M GPU, and for testing I used organization landscapes from the CERN-developed Collaboration Spotting project, which involves patents and publications to visualize the connections in technologies among its collaborators. Compared to the parallel Louvain implementation running on 8 threads on the same machine that hosts the GPU, the CUDA implementation is able to produce community outputs comparable to the CPU-generated results, while providing absolute speedups of up to 12 using the GeForce GTX 980M mobile GPU.

Keywords: Graphics processing units; Computer architecture; Instruction sets; Parallel algorithms; Heuristic algorithms; Partitioning algorithms; Image edge detection
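The quantity that the Louvain method optimizes, modularity, can be computed with the short sketch below. It uses the standard definition for an undirected, unweighted graph without self-loops: for each community, the fraction of edges inside it minus the squared fraction of edge endpoints attached to it. This is only the objective function, not the Louvain heuristic or its GPU parallelization. The function name and input format are assumptions.

```python
def modularity(edges, community):
    """Modularity of a partition of an undirected, unweighted graph.

    `edges` is a list of (u, v) pairs without self-loops and `community`
    maps each vertex to a community label (both formats are assumptions).
    """
    m = len(edges)
    internal = {}  # number of edges with both endpoints in the community
    degree = {}    # total degree of the vertices in the community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree.items()
    )


if __name__ == "__main__":
    edges = [(0, 1), (2, 3)]
    print(modularity(edges, {0: "a", 1: "a", 2: "b", 3: "b"}))
    print(modularity(edges, {0: "a", 1: "a", 2: "a", 3: "a"}))
```

Louvain repeatedly moves vertices between communities when doing so increases this score. The sequential dependence between those moves is the difficulty the abstract refers to.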
