Search Results - Inner Mongolia University Library

Bulk synchronous parallel algorithms for the external memory model

THEORY OF COMPUTING SYSTEMS, 2002, Vol. 35, No. 6, pp. 567-597

Authors: Dehne, F; Dittrich, W; Hutchinson, D; Maheshwari, A (Carleton Univ Sch Comp Sci Ottawa ON K1S 5B6 Canada; Bosch Telecom GmbH UC ON ERS D-71522 Backnang Germany; Duke Univ Dept Comp Sci Durham NC 27708 USA)

Blockwise access to data is a central theme in the design of efficient external memory (EM) algorithms. A second important issue, when more than one disk is present, is fully parallel disk I/O. In this paper we present a simple, deterministic simulation technique which transforms certain Bulk Synchronous Parallel (BSP) algorithms into efficient parallel EM algorithms. It optimizes blockwise data access and parallel disk I/O and, at the same time, utilizes multiple processors connected via a communication network or shared memory. We obtain new, improved parallel EM algorithms for a large number of problems including sorting, permutation, matrix transpose, several geometric and GIS problems including three-dimensional convex hulls (two-dimensional Voronoi diagrams), and various graph problems. We show that certain parallel algorithms known for the BSP model can be used to obtain EM algorithms that meet well-known I/O complexity lower bounds for various problems, including sorting.

Keywords: computer storage devices; parallel algorithms

Exploiting multi-grained parallelism in reconfigurable SBC architectures

13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines

Authors: Zambreno, J; Honbo, D; Choudhary, A (Northwestern Univ Dept Elect & Comp Engn Evanston IL 60208 USA)

ISBN: (print) 0769524451

In recent years, reconfigurable technology has emerged as a popular choice for implementing various types of cryptographic functions. Nevertheless, insufficient effort has been put into fully exploiting the tremendous amount of parallelism intrinsic to FPGAs for this class of algorithms. In this paper, we focus on block cipher architectures and explore design decisions that leverage the multi-grained parallelism inherent in many of these algorithms. We demonstrate the usefulness of this approach with a highly parallel FPGA implementation of the AES standard, and present results detailing the area/delay tradeoffs resulting from our design decisions.

Keywords: parallel algorithms

Mining traces of large scale systems

6th International Conference on Algorithms and Architectures for Parallel Processing

Authors: Cérin, C; Koskas, M (Univ Paris 13 LIPN CNRS UMR 7030 F-93430 Villetaneuse France; Univ Picardie LaMFA CNRS UMR 6140 F-80039 Amiens France)

ISBN: (digital) 9783540320715

ISBN: (print) 3540292357

Large scale distributed computing infrastructure captures the use of a high number of nodes, poor communication performance and continuously varying resources that are not available at any time. In this paper, we focus on the different tools available for mining traces of the activities of such architectures. We propose new techniques for fast management of a parallel frequent itemset mining algorithm. The techniques allow us to exhibit statistical results about the activity of more than one hundred PCs connected to the web.

Keywords: parallel algorithms; global computing platforms; meta-data; data mining application; high performance and distributed databases; trace analysis; data management; resource management

Parallelizing particle swarm optimization

IEEE Pacific Rim Conference on Communications, Computers, and Signal Processing

Authors: Li, B; Wada, K (Univ Tsukuba Dept Comp Sci Tsukuba Ibaraki 305 Japan)

ISBN: (print) 0780391950

This paper focuses on a parallel version of the particle swarm optimization (PSO) algorithm, which can significantly reduce execution time for solving complex large-scale optimization problems. The paper gives an overview of the PSO algorithm and then proposes a design and an implementation of parallel PSO. The proposed algorithm eliminates redundant synchronizations and optimizes message transfer to overlap communication with computation. The experimental results show that a 13.2 times speedup was obtained by the proposed parallel PSO algorithm with 14 processors.

Keywords: parallel algorithms
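
Note: for readers unfamiliar with PSO, the following is a minimal serial sketch of the textbook global-best variant, not the parallel design proposed in the paper. The function name pso_minimize, the parameter values (inertia 0.7, both acceleration coefficients 1.5) and the sphere test function are illustrative assumptions that do not come from the record above.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Textbook global-best PSO (illustrative). Returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Each particle remembers the best position it has visited so far.
    pbest_pos = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest_pos, gbest_val = pbest_pos[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (pbest_pos[i][d] - pos[i][d])
                             + c_social * r2 * (gbest_pos[d] - pos[i][d]))
                # Clamp to the search box so particles cannot leave it.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            # Objective evaluations are independent per particle; this is the
            # work a parallel version would typically spread over processors.
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest_pos[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    # The shared global best is what processors must exchange.
                    gbest_pos, gbest_val = pos[i][:], val
    return gbest_pos, gbest_val


def sphere(x):
    """Simple convex test function with its minimum 0 at the origin."""
    return sum(v * v for v in x)


best_pos, best_val = pso_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
```

For scale, the reported 13.2 times speedup on 14 processors corresponds to a parallel efficiency of 13.2 / 14, or about 0.94.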

Parallel algorithm and its proof for block-tridiagonal linear systems

Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University, 2005, Vol. 23, No. 6, pp. 817-820

Authors: Cui, Xining; Lu, Quanyi (Department of Applied Mathematics, Northwestern Polytechnical University, Xi'an 710072, China)

There exists a parallel algorithm for block-tridiagonal linear systems [1]. We present a different parallel algorithm for such systems. Like Ref. 1, we give a convergence proof when the coefficient matrix is an M-matrix; unlike Ref. 1, we also give a convergence proof when the coefficient matrix is a positive definite matrix, a case for which Ref. 1 gave none. We present a parallel two-stage iterative method for solving large block-tridiagonal linear systems Ax = b on distributed-memory multi-computers. Furthermore, we give convergence proofs when the PEk method or the GS method is applied to the inner iteration and the coefficient matrix A is a symmetric positive definite matrix or an M-matrix, respectively. Finally, we give a numerical example with tabulated results for three cases: (1) our algorithm with PE inner iteration (PE is one kind of PEk); (2) our algorithm with GS inner iteration; (3) the multi-splitting algorithm of Ref. 1. Preliminary numerical results indicate that our algorithm takes less time and achieves higher efficiency than the algorithm of Ref. 1.

Keywords: parallel algorithms

Highly latency tolerant Gaussian elimination

6th International Workshop on Grid Computing

Authors: Endo, T; Taura, K (Univ Tokyo Tokyo Japan)

ISBN: (print) 0780394925

Large latencies over WAN will remain an obstacle to running communication-intensive parallel applications in Grid environments. This paper takes one such application, Gaussian elimination of dense matrices, and describes a parallel algorithm that is highly tolerant to latencies. The key technique is a pivoting strategy called batched pivoting, which requires much less frequent synchronization than other methods. Although it is a relaxed pivoting method that may select pivots other than the 'best' ones, we show that it achieves good numerical accuracy. In experiments with random matrices of sizes 64 to 49,152, batched pivoting achieves numerical accuracy comparable to that of partial pivoting. We also evaluate the parallel execution speed of our implementation and show that it is much more tolerant to latencies than partial pivoting.

Keywords: Delay; Wide area networks; Application software; Distributed computing; Supercomputers; Bandwidth; Equations; Concurrent computing; parallel algorithms; Computer networks
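
Note: as a point of reference for the abstract above, here is a minimal serial sketch of Gaussian elimination with partial pivoting, the baseline that batched pivoting is compared against. It is not the paper's batched pivoting algorithm, whose details the record does not give; the function name solve_partial_pivoting is an illustrative assumption.

```python
def solve_partial_pivoting(a, b):
    """Solve a x = b for a square matrix a (list of rows) by Gaussian
    elimination with partial pivoting. Serial, illustrative baseline only."""
    n = len(a)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]

    for k in range(n):
        # Pivot search over the whole remaining column. In a distributed
        # setting the rows live on different nodes, so this choice needs
        # agreement among them at every elimination step.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]

        # Eliminate column k from all rows below the pivot row.
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]

    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


# The zero in the top-left corner forces a row swap in the first step.
solution = solve_partial_pivoting([[0.0, 2.0], [1.0, 1.0]], [4.0, 3.0])
```

The per-column pivot search is where partial pivoting pays for latency: one round of agreement per column. Relaxed strategies such as the one described in the abstract aim to make such synchronizations less frequent.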

A simple optimal randomized algorithm for sorting on the PDM

16th International Symposium on Algorithms and Computation (ISAAC 2005)

Authors: Rajasekaran, S; Sen, S (Univ Connecticut Dept CSE Storrs CT 06269 USA; Indian Inst Technol Kharagpur Dept CSE Kharagpur W Bengal India)

ISBN: (print) 3540309357

The Parallel Disks Model (PDM) has been proposed to alleviate the I/O bottleneck that arises in the processing of massive data sets. Sorting has been extensively studied on the PDM due to the fundamental nature of the problem. Several randomized algorithms are known for sorting. Most of the prior algorithms suffer from undue complications in memory layouts or implementation, or from a lack of tight analysis. In this paper we present a simple randomized algorithm that sorts in optimal time with high probability and has all the desirable features for practical implementation.

Keywords: parallel algorithms
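
Note: "optimal" in the abstract above refers to the well-known I/O bound for sorting on the PDM, which is on the order of (N / (D*B)) * log(N/B) / log(M/B) parallel I/O operations for N items, internal memory of M items, blocks of B items and D disks. The sketch below only evaluates that expression with constant factors ignored. The function and parameter names are illustrative assumptions, and the code says nothing about the paper's algorithm itself.

```python
import math


def pdm_sort_io_bound(n_items, mem_items, block_items, n_disks):
    """Evaluate (N / (D*B)) * log_{M/B}(N/B), the order of the PDM sorting
    bound in parallel I/O operations. Constant factors are ignored."""
    if block_items < 1 or n_disks < 1:
        raise ValueError("block size and disk count must be positive")
    if mem_items < 2 * block_items:
        raise ValueError("memory must hold at least two blocks")
    if n_items < block_items:
        raise ValueError("input must span at least one block")
    # Number of passes over the data: log base M/B of N/B.
    passes = math.log(n_items / block_items) / math.log(mem_items / block_items)
    # Each pass moves N/B blocks, D of them per parallel I/O operation.
    return (n_items / (n_disks * block_items)) * passes


# Hypothetical machine: 2^30 items, memory of 2^20 items, blocks of 2^10 items.
one_disk = pdm_sort_io_bound(2**30, 2**20, 2**10, 1)
four_disks = pdm_sort_io_bound(2**30, 2**20, 2**10, 4)
```

The expression makes the two themes of external-memory sorting visible: larger blocks and more disks divide the cost of each pass, while a larger memory-to-block ratio reduces the number of passes.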

Parallel Lagrange-Newton-Krylov-Schur methods for PDE-constrained optimization. Part I: The Krylov-Schur solver

SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2005, Vol. 27, No. 2, pp. 687-713

Authors: Biros, G; Ghattas, O (NYU Courant Inst Math Sci Dept Comp Sci New York NY 10012 USA; Carnegie Mellon Univ Dept Biomed Engn Ultrascale Simulat Lab Pittsburgh PA 15213 USA; Carnegie Mellon Univ Dept Civil & Environm Engn Ultrascale Simulat Lab Pittsburgh PA 15213 USA)

Large-scale optimization of systems governed by partial differential equations (PDEs) is a frontier problem in scientific computation. Reduced quasi-Newton sequential quadratic programming (SQP) methods are state-of-the-art approaches for such problems. These methods take full advantage of existing PDE solver technology and parallelize well. However, their algorithmic scalability is questionable; for certain problem classes they can be very slow to converge. In this two-part article we propose a new method for steady-state PDE-constrained optimization, based on the idea of using a full space Newton solver combined with an approximate reduced space quasi-Newton SQP preconditioner. The basic components of the method are Newton solution of the first-order optimality conditions that characterize stationarity of the Lagrangian function; Krylov solution of the Karush-Kuhn-Tucker (KKT) linear systems arising at each Newton iteration using a symmetric quasi-minimum residual method; preconditioning of the KKT system using an approximate state/decision variable decomposition that replaces the forward PDE Jacobians by their own preconditioners, and the decision space Schur complement (the reduced Hessian) by a BFGS approximation initialized by a two-step stationary method. Accordingly, we term the new method Lagrange-Newton-Krylov-Schur (LNKS). It is fully parallelizable, exploits the structure of available parallel algorithms for the PDE forward problem, and is locally quadratically convergent. In part I of this two-part article, we investigate the effectiveness of the KKT linear system solver. We test our method on two optimal control problems in which the state constraints are described by the steady-state Stokes equations. The objective is to minimize dissipation or the deviation from a given velocity field; the control variables are the boundary velocities. Numerical experiments on up to 256 Cray T3E processors and on an SGI Origin 2000 include scalability and

Keywords: sequential quadratic programming; adjoint methods; PDE constrained optimization; optimal control; Lagrange-Newton-Krylov-Schur methods; Navier-Stokes; finite elements; preconditioners; indefinite systems; nonlinear equations; parallel algorithms

Efficient parallelization of Spatial Approximation Trees

5th International Conference on Computational Science (ICCS 2005)

Authors: Marín, M; Reyes, N (Univ Magallanes Comp Cs Dept Punta Arenas Chile; Univ San Luis Comp Cs Dept San Luis Argentina)

ISBN: (print) 3540260323

This paper describes the parallelization of the Spatial Approximation Tree. This data structure has been shown to be an efficient index structure for solving range queries in high-dimensional metric space databases. We propose a method for load balancing the work performed by the processors. The method is self-tuning and is able to dynamically follow changes in the work-load generated by user queries. Empirical results with different databases show efficient performance in practice. The algorithmic design is based on the use of the bulk-synchronous model of parallel computing.

Keywords: parallel algorithms

Automatic tuning of PDGEMM towards optimal performance

11th International Euro-Par Conference

Authors: Hunold, S; Rauber, T (Univ Bayreuth Dept Math & Phys Bayreuth Germany)

ISBN: (print) 3540287000

Sophisticated parallel matrix multiplication algorithms like PDGEMM exhibit a complex structure and can be controlled by a large set of parameters including blocking factors and block sizes used for the serial execution on one of the participating processors. But it requires a deep understanding of both the parallel algorithm and the execution platform to select the parameters such that a minimum execution time results. In this article, we describe a simple mechanism that automatically selects a suitable set of parameters for PDGEMM which leads to a minimum execution time in most cases.

Keywords: parallel algorithms