Search results - Inner Mongolia University Library

An evaluation of adaptive numerical integration algorithms on parallel systems

Parallel Algorithms and Applications, 2003, vol. 18, no. 1-2, pp. 27-47

Authors: Schürer, Rudolf; Uhl, Andreas. Affiliations: Department of Mathematics, University of Salzburg, Hellbrunner Str. 34, A-5020 Salzburg, Austria; Department of Scientific Computing, University of Salzburg, Salzburg, Austria

Parallel adaptive algorithms for the approximation of a multi-dimensional integral over a hyper-rectangular region are described. Algorithms with a centralized global region collection are compared to algorithms using local region collections. The latter algorithms should result in better scalability since global communication is avoided. Both types of algorithms are compared to quasi-Monte Carlo integration. Tests are performed using Genz's test functions and speed-up results are given.

Keywords: parallel algorithms
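
As a rough illustration of the "global region collection" idea in the abstract above, the following sketch keeps all subregions in one priority queue and always refines the region with the largest error estimate. It is sequential and is not the authors' implementation: the midpoint rule, the bisection strategy, the crude error estimate, and all function names are assumptions made for this example. In a parallel variant, workers would take regions from this shared collection, which is where the global communication cost arises.

```python
import heapq

def midpoint(f, lo, hi):
    """Midpoint-rule estimate of the integral of f over the box [lo, hi]."""
    volume = 1.0
    for a, b in zip(lo, hi):
        volume *= b - a
    centre = [(a + b) / 2 for a, b in zip(lo, hi)]
    return volume * f(centre)

def bisect_box(lo, hi):
    """Split the box into two halves along its longest edge."""
    k = max(range(len(lo)), key=lambda i: hi[i] - lo[i])
    mid = (lo[k] + hi[k]) / 2
    left_hi = list(hi)
    left_hi[k] = mid
    right_lo = list(lo)
    right_lo[k] = mid
    return [(list(lo), left_hi), (right_lo, list(hi))]

def estimate(f, lo, hi):
    """Return (integral estimate, heuristic error estimate) for one region."""
    coarse = midpoint(f, lo, hi)
    fine = sum(midpoint(f, a, b) for a, b in bisect_box(lo, hi))
    return fine, abs(fine - coarse)

def adaptive_integrate(f, lo, hi, tol=1e-4, max_regions=20000):
    """Globally adaptive integration driven by a single region heap."""
    value, error = estimate(f, lo, hi)
    # heapq is a min-heap, so store the negated error; the serial number
    # breaks ties so that the lists are never compared.
    regions = [(-error, 0, list(lo), list(hi), value)]
    total, total_error, serial = value, error, 1
    while total_error > tol and len(regions) < max_regions:
        neg_error, _, r_lo, r_hi, r_value = heapq.heappop(regions)
        total -= r_value
        total_error += neg_error
        for s_lo, s_hi in bisect_box(r_lo, r_hi):
            v, e = estimate(f, s_lo, s_hi)
            heapq.heappush(regions, (-e, serial, s_lo, s_hi, v))
            serial += 1
            total += v
            total_error += e
    return total, total_error
```

For example, `adaptive_integrate(lambda x: x[0]**2 + x[1]**2, [0.0, 0.0], [1.0, 1.0])` approximates the exact value 2/3. The error estimate only probes the split direction, so it should be read as a heuristic rather than a bound.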

Derivation of a parallel string matching algorithm

INFORMATION PROCESSING LETTERS, 2003, vol. 85, no. 5, pp. 255-260

Author: Misra, J. Affiliation: Univ Texas, Austin, TX 78712, USA

We derive an efficient parallel algorithm to find all occurrences of a pattern string in a subject string in O(log n) time, where n is the length of the subject string. The number of processors employed is of the order of the product of the two string lengths. The theory of powerlists [J. Kornerup, PhD Thesis, 1997; J. Misra, ACM Trans. Programming Languages Systems 16 (16) (1994) 1737-1740] is central to the development of the algorithm and its algebraic manipulations. (C) 2002 Elsevier Science B.V. All rights reserved.

Keywords: parallel algorithms string data structures powerlist
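
The resource bounds quoted in the abstract above (logarithmic time, a number of processors proportional to the product of the two lengths) can be made concrete with a simple data-parallel formulation: every character comparison is independent, and the per-position results are combined by a balanced AND tree. The sketch below simulates this sequentially. It is not the powerlist derivation from the paper; `tree_and` and `find_all` are hypothetical names chosen for this example.

```python
def tree_and(bits):
    """AND-reduce a non-empty list in ceil(log2(len(bits))) pairwise rounds.

    Each round could run in parallel, one processor per pair.
    """
    while len(bits) > 1:
        nxt = [bits[k] and bits[k + 1] for k in range(0, len(bits) - 1, 2)]
        if len(bits) % 2:
            nxt.append(bits[-1])  # odd element carries over unchanged
        bits = nxt
    return bits[0]

def find_all(pattern, subject):
    """Return every index at which pattern occurs in subject."""
    m, n = len(pattern), len(subject)
    if m == 0 or m > n:
        return []
    matches = []
    for i in range(n - m + 1):  # conceptually, all i are handled at once
        # One comparison per (i, j) pair: about n * m independent tasks.
        equal = [subject[i + j] == pattern[j] for j in range(m)]
        if tree_and(equal):
            matches.append(i)
    return matches
```

Calling `find_all("aba", "ababa")` reports the overlapping occurrences at positions 0 and 2.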

The merits of a parallel genetic algorithm in solving hard optimization problems

JOURNAL OF BIOMECHANICAL ENGINEERING-TRANSACTIONS OF THE ASME, 2003, vol. 125, no. 1, pp. 141-146

Authors: van Soest, AJK; Casius, LJRR. Affiliation: Free Univ Amsterdam, Fac Human Movement Sci, Inst Fundamental & Clin Human Movement Sci, NL-1081 BT Amsterdam, Netherlands

A parallel genetic algorithm for optimization is outlined, and its performance on both mathematical and biomechanical optimization problems is compared to a sequential quadratic programming algorithm, a downhill simplex algorithm and a simulated annealing algorithm. When high-dimensional non-smooth or discontinuous problems with numerous local optima are considered, only the simulated annealing and the genetic algorithm, which are both characterized by a weak search heuristic, are successful in finding the optimal region in parameter space. The key advantage of the genetic algorithm is that it can easily be parallelized at negligible overhead.

Keywords: quadratic programming Errors Optimization simulated annealing parallel algorithms algorithms Biomechanics Genetic algorithms

Testing parallel random number generators

PARALLEL COMPUTING, 2003, vol. 29, no. 1, pp. 69-94

Authors: Srinivasan, A; Mascagni, M; Ceperley, D. Affiliations: Florida State Univ, Dept Comp Sci, Tallahassee, FL 32308, USA; Univ Illinois, Natl Ctr Supercomp Applicat, Urbana, IL 61801, USA

Monte Carlo computations are considered easy to parallelize. However, the results can be adversely affected by defects in the parallel pseudorandom number generator used. A parallel pseudorandom number generator must be tested for two types of correlations: (i) intra-stream correlation, as for any sequential generator, and (ii) inter-stream correlation, for correlations between random number streams on different processes. Since bounds on these correlations are difficult to prove mathematically, large and thorough empirical tests are necessary. Many of the popular pseudorandom number generators in use today were tested when computational power was much lower, and hence they were evaluated with much smaller test sizes. This paper describes several tests of pseudorandom number generators, both statistical and application-based. We show defects in several popular generators. We describe the implementation of these tests in the SPRNG [ACM Trans. Math. Software 26 (2000) 436; SPRNG-scalable parallel random number generators. SPRNG 1.0: http://www.ncsa.uiuc.edu/Apps/SPRNG; SPRNG 2.0: http://sprng.cs.fsu.edu] test suite and also present results for the tests conducted on the SPRNG generators. These generators have passed some of the largest empirical random number tests. (C) 2002 Elsevier Science B.V. All rights reserved.

Keywords: parallel random number generators random number tests parallel algorithms random number software
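
To make the notion of inter-stream correlation in the abstract above concrete, here is a minimal sketch that computes the sample Pearson correlation between two streams. It uses Python's standard `random.Random` (Mersenne Twister) with different seeds as a stand-in for per-process streams; it does not use SPRNG and is far simpler than the tests described in the paper. The function names and the sample size are assumptions for this example.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def inter_stream_correlation(seed_a, seed_b, n=10000):
    """Correlation between two streams that differ only in their seed."""
    stream_a = random.Random(seed_a)
    stream_b = random.Random(seed_b)
    xs = [stream_a.random() for _ in range(n)]
    ys = [stream_b.random() for _ in range(n)]
    return pearson(xs, ys)
```

For independent streams the coefficient should be close to zero, with fluctuations on the order of 1/sqrt(n); identical seeds give a coefficient of 1. A single linear statistic like this can only catch gross defects, which is why the abstract argues for large and varied empirical tests.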

Solving awari with parallel retrograde analysis

COMPUTER, 2003, vol. 36, no. 10, pp. 26-+

Authors: Romein, JW; Bal, HE. Affiliation: Free Univ Amsterdam, Amsterdam, Netherlands

In awari, a two-person game of pure skill, players sow stones into pits on a board. The game's rules define how to capture stones, and the player who captures the most wins the game. For more than a decade, researchers have studied computerized techniques to play awari. The authors have now solved the game by determining the score of 889,063,398,406 board positions and storing them in databases. They performed the necessary computations on a 144-processor parallel computer with 72 gigabytes of main memory and a fast Myrinet interconnect.

Keywords: Concurrent computing Databases Africa parallel algorithms Clustering algorithms Clocks

An efficient parallel algorithm with application to computational fluid dynamics

COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2003, vol. 45, no. 1-3, pp. 165-188

Authors: Rivera, W; Zhu, JP; Huddleston, D. Affiliations: Univ Akron, Dept Theoret & Appl Math, Akron, OH 44224, USA; Univ Puerto Rico, Dept Elect & Comp Engn, Mayaguez, PR 00680, USA; Mississippi State Univ, Dept Civil Engn, Mississippi State, MS 39762, USA

When solving time-dependent partial differential equations on parallel computers using the nonoverlapping domain decomposition method, one often needs numerical boundary conditions on the boundaries between subdomains. These numerical boundary conditions can significantly affect the stability and accuracy of the final algorithm. In this paper, a stability and accuracy analysis of the existing methods for generating numerical boundary conditions will be presented, and a new approach based on explicit predictors and implicit correctors will be used to solve convection-diffusion equations on parallel computers, with application to aerospace engineering for the solution of Euler equations in computational fluid dynamics simulations. Both theoretical analyses and numerical results demonstrate significant improvement in stability and accuracy by using the new approach. (C) 2003 Elsevier Science Ltd. All rights reserved.

Keywords: time lagging explicit predictor domain decomposition parallel algorithms partial differential equations

Constructing H4, a fast depth-size optimal parallel prefix circuit

JOURNAL OF SUPERCOMPUTING, 2003, vol. 24, no. 3, pp. 279-304

Authors: Lin, YC; Hsu, YH; Liu, CK. Affiliations: Natl Taiwan Univ Sci & Technol, Dept Comp Sci & Informat Engn, Taipei 106, Taiwan; Natl Taiwan Univ Sci & Technol, Dept Elect Engn, Taipei 106, Taiwan

Given n values x_1, x_2, ..., x_n and an associative binary operation ⊗, the prefix problem is to compute x_1 ⊗ x_2 ⊗ ... ⊗ x_i for 1 ≤ i ≤ n. Prefix circuits are combinational circuits for solving the prefix problem. For any n-input prefix circuit D with depth d and size s, if d + s = 2n - 2, then D is depth-size optimal. In general, a prefix circuit with a small depth is faster than one with a large depth. For prefix circuits with the same depth, a prefix circuit with a smaller fan-out occupies less area and is faster in VLSI implementation. This paper is on constructing parallel prefix circuits that are depth-size optimal with small depth and small fan-out. We construct a depth-size optimal prefix circuit H4 with fan-out 4. It has the smallest depth among all known depth-size optimal prefix circuits with a constant fan-out; furthermore, when n ≥ 136, its depth is less than, or equal to, those of all known depth-size optimal prefix circuits with unlimited fan-out. A size lower bound of prefix circuits is also derived. Some properties related to depth-size optimality and size optimality are introduced; they are used to prove that H4 is depth-size optimal.

Keywords: depth depth-size optimal fan-out parallel algorithms prefix circuits size optimal
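
The depth-size trade-off described in the abstract above can be seen on two textbook prefix circuits. The sketch below is not the H4 construction: it simulates a serial chain and a recursive-doubling (Kogge-Stone style) scan, counting operation nodes as size and dependent rounds as depth. The function names are assumptions for this example, and both functions expect a non-empty input.

```python
from operator import add

def serial_prefix(xs, op=add):
    """Chain circuit: n - 1 operation nodes, each depending on the previous."""
    out = [xs[0]]
    size = 0
    for x in xs[1:]:
        out.append(op(out[-1], x))
        size += 1
    depth = size  # the nodes form a single dependency chain
    return out, depth, size

def doubling_prefix(xs, op=add):
    """Recursive-doubling scan: depth ceil(log2 n), but more nodes."""
    out = list(xs)
    n = len(out)
    depth = size = 0
    step = 1
    while step < n:
        prev = list(out)  # all nodes in a round read the previous round
        for i in range(step, n):
            out[i] = op(prev[i - step], prev[i])
            size += 1
        depth += 1
        step *= 2
    return out, depth, size
```

For n = 8 the serial chain has depth 7 and size 7, so d + s = 14 = 2n - 2 and it is depth-size optimal, though slow. The doubling scan reaches depth 3 at the cost of 17 nodes, so d + s = 20 and it is not depth-size optimal. Circuits such as H4 aim for small depth while still meeting the 2n - 2 bound.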

Coarse-grained parallel transitive closure algorithm: Path decomposition technique

COMPUTER JOURNAL, 2003, vol. 46, no. 4, pp. 391-400

Authors: Gibbons, A; Pagourtzis, A; Potapov, I; Rytter, W. Affiliations: Kings Coll London, Dept Comp Sci, London WC2R 2LS, England; Univ Liverpool, Dept Comp Sci, Liverpool L69 7ZF, Merseyside, England; Natl Tech Univ Athens, Dept Elect & Comp Engn, GR-10682 Athens, Greece; Warsaw Univ, Inst Informat, Warsaw, Poland

We investigate the relation between fine-grained and coarse-grained distributed computations of a class of problems related to the generic transitive closure problem (TC for short). We choose an intricate systolic algorithm for the TC problem, by Guibas, Kung and Thompson (GKT algorithm for short), as a starting point due to its particularly close relationship to matrix multiplication. The GKT algorithm reduces the TC problem to three successive parallel matrix multiplications. We extract the main ideas of this algorithm, namely different path decompositions related to min-paths and max-paths computations and devise a two-pass parallel algorithm, such that the second pass is purely a triangular matrix multiplication involving exactly 1/3 of the total number of elementary operations (multiplying two single elements of the matrix). This is helpful in coarse-grained parallel computations since matrix multiplication is well parallelizable. A novel approach is used and as a first result a more efficient and simpler two-pass fine-grained algorithm is designed. The second result is a non-trivial transformation of this fine-grained algorithm into a coarse-grained (and more practical) version. The full proof of correctness of the transformation, which is presented in the appendices, is quite complex and is the hardest result of the paper. Our algorithms are specially structured to directly show the correspondence between the main fine-grained and the main coarse-grained operations.

Keywords: parallel algorithms
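
For readers unfamiliar with the transitive closure problem named in the abstract above, the following sketch shows the standard sequential Warshall algorithm on a Boolean adjacency matrix. It is a baseline only, not the GKT systolic algorithm and not the two-pass algorithm of the paper; the function name is an assumption for this example.

```python
def transitive_closure(adj):
    """Return the reachability matrix of a directed graph (Warshall).

    adj[i][j] is True when there is an edge from i to j.
    """
    n = len(adj)
    reach = [list(row) for row in adj]  # work on a copy
    for k in range(n):          # allow k as an intermediate vertex
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach
```

On the chain 0 -> 1 -> 2 the closure adds the pair (0, 2). The triple loop has the same shape as a matrix product over the Boolean semiring, which is the link to matrix multiplication that the abstract relies on.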

Computational experience with sequential and parallel, preconditioned Jacobi-Davidson for large, sparse symmetric matrices

JOURNAL OF COMPUTATIONAL PHYSICS, 2003, vol. 188, no. 1, pp. 318-331

Authors: Bergamaschi, L; Pini, G; Sartoretto, F. Affiliations: Univ Venice, Dipartimento Informat, I-30171 Mestre VE, Italy; Univ Padua, Dipartimento Metodi & Modelli Matemat Sci Applica, I-35131 Padua, Italy

The Jacobi-Davidson (JD) algorithm was recently proposed for evaluating a number of the eigenvalues of a matrix. JD goes beyond pure Krylov-space techniques; it cleverly expands its search space by solving the so-called correction equation, thus in principle providing a more powerful method. Preconditioning the Jacobi-Davidson correction equation is mandatory when large, sparse matrices are analyzed. We considered several preconditioners: classical block-Jacobi and IC(0), together with approximate inverse (AINV or FSAI) preconditioners. The rationale for using approximate inverse preconditioners is their high parallelization potential, combined with their efficiency in accelerating the iterative solution of the correction equation. Analysis was carried out on the sequential performance of preconditioned JD for the spectral decomposition of large, sparse matrices, which originate in the numerical integration of partial differential equations arising in physical and engineering problems. It was found that JD is highly sensitive to preconditioning, and it can display an irregular convergence behavior. We parallelized JD by data-splitting techniques, combining them with techniques to reduce the amount of communication data. Our own parallel, preconditioned code was executed on a dedicated parallel machine, and we present the results of our experiments. Our JD code provides an appreciable parallel degree of computation. Its performance was also compared with those of PARPACK and parallel DACG. (C) 2003 Elsevier Science B.V. All rights reserved.

Keywords: eigenvalues sparse approximate inverses parallel algorithms Jacobi-Davidson method

SPMD cluster-based parallel 3D OSEM

IEEE TRANSACTIONS ON NUCLEAR SCIENCE, 2003, vol. 50, no. 5, pp. 1498-1502

Authors: Jones, JP; Jones, WF; Kehren, F; Newport, DF; Reed, JH; Lenox, MW; Baker, K; Byars, LG; Michel, C; Casey, ME. Affiliations: CPS Innovat, Knoxville, TN 37932, USA; Concorde Microsyst, Knoxville, TN 37932, USA; Byars Consulting, Knoxville, TN 37932, USA

This study empirically compares two approaches to parallel 3-D OSEM that differ as to whether calculations are assigned to nodes by projection number or by transaxial plane number. For projection space decomposition (PSD), the forward projection is completely parallel, but backprojection requires a slow image synchronization. For image space decomposition (ISD), the communication associated with forward projection can be overlapped with calculation, and the communication associated with backprojection is more efficient. To compare these methods, an implementation of 3-D OSEM for three PET scanners is developed that runs on an experimental 9-node, 18-processor cluster computer. For selected benchmarks, both methods exhibit speedups in excess of eight or nine nodes, and comparable performance for the tested range of cluster sizes.

Keywords: biomedical image processing image reconstruction parallel algorithms parallel processing positron emission tomography
