Parallel adaptive algorithms for the approximation of a multi-dimensional integral over a hyper-rectangular region are described. Algorithms with a centralized global region collection are compared to algorithms using local region collections. The latter algorithms should result in better scalability since global communication is avoided. Both types of algorithms are compared to quasi-Monte Carlo integration. Tests are performed using Genz's test functions and speed-up results are given.
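As a rough illustration of what a global region collection does, the following is a minimal sequential sketch, not the algorithms of the paper. It assumes a crude midpoint rule with a halving-based error estimate instead of a proper cubature rule, and the names `adaptive_integrate`, `estimate`, `halves`, and `midpoint` are hypothetical. All regions live in one priority queue, and the region with the largest error estimate is always refined next.

```python
import heapq
import itertools
import math


def midpoint(f, lo, hi):
    """Midpoint-rule estimate of the integral of f over the box [lo, hi]."""
    mid = [(a + b) / 2 for a, b in zip(lo, hi)]
    return f(mid) * math.prod(b - a for a, b in zip(lo, hi))


def halves(lo, hi):
    """Split the box in two along its longest edge."""
    k = max(range(len(lo)), key=lambda i: hi[i] - lo[i])
    m = (lo[k] + hi[k]) / 2
    return (lo, hi[:k] + [m] + hi[k + 1:]), (lo[:k] + [m] + lo[k + 1:], hi)


def estimate(f, lo, hi):
    """Return (value, error estimate); the error is |fine - coarse|."""
    fine = sum(midpoint(f, a, b) for a, b in halves(lo, hi))
    return fine, abs(fine - midpoint(f, lo, hi))


def adaptive_integrate(f, lo, hi, tol=1e-6, max_regions=2000):
    """Globally adaptive integration with a single (global) region heap."""
    tick = itertools.count()  # tie-breaker so the heap never compares boxes
    val, err = estimate(f, list(lo), list(hi))
    heap = [(-err, next(tick), list(lo), list(hi), val)]
    total, total_err = val, err
    while total_err > tol and len(heap) < max_regions:
        # Pop the region with the largest error estimate and refine it.
        neg_err, _, rlo, rhi, rval = heapq.heappop(heap)
        total -= rval
        total_err += neg_err
        for a, b in halves(rlo, rhi):
            v, e = estimate(f, a, b)
            heapq.heappush(heap, (-e, next(tick), a, b, v))
            total += v
            total_err += e
    return total, total_err
```

In a parallel setting, the single heap is the point of contention: every worker must communicate with it, which is why local region collections are expected to scale better.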
Issues of implementation of an object-oriented library for parallel interior-point methods are addressed. The solver can easily exploit any special structure of the underlying optimization problem. In particular, it allows a nested embedding of structures, and by this means very complicated real-life optimization problems can be modelled. The efficiency of the solver is illustrated on several problems arising in the optimization of networks. The sequential implementation outperforms the state-of-the-art commercial optimization software. The parallel implementation achieves speed-ups of about 3.1-3.9 on 4-processor parallel systems and speed-ups of about 10-12 on 16-processor parallel systems.
Recent developments in magnetic disk technology have made stored-integral techniques competitive with the currently more widely used direct methods, which involve the recalculation of the basic two-electron integrals. We present efficient conventional (all integrals stored) and semidirect Hartree-Fock and DFT algorithms with data compression for single-processor and distributed-memory parallel computers, and compare them with the corresponding direct algorithms. On inexpensive modern personal computer-based hardware, the stored-integral method is up to three times more efficient than the direct method in terms of total elapsed job time. (C) 2002 Wiley Periodicals, Inc.
We derive an efficient parallel algorithm to find all occurrences of a pattern string in a subject string in O(log n) time, where n is the length of the subject string. The number of processors employed is of the order of the product of the two string lengths. The theory of powerlists [J. Kornerup, PhD Thesis, 1997; J. Misra, ACM Trans. Programming Languages Systems 16 (16) (1994) 1737-1740] is central to the development of the algorithm and its algebraic manipulations. (C) 2002 Elsevier Science B.V. All rights reserved.
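To make the processor and time bounds concrete, here is a minimal sequential simulation. It is a sketch under assumptions and not the powerlist derivation of the paper: one conceptual processor per (position, pattern index) pair does a single character comparison, and the results for each position are then combined by a balanced AND-reduction. The function name `find_occurrences` is hypothetical.

```python
def find_occurrences(subject, pattern):
    """Return (match positions, number of reduction rounds).

    Simulates len(subject) * len(pattern) conceptual processors: each one
    compares a single character pair, then pairwise AND-reduction halves
    every row per round, so the round count is ceil(log2(len(pattern))).
    """
    n, m = len(subject), len(pattern)
    if m == 0 or m > n:
        return [], 0
    # Step 1 (one parallel step): all character comparisons at once.
    rows = [[subject[i + j] == pattern[j] for j in range(m)]
            for i in range(n - m + 1)]
    # Step 2: tree reduction; every round could run fully in parallel.
    rounds = 0
    while len(rows[0]) > 1:
        rows = [[all(r[k:k + 2]) for k in range(0, len(r), 2)] for r in rows]
        rounds += 1
    return [i for i, r in enumerate(rows) if r[0]], rounds
```

Since the pattern is no longer than the subject, the number of rounds is bounded by a logarithm of n, which is consistent with the O(log n) bound stated above.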
In awari, a two-person game of pure skill, players sow stones into pits on a board. The game's rules define how to capture stones, and the player who captures the most wins the game. For more than a decade, researchers have studied computerized techniques to play awari. The authors have now solved the game by determining the score of 889,063,398,406 board positions and storing them in databases. They performed the necessary computations on a 144-processor parallel computer with 72 gigabytes of main memory and a fast Myrinet interconnect.
Monte Carlo computations are considered easy to parallelize. However, the results can be adversely affected by defects in the parallel pseudorandom number generator used. A parallel pseudorandom number generator must be tested for two types of correlations: (i) intra-stream correlation, as for any sequential generator, and (ii) inter-stream correlation, that is, correlation between random number streams on different processes. Since bounds on these correlations are difficult to prove mathematically, large and thorough empirical tests are necessary. Many of the popular pseudorandom number generators in use today were tested when computational power was much lower, and hence they were evaluated with much smaller test sizes. This paper describes several tests of pseudorandom number generators, both statistical and application-based. We show defects in several popular generators. We describe the implementation of these tests in the SPRNG [ACM Trans. Math. Software 26 (2000) 436; SPRNG-scalable parallel random number generators. SPRNG 1.0-http://www.ncsa.uiuc.edu/Apps/SPRNG; SPRNG 2.0-http://sprng.cs.fsu.edu] test suite and also present results for the tests conducted on the SPRNG generators. These generators have passed some of the largest empirical random number tests. (C) 2002 Elsevier Science B.V. All rights reserved.
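The two kinds of correlation can be illustrated with a small sketch. It is not one of the SPRNG tests: it assumes Python's standard `random.Random` as the generator, treats differently seeded instances as separate streams, and uses a plain Pearson coefficient, which is far weaker than the statistical and application-based tests described above. The function names are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def inter_stream_correlation(seed_a, seed_b, n=10_000):
    """Correlation between two streams (here: two differently seeded RNGs)."""
    rng_a, rng_b = random.Random(seed_a), random.Random(seed_b)
    return pearson([rng_a.random() for _ in range(n)],
                   [rng_b.random() for _ in range(n)])


def intra_stream_correlation(seed, lag=1, n=10_000):
    """Correlation between a stream and itself shifted by `lag` draws."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n + lag)]
    return pearson(xs[:-lag], xs[lag:])
```

For a healthy generator both coefficients should be close to zero, with a spread on the order of 1/sqrt(n); a value far outside that range would point to a defect worth investigating with stronger tests.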
When solving time-dependent partial differential equations on parallel computers using the nonoverlapping domain decomposition method, one often needs numerical boundary conditions on the boundaries between subdomains. These numerical boundary conditions can significantly affect the stability and accuracy of the final algorithm. In this paper, a stability and accuracy analysis of the existing methods for generating numerical boundary conditions is presented, and a new approach based on explicit predictors and implicit correctors is used to solve convection-diffusion equations on parallel computers, with application to aerospace engineering for the solution of Euler equations in computational fluid dynamics simulations. Both theoretical analyses and numerical results demonstrate significant improvement in stability and accuracy by using the new approach. (C) 2003 Elsevier Science Ltd. All rights reserved.
Given n values x_1, x_2, ..., x_n and an associative binary operation ⊗, the prefix problem is to compute x_1 ⊗ x_2 ⊗ ... ⊗ x_i for 1 ≤ i ≤ n. Prefix circuits are combinational circuits for solving the prefix problem. For any n-input prefix circuit D with depth d and size s, if d + s = 2n - 2, then D is depth-size optimal. In general, a prefix circuit with a small depth is faster than one with a large depth. For prefix circuits with the same depth, a prefix circuit with a smaller fan-out occupies less area and is faster in VLSI implementation. This paper is on constructing parallel prefix circuits that are depth-size optimal with small depth and small fan-out. We construct a depth-size optimal prefix circuit H4 with fan-out 4. It has the smallest depth among all known depth-size optimal prefix circuits with a constant fan-out; furthermore, when n ≥ 136, its depth is less than or equal to those of all known depth-size optimal prefix circuits with unlimited fan-out. A size lower bound of prefix circuits is also derived. Some properties related to depth-size optimality and size optimality are introduced; they are used to prove that H4 is depth-size optimal.
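The depth-size trade-off can be seen on two textbook circuits. The sketch below does not construct H4; it simulates a serial prefix circuit and a doubling scan in the style of Kogge-Stone, and counts the operation nodes (size) and levels (depth) of each. The function names are hypothetical, and a non-empty input is assumed.

```python
def serial_prefix(xs, op):
    """Serial prefix circuit: returns (prefixes, depth, size).

    Each output after the first needs exactly one operation node that
    depends on the previous output, so depth = size = n - 1.
    """
    out = [xs[0]]
    for x in xs[1:]:
        out.append(op(out[-1], x))
    return out, len(xs) - 1, len(xs) - 1


def doubling_prefix(xs, op):
    """Doubling scan (Kogge-Stone style): returns (prefixes, depth, size).

    At the level with offset `shift`, position i combines the partial
    result ending at i - shift with its own, preserving operand order.
    """
    out = list(xs)
    n = len(out)
    depth = size = 0
    shift = 1
    while shift < n:
        prev = out[:]  # all nodes of one level read the previous level
        for i in range(shift, n):
            out[i] = op(prev[i - shift], prev[i])
            size += 1
        depth += 1
        shift *= 2
    return out, depth, size


def is_depth_size_optimal(n, depth, size):
    """Check the criterion d + s = 2n - 2 quoted in the abstract."""
    return depth + size == 2 * n - 2
```

For n = 8 the serial circuit has depth 7 and size 7, so it satisfies d + s = 2n - 2 but is slow, while the doubling scan reaches depth 3 at the cost of 17 operation nodes and is therefore not depth-size optimal. Circuits such as H4 aim for small depth while keeping the sum at 2n - 2.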
This paper explores the prefix operation on a message-passing fully connected multicomputer with multiport postal communication. We present an exact communication lower bound for the prefix operation on the model. Two efficient parallel prefix algorithms are also presented; they are optimal in terms of the number of communication steps. For an input of size n, one of the algorithms using n processors is also time-optimal; the other algorithm using p < n processors can be cost-optimal and can achieve linear speedup.