We consider the problem of constructing binary heaps on constant degree networks performing compare-exchange operations only. The heap data structure, introduced by William and Williams [Comm. ACM 7 (6) (1964) 347-348], has many applications and has therefore been intensively studied in sequential and parallel contexts. In particular, Brodal and Pinotti [Theoret. Comput. Sci. 250 (2001) 235-245] have recently presented two families of comparator networks for constructing binary heaps of size N: the first of depth 4 log N and the second of size O(N log log N). In this note, we give a new construction of such a network with the running time improved to 3 log N. Moreover, the network has the novel property of being 3-periodic, that is, for each unit of time i the same sets of operations are performed in units i and i + 3. We then argue that our construction is optimal with respect to the length of the period, that is, we prove that no 2-periodic network can build a binary heap in sublinear time. Finally, we show that our construction can also be used to decrease the depth of the networks with O(N log log N) size. (C) 2001 Elsevier Science B.V. All rights reserved.
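The notions of a compare-exchange network and of k-periodicity used in the abstract above can be made concrete with a small sketch. It does not reproduce the 3-periodic construction from the note. As a stand-in it uses odd-even transposition sort, a well-known 2-periodic network that needs N rounds (linear time), and it relies on the fact that an ascending sorted array is a min-heap. The min-heap orientation and all function names (`compare_exchange`, `apply_network`, `is_periodic`, `is_min_heap`, `odd_even_transposition`) are assumptions made for illustration.

```python
def compare_exchange(values, i, j):
    """Place the smaller of values[i], values[j] at index i (assumes i < j)."""
    if values[i] > values[j]:
        values[i], values[j] = values[j], values[i]


def apply_network(values, layers):
    """Run a network given as a list of layers; each layer is a list of (i, j) pairs."""
    out = list(values)
    for layer in layers:
        for i, j in layer:
            compare_exchange(out, i, j)
    return out


def is_periodic(layers, k):
    """True if time units t and t + k perform the same set of operations."""
    return all(set(layers[t]) == set(layers[t + k])
               for t in range(len(layers) - k))


def is_min_heap(values):
    """Check the heap property for the implicit binary tree stored in an array."""
    return all(values[(c - 1) // 2] <= values[c] for c in range(1, len(values)))


def odd_even_transposition(n):
    """Stand-in network (not the paper's): n rounds alternating even and odd pairs."""
    even = [(i, i + 1) for i in range(0, n - 1, 2)]
    odd = [(i, i + 1) for i in range(1, n - 1, 2)]
    return [even if t % 2 == 0 else odd for t in range(n)]


layers = odd_even_transposition(7)
heap = apply_network([5, 3, 6, 1, 7, 2, 4], layers)
print(is_periodic(layers, 2), is_min_heap(heap))
```

This stand-in is 2-periodic but uses N rounds, which is in line with the abstract's claim that 2-periodic networks cannot build a heap in sublinear time.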
For the solution of linear systems of equations with unsymmetric coefficient matrices, we have proposed an improved version of the quasi-minimal residual (IQMR) method [Proceedings of The International Conference on High Performance Computing and Networking (HPCN-97) (1997); IEICE Trans Inform Syst E80-D (9) (1997) 919] that uses the Lanczos process as a major component and combines elements of numerical stability and parallel algorithm design. For the Lanczos process, stability is obtained by a coupled two-term procedure that generates Lanczos vectors scaled to unit length. The algorithm is derived so that all inner products and matrix-vector multiplications of a single iteration step are independent and the communication time required for inner products can be overlapped efficiently with computation time. In this paper, a theoretical model of the computation and communication phases is presented that allows a quantitative analysis of the parallel performance on a two-dimensional grid topology. The efficiency, speed-up, and runtime are expressed as functions of the number of processors scaled by the number of processors that gives the minimal runtime for the given problem size. The model not only evaluates effectively the improvements in performance due to communication reduction by overlapping, but also provides useful insight into the scalability of the IQMR method. The theoretical results on the performance are demonstrated by experimental timing results carried out on a massively parallel distributed memory Parsytec system. (C) 2002 Published by Elsevier Science Ltd.
The Linear Array with a Reconfigurable Pipelined Bus System (LARPBS) is a newly introduced parallel computational model, where processors are connected by a reconfigurable optical bus. In this paper, we show that the selection problem can be solved on the LARPBS model deterministically in O((log log N)^2 / log log log N) time. To the best of our knowledge, this is the best deterministic selection algorithm on any model with a reconfigurable optical bus.
Multiple addition is the problem of adding N b-bit integers. Prefix sums and multiple addition play fundamental roles in many algorithms, particularly on the reconfigurable mesh (R-Mesh). Scaling algorithms on the R-Mesh to run with the same or increased efficiency on fewer processors is a challenging and important proposition. In this paper, we present algorithms that scale with increasing efficiency for multiple addition, prefix sums, and matrix-vector multiplication. Along the way, we obtain an improved multiple addition algorithm. (C) 2001 Elsevier Science B.V. All rights reserved.
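To pin down the two problems named in the abstract above, here is a minimal sequential sketch. It says nothing about the R-Mesh algorithms themselves and only defines the inputs and outputs. The function names are hypothetical. The bit bound is a standard fact rather than a result of the paper: the sum of N b-bit integers is less than N * 2^b, so it fits in b + ceil(log2 N) bits.

```python
from itertools import accumulate


def prefix_sums(xs):
    """Return [xs[0], xs[0] + xs[1], ...], the inclusive prefix sums."""
    return list(accumulate(xs))


def multiple_addition(xs, b):
    """Add N non-negative integers, each required to fit in b bits."""
    if not all(0 <= x < (1 << b) for x in xs):
        raise ValueError("every input must be a b-bit non-negative integer")
    return sum(xs)


def max_result_bits(n, b):
    """Upper bound on the bits needed for the sum: b + ceil(log2 n), for n >= 1."""
    return b + (n - 1).bit_length()


values = [15, 15, 15]                      # three 4-bit integers
total = multiple_addition(values, 4)
print(prefix_sums(values), total, max_result_bits(len(values), 4))
```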
The effect of data allocation strategies on the running time of parallel Cholesky factorization algorithms on orthogonal multiprocessors has been studied. Four new strategies that give better running times are proposed and their time complexities are analyzed. Finally, it is shown that near-optimal performance can be obtained using two of our strategies. (C) 2002 Elsevier Science B.V. All rights reserved.
Hybrid metaheuristics have received considerable interest in recent years in the field of combinatorial optimization. A wide variety of hybrid approaches have been proposed in the literature. In this paper, a taxonomy of hybrid metaheuristics is presented in an attempt to provide a common terminology and classification mechanisms. The taxonomy, while presented in terms of metaheuristics, is also applicable to most types of heuristics and exact optimization algorithms. As an illustration of the usefulness of the taxonomy, an annotated bibliography is given which classifies a large number of hybrid approaches according to the taxonomy.
Gossiping is the communication problem in which each node has a unique message (token) to be transmitted to every other node. The nodes exchange their tokens by packets. A solution to the problem is judged by how many rounds of packet sending it requires. In this paper, we consider the version of the problem in which small-size packets (each carrying exactly one token) are used, the links (edges) of the network are half-duplex (only one packet can flow through a link at a time), and the nodes are all-port (a node's incident edges can all be active at the same time). This is also known as the H* model. We study the 2D square mesh and the 2D square torus. An improved, asymptotically optimal algorithm for the mesh and an optimal algorithm for the torus are presented.
Many parallel algorithms on the reconfigurable mesh have been developed so far. However, it is hard to understand the behavior of these parallel algorithms, mainly because the bus topology changes dynamically during the execution of an algorithm. In this work, we present the visual mesh system (VMesh), a tool for visualizing algorithms on the reconfigurable mesh. The main objective of the VMesh is to provide a comprehensive environment for algorithm visualization and development. The VMesh has proven to be a valuable tool for studying and understanding the behavior of parallel algorithms on the reconfigurable mesh.
In this paper we present a coarse-grained parallel algorithm for solving the string edit distance problem for a string A and all substrings of a string C. Our method is based on a novel CGM/BSP parallel dynamic programming technique for computing all highest scoring paths in a weighted grid graph. The algorithm requires log p rounds/supersteps and O((n^2/p) log m) local computation, where p is the number of processors and p^2 ≤ m ≤ n. To our knowledge, this is the first efficient CGM/BSP algorithm for the alignment of all substrings of C with A. Furthermore, the CGM/BSP parallel dynamic programming technique presented is of interest in its own right and we expect it to lead to other parallel dynamic programming methods for the CGM/BSP.
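The problem statement in the abstract above (edit distance between A and every substring of C) can be illustrated with a naive sequential baseline. This is not the CGM/BSP algorithm and does not use the highest-scoring-paths technique: it simply runs the textbook dynamic program once per substring. Unit costs for insertion, deletion, and substitution are an assumption, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Classic dynamic program with unit costs, keeping one row at a time."""
    prev = list(range(len(b) + 1))          # distance from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb))) # match or substitute
        prev = cur
    return prev[-1]


def all_substring_distances(a, c):
    """Map (i, j) to the edit distance between a and the non-empty substring c[i:j]."""
    return {(i, j): edit_distance(a, c[i:j])
            for i in range(len(c))
            for j in range(i + 1, len(c) + 1)}


distances = all_substring_distances("ab", "abc")
print(distances[(0, 2)], distances[(0, 3)], distances[(1, 3)])
```

This baseline recomputes the table for every substring, which is exactly the redundancy that an all-substrings technique is meant to avoid.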
We present a new parallel computation model that enables the design of resource-optimal scalable parallel algorithms and simplifies their analysis. The model rests on the novel idea of incorporating relative optimalit...