In this paper we present new results on the approximate parallel construction of Huffman codes. Our algorithm achieves linear work and logarithmic time, provided that the initial set of elements is sorted. This is the...
详细信息
ISBN:
(纸本)3540438645
In this paper we present new results on the approximate parallel construction of Huffman codes. Our algorithm achieves linear work and logarithmic time, provided that the initial set of elements is sorted. This is the first parallel algorithm for that problem with the optimal time and work. Combining our approach with the best known parallel sorting algorithms we can construct an almost optimal Huffman tree with optimal time and work. This also leads to the first parallel algorithm that constructs exact Huffman codes with maximum codeword length H in time O(H) and with n processors. This represents a useful improvement since most practical situations satisfy H = O(log n).
We present a significant improvement for parallel integer sorting. On the EREW (exclusive read exclusive write) PRAM our algorithm sorts n integers in the range {0, 1,..., m 1} in time O(log n) with O(n(q) (log n) ove...
详细信息
ISBN:
(纸本)0898714346
We present a significant improvement for parallel integer sorting. On the EREW (exclusive read exclusive write) PRAM our algorithm sorts n integers in the range {0, 1,..., m 1} in time O(log n) with O(n(q) (log n) over bar /k) operations using word length k log( m + n), where 1 less than or equal to k less than or equal to log n. In this paper we present the following four variants of our algorithm. (1) The first variant sorts integers in {0, 1,..., m - 1} in time O(log n) and in linear space with O(n) operations using word length log m log n. (2) The second variant sorts integers in {0, 1,..., n - 1} in time O ( log n) and in linear space with O(n rootlog n) operations using word length log n. (3) The third variant sorts integers in {0, 1,..., m - 1} in time O(log(3)/(2)n) and in linear space with O(n rootlog n) operations using word length log(m + n). (4) The fourth variant sorts integers in {0, 1,..., m - 1} in time O(log n) and space O(nm(epsilon)) with O(n rootlog n) operations using word length log( m + n). Our algorithms can then be generalized to the situation where the word length is k log( m + n), 1 less than or equal to k less than or equal to log n.
Multiprocessor architectures with few but powerful processors gain more and more popularity. In this paper we describe a parallel iterative algorithm to perform randomization for a continuous time Markov, chain with a...
详细信息
ISBN:
(纸本)0769515975
Multiprocessor architectures with few but powerful processors gain more and more popularity. In this paper we describe a parallel iterative algorithm to perform randomization for a continuous time Markov, chain with a Kronecker representation on a shared memory architecture. The Kronecker representation is modified for a parallel matrix-vector multiplication with a fast multiplication scheme and no write conflicts on iteration vectors. The proposed technique is applied on a model of a workstation cluster for dependability analysis, corresponding computations are performed on two multiprocessor architectures, a Sun enterprise and a SGI Origin 2000 to measure its performance.
Let T e an edge-weighted tree. A p-core of T is a set of p mutually disjoint paths in T that minimizes the sum of the distances of all vertices in T from any of the p paths, where p greater than or equal to 1 is an in...
详细信息
Let T e an edge-weighted tree. A p-core of T is a set of p mutually disjoint paths in T that minimizes the sum of the distances of all vertices in T from any of the p paths, where p greater than or equal to 1 is an integer. In this paper, an O(n) time algorithm is proposed for the case p = 2, where n is the number of vertices in T. Our algorithm improves the two O(n(2)) time algorithms previously proposed by Becker and Perl Discrete Appl. Math., 11 (1985), pp. 103-113]. With some modi cations, the proposed algorithm can e implemented on the EREW PRAM in O (log(2) n) time using O (n log n) work.
We consider the problem of constructing binary heaps on constant degree networks performing compare-exchange operations only. The heap data structure, introduced by William and Williams [Comm. ACM 7 (6) (1964) 347-348...
详细信息
We consider the problem of constructing binary heaps on constant degree networks performing compare-exchange operations only. The heap data structure, introduced by William and Williams [Comm. ACM 7 (6) (1964) 347-348], has many applications and, therefore, has been intensively studied in sequential and parallel context. In particular, Brodal and Pinotti [Theoret. Comput. Sci. 250 (2001) 235-245] have recently presented two families of comparator networks: the first of depth 4 log N and the second of size O(N log log N) for constructing binary heaps of size N. In this note, we give an new construction of such a network with the running time improved to 3 log N. Moreover, the network has a novel property of being 3-periodic, that is, for each unit of time i the same sets of operations are performed in units i and i + 3. Then we argue that our construction is optimal with respect to the length of the period, that is, we prove that there is no 2-periodic network that is able to build a binary heap in sublinear time. Finally, we show that our construction can be used to decrease also the depth of the networks with O(N log log N) size. (C) 2001 Elsevier Science B.V. All rights reserved.
For the solutions of linear systems of equations with unsymmetric coefficient matrices, we have proposed an improved version of the quasi-minimal residual (IQMR) method [Proceedings of The International Conference on ...
详细信息
For the solutions of linear systems of equations with unsymmetric coefficient matrices, we have proposed an improved version of the quasi-minimal residual (IQMR) method [Proceedings of The International Conference on High Performance Computing and Networking (HPCN-97) (1997);IEICE Trans Inform Syst E80-D (9) (1997) 919] by using the Lanczos process as a major component combining elements of numerical stability and parallel algorithm design. For the Lanczos process, stability is obtained by a coupled two-term procedure that generates Lanczos vectors scaled to unit length. The algorithm is derived so that all inner products and matrix-vector multiplications of a single iteration step are independent and the communication time required for inner product can be overlapped efficiently with computation time. In this paper, a theoretical model of computation and communications phases is presented to allow us to give a quantitative analysis of the parallel performance with a two-dimensional grid topology. The efficiency, speed-up, and runtime are expressed as functions of the number of processors scaled by the number of processors that gives the minimal runtime for the given problem size. The model not only evaluates effectively the improvements in performance due to communication reduction by overlapping, but also provides useful insight into the scalability of the IQMR method. The theoretical results on the performance are demonstrated by experimental timing results carried out on a massively parallel distributed memory Parsytec system. (C) 2002 Published by Elsevier Science Ltd.
The Linear Array with a Reconfigurable Pipelined Bus System (LARPBS) is a newly introduced parallel computational model, where processors are connected by a reconfigurable optical bus. In this paper, we show that the ...
详细信息
The Linear Array with a Reconfigurable Pipelined Bus System (LARPBS) is a newly introduced parallel computational model, where processors are connected by a reconfigurable optical bus. In this paper, we show that the selection problem can be solved on the LARPBS model deterministically in O((log log N)(2)/log log jog N) time. To our best knowledge, this is the best deterministic selection algorithm on any model with a reconfigurable optical bus.
Multiple addition is the problem of adding N b-bit integers. Prefix sums and multiple addition play fundamental roles in many algorithms, particularly on the reconfigurable mesh (R-Mesh). Scaling algorithms on the R-M...
详细信息
Multiple addition is the problem of adding N b-bit integers. Prefix sums and multiple addition play fundamental roles in many algorithms, particularly on the reconfigurable mesh (R-Mesh). Scaling algorithms on the R-Mesh to run with the same or increased efficiency on fewer processors is a challenging and important proposition. In this paper. we present algorithms that scale with increasing efficiency for multiple addition, prefix sums. and matrix-vector multiplication. Along the way. we obtain an improved multiple addition algorithm. (C) 2001 Elsevier Science B.V. All rights reserved.
The effect of data allocation strategies on the running time of parallel Cholesky factorization algorithms on orthogonal multiprocessors has been studied. Four new strategies which give better running time are propose...
详细信息
The effect of data allocation strategies on the running time of parallel Cholesky factorization algorithms on orthogonal multiprocessors has been studied. Four new strategies which give better running time are proposed and their time complexities are analyzed. Finally it is shown that near optimal performance can be obtained using two of our strategies. (C) 2002 Elsevier Science B.V. All rights reserved.
Gossiping is the communication problem in which each node has a unique message (token) to be transmitted to every other node. The nodes exchange their tokens by packets. A solution to the problem is judged by how many...
详细信息
Gossiping is the communication problem in which each node has a unique message (token) to be transmitted to every other node. The nodes exchange their tokens by packets. A solution to the problem is judged by how many rounds of packet sending it requires. In this paper, we consider the version of the problem in which small-size packets (each carrying exactly one token) are used, the links (edges) of the network are half-duplex (only one packet can flow through a link at a time), and the nodes are all-port (a node's incident edges can all be active at the same time). This is also known as the H* model. We study the 2D square mesh and the 2D square torus. An improved, asymptotically optimal algorithm for the mesh and an optimal algorithm for the torus are presented.
暂无评论