Summary form only given. The design of algorithms exhibiting a high degree of temporal and spatial locality of reference is crucial to attain good performance on current and foreseeable computing systems featuring ever deeper memory hierarchies. Previous work has demonstrated that task parallelism can be efficiently transformed into locality of reference in two-level hierarchies. Recently, we moved a step forward and showed how the more structured type of parallelism exposed by submachine locality can be efficiently turned into temporal locality on arbitrarily deep hierarchies. We complete and extend the above result by encompassing also spatial locality. Specifically, we present a scheme to simulate parallel algorithms designed for the decomposable BSP (a BSP variant which captures submachine locality) on the hierarchical memory model with block transfer. The simulation yields good hierarchy-conscious sequential algorithms from parallel ones, and provides evidence of the strict relation between submachine locality in parallel computation and locality of reference (both temporal and spatial) in the hierarchical memory setting.
Hybrid adders, which combine a sparse carry-lookahead tree with a carry-select output stage, are a well-known implementation form of high-speed adders. In this paper, a hybrid Ling carry-select adder is presented. It is shown how a carry-select output stage can be used to eliminate the entire conversion of all pseudo-carries. The adder is implemented in enhanced multiple output domino logic (EMODL). A technique is presented to avoid false discharge paths, which impair EMODL, in the sum selection multiplexer.
ISBN: 0769521355 (print)
Given an array of positive and negative values, we consider the problem of K maximum sums. When an overlapping property needs to be observed, previous algorithms for the maximum sum are not directly applicable. We designed an O(K * n) algorithm for the K maximum subsequences problem. This was then modified to solve the K maximum subarrays problem in O(K * n^3) time. Finally, we present a VLSI K maximum subarrays algorithm with O(K * n) steps and a circuit size of O(n^2), which is cost-optimal in parallelisation of the sequential algorithm.
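To make the one-dimensional problem statement concrete, here is a minimal brute-force sketch in Python. It is not the O(K * n) algorithm from the abstract: it simply enumerates every contiguous segment through prefix sums, in O(n^2) time, and keeps the K largest sums, with overlapping segments allowed. The function name `k_maximum_segment_sums` is hypothetical.

```python
import heapq
from itertools import accumulate


def k_maximum_segment_sums(values, k):
    """Return the k largest sums over all contiguous segments of `values`.

    Brute-force baseline (an assumption for illustration, not the paper's
    method): segments may overlap, and all O(n^2) segments are examined.
    """
    # prefix[j] - prefix[i] is the sum of values[i:j].
    prefix = [0, *accumulate(values)]
    n = len(values)
    sums = (
        prefix[j] - prefix[i]
        for i in range(n)
        for j in range(i + 1, n + 1)
    )
    # nlargest returns the sums in descending order.
    return heapq.nlargest(k, sums)
```

A baseline like this is mainly useful as a reference against which a faster implementation can be checked on small inputs.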
ISBN: 9780780384309 (print)
In this paper, we propose a new algorithm, named Distributed Max-Miner (DMM), for mining maximal frequent itemsets from databases. A frequent itemset is maximal if none of its supersets is frequent. DMM requires very low communication and synchronization overhead in distributed computing systems. DMM has two phases: a local mining phase and a global mining phase. During the local mining phase, each node mines its local database to discover the local maximal frequent itemsets; together these form a set of maximal candidate itemsets for the top-down search in the subsequent global mining phase. A new prefix-tree data structure is developed to facilitate the storage and counting of the global candidate itemsets of different sizes. The global mining phase using the prefix-tree can work with any local mining algorithm. We implemented DMM on a cluster of workstations and evaluated its performance for various cases. DMM demonstrates better performance than other sequential and parallel algorithms, and its performance is quite scalable, even when there are large maximal frequent itemsets (i.e., long patterns) in databases.
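The definition of a maximal frequent itemset can be illustrated with a small Python sketch. This is a single-machine brute-force enumeration, not DMM and not its prefix-tree structure; the function names and the absolute `min_support` count are assumptions made for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """List every itemset contained in at least `min_support` transactions."""
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        level = [
            frozenset(candidate)
            for candidate in combinations(items, size)
            if sum(1 for t in transactions if set(candidate) <= t) >= min_support
        ]
        if not level:
            # Support is anti-monotone: if no itemset of this size is
            # frequent, no larger itemset can be frequent either.
            break
        frequent.extend(level)
    return frequent


def maximal_frequent_itemsets(transactions, min_support):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    frequent = frequent_itemsets(transactions, min_support)
    return [s for s in frequent if not any(s < other for other in frequent)]
```

For example, with transactions `{a, b, c}`, `{a, b}`, `{a, c}`, `{b, d}` and a minimum support of 2, the frequent itemsets are `{a}`, `{b}`, `{c}`, `{a, b}` and `{a, c}`, of which only `{a, b}` and `{a, c}` are maximal.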
Summary form only given. In recent years, there has been a huge development of low-cost, large-scale parallel systems. The design of efficient parallel algorithms has to be reconsidered to take into account the new parameters of such execution platforms, which are characterized by a larger number of heterogeneous processors, often organized as hierarchical subsystems. Alternative computational models have been designed to take these new characteristics into account. The parallel tasks model (PT for short) is a promising alternative for scheduling parallel applications. Another way of looking at the problem (which is in some sense a dual view) is the divisible load model (DL), where an application is considered as a collection of a large number of elementary, sequential computing units. These two new views of the problem allow us to consider communications implicitly or to mask them, leading to more tractable problems. This paper first presents some approximation algorithms for the PT model, with a special emphasis on new execution platforms. We then show how to mix these results with the DL model to manage the resources of an actual computational grid of 600 processors.
In this paper a model of a versatile associative graph processor called AGP is proposed. The model can work both in bit-serial and in bit-parallel mode and enables simultaneous search for a set of comparands and selection of the search types. In addition it has some built-in operations designed for associative graph algorithms. The selected functions and basic procedures of this model are described and its possible architecture is discussed.
Summary form only given. Evolutionary algorithms (EAs) are applied to solve the radio network design problem (RND). The task is to find the best set of transmitter locations in order to cover a given geographical region at an optimal cost. Usually, parallel EAs are needed in order to cope with the high computational requirements of such a problem. Here, we try to develop and evaluate a set of sequential and parallel genetic algorithms (GAs) in order to solve efficiently the RND problem. The results show that our distributed steady state GA is an efficient and accurate tool for solving RND that even outperforms existing parallel solutions. The sequential algorithm performs very efficiently from a numerical point of view, although the distributed version is much faster, with an observed linear speedup.
Wavelet analysis has received considerable interest in recent years because of its efficiency in several practical applications. Wavelet-based image processing is considered one of the most powerful methods and provides good-quality results. However, its implementation may be too time-consuming, depending on the problem size. Parallel processing can be a solution to speed up wavelet transformation programs. In this context, and in order to have a fast image compression/decompression program based on the 1D wavelet transformation, we have designed three parallel algorithms that were implemented on an IBM RS6000/SP machine. The first parallel algorithm exploits control parallelism; it was developed with OpenMP and executed on one four-processor node. The other two exploit data parallelism and were developed with MPI. Finally, we present an evaluation of these algorithms based on an experimental study.
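As a rough illustration of why a 1D wavelet transformation lends itself to data parallelism, the sketch below implements one level of the orthonormal Haar transform in Python. The abstract does not say which wavelet the authors used, so the choice of Haar and the function names are assumptions for the example. Each output pair depends only on one input pair, so the pairs could in principle be distributed across processors.

```python
import math


def haar_step(signal):
    """One level of the orthonormal 1D Haar transform.

    Returns (approximation, detail) coefficient lists, each half as long
    as `signal`. The input length must be even.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2)
    # Every pair (signal[i], signal[i + 1]) is processed independently.
    approx = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert `haar_step`, rebuilding the original samples pair by pair."""
    scale = math.sqrt(2)
    signal = []
    for a, d in zip(approx, detail):
        signal.extend(((a + d) / scale, (a - d) / scale))
    return signal
```

Compression schemes built on such a transform typically discard or quantize small detail coefficients before applying the inverse step.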
The BAOR (block accelerated over-relaxation) method, now commonly used to solve engineering problems involving block tridiagonal coefficient matrices, is not suitable for parallel computing. We propose a parallel algorithm that, like BAOR, converges well but, unlike BAOR, is suitable for parallel computing. We explain why BAOR is not suitable for parallel computing; this understanding helps us make our algorithm suitable for it. We give one illustrative example. The iteration time needed by our algorithm is roughly the same as that needed by BAOR. These preliminary results indicate that our algorithm is effective and feasible.
Summary form only given. In multiprogrammed systems, synchronization often turns out to be a performance bottleneck and the source of poor fault-tolerance. Wait-free and lock-free algorithms can do without locking mechanisms, and therefore do not suffer from these problems. We present an efficient almost wait-free algorithm for parallel accessible hashtables, which promises more robust performance and reliability than conventional lock-based implementations. Our solution is as efficient as sequential hashtables. It can easily be implemented using C-like languages and requires on average only constant time for insertion, deletion or accessing of elements. The algorithm allows the hashtables to grow and shrink when needed. A true problem of wait-free and lock-free algorithms is that they are hard to design correctly, even when apparently straightforward. The reason for this is that processes can execute all statements in every conceivable order. Since our algorithm is quite large and rather complex, we turned to the interactive theorem prover PVS to prove safety of our algorithm, which we could not have done reliably by hand. To our knowledge no algorithms of comparable complexity have ever been mechanically verified. Wait-freedom is shown informally.