Efficient, numerically stable and highly concurrent algorithms for adaptive multichannel least-squares lattice filters are presented. The require O(pm/sup 2/) orthogonal operations to update the linear prediction and ...
详细信息
Efficient, numerically stable and highly concurrent algorithms for adaptive multichannel least-squares lattice filters are presented. The require O(pm/sup 2/) orthogonal operations to update the linear prediction and estimation errors as well as all reflection coefficients for a m-channel and p-tap problem. parallel array architectures for high-speed implementations are discussed.< >
The authors describe the development of parallel implementations for the bin packing problem. Bin packing algorithms are studied to understand various resource allocation issues and the impact of the different packing...
详细信息
The authors describe the development of parallel implementations for the bin packing problem. Bin packing algorithms are studied to understand various resource allocation issues and the impact of the different packing heuristics on the packing efficiency. The seemingly serial nature of the bin packing simulation has prohibited previous experimentations from going beyond sizes of several thousands bins. The authors show that by adopting fairly simple data parallel algorithms a linear speedup is possible. Sizes of up to hundreds of thousands of bins have been simulated for different parameters and heuristics. The authors focussed on the well known first-fit heuristic. They also considered another potentially superior heuristic, k-delayed best fit.< >
Data formats which can offer a full range of tradeoffs between bit-serial and bit-parallel computation are described. The design of arithmetic operators based on these data formats, especially a digit-serial multiplie...
详细信息
Data formats which can offer a full range of tradeoffs between bit-serial and bit-parallel computation are described. The design of arithmetic operators based on these data formats, especially a digit-serial multiplier is presented. It is shown that these data formats can combine the advantage of limited interconnection and small chip size of bit-serial computation with the advantage of throughput of parallel computation and that they lead to the most efficient pipelined implementation of the desired function.< >
The use of star graphs as a viable interconnection scheme for parallel computers has been examined by a number of authors. An attractive feature of this class of graphs is that it has sublogarithmic diameter and has a...
详细信息
The use of star graphs as a viable interconnection scheme for parallel computers has been examined by a number of authors. An attractive feature of this class of graphs is that it has sublogarithmic diameter and has a great deal of symmetry akin to the binary hypercube. The authors describe a new class of algorithms for embedding Hamiltonian cycle, the set of all even cycles and a variety of two and multi-dimensional grids in a star graph. They derive an algorithm for the ranking and the unranking problem with respect to the Hamiltonian cycle.< >
The paper identifies two barrier synchronization techniques that are appropriate for nonuniform memory architectures. The first of these two, the dissemination barrier, is based on a progressive skip-ring of flags. Th...
详细信息
The paper identifies two barrier synchronization techniques that are appropriate for nonuniform memory architectures. The first of these two, the dissemination barrier, is based on a progressive skip-ring of flags. The second, called the reflected-tree barrier, is based on software combining trees. The dissemination barrier is optimal in terms of the number of barrier stages required by the algorithm. The reflected-tree barrier, however, is optimal in the number of remote network operations required for algorithm completion. These two are compared by means of probabilistic simulations up to 4096 nodes and actual execution results up to 90 nodes on a BBN GP-1000. These results show that the reflected-tree barrier is significantly better on 30 nodes or more.< >
The authors describe a parallel algorithm and architecture for implementing the LZ technique for data compression. Data compression is the reduction of redundancy in data representation in order to decrease storage an...
详细信息
The authors describe a parallel algorithm and architecture for implementing the LZ technique for data compression. Data compression is the reduction of redundancy in data representation in order to decrease storage and communication costs. The LZ-based compression method is a very powerful technique and gives very high compression efficiency for text as well as image data. The proposed architecture is systolic and uses the principles of pipelining and parallelism in order to obtain high speed and throughput. The order of complexity of the computations is reduced from n/sup 2/ to n. The data compression hardware can be integrated into real time systems so that data can be compressed and decompressed on-the-fly. The basic processor cell for the systolic array is currently being implemented using CMOS VLSI technology.< >
The overall architecture of the European Declarative System (EDS), ESPRIT II Project 2025 is briefly described. The aim of the project is to design and implement both the hardware and software for a parallel informati...
详细信息
The overall architecture of the European Declarative System (EDS), ESPRIT II Project 2025 is briefly described. The aim of the project is to design and implement both the hardware and software for a parallel information server. The final system will be used to develop advanced business information systems. The EDS machine is designed to be a high speed backend accelerator, primarily for data/knowledge base applications, serving a range of host computers. It is a message passing multiprocessor with a distributed store, commonly referred to as a shared nothing architecture. Whilst shared nothing architectures are not novel, their application to declarative/symbolic languages is relatively new. The authors describe the EDS software activities at ECRC. A conclusion reporting the current status of the project is given.< >
The Warwick Pyramid Machine, an architecture consisting of both SIMD and MIMD parts in a multiple SIMD organization which can operate effectively at all levels of an image analysis problem, is described. The hardware ...
详细信息
The Warwick Pyramid Machine, an architecture consisting of both SIMD and MIMD parts in a multiple SIMD organization which can operate effectively at all levels of an image analysis problem, is described. The hardware design is discussed, along with programming issues. Several image analysis algorithms that have been mapped onto the architecture are presented.< >
The authors present two EREW PRAM algorithms and one CREW PRAM algorithm for solving set recurrence equations of the type commonly used in dynamic programming solutions to many problems in pattern matching, sequence c...
详细信息
The authors present two EREW PRAM algorithms and one CREW PRAM algorithm for solving set recurrence equations of the type commonly used in dynamic programming solutions to many problems in pattern matching, sequence comparison, and language recognition. All three algorithms run in O(log/sup 2/n) time. The first EREW PRAM algorithm uses O(M(n/log n)log/sup 2/n) processors, where M(n) is the number of processors needed to multiply two n*n Boolean matrices in O(log n) time on an EREW PRAM. The second algorithm uses O(M'(n/(log n square root loglog n))log/sup 2/n) processors, where M'(n) is the number of processors needed to multiply two n*n matrices over an arbitrary ring in O(log n) time on an EREW PRAM. The CREW PRAM algorithm uses O(M"(n/log/sup 1.5/n)log/sup 2/n) processors, where M"(n) is the number of processors needed to multiply two n*n matrices over an arbitrary ring in O(log n) time on a CREW PRAM. The authors show that linear context-free languages can be recognized in O(log/sup 2/n) time using o(M(n)) processors on an EREW PRAM. They give applications to other problems such as string shuffling, recognition of languages accepted by two-head nondeterministic finite automata, and transductions defined by two-tape nondeterministic finite transducers.< >
Semi-joins have been introduced to reduce the cost of processing distributed queries. The determine the optimal semi-join processing strategy, one must consider both the reduction effect and the transmission overhead....
详细信息
Semi-joins have been introduced to reduce the cost of processing distributed queries. The determine the optimal semi-join processing strategy, one must consider both the reduction effect and the transmission overhead. Researchers have studied the problem of finding the optimal semi-join strategies for processing tree queries. Although a number algorithms have been proposed, they either have a very high complexity or only solve special cases. It is conjectured that the tree query processing problem is inherently difficult. The author formally answers this question by proving it NP-hard.< >
暂无评论