The use of star graphs as a viable interconnection scheme for parallel computers has been examined by a number of authors. An attractive feature of this class of graphs is that it has sublogarithmic diameter and has a...
详细信息
The use of star graphs as a viable interconnection scheme for parallel computers has been examined by a number of authors. An attractive feature of this class of graphs is that it has sublogarithmic diameter and has a great deal of symmetry akin to the binary hypercube. The authors describe a new class of algorithms for embedding Hamiltonian cycle, the set of all even cycles and a variety of two and multi-dimensional grids in a star graph. They derive an algorithm for the ranking and the unranking problem with respect to the Hamiltonian cycle.< >
The paper identifies two barrier synchronization techniques that are appropriate for nonuniform memory architectures. The first of these two, the dissemination barrier, is based on a progressive skip-ring of flags. Th...
详细信息
The paper identifies two barrier synchronization techniques that are appropriate for nonuniform memory architectures. The first of these two, the dissemination barrier, is based on a progressive skip-ring of flags. The second, called the reflected-tree barrier, is based on software combining trees. The dissemination barrier is optimal in terms of the number of barrier stages required by the algorithm. The reflected-tree barrier, however, is optimal in the number of remote network operations required for algorithm completion. These two are compared by means of probabilistic simulations up to 4096 nodes and actual execution results up to 90 nodes on a BBN GP-1000. These results show that the reflected-tree barrier is significantly better on 30 nodes or more.< >
The authors describe a parallel algorithm and architecture for implementing the LZ technique for data compression. Data compression is the reduction of redundancy in data representation in order to decrease storage an...
详细信息
The authors describe a parallel algorithm and architecture for implementing the LZ technique for data compression. Data compression is the reduction of redundancy in data representation in order to decrease storage and communication costs. The LZ-based compression method is a very powerful technique and gives very high compression efficiency for text as well as image data. The proposed architecture is systolic and uses the principles of pipelining and parallelism in order to obtain high speed and throughput. The order of complexity of the computations is reduced from n/sup 2/ to n. The data compression hardware can be integrated into real time systems so that data can be compressed and decompressed on-the-fly. The basic processor cell for the systolic array is currently being implemented using CMOS VLSI technology.< >
The overall architecture of the European Declarative System (EDS), ESPRIT II Project 2025 is briefly described. The aim of the project is to design and implement both the hardware and software for a parallel informati...
详细信息
The overall architecture of the European Declarative System (EDS), ESPRIT II Project 2025 is briefly described. The aim of the project is to design and implement both the hardware and software for a parallel information server. The final system will be used to develop advanced business information systems. The EDS machine is designed to be a high speed backend accelerator, primarily for data/knowledge base applications, serving a range of host computers. It is a message passing multiprocessor with a distributed store, commonly referred to as a shared nothing architecture. Whilst shared nothing architectures are not novel, their application to declarative/symbolic languages is relatively new. The authors describe the EDS software activities at ECRC. A conclusion reporting the current status of the project is given.< >
The Warwick Pyramid Machine, an architecture consisting of both SIMD and MIMD parts in a multiple SIMD organization which can operate effectively at all levels of an image analysis problem, is described. The hardware ...
详细信息
The Warwick Pyramid Machine, an architecture consisting of both SIMD and MIMD parts in a multiple SIMD organization which can operate effectively at all levels of an image analysis problem, is described. The hardware design is discussed, along with programming issues. Several image analysis algorithms that have been mapped onto the architecture are presented.< >
The authors present two EREW PRAM algorithms and one CREW PRAM algorithm for solving set recurrence equations of the type commonly used in dynamic programming solutions to many problems in pattern matching, sequence c...
详细信息
The authors present two EREW PRAM algorithms and one CREW PRAM algorithm for solving set recurrence equations of the type commonly used in dynamic programming solutions to many problems in pattern matching, sequence comparison, and language recognition. All three algorithms run in O(log/sup 2/n) time. The first EREW PRAM algorithm uses O(M(n/log n)log/sup 2/n) processors, where M(n) is the number of processors needed to multiply two n*n Boolean matrices in O(log n) time on an EREW PRAM. The second algorithm uses O(M'(n/(log n square root loglog n))log/sup 2/n) processors, where M'(n) is the number of processors needed to multiply two n*n matrices over an arbitrary ring in O(log n) time on an EREW PRAM. The CREW PRAM algorithm uses O(M"(n/log/sup 1.5/n)log/sup 2/n) processors, where M"(n) is the number of processors needed to multiply two n*n matrices over an arbitrary ring in O(log n) time on a CREW PRAM. The authors show that linear context-free languages can be recognized in O(log/sup 2/n) time using o(M(n)) processors on an EREW PRAM. They give applications to other problems such as string shuffling, recognition of languages accepted by two-head nondeterministic finite automata, and transductions defined by two-tape nondeterministic finite transducers.< >
Semi-joins have been introduced to reduce the cost of processing distributed queries. The determine the optimal semi-join processing strategy, one must consider both the reduction effect and the transmission overhead....
详细信息
Semi-joins have been introduced to reduce the cost of processing distributed queries. The determine the optimal semi-join processing strategy, one must consider both the reduction effect and the transmission overhead. Researchers have studied the problem of finding the optimal semi-join strategies for processing tree queries. Although a number algorithms have been proposed, they either have a very high complexity or only solve special cases. It is conjectured that the tree query processing problem is inherently difficult. The author formally answers this question by proving it NP-hard.< >
A new model of memory consistency, called release consistency, that allows for more buffering and pipelining than previously proposed models is introduced. A framework for classifying shared accesses and reasoning abo...
详细信息
A new model of memory consistency, called release consistency, that allows for more buffering and pipelining than previously proposed models is introduced. A framework for classifying shared accesses and reasoning about event ordering is developed. The release consistency model is shown to be equivalent to the sequential consistency model for parallel programs with sufficient synchronization. Possible performance gains from the less strict constraints of the release consistency model are explored. Finally, practical implementation issues are discussed, with the discussion concentrating on issues relevant to scalable architectures.< >
A description is given of the Software Development System (SDS), which consists of a preprocessor and a general-purpose scheduler, and which has been used to generate schedules for two-layer fully connected neural net...
详细信息
A description is given of the Software Development System (SDS), which consists of a preprocessor and a general-purpose scheduler, and which has been used to generate schedules for two-layer fully connected neural networks. The particular structure (precedence graph) of the code of the back-propagation algorithm which is used to generate parallel code is described. Also described is the prototype ADSP-2100 parallel processing system that has been built and on which some of the smaller size neural networks have been implemented. Cycle count for scheduling the back-propagation algorithm on various sizes of neural networks and two digital signal processing (DSP) architectures and associated throughput rates obtained are given. Cycle counts for computation (with concurrent data communication) and communication only are also obtained.< >
In this paper we consider some of the central issues involved in the design of parallel algorithms. We describe several efficient algorithms for idealised shared memory architectures and draw some conclusions as to wh...
详细信息
In this paper we consider some of the central issues involved in the design of parallel algorithms. We describe several efficient algorithms for idealised shared memory architectures and draw some conclusions as to what would be required to implement them on a realistic physical architecture, i.e. one with distributed memory. We also describe some systolic algorithms for matrix computations, sequence comparison and molecular modelling, and briefly discuss their implementation on arrays of transputers. In the final section we discuss the question of whether the current preoccupation with architectural details in parallel algorithm design is likely to persist. We briefly describe some techniques which show that a physically realistic general purpose parallel architecture based on distributed memory can be constructed which will execute any shared memory parallel algorithm with no significant overhead due to communication. We thus have the attractive prospect in the very near future of architectural independence in parallel algorithm design.
暂无评论