We describe the basic architecture of JUMP-1, an MPP prototype developed in collaboration among seven universities. The proposed architecture can exploit the high performance of coarse-grained RISC processors in combination with flexible fine-grained operations such as distributed shared memory, versatile synchronization, and message communication.
We design an efficient sublinear time parallel construction of optimal binary search trees. The efficiency of the parallel algorithm corresponds to its total work (the product time × processors). Our algorithm wo...
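For orientation, the sequential problem behind this abstract can be sketched with the textbook cubic-time dynamic program. This is a baseline only, not the paper's parallel algorithm; the function name `optimal_bst_cost` and the frequency-list input are assumptions made for illustration. A parallel algorithm's total work (time × processors) is usually judged against a sequential cost of this kind.

```python
def optimal_bst_cost(freq):
    """Minimum weighted search cost of a BST over keys with these access frequencies.

    Textbook O(n^3) dynamic program (hypothetical helper, not from the paper).
    """
    n = len(freq)
    # prefix[i] = sum of freq[:i], so a range sum is prefix[j] - prefix[i].
    prefix = [0] * (n + 1)
    for i, f in enumerate(freq):
        prefix[i + 1] = prefix[i] + f
    # cost[i][j] = optimal cost for keys i..j-1; an empty range costs 0.
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # Try every key r in the range as the root of the subtree.
            best = min(cost[i][r] + cost[r + 1][j] for r in range(i, j))
            # Every key in the range sits one level deeper under the chosen root.
            cost[i][j] = best + (prefix[j] - prefix[i])
    return cost[0][n]
```

For example, `optimal_bst_cost([34, 8, 50])` evaluates to 142, obtained by placing the third key at the root.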
High-performance processors based on pipeline processing play an important role in scientific computation. In our earlier work we proposed a hybrid pipeline architecture named Jetpipeline. The concept of Jetpipeline comes from the integration of superscalar, VLIW, and vector architectures. Jetpipeline has multiple instruction pipelines, which execute multiple instructions as in superscalar architectures. Instructions to be executed simultaneously are statically scheduled by a compiler, as in VLIW architectures. Therefore, parallelism extraction and instruction scheduling are very important for Jetpipeline. Software pipelining is a well-known technique for achieving high throughput when processing loops. In this paper, we propose software pipelining for Jetpipeline. First, an overview of the Jetpipeline architecture is given. Then the banked register configuration of Jetpipeline, which reduces hardware complexity and supports software pipelining, is presented. Finally, the effectiveness of software pipelining for Jetpipeline is evaluated by simulation.
Trade-offs between the SIMD and MIMD models of architecture for parallelism are presented. Mixed-mode parallelism, where a machine can switch between the SIMD and MIMD modes of parallelism at instruction-level granularity with generally negligible overhead, is discussed. Advantages and disadvantages of mixed-mode parallelism and an example of a mixed-mode parallel algorithm are given. The relationship of mixed-mode processing to high-performance heterogeneous computing is reviewed. Difficulties involved in evaluating interconnection networks for parallel machines are then considered. A myriad of metrics have been used in the literature. The problems involved in choosing the most appropriate metric or weighted set of metrics, and in performing "fair" comparisons, are explored.
ISBN: (print) 0818665076
This paper addresses the use of parallel simulation techniques to speed up the simulation of multistage interconnection networks. The conventional null-message approach to resolving the deadlock problem in conservative simulation is based on a lookahead mechanism. For some application domains, unfortunately, the lookahead information is not available. Consequently, a simulation using null messages will be trapped in a livelock. We propose a deadlock- and livelock-free scheme that uses null messages, but without guaranteed lookahead, to coordinate the simulation, along with different partitioning techniques for mapping the simulation program onto multicomputers. A flushing mechanism that addresses the combinatorial explosion of null messages in conservative simulation is also discussed. Our analysis shows that the proposed flushing mechanism effectively reduces the number of null messages from exponential to linear.
Describes several algorithms to perform all-to-all communication on a two-dimensional mesh-connected computer with wormhole routing. The authors discuss both direct algorithms, in which data is sent directly from the source to the destination processor, and indirect algorithms, in which data is sent through one or more intermediate processors. The authors propose algorithms for both power-of-two and non-power-of-two meshes, as well as an algorithm that works for arbitrary meshes. They have developed analytical models to estimate the performance of the algorithms on the basis of system parameters. Performance results obtained on the Intel Touchstone Delta are compared with the estimated values.
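As a rough illustration of what a direct all-to-all schedule looks like for a power-of-two processor count, the sketch below uses the well-known pairwise XOR exchange pattern. It is not taken from the paper: the function name `xor_exchange_schedule` is hypothetical, and the sketch ignores mesh topology and link contention, which a wormhole-routed mesh algorithm has to account for.

```python
def xor_exchange_schedule(p):
    """Return a p-1 step schedule in which processor i exchanges with i XOR step.

    Assumes p is a power of two. Each step is a list of (processor, partner)
    pairs. Illustrative only; the paper's mesh algorithms are not reproduced here.
    """
    if p < 1 or p & (p - 1):
        raise ValueError("p must be a power of two")
    # XOR with a fixed nonzero step pairs the processors off perfectly,
    # and over steps 1..p-1 every processor meets every other exactly once.
    return [[(i, i ^ step) for i in range(p)] for step in range(1, p)]
```

With `p = 4` the schedule has three steps; in the first, processors 0 and 1 exchange data while 2 and 3 do the same.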
A significant advantage of content addressable memory (CAM) is that operations are performed locally, which can eliminate the bottleneck between processor and memory. In this paper, we propose a CAM-based associative processing processor (HAPP) that can be combined with a general processor to form an array-processor system. Besides retrieval operations, it can assist the general processor in handling nested loop structures with data-flow dependences to achieve high speedup in this system. We enumerate some problems of applying HAPP to a computer system to deal with nested loop structures, and the methods we used to resolve them. We also compare HAPP with a parallel machine, the BBN TC2000, to show that HAPP incurs a smaller communication penalty when the number of data items accessed by the BBN TC2000 surpasses the penalty plane.
Maya is a platform for investigating the impact of different memory coherence protocols on parallel architectures. We present the implementations of several weak memory protocols, together with some new primitives dedicated to weak memories, using Maya. The results of some user applications are summarized, and the impact of weak memories on the efficiency of these parallel programs is discussed.
ISBN: (print) 0818665076
In this paper, we present several algorithms for four problems on a permutation graph: determining the vertex connectivity, testing k-vertex connectivity, determining the maximum number of vertex-disjoint s-t paths, and finding k vertex-disjoint s-t paths. We first give O(n^2) time sequential algorithms for determining the vertex connectivity, testing k-vertex connectivity, and determining the maximum number of vertex-disjoint s-t paths. Then an O(kn^2) time algorithm for finding k vertex-disjoint s-t paths on a permutation graph is proposed. Moreover, we derive the corresponding parallel algorithms for these problems from the proposed sequential algorithms. On the EREW PRAM model, we first propose O(log n) time optimal speed-up parallel algorithms for determining the vertex connectivity, testing k-vertex connectivity, and determining the maximum number of vertex-disjoint s-t paths, all with O(n^2 log n) processors. Then an O(n log n) time parallel algorithm for finding k vertex-disjoint s-t paths using O(n^2 log n) processors is developed, where k is a fixed integer.
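To make the objects in this abstract concrete, the sketch below builds a permutation graph and computes its vertex connectivity by brute force. It assumes the common convention that vertices i < j are adjacent exactly when the permutation inverts them; the abstract does not state its convention. The exhaustive search is exponential and is only a small-input reference, not the O(n^2) algorithm the paper describes. All function names are hypothetical.

```python
from itertools import combinations


def permutation_graph(perm):
    """Adjacency sets of the permutation graph: i-j is an edge iff (i, j) is an inversion."""
    n = len(perm)
    adj = {v: set() for v in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if perm[i] > perm[j]:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def is_connected(adj, removed=frozenset()):
    """Depth-first check that the graph minus `removed` is connected."""
    alive = [v for v in adj if v not in removed]
    if not alive:
        return True
    seen = {alive[0]}
    stack = [alive[0]]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(alive)


def vertex_connectivity(adj):
    """Smallest number of vertices whose removal disconnects the graph (brute force)."""
    n = len(adj)
    for k in range(n - 1):
        for cut in combinations(adj, k):
            if not is_connected(adj, frozenset(cut)):
                return k
    # No cut of size <= n-2 exists: the graph is complete (or trivial).
    return max(n - 1, 0)
```

Testing k-vertex connectivity then reduces to checking `vertex_connectivity(adj) >= k`. For instance, the reversed permutation `[2, 1, 0]` yields a triangle with connectivity 2, while the identity permutation yields a graph with no edges.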
We introduce the HyperC language, a data-parallel extension of C intended for portability over a wide range of architectures. We present the main features of the language: explicit parallelism through the data; synchronous semantics, with a parallel flow control that allows asynchronous execution; new function qualifiers to emphasize the locality properties of code; and, finally, new communication techniques that allow communications and computations to overlap even for irregular computations. All these features are discussed with respect to portability and code reusability.