Today's supercomputers will be replaced by massively parallel computers consisting of a large number of processing elements, in order to satisfy the continuously increasing demand for computing power. A practical parallel computing model is needed to develop efficient parallel algorithms on such machines. We have therefore proposed a practical parallel computation model, LogPQ, which extends the LogP model by taking communication queues into account. This paper examines the performance of a parallel matrix multiplication algorithm under the LogPQ and LogP models. The algorithm is implemented on the Cray T3E, and its parallel performance is compared with that on the older CM-5. The comparison shows that the communication network of the T3E has better buffering behavior than that of the CM-5, so no extra buffering needs to be prepared on the T3E, although a small effect remains for both send and receive buffering. On the other hand, the effect of message size remains, which shows that the model needs overhead and gap terms proportional to the message size.
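As a rough illustration of the kind of cost estimate the LogP model supports, the sketch below computes the time for a point-to-point message and for a stream of back-to-back messages, and shows one way to make overhead and gap grow linearly with message size, as the abstract suggests is necessary. The function names and the linear form `base + per_byte * nbytes` are assumptions for illustration; the queue parameters that LogPQ adds are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogP:
    L: float  # network latency
    o: float  # send/receive overhead per message
    g: float  # minimum gap between consecutive messages
    P: int    # number of processors


def point_to_point(m: LogP) -> float:
    """Send overhead + latency + receive overhead for one message."""
    return m.o + m.L + m.o


def stream_time(m: LogP, k: int) -> float:
    """Time until the last of k back-to-back messages has been received."""
    if k <= 0:
        return 0.0
    # Consecutive sends are spaced by max(g, o); the last one then
    # needs a full point-to-point transfer.
    return (k - 1) * max(m.g, m.o) + point_to_point(m)


def sized(base: float, per_byte: float, nbytes: int) -> float:
    """Hypothetical size-dependent parameter: base + per_byte * nbytes."""
    return base + per_byte * nbytes


model = LogP(L=6.0, o=2.0, g=4.0, P=8)
print(point_to_point(model))   # 10.0
print(stream_time(model, 3))   # 18.0

# Same machine, but overhead and gap scaled for 1024-byte messages.
big = LogP(L=6.0, o=sized(2.0, 0.01, 1024), g=sized(4.0, 0.02, 1024), P=8)
print(stream_time(big, 3) > stream_time(model, 3))  # True
```

The parameter values are made up; only the shape of the formulas is meant to carry over.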
Several parallel computation models, including the Actor model, have been proposed. Since these models have only primitive constructs for parallel computation, it is not easy to build a model in terms of what kind of ro...
BSR (Broadcasting with Selective Reduction) is a PRAM more powerful than any CRCW PRAM. To extend the Broadcast Instruction of BSR and make it useful for a larger class of applications, this article permits it to use a general form of selection, namely an arbitrary relational expression. BSR with general selection is denoted BSR+. Thus BSR, or BSR with k criteria (k > 1), is a special case of BSR+. An efficient implementation of the Broadcast Instruction of BSR+ is proposed, requiring (1/k)th of the circuits used by the best previous implementation of BSR with k criteria. Of all PRAMs, BSR+ is the most powerful in computation.
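The semantics of a broadcast with selective reduction can be sketched with a sequential simulation: every processor broadcasts a (tag, datum) pair, every memory location holds a limit, and each location receives the reduction of all data whose tag satisfies the selection test against its limit. Passing an arbitrary predicate as the selection corresponds to the general selection of BSR+. The function name `bsr_broadcast` and the use of `None` for "no datum selected" are assumptions of this sketch, and the loop says nothing about the constant-time hardware implementation the article proposes.

```python
import operator
from functools import reduce


def bsr_broadcast(tags, data, limits, select, reduce_op):
    """Sequentially simulate one broadcast instruction.

    Location j receives reduce_op over every data[i] with
    select(tags[i], limits[j]) true, or None if nothing is selected.
    """
    result = []
    for limit in limits:
        chosen = [d for t, d in zip(tags, data) if select(t, limit)]
        result.append(reduce(reduce_op, chosen) if chosen else None)
    return result


data = [3, 1, 4, 1]
idx = [0, 1, 2, 3]

# Classic single-criterion use: prefix sums via "tag <= limit" and sum.
prefix = bsr_broadcast(idx, data, idx, operator.le, operator.add)
print(prefix)  # [3, 4, 8, 9]

# General selection: an arbitrary relational expression over tag and limit.
even_prefix = bsr_broadcast(
    idx, data, idx, lambda t, l: t <= l and t % 2 == 0, operator.add
)
print(even_prefix)  # [3, 3, 7, 7]
```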
Speedup is usually taken as the criterion for deciding whether a parallel algorithm is optimal. But broadcast-class problems, which exist only on parallel computer systems, have no sequential algorithms at all, so the speedup criterion does not apply to them. Through a study of broadcast algorithms under several typical parallel computation models, a model-independent evaluation standard, min C2, is developed. It can be used not only to determine an optimal broadcasting algorithm but can also be normalized to apply to any parallel algorithm. As a new idea, min C2 is expected to open a new direction in this field.
We present a model of parallel computation, the parameterized task graph, which is a compact, problem-size-independent representation of some frequently used directed acyclic task graphs. Techniques for automating the construction of such a representation, starting from an annotated sequential program, are proposed. We show that many important properties of the task graph, such as the computational load of the nodes and the communication volume of the edges, can be automatically deduced in a problem-size-independent way.
ISBN: (print) 9780818671203
Many simulations in the natural sciences and engineering require the numerical solution of nonlinear differential equations. For this class of numerical methods, we propose an appropriate parallel computation model on distributed memory machines that supports the prediction of execution times. As a case study, we investigate the parallel implementation of the diagonal-implicitly iterated Runge-Kutta method, a solution method for stiff systems of ordinary differential equations. An implementation on the Intel iPSC/860 confirms the accuracy of the prediction model.
We add an extension called debit arcs to traditional place/transition nets. A debit arc incident upon a transition represents an always-true precondition; when the transition fires, a token is subtracted from the place issuing the debit arc, creating an antitoken if no tokens are present to subtract. We show that two different policies on how tokens and antitokens annihilate produce two classes of automata with different recognition powers.
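The firing rule described above can be sketched with a signed integer marking, where a negative count stands for antitokens. This representation makes tokens and antitokens cancel immediately, which is only one possible annihilation policy and is an assumption of the sketch; the abstract does not spell out the two policies it compares. The names `enabled` and `fire` and the transition layout are hypothetical.

```python
def enabled(marking, inputs):
    """Ordinary input arcs need a token; debit arcs never block firing."""
    return all(marking.get(p, 0) >= 1 for p in inputs)


def fire(marking, inputs, debits, outputs):
    """Fire a transition and return the new marking.

    inputs  : places connected by ordinary input arcs
    debits  : places connected by debit arcs (may go negative)
    outputs : places that receive one token each
    """
    if not enabled(marking, inputs):
        raise ValueError("transition is not enabled")
    new = dict(marking)
    for p in inputs:
        new[p] = new.get(p, 0) - 1
    for p in debits:
        # Subtract even when the place is empty: a count of -1 is one antitoken.
        new[p] = new.get(p, 0) - 1
    for p in outputs:
        new[p] = new.get(p, 0) + 1
    return new


m0 = {"p": 0, "q": 1}
m1 = fire(m0, inputs=["q"], debits=["p"], outputs=["r"])
print(m1)  # {'p': -1, 'q': 0, 'r': 1}
```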
We introduce a new model of parallel computation, FIFO nets. We show how it can simulate Petri nets and coloured Petri nets, and prove that a restriction of it (alphabetical FIFO nets) has the power of Turing machines. Furthermore, we define monogeneous FIFO nets and use the coverability graph to prove that it is decidable whether or not a monogeneous net is bounded and whether or not its language is regular.
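A minimal sketch of how a FIFO net step might look, assuming the usual reading of the model: each place holds a queue of letters (here a Python string), each input arc is labelled with a word that must be a prefix of the queue, and each output arc appends a word to the tail. The helper names and the dictionary encoding of arcs are illustrative assumptions, not notation from the paper.

```python
def enabled(marking, consume):
    """A transition is enabled if every input queue starts with its arc word."""
    return all(marking.get(p, "").startswith(w) for p, w in consume.items())


def fire(marking, consume, produce):
    """Remove the arc words from queue heads, then append to queue tails."""
    if not enabled(marking, consume):
        raise ValueError("transition is not enabled")
    new = dict(marking)
    for p, w in consume.items():
        new[p] = new.get(p, "")[len(w):]   # dequeue from the head
    for p, w in produce.items():
        new[p] = new.get(p, "") + w        # enqueue at the tail
    return new


m0 = {"q": "ab", "r": ""}
m1 = fire(m0, consume={"q": "a"}, produce={"r": "x", "q": "c"})
print(m1)  # {'q': 'bc', 'r': 'x'}

# Order matters, unlike in a Petri net: "b" is in q but not at its head in m0.
print(enabled(m0, {"q": "b"}))  # False
```

Restricting every queue to a one-letter alphabet makes the order irrelevant, which gives an intuition for why the model can simulate ordinary Petri nets.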
This paper presents a method for synthesizing, or growing, live and safe marked graph models of decision-free concurrent computations. The approach is modular in the sense that subsystems represented by arcs (and nodes) are added one by one without the need to redesign the entire system. The following properties of marked graph models can be prescribed in the synthesis: liveness (absence of deadlocks), safeness (absence of overflows), the number of reachability classes, the maximum resource (temporary storage) requirement, computation rate (performance), as well as the numbers of arcs and states.
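To make the liveness property concrete, the sketch below checks it for a given marked graph using the classical characterization that a marking is live exactly when every directed circuit carries at least one token, which is equivalent to the token-free arcs forming an acyclic subgraph. This is an analysis check, not the synthesis method of the paper; the arc encoding `(source, target, tokens)` and the name `is_live` are assumptions of the example.

```python
from collections import defaultdict, deque


def is_live(arcs):
    """Return True if no directed circuit consists only of token-free arcs.

    arcs: iterable of (source, target, tokens) triples.
    """
    nodes = set()
    succ = defaultdict(list)
    indeg = defaultdict(int)
    for src, dst, tokens in arcs:
        nodes.update((src, dst))
        if tokens == 0:              # keep only the token-free arcs
            succ[src].append(dst)
            indeg[dst] += 1

    # Kahn's algorithm: the subgraph is acyclic iff every node gets removed.
    queue = deque(n for n in nodes if indeg[n] == 0)
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return removed == len(nodes)


ring_with_token = [("a", "b", 1), ("b", "a", 0)]
ring_without_token = [("a", "b", 0), ("b", "a", 0)]
print(is_live(ring_with_token))     # True
print(is_live(ring_without_token))  # False
```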