Our main result in this paper is a parallel algorithm for suffiz-prefix matching that has optimal speedup on a CRCW PRAM. It runs in time O(logn) using n/log n processors. This algorithm is important because we utihze...
详细信息
The authors have simulated several numerical and nonnumerical algorithms on five distributed-memory parallel processors (DMPPs). All five DMPPs have the same topology (a torus) and the same number of nodes. The archit...
详细信息
ISBN:
(纸本)9780897913195
The authors have simulated several numerical and nonnumerical algorithms on five distributed-memory parallel processors (DMPPs). All five DMPPs have the same topology (a torus) and the same number of nodes. The architectures differ only in the speed of communication between neighboring nodes, with the computation unit unchanged. The authors quantify the effect that interprocessor communication speed and synchronization overhead have on the performance of the DMPPs. After introducing their rationale and reviewing related work, the authors present and discuss the results of the simulations.
This paper shows how n-node, e-edge graphs can be contracted in a manner similar to the parallel tree contraction algorithm due to Miller and Reif. We give an O((n + e)/lgn)-processor deterministic algorithm that cont...
详细信息
Rule-based systems appear to be capable of exploiting large amounts of parallelism, because it is possible to match each rule to the data memory in parallel. It is pointed out that in practice the speedup from paralle...
详细信息
ISBN:
(纸本)081860719X
Rule-based systems appear to be capable of exploiting large amounts of parallelism, because it is possible to match each rule to the data memory in parallel. It is pointed out that in practice the speedup from parallelism is quite limited, less than 10-fold. The reasons for the small speedup are: (1) the small number of rules relevant to each change to data memory;(2) the large variation in the processing required by the relevant rules;and (3) the small number of changes made to data memory between synchronization steps. To obtain this limited factor of 10-fold speedup, it is necessary to exploit parallelism at a very fine granularity. It is suggested that a suitable architecture to exploit such fine-grain parallelism is a bus-based shared-memory multiprocessor with 32-64 processors. Using such a multiprocessor (with individual processors working at 2 MIPS), it is possible to obtain execution speeds of about 3800 rule-firings/s. This speed is significantly higher than that obtained by other proposed parallel implementations of rule-based systems.
An approach is proposed for modeling off-the-shelf hardware and for modeling parallelalgorithms, along with a design methodology to use the information provided by these models, to design a class of macro-pipelined s...
详细信息
ISBN:
(纸本)0818606347
An approach is proposed for modeling off-the-shelf hardware and for modeling parallelalgorithms, along with a design methodology to use the information provided by these models, to design a class of macro-pipelined special-purpose architectures. Nine parameters to form a model of the characteristics of parallel/distributed algorithms and the environment in which they must execute are presented. In addition, a set of tuples to model the characteristics of computer architectures is presented. By combining the tuples with the parameters, the execution time of the algorithm modeled by the parameters on the hardware modeled by the tuples can be approximated. The combination of these models could be used as a basis for computer-aided tools used in the design of macro-pipelined parallel/distributed processors.
The parallel processing array consists of an n×n array of processors to which a cn2 node directed graph can be input by placing c nodes at every point of the array. It is shown that every one of the following pro...
ISBN:
(纸本)9781450374385
The parallel processing array consists of an n×n array of processors to which a cn2 node directed graph can be input by placing c nodes at every point of the array. It is shown that every one of the following properties of the graph can be computed in order of n steps:(1) to test whether the graph is strongly connected,(2) to mark a node in every strongly connected component,(3) to mark a strong connected component that contains a given node,(4) to mark a simple directed path between a given pair of nodes, and(5) to compute the interconnectivity of the c nodes within every *** a consequence, several open problems can be solved. For example, any language recognized by a 2-dimensional finite state automaton with 1 pebble can be recognized in order of n steps by a parallel processing array. (It was not even known whether the languages recognized by 2-dimensional finite state automata without pebbles can be recognized in order of n steps by a parallel processing array). Note that a 1 pebble automaton can run for order of n4 steps before accepting an input.
暂无评论