Campbell's Lenient Unified Model of Parallel Systems (CLUMPS) is presented, a candidate model of parallel computation which aims to address the deficiencies of existing candidate models. It is shown that all parallel computers can perform the same computations, but differ in their ability to support different communication loads. This conclusion is reflected in the definition of CLUMPS, which aims to be architecture-independent, reflective of execution costs, expressible and intellectually manageable. It also reflects the principle that if a problem can be partitioned into regions, and if those regions are preserved in the mapping of the algorithm to the architecture, then greater communication efficiency can be achieved than if the locality were not preserved. Algorithmic skeletons are seen as high-level language constructs capturing parallelism, and hence communication, in a regular and manageable manner. Such skeletons can be costed in terms of CLUMPS to provide parallel performance prediction.
In this paper, we present a set of asynchronous backtrackable communication primitives and their integration into the SLOOP parallel Object-Oriented Logic language. After a brief survey of inter-process communication models in parallel logic languages, we define the set of communication primitives and give a uniform semantics of the extension of Prolog thus obtained. Then, we detail an implementation of these primitives based on the study of a dependency graph recording the connections between communication points. In order to integrate the primitives into the SLOOP language, we have defined a set of system classes used as an interface between the Prolog layer and the object layer. Lastly, an example illustrates most of the concepts supported by our language.
ISBN: 3540600299 (print)
Generalized Stochastic Petri Nets (GSPN) have gained wide acceptance as a modeling tool for the performance analysis of concurrent systems. However, the applicability of this methodology is severely limited by the potential state space explosion phenomenon. In this paper we describe massively parallel approaches to the most computing-intensive part of the solution of GSPN models: the state space construction. The effectiveness of these parallel approaches lies, for every GSPN, in their ability to deal with very large reachability spaces in reasonable time. Both the SIMD and the MIMD programming models are considered, and examples are given using recent massively parallel processing architectures (CM-5, T3D).
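To make the state space construction step concrete, the following is a minimal sequential sketch of reachability-set generation for an untimed Petri net. It is not the parallel algorithm of the paper: the representation (markings as tuples, transitions as `(pre, post)` dictionaries) and the names `enabled`, `fire` and `reachability_set` are assumptions chosen for illustration, and the timed and immediate transitions that distinguish a GSPN are ignored.

```python
from collections import deque

# Assumed representation (illustrative, not from the paper):
#   marking    : tuple of token counts, one entry per place
#   transition : (pre, post), two dicts mapping place index -> arc weight

def enabled(marking, pre):
    """A transition is enabled if every input place holds enough tokens."""
    return all(marking[p] >= n for p, n in pre.items())

def fire(marking, pre, post):
    """Return the marking reached by firing an enabled transition."""
    m = list(marking)
    for p, n in pre.items():
        m[p] -= n
    for p, n in post.items():
        m[p] += n
    return tuple(m)

def reachability_set(initial, transitions):
    """Breadth-first exploration of all markings reachable from `initial`.

    Terminates only for bounded nets; an unbounded net has infinitely
    many markings.
    """
    seen = {initial}
    frontier = deque([initial])
    while frontier:
        m = frontier.popleft()
        for pre, post in transitions:
            if enabled(m, pre):
                nxt = fire(m, pre, post)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return seen

# Toy net: two places, tokens move p0 -> p1 and back again.
toy_transitions = [({0: 1}, {1: 1}), ({1: 1}, {0: 1})]
toy_markings = reachability_set((2, 0), toy_transitions)
```

With two tokens initially in the first place, the toy net has three reachable markings. The `seen` set is what grows explosively for realistic models, which is why its construction is the natural target for parallelization.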
In this paper, a highly parallel architecture for concurrent rule matching is proposed to speed up the match process of AI production systems. The architecture fully exploits the advantages of content-addressable memory (CAM) not only to buffer the database of current assertions, called the working memory (WM), but also to support the parallel evaluation of interconditions among patterns of productions. The architecture first compiles the left-hand side (LHS) of each production into a symbolic form, and then assigns a CAM cell array, called a CAM block, to each production for buffering elements as well as evaluating interconditions. The set of productions that are affected during a match cycle can be evaluated concurrently and independently within their own CAM blocks. Due to the uniformity of constructing arrays of processing elements from CAM blocks, the novel architecture is suitable for VLSI implementation. The analysis of the expected performance indicates that the novel architecture might speed up conventional forward-chaining production systems by a factor of 100 or more.
Tools for computer-aided satellite image analysis require interactivity, i.e. the capability to modify some parameters and see instantaneously the result of the processing, for efficient work. Due to the amount of dat...
With nearest neighbor load balancing algorithms, a processor makes balancing decisions based on its local information and manages workload migrations within its neighborhood. This paper compares two fairly well-known nearest neighbor algorithms, the dimension exchange and the diffusion methods, together with their variants, in terms of their performance in both one-port and all-port communication architectures. It turns out that the dimension exchange method outperforms the diffusion method in the one-port communication model, and that the strength of the diffusion method is in asynchronous implementations in the all-port communication model. The underlying communication networks considered assume the most popular topologies, the mesh and the torus, and their special cases: the hypercube and the k-ary n-cube.
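The two methods compared above can be sketched in their textbook synchronous form on a hypercube. This is an illustrative sketch only: the function names, the exchange factor of 1/2 and the diffusion parameter `alpha` are assumptions, and the variants actually studied in the paper may differ.

```python
def hypercube_neighbors(dims):
    """Neighbor lists for a hypercube with 2**dims nodes."""
    return [[i ^ (1 << d) for d in range(dims)] for i in range(1 << dims)]

def diffusion_step(load, neighbors, alpha):
    """One synchronous diffusion step (all-port style): every node
    exchanges a fraction alpha of each load difference with all of its
    neighbors at once."""
    return [w + alpha * sum(load[j] - w for j in neighbors[i])
            for i, w in enumerate(load)]

def dimension_exchange_sweep(load, dims):
    """One sweep of dimension exchange (one-port style): in each
    dimension in turn, paired nodes average their loads."""
    load = list(load)
    for d in range(dims):
        for i in range(len(load)):
            j = i ^ (1 << d)
            if i < j:  # handle each pair once
                load[i] = load[j] = (load[i] + load[j]) / 2
    return load

dims = 3
start = [8.0] + [0.0] * 7  # all work initially on node 0
after_exchange = dimension_exchange_sweep(start, dims)
after_diffusion = diffusion_step(start, hypercube_neighbors(dims),
                                 alpha=1 / (dims + 1))
```

In this example, one dimension exchange sweep leaves every node with exactly 1.0 unit of load, whereas a single diffusion step with `alpha = 1/4` only spreads work from node 0 to its three direct neighbors. Both operations conserve the total load.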
We present two scalar processors called PVP-SWPC and PVP-SWSW for high-speed list vector processing. Memory access latency must be tolerated to meet this objective. PVP-SWPC tolerates the latency by introducing slide-windowed floating-point registers and a prefetch-to-cache instruction. PVP-SWSW tolerates the latency by introducing slide-windowed general and floating-point registers. Owing to the slide-window structure, both processors can utilize more registers while keeping upward compatibility with existing scalar architectures. The evaluation shows that these processors successfully hide memory latency and realize fast list vector processing.
This paper presents a self-steered algorithm which can be used to correct pointing errors in microwave communications. The algorithm is based on the fact that the output power of an optimized beamformer achieves a local maximum if the steering vector coincides with that of the desired signal, so long as the interferences are outside the mainbeam. By approximating the steering vector by its first-order Taylor series expansion in terms of the steering angles, the maximization process reduces to a 2-dimensional optimization problem. Numerical results are presented to illustrate the performance achievable.
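The reduction to two dimensions can be written out as follows. The notation is assumed for illustration and does not come from the abstract: $\mathbf{a}(\theta,\phi)$ is the steering vector, $(\theta_0,\phi_0)$ the nominal pointing direction, and $(\Delta\theta,\Delta\phi)$ the unknown pointing error.

```latex
\mathbf{a}(\theta_0 + \Delta\theta,\; \phi_0 + \Delta\phi)
  \;\approx\;
  \mathbf{a}(\theta_0,\phi_0)
  + \left.\frac{\partial \mathbf{a}}{\partial \theta}\right|_{(\theta_0,\phi_0)} \Delta\theta
  + \left.\frac{\partial \mathbf{a}}{\partial \phi}\right|_{(\theta_0,\phi_0)} \Delta\phi
```

Under this approximation the steering vector depends on only the two scalars $\Delta\theta$ and $\Delta\phi$, so maximizing the beamformer output power becomes a search over those two variables.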
Monte Carlo (MC) and molecular dynamics (MD) simulations are powerful tools for understanding the properties of systems of interacting electrons and phonons in a solid. When mobile electrons are studied, these simulations are limited to a few hundred particles. More powerful machines and algorithms must be used to address many of the most important issues in the field. We present results from using the p4 parallel programming system on a variety of parallel architectures to conduct MC and MD simulations.
This paper explores the fundamental impacts of pipeline technology on massively parallel SIMD architectures. The potential for performance improvement in instruction delivery is explored, and stall penalties associated with reduction operations are derived. Scheduling mechanisms to mitigate stall cycles are also presented. In addition, the design of pipelined processing elements is considered, and formulas for stall penalties and area costs are constructed. These results indicate that a 5- to 10-fold improvement in SIMD performance is well within technological limits.