A stochastic model for a class of distributed asynchronous fixed point algorithms is presented and a methodology for optimizing the rate of convergence is introduced. An important parameter in our model, called the de...
详细信息
ISBN:
(纸本)0818626720
A stochastic model for a class of distributed asynchronous fixed point algorithms is presented and a methodology for optimizing the rate of convergence is introduced. An important parameter in our model, called the degree of synchronization, quantifies the average amount of time each processor is willing to wait for information from other processors (before beginning computation of its update variable based on the available estimates of variables from other processors). the main focus of the paper is to analyze the relationship between the convergence rate and the degree of synchronization for a class of iterative fixed point algorithms. Preliminary analysis indicates that significant improvements in convergence rates can be achieved by proper control of the parameters in our model.
this paper compares and analyzes massively parallel SIMD architectures as processing environments for parallel logic simulation. the CM-2 and the MP-1 are considered as target machines for the comparison. Detailed con...
详细信息
ISBN:
(纸本)0818626720
this paper compares and analyzes massively parallel SIMD architectures as processing environments for parallel logic simulation. the CM-2 and the MP-1 are considered as target machines for the comparison. Detailed contrasts between the two parallel schemes are made based on actual simulation results and system performance. distributed event-driven simulation protocols are used to obtain experimental results for the two massively SIMD machines. According to the results, the MP-1 is 2 to 2.5 times faster than the CM-2 for up to 16 K gate benchmark circuits, while the CM-2 can accommodate circuits with a larger number of gates of processors. the presented comparisons and analysis of the two machines can be used to choose a SIMD machine for efficient parallel logic simulation.
With recent improvements in single CPU performance, several issues become more important in multiprocessor design. Two of these are interprocessor communication and locality. In parallel systems with fast CPUs, locali...
详细信息
ISBN:
(纸本)0818626720
With recent improvements in single CPU performance, several issues become more important in multiprocessor design. Two of these are interprocessor communication and locality. In parallel systems with fast CPUs, locality is vital to performance. However, traditional parallel programming models such as shared memory or message passing do not naturally lead to programs that exhibit locality. In this paper, the Seamless model for interprocessor communication is presented which is based on locality and that allows the programmer to explicitly manipulate a program's locality to optimize performance. Additionally, this model can support latency tolerance with proper hardware support. Extensions to the C programming language that support this model are also presented. Finally, a parallel program utilizing this model is provided to illustrate the paradigm.
this paper explores the use of Proteus, an architecture-independent language suitable for prototyping parallel and distributed programs. Proteus is a high-level imperative notation based on sets and sequences with a s...
详细信息
ISBN:
(纸本)0818626720
this paper explores the use of Proteus, an architecture-independent language suitable for prototyping parallel and distributed programs. Proteus is a high-level imperative notation based on sets and sequences with a single construct for the parallel composition of processes communicating through shared memory. Several different parallel algorithms for N-body simulation are presented in Proteus, illustrating how Proteus provides a common foundation for expressing the various parallel programming models. this common foundation allows prototype parallel programs to be tested and evolved without the use of machine-specific languages. To transform prototypes to implementations on specific architectures, program refinement techniques are utilized. Refinement strategies are illustrated that target broad-spectrum parallel intermediate languages, and their viability is demonstrated by refining an N-body algorithm to data-parallel CVL code.
this paper describes a framework for implementing neural networks on massively parallel machines. the framework is generic and applies to a range of neural networks (Multi Layer Perception, Competitive Learning, Self-...
详细信息
ISBN:
(纸本)0818626720
this paper describes a framework for implementing neural networks on massively parallel machines. the framework is generic and applies to a range of neural networks (Multi Layer Perception, Competitive Learning, Self-Organizing Map, etc.) as well as a range of massively parallel machines (Connection Machine, distributed Array Processor, MasPar). It consists of two phases: an abstract decomposition of neural networks and a machine specific decomposition. the abstract decomposition identifies the parallelism implemented by neural networks, and provides alternative distribution schemes according to the required exploitation of parallelism. the machine specific decomposition considers the relevant machine criteria, and integrates these withthe result of the abstract decomposition to form a `decision' system. this system formalizes the relative gain of each distribution scheme according to neural network and machine criteria. It then identifies their possible optimizations. Finally, it computes and ranks the absolute speed up of each distribution scheme.
A distributed system consists of a set of loosely connected state machines which do not share a global memory. All the possible global states of the system can be split up into legal and illegal states. A self-stabili...
详细信息
ISBN:
(纸本)0818626720
A distributed system consists of a set of loosely connected state machines which do not share a global memory. All the possible global states of the system can be split up into legal and illegal states. A self-stabilizing system is a network of processors, which, when started from an arbitrary (and possibly illegal) initial state, always returns to a legal state in a finite number of steps. One issue in designing self-stabilizing algorithms is the number of states required by each machine. this paper presents algorithms which will be self-stabilizing while only requiring each machine in the network to have two states. Probability is used in some of the algorithms in order to make this possible. the algorithms are given along with correctness proofs.
A class of novel interconnection topologies called the generalized Fibonacci cubes is presented. the generalized Fibonacci cubes include the hypercubes, the recently proposed Fibonacci cubes, and some other asymmetric...
详细信息
ISBN:
(纸本)0818626720
A class of novel interconnection topologies called the generalized Fibonacci cubes is presented. the generalized Fibonacci cubes include the hypercubes, the recently proposed Fibonacci cubes, and some other asymmetric interconnection topologies bridging between the two mentioned above. the generalized Fibonacci cubes can serve as a framework for studying degraded hypercubes due to faulty nodes or links. Previously known algorithms for hypercubes do not generalize to this class of interconnection topologies. In this paper, we present distributed routing and broadcasting algorithms that can be applied to all members of this class of interconnection topologies. It is shown that our distributed routing algorithm always finds a shortest and deadlock-free path. the broadcasting algorithms are designed and evaluated based on boththe all-port and the 1-port communication models. the all-port broadcasting algorithm is probably optimal in terms of minimizing routing steps. An upper bound for the 1-port broadcasting algorithm is determined, which is shown to be optimal for certain cases.
Aroma simplifies the task of parallelizing large applications on multicomputers by providing applications with a shared object space. Aroma supports both traditional monolithic objects and aggregate object that can be...
详细信息
ISBN:
(纸本)0818626720
Aroma simplifies the task of parallelizing large applications on multicomputers by providing applications with a shared object space. Aroma supports both traditional monolithic objects and aggregate object that can be partitioned across multiple nodes. Aggregate objects support data parallelism efficiently. An Aroma program consists of tasks that operate on shared objects. Task typically execute on the node on which their input data is located, thus minimizing communication. Shared data objects have synchronization properties associated withthem, making it possible to parallelize a large class of applications without using explicit locks and condition variables. In this paper, we present and justify the Aroma language features, and give examples of Aroma programs. Aroma has been implemented on the Nectar multicomputer and we give performance results for several applications.
A hierarchical task queue organization that avoids the task queue bottleneck associated withthe centralized organization and provides performance better than centralized and distributed organizations is proposed. A d...
详细信息
A paradigm for avoiding buffer deadlock in point-to-point multicomputer networks is presented which is ideal for today's high connectivity, load sharing networks. Unlike the traditional resource ordering principle...
详细信息
ISBN:
(纸本)0818626720
A paradigm for avoiding buffer deadlock in point-to-point multicomputer networks is presented which is ideal for today's high connectivity, load sharing networks. Unlike the traditional resource ordering principle, this paradigm not only allows unrestricted routing but uses the existence of multiple paths to its direct advantage. Deadlock is avoided entirely using exchange buffers which are not used for message queues, thus eliminating queueing overhead. the paradigm is topology-independent, imposes no routing restrictions, and uses states of neighboring links only, so that it can be built into link level protocol, providing unrestricted deadlock-free routing, while operating transparent to any fault-tolerant topology-specific routing algorithm.
暂无评论