ISBN: 3540663630 (print)
Communication is Logically Instantaneous (LI) if it is possible to timestamp communication events with integers in such a way that (1) timestamps increase within each process and (2) the sending and the delivery events associated with each message have the same timestamp. So, there is a logical time frame in which, for each message, the send event and the corresponding delivery events occur simultaneously. LI is stronger than Causally Ordered (CO) communication, but weaker than Rendezvous (RDV) communication. This paper explores Logically Instantaneous communication and provides a simple and efficient protocol that implements LI on top of asynchronous distributed systems. LI is attractive as it includes CO and provides more concurrency than RDV. Moreover, it supports the following approach: first design a distributed application assuming Rendezvous communication, and then run it on top of an asynchronous distributed system providing only LI communication.
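The two timestamp conditions in the LI definition can be checked mechanically. The following is a minimal sketch of such a check, not the protocol from the paper; the function name `is_li_timestamping` and the data layout (per-process event lists plus a timestamp table) are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Event = Tuple[str, str]  # (kind, message id); kind is "send" or "deliver"


def is_li_timestamping(
    histories: Dict[str, List[Event]],
    stamps: Dict[Tuple[str, int], int],
) -> bool:
    """Check the two LI conditions for a given integer timestamping.

    histories maps a process name to its communication events in local order.
    stamps maps (process name, event index) to an integer timestamp.
    Both structures are hypothetical, chosen only for this sketch.
    """
    stamps_by_message: Dict[str, Set[int]] = {}
    for proc, events in histories.items():
        previous = None
        for index, (_kind, msg) in enumerate(events):
            t = stamps[(proc, index)]
            # Condition (1): timestamps increase within each process.
            if previous is not None and t <= previous:
                return False
            previous = t
            stamps_by_message.setdefault(msg, set()).add(t)
    # Condition (2): the send and delivery events of a message share one timestamp.
    return all(len(ts) == 1 for ts in stamps_by_message.values())


# p sends m1 to q, then q sends m2 to p: a valid LI timestamping exists.
ordered = {
    "p": [("send", "m1"), ("deliver", "m2")],
    "q": [("deliver", "m1"), ("send", "m2")],
}
ordered_ok = is_li_timestamping(
    ordered, {("p", 0): 1, ("p", 1): 2, ("q", 0): 1, ("q", 1): 2}
)

# Crossing messages: each process sends before it delivers the other's message.
crossing = {
    "p": [("send", "m1"), ("deliver", "m2")],
    "q": [("send", "m2"), ("deliver", "m1")],
}
crossing_ok = is_li_timestamping(
    crossing, {("p", 0): 1, ("p", 1): 2, ("q", 0): 2, ("q", 1): 1}
)
```

In the crossing case no integer assignment can satisfy both conditions: condition (2) forces the two events of m1 to share one timestamp and the two events of m2 to share another, while condition (1) then requires each of those timestamps to be smaller than the other.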
ISBN: 0769500048 (print)
All-to-all communication is one of the densest communication patterns and occurs in many important applications in parallel computing. In this paper, we present a new all-to-all broadcast algorithm for all-port mesh and torus networks. Unlike existing all-to-all broadcast algorithms, the new algorithm overlaps message switching time with transmission time and achieves optimal transmission time for all-to-all broadcast. In addition, in most cases, the total communication delay is within a small constant of the lower bound for all-to-all broadcast. Finally, the algorithm is conceptually simple and symmetrical for every message and every node, so it can be easily implemented in hardware and achieve the optimum in practice.
This paper investigates the effectiveness of parallel computing in the calculation of the bispectrum. The bispectrum is estimated using two different methods, namely the direct and the indirect. The direct method em...
The discrete wavelet transform requires repeated high- and low-pass filtering of data scan lines. A variety of special-purpose devices and parallel algorithms have been designed to speed up this computationally int...
ISBN: 3540663630 (print)
We present an integrated environment for the systematic development of parallel and distributed programs. Our approach allows the user to construct complex applications by composing and transforming skeletons, i.e., recurring patterns of task and data parallelism. First academic and commercial experience with skeleton-based systems has demonstrated the benefits of the approach but also the lack of a dedicated set of methods for algorithm design and performance prediction. We take a first step towards such a set of methods by proposing an environment which integrates a framework for algorithm transformation, called FAN, with two existing skeleton-based programming systems: the academic system P3L and its commercial counterpart SkIE.
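As a rough illustration of composing and transforming skeletons, the sketch below models three skeletons as Python higher-order functions and shows one classic rewrite: a pipeline of two farms fused into a single farm over the composed function. The names `seq`, `farm` and `pipe` are hypothetical and do not reproduce the syntax of P3L, SkIE or FAN; the rewrite is only valid here because the worker functions are pure.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, List

Stage = Callable[[List[int]], List[int]]


def seq(fn: Callable[[int], int]) -> Stage:
    """Sequential skeleton: apply fn to every item, one after another."""
    return lambda items: [fn(x) for x in items]


def farm(fn: Callable[[int], int], workers: int = 4) -> Stage:
    """Farm skeleton: apply fn to independent items using a pool of workers."""
    def run(items: List[int]) -> List[int]:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Executor.map returns results in input order.
            return list(pool.map(fn, items))
    return run


def pipe(*stages: Stage) -> Stage:
    """Pipeline skeleton: feed the output of each stage into the next."""
    def run(items: List[int]) -> List[int]:
        return reduce(lambda data, stage: stage(data), stages, list(items))
    return run


def increment(x: int) -> int:
    return x + 1


def double(x: int) -> int:
    return x * 2


# Original program: a pipeline of two farms.
pipeline = pipe(farm(increment), farm(double))

# Transformed program: one farm over the composed function.
fused = farm(lambda x: double(increment(x)))

data = [1, 2, 3]
pipeline_result = pipeline(data)
fused_result = fused(data)
```

Both programs compute the same result, so a transformation framework is free to choose between them on performance grounds, for example to trade pipeline depth against the number of farm workers.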
ISBN: 3540663630 (print)
Process duplication is a classical method to cope with process crashes: a set of replicated processes constitutes a group that implements some fault-tolerant service. Several distributed systems are structured as a set of interacting reliable groups. This paper presents a clock management protocol where a logical clock is associated with each group (usually, logical clocks are associated with processes). The main problem that has to be solved is to ensure that all processes of a group behave in the same manner despite non-deterministic statements. It is shown that this problem can be reduced to the consensus problem. So, the proposed group clock protocol is based on an underlying building block providing a solution to the consensus problem.
ISBN: 3540663630 (print)
The proceedings contain 59 papers. The special focus in this conference is on parallel computing in Regular Structures. The topics include: analytical modeling of parallel application in heterogeneous computing environments; skeletons and transformations in an integrated parallel programming environment; sequential unification and aggressive lookahead mechanisms for data memory accesses; a coordination model and facilities for efficient parallel computation; parallelizing of sequential programs on the basis of pipeline and speculative features of the operators; kinetic model of parallel data processing; PSA approach to population models for parallel genetic algorithms; highly accurate numerical methods for incompressible 3D fluid flows on parallel architectures; dynamic task scheduling with precedence constraints and communication delays; two-dimensional scheduling of algorithms with uniform dependencies; consistent Lamport clocks for asynchronous groups with process crashes; comparative analysis of learning methods of cellular-neural associative memory; emergence and propagation of round autowave in cellular neural network; routing and embeddings in super Cayley graphs; implementing cellular automata based models on parallel architectures; overview, design innovations, and preliminary results; implementing model checking and equivalence checking for time Petri nets by the RT-MEC tool; learning concurrent programming; the speedup performance of an associative memory based logic simulator; a high-level programming environment for distributed memory architectures; virtual shared files; an object oriented environment to manage the parallelism of the FIIT applications; performance studies of shared-nothing parallel transaction processing systems; synergetic tool environments; and logically instantaneous communication on top of distributed memory parallel machines.
ISBN: 0769500048 (print)
On a large-scale parallel computer system, shared memory provides a general and convenient programming environment. This paper describes a lightweight method for constructing an efficient shared memory system supported by hierarchical coherence management and generalized combining. The hierarchical management technique and generalized combining cooperate with each other. We eliminate the following heavyweight and high-cost factors: a large amount of directory memory which is proportional to the number of processors, a separate memory component for the directory tag/state information, and a protocol processor. In our method, the amount of memory required for the directory is proportional to the logarithm of the number of processors. This implies that a single word for each memory block is sufficient for covering a massively parallel system and that the access costs of the directory are small. Moreover, our combining technique, generalized combining, does not rely on the accidental events that existing combining networks do, that is, events in which messages meet each other at a switching node. A switching node can combine succeeding messages with a preceding one even after the preceding message leaves the node. This can increase the rate of successful combining. We have developed a prototype parallel computer, OCHANOMIZ-5, that implements this lightweight distributed shared memory and generalized combining with simple hardware. The results of evaluating the prototype's performance using several programs show that our methodology provides the advantages of parallelization.
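The difference between a directory that grows linearly with the number of processors and one that grows logarithmically can be made concrete with a little arithmetic. The sketch below compares a conventional full bit-vector entry (one presence bit per processor) with an entry of ceil(log2 P) bits. The abstract does not give the actual directory encoding or its constant factor, so both function names and the exact bit counts are assumptions that illustrate only the growth rates.

```python
from typing import List, Tuple


def full_map_bits(processors: int) -> int:
    """Bits per memory block for a full bit-vector directory (one bit per processor)."""
    return processors


def log_directory_bits(processors: int) -> int:
    """Bits per memory block when the entry grows as ceil(log2(processors)).

    Assumed encoding for illustration only; at least one bit is always used.
    """
    return max(1, (processors - 1).bit_length())


def compare(sizes: List[int]) -> List[Tuple[int, int, int]]:
    """Return (processors, full-map bits, logarithmic bits) for each system size."""
    return [(p, full_map_bits(p), log_directory_bits(p)) for p in sizes]


rows = compare([64, 1024, 65536])
for processors, linear_bits, log_bits in rows:
    print(f"{processors:>6} processors: full map {linear_bits:>6} bits, "
          f"logarithmic {log_bits:>2} bits per block")
```

Even at 65,536 processors the logarithmic entry is 16 bits under this assumption, which is consistent with the abstract's claim that a single word per memory block suffices, whereas a full bit vector would need 65,536 bits per block.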
ISBN: 3540664432 (print)
The proceedings contain 210 papers. The topics discussed include: adaptive scheduling for task farming with grid middleware; applying human factors to the design of performance tools; building the teraflops/petabytes production supercomputing center; a coming of age for Beowulf-class computing; using preemptive thread migration to load-balance data-parallel applications; multi-protocol communications and high speed networks; an online algorithm for dimension-bound analysis; improving the performance of distributed shared memory environments on grid multiprocessors; performance analysis of wormhole switching with adaptive routing in a two-dimensional torus; set associative cache behavior optimization; a performance study of modern web server applications; performance evaluation of object oriented middleware; and performance evaluation and benchmarking of native signal processing.
Distributed Caml (D'Caml) is a distributed implementation of Caml, a dialect of ML. The compiler produces native code for diverse execution platforms. The distributed shared memory allows transmission and sharing ...