Distributed shared memory is an attractive option for realizing functionally distributed computing in a wide-area distributed environment because of its simplicity and flexibility in software programming. However, until now, distributed shared memory has mainly been studied in local environments. In a widely distributed environment, communication latency greatly affects system performance. Moreover, the bandwidth of networks available over wide areas has been increasing dramatically, so a DSM architecture using high-performance networks must differ from one designed for low-speed networks. In this paper, distributed shared memory models for a widely distributed environment are discussed and evaluated. First, existing distributed shared memory models are examined: shared virtual memory and replicated shared memory. Next, an improved replicated shared-memory model, which uses internal machine memory, is proposed. This model assumes the existence of a seamless, multicast wide-area network infrastructure, for example an ATM network. A prototype of this model using multi-threaded programming has been implemented on multi-CPU SPARCstations and an ATM LAN. These DSM models are compared with SCRAMNet(TM), whose mechanism is based on replicated shared memory. Results from this evaluation show the superiority of replicated shared memory over shared virtual memory when the network distance is large. While replicated shared memory using external memory is influenced by the ratio of local to global accesses, replicated shared memory using internal machine memory is suitable for a wide variety of cases. The replicated shared-memory model is considered particularly suitable for applications that demand real-time operation in a widely distributed environment, since latency-hiding techniques such as context switching or data prefetching are not effective under real-time constraints.
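To make the replicated model concrete, here is a minimal single-process sketch in C (with made-up names such as rsm_read and rsm_write, not the paper's interface): each node keeps a full local copy of the shared region, reads are served entirely locally, and a write is pushed to every replica, where the loop over replicas stands in for one multicast on the wide-area network.

```c
/* Minimal sketch of replicated shared memory (hypothetical names).
 * Each node holds a full copy of the shared region; reads are purely
 * local, and a write is pushed to every replica -- the loop below
 * stands in for a single multicast on a wide-area network. */
#include <stdio.h>
#include <string.h>

#define NODES 4
#define REGION_WORDS 256

static int replica[NODES][REGION_WORDS];   /* one local copy per node */

/* A read is satisfied from the node's own copy: no network latency. */
static int rsm_read(int node, int addr) {
    return replica[node][addr];
}

/* A write updates every copy; in a real system this would be one
 * multicast message rather than NODES point-to-point updates. */
static void rsm_write(int addr, int value) {
    for (int n = 0; n < NODES; n++)
        replica[n][addr] = value;
}

int main(void) {
    memset(replica, 0, sizeof replica);
    rsm_write(42, 7);                              /* node 0 publishes a value */
    printf("node 3 reads %d\n", rsm_read(3, 42));  /* local read on node 3     */
    return 0;
}
```

The sketch also shows why the model suits long links and real-time reads: only the write path ever crosses the network.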
Any parallel program has abstractions that are shared by the program's multiple processes, including data structures containing shared data, code implementing operations like global sums or minima, and type instances used for process synchronization or communication. Such shared abstractions can considerably affect the performance of parallel programs, on both distributed- and shared-memory multiprocessors. As a result, their implementation must be efficient, and such efficiency should be achieved without unduly compromising program portability and maintainability. Unfortunately, efficiency and portability can be at cross-purposes, since high performance typically requires changes in the representation of shared abstractions across different parallel machines. The primary contribution of the DSA library presented and evaluated in this paper is its representation of shared abstractions as objects that may be internally distributed across different nodes of a parallel machine. Such distributed shared abstractions (DSA) are encapsulated so that their implementations are easily changed while maintaining program portability across parallel architectures ranging from small-scale multiprocessors, to medium-scale shared- and distributed-memory machines, and potentially, to networks of computer workstations. The principal results presented in this paper are 1) a demonstration that the fragmentation of object state across different nodes of a multiprocessor machine can significantly improve program performance, and 2) a demonstration that such object fragmentation can be achieved without compromising portability or changing object interfaces. These results are demonstrated using implementations of the DSA library on several medium-scale multiprocessors, including the BBN Butterfly, Kendall Square Research, and SGI shared-memory multiprocessors. The DSA library's evaluation uses synthetic workloads and a parallel implementation of a branch-and-bound algorithm for solving the Traveling Salesperson Problem.
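As an illustration of the fragmentation idea (not the DSA library's actual API; all names below are invented), consider a shared counter whose state is split into per-thread fragments: increments touch only the local fragment, a read combines all fragments, and the inc/read interface is identical to that of a centralized counter.

```c
/* Sketch of a "fragmented" shared abstraction: a counter whose state is
 * split into per-thread cells so increments never contend, while the
 * inc/read interface stays the same as for a centralized counter.
 * Hypothetical names; not the DSA library's actual API. */
#include <pthread.h>
#include <stdio.h>

#define THREADS 4

struct fragment { long count; char pad[64 - sizeof(long)]; };  /* padded cells */
static struct fragment frag[THREADS];        /* one fragment per "node"        */

static void counter_inc(int id) { frag[id].count++; }          /* local only   */

static long counter_read(void) {             /* combine all fragments          */
    long sum = 0;
    for (int i = 0; i < THREADS; i++) sum += frag[i].count;
    return sum;
}

static void *worker(void *arg) {
    int id = (int)(long)arg;
    for (int i = 0; i < 1000000; i++) counter_inc(id);
    return NULL;
}

int main(void) {
    pthread_t t[THREADS];
    for (long i = 0; i < THREADS; i++) pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < THREADS; i++) pthread_join(t[i], NULL);
    printf("total = %ld\n", counter_read());
    return 0;
}
```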
Distributed shared memory combines the advantages of shared-memory multiprocessors and distributed computer systems. Evaluations of four experimental or commercial approaches to hardware DSM show its potential for large-scale, high-performance multiprocessor systems.
ISBN (print): 0780335295
This paper describes the DOSMOS(1) parallel programming environment. Based on a DSM layer, this system has been specially designed to ensure scalability and efficiency. Several novel features are introduced, such as the grouping of processes, the possibility of mixing message-passing (PVM) code and DSM code, the definition of optimized weak consistency protocols, and the integration of monitoring facilities. First experiments on networks of workstations show the effectiveness of these features.
Distributed shared memory has increasingly become a desirable programming model for multicomputer systems. Such systems strike a balance between the performance attainable in distributed-memory multiprocessors and the ease of programming on shared-memory systems. In shared-memory systems, concurrent tasks communicate through shared variables, and synchronization of access to shared data is an important issue. Semaphores have traditionally been used to provide this synchronization. In this paper we propose a decentralized scheme to support semaphores in a virtual shared-memory system. Our method of grouping semaphores into semaphore pages and caching a semaphore at a processor on demand eliminates the reliability problems and bottlenecks associated with centralized schemes. We compare the performance of our scheme with a centralized implementation of semaphores and conclude that our system performs better under high semaphore access rates as well as with larger numbers of processors.
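A rough sketch of the page-grouping idea, as a single-process simulation with invented names rather than the paper's protocol: semaphores are packed into pages, a processor that touches a semaphore first migrates its page, and subsequent P and V operations on semaphores in that page complete locally.

```c
/* Sketch of decentralized semaphores grouped into "semaphore pages".
 * A processor that needs a semaphore first fetches (migrates) the page
 * holding it; later P/V operations on semaphores in that page are local.
 * Single-process simulation with hypothetical names. */
#include <stdio.h>

#define SEMS_PER_PAGE 16
#define NUM_PAGES 4

struct sem_page {
    int owner;                    /* processor currently caching the page */
    int value[SEMS_PER_PAGE];     /* semaphore counters in this page      */
};
static struct sem_page pages[NUM_PAGES];

/* Migrate the page to 'proc' if it is cached elsewhere (this is where a
 * real system would send a page-transfer message over the network). */
static struct sem_page *acquire_page(int proc, int sem_id) {
    struct sem_page *p = &pages[sem_id / SEMS_PER_PAGE];
    if (p->owner != proc) {
        printf("page %d migrates %d -> %d\n", sem_id / SEMS_PER_PAGE, p->owner, proc);
        p->owner = proc;
    }
    return p;
}

/* P succeeds only if the counter is positive (here we just report instead
 * of blocking); V releases by incrementing the counter. */
static int sem_P(int proc, int sem_id) {
    struct sem_page *p = acquire_page(proc, sem_id);
    if (p->value[sem_id % SEMS_PER_PAGE] <= 0) return -1;   /* would block */
    return --p->value[sem_id % SEMS_PER_PAGE];
}
static void sem_V(int proc, int sem_id) {
    acquire_page(proc, sem_id)->value[sem_id % SEMS_PER_PAGE]++;
}

int main(void) {
    sem_V(0, 5);                                /* processor 0 signals semaphore 5 */
    printf("P on proc 2: %d\n", sem_P(2, 5));   /* page migrates to processor 2    */
    return 0;
}
```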
The abstraction of a shared memory is of growing importance in distributed computing systems. Traditional memory consistency ensures that all processes agree on a common order of all operations on memory. Unfortunately, providing these guarantees entails access latencies that prevent scaling to large systems. This paper weakens such guarantees by defining causal memory, an abstraction that ensures that processes in a system agree on the relative ordering of operations that are causally related. Because causal memory is weakly consistent, it admits more executions, and hence more concurrency, than either atomic or sequentially consistent memories. This paper provides a formal definition of causal memory and gives an implementation for message-passing systems. In addition, it describes a practical class of programs that, if developed for a strongly consistent memory, run correctly with causal memory.
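One standard way to realize causal ordering in a message-passing system is with vector clocks: each write carries the writer's vector timestamp, and a receiver applies it only once every causally preceding write has been applied. The sketch below shows just that delivery test; the names are invented and this is a generic technique, not necessarily the implementation given in the paper.

```c
/* Sketch of the causal-delivery test used by vector-clock implementations
 * of causal ordering: a write from 'sender' carrying timestamp 'ts' may be
 * applied at a process with local vector 'vc' only if it is the next write
 * from that sender and all writes it causally depends on have arrived.
 * Hypothetical names; a standard technique, not the paper's algorithm. */
#include <stdio.h>
#include <stdbool.h>

#define NPROC 3

static bool can_apply(const int ts[NPROC], int sender, const int vc[NPROC]) {
    if (ts[sender] != vc[sender] + 1) return false;   /* next write from sender? */
    for (int k = 0; k < NPROC; k++)                    /* dependencies satisfied? */
        if (k != sender && ts[k] > vc[k]) return false;
    return true;
}

static void apply(int vc[NPROC], int sender) { vc[sender]++; }

int main(void) {
    int vc[NPROC] = {0, 0, 0};            /* local vector clock               */
    int w1[NPROC] = {1, 0, 0};            /* first write by process 0         */
    int w2[NPROC] = {1, 1, 0};            /* write by process 1, after w1     */

    /* w2 arrives first: it causally depends on w1, so it must wait. */
    printf("apply w2 now? %s\n", can_apply(w2, 1, vc) ? "yes" : "no");
    if (can_apply(w1, 0, vc)) apply(vc, 0);
    printf("apply w2 after w1? %s\n", can_apply(w2, 1, vc) ? "yes" : "no");
    return 0;
}
```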
WAKASHI/C (centralized server version) is a subsystem at the base level of the "Shusse Uo" project. It provides the C programmer with the basic functions to manipulate distributed shared persistent data and distributed shared volatile data. In WAKASHI/C, the virtual memory scheme and the distributed shared memory scheme are combined to realize a distributed shared persistent heap and a distributed shared volatile heap for the efficient handling of multimedia data. This paper discusses the realization of WAKASHI/C in a distributed computing environment. To evaluate performance for the manipulation of multimedia databases, an extended object operation benchmark is proposed, and the execution performance of WAKASHI/C is evaluated.
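The virtual-memory side of such a persistent heap can be illustrated with a small sketch: map a file into the address space, place objects inside the mapping, and flush dirty pages back to the store. This is only an illustration of the general technique, not the WAKASHI/C interface; the file name and offsets are made up.

```c
/* Sketch of a persistent heap built on virtual memory: a file is mapped
 * into the address space, objects live inside the mapping, and msync()
 * flushes them back to the store. Illustrates the virtual-memory idea
 * only; not the WAKASHI/C interface (names are invented). */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define HEAP_BYTES (1 << 20)

int main(void) {
    int fd = open("heap.img", O_RDWR | O_CREAT, 0644);
    if (fd < 0 || ftruncate(fd, HEAP_BYTES) != 0) { perror("heap file"); return 1; }

    /* The mapped region *is* the persistent heap. */
    char *heap = mmap(NULL, HEAP_BYTES, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (heap == MAP_FAILED) { perror("mmap"); return 1; }

    /* "Allocate" an object at a fixed offset and update it in place. */
    strcpy(heap + 128, "persistent multimedia object header");

    /* Flush dirty pages so the object survives a restart. */
    msync(heap, HEAP_BYTES, MS_SYNC);
    munmap(heap, HEAP_BYTES);
    close(fd);
    printf("object written to heap.img\n");
    return 0;
}
```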
In this paper, a general blackboard-based system is studied using Stochastic Petri Nets (SPN). We show how the dynamic behavior of the blackboard can be modeled, and analyze the marking graph of the resulting stochastic Petri net. The net is simulated with respect to the number of available processors, the event interarrival time, and the average unification time, and performance results are presented.
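As a rough illustration of this kind of simulation (a toy model with invented rates, not the net analyzed in the paper), the sketch below simulates a tiny stochastic Petri net in which events arrive at one rate, are taken up by free processors, and complete after an exponentially distributed unification time; the measured throughput is the sort of performance result the abstract refers to.

```c
/* Toy stochastic Petri net, simulated by sampling an exponential delay for
 * each enabled timed transition and firing the earliest one.
 * Places: WAITING events, FREE processors, BUSY processors.
 * Transitions: arrive (rate lambda), start (immediate), finish (rate mu per
 * busy processor). Invented toy model, not the paper's blackboard net. */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

static double expo(double rate) {                       /* exponential sample */
    double u = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    return -log(u) / rate;
}

int main(void) {
    double lambda = 1.0, mu = 0.5;                       /* arrival / service rates */
    int waiting = 0, free_proc = 4, busy = 0, served = 0;
    double t = 0.0, horizon = 10000.0;
    srand(1);

    while (t < horizon) {
        double t_arrive = expo(lambda);
        double t_finish = busy > 0 ? expo(mu * busy) : 1e30;
        if (t_arrive < t_finish) { t += t_arrive; waiting++; }
        else                     { t += t_finish; busy--; free_proc++; served++; }
        while (waiting > 0 && free_proc > 0) {           /* immediate 'start' firings */
            waiting--; free_proc--; busy++;
        }
    }
    printf("throughput: %.3f events per unit time\n", served / horizon);
    return 0;
}
```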
Conventional multiprocessors mostly use centralized, memory-based barriers to synchronize concurrent processes created in multiple processors. These centralized barriers often become the bottleneck or hot spots in the shared memory. In this paper, we overcome this difficulty by presenting a distributed and hardwired barrier architecture that is hierarchically constructed for fast synchronization in cluster-structured multiprocessors. The hierarchical architecture enables the scalability of cluster-structured multiprocessors. A special set of synchronization primitives is developed for explicit, dynamic use of the distributed barriers. To show the application of the hardwired barriers, we demonstrate how to synchronize Doall and Doacross loops using a limited number of hardwired barriers. Timing analysis shows an O(10^2) to O(10^5) reduction in synchronization overhead compared with software-controlled barriers implemented in shared memory. The hardwired architecture is effective in implementing any partially ordered set of barriers or fuzzy barriers with extended synchronization regions. The versatility, scalability, programmability, and low overhead make the distributed barrier architecture attractive for constructing fine-grain, massively parallel MIMD systems using multiprocessor clusters with distributed shared memory.
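To show in software terms what synchronizing a Doall loop with barriers means, here is a minimal pthreads sketch: each thread handles its own chunk of the iteration space, and a barrier keeps any thread from starting the second phase before every thread has finished the first. pthread_barrier_t is an ordinary software barrier standing in for the paper's hardwired ones.

```c
/* Sketch of Doall-loop synchronization with barriers: every thread works on
 * its own chunk of the iteration space, and no thread starts phase 2 until
 * all threads have finished phase 1. pthread_barrier_t is a software barrier
 * standing in for the hardwired barriers described in the paper. */
#include <pthread.h>
#include <stdio.h>

#define THREADS 4
#define N 1024

static double a[N], b[N];
static pthread_barrier_t phase_barrier;

static void *worker(void *arg) {
    int id = (int)(long)arg, lo = id * (N / THREADS), hi = lo + N / THREADS;

    for (int i = lo; i < hi; i++) a[i] = i;          /* Doall phase 1          */
    pthread_barrier_wait(&phase_barrier);            /* all chunks must finish */
    for (int i = lo; i < hi; i++)                    /* Doall phase 2 reads    */
        b[i] = a[(i + 1) % N] + a[i];                /* neighbours from phase 1 */
    return NULL;
}

int main(void) {
    pthread_t t[THREADS];
    pthread_barrier_init(&phase_barrier, NULL, THREADS);
    for (long i = 0; i < THREADS; i++) pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < THREADS; i++) pthread_join(t[i], NULL);
    pthread_barrier_destroy(&phase_barrier);
    printf("b[0] = %.1f\n", b[0]);
    return 0;
}
```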
Distributed systems are an alternative to shared-memory multiprocessors for the execution of parallel applications. PANDA is a run-time system that provides architectural support for efficient parallel and distributed programming. It supplies fast user-level threads and a means for transparent and coordinated sharing of objects across a homogeneous network. The paper motivates the major architectural choices that guided our design. The problem of sharing data in a distributed environment is discussed, and the performance of the mechanisms provided by the PANDA prototype implementation is assessed.