As the prices of commodity workstations fall, clusters of workstations have begun to emerge as an economically viable solution for scalable computing. Recent advances in networking technology have made it possible to obtain high-bandwidth connections between applications. However, the interconnect latency between workstation nodes in a cluster remains a serious concern and can prove to be the limiting factor in workstation performance. In this paper, we present the cluster network interface (CNI), which achieves the twin goals of low latency and high bandwidth. In addition, CNI efficiently supports multiple programming paradigms, providing programming generality. This is done by functionally coupling the network interface more closely to the CPU without violating the constraints of a standard workstation architecture. CNI yields performance gains for applications by substantially reducing communication overhead and delay.
A second-order Møller-Plesset (MP2) energy gradient algorithm for distributed-memory parallel computers is described. A direct approach is used in that integrals are recalculated as required, but the degree of recalculation is minimized by exploiting the large global memory typically available on parallel machines. Results obtained using up to 256 processors of the Cray T3D show very good scalability, with over 99.5% parallelism.
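To put the 99.5% figure in perspective, the sketch below evaluates Amdahl's law for a 0.995 parallel fraction on 256 processors. It assumes that "parallelism" in the abstract means the parallel fraction in Amdahl's sense, which the abstract does not state; the function name is hypothetical and the result is an idealized bound, not a measured T3D number.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Ideal speedup from Amdahl's law for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Assumption: 0.995 is the lower bound implied by "over 99.5% parallelism".
speedup_256 = amdahl_speedup(0.995, 256)
efficiency_256 = speedup_256 / 256

print(f"ideal speedup on 256 processors: {speedup_256:.1f}")
print(f"ideal parallel efficiency: {efficiency_256:.2f}")
```

Because the abstract reports "over" 99.5%, the actual parallel fraction, and hence the achievable speedup, may be higher than this bound suggests.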
Based on the characteristics of large-scale network computing systems, we propose a group consistency model, built on the concept of a group, for constructing a DSM system. The model can use different inter-group and intra-group consistencies and lends itself to flexible, easily manageable, and application-suitable DSM in large-scale systems. This paper discusses a group consistency model that applies entry consistency among groups and lazy release consistency within a group, together with its implementation policy. It employs write-update and multiple-writer protocols within a group and thus allows simultaneous reads and writes in a group. These protocols eliminate false sharing and reduce the data acquisition time within a group. Furthermore, the inter-group consistency suits the features of data sharing among groups and transmits the data modifications originating from a group in bulk to reduce network traffic. Finally, an example using the group consistency model is given and the trivial group consistency is discussed.
The performance of distributed shared memory depends on the memory coherence algorithms and the access characteristics of shared data. In this paper, we propose an efficient coherence scheme that uses multiple coherence algorithms with a self-adjusting feature. Our method dynamically chooses a better-suited coherence algorithm for each variable class, so that an incorrect classification of shared variables does not affect performance. We show that, for each fixed classification, application programs suffer 5.1%, 4.6%, and 48.9% increases in average execution time compared with the self-adjusting scheme. Experiments show that our approach achieves good performance.
We describe the evolution of a distributed shared memory (DSM) system, Mirage, and the difficulties encountered when moving the system from a Unix-based kernel on the VAX to a Unix-based kernel on personal computers. Mirage provides a network-transparent form of shared memory for a loosely coupled environment. The system hides network boundaries for processes that are accessing shared memory and is upward compatible with the Unix System V Interface Definition. This paper addresses the architectural dependencies in the design of the system and evaluates the performance of the implementation. The new version, Mirage+, performs well compared to Mirage even though eight times the amount of data is sent on each page fault because of the larger page size used in the implementation. We show that the performance of systems with a large ratio of page size to network packet size can be dramatically improved on conventional hardware by applying three well-known techniques: packet blasting, compression, and running at interrupt level. The measured time for a page fault in Mirage+ was reduced by 37 per cent by sending a page using packet blasting instead of using a handshake for each portion of the page. When compression was added to Mirage+, the time to fault a page across the network was further improved by 47 per cent when the page was compressed into one network packet. Our measured performance compares favorably with the amount of time it takes to fault a page from disk. Lastly, running at interrupt level may improve performance by 16 per cent when faulting pages without compression.
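As a rough reading of how the first two improvements compound, the sketch below applies the 37 per cent and 47 per cent reductions in sequence to a normalized page-fault time. It assumes the 47 per cent figure is measured relative to the packet-blasting time, which the abstract suggests ("further improved") but does not state outright; the baseline of 1.0 and the helper name are hypothetical, not measured values.

```python
def remaining_time(baseline: float, *reductions: float) -> float:
    """Apply successive fractional reductions to a baseline time."""
    time = baseline
    for reduction in reductions:
        time *= 1.0 - reduction
    return time


# Normalized time for a page fault using a handshake per page portion.
baseline = 1.0

after_blasting = remaining_time(baseline, 0.37)
# Assumption: the 47 per cent applies on top of packet blasting.
after_compression = remaining_time(baseline, 0.37, 0.47)
combined_reduction = 1.0 - after_compression

print(f"after packet blasting: {after_blasting:.2f} of baseline")
print(f"after adding compression: {after_compression:.4f} of baseline")
print(f"combined reduction: {combined_reduction:.1%}")
```

Under that assumption, the two techniques together would cut the page-fault time by roughly two thirds, and only for pages that compress into a single network packet.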
Software distributed shared memory (DSM) provides a convenient and effective solution for programming parallel applications on distributed systems. However, the performance of current implementations suffers from the large overhead of enforcing memory coherence, because coherence faults are a source of massive network traffic. Various memory consistency models have been proposed to reduce the effects of network traffic and memory latency. In this paper, we present a novel approach that combines relaxed memory consistency models with a compiler strategy to solve memory coherence problems for DSM. This approach produces fewer coherence faults. Experimental results also show that this hybrid approach is effective in reducing the memory coherence overhead of DSM.
Communication latency is central to multiprocessor design. This study presents the design principles of the EM-X distributed-memory multiprocessor for tolerating communication latency. The EM-X overlaps computation with communication by multithreading. In particular, we present two types of hardware support for remote memory access: (1) priority-based packet scheduling for thread invocation, and (2) direct remote memory access. The priority-based scheduling policy extends a FIFO-ordered thread invocation policy to adapt to different computational needs. The direct remote memory access is designed to overlap remote memory operations with thread execution. The 80-processor prototype of the EM-X was developed and has been operational since December 1995. We execute several programs on the machine and evaluate how effectively the EM-X overlaps computation with communication to tolerate communication latency in high-performance parallel computing.
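The idea of extending a FIFO invocation order with priorities can be illustrated in software, although the EM-X implements it in hardware. The sketch below is a generic priority queue that falls back to arrival order among packets of equal priority; the class name, the packet labels, and the convention that a lower number means higher priority are all assumptions made for illustration, not details of the EM-X design.

```python
import heapq
from itertools import count


class PriorityFifoQueue:
    """Packets leave by priority; equal priorities keep FIFO order."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # monotonically increasing arrival stamp

    def push(self, packet, priority=0):
        # Assumed convention: a lower number means a higher priority.
        heapq.heappush(self._heap, (priority, next(self._arrival), packet))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = PriorityFifoQueue()
queue.push("invoke thread A", priority=1)
queue.push("invoke thread B", priority=1)
queue.push("remote read reply", priority=0)

# The high-priority packet overtakes the two earlier ones, which
# then leave in the order they arrived.
order = [queue.pop() for _ in range(len(queue))]
print(order)
```

The arrival stamp is what preserves the FIFO behavior: with a single priority level the queue degenerates to plain first-in, first-out scheduling.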
The management of memory coherence is an important problem in distributed shared memory (DSM) systems. In a cache-based coherent DSM system using a linked-list structure, the key to maintaining coherence and improving system performance is how the owner in the linked list is managed. This paper presents the design of a new management protocol, NONH (New-Owner New-Head), and its performance evaluation. The analysis results show that this protocol can improve the scalability and performance of a coherent DSM system using a linked list. It is also suitable for managing cache coherence in tree-like hierarchical architectures.
A consistency condition for distributed shared memory is fast if it has a fast implementation in which the execution time of every operation is significantly faster than the network delay. These conditions include Pipelined RAM, weak consistency, causal memory, and one interpretation of processor consistency. It is shown that if a condition is fast then it does not support non-centralized solutions for mutual exclusion.