ISBN: (print) 0818673982
We claim in this paper that both remote process creation and process migration are efficient mechanisms for improving or developing high performance computer systems. In particular, we demonstrate that the claims made by some researchers that process migration is too heavyweight to support dynamic load balancing are unsubstantiated. We support our claim by presenting these two mechanisms as available in the RHODOS distributed operating system, comparing and contrasting them, and reporting on their performance.
ISBN: (print) 0818675829
We discuss here the emergent Web-based distributed environments for HPCC on the NII, with a focus on Java as an enabling technology. We start with a review of the past, present, and near-term future of the 'Java phenomenon', set against the background of some related previous approaches towards a distributed interpretative virtual machine architecture.
ISBN: (print) 0818675519
Achieving 100 TeraOps performance within a ten-year horizon will require massively parallel architectures that exploit both commodity software and hardware technology for cost efficiency. Increasing clock rates and system diameter in clock periods will make efficient management of communication and coordination increasingly critical. Configurable logic presents a unique opportunity to customize the bindings, mechanisms, and policies that govern the interaction of processing, memory, I/O, and communication resources. This programming flexibility, or customizability, can provide the key to achieving robust high performance. The MultiprocessOr with Reconfigurable Parallel Hardware (MORPH) uses reconfigurable logic blocks integrated with the system core to control policies, interactions, and interconnections. This integrated configurability can improve the performance of the local memory hierarchy, increase the efficiency of interprocessor coordination, or better utilize the network bisection of the machine. MORPH provides a framework for exploring such integrated application-specific customizability. Rather than complicating the situation, MORPH's configurability supports component software and interoperability frameworks, allowing direct support for application-specified patterns, objects, and structures. This paper reports the motivation and initial design of the MORPH system.
ISBN: (print) 0818675829
In this paper, we address the problem of supporting High Performance Distributed Computing (HPDC) applications running over ATM networks. For this purpose, we consider a logically separate subnetwork for these applications. After presenting an architectural reference model for the HPDC subnetwork and identifying which functions should be installed over the ATM network to satisfy the needs of HPDC applications, we propose two mechanisms that aim to optimize communications by taking advantage of both the special properties of HPDC traffic and the cell-based nature of ATM. The performance of these mechanisms is evaluated and compared with that achieved by the SSCOP protocol. The results show that when the ATM network experiences high load and the HPDC applications make intensive use of arrays, cell-based mechanisms become more robust than standard SSCOP and provide low latency and efficient cell loss recovery. Since both situations are very likely to occur in HPDC environments, we conclude that the introduction of cell-based retransmission mechanisms does help to enhance the performance of HPDC systems over ATM networks.
ISBN: (print) 0818673982
In this paper we present the design and implementation of a conservative garbage collection algorithm for distributed shared memory (DSM) applications that use weakly-typed languages like C or C++, and evaluate its performance. In the absence of language support to identify references, our algorithm constructs a conservative approximation of the set of cross-node references based on local information only. It is also designed to tolerate memory inconsistency on DSM systems that use relaxed consistency protocols. These techniques enable every node to perform garbage collection without communicating with others, effectively avoiding the high cost of cross-node communication in networks of workstations. We measured the performance of our garbage collector against explicit programmer management using three application programs. In two of the three programs, the performance of the GC version is within 15% of the explicit version. The results show that the garbage collector has two effects on application programs. On one hand, it tends to reduce memory locality, increasing the communication cost; on the other hand, it may eliminate synchronization and memory accesses that would be incurred if memory were managed by the programmer, reducing the communication cost.
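As a rough illustration of what "conservative" means here, the sketch below treats any aligned word whose value falls inside the heap range as a possible reference, and marks everything reachable from such words. It is a single-node toy model, not the paper's algorithm: the names `looks_like_pointer` and `conservative_mark`, the dictionary-based heap, and the 8-byte alignment are all assumptions made for the example, and nothing here models cross-node references or relaxed consistency.

```python
def looks_like_pointer(word, heap_start, heap_end, align=8):
    """Conservative test: an aligned value inside the heap range may be a reference."""
    return heap_start <= word < heap_end and word % align == 0


def conservative_mark(root_words, objects, heap_start, heap_end, align=8):
    """Mark every object reachable from words that merely look like pointers.

    root_words: raw words found in stacks, registers, and globals (hypothetical input).
    objects:    dict mapping an object's start address to the list of words it holds.
    Returns the set of addresses that must be retained.
    """
    marked = set()
    stack = [w for w in root_words
             if looks_like_pointer(w, heap_start, heap_end, align) and w in objects]
    while stack:
        addr = stack.pop()
        if addr in marked:
            continue
        marked.add(addr)
        # No type information is available, so every word in the object is scanned.
        for w in objects[addr]:
            if looks_like_pointer(w, heap_start, heap_end, align) and w in objects:
                stack.append(w)
    return marked


if __name__ == "__main__":
    heap = {0x1000: [0x1010, 7], 0x1010: [42], 0x1020: [0x1000]}
    live = conservative_mark([0x1000, 5, 0x2000], heap, 0x1000, 0x2000)
    print(sorted(hex(a) for a in live))
```

The price of this approach is that an integer that happens to equal a heap address keeps the corresponding object alive, so the collector may retain garbage but never frees a live object.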
ISBN: (print) 0818673982
Non-blocking atomic commitment protocols enable a decision (commit or abort) to be reached at every correct participant, despite the failure of others. The non-blocking property comes at a cost, however: (1) a high number of messages and communication steps required to reach commit, and (2) a complicated termination protocol needed in the case of failure suspicions. In this paper, we present a non-blocking protocol, called MD3PC (Modular and Decentralized Three Phase Commit), which makes it possible to trade resiliency against efficiency. As shown by our performance measurements, MD3PC is faster than existing non-blocking protocols and, in the case of a broadcast network and a reasonable resiliency rate (e.g., 2 or 3), is almost as efficient as the classical (blocking) 2PC. The termination protocol of MD3PC is encapsulated inside a majority consensus protocol. This modularity leads to a simple structure for MD3PC and enables a precise characterization of its liveness in an asynchronous system with an unreliable failure detector.
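The arithmetic behind a majority consensus building block can be sketched as follows. This is generic quorum reasoning under the usual crash-failure assumption, not a description of MD3PC's internals; the function names are hypothetical, and the sketch does not attempt to model the abstract's "resiliency rate".

```python
def majority(n):
    """Smallest number of participants that forms a strict majority of n."""
    return n // 2 + 1


def max_tolerated_crashes(n):
    """Largest number of crashed participants that still leaves a majority alive."""
    return (n - 1) // 2


def has_quorum(acks, n):
    """True if the distinct acknowledging participants form a majority of n."""
    return len(set(acks)) >= majority(n)


if __name__ == "__main__":
    n = 5
    print(majority(n), max_tolerated_crashes(n))
    print(has_quorum(["p1", "p2", "p3"], n))
```

Because any two majorities of the same group share at least one participant, two conflicting decisions cannot both gather a quorum, which is the property such protocols typically rely on.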
ISBN: (print) 0818675829
DAISy (Distributed Array of Inexpensive Systems) is a 16-node PC cluster running a full UNIX-compatible operating system. The network media used include standard 10Mb/s (10BASE-2) Ethernet (used for client node NFS mounts and any client node interactive work users find necessary) and switched 100Mb/s (100BASE-TX) Fast Ethernet (used for user program message passing traffic). The DAISy cluster is used to investigate the viability of commodity PC technology for the scientific and engineering computations traditionally performed on 'Supercomputers' and, more recently, on high performance RISC workstations and clusters of RISC workstations. Performance analyses of the various single-node subsystems were carried out, along with performance analysis of the cluster as a whole on a number of parallel applications. The results show that the performance of the current 90MHz Pentium CPUs and motherboards used is well within the range of many low-end workstations offered by traditional workstation vendors.
In this paper, we study high performance networks with wormhole routing and investigate their performance in terms of meeting message delay constraints. Traditional systems use unregulated greedy transmission control. This may result in unfair network access and unbounded packet blocking time, making it very difficult to efficiently support real-time applications. To overcome this problem, we propose a regulated transmission control method in which packet transmission at the source is regulated and hence unnecessary network contention is eliminated. The regulated method is a generalization of the unregulated method and can be easily implemented in most commercially available networks.
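The abstract does not say how the source is regulated. One common way to regulate a sender is a token bucket, sketched below purely as an assumption; the function `release_times` and its parameters `rate` and `burst` are hypothetical names, and packets are taken to be of unit size.

```python
def release_times(arrivals, rate, burst):
    """Times at which a token-bucket regulator releases unit-size packets.

    arrivals: non-decreasing packet arrival times at the source.
    rate:     tokens added per unit of time (must be > 0).
    burst:    bucket capacity, i.e. the largest back-to-back burst allowed.
    """
    tokens = float(burst)  # the bucket starts full
    last = 0.0             # time at which `tokens` was last updated
    out = []
    for t in arrivals:
        # FIFO: a packet cannot leave before the one ahead of it.
        now = max(t, out[-1]) if out else t
        tokens = min(float(burst), tokens + (now - last) * rate)
        last = now
        if tokens < 1.0:
            # Hold the packet at the source until one full token is available.
            now += (1.0 - tokens) / rate
            tokens = 1.0
            last = now
        tokens -= 1.0
        out.append(now)
    return out


if __name__ == "__main__":
    burst_of_four = [0.0, 0.0, 0.0, 0.0]
    print(release_times(burst_of_four, rate=1.0, burst=2))  # regulated
    print(release_times(burst_of_four, rate=1.0, burst=4))  # never delayed
```

In this toy model a bucket large enough to cover the whole burst never delays a packet, so the greedy, unregulated sender appears as a special case of the regulated one.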
JUMP-1 is a distributed shared-memory massively parallel computer composed of multiple clusters connected by an interconnection network called RDT (Recursive Diagonal Torus). Each cluster in JUMP-1 consists of 4 element processors, secondary cache memories, and 2 MBPs (Memory Based Processors) for high-speed synchronization and communication among clusters. The I/O subsystem is connected to a cluster via a high-speed serial link called STAFF-Link. The I/O buffer memory is mapped onto the JUMP-1 global shared memory so that each I/O access operation can be performed as a memory access. In this paper we describe an evaluation of the fundamental performance of the disk I/O subsystem using event-driven simulation, and its estimated performance with a Video On Demand (VOD) application.
ISBN: (print) 0818675829
The Legion project at the University of Virginia is an architecture for designing and building system services that provide the illusion of a single virtual machine to users, a virtual machine that provides secure shared object and shared name spaces, application-adjustable fault tolerance, improved response time, and greater throughput. Legion targets wide-area assemblies of workstations, supercomputers, and parallel supercomputers. Legion tackles problems not solved by existing workstation-based parallel processing tools; the system will enable fault tolerance, wide-area parallel processing, interoperability, heterogeneity, a single global name space, protection, security, efficient scheduling, and comprehensive resource management. This paper describes the core Legion object model, which specifies the composition and functionality of Legion's core objects - those objects that cooperate to create, locate, manage, and remove objects in the Legion system. The object model facilitates a flexible, extensible implementation, provides a single global name space, grants site autonomy to participating organizations, and scales to millions of sites and trillions of objects.