NASA's future deep-space missions will require onboard software upgrades. A challenge that arises from this is that of guarding the system against performance loss caused by residual design faults in the new versio...
ISBN (print): 0769507832; 0769507840
While the MPP is still the most common architecture in supercomputer centers today, a simpler and cheaper machine configuration is appearing at many supercomputing sites. This alternative setup may be described simply as a collection of multiprocessors, or a distributed server system. This collection of multiprocessors is fed by a single common stream of jobs, where each job is dispatched to exactly one of the multiprocessor machines for processing. The biggest question which arises in such distributed server systems is what is a good rule for assigning jobs to host machines: i.e., what is a good task assignment policy. Many task assignment policies have been proposed, but not systematically evaluated under supercomputing workloads. In this paper we start by comparing existing task assignment policies using trace-driven simulation under supercomputing workloads. We validate our experiments by providing analytical proofs of the performance of each of these policies. These proofs also help provide much intuition. We find that while the performance of supercomputing servers varies widely with the task assignment policy, none of the above task assignment policies performs as well as we would like. We observe that all policies proposed thus far aim to balance load among the hosts. We propose a policy which purposely unbalances load among the hosts, yet, counter to intuition, is also fair in that it achieves the same expected slowdown for all jobs, thus no jobs are biased against. We evaluate this policy using both trace-driven simulation and analysis. We find that the performance of the load-unbalancing policy is significantly better than the best of those policies which balance load.
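The load-unbalancing idea in this abstract routes jobs to hosts by job size rather than by current host load. The abstract does not give the policy's details, so the following is only a minimal sketch of one size-interval dispatcher of this general family; the cutoff values and function names are illustrative assumptions, not the paper's algorithm.

```python
def make_size_interval_dispatcher(cutoffs):
    """Return a dispatcher that routes each job to the host whose
    size interval contains the job's service demand. Host i serves
    jobs with size in [cutoffs[i], cutoffs[i+1]), so short jobs are
    never stuck behind long ones even though host loads differ."""
    def dispatch(job_size):
        for host, upper in enumerate(cutoffs[1:]):
            if job_size < upper:
                return host
        return len(cutoffs) - 2  # fallback: largest jobs to last host
    return dispatch

# Hypothetical cutoffs: short jobs to host 0, medium to 1, long to 2.
dispatch = make_size_interval_dispatcher([0, 10, 100, float("inf")])
```

Under a heavy-tailed supercomputing workload, isolating short jobs on a dedicated host is what lets such a policy achieve the same expected slowdown for all job sizes despite unequal host utilization.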
ISBN (print): 0769507840
The BaBar experiment at the Stanford Linear Accelerator Center (SLAC) is designed to perform a high-precision investigation of the decays of the B-meson produced from electron-positron interactions. The experiment, started in May 1999, will generate approximately 300 TB/year of data for 10 years. All of the data will reside in Objectivity databases accessible via the Advanced Multi-threaded Server (AMS). To date, over 70 TB of data have been placed in Objectivity/DB, making it one of the largest databases in the world. Providing access to such a large quantity of data through a database server is a daunting task. A full-scale testbed environment had to be developed to tune various software parameters, and a fundamental change had to occur in the AMS architecture to allow it to scale past several hundred terabytes of data. Additionally, several protocol extensions had to be implemented to provide practical access to large quantities of data. This paper will describe the design of the database, the changes that we needed to make in the AMS for scalability reasons, and how the lessons we learned would be applicable to virtually any kind of database server seeking to operate in the petabyte region.
ISBN (print): 0769509274
The number of applications requiring highly reliable and/or safety-critical computing is increasing. One emerging safety metric is the Mean Time To Unsafe Failure (MTTUF). This paper summarizes a novel technique for determining the MTTUF for a given architecture. The first step in determining the MTTUF for a system is to estimate the system Mean Time To Failure (MTTF) and the system fault coverage. Once these two parameters are known, the system MTTUF can be calculated. The presented technique allows MTTF and system coverage to be estimated from dependability models that incorporate time-varying failure and/or repair rates. Existing techniques for the estimation of MTTUF require constant-rate dependability models. For the sake of simplicity, this paper uses Markov models to calculate MTTUF. The presented approach greatly simplifies the calculation of system MTTUF. Finally, a comparison is made between reliability expected-time metrics (MTTF and MTBF) and safety expected-time metrics (MTTUF and MTBUF).
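To make the MTTF/coverage relationship concrete: in the simplest constant-rate Markov model, a failure is detected and handled safely with probability equal to the coverage, and is unsafe otherwise, which gives MTTUF = MTTF / (1 - coverage). This is the classical constant-rate special case only, not the paper's time-varying technique; the function name is an illustrative assumption.

```python
def mttuf_from_mttf(mttf, coverage):
    """Mean time to unsafe failure for a single-fault Markov model
    with constant failure rate 1/mttf, where a fault is detected
    (safe) with probability `coverage` and unsafe otherwise.
    Constant-rate sketch only; time-varying rates need the full
    dependability-model technique described in the abstract."""
    if coverage >= 1.0:
        return float("inf")  # perfect coverage: no unsafe failures
    return mttf / (1.0 - coverage)

# Example: MTTF of 10,000 hours with 99% fault coverage
# yields an MTTUF of approximately 1,000,000 hours.
mttuf = mttuf_from_mttf(10_000, 0.99)
```

The relation shows why coverage dominates safety: improving coverage from 99% to 99.9% raises MTTUF by another factor of ten even when MTTF is unchanged.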
ISBN (print): 076950728X
In designing complex systems, a performance evaluation model is essential in determining a system configuration and identifying performance bottlenecks. Several C++-based general-purpose simulation tools such as SystemC and CynLib have also been introduced. However, these tools are cycle-based: they simulate a system synchronously under the assumption that all modules are invoked every cycle, thus eliminating scheduling overhead. To simulate a system containing multiple clocks or asynchronous circuits with accuracy, an event-driven approach is highly desirable. We have developed an event-driven framework for computer system simulation in C++, called simCore, which is mainly targeted at performance evaluation of computer systems, providing concurrent execution of multiple modules and event-driven module interaction mechanisms. To demonstrate its cycle accuracy and high simulation speed, we compared two MIPS-based system simulators, one based on the C++-based event-driven simulation core and the other based on Verilog-XL.
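The cycle-based versus event-driven distinction above comes down to the scheduling kernel: an event-driven kernel keeps a time-ordered queue and invokes only modules with pending work, so modules on different clocks advance independently. The following is a minimal sketch of that general mechanism in Python; it is not the simCore API, and all names are illustrative assumptions.

```python
import heapq

class EventCore:
    """Minimal event-driven simulation kernel. Modules schedule
    timestamped callbacks; the kernel pops them in time order and
    invokes only the modules that have work at each instant,
    unlike a cycle-based kernel that calls every module every cycle."""
    def __init__(self):
        self._queue = []
        self._seq = 0   # tie-breaker for events at the same time
        self.now = 0

    def schedule(self, delay, callback):
        heapq.heappush(self._queue, (self.now + delay, self._seq, callback))
        self._seq += 1

    def run(self):
        while self._queue:
            self.now, _, callback = heapq.heappop(self._queue)
            callback()

# Two "modules" on unrelated timings interact only through events.
log = []
core = EventCore()
core.schedule(5, lambda: log.append(("tick", core.now)))
core.schedule(2, lambda: log.append(("irq", core.now)))
core.run()
print(log)  # [('irq', 2), ('tick', 5)]
```

Because time jumps directly from one pending event to the next, idle cycles cost nothing, which is the source of the speed advantage claimed for event-driven cores on multi-clock designs.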
ISBN (print): 0769507832
The proceedings contain 45 papers. The topics discussed include: automatic configuration and run-time adaptation of distributed applications; a distributed multi-storage resource architecture and I/O performance prediction for scientific computing; Uintah: a massively parallel problem solving environment; an enabling framework for master-worker applications on the computational grid; a component-based services architecture for building distributed applications; incorporating job migration and network RAM to share cluster memory resources; using idle workstations to implement predictive prefetching; a monitoring sensor management system for grid environments; robust resource management for metacomputers; performance evaluation of a firewall-compliant Globus-based wide-area cluster system; synchronizing network probes to avoid measurement intrusiveness with the network weather service; and an evaluation of alternative designs for a grid information service.
I/O-intensive applications have posed great challenges to computational scientists. A major problem of these applications is that users have to sacrifice performance requirements in order to satisfy storage capacity requirements in a conventional computing environment. Further performance improvement is impeded by the physical nature of these storage media even when state-of-the-art I/O optimizations are employed. In this paper, we present a distributed multi-storage resource architecture that can satisfy both performance and capacity requirements by employing multiple storage resources. Compared to the traditional single storage resource architecture, our architecture provides a more flexible and reliable computing environment. It can bring new opportunities for high performance computing as well as inherit state-of-the-art I/O optimization approaches that have already been developed. We also develop an Application Programming Interface (API) that provides transparent management and access to various storage resources in our computing environment. As I/O usually dominates the performance of I/O-intensive applications, we establish an I/O performance prediction mechanism, which consists of a performance database and a prediction algorithm, to help users better evaluate and schedule their applications. A tool is also developed to help users automatically generate the performance database. The experiments show that our multi-storage resource architecture is a promising platform for high performance distributed computing.
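The prediction mechanism described above pairs a measured performance database with a prediction algorithm. The abstract does not specify either, so the following sketch assumes the simplest form: measured bandwidths keyed by storage resource and access pattern, and a linear size/bandwidth cost model. Class and method names are invented for illustration.

```python
class IOPredictor:
    """Sketch of an I/O performance prediction mechanism: a
    performance database maps (storage resource, access pattern)
    to measured bandwidth, and the predicted transfer time is
    size / bandwidth. The linear cost model is an assumption."""
    def __init__(self):
        self.db = {}  # (resource, pattern) -> measured MB/s

    def record(self, resource, pattern, mb_per_s):
        """Populate the performance database from a benchmark run."""
        self.db[(resource, pattern)] = mb_per_s

    def predict_seconds(self, resource, pattern, size_mb):
        """Estimate transfer time for size_mb on a known resource."""
        return size_mb / self.db[(resource, pattern)]

predictor = IOPredictor()
predictor.record("local-disk", "sequential", 50.0)   # hypothetical figures
predictor.record("tape-archive", "sequential", 5.0)
```

A scheduler can then compare `predict_seconds` across the available storage resources and place each dataset where the capacity constraint is met at the least predicted I/O cost.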
ISBN (print): 9781581131833
Middleware provides interoperability and transparent location of servers in a heterogeneous distributed environment. A careful design of the middleware software is required, however, for achieving high performance. This research proposes an adaptive middleware architecture for CORBA-based systems. The adaptive middleware agent that maps an object name to the object reference has two modes of operation. In the handle-driven mode it returns a reference for the requested object to the client, which uses this reference to re-send the request for the desired operation to the server, whereas in the forwarding mode it forwards the entire client request to the server. The server, upon invocation, performs the desired operation and returns the results to the client. An adaptive ORB dynamically switches between these two modes depending on the current system load. Using a commercial middleware product called Orbix-MT, we have implemented a skeletal performance prototype for the adaptive ORB. Based on measurements made on a network of workstations and a synthetic workload, we observe that the adaptive ORB can produce a substantial benefit in performance in comparison to a pure handle-driven or a pure forwarding ORB. Our measurements provide valuable insights into system behavior and performance.
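The two agent modes described above can be modeled compactly: under light load the agent relays the whole request (forwarding mode, one round trip for the client); under heavy load it only hands back a reference and steps out of the data path (handle-driven mode). The following toy model assumes a load threshold and an in-process server registry; none of this reflects Orbix-MT's actual interfaces.

```python
class AdaptiveAgent:
    """Toy model of the adaptive middleware agent. Below the load
    threshold it forwards the client request to the server and
    relays the reply; above the threshold it returns an object
    reference so the client contacts the server directly."""
    def __init__(self, servers, threshold):
        self.servers = servers      # object name -> callable server stub
        self.threshold = threshold  # illustrative switching policy
        self.load = 0               # e.g. count of outstanding requests

    def request(self, name, payload):
        if self.load <= self.threshold:
            # forwarding mode: agent does the work on the client's behalf
            return ("forwarded", self.servers[name](payload))
        # handle-driven mode: return a reference; client re-sends itself
        return ("handle", self.servers[name])

servers = {"echo": lambda payload: payload.upper()}
agent = AdaptiveAgent(servers, threshold=1)
```

Switching on measured load captures the trade-off the abstract reports: forwarding saves the client a round trip when the agent is idle, while handle-driven mode keeps the agent from becoming a bottleneck when it is busy.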
The 2-D mesh has proved to be one of the most effective and efficient topologies for high-performance massively parallel computing (MPC) systems, and wormhole routing is used as the switching scheme in most of the MPCs. ...
This paper describes an approach to building a distributed software component system for scientific and engineering applications that is based on representing GRID services as application-level software components. These GRID services provide tools such as registry and directory services, event services, and remote component creation. While a services-based architecture for Grids and other distributed systems is not new, this framework provides several unique features. First, the public interfaces to each software component are described as XML documents. This allows many adaptors and user interfaces to be generated from the specification dynamically. Second, this system is designed to exploit the resources of existing Grid infrastructures like Globus[7], [15], Legion[17], [7], and commercial Internet frameworks like e-speak[11]. Third, and most important, the component-based design extends throughout the system. Hence tools such as application builders, which allow users to select components, start them on remote resources, and connect and execute them, are also interchangeable software components. Consequently, it is possible to build distributed applications using a graphical 'drag-and-drop' interface, a web-based interface, a scripting language like Python, or an existing tool such as Matlab.
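The first feature above, generating adaptors dynamically from an XML interface description, can be sketched concretely: parse the component's XML interface and synthesize a proxy object whose methods forward calls through a generic dispatch function. The XML schema, names, and dispatch signature here are invented for illustration; the paper's actual interface format is not specified in this abstract.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML interface description for a component.
SPEC = """<component name="Solver">
  <operation name="solve"/>
  <operation name="status"/>
</component>"""

def make_proxy(xml_spec, dispatch):
    """Generate a client-side proxy from an XML interface document.
    Each <operation> becomes a method that forwards its arguments
    through dispatch(component, operation, args), so new adaptors
    need no hand-written stub code."""
    root = ET.fromstring(xml_spec)
    comp = root.get("name")
    proxy = type(comp + "Proxy", (), {})()
    for op in root.findall("operation"):
        name = op.get("name")
        setattr(proxy, name,
                lambda *args, _n=name: dispatch(comp, _n, args))
    return proxy
```

Because the proxy is built entirely from the XML document at run time, a scripting front end like Python or a graphical builder can attach to any component whose interface description it can fetch.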