The electromagnetic environment (EME) produces (is a form of) electrical energy of the same type that electrical/electronic equipment uses to process and transfer information. As such, this environment represents a fundamental threat to the proper operation of systems that depend on such equipment. For electrical/electronic systems providing functions that can affect the safe flight and landing of an aircraft (level A systems), the EME threat translates into a threat to the airplane itself. When protection against EME effects is being developed, architectural techniques should be applied, particularly to achieve the high margin of safety needed for level A electrical/electronic systems. The computing platform for the Aircraft Information Management System (AIMS) used on the Boeing 777 and Versatile Integrated Avionics (VIA) technology is an example of applying an architectural philosophy to the design of the digital engine for such aircraft systems. Another is a prototype computing platform for rapid recovery from "soft faults" (upsets, momentary interference, etc.).
ISBN (print): 9780897919012
This paper presents HAL's Mercury Interconnect architecture, an interconnect infrastructure designed to link commodity microprocessors, memory, and I/O components into high-performance multiprocessing servers. The interconnect supports shared-memory and message-passing systems, as well as hybrids of the two. The key attributes of the Mercury Interconnect architecture are low latency, high bandwidth, a modular and flexible design, reliability/availability/serviceability (RAS) features, and a simplicity that enables very cost-effective implementations. The first implementation of the architecture links multiple 4-processor Pentium™ Pro-based nodes. In a 4-node (16-processor) shared-memory configuration, this system achieves a remote read latency of just over 1 µs and a maximum interconnect bandwidth of 6.4 GByte/s. Both figures far outpace comparable SCI-based solutions, while using far fewer hardware components.
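As a rough illustration (not a calculation from the paper), Little's law relates the two headline figures: sustaining bandwidth $B$ at latency $L$ requires about $N = B L / S$ transfers of size $S$ in flight. Assuming the roughly 1 µs latency still holds at full load and that transfers are 32 bytes (the Pentium Pro cache-line size), the 6.4 GByte/s peak implies on the order of 200 outstanding requests across the 16 processors:

$$N = \frac{B \cdot L}{S} = \frac{(6.4 \times 10^{9}\,\text{B/s})(1 \times 10^{-6}\,\text{s})}{32\,\text{B}} = \frac{6400\,\text{B}}{32\,\text{B}} = 200$$

Since the measured latency is slightly above 1 µs, the true figure under these assumptions would be somewhat higher; treat it only as an order-of-magnitude estimate.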
We present the Data Parallel Fortran (DPF) benchmark suite, a set of data parallel Fortran codes, appropriate for any target parallel architecture with shared or distributed memory, for evaluating data parallel compilers. The codes are provided in basic, optimized, and several library versions. The benchmarks cover collective communication functions, scientific software library functions, and application kernels that reflect the computational structure and communication patterns of fluid dynamics simulations, fundamental physics, and molecular studies in chemistry or biology. The DPF benchmark suite assumes the language model of High Performance Fortran and provides performance evaluation metrics of busy and elapsed times and FLOP rates, FLOP count, memory usage, communication patterns, local memory access, and arithmetic efficiency, as well as operation and communication counts per iteration. An instance of the benchmark suite was fully implemented in CM-Fortran and tested on the CM-5.
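Several of the metrics listed are simple ratios of raw measurements. The sketch below shows one plausible way to derive them; the `RunMeasurement` record and function names are hypothetical, and defining arithmetic efficiency as the achieved FLOP rate divided by an assumed machine peak is an assumption, not necessarily DPF's exact definition.

```python
from dataclasses import dataclass


@dataclass
class RunMeasurement:
    """Numbers a single benchmark run might report (hypothetical record layout)."""
    flop_count: float      # floating-point operations executed
    busy_time_s: float     # time processors spent doing useful work
    elapsed_time_s: float  # wall-clock time, including communication and idle time
    peak_flops: float      # assumed peak FLOP/s of the target machine


def flop_rate(m: RunMeasurement, use_busy: bool = False) -> float:
    """FLOP/s, computed over either busy time or elapsed time."""
    seconds = m.busy_time_s if use_busy else m.elapsed_time_s
    return m.flop_count / seconds


def arithmetic_efficiency(m: RunMeasurement) -> float:
    """Achieved elapsed-time FLOP rate as a fraction of the assumed peak."""
    return flop_rate(m) / m.peak_flops


m = RunMeasurement(flop_count=2e9, busy_time_s=1.5, elapsed_time_s=2.0, peak_flops=4e9)
print(flop_rate(m))              # 1000000000.0
print(arithmetic_efficiency(m))  # 0.25
```

The gap between the busy-time rate (about 1.33 GFLOP/s here) and the elapsed-time rate (1.0 GFLOP/s) is one way to see how much of a run goes to communication or idling.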
We present netsim, a customizable simulator for high-performance point-to-point workstation networks that is accurate enough for application-level performance analysis, yet easy to customize for multiple architectures and software configurations. Customization requires no proprietary information: it uses only publicly available hardware specifications and information that can be readily determined with a suite of test programs. We customized netsim for two platforms: a 16-node IBM SP-2 with a multistage network and a 10-node DEC Alpha Farm with an ATM switch. We show that netsim successfully models these two architectures, with 2-6% error on the SP-2 and less than 10% error on the Alpha Farm for most test cases. It achieves this accuracy at the cost of a 7- to 36-fold simulation slowdown relative to the SP-2 and a 3- to 8-fold slowdown relative to the Alpha Farm.
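The accuracy and cost figures above are both ratios. A minimal sketch of how such numbers are commonly computed, assuming error is measured on predicted versus measured run time and slowdown as simulation wall-clock time over real run time (both definitions are assumptions, not taken from the paper); the test-case names and timings are hypothetical.

```python
def relative_error(simulated: float, measured: float) -> float:
    """|simulated - measured| / measured, e.g. for predicted vs. measured run time."""
    return abs(simulated - measured) / measured


def slowdown(simulation_wall_s: float, real_run_s: float) -> float:
    """How many times longer the simulation takes than the real run."""
    return simulation_wall_s / real_run_s


# Hypothetical per-test-case run times in seconds: (simulated, measured).
cases = {"ping-pong": (1.04, 1.00), "all-to-all": (2.10, 2.00)}
errors = {name: relative_error(sim, meas) for name, (sim, meas) in cases.items()}

print({name: round(err, 3) for name, err in errors.items()})
print(slowdown(36.0, 1.0))  # 36.0
```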
File system designers today face a dilemma. A log-structured file system (LFS) can offer superior performance for many common workloads, such as those with frequent small writes, read traffic that is predominantly absorbed by the cache, and sufficient idle time to clean the log. However, an LFS performs poorly on other workloads, such as random updates to a full disk with little idle time to clean. In this paper, we show how adaptive algorithms can be used to enable LFS to provide high performance across a wider range of workloads. First, we show how to improve LFS write performance in three ways: by choosing the segment size to match disk and workload characteristics, by modifying the LFS cleaning policy to adapt to changes in disk utilization, and by using cached data to lower cleaning costs. Second, we show how to improve LFS read performance by reorganizing data to match read patterns. Using trace-driven simulations on a combination of synthetic and measured workloads, we demonstrate that these extensions to LFS can significantly improve its performance.
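For readers unfamiliar with LFS cleaning, the sketch below shows the classic greedy and cost-benefit segment selection from the original Sprite LFS (Rosenblum and Ousterhout), the kind of cleaning policy the abstract proposes to adapt. It is background only, not this paper's adaptive algorithm; the `Segment` record, function names, and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seg_id: int
    utilization: float  # fraction of the segment still holding live data (u)
    age: float          # age of the most recently modified block in the segment


def write_cost(u: float) -> float:
    """Sprite LFS write cost for cleaning segments of utilization u.

    Reading a segment (1) plus rewriting its live data (u) plus writing new
    data (1 - u), all divided by the new data written: 2 / (1 - u).
    """
    if not 0.0 <= u < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    if u == 0.0:
        return 1.0  # an empty segment need not be read at all
    return 2.0 / (1.0 - u)


def cost_benefit(seg: Segment) -> float:
    """Free space gained times how long it is likely to stay free, per unit of I/O."""
    u = seg.utilization
    return (1.0 - u) * seg.age / (1.0 + u)


def pick_cost_benefit(segments, n):
    """Clean the n segments with the highest benefit/cost ratio."""
    return sorted(segments, key=cost_benefit, reverse=True)[:n]


def pick_greedy(segments, n):
    """Clean the n least-utilized segments."""
    return sorted(segments, key=lambda s: s.utilization)[:n]


segments = [Segment(1, 0.9, 100.0), Segment(2, 0.5, 10.0), Segment(3, 0.2, 1.0)]
print(write_cost(0.5))                                     # 4.0
print([s.seg_id for s in pick_cost_benefit(segments, 1)])  # [1]
print([s.seg_id for s in pick_greedy(segments, 1)])        # [3]
```

With these numbers, greedy picks the nearly empty but young segment 3, while cost-benefit prefers the old, cold segment 1, on the reasoning that space freed among cold data stays free longer. The write-cost formula also shows why a full disk with little idle time hurts: cost grows without bound as utilization approaches 1.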
ISBN (print): 9780897919012
The effectiveness of texture mapping in enhancing the realism of computer-generated imagery has made support for real-time texture mapping a critical part of graphics pipelines. Despite a recent surge of interest in three-dimensional graphics from computer architects, high-quality, high-speed texture mapping has so far been confined to costly hardware systems that use brute-force techniques to achieve high performance. One obstacle faced by designers of texture mapping systems is the requirement of extremely high bandwidth to texture memory. High bandwidth is necessary because there are typically tens to hundreds of millions of accesses to texture memory per second. In addition, the high clock rates required in graphics pipelines call for low-latency access to texture memory. In this paper, we propose the use of texture image caches to alleviate these bottlenecks, and evaluate various tradeoffs that arise in such systems. We find that the factors important to cache behavior are (i) the representation of texture images in memory, (ii) the rasterization order on screen, and (iii) the cache organization. Through a detailed investigation of these issues, we explore the best way to exploit locality of reference and determine whether this technique is robust with respect to different scenes and different amounts of texture. Overall, we observe a significant amount of temporal and spatial locality, and the working set sizes are relatively small (at most 16 KB) across all cases that we studied. Consequently, the memory bandwidth requirements of a texture cache system are substantially lower (by a factor of at least three and as much as fifteen) than those of a system that achieves equivalent performance without a cache. These results are very encouraging and indicate that caching is a promising approach to designing memory systems for texture mapping.
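To make the first two factors (texture representation and rasterization order) concrete, here is a toy simulation, not the paper's simulator. It reads a 256×256 texture of 4-byte texels through a hypothetical 4 KB fully associative LRU cache with 64-byte lines and compares a row-major layout with a 4×4 blocked (tiled) layout. All sizes are assumptions chosen for illustration; the column traversal stands in for a texture rotated 90° on screen, so each scanline walks down a texture column.

```python
from collections import OrderedDict

TEX_SIZE = 256    # texture is TEX_SIZE x TEX_SIZE texels (assumed)
TEXEL_BYTES = 4   # 32-bit texels (assumed)
LINE_BYTES = 64   # cache line size (assumed)
NUM_LINES = 64    # 64 lines x 64 B = 4 KB cache (assumed)
TILE = 4          # 4x4 texel tile = 64 B = exactly one cache line


def linear_addr(x: int, y: int) -> int:
    """Row-major layout: neighbours along a row are adjacent in memory."""
    return (y * TEX_SIZE + x) * TEXEL_BYTES


def tiled_addr(x: int, y: int) -> int:
    """Blocked layout: each TILE x TILE block is stored contiguously."""
    tiles_per_row = TEX_SIZE // TILE
    tile_index = (y // TILE) * tiles_per_row + (x // TILE)
    offset = (y % TILE) * TILE + (x % TILE)
    return (tile_index * TILE * TILE + offset) * TEXEL_BYTES


def hit_rate(addr_fn, accesses) -> float:
    """Simulate a fully associative LRU cache of NUM_LINES lines."""
    cache = OrderedDict()
    hits = total = 0
    for x, y in accesses:
        line = addr_fn(x, y) // LINE_BYTES
        total += 1
        if line in cache:
            hits += 1
            cache.move_to_end(line)  # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > NUM_LINES:
                cache.popitem(last=False)  # evict least recently used
    return hits / total


def row_order():
    """Screen scanlines aligned with texture rows."""
    for y in range(TEX_SIZE):
        for x in range(TEX_SIZE):
            yield x, y


def column_order():
    """Texture rotated 90 degrees: each scanline walks down a texture column."""
    for x in range(TEX_SIZE):
        for y in range(TEX_SIZE):
            yield x, y


print(hit_rate(linear_addr, row_order()))     # 0.9375
print(hit_rate(tiled_addr, row_order()))      # 0.9375
print(hit_rate(linear_addr, column_order()))  # 0.0
print(hit_rate(tiled_addr, column_order()))   # 0.9375
```

Under these assumptions, both layouts do equally well when scanlines follow texture rows, but the row-major layout collapses to zero hits in column order, while the blocked layout's hit rate does not depend on traversal direction. Every texel here is read only once, so the example shows locality rather than the bandwidth savings reported in the abstract, which also depend on texel reuse.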
This paper describes the DIGITAL Continuous Profiling Infrastructure, a sampling-based profiling system designed to run continuously on production systems. The system supports multiprocessors, works on unmodified executables, and collects profiles for entire systems, including user programs, shared libraries, and the operating system kernel. Samples are collected at a high rate (over 5200 samples/sec per 333-MHz processor), yet with low overhead (1-3% slowdown for most workloads). Analysis tools supplied with the profiling system use the sample data to produce an accurate accounting of where time is being spent, down to the level of pipeline stalls incurred by individual instructions. When instructions incur stalls, the tools identify possible reasons, such as cache misses, branch mispredictions, and functional unit contention. The fine-grained instruction-level analysis guides users and automated optimizers to the causes of performance problems and provides important insights for fixing them.
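Two back-of-the-envelope pieces of the sampling idea, as a sketch only. At over 5200 samples/sec on a 333-MHz processor, a sample arrives on average at most every ~64,000 cycles. And if samples land on instructions in proportion to the time spent there, each program counter's (PC's) fraction of samples estimates its share of execution time. The function names and sample data are hypothetical; this is not DCPI's actual tooling, data format, or stall analysis.

```python
from collections import Counter


def cycles_per_sample(clock_hz: float, samples_per_sec: float) -> float:
    """Average number of cycles between consecutive samples."""
    return clock_hz / samples_per_sec


def estimate_time_share(sampled_pcs):
    """Map each sampled program counter (PC) to its fraction of all samples.

    Assumes samples land on instructions in proportion to the time spent there.
    """
    counts = Counter(sampled_pcs)
    total = sum(counts.values())
    return {pc: n / total for pc, n in counts.items()}


print(round(cycles_per_sample(333e6, 5200)))  # 64038

# Hypothetical PC samples from one process.
samples = [0x100, 0x104, 0x104, 0x108]
shares = estimate_time_share(samples)
hottest = max(shares, key=shares.get)
print(hex(hottest), shares[hottest])  # 0x104 0.5
```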
ISBN (print): 0818673982
We claim in this paper that both remote process creation and process migration are efficient mechanisms for improving or developing high-performance computer systems. In particular, we demonstrate that the claims made by some researchers that process migration is too heavyweight to support dynamic load balancing are unsubstantiated. We support our claim by presenting these two mechanisms as available in the RHODOS distributed operating system, comparing and contrasting them, and reporting on their performance.
ISBN (print): 0818675829
We discuss the emerging Web-based distributed environments for HPCC on the NII, with a focus on Java as an enabling technology. We start with a review of the past, present, and near-term future of the 'Java phenomenon', presented against the background of some related earlier approaches toward a distributed interpretive virtual machine architecture.
ISBN (print): 0818675519
Achieving 100 TeraOps performance within a ten-year horizon will require massively parallel architectures that exploit both commodity software and hardware technology for cost efficiency. Increasing clock rates and system diameter in clock periods will make efficient management of communication and coordination increasingly critical. Configurable logic presents a unique opportunity to customize the bindings, mechanisms, and policies that make up the interaction of processing, memory, I/O, and communication resources. This programming flexibility, or customizability, can be the key to achieving robust high performance. The MultiprocessOr with Reconfigurable Parallel Hardware (MORPH) uses reconfigurable logic blocks integrated with the system core to control policies, interactions, and interconnections. This integrated configurability can improve the performance of the local memory hierarchy, increase the efficiency of interprocessor coordination, or make better use of the machine's network bisection. MORPH provides a framework for exploring such integrated application-specific customizability. Rather than complicating the situation, MORPH's configurability supports component software and interoperability frameworks, allowing direct support for application-specified patterns, objects, and structures. This paper reports the motivation and initial design of the MORPH system.