ISBN (Print): 0769518710
Scalability of applications on distributed shared-memory (DSM) multiprocessors is limited by communication overheads. At some point, using more processors to increase parallelism yields diminishing returns or even degrades performance. When increasing concurrency is futile, we propose an additional mode of execution, called slipstream mode, that instead enlists extra processors to assist parallel tasks by reducing perceived overheads. We consider DSM multiprocessors built from dual-processor chip multiprocessor (CMP) nodes with shared L2 cache. A task is allocated on one processor of each CMP node. The other processor of each node executes a reduced version of the same task. The reduced version skips shared-memory stores and synchronization, running ahead of the true task. Even with the skipped operations, the reduced task makes accurate forward progress and generates an accurate reference stream, because branches and addresses depend primarily on private data. Slipstream execution mode yields two benefits. First, the reduced task prefetches data on behalf of the true task. Second, reduced tasks provide a detailed picture of future reference behavior, enabling a number of optimizations aimed at accelerating coherence events, e.g., self-invalidation. For multiprocessor systems with up to 16 CMP nodes, slipstream mode outperforms running one or two conventional tasks per CMP in 7 out of 9 parallel scientific benchmarks. Slipstream mode is 12-19% faster with prefetching only and up to 29% faster with self-invalidation enabled.
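A minimal sketch of the slipstream idea described in the abstract: on each CMP node, a "true" task performs the real computation, including shared-memory stores and synchronization, while a "reduced" copy of the same task runs ahead, skipping stores and locks and merely touching the data it will need, so its reference stream acts as a prefetch hint. All names here (run_true_task, run_reduced_task, prefetch) and the block-size arithmetic are illustrative assumptions, not details from the paper.

```python
import threading

SHARED = [0.0] * 1024          # shared data visible to all nodes (simplified)
PRIVATE = list(range(1024))    # private data that drives branches and addresses
lock = threading.Lock()        # stands in for synchronization/coherence cost

prefetched = set()             # records which blocks the reduced task touched

def prefetch(index):
    """Reduced task: touch the address so a real system would pull the line
    into the shared L2; here we just record the reference."""
    prefetched.add(index // 16)   # 16 elements ~ one cache block (assumption)

def run_reduced_task():
    # Same control flow as the true task (branches depend on private data),
    # but shared stores and lock acquisition are skipped, so it runs ahead.
    for i, p in enumerate(PRIVATE):
        if p % 2 == 0:                 # branch decided by private data
            prefetch(i)                # generate the future reference early

def run_true_task():
    for i, p in enumerate(PRIVATE):
        if p % 2 == 0:
            with lock:                 # true task pays the synchronization cost
                SHARED[i] += p         # and performs the real shared store

reduced = threading.Thread(target=run_reduced_task)
true_task = threading.Thread(target=run_true_task)
reduced.start(); true_task.start()
reduced.join(); true_task.join()
print(f"blocks touched ahead of the true task: {len(prefetched)}")
```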
ISBN (Print): 0769519385
A highly dependable embedded fault-tolerant memory architecture for high-performance massively parallel computing applications and its dependability assurance techniques are proposed and discussed in this paper. The proposed fault-tolerant memory provides two distinctive repair mechanisms: permanent laser redundancy reconfiguration during the wafer-probe stage in the factory, to enhance manufacturing yield, and dynamic BIST/BISD/BISR (built-in self-test/diagnosis/repair)-based reconfiguration of the redundant resources in the field, to maintain high field reliability. The system reliability, which is determined mainly by the hardware configuration demanded by the software and by field reconfiguration/repair using unused processor and memory modules, is referred to as HW/SW Co-reliability. Various system configuration options in terms of parallel processing unit size and processor/memory intensity are also introduced and their HW/SW Co-reliability characteristics are discussed. A modeling and assurance technique for HW/SW Co-reliability, with emphasis on dependability assurance based on combinatorial modeling suitable for the proposed memory design, is developed and validated by extensive parametric simulations. This enables the design and implementation of memory-reliability-optimized, highly reliable, fault-tolerant, field-reconfigurable massively parallel computing systems.
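A hedged sketch of the combinatorial-modeling style the abstract refers to: model the system as a k-out-of-n group of processor modules plus a j-out-of-m group of memory modules, where spare (unused) modules can be switched in by BISR in the field. The failure rates, module counts, and the k-out-of-n structure below are illustrative assumptions, not parameters from the paper.

```python
from math import comb, exp

def module_reliability(failure_rate, hours):
    """Exponential reliability of a single module, R(t) = e^(-lambda * t)."""
    return exp(-failure_rate * hours)

def k_out_of_n(k, n, r):
    """Probability that at least k of n identical modules survive."""
    return sum(comb(n, i) * r**i * (1 - r)**(n - i) for i in range(k, n + 1))

def co_reliability(hours,
                   needed_procs=4, total_procs=6, proc_rate=1e-6,
                   needed_mems=8, total_mems=10, mem_rate=2e-6):
    """HW/SW co-reliability under the simplifying assumption that the software
    workload demands `needed_procs` processors and `needed_mems` memory
    modules, and spare modules are reconfigured in when a module fails."""
    r_p = module_reliability(proc_rate, hours)
    r_m = module_reliability(mem_rate, hours)
    return k_out_of_n(needed_procs, total_procs, r_p) * \
           k_out_of_n(needed_mems, total_mems, r_m)

if __name__ == "__main__":
    for t in (10_000, 50_000, 100_000):
        print(f"t = {t:>7} h  R = {co_reliability(t):.6f}")
```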
ISBN (Print): 0769520693
Recovering from node failures is a critical issue in distributed database systems. In conventional log-based recovery protocols, the nodes providing recovery service may be overburdened, especially when the recovery is resource-consuming. In this paper, an agent-based dynamic recovery protocol is presented. It divides the whole recovery process into three major steps: log-recovery, agent-recovery, and synchronization. The key idea of this protocol is to cache new database operations initiated during recovery into agents. All these cached operations are then replayed independently for further recovery. The analysis indicates that the new protocol can minimize internode dependency and improve recovery speed. As a result, the system failure rate is reduced and overall performance is improved.
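A rough sketch of the three-step flow described above, with invented names (RecoveryAgent, Database, recover): operations initiated while recovery is in progress are cached in an agent rather than forwarded to a helper node; the agent then replays them locally, and a short synchronization step drains any stragglers before the node rejoins normal processing. This is a toy illustration under those assumptions, not the paper's protocol.

```python
from collections import deque

class RecoveryAgent:
    """Caches database operations initiated while recovery is in progress."""
    def __init__(self):
        self.cached_ops = deque()

    def cache(self, op):
        self.cached_ops.append(op)

    def replay(self, db):
        while self.cached_ops:
            db.apply(self.cached_ops.popleft())

class Database:
    def __init__(self):
        self.state = {}

    def apply(self, op):
        key, value = op
        self.state[key] = value

def recover(db, log, agent, late_ops):
    # Step 1: log-recovery -- replay the persistent log up to the crash point.
    for op in log:
        db.apply(op)
    # Step 2: agent-recovery -- independently replay operations the agent
    # cached while step 1 was running.
    agent.replay(db)
    # Step 3: synchronization -- apply any operations that arrived during
    # step 2, then rejoin normal processing.
    for op in late_ops:
        db.apply(op)

db, agent = Database(), RecoveryAgent()
agent.cache(("x", 2))      # operation initiated during recovery (cached up front here)
recover(db, log=[("x", 1), ("y", 1)], agent=agent, late_ops=[("z", 3)])
print(db.state)            # {'x': 2, 'y': 1, 'z': 3}
```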
The execution environments for scientific applications have evolved significantly over the years. Vector and parallel architectures have provided significantly faster computations. Cluster computers have reduced the c...
ISBN (Print): 0769518710
Java-based middleware, and application servers in particular, are rapidly gaining importance as a new class of workload for commercial multiprocessor servers. SPEC has recognized this trend with its adoption of SPECjbb2000 and the new SPECjAppServer2001 (ECperf) as standard benchmarks. Middleware, by definition, connects other tiers of server software. SPECjbb is a simple benchmark that combines middleware services, a simple database server, and client drivers into a single Java program. ECperf more closely models commercial middleware by using a commercial application server and separate machines for the different tiers. Because it is a distributed benchmark, ECperf provides an opportunity for architects to isolate the behavior of middleware. In this paper we present a detailed characterization of the memory system behavior of ECperf and SPECjbb using both commercial server hardware and Simics full-system simulation. We find that the memory footprint and primary working sets of these workloads are small compared to other commercial workloads (e.g., on-line transaction processing), and that a large fraction of the working sets are shared between processors. We observed two key differences between ECperf and SPECjbb that highlight the importance of isolating the behavior of the middle tier. First, ECperf has a larger instruction footprint, resulting in much higher miss rates for intermediate-size instruction caches. Second, SPECjbb's data set size increases linearly as the benchmark scales up, while ECperf's remains roughly constant. This difference can lead to opposite conclusions on the design of multiprocessor memory systems, such as the utility of moderate-sized (i.e., 1 MB) shared caches in a chip multiprocessor.
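A back-of-the-envelope sketch of the scaling difference noted above: if SPECjbb's data set grows roughly linearly with the benchmark's scale while ECperf's stays roughly flat, a moderate shared cache stops covering SPECjbb's working set fairly quickly. The per-unit sizes below are invented purely for illustration, not measurements from the paper.

```python
SHARED_CACHE_MB = 1.0

def specjbb_data_mb(scale, mb_per_unit=0.5):
    return scale * mb_per_unit       # data set grows linearly with scale

def ecperf_data_mb(scale, base_mb=0.6):
    return base_mb                   # data set roughly independent of scale

for scale in (1, 2, 4, 8, 16):
    jbb, ec = specjbb_data_mb(scale), ecperf_data_mb(scale)
    print(f"scale {scale:2d}: "
          f"SPECjbb {jbb:4.1f} MB ({'fits' if jbb <= SHARED_CACHE_MB else 'exceeds'} 1 MB cache), "
          f"ECperf {ec:4.1f} MB ({'fits' if ec <= SHARED_CACHE_MB else 'exceeds'} 1 MB cache)")
```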