Write-buffers have a significant impact on performance, especially in wide-issue superscalar systems with write-through caching. We develop fast, efficient simulation methods for evaluating multiple write-buffer configurations together in a single pass. Our results are also applicable to the simulation of other buffer structures. We first consider simulating non-coalescing write-buffers. We show that a particular buffer stalls only when smaller buffers do, and develop an algorithm in which only the smallest buffer is explicitly simulated and the states of the others are updated only when smaller buffers stall. Empirical performance comparisons show a speedup of up to 7.4 over simpler methods. We then extend this algorithm to simulate multiple coalescing write-buffers, where we demonstrate a speedup of up to a factor of 3.5. Finally, we demonstrate the impact that write-buffers have on CPI by presenting write-buffer simulation results on four SPEC benchmarks.
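The stall-inclusion property stated in the abstract can be illustrated with a minimal sketch. This is not the single-pass algorithm of the paper, whose details the abstract does not give. It is the naive baseline of simulating each buffer depth separately, under an assumed timing model: a FIFO buffer, a fixed `drain` time per entry, and a hypothetical trace given as compute gaps between stores. The function name, the parameters, and the trace are all illustrative assumptions.

```python
def simulate_write_buffer(gaps, depth, drain):
    """Per-store stall cycles for a FIFO, non-coalescing write buffer.

    Assumed model (illustrative only):
      gaps[i] : compute cycles between store i-1 issuing and store i arriving
      depth   : number of buffer entries
      drain   : cycles memory needs to retire one entry, one entry at a time
    """
    done = []    # done[i] = cycle at which store i leaves the buffer
    stalls = []
    now = 0
    for i, gap in enumerate(gaps):
        now += gap  # store i reaches the buffer
        # The buffer is full if store i-depth has not retired yet.
        free_at = done[i - depth] if i >= depth else 0
        stall = max(0, free_at - now)
        now += stall  # the CPU waits for a free entry
        # Memory starts retiring store i once the previous entry is done.
        start = max(now, done[-1]) if done else now
        done.append(start + drain)
        stalls.append(stall)
    return stalls


# Hypothetical trace: two bursts of stores separated by a long compute phase.
gaps = [1, 1, 1, 1, 10, 1, 1, 1]
stalls_by_depth = {d: simulate_write_buffer(gaps, d, drain=4) for d in (1, 2, 4)}
```

On this trace the deeper buffers stall only at stores where the shallower ones also stall, which is the property a single-pass method can exploit to avoid simulating every depth explicitly.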
ISBN: (print) 0818656107
A system is structurally fault-tolerant (SFT) if it preserves a fault-free subsystem of a pre-determined interconnection structure when faults appear. We present a systematic approach to designing SFT VLSI-based systems that use shared buses as the main communication mechanism. To represent the target systems, we introduce a processor-bus-link (PBL) graph in which processing elements (PEs) and buses are both modeled as nodes. PE and bus faults correspond to the removal of nodes from the PBL graph. The node covering concept and the minimum-weight spanning arborescence algorithm are then applied to the design of SFT systems that can tolerate both PE and bus faults. The designs obtained have fewer spare communication ports than prior designs, no critical single point of failure, and simple circuitry for reconfiguration.
In this paper we apply a recently formulated general timing model of synchronous operation to the special case of latch-controlled pipelined circuits. The model accounts for multiphase synchronous clocking, correctly captures the behavior of level-sensitive latches, handles both short- and long-path delays, accommodates wave pipelining, and leads to a comprehensive set of timing constraints. Pipeline circuits are important because of their frequent use in computer systems. We define their concurrency as a function of the clock schedule and degree of wave pipelining. We then identify a special class of clock schedules, coincident multiphase clocks, which provide a lower bound on the value of the optimum cycle time. We show that the region of feasible solutions for single-phase clocking can be nonconvex or even disjoint, and derive a closed-form expression for the minimum cycle time of a restricted but practical form of single-phase clocking. We compare these forms of clocking on three pipeline examples and highlight some of the issues in pipeline synchronization.
Communication has a dominant impact on the performance of massively parallel processors (MPPs). We propose a methodology to evaluate the internode communication performance of MPPs using a controlled set of synthetic ...
Cache miss characterization models such as the three Cs model are useful in developing schemes to reduce cache misses and their penalty. In this paper we propose the OPT model that uses cache simulation under optimal ...
We give tight bounds on the parallel complexity of some problems involving random graphs. Specifically, we show that a Hamiltonian cycle, a breadth first spanning tree, and a maximal matching can all be constructed in...
The PDAS (Processor Design Automation System) is a new approach to design automation that uses formal methods to achieve a new level of design power and the ability to formally validate designs. The idea is to develop a design automation system which considers both microprocessor hardware design and design of the corresponding language compiler concurrently. Benchmark programs are used to motivate design decisions and optimize performance. Compiler optimizations are considered during the design of hardware. The system spans language design, compiler design, instruction set design, microarchitecture, and VLSI implementation.
ISBN: (print) 9780818644900
One major problem in pipeline synthesis is the detection and resolution of pipeline hazards. We present a new solution to the problem in the domain of pipelined application-specific instruction set processors, based on a hardware/software concurrent engineering approach. An extended taxonomy of inter-instruction dependencies is proposed for the analysis of pipeline hazards. Hardware/software resolution candidates are then associated with these dependencies. Algorithms using the taxonomy and the resolutions are developed to detect and resolve pipeline hazards, and to explore the hardware and software design space. Application benchmarks are used to evaluate the designs and guide the design decisions. The power of these tools is demonstrated through the pipeline synthesis of two processors, including an industrial one. Compared with other approaches, our method achieves higher throughput and provides a way to explore the hardware/software tradeoff. Because our method is orthogonal to current approaches, it can be combined with them to achieve even higher performance.
Aliasing, which is the mapping of a faulty circuit's signature onto the fault-free signature, is a major problem in signature analysis. The authors present a new design technique (ALFRED) for zero aliasing based on the concept of sequence detection. For a test sequence of length n, the length of the signature in ALFRED is Θ(log n). The authors reduce the circuit complexity by adopting a shift-register-like structure that minimizes the logical dependencies of all but one of the flip-flops. They relate the theory of balanced functions to ALFRED, and demonstrate the feasibility of the approach by using it to design a signature analyzer for a carry-lookahead adder.
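The aliasing problem itself can be shown with a small sketch. This is not ALFRED, whose construction the abstract does not describe. It is a conventional serial-input LFSR signature register, modeled as polynomial division over GF(2), with an assumed characteristic polynomial x^4 + x + 1 and a made-up response stream. Any error pattern that is a multiple of the characteristic polynomial leaves the signature unchanged, so the faulty response goes undetected.

```python
def signature(bits, poly, width):
    """Remainder of the bit stream, read as a GF(2) polynomial, modulo poly.

    bits  : sequence of 0/1, highest-degree coefficient first
    poly  : characteristic polynomial as an int, including the x^width term
    width : number of flip-flops (degree of poly)
    """
    reg = 0
    for b in bits:
        reg = (reg << 1) | b
        if reg >> width:   # degree reached width: subtract (XOR) poly
            reg ^= poly
    return reg


POLY, WIDTH = 0b10011, 4   # assumed polynomial x^4 + x + 1

good = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]            # hypothetical fault-free response
aliasing_error = [0, 0, 1, 0, 0, 1, 1, 0, 0, 0]  # POLY shifted left by 3: a multiple of POLY
single_bit_error = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]

faulty_aliased = [g ^ e for g, e in zip(good, aliasing_error)]
faulty_detected = [g ^ e for g, e in zip(good, single_bit_error)]

aliased = signature(faulty_aliased, POLY, WIDTH) == signature(good, POLY, WIDTH)
detected = signature(faulty_detected, POLY, WIDTH) != signature(good, POLY, WIDTH)
```

Here `aliased` is true even though the two streams differ in three positions, while the single-bit error is caught. A zero-aliasing design has to rule out the first case for every fault of interest.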