Search results - Inner Mongolia University Library

Annual Hawaii International Conference on System Sciences (HICSS)

Authors: S.G. Abraham; R.A. Sugumar (Advanced Computer Architecture Laboratory, EECS Department, University of Michigan, Ann Arbor, MI, USA; Cray Research Inc., Chippewa Falls, WI, USA)

Write-buffers have a significant impact on performance, especially in wide-issue superscalar systems with write-through caching. We develop fast, efficient simulation methods for evaluating multiple write-buffer configurations together in a single pass. Our results are also applicable to the simulation of other buffer structures. We first consider simulating non-coalescing write-buffers. We show that a particular buffer stalls only when smaller buffers do, and develop an algorithm where only the smallest buffer is explicitly simulated, and the states of others are updated only as smaller buffers stall. Empirical performance comparisons show a speedup of up to 7.4 over simpler methods. We then extend this algorithm to simulate multiple coalescing write-buffers, where we demonstrate up to a factor of 3.5 speedup. Finally, we demonstrate the impact that write-buffers have on CPI by presenting write-buffer simulation results on four SPEC benchmarks.

Keywords: Computational modeling; Delay; Traffic control; Computer architecture; Laboratories; Algorithm design and analysis; Contracts; Buffer storage; Conference management; Prototypes
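
The claim in the abstract above, that a buffer stalls only when smaller buffers do, can be illustrated with a minimal sketch. This is not the authors' single-pass algorithm: it simulates each buffer size separately (the simpler baseline) under an assumed timing model in which buffered writes retire to memory one at a time, each taking a fixed number of cycles. The function name `stall_cycles`, the `gaps` trace format, and the `drain` parameter are hypothetical and chosen only for illustration.

```python
from collections import deque


def stall_cycles(gaps, capacity, drain):
    """Total processor stall cycles for one non-coalescing FIFO write buffer.

    gaps     -- cycles of processor work before each write (assumed trace format)
    capacity -- number of buffer entries
    drain    -- cycles memory needs to retire one write (assumed constant)
    """
    t = 0                # current processor time
    last_done = 0        # completion time of the most recently queued write
    pending = deque()    # completion times of writes still in the buffer
    stalls = 0
    for gap in gaps:
        t += gap
        # Entries whose write has already completed free their slot.
        while pending and pending[0] <= t:
            pending.popleft()
        # Buffer full: the processor waits for the oldest entry to retire.
        if len(pending) == capacity:
            stalls += pending[0] - t
            t = pending.popleft()
        # Writes retire serially, so this one starts after the previous one ends.
        last_done = max(t, last_done) + drain
        pending.append(last_done)
    return stalls


gaps = [1, 1, 1, 1, 10, 1, 1, 1]
for size in (1, 2, 4):
    print(size, stall_cycles(gaps, size, drain=3))
# prints:
# 1 12
# 2 6
# 4 0
```

In this toy model the stall total never increases as the buffer grows, which is consistent with the inclusion-style property the abstract describes; exploiting it to avoid simulating every size explicitly is the contribution of the paper and is not attempted here.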

Structural fault tolerance in VLSI-based systems

Great Lakes Symposium on VLSI

Authors: Hung-Kuei Ku; J.P. Hayes (Dept. of Electr. Eng. & Comput. Sci., Michigan Univ., Ann Arbor, MI, USA; Advanced Computer Architecture Laboratory, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA)

ISBN: (print) 0818656107

A system is structurally fault-tolerant (SFT) if it preserves a fault-free subsystem of a pre-determined interconnection structure when faults appear. We present a systematic approach to designing SFT VLSI-based systems that use shared buses as the main communication mechanism. To represent the target systems, we introduce a processor-bus-link (PBL) graph in which processing elements (PEs) and buses are both modeled as nodes. PE and bus faults correspond to the removal of nodes from the PBL graph. The node covering concept and the minimum-weight spanning arborescence algorithm are then applied to the design of SFT systems that can tolerate both PE and bus faults. The designs obtained have fewer spare communication ports than prior designs, no critical single point of failure, and simple circuitry for reconfiguration.

Keywords: Fault tolerant systems; Circuit faults; Integrated circuit interconnections; Multiplexing; Very large scale integration; Reconfigurable logic; Computer architecture; Laboratories; Algorithm design and analysis; Design methodology

SYNCHRONIZATION OF PIPELINES

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 1993, vol. 12, no. 8, pp. 1132-1146

Authors: Sakallah, K.A.; Mudge, T.N.; Burks, T.M.; Davidson, E.S. (Advanced Computer Architecture Laboratory, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA)

In this paper we apply a recently formulated general timing model of synchronous operation to the special case of latch-controlled pipelined circuits. The model accounts for multiphase synchronous clocking, correctly captures the behavior of level-sensitive latches, handles both short- and long-path delays, accommodates wave pipelining, and leads to a comprehensive set of timing constraints. Pipeline circuits are important because of their frequent use in computer systems. We define their concurrency as a function of the clock schedule and degree of wave pipelining. We then identify a special class of clock schedules, coincident multiphase clocks, which provide a lower bound on the value of the optimum cycle time. We show that the region of feasible solutions for single-phase clocking can be nonconvex or even disjoint, and derive a closed-form expression for the minimum cycle time of a restricted but practical form of single-phase clocking. We compare these forms of clocking on three pipeline examples and highlight some of the issues in pipeline synchronization.

Keywords: Clocks; Pipeline processing; Synchronization; Circuits; Latches; Timing; Concurrent computing; Propagation delay; Processor scheduling; Space vector pulse width modulation

Evaluating the communication performance of MPPs using synthetic sparse matrix multiplication workloads

7th International Conference on Supercomputing, ICS 1993

Authors: Boyd, Eric L.; Wellman, John-David; Abraham, Santosh G.; Davidson, Edward S. (Advanced Computer Architecture Laboratory, Department of Electrical Engineering and Computer Science, University of Michigan, United States)

ISBN: (print) 089791600X

Communication has a dominant impact on the performance of massively parallel processors (MPPs). We propose a methodology to evaluate the internode communication performance of MPPs using a controlled set of synthetic workloads. By generating a range of sparse matrices and measuring the performance of a simple parallel algorithm that repeatedly multiplies a sparse matrix by a dense vector, we can determine the relative performance of different communication workloads. Specifiable communication parameters include the number of nodes, the average amount of communication per node, the degree of sharing among the nodes, and the computation-communication ratio. We describe a general procedure for constructing sparse matrices that have these desired communication and computation parameters, and apply a range of these synthetic workloads to evaluate the hierarchical ring interconnection and cache-only memory architecture (COMA) of the Kendall Square Research KSR1 MPP. This analysis discusses the impact of the KSR1 architecture on communication performance, highlighting the utility and impact of the automatic update feature. It also investigates the impact of system contention on the performance, particularly how it causes potential updates to be ignored. © 1993 ACM.

Keywords: Memory architecture
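
The kernel named in the abstract above, repeated multiplication of a sparse matrix by a dense vector, can be sketched as follows. This is a sequential illustration only, not the authors' workload generator. The compressed sparse row (CSR) layout, the function names, and the `owner` mapping (which assumes a square matrix where the node that owns row i also owns vector element i) are assumptions introduced here to show how the sparsity pattern determines how many vector elements a node must fetch from other nodes.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR sparse matrix by a dense vector x."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        # Non-zeros of row r live in positions row_ptr[r] .. row_ptr[r+1]-1.
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def remote_reads(col_idx, row_ptr, owner):
    """Count distinct vector elements each node needs from other nodes.

    owner[i] is the (hypothetical) node that holds both row i and x[i].
    """
    needed = {}
    for r in range(len(row_ptr) - 1):
        node = owner[r]
        for k in range(row_ptr[r], row_ptr[r + 1]):
            c = col_idx[k]
            if owner[c] != node:
                needed.setdefault(node, set()).add(c)
    return {node: len(cols) for node, cols in needed.items()}


# 3x3 example matrix: [[2, 0, 1], [0, 3, 0], [4, 0, 5]]
values = [2, 1, 3, 4, 5]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]

print(csr_matvec(values, col_idx, row_ptr, [1, 2, 3]))  # [5.0, 6.0, 19.0]
print(remote_reads(col_idx, row_ptr, owner=[0, 1, 2]))  # {0: 1, 2: 1}
```

Changing where the non-zeros sit changes the remote-read counts without changing the arithmetic per row, which is the kind of control over communication versus computation that the abstract attributes to its synthetic matrices.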

Efficient simulation of caches under optimal replacement with applications to miss characterization

1993 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, SIGMETRICS 1993

Authors: Sugumar, Rabin A.; Abraham, Santosh G. (Advanced Computer Architecture Laboratory, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109-2122, United States)

ISBN: (print) 0897915801

Cache miss characterization models such as the three Cs model are useful in developing schemes to reduce cache misses and their penalty. In this paper we propose the OPT model that uses cache simulation under optimal (OPT) replacement to obtain a finer and more accurate characterization of misses than the three Cs model. However, current methods for optimal cache simulation are slow and difficult to use. We present three new techniques for optimal cache simulation. First, we propose a limited lookahead strategy with error fixing, which allows one pass simulation of multiple optimal caches. Second, we propose a scheme to group entries in the OPT stack, which allows efficient tree-based fully-associative cache simulation under OPT. Third, we propose a scheme for exploiting partial inclusion in set-associative cache simulation under OPT. Simulators based on these algorithms were used to obtain cache miss characterizations using the OPT model for nine SPEC benchmarks. The results indicate that miss ratios under OPT are substantially lower than those under LRU replacement, by up to 70% in fully-associative caches, and up to 32% in two-way set-associative caches. © 1993 ACM.

Keywords: Cache memory
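
The gap between LRU and optimal (OPT) replacement mentioned in the abstract above can be seen on a toy trace. The sketch below is a plain, one-cache-at-a-time simulation of a fully-associative cache; it does not implement the paper's limited-lookahead, stack-grouping, or partial-inclusion techniques. OPT is modelled here in the usual textbook way (evict the block whose next reference lies furthest in the future), and the function names and trace are illustrative assumptions.

```python
from collections import OrderedDict


def lru_misses(trace, capacity):
    """Miss count for a fully-associative cache with LRU replacement."""
    cache = OrderedDict()  # oldest (least recently used) entry first
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)
            cache[block] = True
    return misses


def opt_misses(trace, capacity):
    """Miss count when the block referenced furthest in the future is evicted."""
    inf = float("inf")
    # next_use[i] = index of the next reference to trace[i], or infinity.
    next_use = [inf] * len(trace)
    last_seen = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = last_seen.get(trace[i], inf)
        last_seen[trace[i]] = i
    cache = {}  # block -> index of its next reference
    misses = 0
    for i, block in enumerate(trace):
        if block not in cache:
            misses += 1
            if len(cache) >= capacity:
                victim = max(cache, key=cache.get)
                del cache[victim]
        cache[block] = next_use[i]
    return misses


trace = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(lru_misses(trace, 3), opt_misses(trace, 3))  # 10 7
```

Because OPT needs the whole future of the trace, a direct simulation like this one must be rerun for every cache size, which is the cost the abstract says its techniques are designed to reduce.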

Optimal parallel construction of Hamiltonian cycles and spanning trees in random graphs

5th Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA 1993

Authors: MacKenzie, Philip D.; Stout, Quentin F. (Advanced Computer Architecture Laboratory, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109-2122, United States)

ISBN: (print) 0897915992

We give tight bounds on the parallel complexity of some problems involving random graphs. Specifically, we show that a Hamiltonian cycle, a breadth-first spanning tree, and a maximal matching can all be constructed in Θ(log* n) expected time using n/log* n processors on the CRCW PRAM. This is a substantial improvement over the best previous algorithms, which required Θ((log log n)^2) time and n log^2 n processors. We then introduce a technique which allows us to prove that constructing an edge cover of a random graph from its adjacency matrix requires Ω(log* n) expected time on a CRCW PRAM with O(n) processors. Constructing an edge cover is implicit in constructing a spanning tree, a Hamiltonian cycle, and a maximal matching, so this lower bound holds for all these problems, showing that our algorithms are optimal. This new lower bound technique is one of the very few lower bound techniques known which apply to randomized CRCW PRAM algorithms, and it provides the first nontrivial parallel lower bounds for these problems. © 1993 ACM.

Keywords: Hamiltonians

Hierarchical Performance Modeling With MACS: A Case Study Of The Convex C-240

Annual International Symposium on Computer Architecture, ISCA

Authors: E.L. Boyd; E.S. Davidson (Advanced Computer Architecture Laboratory, Department of Electrical Engineering and Computer Science, University of Michigan, USA)

PDAS: Processor design automation system

European Design Automation Conference

Authors: I. Pyo; A.M. Despain (Advanced Computer Architecture Laboratory, Department of Electrical Engineering-Systems, University of Southern California, USA)

The PDAS (Processor Design Automation System) is a new approach to design automation that uses formal methods to achieve a new level of design power and the ability to formally validate designs. The idea is to develop a design automation system which considers both microprocessor hardware design and design of the corresponding language compiler concurrently. Benchmark programs are used to motivate design decisions and optimize performance. Compiler optimizations are considered during the design of hardware. The system spans language design, compiler design, instruction set design, microarchitecture, and VLSI implementation.

Keywords: Personal digital assistants; Design automation; Natural languages; Specification languages; Hardware; Design optimization; Process design; Formal languages; Concrete; Computer architecture

Hardware/software resolution of pipeline hazards in pipeline synthesis of instruction set processors

IEEE International Conference on Computer-Aided Design

Authors: I.-J. Huang; A.M. Despain (Advanced Computer Architecture Laboratory, Department of Electrical Engineering-Systems, University of Southern California, USA)

ISBN: (print) 9780818644900

One major problem in pipeline synthesis is the detection and resolution of pipeline hazards. We present a new solution to the problem in the domain of pipelined application-specific instruction set processors, based on a hardware/software concurrent engineering approach. An extended taxonomy of inter-instruction dependencies is proposed for the analysis of pipeline hazards. Hardware/software resolution candidates are then associated with these dependencies. Algorithms using the taxonomy and the resolutions are developed to detect and resolve pipeline hazards, and to explore the hardware and software design space. Application benchmarks are used to evaluate the designs and guide the design decisions. The power of these tools is demonstrated through the pipeline synthesis of two processors, including an industrial one. Compared with other approaches, our method achieves higher throughput and provides a way to explore the hardware/software tradeoff. Because our method is orthogonal to current approaches, it can be combined with them to achieve even higher performance.

Keywords: Hardware; Hazards; Pipeline processing; Taxonomy; Delay; Application software; Throughput; Space exploration; Computer architecture; Laboratories

Aliasing-free error detection (ALFRED)

VLSI Test Symposium

Authors: K. Chakrabarty; J.P. Hayes (Advanced Computer Architecture Laboratory, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA)

Aliasing, which is the mapping of a faulty circuit's signature onto the fault-free signature, is a major problem in signature analysis. The authors present a new design technique (ALFRED) for zero aliasing based on the concept of sequence detection. For a test sequence of length n, the length of the signature in ALFRED is Θ(log n). The authors reduce the circuit complexity by adopting a shift-register-like structure that minimizes the logical dependencies of all but one of the flip-flops. They relate the theory of balanced functions to ALFRED, and demonstrate the feasibility of the approach by using it to design a signature analyzer for a carry-lookahead adder.

Keywords: Circuit testing; Circuit faults; Polynomials; Flip-flops; Automatic testing; Hardware; Computer errors; Computer architecture; Laboratories; Complexity theory
