Search Results - Inner Mongolia University Library

Optimization of VLIW compatibility systems employing dynamic rescheduling

International Journal of Parallel Programming, 1997, Vol. 25, No. 2, pp. 83-112

Authors: T.M. Conte, S.W. Sathaye (Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, North Carolina)

Lack of object-code compatibility in VLIW architectures is a severe limit to their adoption as a general-purpose computing paradigm. Previous approaches include hardware and software techniques, both of which have drawbacks: hardware techniques add to the complexity of the architecture, whereas software techniques require multiple executables. This paper presents a technique called Dynamic Rescheduling, which applies software techniques dynamically through intervention by the OS: at each first-time page fault, the page of code is rescheduled for the new generation, if required. Results obtained with the Illinois IMPACT compiler and the TINKER architectural framework demonstrate the viability of the technique. For the machine models and workloads used in this study, the performance of the rescheduled code compares well with that of code scheduled natively for the machine. Some programs in the workload incur a large number of first-time page faults, so their rescheduling overhead is high relative to their total execution time; these are called high-overhead programs. To reduce the rescheduling overhead, the paper discusses caching translated pages across multiple invocations of a program in a persistent rescheduled-page cache (PRC). For the workload used in this evaluation, a PRC of between 512 and 1024 pages that uses an overhead-based page replacement policy was found to be effective in reducing the overhead.

Keywords: Object-code compatibility; Dynamic rescheduling; Instruction-level parallelism
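
A minimal Python sketch of how a persistent rescheduled-page cache changes when rescheduling overhead is paid. It assumes that "overhead-based replacement" means evicting the cached page that was cheapest to reschedule; the class and function names, the `reschedule` callback, and the cost values are hypothetical, and the rescheduling itself is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Set, Tuple

# Hypothetical callback: rescheduling a page returns (rescheduled code, overhead in cycles).
Rescheduler = Callable[[int], Tuple[str, int]]


@dataclass
class PersistentRescheduledPageCache:
    """Sketch of a PRC whose contents survive across program invocations."""
    capacity: int  # in pages (the study found 512-1024 effective for its workload)
    pages: Dict[int, Tuple[str, int]] = field(default_factory=dict)  # page -> (code, overhead)

    def fetch(self, page: int, reschedule: Rescheduler) -> int:
        """Return the rescheduling overhead paid for this page (0 on a PRC hit)."""
        if page in self.pages:
            return 0  # translated page reused from an earlier invocation
        code, overhead = reschedule(page)  # miss: reschedule for the new generation
        if self.capacity > 0:
            if len(self.pages) >= self.capacity:
                # Assumed overhead-based policy: evict the page cheapest to regenerate.
                victim = min(self.pages, key=lambda p: self.pages[p][1])
                del self.pages[victim]
            self.pages[page] = (code, overhead)
        return overhead


def run_program(page_refs: Iterable[int], prc: PersistentRescheduledPageCache,
                reschedule: Rescheduler) -> int:
    """One invocation: only first-time page faults consult the PRC."""
    resident: Set[int] = set()
    total_overhead = 0
    for page in page_refs:
        if page not in resident:  # first-time page fault in this invocation
            resident.add(page)
            total_overhead += prc.fetch(page, reschedule)
    return total_overhead


def toy_reschedule(page: int) -> Tuple[str, int]:
    # Illustrative cost model: higher-numbered pages are more expensive to reschedule.
    return f"rescheduled-page-{page}", 10 * (page + 1)


prc = PersistentRescheduledPageCache(capacity=2)
print(run_program([0, 1, 0, 2], prc, toy_reschedule))  # first run: 10 + 20 + 30 = 60
print(run_program([0, 1, 0, 2], prc, toy_reschedule))  # second run: 30
```

Because the cache persists between runs, the second invocation pays overhead only for pages that were evicted; a real PRC would also need to be invalidated when the executable changes.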

Efficient Synchronization: Let Them Eat QOLB

Annual International Symposium on Computer Architecture (ISCA)

Authors: A. Kagi, D. Burger, J.R. Goodman (Computer Sciences Department, University of Wisconsin-Madison, Madison, WI, USA)

Run-time Adaptive Cache Hierarchy Via Reference Analysis

Annual International Symposium on Computer Architecture (ISCA)

Authors: T.L. Johnson, Wen-mei W. Hwu (Center for Reliable and High-Performance Computing, University of Illinois, Urbana-Champaign, IL, USA)

Designing High Bandwidth On-chip Caches

Annual International Symposium on Computer Architecture (ISCA)

Authors: K.M. Wilson, K. Olukotun (Computer Systems Laboratory, Stanford University, Stanford, CA, USA)

Algorithmic Analysis of Multithreaded Algorithms

8th Annual International Symposium on Algorithms and Computation (ISAAC 1997)

Author: Charles E. Leiserson (MIT Laboratory for Computer Science, Cambridge, MA, United States)

ISBN (print): 3540638903

Cilk is a parallel programming language that allows programmers to write multithreaded parallel programs that use computational resources predictably and efficiently. The Cilk language allows programmers to specify the interactions among computational threads in a high-level fashion, and Cilk’s runtime system then maps the computation onto the available physical resources dynamically in a provably efficient fashion. The performance of a Cilk program is mathematically guaranteed to scale up linearly with the number of processors, as long as the application has sufficient parallelism and the architecture has sufficient communication bandwidth. Moreover, Cilk is efficient: a parallel Cilk program "scales down" to run on a single processor with nearly the same efficiency as comparable C code, thereby removing a major barrier to parallel programming. Cilk provides a theoretical performance model based on "Brent’s theorem". Using the measures of work, critical-path length, and serial space, a programmer can extrapolate the performance of a program to any number of processors. Moreover, divide-and-conquer parallel algorithms can be analyzed in much the same way one analyzes divide-and-conquer serial algorithms in a college algorithms class. Multithreaded algorithms for such problems as matrix multiplication, Cholesky factorization, and sorting can all be analyzed, and competing algorithms compared, within Cilk’s analytical framework. © Springer-Verlag Berlin Heidelberg 1997.

Keywords: Parallel programming
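
Brent's theorem bounds the running time of a greedy schedule on P processors by T_P <= T_1/P + T_inf, where T_1 is the work and T_inf is the critical-path length; Cilk's work-stealing scheduler achieves T_1/P + O(T_inf) in expectation. The Python sketch below assumes a unit-cost model for a hypothetical divide-and-conquer array sum (not an example taken from the paper) and shows how the work and critical-path recurrences are evaluated, just as one would solve a serial recurrence, and then plugged into the bound.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def work(n: int) -> int:
    """T_1: total unit-cost operations for a divide-and-conquer sum of n elements."""
    if n < 1:
        raise ValueError("n must be positive")
    if n == 1:
        return 1  # leaf: read one element
    half = n // 2
    return work(half) + work(n - half) + 1  # both halves plus one addition


@lru_cache(maxsize=None)
def span(n: int) -> int:
    """T_inf: critical-path length, since the two halves may run in parallel."""
    if n < 1:
        raise ValueError("n must be positive")
    if n == 1:
        return 1
    half = n // 2
    return max(span(half), span(n - half)) + 1  # longer half, then the addition


def brent_bound(n: int, processors: int) -> float:
    """Upper bound T_P <= T_1 / P + T_inf for a greedy schedule."""
    return work(n) / processors + span(n)


n = 1024
print(work(n), span(n))             # 2047 11
print(round(work(n) / span(n), 1))  # parallelism T_1 / T_inf: 186.1
print(brent_bound(n, 8))            # 266.875
```

Here T_1 = 2n - 1 and T_inf = log2(n) + 1, so the parallelism grows as n / log n, and linear speedup is expected only while P stays well below that ratio.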

An architecture workbench for multicomputers

International Symposium on Parallel Processing

Authors: A.D. Pimentel, L.O. Hertzberger (Department of Computer Science, University of Amsterdam, Amsterdam, Netherlands)

The large design space of modern computer architectures calls for performance modelling tools that facilitate the evaluation of different alternatives. In this paper we give an overview of the Mermaid multicomputer simulation environment. This environment allows the evaluation of a wide range of architectural design tradeoffs while delivering reasonable simulation performance. To achieve this, simulation takes place at the level of abstract machine instructions rather than real instructions. A less detailed simulation mode is also provided, which can yield high simulation efficiency when accuracy is not the primary objective. As a consequence, Mermaid makes both fast prototyping and accurate evaluation of multicomputer architectures feasible.

Keywords: Computational modeling; Computer architecture; Computer simulation; Concurrent computing; Costs; Computer science; Virtual prototyping; Memory architecture; Prototypes; Performance loss
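
To illustrate simulating abstract machine instructions with both a detailed and a less detailed mode, here is a minimal Python sketch. The operation kinds, the direct-mapped cache used in detailed mode, and all latencies are hypothetical choices for illustration; they are not Mermaid's actual instruction set or cost model.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class AbstractOp:
    kind: str        # hypothetical abstract instruction kinds: "compute", "load", "store"
    cycles: int = 1  # cost of a "compute" operation
    addr: int = 0    # byte address for "load"/"store"


class AbstractSimulator:
    """Charges cycles per abstract operation; `detailed` selects the memory model."""

    def __init__(self, detailed: bool, num_lines: int = 64, line_size: int = 32,
                 hit_latency: int = 1, miss_latency: int = 20, avg_latency: int = 3):
        self.detailed = detailed
        self.num_lines = num_lines
        self.line_size = line_size
        self.hit_latency = hit_latency
        self.miss_latency = miss_latency
        self.avg_latency = avg_latency
        self.tags: List[Optional[int]] = [None] * num_lines  # direct-mapped cache tags

    def _memory_cycles(self, addr: int) -> int:
        if not self.detailed:
            return self.avg_latency  # fast mode: fixed average latency, no cache state
        block = addr // self.line_size
        index, tag = block % self.num_lines, block // self.num_lines
        if self.tags[index] == tag:
            return self.hit_latency
        self.tags[index] = tag  # allocate on miss (loads and stores alike)
        return self.miss_latency

    def run(self, trace: Iterable[AbstractOp]) -> int:
        clock = 0
        for op in trace:
            if op.kind == "compute":
                clock += op.cycles
            elif op.kind in ("load", "store"):
                clock += self._memory_cycles(op.addr)
            else:
                raise ValueError(f"unknown abstract op: {op.kind}")
        return clock


trace = [AbstractOp("load", addr=0), AbstractOp("load", addr=4),
         AbstractOp("compute", cycles=5), AbstractOp("store", addr=4096)]
print(AbstractSimulator(detailed=True).run(trace))   # 20 + 1 + 5 + 20 = 46
print(AbstractSimulator(detailed=False).run(trace))  # 3 + 3 + 5 + 3 = 14
```

The coarse mode skips all cache bookkeeping, which is where the speed comes from; the price is that locality effects (the hit on the second load) are lost.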

Mapping applications to the RaPiD configurable architecture

Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM)

Authors: C. Ebeling, D.C. Cronquist, P. Franklin, J. Secosky, S.G. Berg (Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA)

The goal of the RaPiD (Reconfigurable Pipelined Datapath) architecture is to provide high-performance configurable computing for a range of computationally intensive applications that demand special-purpose hardware. This is accomplished by mapping the computation into a deep pipeline using a configurable array of coarse-grained computational units. A key feature of RaPiD is the combination of static and dynamic control: while the underlying computational pipelines are configured statically, a limited amount of dynamic control is provided, which greatly increases the range and capability of applications that can be mapped to RaPiD. This paper illustrates this mapping and configuration for several important applications, including a FIR filter, a 2-D DCT, motion estimation, and parametric curve generation; it also shows how static and dynamic control are used to perform complex computations.

Keywords: Computer architecture; Pipelines; Field programmable gate arrays; Finite impulse response filter; Application software; Hardware; Discrete cosine transforms; Motion estimation; High performance computing; Circuits
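
The FIR filter is the easiest of the listed applications to picture as a deep pipeline. The following Python sketch assumes a transposed-form FIR in which each pipeline stage is one multiply-accumulate unit holding a statically configured coefficient, simulates that pipeline one sample per clock, and checks it against direct convolution. It illustrates the general mapping idea only; it does not model RaPiD's actual datapath or its dynamic control.

```python
from typing import List, Sequence


def pipelined_fir(coeffs: Sequence[int], samples: Sequence[int]) -> List[int]:
    """Transposed-form FIR: one multiply-accumulate stage per (statically set) coefficient."""
    if not coeffs:
        raise ValueError("need at least one coefficient")
    regs = [0] * (len(coeffs) - 1)  # pipeline registers between adjacent stages
    out = []
    for x in samples:  # one new input sample per clock
        out.append(coeffs[0] * x + (regs[0] if regs else 0))
        # Every stage multiplies the broadcast input by its coefficient and adds the
        # partial sum arriving from the next stage; all registers update together.
        regs = [coeffs[i + 1] * x + (regs[i + 1] if i + 1 < len(regs) else 0)
                for i in range(len(regs))]
    return out


def direct_fir(coeffs: Sequence[int], samples: Sequence[int]) -> List[int]:
    """Reference: y[n] = sum_k h[k] * x[n - k], with x[m] = 0 for m < 0."""
    return [sum(coeffs[k] * samples[n - k] for k in range(len(coeffs)) if n - k >= 0)
            for n in range(len(samples))]


print(pipelined_fir([1, 2, 3], [1, 0, 0, 0]))  # impulse response: [1, 2, 3, 0]
```

The transposed form keeps every stage's work local (one multiply, one add, one register), which is what lets the computation stretch across a long array of identical units.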

Garp: a MIPS processor with a reconfigurable coprocessor

Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM)

Authors: J.R. Hauser, J. Wawrzynek (University of California, Berkeley, USA)

Typical reconfigurable machines exhibit shortcomings that make them less than ideal for general-purpose computing. The Garp architecture combines reconfigurable hardware with a standard MIPS processor on the same die to retain the better features of both. Novel aspects of the architecture are presented, as well as a prototype software environment and preliminary performance results. Compared with an UltraSPARC, a Garp of similar technology could achieve speedups ranging from a factor of 2 to as high as a factor of 24 for some useful applications.

Keywords: Coprocessors; Field programmable gate arrays; Hardware; Computer architecture; Switches; Reconfigurable logic; Circuits; Software prototyping; Software performance; Application software
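
How a kernel speedup on a reconfigurable coprocessor translates into a whole-program speedup can be estimated with Amdahl's law. The sketch below is a generic estimate, not the paper's performance model; the function name, the offloaded fraction, and the optional configuration-overhead term (expressed as a fraction of the original run time) are illustrative assumptions.

```python
def offload_speedup(fraction: float, kernel_speedup: float,
                    config_overhead: float = 0.0) -> float:
    """Amdahl's-law estimate of whole-program speedup when `fraction` of the
    original run time moves to a coprocessor that runs it `kernel_speedup`
    times faster. `config_overhead` is extra time, as a fraction of the original
    run time, spent e.g. loading configurations (an illustrative term)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if kernel_speedup <= 0.0 or config_overhead < 0.0:
        raise ValueError("kernel_speedup must be > 0 and config_overhead >= 0")
    new_time = (1.0 - fraction) + fraction / kernel_speedup + config_overhead
    return 1.0 / new_time


# Even a 50x faster kernel gives under 10x overall if 10% of the time stays on the CPU.
print(round(offload_speedup(0.9, 50.0), 2))        # 8.47
print(round(offload_speedup(0.9, 50.0, 0.05), 2))  # 5.95
```

The serial remainder caps the overall gain at 1 / (1 - fraction), which is why tight coupling of the array to the processor (low overhead, large offloadable fraction) matters as much as raw kernel speed.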

Fast algorithms for static compaction of sequential circuit test vectors

VLSI Test Symposium

Authors: M.S. Hsiao, E.M. Rudnick, J.H. Patel (Center for Reliable and High Performance Computing, University of Illinois, Urbana-Champaign, IL, USA)

Two fast algorithms for static test sequence compaction are proposed for sequential circuits. The algorithms are based on the observation that test sequences traverse a small set of states, and some states are frequently revisited throughout the application of a test set. A subsequence that starts and ends in the same state may be removed if necessary and sufficient conditions for its removal are met. The techniques require only two fault simulation passes and were applied to test sequences generated by various test generators, quickly producing significant compaction for circuits that have many revisited states.

Keywords: Circuit testing; Compaction; Sequential circuits; Sequential analysis; Circuit faults; Fault detection; Contracts; Electrical fault detection; Sufficient conditions; Circuit simulation
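
The state-revisit observation can be sketched in Python as follows. The circuit states reached by the test sequence are assumed to be known from a fault-free simulation, and the paper's necessary and sufficient removal conditions are abstracted into a hypothetical `can_remove` callback (for example, a check that fault coverage is preserved). The greedy loop-removal strategy is an illustrative assumption, not the authors' algorithm.

```python
from typing import Callable, Dict, Hashable, List, Sequence, TypeVar

V = TypeVar("V")


def compact_state_loops(vectors: Sequence[V], states: Sequence[Hashable],
                        can_remove: Callable[[int, int], bool]) -> List[V]:
    """Greedily drop subsequences vectors[i:j] whose start and end states match.

    states[k] is the fault-free circuit state before vectors[k] is applied, so
    len(states) == len(vectors) + 1. can_remove(i, j) is a hypothetical oracle
    standing in for the paper's removal conditions.
    """
    if len(states) != len(vectors) + 1:
        raise ValueError("states must have exactly one more entry than vectors")
    keep = [True] * len(vectors)
    last_seen: Dict[Hashable, int] = {}  # state -> latest surviving position
    for j, state in enumerate(states):
        i = last_seen.get(state)
        if i is not None and can_remove(i, j):
            # vectors[i:j] lead from `state` back to `state`, so in the fault-free
            # circuit the remaining suffix still starts from the same state.
            for k in range(i, j):
                keep[k] = False
            # Positions inside the removed loop no longer exist in the sequence.
            last_seen = {s: p for s, p in last_seen.items() if p <= i}
        last_seen[state] = j
    return [v for v, kept in zip(vectors, keep) if kept]


vectors = ["v0", "v1", "v2", "v3", "v4"]
states = ["A", "B", "C", "B", "D", "E"]  # state B is revisited after v1, v2
print(compact_state_loops(vectors, states, lambda i, j: True))  # ['v0', 'v3', 'v4']
```

The oracle is where the real cost lies: in practice it would be answered by fault simulation, which is why limiting the number of simulation passes matters.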

Architecture and Performance of the Hitachi SR2201 Massively Parallel Processor System

Architecture and performance of the Hitachi SR2201 massively...

引用

International Symposium on Parallel Processing

Authors: H. Fujii, Y. Yasuda, H. Akashi, Y. Inagami, M. Koga, O. Ishihara, M. Kashiyama, H. Wada, T. Sumimoto (Central Research Laboratory, Hitachi, Ltd., Tokyo, Japan; General Purpose Computer Division, Hitachi, Ltd., Hadano, Kanagawa, Japan)

RISC-based massively parallel processors (MPPs) often show low efficiency in real-world applications because of cache miss penalties, insufficient memory-system throughput, and poor inter-processor communication performance. Hitachi's SR2201, an MPP scalable up to 2048 processors and 600 GFLOPS peak performance, overcomes these problems by introducing three novel features. First, its processor, the 150 MHz HARP-IE, avoids the cache miss penalty through "pseudo vector processing" (PVP), in which data is prefetched into a special register bank, bypassing the cache. Second, a multi-bank memory architecture that operates like a pipeline eliminates the memory-system bottleneck. Third, inter-processor communication achieves high performance on a three-dimensional crossbar network, using a "remote DMA transfer" protocol and hardware-based cache coherency. As a result of these improvements, the SR2201 achieved 220.4 GFLOPS with 1024 processors on the LINPACK benchmark, almost 72% of the peak performance of that configuration.

Keywords: Application software; Throughput; Reduced instruction set computing; Degradation; Cache memory; Computer architecture; Software performance; Laboratories; Concurrent computing; Prefetching
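
The 72% figure can be reproduced with a short calculation. The per-processor peak used here, 300 MFLOPS (150 MHz × 2 floating-point operations per cycle), is an assumption not stated in the abstract; it is consistent with the quoted 72% and with the roughly 600 GFLOPS quoted for 2048 processors (2048 × 0.3 = 614.4 GFLOPS).

```python
def peak_gflops(processors: int, clock_mhz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS: processors x clock (MHz) x FP operations per cycle."""
    return processors * clock_mhz * flops_per_cycle / 1000.0


def linpack_efficiency(rmax_gflops: float, processors: int,
                       clock_mhz: float, flops_per_cycle: int) -> float:
    """Fraction of theoretical peak achieved by a measured LINPACK result."""
    return rmax_gflops / peak_gflops(processors, clock_mhz, flops_per_cycle)


# Assumed: 2 floating-point operations per cycle per 150 MHz processor.
print(f"peak(1024) = {peak_gflops(1024, 150, 2):.1f} GFLOPS")      # 307.2 GFLOPS
print(f"efficiency = {linpack_efficiency(220.4, 1024, 150, 2):.1%}")  # 71.7%
```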
