Search results - Inner Mongolia University Library

IEEE International Symposium on Circuits and Systems (ISCAS)

Authors: L. Cai, D. Gajski, M. Olivarez. Affiliations: Department of Information and Computer Science, University of California, Irvine, Irvine, CA, USA; Motorola Architecture and Systems Laboratory, Austin, TX, USA

ISBN: (print) 0780366859

To implement chip design on satisfactory target architectures, architecture exploration should be done at higher levels of abstraction, in the earliest design stages. Using the SpecC language, an executable system level design language, system level architecture exploration can proceed easily and smoothly as the system specification is being created. A SpecC methodology of system level architecture exploration is introduced within this paper to illustrate this process. The design of a JPEG encoder is used as an example to illustrate the system level architecture exploration methodology.

Keywords: Computer architecture; System-level design; Process design; Specification languages; Design methodology; Transform coding; Computer science; Chip scale packaging; Time to market; System testing

Exploiting Locality in Single Assignment Data Structures Updated Through Split-Phase Transactions

Cluster Computing, 2001, vol. 4, no. 4, pp. 281-293

Authors: Amaral, José Nelson; Lin, Wen-Yen; Gaudiot, Jean-Luc; Gao, Guang R. Affiliations: Department of Computing Science, University of Alberta, Edmonton, Canada; Tia Mobile Inc., Pasadena, USA; Department of Electrical Engineering, University of Southern California, Los Angeles, USA; Computer Architecture and Parallel Systems Laboratory, Department of Electrical and Computer Engineering, University of Delaware, Newark, USA

We present the design, implementation, and evaluation of single assignment data structures and of a software controlled cache in an existing multi-threaded architecture platform, the Efficient Architecture for Running Threads (EARTH). The I-Structure Software-Controlled Cache (ISSC) exploits temporal and spatial locality of EARTH split-phase memory transactions for single-assignment memory references. Our experimental evaluation indicates that the caching mechanism for single-assignment storage makes the EARTH memory system more robust to variations in the latency of memory operations. As a consequence, the system can be ported to a wider range of machine platforms and deliver speedup for both regular and irregular applications.
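As a rough illustration of the single-assignment semantics mentioned in this abstract, the sketch below models an I-structure-like array in Python. Each slot may be written once, and a read that arrives before the write is deferred and completed when the value appears, loosely mirroring a split-phase transaction. The names `IStructure`, `read`, `write`, and `SingleAssignmentError` are hypothetical, caching is not modelled, and this is not the EARTH or ISSC implementation.

```python
class SingleAssignmentError(Exception):
    """Raised when a slot is written more than once (hypothetical name)."""


class IStructure:
    """Write-once array whose early reads are deferred until the write."""

    _EMPTY = object()  # sentinel marking a slot that has not been written

    def __init__(self, size):
        self._slots = [self._EMPTY] * size
        # One list of pending read callbacks per slot.
        self._deferred = [[] for _ in range(size)]

    def read(self, index, on_value):
        """Request a value; on_value(value) runs once the value exists."""
        value = self._slots[index]
        if value is self._EMPTY:
            # Second phase of the read happens later, inside write().
            self._deferred[index].append(on_value)
        else:
            on_value(value)

    def write(self, index, value):
        """Fill a slot exactly once and complete any deferred reads."""
        if self._slots[index] is not self._EMPTY:
            raise SingleAssignmentError(f"slot {index} already written")
        self._slots[index] = value
        waiting, self._deferred[index] = self._deferred[index], []
        for callback in waiting:
            callback(value)


if __name__ == "__main__":
    cells = IStructure(4)
    results = []
    cells.read(2, results.append)   # deferred: slot 2 is still empty
    cells.write(2, "ready")         # completes the deferred read
    print(results)
```

Because a slot never changes after it is written, a copy of it can be kept anywhere without invalidation, which is the property that makes caching single-assignment data attractive.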

Design and implementation of an efficient thread partitioning algorithm

3rd International Symposium on High Performance Computing, ISHPC 2000

Authors: Amaral, José Nelson; Gao, Guang; Kocalar, Erturk Dogan; O'Neill, Patrick; Tang, Xinan. Affiliations: Computer Architecture and Parallel Systems Laboratory, University of Delaware, Newark, DE, United States; Dep. of Comp. Science, Univ. of Alberta, Canada

ISBN: (print) 9783540411284

The development of fine-grain multi-threaded program execution models has created an interesting challenge: how to partition a program into threads that can exploit machine parallelism, achieve latency tolerance, and maintain reasonable locality of reference? A successful algorithm must produce a thread partition that best utilizes multiple execution units on a single processing node and handles long and unpredictable latencies. In this paper, we introduce a new thread partitioning algorithm that can meet the above challenge for a range of machine architecture models. A quantitative affinity heuristic is introduced to guide the placement of operations into threads. This heuristic addresses the trade-off between exploiting parallelism and preserving locality. The algorithm is surprisingly simple due to the use of a time-ordered event list to account for the multiple execution unit activities. We have implemented the proposed algorithm and our experiments, performed on a wide range of examples, have demonstrated its efficiency and effectiveness. © Springer-Verlag Berlin Heidelberg 2000.

Keywords: Economic and social effects

A transparent runtime data distribution engine for OpenMP

Scientific Programming, 2000, vol. 8, no. 3, pp. 143-162

Authors: Nikolopoulos, D.S.; Papatheodorou, T.S.; Polychronopoulos, C.D.; Labarta, J.; Ayguade, E. Affiliations: Computer and Systems Research Laboratory, University of Illinois at Urbana-Champaign, 1308 West Main Street, Urbana, IL 61801, United States; Department of Computer Engineering and Informatics, University of Patras, GR26500 Patras, Greece; Department of Computer Architecture, Technical University of Catalonia, c/Jordi Girona 1-3, 08034 Barcelona, Spain

This paper makes two important contributions. First, the paper investigates the performance implications of data placement in OpenMP programs running on modern NUMA multiprocessors. Data locality and minimization of the rate of remote memory accesses are critical for sustaining high performance on these systems. We show that due to the low remote-to-local memory access latency ratio of contemporary NUMA architectures, reasonably balanced page placement schemes, such as round-robin or random distribution, incur modest performance losses. Second, the paper presents a transparent, user-level page migration engine with an ability to gain back any performance loss that stems from suboptimal placement of pages in iterative OpenMP programs. The main body of the paper describes how our OpenMP runtime environment uses page migration for implementing implicit data distribution and redistribution schemes without programmer intervention. Our experimental results verify the effectiveness of the proposed framework and provide a proof of concept that it is not necessary to introduce data distribution directives in OpenMP and warrant the simplicity or the portability of the programming model.
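To make the round-robin page placement named in this abstract concrete, here is a minimal Python sketch. It assigns pages to NUMA nodes in round-robin order and measures what fraction of one node's accesses would be remote. The function names and the toy access trace are assumptions made for illustration; nothing here reproduces the paper's runtime system, its page migration engine, or its measurements.

```python
def round_robin_placement(num_pages, num_nodes):
    """Return a list mapping each page index to a node, in round-robin order."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return [page % num_nodes for page in range(num_pages)]


def remote_fraction(accessed_pages, placement, local_node):
    """Fraction of accesses that touch a page homed on another node."""
    if not accessed_pages:
        return 0.0
    remote = sum(1 for page in accessed_pages if placement[page] != local_node)
    return remote / len(accessed_pages)


if __name__ == "__main__":
    placement = round_robin_placement(num_pages=8, num_nodes=4)
    # Hypothetical trace: a thread on node 0 touches pages 0, 1, 4 and 5.
    print(placement)
    print(remote_fraction([0, 1, 4, 5], placement, local_node=0))
```

A migration scheme of the kind the abstract describes would, in effect, change entries of `placement` over time so that this fraction falls for the pages each node uses most.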

Coping with very high latencies in petaflop computer systems

2nd International Symposium on High Performance Computing, ISHPC 1999

Authors: Ryan, Sean; Amaral, José N.; Gao, Guang; Ruiz, Zachary; Marquez, Andres; Theobald, Kevin. Affiliation: Computer Architecture and Parallel Systems Laboratory, University of Delaware, Newark, DE, United States

ISBN: (print) 3540659692

The very long and highly variable latencies in the deep memory hierarchy of a petaflop-scale architecture design, such as the Hybrid Technology Multi-Threaded Architecture (HTMT) [13], present a new challenge to its programming and execution model. A solution to coping with such high and variable latencies is to directly and explicitly expose the different memory regions of the machine to the program execution model, allowing better management of communication. In this paper we describe the novel percolation model that lies at the heart of the HTMT program execution model [13]. The Percolation Model combines multithreading with dynamic prefetching of coarse-grain contexts. In the past, prefetching techniques have concentrated on moving blocks of data within the memory hierarchy. Instead of only moving contiguous blocks of data, the thread percolation approach manages contexts that include data, program instructions, and control states. The main contributions of this paper include the specification of the HTMT runtime execution model based on the concept of percolation, and a discussion of the role of the compiler in a machine that exposes the memory hierarchy to the programming model. © 1999, Springer-Verlag. All rights reserved.

Keywords: Solvents

Superconducting processors for HTMT: issues and challenges

Frontiers of Massively Parallel Computation

Authors: K.B. Theobald, G.R. Gao, T.L. Sterling. Affiliations: Computer Architecture and Parallel Systems Laboratory, Department of Electrical and Computer Engineering, University of Delaware, Newark, DE, USA; NASA Jet Propulsion Laboratory / Center for Advanced Computing Research, California Institute of Technology, Pasadena, CA, USA

The Hybrid Technology Multi-Threading project is a long-term study of the feasibility of combining several emerging technologies to reach 1 petaFLOPS within ten years. HTMT will combine high-speed superconductor processors, semiconductor memories with built-in processors, high-speed optical interconnects, and high-density holographic storage. While there are major challenges in all aspects of this project, those in processor architecture are the focus of this paper. Fundamental differences between RSFQ circuits and conventional semiconductor circuits, including a radical jump in clock speed, make today's processor design approaches inappropriate for HTMT. Sequential instruction dispatching, even within the lowest programming unit (a strand), will lead to unacceptably high latencies, hence poor performance. We propose alternative processor designs which use fine-grain synchronizations between individual instructions in order to avoid these bottlenecks.

Keywords: Random access memory; Optical buffering; Holography; Holographic optical components; Delay; Computer architecture; Optical interconnections; Electrical capacitance tomography; Quantum computing; Clocks

VirtualQueue: A technique for packet voice stream reconstruction

IEEE International Conference on Multimedia Computing and Systems (ICMCS)

Authors: N. Figueira, J. Pasquale. Affiliations: Bay Architecture Lab, Nortel Networks Limited, Santa Clara, CA, USA; Computer Systems Laboratory, Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA

Statistical multiplexing in packet-switched networks creates problems for packetized voice streams by introducing variable delays on delivered packets. The resulting jitter needs to be filtered so that received voice packets can be reconstructed as a continuous stream at the receiver. One common approach to reconstruction is to play back the received voice data after a delay offset from the departure time at the source of the packet stream. While the added delay helps filter jitter, one cannot introduce too much delay; otherwise, interactiveness suffers. This paper presents a new technique to find the necessary delay offset (or play-back delay) to recreate the original voice data stream. This technique gives the user control over the fraction of packets that should arrive in time to be played back so that the added play-back delay can be effectively minimized.

Keywords: Delay effects; Delay estimation; Added delay; Jitter; Upper bound; Streaming media; Humans; Degradation; NASA; Laboratories
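The trade-off described in this abstract, between play-back delay and the fraction of packets that arrive in time, can be illustrated with a simple quantile calculation. The Python sketch below picks the smallest observed delay that covers a requested fraction of packets. It is a generic stand-in, not the VirtualQueue technique itself, whose details are not given here; the function name `playback_delay` and the sample delays are made up for the example.

```python
import math


def playback_delay(delays_ms, fraction):
    """Smallest observed delay such that at least `fraction` of the
    packets have a network delay no larger than it."""
    if not delays_ms:
        raise ValueError("need at least one delay sample")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(delays_ms)
    # Number of packets that must arrive in time, rounded up.
    needed = math.ceil(fraction * len(ordered))
    return ordered[needed - 1]


if __name__ == "__main__":
    # Hypothetical one-way delays in milliseconds, with one late outlier.
    samples = [10, 12, 11, 50, 13, 12, 11, 10, 14, 12]
    print(playback_delay(samples, 0.9))   # tolerate losing the outlier
    print(playback_delay(samples, 1.0))   # wait for every packet
```

Accepting a slightly lower in-time fraction lets the receiver ignore rare late packets, which is why the added delay can be much smaller than the worst-case network delay.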

Energy and performance improvements in microprocessor design using a loop cache

IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD)

Authors: N. Bellas, I. Hajj, C. Polychronopoulos, G. Stamoulis. Affiliations: Digital DNA Systems Architecture Laboratories, Motorola Corporation, Schaumburg, IL, USA; Department of Electrical & Computer Engineering and the Coordinated Science Laboratory, University of Illinois, Urbana, IL, USA; Intel Corporation, Santa Clara, CA, USA

Energy dissipated in on-chip caches represents a substantial portion in the energy budget of today's processors. Extrapolating current trends, this portion is likely to increase in the near future, since the devices devoted to the caches occupy an increasingly larger percentage of the total area of the chip. We extend the work proposed by J. Kin et al. (1997), in which an extra, small cache (called filter cache) is inserted between the CPU data path and the L1 cache and serves to filter most of the references initiated from the CPU. In our scheme, the compiler is used to generate code that exploits the new memory hierarchy and reduces the possibility of a miss in the extra cache. Experimental results across a wide range of SPEC95 benchmarks show that this cache, which we call L-Cache, has a small performance overhead with respect to the scheme without any extra caches, and provides substantial energy savings. The L-Cache is placed between the CPU and the I-Cache. The D-Cache subsystem is not modified. Since the L-Cache is much smaller, and thus, has a smaller access time than the I-Cache, this scheme can also be used for performance improvements provided that the hit rate in the L-Cache is very high. In our experimental results, we show that the L-Cache does indeed improve performance in some cases.

Keywords: Microprocessors; Computer architecture; Power engineering and energy; Energy consumption; Portable computers; Circuits; Energy dissipation; Computer aided instruction; Hardware

Mixed abstraction level hardware synthesis from SDL for rapid prototyping

International Workshop on Rapid System Prototyping (RSP)

Authors: O. Bringmann, A. Muth, F. Slomka, W. Rosenstiel, G. Farber, R. Hofmann. Affiliations: Wilhelm-Schickard-Institut für Informatik, Department of Computer Engineering, Universität Tübingen, Germany; Laboratory for Process Control and Real-Time Systems, Technische Universität München, Germany; Department of Computer Architecture and Performance Evaluation, Universität Erlangen-Nürnberg, Germany

SDL is currently gaining interest as a system level specification language for HW/SW codesign. Automated synthesis of SDL in hardware so far had problems with its efficiency. The investigations on the resource usage of SDL-to-VHDL designs presented in this paper identify two key challenges: minimizing the overhead introduced by SDL process infrastructure, and choosing the appropriate synthesis method. This paper presents a framework for SDL hardware synthesis where VHDL code generation, high-level synthesis and RT-level synthesis are combined. A configurable run-time environment implements services like data handling and message passing in efficient, hand-coded library components, which take into account properties of the target architecture. For these components RT-level synthesis was found to be suitable. The behavior of each SDL process on the other hand is freely specified by the system designer. Depending on the type of application, i.e. complex data-oriented or control-oriented either high-level synthesis, RT-level synthesis, or a combination of both can prove to be optimal.

Keywords: Hardware; Prototypes; Control system synthesis; Timing; High level synthesis; Computer architecture; Resource management; Arithmetic; High performance computing; Design engineering

Preface

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 1999, vol. 1697, p. V

Authors: Dongarra, Jack; Luque, Emilio; Margalef, Tomàs. Affiliations: University of Tennessee, Oak Ridge National Laboratory, 107 Ayres Hall, Knoxville, TN, United States; Universitat Autònoma of Barcelona, Computer Science Department, Computer Architecture and Operating Systems Group, Bellaterra, Barcelona, Spain
