Search Results - Inner Mongolia University Library

Euromicro Symposium on Digital System Design

Authors: Sonia Lopez, Oscar Garnica, David H. Albonesi, Steven Dropsho, Juan Lanchares, Jose I. Hidalgo. Affiliations: Department of Computer Engineering, Rochester Institute of Technology, Rochester, NY, USA; Department of Computer Architecture, Universidad Complutense de Madrid, Madrid, Spain; Computer Systems Laboratory, Cornell University, Ithaca, NY, USA; Google Inc., Zurich, Switzerland

Resizable caches can trade off capacity for access speed to dynamically match the needs of the workload. In Simultaneous Multi-Threaded (SMT) cores, the caching needs can vary greatly across the number of threads and their characteristics, offering opportunities to dynamically adjust cache resources to the workload. In this paper we propose the use of resizable caches in order to improve the performance of SMT cores, and introduce a new control algorithm that provides good results independent of the number of running threads. In workloads with a single thread, the resizable cache control algorithm should optimize for cache miss behavior because misses typically form the critical path. In contrast, with several independent threads running, we show that optimizing for cache hit behavior has more impact, since large SMT workloads have other threads to run during a cache miss. Moreover, we demonstrate that these seemingly diametrically opposed policies can be simultaneously satisfied by using the harmonic mean of the per-thread speedups as the metric to evaluate the system performance, and to smoothly and naturally adjust to the degree of multithreading.
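The metric named in the abstract, the harmonic mean of per-thread speedups, can be illustrated with a minimal sketch. This is not the authors' control algorithm: the function names, the use of IPC as the per-thread performance measure, and the configuration labels are all assumptions made for illustration.

```python
def harmonic_mean_speedup(baseline_ipc, candidate_ipc):
    """Harmonic mean of per-thread speedups (candidate IPC / baseline IPC).

    Both arguments hold one value per running thread. Using IPC as the
    performance measure is an assumption of this sketch.
    """
    if not baseline_ipc or len(baseline_ipc) != len(candidate_ipc):
        raise ValueError("need one baseline and one candidate IPC per thread")
    speedups = [c / b for b, c in zip(baseline_ipc, candidate_ipc)]
    return len(speedups) / sum(1.0 / s for s in speedups)


def pick_cache_config(baseline_ipc, ipc_by_config):
    """Return the (hypothetical) configuration name with the best metric."""
    return max(
        ipc_by_config,
        key=lambda name: harmonic_mean_speedup(baseline_ipc, ipc_by_config[name]),
    )
```

With a single thread the metric reduces to that thread's speedup. With several threads, the harmonic mean is pulled down by a thread that is slowed, so a configuration that helps one thread at the expense of another scores lower than one that helps all threads moderately.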

Keywords: Instruction sets; Frequency domain analysis; Delay; Synchronization; Algorithm design and analysis; Adaptation model

A parallel architecture for Ray-Tracing

IEEE Latin American Symposium on Circuits and Systems (LASCAS)

Authors: Alexandre S. Nery, Nadia Nedjah, Felipe M.G. França. Affiliations: LAM - Computer Architecture and Microelectronics Laboratory, Systems Engineering and Computer Science Program, COPPE, Universidade Federal do Rio de Janeiro; Department of Electronics Engineering and Telecommunications, Universidade do Estado do Rio de Janeiro

Real-time rendering of three-dimensional scenes in high photorealistic detail, as with the Ray Tracing rendering algorithm, is a hard task. However, because the algorithm is embarrassingly parallel, parallel implementations of Ray Tracing have been enabling real-time performance. Thus, a custom parallel design in hardware is likely to achieve acceptable performance. In this paper, we propose a parallel hardware architecture capable of handling the main desirable features of Ray Tracing, such as shadows and reflection effects, at low area cost and with acceptable rendering performance.

Keywords: Computer architecture; Interrupters; Computers; Light sources; Process control; Real-time systems; Telecommunications

Godson-T: An Efficient Many-Core Architecture for Parallel Program Executions

Journal of Computer Science & Technology, 2009, Vol. 24, No. 6, pp. 1061-1073

Authors: 范东睿, 袁楠, 张军超, 周永彬, 林伟, 宋风龙, 叶笑春, 黄河, 余磊, 龙国平, 张浩, 刘磊. Affiliation: Key Laboratory of Computer Systems and Architecture, Institute of Computing Technology, Chinese Academy of Sciences

Moore's law will grant computer architects ever more transistors for the foreseeable future, and the challenge is how to use them to deliver efficient performance and flexible programmability. We propose a many-core architecture, Godson-T, to attack this challenge. On the one hand, Godson-T features a region-based cache coherence protocol, asynchronous data transfer agents and hardware-supported synchronization mechanisms, to provide full potential for the high efficiency of the on-chip resource utilization. On the other hand, Godson-T features a highly efficient runtime system, a Pthreads-like programming model, and versatile parallel libraries, which make this many-core design flexibly programmable. This hardware/software cooperating design methodology bridges the high-end computing with mass programmers. Experimental evaluations are conducted on a cycle-accurate simulator of Godson-T. The results show that the proposed architecture has good scalability, fast synchronization, high computational efficiency, and flexible programmability.

Keywords: many-core; parallel computing; multithread; data communication; thread synchronization; runtime system

Simulation of High-Performance Memory Allocators

Euromicro Symposium on Digital System Design

Authors: Jose L. Risco-Martín, J. Manuel Colmenar, David Atienza, J. Ignacio Hidalgo. Affiliations: Dept. of Computer Architecture and Automation, Complutense University of Madrid, Madrid, Spain; C.E.S. Felipe II, Complutense University of Madrid, Aranjuez, Spain; Embedded Systems Laboratory (ESL), EPFL, Lausanne, Switzerland

Current general-purpose memory allocators do not provide sufficient speed or flexibility for modern high-performance applications. To optimize metrics like performance, memory usage and energy consumption, software engineers often write custom allocators from scratch, which is a difficult and error-prone process. In this paper, we present a flexible and efficient simulator to study Dynamic Memory Managers (DMMs), a composition of one or more memory allocators. This novel approach allows programmers to simulate custom and general DMMs, which can be composed without incurring any additional runtime overhead or additional programming cost. We show that this infrastructure simplifies DMM construction, mainly because the target application does not need to be compiled every time a new DMM must be evaluated. Within a search procedure, the system designer can choose the "best" allocator by simulation for a particular target application. In our evaluation, we show that our scheme will deliver better performance, less memory usage and less energy consumption than single memory allocators.
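The idea of evaluating candidate allocators by simulation, without recompiling the target application, can be sketched as a trace replay. The following is a toy model and not the simulator described in the abstract: the trace format, the size-class allocator model, the peak-reserved-bytes metric, and all names are assumptions made for illustration.

```python
def replay_peak_reserved(trace, size_classes):
    """Replay an allocation trace against a size-class allocator model.

    trace: sequence of ("malloc", obj_id, size) or ("free", obj_id) events.
    size_classes: block sizes the modelled allocator hands out; a request
    larger than every class gets a block of exactly the requested size.
    Returns the peak number of bytes reserved, rounding included.
    """
    classes = sorted(size_classes)
    live = {}        # obj_id -> bytes reserved for that object
    reserved = 0
    peak = 0
    for event in trace:
        if event[0] == "malloc":
            _, obj_id, size = event
            block = next((c for c in classes if c >= size), size)
            live[obj_id] = block
            reserved += block
            peak = max(peak, reserved)
        elif event[0] == "free":
            reserved -= live.pop(event[1])
        else:
            raise ValueError(f"unknown trace event: {event[0]!r}")
    return peak


def choose_best(trace, candidates):
    """Return the candidate name with the lowest simulated peak footprint."""
    return min(candidates, key=lambda name: replay_peak_reserved(trace, candidates[name]))
```

Because the application's behaviour is captured once in the trace, each new candidate only costs one replay, which is what makes a search over many allocator designs practical.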

Keywords: Memory management; Resource management; Energy consumption; Dynamic scheduling; Software; Computational modeling; Algorithm design and analysis

Message from the organizers

ACM International Conference Proceeding Series, 2010, pp. vii-ix

Authors: Bode, Arndt; Bouissou, Marc; Distefano, Salvatore; Puliafito, Antonio; Trivedi, Kishor; Walter, Max. Affiliations: Technische Universität München, Department of Computer Architecture, Germany; Electricité de France, R and D/MRI Department, France; Università di Messina, Multimedia and Distributed Systems Laboratory, Italy; Duke University, Durham, Dept. of Electrical and Computer Engineering, United States

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2010, Vol. 6395 LNCS, Part 2

Authors: Petriu, Dorina C.; Rouquette, Nicolas; Haugen, Øystein. Affiliations: Carleton University, Department of Systems and Computer Engineering, 1125 Colonel By Drive, Ottawa, ON K1S 5BG, Canada; Jet Propulsion Laboratory, Flight Software Systems Engineering and Architecture Group, 4800 Oak Grove Drive, Pasadena, CA 91109, United States; SINTEF IKT, Forskningsveien 1, 0373 Oslo, Norway

System-level runtime mapping exploration of reconfigurable architectures

International Symposium on Parallel and Distributed Processing (IPDPS)

Authors: Kamana Sigdel, Mark Thompson, Andy D. Pimentel, Carlo Galuzzi, Koen Bertels. Affiliations: Computer Engineering Laboratory, Delft University of Technology, The Netherlands; Computer Systems Architecture Group, University of Amsterdam, The Netherlands

Dynamic reconfigurable systems can evolve under various conditions due to changes imposed either by the architecture, or by the applications, or by the environment. In such systems, the design process becomes more sophisticated as all the design decisions have to be optimized in terms of runtime behaviors and values. Runtime mapping exploration makes it possible to explore reconfigurable systems at runtime and optimize task mappings in order to adapt to the changing behavior of the application(s), the architecture, or the environment. Performing such explorations at runtime enables a system to be more efficient in terms of various design constraints such as performance, chip area, and power consumption. Towards this goal, in this paper, we present a model that facilitates runtime mapping exploration of reconfigurable architectures. A case study of an MJPEG application shows that the presented model can be used to perform runtime exploration of various functional and non-functional design parameters.

Keywords: Reconfigurable architectures; Runtime environment; Computer architecture; Process design; Design optimization; Energy consumption; Power system modeling; Embedded system; Hardware; Embedded computing

A Synchronization-Based Alternative to Directory Protocol

International Symposium on Parallel and Distributed Processing with Applications (ISPA)

Authors: He Huang, Lei Liu, Nan Yuan, Wei Lin, Fenglong Song, Junchao Zhang, Dongrui Fan. Affiliation: Institute of Computing Technology, Key Laboratory of Computer Systems and Architecture, Chinese Academy of Sciences, Beijing, China

Efficient support for cache coherence is extremely important in the design and implementation of many-core processors. In this paper, we propose a synchronization-based coherence (SBC) protocol to efficiently support cache coherence for shared-memory many-core architectures. The unique feature of our scheme is that it doesn't use a directory at all. Inspired by the scope consistency memory model, our protocol maintains coherence at synchronization points. Within a critical section, processor cores record write-sets (which lines have been written in the critical section) with a Bloom filter function. When the core releases the lock, the write-set is transferred to a synchronization manager. When another core acquires the same lock, it gets the write-set from the synchronization manager and invalidates stale data in its local cache. Experimental results show that the SBC protocol improves execution time by an average of 5% across a suite of scientific applications. At the same time, the SBC protocol is more cost-effective than a directory-based protocol, which requires a large amount of hardware resources and a huge design verification effort.
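The acquire/release flow described in the abstract can be sketched as a small software model. This is an illustrative toy under stated assumptions, not the authors' hardware design: the class names, the filter size and hash choice, the write-through memory, and the decision to merge write-sets per lock in the manager are all hypothetical.

```python
import hashlib


class BloomWriteSet:
    """Bloom filter over cache-line addresses (sizes are assumptions)."""

    def __init__(self, bits=256, hashes=3):
        self.bits = bits
        self.hashes = hashes
        self.mask = 0

    def _positions(self, line_addr):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{line_addr}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, line_addr):
        for pos in self._positions(line_addr):
            self.mask |= 1 << pos

    def may_contain(self, line_addr):
        # False positives are possible, false negatives are not.
        return all((self.mask >> pos) & 1 for pos in self._positions(line_addr))


class SyncManager:
    """Holds one merged write-set per lock (merging is an assumption)."""

    def __init__(self):
        self.write_sets = {}

    def publish(self, lock_id, write_set):
        merged = self.write_sets.setdefault(lock_id, BloomWriteSet())
        merged.mask |= write_set.mask


class Core:
    def __init__(self, manager, memory):
        self.manager = manager
        self.memory = memory      # shared dict: line address -> value
        self.cache = {}           # private cache: line address -> value
        self.write_set = None     # only set inside a critical section

    def acquire(self, lock_id):
        published = self.manager.write_sets.get(lock_id)
        if published is not None:
            # Invalidate every cached line the filter may cover.
            for addr in [a for a in self.cache if published.may_contain(a)]:
                del self.cache[addr]
        self.write_set = BloomWriteSet()

    def write(self, addr, value):
        self.cache[addr] = value
        self.memory[addr] = value  # write-through, assumed for simplicity
        if self.write_set is not None:
            self.write_set.add(addr)

    def read(self, addr):
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def release(self, lock_id):
        self.manager.publish(lock_id, self.write_set)
        self.write_set = None
```

A Bloom filter suits this role because its errors are one-sided: a false positive only causes an unnecessary invalidation, while a written line is never missed, so the model stays conservative.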

Keywords: Protocols; Coherence; Distributed processing; Application software; Helium; Laboratories; Concurrent computing; Distributed computing; Computer architecture; Memory architecture

Early Experiences with Write-Write Design of NFS over RDMA

International Conference on Networking, Architecture, and Storage (NAS)

Authors: Bo Li, Panyong Zhang, Zhigang Huo, Dan Meng. Affiliation: National Research Center for Intelligent Computing Systems, Institute of Computing Technology, Key Laboratory of Computer System and Architecture, Chinese Academy of Sciences, China

The network file system (NFS) protocol, as the de facto standard for sharing files in a distributed environment, has deployed InfiniBand as the underlying transport of sunRPC, namely NFS over RDMA. In the current Read-Write design of NFS over RDMA, NFS write performance is limited because it does not fully utilize the features of InfiniBand. In this paper, we take on the challenge of enhancing the write performance of NFS. We propose and evaluate a new design of sunRPC over RDMA, namely the Write-Write design. To guarantee the security of our design, we propose an HCA-based memory protection extension of InfiniBand. Evaluations show that our Write-Write design increases the kernel-to-kernel RPC bandwidth by 15% to 27%. In a real disk test, our Write-Write design gains 15% to 22% in multi-client benchmarks compared with the Read-Write design.

Keywords: Security; Protection; Bandwidth; Scalability; Computer architecture; Computer networks; Distributed computing; File systems; Benchmark testing; Protocols

Flexible hardware acceleration for instruction-grain lifeguards

Authors: Chen, Shimin; Kozuch, Michael; Gibbons, Phillip B.; Ryan, Michael; Strigkos, Theodoros; Mowry, Todd C.; Ruwase, Olatunji; Vlachos, Evangelos; Falsafi, Babak; Ramachandran, Vijaya. Affiliations: Intel Research Pittsburgh, 4720 Forbes Ave., Pittsburgh, PA 15213, United States; Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, United States; Parallel Systems Architecture Laboratory, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Department of Computer Science, University of Texas at Austin, Austin, TX, United States

Instruction-grain lifeguards monitor executing programs at the granularity of individual instructions to quickly detect bugs and security attacks, but their fine-grain nature incurs high monitoring overheads. This article identifies three common sources of these overheads and proposes three techniques that together constitute a general-purpose hardware acceleration framework for lifeguards. © 2009 IEEE.

Keywords: Data mining
