Search results - Inner Mongolia University Library

Design, Automation & Test in Europe Conference & Exhibition

Authors: Arun Subramaniyan, Semeen Rehman, Muhammad Shafique, Akash Kumar, Jörg Henkel. Affiliations: University of Michigan-Ann Arbor, USA; Chair for Processor Design, TU Dresden, Germany; Institute of Computer Engineering, Vienna University of Technology (TU Wien), Austria; Chair for Embedded Systems, Karlsruhe Institute of Technology, Germany

ISBN: (print) 9781509058266

Mainstream multi-core processors employ large multilevel on-chip caches making them highly susceptible to soft errors. We demonstrate that designing a reliable cache hierarchy requires understanding the vulnerability interdependencies across different cache levels. This involves vulnerability analyses depending upon the parameters of different cache levels (partition size, line size, etc.) and the corresponding cache access patterns for different applications. This paper presents a novel soft error-aware cache architectural space exploration methodology and vulnerability analysis of multi-level caches considering their vulnerability interdependencies. Our technique significantly reduces exploration time while providing reliability-efficient cache configurations. We also show applicability/benefits for ECC-protected caches under multi-bit fault scenarios.

Keywords: considering vulnerability providing


Enhancing nanosatellite dependability through autonomous chip-level debug capabilities


29th International Conference on Architecture of Computing Systems, ARCS 2016

Authors: Fuchs, Christian M.; Dafinger, Nikolaus; Langer, Martin; Trinitis, Carsten. Affiliations: Chair of Space Systems Engineering, Faculty of Aerospace Engineering, Computer Engineering Laboratory, Delft University of Technology, Delft, Netherlands; Institute for Astronautics, Chair for Computer Architecture and Organization, Technical University Munich, Garching, Germany


eQASM: An executable quantum instruction set architecture


arXiv, 2018

Authors: Fu, X.; Riesebos, L.; Rol, M.A.; van Straten, J.; van Someren, J.; Khammassi, N.; Ashraf, I.; Vermeulen, R.F.L.; Newsum, V.; Loh, K.K.L.; de Sterke, J.C.; Vlothuizen, W.J.; Schouten, R.N.; Almudever, C.G.; DiCarlo, L.; Bertels, K. Affiliations: QuTech, Delft University of Technology, P.O. Box 5046, 2600 GA Delft, Netherlands; Quantum Computer Architecture Lab, Delft University of Technology, Mekelweg 4, 2628 CD Delft, Netherlands; Kavli Institute of Nanoscience, Delft University of Technology, P.O. Box 5046, 2600 GA Delft, Netherlands; Computer Engineering Lab, Delft University of Technology, Mekelweg 4, 2628 CD Delft, Netherlands; P.O. Box 155, 2600 AD Delft, Netherlands; Topic Embedded Systems B.V., P.O. Box 440, 5680 AK Best, Netherlands

A widely used quantum programming paradigm comprises both data flow and control flow. Existing quantum hardware cannot support control flow well, significantly limiting the range of quantum software executable on the hardware. By analyzing the constraints in the control microarchitecture, we found that existing quantum assembly languages are either too high-level or too restricted to support comprehensive flow control on the hardware. Also, as observed with the quantum microinstruction set QuMIS [1], the quantum instruction set architecture (QISA) design may suffer from limited scalability and flexibility because of microarchitectural constraints. It is an open challenge to design a scalable and flexible QISA which provides a comprehensive abstraction of the quantum hardware. In this paper, we propose an executable QISA, called eQASM, that can be translated from quantum assembly language (QASM), supports comprehensive quantum program flow control, and is executed on a quantum control microarchitecture. With efficient timing specification, single-operation-multiple-qubit execution, and a very-long-instruction-word architecture, eQASM presents better scalability than QuMIS. The definition of eQASM focuses on the assembly level to be expressive. Quantum operations are configured at compile time instead of being defined at QISA design time. We instantiate eQASM into a 32-bit instruction set targeting a seven-qubit superconducting quantum processor. We validate our design by performing several experiments on a two-qubit quantum processor. Copyright © 2018, The Authors. All rights reserved.

Keywords: computer architecture


Stress-aware routing to mitigate aging effects in SRAM-based FPGAs


International Conference on Field Programmable Logic and Applications

Authors: Behnam Khaleghi, Behzad Omidi, Hussam Amrouch, Jörg Henkel, Hossein Asadi. Affiliations: Department of Computer Engineering, Sharif University of Technology, Tehran; Chair for Embedded Systems, Karlsruhe Institute of Technology, Germany

Continuous shrinking of transistor size to provide high computation capability along with low power consumption has been accompanied by reliability degradation due to, e.g., aging phenomena. In this regard, with their huge number of configuration bits, Field-Programmable Gate Arrays (FPGAs) are more susceptible to aging, since aging not only degrades performance but may also corrupt the configuration cells and thus cause permanent circuit malfunction. While several works have investigated aging effects in Look-Up Tables (LUTs), the routing fabric of these devices is seldom studied, even though it accounts for the majority of an FPGA's resources and configuration bits. Furthermore, there is a high likelihood that errors in its state propagate to the device outputs. In this paper, we first investigate aging effects in the routing fabric of FPGAs with respect to performance and reliability degradation. Based on this investigation, we enhance the conventional routing algorithm to mitigate the impact of aging by increasing the recovery time (i.e., the mechanism used to heal aging-induced defects) of transistors used in the routing resources. We evaluate our proposed method in terms of the reduction in stress time and in the guardband required to protect against aging in the routing fabric, as well as the improvement in the FPGA's lifetime. Our experiments show that the proposed method reduces the average stress time and aging-induced delay of routing resources by 41% and 18.3%, respectively. This, in turn, improves the device lifetime by 130% compared to baseline routing. The proposed method can be applied by simply amending conventional routing algorithms. Thus, it incurs negligible delay overhead.

Keywords: Aging; Routing; Field programmable gate arrays; Table lookup; Stress; Delays; Transistors


Enhancing Nanosatellite Dependability Through Autonomous Chip-Level Debug Capabilities


ARCS 2016: 29th International Conference on Architecture of Computing Systems

Authors: Christian M. Fuchs, Nikolaus Dafinger, Martin Langer, Carsten Trinitis. Affiliations: Chair of Space Systems Engineering, Faculty of Aerospace Engineering, Computer Engineering Laboratory, Delft University of Technology; Institute for Astronautics, Technical University Munich; Chair for Computer Architecture and Organization, Technical University Munich

Modern embedded technology enables a high level of compute performance at the cost of little energy. Hence, miniaturized satellite development has begun to rely upon conventional application processor architectures an...

Keywords: satellite development; nano satellites; chip level; Storage capacity; debugger; Job Performance; Field programmable gate arrays; embedded technology; Processor architectures


STRAP: Stress-aware placement for aging mitigation in runtime reconfigurable architectures


IEEE International Conference on Computer-Aided Design

Authors: Hongyan Zhang, Michael A. Kochte, Eric Schneider, Lars Bauer, Hans-Joachim Wunderlich, Jörg Henkel. Affiliations: Chair for Embedded Systems, Karlsruhe Institute of Technology, Karlsruhe, Germany; Institute of Computer Architecture and Computer Engineering, University of Stuttgart, Germany

ISBN: (print) 9781467383899

Aging effects in nano-scale CMOS circuits impair the reliability and Mean Time to Failure (MTTF) of embedded systems. Especially for FPGAs that are manufactured in the latest technology node, aging is a major concern. We introduce the first cross-layer aging-aware placement method for accelerators in FPGA-based runtime reconfigurable architectures. It optimizes stress distribution by accelerator placement at runtime, i.e., into which reconfigurable region an accelerator shall be reconfigured. Additionally, it optimizes logic placement at synthesis time to diversify the resource usage of individual accelerators, i.e., which CLBs of a reconfigurable region shall be used by an accelerator. Both layers together balance the intra- and inter-region stress induced by the application workload at negligible performance cost. Experimental results show a significant reduction of maximum stress of up to 64% and 35%, which leads to up to 177% and 14% MTTF improvement relative to state-of-the-art methods w.r.t. HCI and BTI aging, respectively.

Keywords: Stress; Runtime; Aging; Transistors; Table lookup; Fabrics; Field programmable gate arrays


Variability-aware dark silicon management in on-chip many-core systems


Design, Automation and Test in Europe Conference and Exhibition

Authors: Muhammad Shafique, Dennis Gnad, Siddharth Garg, Jörg Henkel. Affiliations: Chair for Embedded Systems, Karlsruhe Institute of Technology, Germany; Electrical and Computer Engineering, New York University, NY, USA

ISBN: (print) 9783981537048

Dark Silicon refers to the constraint that only a fraction of on-chip resources (cores) can be simultaneously powered on (running at full performance) in order to stay within the allowable power budget and safe temperature limits, while others remain 'dark'. In this paper, we demonstrate how these 'dark cores' can be leveraged to improve the temperature profile at run-time, thus providing opportunities to power on more cores at the nominal voltage than the number allowed when strictly obeying the conventional Thermal Design Power (TDP) constraint. We propose a computationally efficient dark silicon management technique that determines the best set of cores to keep dark and the mapping of threads to cores at run-time, while also accounting for the impact of process variations. We have developed a lightweight temperature prediction mechanism that determines the impact of different candidate solutions on the chip thermal profile. Experimental evaluation of the proposed techniques on a simulated 8×8 many-core processor, and across a range of chips to account for process variations, shows that the total instruction throughput is increased by 1.8× on average while keeping the temperature within the safe limits, when compared with state-of-the-art approaches.

Keywords: Silicon; Instruction sets; Temperature; Temperature dependence; Power demand; Thermal management; Heating


Design and synthesis of reconfigurable control-flow structures for CGRA


International Conference on Reconfigurable Computing and FPGAs (ReConFig)

Authors: Zoltan Endre Rakossy, Axel Acosta-Aponte, Tobias G. Noll, Gerd Ascheid, Rainer Leupers, Anupam Chattopadhyay. Affiliations: Institute for Communication Technologies and Embedded Systems (ICE), RWTH University Aachen, Germany; Chair of Electrical Engineering and Computer Systems (EECS), RWTH University Aachen, Germany; School of Computer Engineering, Nanyang Technological University, Singapore

Coarse-Grained Reconfigurable Architectures (CGRAs) promise both low power and high performance coupled with flexibility; however, automatic mapping of applications to such platforms remains a great research challenge. Efficient manual mapping of the data-centric kernels of applications yields great results, but these kernels internally contain control-flow-specific tasks, which introduce mapping irregularities and execution inefficiencies on CGRAs. In this paper, we explore the analysis, design and synthesis of reconfigurable structures for efficient application-specific control-flow processing, aiming to develop a methodology to design reconfigurable control-flow acceleration modules. Such modules can be coupled with generic CGRAs, off-loading execution of irregular and ill-suited sequential control-flow subroutines and enabling the CGRA to exploit a clean, regular, data-flow-centric mapping. Considering different architectural paradigms, we design and compare a functional array-based design, a VLIW-style design and an automatically generated design based on graph-theoretic concepts against the ASIC implementation of the control-flow operations for several kernels of the linear algebra domain. Such reconfigurable control-flow-specific accelerators are a first step towards automating CGRA-based accelerator design and application mapping from high-level descriptions.

Keywords: Process control; Hardware; VLIW; Registers; Arrays; Kernel; Couplings


ADAPT: An adaptive manycore methodology for software pipelined applications


Asia and South Pacific Design Automation Conference

Authors: Xi Zhang, Haris Javaid, Muhammad Shafique, Jude Angelo Ambrose, Jörg Henkel, Sri Parameswaran. Affiliations: School of Computer Science and Engineering, University of New South Wales, Sydney, Australia; Google Inc.; Chair for Embedded Systems, Karlsruhe Institute of Technology, Karlsruhe, Germany

ISBN: (print) 9781479977932

Future on-chip manycore systems are expected to have hundreds of cores, and to be used for a number of applications to amortize their fabrication costs. In this paper, we examine how software pipelines, which are useful for streaming/multimedia applications, can be efficiently executed on a manycore system with shared memory. The goal is to balance the stages of the pipeline under workload and resource variations. This paper presents ADAPT, a method to quickly detect bottleneck stages and add cores (workers) to those bottleneck stages at run-time. Further, if there are no idle workers, then a shuffling of workers across stages is performed to improve/maintain throughput. ADAPT is implemented in a 48-core system which is built using a commercial core and tool suite. For a variety of applications, ADAPT takes less than 2 μs for one run-time adaptation, and achieves up to 2.1× the throughput of a state-of-the-art method (which is modified and implemented in the same system for a fair comparison). These results illustrate the applicability of ADAPT for fine-grained run-time management of manycore systems to achieve high throughput for software pipelines.
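The run-time adaptation described in this abstract (find the bottleneck stage, give it an idle worker, otherwise shuffle a worker from another stage) can be illustrated with a minimal Python sketch. This is not the ADAPT implementation: the `Stage` class, the `work_per_item` cost model, the assumption that a stage's service time scales as work divided by worker count, and the donor-selection rule are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    work_per_item: float  # hypothetical cost of one item on a single worker
    workers: int

    def service_time(self) -> float:
        # Assumed model: work divides evenly across the stage's workers.
        return self.work_per_item / self.workers


def bottleneck(stages):
    """Return the stage with the largest service time."""
    return max(stages, key=lambda s: s.service_time())


def adapt_step(stages, idle_workers):
    """One adaptation step; returns the remaining number of idle workers."""
    slow = bottleneck(stages)
    if idle_workers > 0:
        # An idle worker exists: add it to the bottleneck stage.
        slow.workers += 1
        return idle_workers - 1
    # No idle workers: consider moving one worker from another stage.
    donors = [s for s in stages if s is not slow and s.workers > 1]
    if not donors:
        return idle_workers
    # Pick the donor that stays fastest after giving up one worker.
    donor = min(donors, key=lambda s: s.work_per_item / (s.workers - 1))
    before = slow.service_time()
    after = max(slow.work_per_item / (slow.workers + 1),
                donor.work_per_item / (donor.workers - 1))
    # Shuffle only if both affected stages end up faster than the
    # current bottleneck.
    if after < before:
        donor.workers -= 1
        slow.workers += 1
    return idle_workers


stages = [Stage("read", 2.0, 1), Stage("filter", 8.0, 1), Stage("write", 2.0, 2)]
idle = adapt_step(stages, 1)     # the idle worker goes to "filter"
idle = adapt_step(stages, idle)  # no idle workers left: "write" donates one
```

In this toy run the `filter` stage first receives the idle worker and then a second worker shuffled from `write`. A real system would measure stage times at run-time rather than derive them from a fixed cost model.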

Keywords: Pipelines; Delays; Software; Throughput; Pipeline processing; Steady-state; Dynamic scheduling


E-pipeline: Elastic hardware/software pipelines on a many-core fabric


Design, Automation and Test in Europe Conference and Exhibition

Authors: Xi Zhang, Haris Javaid, Muhammad Shafique, Jorgen Peddersen, Jörg Henkel, Sri Parameswaran. Affiliations: School of Computer Science and Engineering, University of New South Wales, Sydney, Australia; Google Inc.; Chair for Embedded Systems, Karlsruhe Institute of Technology, Karlsruhe, Germany

ISBN: (print) 9783981537048

On-chip many-core systems are expected to be in common use in the future. A set of homogeneous processors in a many-core system can be used to implement multiple pipelines which execute simultaneously. Pipelines of processors use varying numbers of cores when their workloads vary at run time. In this paper, we show how such a system executing multiple pipelines with varying workloads can be implemented. We further show how the system can switch cores within a pipeline (intra-elasticity) and between pipelines (inter-elasticity). The method is named E-pipeline, and is implemented and evaluated in a commercial tool suite. Compared to reference design methods with clock gating, E-pipeline achieves the same power savings, maintains the throughput to meet throughput constraints and reduces core usage by an average of 37.7%. The adaptation overhead for switching cores is approximately 2 μs.

Keywords: Pipelines; Throughput; Software; Cloning; Benchmark testing; Clocks; Hardware

