A multi-threaded microprocessor with a customisable instruction set, CUStomisable Threaded ARchitecture (CUSTARD), is proposed. CUSTARD features include design space exploration and a compiler for automatic selection of custom instructions. Custom instructions, optimised for a specific application, accelerate frequently performed computations by implementing them as dedicated hardware. Field programmable gate array implementations of CUSTARD are evaluated using media and cryptography benchmarks and compared with the commercial MicroBlaze processor. An area overhead as low as 28% for four interleaved threads and a speedup of up to 355% over a processor without custom instructions are demonstrated.
Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries may photocopy beyond the limits of US copyright law, for private use of patrons, those articles in this volume that carry a code at the bottom of the first page, provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center. The papers in this book comprise the proceedings of the meeting mentioned on the cover and title page. They reflect the authors' opinions and, in the interests of timely dissemination, are published as presented and without change. Their inclusion in this publication does not necessarily constitute endorsement by the editors or the Institute of Electrical and Electronics Engineers, Inc.
ISBN: 9781728116013 (print)
In the past decade, tensor computation has been widely used in different areas. Various software toolboxes have been released to assist tensor computation. However, there is still no hardware architecture to accelerate tensor computation. This paper presents an efficient application-specific instruction set processor (ASIP) for tensor computation. Different tensor computations are fully optimized in terms of resource usage and performance. We implement the ASIP on an FPGA platform. We test our design by implementing the CANDECOMP/PARAFAC (CP) decomposition. Our design achieves low resource usage and runs at 141 MHz.
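For readers unfamiliar with the CP decomposition named in this abstract, the following is a minimal sketch of the model it fits: a third-order tensor is approximated as a sum of rank-one terms built from the columns of three factor matrices. It is not the paper's ASIP implementation; the function name `cp_reconstruct` and the nested-list layout are assumptions made only for illustration.

```python
def cp_reconstruct(A, B, C):
    """Rebuild a third-order tensor from CP factor matrices.

    A, B and C are nested lists of shape (I x R), (J x R) and (K x R),
    where R is the assumed CP rank. The result X satisfies
    X[i][j][k] = sum over r of A[i][r] * B[j][r] * C[k][r].
    """
    rank = len(A[0])
    return [
        [
            [
                sum(A[i][r] * B[j][r] * C[k][r] for r in range(rank))
                for k in range(len(C))
            ]
            for j in range(len(B))
        ]
        for i in range(len(A))
    ]


if __name__ == "__main__":
    # Hypothetical rank-2 factors for a 2 x 2 x 1 tensor.
    A = [[1, 0], [0, 1]]
    B = [[1, 2], [3, 4]]
    C = [[1, 1]]
    print(cp_reconstruct(A, B, C))
```

A CP solver searches for factor matrices whose reconstruction is close to a given tensor; this sketch shows only the reconstruction step, which is the multiply-accumulate pattern a tensor accelerator would have to execute repeatedly.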
ISBN: 9798350349641; 9798350349634 (print)
Currently available very-large-scale integrations (VLSIs) are vulnerable to radiation, as measured in terms of soft errors and total ionizing dose. Therefore, following a repairable VLSI concept, we have been developing a radiation-hardened optically reconfigurable gate array VLSI that supports the use of a partially damaged VLSI. Earlier development efforts fabricated a radiation-hardened optically reconfigurable gate array VLSI tolerant to a total ionizing dose of 1 Grad using the repairable VLSI concept. However, since stabilized power supply units are also vulnerable to radiation, a radiation-hardened optically reconfigurable gate array VLSI with no stabilization function must be used with a battery in intense radiation environments such as the Fukushima Daiichi Nuclear Power Plant. This paper presents the operating voltage range of a radiation-hardened optically reconfigurable gate array VLSI. The findings confirm that direct battery drive is possible for a radiation-hardened optically reconfigurable gate array VLSI.
ISBN: 9780769547688 (print)
In this paper, we present an ultra-low-power design for a class of massively parallel architectures called tightly-coupled processor arrays. The key idea is to exploit the benefits of decentralized resource management, as inherent to invasive computing, for power saving. We propose concepts and study different architecture trade-offs for hierarchical power management by temporarily shutting down regions of processors through power gating. Moreover, a) overall system chip energy consumption, b) hardware cost, and c) timing overheads are compared for different sizes of power domains. Experimental results show that up to 70% of system energy consumption may be saved for selected characteristic algorithms and different resource utilizations.
ISBN: 9781665427012 (print)
The combination of Winograd's algorithm and systolic array architecture has demonstrated the capability of improving DSP efficiency in accelerating convolutional neural networks (CNNs) on FPGA platforms. However, handling arbitrary convolution kernel sizes in FPGA-based Winograd processing elements and supporting efficient data access remain underexplored. In this work, we are the first to propose an optimized Winograd processing element (WinoPE), which naturally supports multiple convolution kernel sizes with the same amount of computing resources and maintains high runtime DSP efficiency. Using the proposed WinoPE, we construct a highly efficient systolic array accelerator, termed WinoCNN. We also propose a dedicated memory subsystem to optimize data access. Based on the accelerator architecture, we build accurate resource and performance models to explore optimal accelerator configurations under different resource constraints. We implement the proposed accelerator on multiple FPGAs, where it outperforms state-of-the-art designs in terms of both throughput and DSP efficiency. Our implementation achieves DSP efficiency of up to 1.33 GOPS/DSP and throughput of up to 3.1 TOPS on the Xilinx ZCU102 FPGA. These are 29.1% and 20.0% better, respectively, than the best solutions reported previously.
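To illustrate why Winograd's algorithm saves multipliers (and hence DSP blocks), here is a minimal sketch of the standard one-dimensional F(2,3) case: two outputs of a 3-tap filter computed with 4 multiplications instead of 6. It is a textbook instance, not the WinoPE design described in the abstract; the function names are hypothetical and exact rational arithmetic is assumed for clarity.

```python
from fractions import Fraction


def direct_conv_1d(d, g):
    """Reference: two outputs of a 3-tap filter over 4 inputs (6 multiplies)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


def winograd_f23(d, g):
    """Winograd F(2,3): the same two outputs using 4 multiplies."""
    d = [Fraction(x) for x in d]
    g = [Fraction(x) for x in g]

    # Filter transform; in a CNN this can be precomputed once per kernel.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Input transform followed by the 4 element-wise multiplications.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3

    # Output transform: additions and subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    d, g = [1, 2, 3, 4], [1, 0, -1]
    print(winograd_f23(d, g) == direct_conv_1d(d, g))
```

The two-dimensional variants used for CNN layers nest this transform along rows and columns; supporting several kernel sizes in one processing element, as the abstract describes, is a separate design problem that this sketch does not address.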
We first relate the architecture of systolic arrays to the technological and economic design forces acting on architects of special-purpose systems some 20 years ago. We then observe that those same design forces are now bearing down on the architects of contemporary general-purpose processors, who consequently are producing general-purpose processors whose architectural features are increasingly similar to those of systolic arrays. We then describe some economic and technological forces that are changing the landscape of architectural research. At base, they are the increasing complexity of technology and applications, the fragmenting of the general-purpose processor market, and the judicious use of hardware configurability. We describe a 2D architectural taxonomy, identifying what we believe to be a "sweet spot" for architectural research.
This paper presents a register transfer modeling scheme for array processor simulation. Its main goals are to verify the application-specific design by real data computation, and to help fine-tune the array architecture by precise timing analysis. The data flow graph of the design is translated into a register transfer language, which is further combined with a hardware description module. An interactive simulator, SISim v2.0, has been implemented to simulate the behavior of such a system. The results are compared with the expected values to verify the array processor design. The recorded timing information can help the designer analyze the system and improve performance and resource utilization.
This paper describes the VLSI design and simulation of the lower layer processors of the KYDON vision system. KYDON is a completely autonomous, hierarchical, multilayered image understanding system. The VLSI design of the individual components and the timing simulation results for the processor of every layer are presented. The system runs at 50 MHz and promises a high processing rate of 300 image frames/sec.