ISBN (print): 0818690895
ASP (associative string processor) modules comprise highly-versatile parallel processing building-blocks for the simple construction of application-specific second-generation massively parallel processors (MPPs). The author discusses ASP module philosophy, demonstrates how ASP modules can satisfy the market, algorithmic, architectural, and engineering requirements of application-specific MPPs, and reports on current progress in the development of ASP technology. A case example indicates that 1 TOPS/ft³, 1 GOPS/W, and 1 MOPS/$ can be reasonably forecast as figures-of-merit for the cost effectiveness of second-generation MPPs built with WSI ASP modules. Comparison with first-generation MPP implementations reveals a 2-3 orders-of-magnitude advantage in favor of the ASP modules.
A new approach for computing the 2-D DFT and 2-D DCT is presented. A new design of a systolic array for transposed matrix multiplication is also shown in this paper. The new 2-D DFT/DCT avoids the need for the array transposer that was required by earlier implementations, and all processing can be pipelined easily. This approach employs a simple and regular structure that is well suited for VLSI implementation. This array can be easily scaled without modifying the basic control scheme and PE structure.
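As an illustration of why a transposed matrix product can remove the explicit transposition step, the sketch below computes a square 2-D DFT with two applications of the same operation, T(A, B) = A·Bᵀ. It is a plain software model under stated assumptions, not the paper's systolic design: the function names (`dft_matrix`, `transposed_matmul`, `dft2`) are hypothetical, the input is assumed square, and the identity used is X = T(F, T(F, x)), which holds because the DFT matrix F is symmetric.

```python
import cmath


def dft_matrix(n):
    """Return the n x n DFT matrix F, with F[k][m] = exp(-2*pi*i*k*m/n)."""
    return [[cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n)]
            for k in range(n)]


def transposed_matmul(a, b):
    """Compute a @ b^T. Both operands are read row by row, so no
    separate transposition of b is ever materialized."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
            for row_a in a]


def dft2(x):
    """2-D DFT of a square matrix x as two transposed matrix products.

    First pass:  Z = F @ x^T   (1-D DFT of every row of x, stored transposed)
    Second pass: X = F @ Z^T   (equals F @ x @ F, the 2-D DFT)
    """
    f = dft_matrix(len(x))
    return transposed_matmul(f, transposed_matmul(f, x))


if __name__ == "__main__":
    for row in dft2([[1, 2], [3, 4]]):
        print([complex(round(v.real, 6), round(v.imag, 6)) for v in row])
```

Because each pass consumes its operands along rows, the intermediate result never has to be reordered between the two passes, which is the property a transposer-free pipeline relies on.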
ISBN (print): 9781728171470
This paper presents a novel ferroelectric field-effect transistor (FeFET) in-memory computing architecture dedicated to accelerating Binary Neural Networks (BNNs). We present in-memory convolution, batch normalization, and dense layer processing through a grid of small crossbars with reduced unit size, which enables multiple bit operation and value accumulation. Additionally, we explore how operations can be parallelized to maximize computational performance. Simulation results show that our new architecture achieves a computing performance of up to 2.46 TOPS and a power efficiency of up to 111.8 TOPS/W, with an area of 0.026 mm² in 22 nm FDSOI technology.
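For readers unfamiliar with BNN arithmetic, the following sketch shows the standard XNOR-popcount formulation of a binary dot product, which is the kind of bitwise operation plus accumulation that BNN accelerators typically map onto hardware. It is a software model only and makes no claim about the FeFET crossbar circuit described above; the helper names `pack_bits` and `binary_dot` are hypothetical.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int; bit i is set when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, w_bits, n):
    """Dot product of two length-n vectors over {+1, -1}, given as packed bits.

    XNOR marks the positions where activation and weight agree (product +1);
    every other position contributes -1, so dot = 2 * matches - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ w_bits) & mask   # XNOR restricted to n bits
    matches = bin(agree).count("1")     # popcount
    return 2 * matches - n


if __name__ == "__main__":
    a = [1, -1, 1, 1]
    w = [1, 1, -1, 1]
    print(binary_dot(pack_bits(a), pack_bits(w), len(a)))
```

The result matches the ordinary dot product of the two ±1 vectors, so a layer's multiply-accumulate step reduces to bitwise logic followed by a count.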
Details are presented of the DAC (DSP ASIC Compiler) silicon compiler framework. DAC allows a non-specialist to automatically design DSP ASICs and DSP ASIC cores directly from a high-level specification. Typical designs take only several minutes, and the resulting layouts are comparable in area and performance to handcrafted designs.
We first relate the architecture of systolic arrays to the technological and economic design forces acting on architects of special-purpose systems some 20 years ago. We then observe that those same design forces are now bearing down on the architects of contemporary general-purpose processors, who consequently are producing general-purpose processors whose architectural features are increasingly similar to those of systolic arrays. We then describe some economic and technological forces that are changing the landscape of architectural research. At base, they are the increasing complexity of technology and applications, the fragmenting of the general-purpose processor market, and the judicious use of hardware configurability. We describe a 2D architectural taxonomy, identifying what we believe to be a "sweet spot" for architectural research.
Array processors tailored to mesh-based iterative algorithms benefit from shifting to an asynchronous mode. An architecture implementing this functionally asynchronous state-space update with self-timed elementary pro...
ISBN (print): 9781479919253
FPGA-based soft processors customized for operations on sparse graphs can deliver significant performance improvements over conventional organizations (ARMv7 CPUs) for bulk synchronous sparse graph algorithms. We develop a stripped-down soft processor ISA to implement specific repetitive operations on graph nodes and edges that are commonly observed in sparse graph computations. In the processing core, we provide hardware support for rapidly fetching and processing state of local graph nodes and edges through spatial address generators and zero-overhead loop iterators. We interconnect a 2D array of these lightweight processors with a packet-switched network-on-chip to enable fine-grained operand routing along the graph edges and provide custom send/receive instructions in the soft processor. We develop the processor RTL using Vivado High-Level Synthesis and also provide an assembler and compilation flow to configure the processor instruction and data memories. We outperform a Microblaze (100MHz on Zedboard) and an NIOS-II/f (100MHz on DE2-115) by 6x (single processor design) as well as the ARMv7 dual-core CPU on the Zynq SoCs by as much as 10x on the Xilinx ZC706 board (100 processor design) across a range of matrix datasets.
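To make the bulk-synchronous pattern concrete, the sketch below models one superstep of a sparse graph computation in software: every edge sends a weighted copy of its source node's state, and each destination node then reduces the messages it received. This is only an illustrative model under assumptions, not the soft processor's ISA or its send/receive instructions; the function name `bsp_superstep`, the edge-tuple layout, and the choice of a weighted sum as the reduction are all hypothetical.

```python
def bsp_superstep(edges, state):
    """Run one bulk-synchronous step over a sparse graph.

    edges: list of (src, dst, weight) tuples
    state: dict mapping node id -> current value
    Returns a new dict with the updated value of every node.
    """
    # Send phase: route one message along every edge, using only old state.
    inbox = {node: [] for node in state}
    for src, dst, weight in edges:
        inbox[dst].append(weight * state[src])

    # Implicit barrier: all messages exist before any node is updated.
    # Receive phase: each node reduces its inbox (here, a plain sum).
    return {node: sum(messages) for node, messages in inbox.items()}


if __name__ == "__main__":
    edges = [(0, 1, 2.0), (1, 2, 3.0), (0, 2, 1.0)]
    state = {0: 1.0, 1: 1.0, 2: 1.0}
    print(bsp_superstep(edges, state))
```

With a sum as the reduction, one superstep is equivalent to a sparse matrix-vector product, which is why matrix datasets are a natural benchmark for this style of computation.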
ISBN (print): 076951992X
Recently, a number of researchers have started to investigate new video-on-demand (VoD) architectures using batching, patching and periodic broadcasting. These architectures, compared to traditional unicast VoD systems, are much more scalable and can serve thousands or even millions of clients concurrently. Nevertheless, existing studies are usually focused on architectural issues. The problem of designing an efficient server to implement these new multicast VoD architectures has received little attention. While existing server designs using round-based schedulers can still be used, results show that such designs are sub-optimal as they do not exploit the characteristics of fixed-schedule periodic broadcasting channels. This study addresses this challenge by presenting an efficient server design for a recent multicast VoD architecture called Super-Scalar Video-on-Demand (SS-VoD). Results show that the efficient server design can increase the system capacity by 60% compared to traditional video server designs. This paper presents details of this new server design, derives a performance model, and analyzes it using numerical results.
This paper describes a scheme for representing heterogeneous array circuits, in particular those which have been optimised by pipelining or by transposition. Equations for correctness-preserving transformations of the...
An application-specific array architecture for Artificial Neural Networks (ANNs) computation is proposed. This array is configured as a mesh-of-appendixed-trees (MAT). Algorithms to implement both the recall and the t...