ISBN: 0818690895 (print)
Autoregressive (AR) spectral estimation is widely used in various fields. However, a trade-off between performance and computational complexity is sometimes faced. Two recursive computing algorithms individually applied to the wide-sense stationary and highly nonstationary environments are presented. These algorithms have good numerical properties, high computing parallelism, and data locality. VLSI arrays are designed to realize the algorithms. The algorithms and arrays provide the potential of attaining high processing speed and good performance in real-time applications.
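As a point of reference for what AR spectral estimation computes, the following is a minimal sketch of the classical Levinson-Durbin order recursion for the wide-sense stationary case. It is a textbook method substituted here for illustration and is not claimed to be either of the two algorithms in the paper. The function names `levinson_durbin` and `ar_psd`, the sign convention x[n] + a1 x[n-1] + ... = e[n], and the frequency-grid size are assumptions of this sketch.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations by order recursion.

    r     : autocorrelation values r[0], r[1], ..., r[order]
    Returns (a, err): AR polynomial a with a[0] == 1, and the
    final prediction-error power.
    """
    r = np.asarray(r, dtype=float)
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for k in range(1, order + 1):
        # Correlation between the current prediction error and lag k
        acc = r[k] + np.dot(a[1:k], r[k - 1:0:-1])
        refl = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        for i in range(1, k):
            a[i] = a_prev[i] + refl * a_prev[k - i]
        a[k] = refl
        err *= 1.0 - refl ** 2                 # updated error power
    return a, err

def ar_psd(a, err, n_freq=256):
    """Evaluate the AR spectrum err / |A(e^{jw})|^2 on [0, pi]."""
    w = np.linspace(0.0, np.pi, n_freq)
    k = np.arange(len(a))
    A = np.exp(-1j * np.outer(w, k)) @ a       # A(e^{jw}) on the grid
    return w, err / np.abs(A) ** 2

# Autocorrelation of a unit-variance AR(1) process with pole at 0.5
a, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
w, psd = ar_psd(a, err)
```

Each order step reuses the previous solution, which is the kind of recursive structure that lends itself to array implementations; a nonstationary setting would need a time-recursive update instead, which this sketch does not cover.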
The authors describe a design framework, Architect, being developed for synthesizing application-specific array architectures from behavioral specifications to Register-Transfer (RT) descriptions, which can be identi...
ISBN: 076951992X (print)
Recently, a number of researchers have started to investigate new video-on-demand (VoD) architectures using batching, patching and periodic broadcasting. These architectures, compared to traditional unicast VoD systems, are much more scalable and can serve thousands or even millions of clients concurrently. Nevertheless, existing studies are usually focused on architectural issues. The problem of designing an efficient server to implement these new multicast VoD architectures has received little attention. While existing server designs using round-based schedulers can still be used, results show that such designs are sub-optimal as they do not exploit the characteristics of fixed-schedule periodic broadcasting channels. This study addresses this challenge by presenting an efficient server design for a recent multicast VoD architecture called Super-Scalar Video-on-Demand (SS-VoD). Results show that the efficient server design can increase the system capacity by 60% compared to traditional video server designs. This paper presents details of this new server design, derives a performance model, and analyzes it using numerical results.
ISBN: 081867542X (print)
A single integer linear programming model for optimally scheduling partitioned regular algorithms is presented. The methodology differs from existing methods in the following capabilities: (1) In addition to constraints on the number of available processors and on communication capabilities, processor caches and constraints on the size of available memories are modeled in the optimization model. (2) Different types of processors can be handled. (3) The size of the optimization model (number of integer variables) is independent of the size of the tiles to be executed. Hence, (4) the number of integer variables in the optimization model is greatly reduced, so that problems of relevant size can be solved in practical execution time.
An application-specific array architecture for Artificial Neural Networks (ANNs) computation is proposed. This array is configured as a mesh-of-appendixed-trees (MAT). Algorithms to implement both the recall and the t...
ISBN: 0818629673 (print)
This paper describes a design framework for developing application-specific serial array circuits. Starting from a description of the state-transition logic or a fully-parallel architecture, correctness-preserving transformations are employed to derive a wide range of implementations with different space-time trade-offs. The approach has been used in synthesizing designs based on Field-Programmable Gate Arrays, and will be illustrated by the development of a number of circuits including sorters and convolvers.
ISBN: 9781479919253 (print)
FPGA-based soft processors customized for operations on sparse graphs can deliver significant performance improvements over conventional organizations (ARMv7 CPUs) for bulk synchronous sparse graph algorithms. We develop a stripped-down soft processor ISA to implement specific repetitive operations on graph nodes and edges that are commonly observed in sparse graph computations. In the processing core, we provide hardware support for rapidly fetching and processing the state of local graph nodes and edges through spatial address generators and zero-overhead loop iterators. We interconnect a 2D array of these lightweight processors with a packet-switched network-on-chip to enable fine-grained operand routing along the graph edges and provide custom send/receive instructions in the soft processor. We develop the processor RTL using Vivado High-Level Synthesis and also provide an assembler and compilation flow to configure the processor instruction and data memories. We outperform a MicroBlaze (100 MHz on Zedboard) and a Nios II/f (100 MHz on DE2-115) by 6x (single-processor design), as well as the ARMv7 dual-core CPU on the Zynq SoCs by as much as 10x on the Xilinx ZC706 board (100-processor design), across a range of matrix datasets.
ISBN: 0818629673 (print)
This paper presents a special-purpose linear array processor architecture for determining the longest common subsequences (LCS) of two sequences. The algorithm uses a systolic and pipelined architecture suitable for VLSI implementation. The algorithms are also suitable for implementation on parallel machines. We first develop a `greedy' algorithm to determine some of the LCS and then propose a generalization to determine all LCS of the given pair of sequences. Earlier hardware algorithms [Lipton and Lopresti, 85; Mukherjee, 89] were concerned with determining only the length of the LCS or the edit distance of two sequences.
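To make the distinction between the LCS length and the set of all LCS concrete, here is a minimal software sketch using the standard dynamic-programming table followed by memoized backtracking. It is a sequential textbook formulation, not the systolic array or the greedy algorithm of the paper; the names `lcs_table` and `all_lcs` are hypothetical, and the recursive backtracking is only intended for short inputs.

```python
from functools import lru_cache

def lcs_table(a, b):
    """Fill the DP table: table[i][j] = LCS length of a[:i] and b[:j]."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table

def all_lcs(a, b):
    """Return every distinct LCS of strings a and b, sorted."""
    table = lcs_table(a, b)

    @lru_cache(maxsize=None)
    def backtrack(i, j):
        if i == 0 or j == 0:
            return frozenset({""})
        if a[i - 1] == b[j - 1]:
            # Matching last symbols: every LCS ends with that symbol
            return frozenset(s + a[i - 1] for s in backtrack(i - 1, j - 1))
        results = set()
        # Follow every neighbour that preserves the optimal length
        if table[i - 1][j] >= table[i][j - 1]:
            results |= backtrack(i - 1, j)
        if table[i][j - 1] >= table[i - 1][j]:
            results |= backtrack(i, j - 1)
        return frozenset(results)

    return sorted(backtrack(len(a), len(b)))

length = lcs_table("ABCBDAB", "BDCABA")[-1][-1]   # 4
sequences = all_lcs("ABCBDAB", "BDCABA")          # ['BCAB', 'BCBA', 'BDAB']
```

The table alone answers the length question addressed by the earlier hardware algorithms; recovering the sequences themselves needs the extra backtracking pass.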
This paper presents the performance evaluation of a fast third-order Volterra digital filtering algorithm mapped onto an AT&T DSP-3 parallel processor. Five different implementations are considered. Speed-up results indicate that the `time-skewing' method is currently the fastest. An application to nonlinear communication channel equalization using a 64-QAM signal constellation is presented.
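For readers unfamiliar with the filter structure, the sketch below evaluates a third-order Volterra filter directly from its definition: the output is the sum of a linear, a quadratic, and a cubic term over a sliding window of past inputs. This is the plain, unoptimized form, not the fast algorithm or the time-skewing mapping evaluated in the paper. The function name `volterra3`, the kernel names `h1`, `h2`, `h3`, and the zero initial conditions are assumptions of this sketch.

```python
import numpy as np

def volterra3(x, h1, h2, h3):
    """Direct third-order Volterra filter with zero initial conditions.

    h1 : (M,)      linear kernel
    h2 : (M, M)    quadratic kernel
    h3 : (M, M, M) cubic kernel
    """
    x = np.asarray(x, dtype=float)
    h1, h2, h3 = (np.asarray(h, dtype=float) for h in (h1, h2, h3))
    memory = len(h1)
    # Prepend zeros so the first outputs see an all-zero history
    padded = np.concatenate([np.zeros(memory - 1), x])
    y = np.empty(len(x))
    for n in range(len(x)):
        # window[i] holds x[n - i]
        window = padded[n:n + memory][::-1]
        y[n] = (h1 @ window
                + np.einsum("ij,i,j->", h2, window, window)
                + np.einsum("ijk,i,j,k->", h3, window, window, window))
    return y

# Small illustrative kernels with memory length 2
h1 = np.array([1.0, 0.5])
h2 = np.zeros((2, 2)); h2[0, 0] = 1.0
h3 = np.zeros((2, 2, 2)); h3[0, 0, 0] = 1.0
y = volterra3([1.0, 2.0], h1, h2, h3)   # y[n] = x[n] + 0.5 x[n-1] + x[n]^2 + x[n]^3
```

The cubic term costs on the order of M^3 multiplications per output sample in this form, which is why fast algorithms and parallel mappings are of interest.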
The factorization of sparse matrices is used in the inner loop of many engineering algorithms, including circuit simulation. This time-consuming operation can be sped up by utilising multiprocessor architectures. Distr...