ISBN: (print) 0780370805
This paper describes two new matrix transform algorithms for the max-log-MAP decoding of turbo codes. In the proposed algorithms, the successive decoding procedures of the conventional max-log-MAP algorithm are performed in parallel and formulated as a set of simple, regular matrix operations, which considerably speeds up decoding and reduces computational complexity. The matrix max-log-MAP algorithms also retain the general advantage of logarithmic MAP-like algorithms in avoiding complex numerical representation problems. They particularly facilitate implementing logarithmic MAP-like algorithms in special-purpose parallel-processing VLSI hardware architectures, and they also allow simple implementations using shift registers. The proposed implementation architectures for matrix max-log-MAP decoding can effectively reduce the memory capacity and simplify the data accesses and transfers required by the conventional max-log-MAP and MAP algorithms.
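As a rough illustration of why max-log-MAP recursions lend themselves to matrix form: the max-log approximation replaces log-sum-exp with a maximum, so one forward-recursion step becomes a max-plus matrix-vector product. The sketch below is not the paper's matrix transform; the names `max_plus_matvec` and `forward_metrics` and the toy branch metrics are assumptions made for illustration.

```python
NEG_INF = float("-inf")  # metric of a state or transition that cannot occur

def max_plus_matvec(gamma, alpha):
    """Max-plus product: out[s] = max over s' of (alpha[s'] + gamma[s'][s])."""
    n_states = len(gamma[0])
    return [
        max(alpha[sp] + gamma[sp][s] for sp in range(len(alpha)))
        for s in range(n_states)
    ]

def forward_metrics(gammas, alpha0):
    """Apply the forward recursion once per branch-metric matrix."""
    alphas = [alpha0]
    for gamma in gammas:
        alphas.append(max_plus_matvec(gamma, alphas[-1]))
    return alphas

# Hypothetical 2-state trellis with two time steps; the decoder starts in state 0.
gammas = [
    [[0.0, 1.0], [2.0, -1.0]],
    [[0.5, 0.0], [-0.5, 1.5]],
]
alphas = forward_metrics(gammas, [0.0, NEG_INF])
```

Each step touches every state independently, which is the kind of regularity that maps well onto parallel hardware.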
Digital signal processing operations, e.g., digital filters, are an important class of applications in communication devices. A digital filtering algorithm can be implemented in various ways by selecting one architecture from the set of possible realizations. Choosing an advanced architecture can yield notable savings in both silicon area and power dissipation compared to the conventional direct-form realization. This paper focuses on architectures based on the interpolated finite impulse response (interpolated FIR) filter and the recursive running-sum (RRS) filter. The VHDL-based implementations show that these advanced architectures are efficient when low-power or low-area characteristics are desired. Savings of over 55 percent in both area and power dissipation were achieved when an FIR filter with a narrow transition band was implemented using these architectures.
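The saving offered by a recursive running-sum filter can be seen in a small sketch: an N-tap sum of the most recent samples needs N-1 additions per output in direct form, while the recursive form needs one addition and one subtraction regardless of N. The function names below are hypothetical, and the code illustrates only the general RRS idea, not the paper's VHDL design.

```python
def running_sum_direct(x, n_taps):
    """Direct form: sum the last n_taps samples for every output."""
    out = []
    for n in range(len(x)):
        lo = max(0, n - n_taps + 1)
        out.append(sum(x[lo:n + 1]))
    return out

def running_sum_recursive(x, n_taps):
    """Recursive form: y[n] = y[n-1] + x[n] - x[n - n_taps]."""
    out = []
    acc = 0
    for n, sample in enumerate(x):
        acc += sample
        if n >= n_taps:
            acc -= x[n - n_taps]  # drop the sample that left the window
        out.append(acc)
    return out

samples = [1, 2, 3, 4, 5, 6]
direct = running_sum_direct(samples, 3)
recursive = running_sum_recursive(samples, 3)
```

Integer samples are used here so both forms agree exactly; with floating-point data the recursive accumulator can drift slightly.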
MPEG-4 is a new multimedia standard combining interactivity, object-based natural and synthetic digital video, audio, and computer graphics. Implementing the video part of the MPEG-4 standard requires a high degree of flexibility, and motion estimation accounts for the largest share of the computational load. Therefore, this paper evaluates fast algorithms for MPEG-4 motion estimation in terms of visual quality and computational power requirements for processor-based implementations. Because MPEG-4 is object-based, new VLSI architectures for MPEG-4 motion estimation are also required. Known motion estimation architectures are therefore evaluated on how well they can be modified to support MPEG-4. Based on this evaluation, a new dedicated but flexible MPEG-4 motion estimation architecture targeted at low-power handheld applications is presented, which proved to outperform processor-based implementations by orders of magnitude.
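For orientation, the baseline that fast motion estimation algorithms try to beat is exhaustive block matching with a sum-of-absolute-differences (SAD) cost. The following minimal sketch assumes rectangular blocks and small frames stored as lists of lists; it ignores MPEG-4's arbitrarily shaped objects, and the names `sad` and `full_search` are illustrative rather than taken from the paper.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """SAD between a block in cur and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def full_search(cur, ref, bx, by, size, search_range):
    """Test every displacement in the window; return (cost, dx, dy) of the best."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the reference frame.
            if by + dy < 0 or by + dy + size > height:
                continue
            if bx + dx < 0 or bx + dx + size > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best

# Hypothetical 8x8 reference frame; cur shows the same content moved by (2, 1).
ref = [[y * 8 + x for x in range(8)] for y in range(8)]
cur = [[ref[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0
        for x in range(8)] for y in range(8)]
best = full_search(cur, ref, 2, 2, 2, 2)
```

The cost grows with the square of the search range, which is why this step dominates the computational load and motivates both fast algorithms and dedicated hardware.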
Three pipelined multiprocessor implementations of adaptive lattice filters are examined. The three multiprocessor architectures, which can be characterized as a serial pipeline, a ladder-connected dual pipeline, and a ring pipeline, are derived directly from the computational and data-transfer requirements of adaptive lattice algorithms. The order-recursive nature of the adaptive lattice structure results in architectures that use pipelining extensively. A performance analysis is carried out for each multiprocessor system with respect to two different adaptive lattice algorithms. Expressions for approximate computation time and speedup are derived for each combination of architecture and algorithm.
Parallel matrix multiplication algorithms (based on the common data distribution formats) used in pattern recognition, image processing, and signal processing applications are discussed. A novel algorithm is introduced and is shown to be the fastest one for a specific class of applications. The algorithms are analyzed for performance as a function of array dimension, data distribution format, and the architecture of the computer on which the algorithms are executed. Performance bounds and speedups (linear in the number of processors) are established. The results of the analysis are given both as characterizations of executions on selected classes of architectures and as theorems that establish the relative performance of the algorithms across classes of data distributions and architectures.
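To make the notion of a data distribution format concrete, the sketch below simulates one common choice, a row-block distribution: each worker holds a contiguous block of rows of A plus all of B, and computes the matching rows of C without communicating. This is a sequential simulation under assumed names (`matmul`, `row_block_matmul`), not the novel algorithm the abstract refers to.

```python
def matmul(a, b):
    """Plain triple-loop product of two list-of-lists matrices."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]

def row_block_matmul(a, b, n_workers):
    """Split the rows of a into n_workers blocks; each block is one worker's job."""
    n_rows = len(a)
    block = -(-n_rows // n_workers)  # ceiling division
    c = []
    for w in range(n_workers):
        local_rows = a[w * block:(w + 1) * block]
        if local_rows:  # trailing workers may receive no rows
            c.extend(matmul(local_rows, b))
    return c

a = [[1, 2], [3, 4], [5, 6]]
b = [[7, 8], [9, 10]]
c = row_block_matmul(a, b, 2)
```

Because the row blocks are independent, the work divides evenly when the row count is a multiple of the worker count, which is the situation in which near-linear speedup is plausible.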
ISBN: (print) 0892524669
The authors evaluate several techniques for solving the symmetric tridiagonal problem based on the method of isospectral flow. Architectures that result from these considerations are discussed. Their advantages and disadvantages, from the viewpoints of numerical accuracy and ease of implementation in VLSI, are investigated.
For many years, following the ever-increasing number of transistors per chip, advances in computer architecture mostly consisted of adding complex mechanisms to single-core processors to improve their computing performance. In the last decade, the continued growth of computing performance has been supported by the introduction of multi-core architectures, first in high-performance computing, then in mainstream desktop CPUs, and now in smartphones and embedded systems. Today, one of the main challenges researchers must overcome is finding how to implement applications that fully exploit the computing performance offered by these multi-core architectures with tens, hundreds, and soon thousands of cores. In this session, parallel implementations of state-of-the-art signal and video processing applications on multi-core and many-core architectures are presented. The first two talks of this session focus on implementing the HEVC video encoder on modern architectures. The implementation of HEVC intra encoding algorithms on heterogeneous multi-core architectures will be presented by the Fraunhofer HHI, and the optimization of the complexity-quality tradeoff of hardware-accelerated HEVC coding will be presented by the Politecnico di Torino. Finally, an implementation of the Fast Fourier Transform on a many-core embedded system will be presented as the result of a collaboration between Kalray, INSA Rennes, and the Auckland University of Technology.
High Efficiency Video Coding (HEVC), the recently developed international video compression standard, has 50% better video compression efficiency than the H.264 video compression standard at the expense of significantly increased computational complexity. The HEVC Inverse Discrete Cosine Transform (IDCT) algorithm accounts for 11% of the computational complexity of an HEVC video encoder. Recently, commercial and academic high-level synthesis (HLS) tools have started to be used successfully for FPGA implementations of digital signal processing algorithms. Therefore, in this paper, the first FPGA implementations of the HEVC 2D IDCT algorithm using HLS tools in the literature are proposed. The proposed HEVC IDCT hardware is implemented on Xilinx FPGAs using three HLS tools: Xilinx Vivado HLS, LegUp, and MATLAB Simulink HDL Coder. Using HLS tools significantly reduced the FPGA development time, and the resulting FPGA implementations achieved real-time performance. Therefore, HLS tools can be used for FPGA implementation of an HEVC video encoder.
This paper discusses some of the fundamental issues in the design of highly parallel, dense, low-power motion sensors in analog VLSI. Since photoreceptor circuits are an integral part of all visual motion sensors, we discuss how the sizing of photosensitive areas can affect the performance of such systems. We review the classic gradient and correlation algorithms and give a survey of analog motion-sensing architectures inspired by them. We calculate how the measurable speed range scales with signal-to-noise ratio (SNR) for a classic Reichardt sensor with a fixed time constant. We show how this speed range may be improved using a nonlinear filter with an adaptive time constant, constructed out of a diode and a capacitor, and present data from a velocity sensor based on such a filter. Finally, we describe how arrays of such velocity sensors can be employed to compute the heading direction of a moving subject and to estimate the time-to-contact between the sensor and a moving object.
Energy efficiency and security are critical requirements for computing at edge nodes. Unrolled architectures for lightweight cryptographic algorithms have been shown to be energy-efficient, providing higher performanc...