Two real-valued signal models based on selective spanning with fast enumeration (SSFE) and layered orthogonal lattice detector (LORD) algorithms are implemented on an Nvidia graphics processing unit (GPU). A 2 x 2 mult...
Future multi-core processors will necessitate exploitation of fine-grain, architecture-independent parallelism from applications to utilize many cores with relatively small local memories. We use c264, an end-to-end H...
ISBN: (print) 3540303170
While cycle-accurate simulation tools have been widely used in modeling high-performance processors, such an approach can be hindered by the increasing complexity of the simulation, especially in modeling chip multi-processors with multi-threading such as network processors (NPs). We have observed that for NP cycle-level simulation, several days of simulation time cover only about one second of real-world network traffic. Existing approaches to accelerating simulation rely on either code analysis or execution sampling. Unfortunately, they are not applicable to speeding up NP simulations because of the small code size and the iterative nature of NP applications. We propose to sample the traffic input to the NP so that a long packet trace is represented by a much shorter one, with the simulation error bounded within ±3% at 95% confidence. Our method yielded an order-of-magnitude improvement in NP simulation speed.
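The error bound quoted above can be related to trace length with a standard confidence-interval calculation. The sketch below is not the authors' sampling method: it assumes simple random sampling of packets and a hypothetical per-packet metric (packet size in bytes), and estimates how many packets are needed for the sample mean to fall within ±3% of the true mean at 95% confidence. The names `required_sample_size` and `sample_trace`, and the synthetic trace, are illustrative assumptions.

```python
import math
import random
import statistics

Z_95 = 1.96  # two-sided normal quantile for 95% confidence


def required_sample_size(values, rel_error=0.03, z=Z_95):
    """Packets needed so the sample mean is within rel_error of the mean.

    Uses n = (z * s / (rel_error * mean))^2, a normal approximation that
    assumes independent samples. Real traffic is bursty, so treat the
    result as a rough estimate only.
    """
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)
    return math.ceil((z * stdev / (rel_error * mean)) ** 2)


def sample_trace(trace, n, seed=0):
    """Draw n packets uniformly at random, preserving arrival order."""
    rng = random.Random(seed)
    idx = sorted(rng.sample(range(len(trace)), min(n, len(trace))))
    return [trace[i] for i in idx]


# Hypothetical trace: packet sizes in bytes (not data from the paper).
rng = random.Random(42)
trace = [rng.choice([64, 64, 64, 576, 1500]) for _ in range(100_000)]

pilot = trace[:2000]               # small pilot run to estimate variance
n = required_sample_size(pilot)    # packets needed for +/-3% at 95%
short_trace = sample_trace(trace, n)
```

The more variable the per-packet metric, the longer the shortened trace has to be, which is why a pilot estimate of the variance comes first.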
ISBN: (print) 1932415416
This paper presents an adaptive superscalar architecture for embedded systems based on the characteristics of embedded applications. The architecture uses a hardware-software cooperation solution to achieve dynamic processor resource allocation with minimal hardware overhead while providing some flexibility to the applications and the operating systems. Programmers and compilers can optimize applications by reconfiguring the architecture to lower energy usage while maintaining the required performance level. Simulation results show that simple optimizations can yield a 9% improvement in energy consumption and a 7% power reduction at the cost of about 1% performance degradation.
ISBN: (print) 3540364102
The main goal of an overtaking monitor system is the segmentation and tracking of the overtaking vehicle. This application can be addressed through an optic-flow-driven scheme. We focus on the rear-mirror visual field by placing a camera on top of the mirror. When driving a car, the ego-motion optic flow pattern is more or less unidirectional, i.e. all static objects and landmarks move backwards while overtaking cars move forward towards our vehicle. This well-structured motion scenario facilitates the segmentation of the regular motion patterns that correspond to the overtaking vehicle. Our approach is based on two main processing stages. First, optical flow is computed using a novel superpipelined and fully parallelized architecture capable of extracting motion information at a frame rate of up to 148 frames per second at VGA resolution (640x480 pixels). Second, a tracking stage based on motion pattern analysis provides an estimated position of the overtaking car. We analyze the system's performance and resource usage and show some promising results using a bank of overtaking car sequences.
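The segmentation idea described above (static scene flowing one way, overtaking vehicles flowing the opposite way) can be illustrated with a sign test on the horizontal flow component. This is a minimal sketch, not the authors' architecture or tracking stage. The flow field is a hypothetical list of rows of `(u, v)` vectors, and the sign convention (negative `u` for the background, positive `u` for overtaking vehicles), the threshold `min_speed`, and the centroid-based position estimate are all assumptions.

```python
def segment_overtaking(flow, min_speed=0.5):
    """Return a binary mask marking pixels whose horizontal flow is forward.

    flow: list of rows, each a list of (u, v) tuples in pixels/frame.
    Assumption: ego-motion makes static objects move with u < 0, so
    u > min_speed is treated as an overtaking-vehicle candidate.
    """
    return [[1 if u > min_speed else 0 for (u, _v) in row] for row in flow]


def centroid(mask):
    """Estimate position as the centroid (x, y) of marked pixels, or None."""
    pts = [(x, y) for y, row in enumerate(mask) for x, m in enumerate(row) if m]
    if not pts:
        return None
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)


# Toy 3x4 flow field: background moves left, a 2x2 patch moves right.
flow = [
    [(-1.0, 0.0), (-1.0, 0.0), (-1.0, 0.0), (-1.0, 0.0)],
    [(-1.0, 0.0), (2.0, 0.1), (2.0, 0.1), (-1.0, 0.0)],
    [(-1.0, 0.0), (2.0, 0.0), (2.0, 0.0), (-1.0, 0.0)],
]
mask = segment_overtaking(flow)
position = centroid(mask)  # centre of the right-moving patch
```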
An innovative high-throughput and scalable multi-transform architecture for H.264/AVC is presented in this paper. This structure can be used as a hardware accelerator in modern embedded systems to efficiently compute ...
Research on the prevention of epileptic seizures has led to approaches for future treatment techniques, which rely on the demanding computation of generalized partial directed coherence (GPDC) on electroencephalogram ...
ISBN: (print) 3540364102
We present a highly efficient automated clock gating platform for rapidly developing power-efficient hardware architectures. Our language, called CoDeL, allows hardware description at the algorithm level and thus dramatically reduces design time. We have extended CoDeL to automatically insert clock gating at the behavioral level to reduce dynamic power dissipation in the resulting architecture. This is, to our knowledge, the first hardware design environment that allows an algorithmic description of a component and yet produces a power-aware design. To estimate the power savings, we have developed an estimation framework, which is shown to be consistent with the power savings obtained from statistical power analysis with Synopsys tools. To evaluate our platform, we use CoDeL implementations of a counter and of various integer transforms used in digital signal processing (DSP): the discrete wavelet transform, the discrete cosine transform, and an integer transform used in the H.264 (MPEG-4 Part 10) video compression standard. These designs are then clock gated using CoDeL and Synopsys. A simulation-based power analysis of the designed circuits shows that CoDeL's clock gating performs better than Synopsys' automated clock gating: CoDeL reduces the power dissipation by 83% on average, while Synopsys gives 81% savings.
ISBN: (print) 3540364102
Many telecommunication applications, especially baseband processing, and digital signal processing (DSP) applications call for high-performance implementations due to the complexity of the algorithms and high throughput requirements. In general, the required performance is obtained with the aid of parallel computational resources. In these application domains, software implementations are often preferred over fixed-function ASICs because of their flexibility and ease of development. Application-specific instruction-set processor (ASIP) architectures can be used to efficiently exploit the inherent parallelism of the algorithms while still maintaining flexibility. Using high-level languages to program processor architectures with parallel resources can lead to inefficient resource utilization, whereas parallel assembly programming is error-prone and tedious. In this paper, the inherent problems of parallel programming and software pipelining are mitigated with a parallel language syntax and automatic generation of software-pipelined code for the iteration kernels. With the aid of the developed tool support, the underlying performance of a processor architecture with parallel resources can be exploited, and full utilization of the main processing resources is obtained for pipelined loop kernels. The given examples show that efficiency can be obtained without reducing the performance.
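To make the software pipelining idea concrete, the sketch below builds a cycle-by-cycle schedule for a loop whose body has three stages and an initiation interval of one cycle. It only illustrates the prologue, kernel, and epilogue structure of a software-pipelined loop; it is not the tool or the language syntax described in the abstract, and the stage names `load`, `compute`, and `store` are hypothetical.

```python
def pipeline_schedule(n_iters, stages=("load", "compute", "store")):
    """Return a list of cycles; each cycle is a list of (stage, iteration).

    Iteration i executes stage number s in cycle i + s, so once the
    pipeline is full every cycle issues one operation per stage
    (the loop kernel). Earlier cycles form the prologue, later ones
    the epilogue.
    """
    depth = len(stages)
    total_cycles = n_iters + depth - 1 if n_iters > 0 else 0
    schedule = []
    for cycle in range(total_cycles):
        ops = []
        for s, name in enumerate(stages):
            it = cycle - s
            if 0 <= it < n_iters:
                ops.append((name, it))
        schedule.append(ops)
    return schedule


def utilization(schedule, n_stages):
    """Fraction of stage slots that hold an operation."""
    if not schedule:
        return 0.0
    used = sum(len(ops) for ops in schedule)
    return used / (len(schedule) * n_stages)


sched = pipeline_schedule(4)
for cycle, ops in enumerate(sched):
    print(cycle, ops)
```

In this toy schedule only the kernel cycles keep all three stages busy; for long loops the prologue and epilogue become negligible and overall utilization approaches 100%.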
System-level synthesis is the task of automatically implementing application models as hardware/software systems. It encompasses four basic sub tasks, namely decision making and refinement for both computation and com...