ISBN: 9789531841306 (print)
A method is proposed for mapping an algorithm, represented as a loop nest, onto an application-specific structure. The method consists in translating the loop nest into a tensor equation. The tensor equation describes a set of structural solutions, and the optimized solution is found by solving this equation in integers. The proposed limitations on parts of the tensors help to derive the pipelined structure and simplify the mapping process. The method is illustrated by the synthesis of an IIR filter structure. It is intended for mapping DSP algorithms onto FPGAs.
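For orientation, the kind of loop nest such a method takes as input can be sketched as follows. This is a minimal Python illustration of a direct-form IIR filter, not the tensor-equation method of the paper; the function name `iir_loop_nest` and the coefficient convention (`a[0] == 1`) are assumptions made here.

```python
def iir_loop_nest(b, a, x):
    """Direct-form IIR filter written as an explicit loop nest.

    y[i] = sum_j b[j] * x[i-j]  -  sum_{j>=1} a[j] * y[i-j]
    Assumes a[0] == 1 (hypothetical convention for this sketch).
    """
    y = [0.0] * len(x)
    for i in range(len(x)):            # outer loop: output sample index
        acc = 0.0
        for j in range(len(b)):        # feed-forward taps
            if i - j >= 0:
                acc += b[j] * x[i - j]
        for j in range(1, len(a)):     # feedback taps: loop-carried dependence on y
            if i - j >= 0:
                acc -= a[j] * y[i - j]
        y[i] = acc
    return y


# Impulse response of y[i] = x[i] + 0.5 * y[i-1]
print(iir_loop_nest([1.0], [1.0, -0.5], [1.0, 0.0, 0.0, 0.0]))
# prints [1.0, 0.5, 0.25, 0.125]
```

The feedback loop is the part that constrains pipelining, because each output depends on earlier outputs.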
To improve block cipher performance, a clustered processor structure is put forward. How data is scheduled across multiple clusters directly influences processor performance. Based on an analysis of the characteristics of block cipher data flow, we propose a data scheduling scheme organized by block width and operation mode. The final algorithm mapping and experimental results show that the scheme not only meets the data distribution demands of different algorithms but also reduces the number of instructions the algorithms need, and thus can enhance the throughput of most algorithms.
We present an algebraic theory based on tensor products for modeling direct interconnection networks. This algebraic theory has been used for designing and implementing block recursive numerical algorithms on shared-m...
ISBN: 0819427519 (print)
When implementing today's video compression standards on programmable processors, it is essential to optimize the algorithms with respect to the underlying hardware. As an example, the core decoder functions of the H.263 hybrid coding scheme were implemented on the HiPAR-DSP, a SIMD-controlled processor with four parallel VLIW data paths. The decoder tasks were implemented employing local memory, parallelization on several levels, and data statistics. Special effort was devoted to the computation-intensive tasks, IDCT and motion-compensated frame reconstruction. To speed up the IDCT computation, a data-dependent approach was chosen, which distinguishes different block types. The determination of the IDCT block type could be parallelized together with other tasks, so no additional overhead is required. Frame reconstruction mainly benefits from data-parallel operations and transparent DMA transfers to and from external memory.
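The idea of a data-dependent IDCT can be sketched in a few lines. The abstract does not list the block types it distinguishes, so the three classes below (`zero`, `dc_only`, `full`) and all function names are assumptions for illustration, and the reference transform is a plain direct-form 8x8 IDCT rather than anything specific to the HiPAR-DSP.

```python
import math

N = 8  # 8x8 coefficient blocks


def idct_2d(F):
    """Reference 8x8 inverse DCT in direct O(N^4) form (for clarity, not speed)."""
    def c(k):
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            s = 0.0
            for u in range(N):
                for v in range(N):
                    s += (c(u) * c(v) * F[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[x][y] = s / 4.0
    return out


def block_type(F):
    """Classify a block; the class names are hypothetical."""
    has_ac = any(F[u][v] != 0
                 for u in range(N) for v in range(N) if (u, v) != (0, 0))
    if has_ac:
        return "full"
    return "dc_only" if F[0][0] != 0 else "zero"


def idct_data_dependent(F):
    """Pick a cheaper path when the block type allows it."""
    kind = block_type(F)
    if kind == "zero":
        return [[0.0] * N for _ in range(N)]
    if kind == "dc_only":
        # A DC-only block decodes to a constant block of value F[0][0] / 8.
        return [[F[0][0] / 8.0] * N for _ in range(N)]
    return idct_2d(F)
```

The shortcut for `dc_only` follows from the transform itself: with only `F[0][0]` non-zero, every cosine term equals 1 and the scale factors reduce to 1/8.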
ISBN: 9789897581335 (print)
In surveillance and scene awareness applications using power-constrained or battery-powered equipment, performance characteristics of processing hardware must be considered. We describe a novel framework for moving processing platform selection from a single design-time choice to a continuous run-time one, greatly increasing flexibility and responsiveness. Using Histogram of Oriented Gradients (HOG) object detectors and Mixture of Gaussians (MoG) motion detectors running on 3 platforms (FPGA, GPU, CPU), we characterise processing time, power consumption and accuracy of each task. Using a dynamic anomaly measure based on contextual object behaviour, we reallocate these tasks between processors to provide faster, more accurate detections when an increased anomaly level is seen, and reduced power consumption in routine or static scenes. We compare power- and speed-optimised processing arrangements with automatic event-driven platform selection, showing the power and accuracy tradeoffs between each. Real-time performance is evaluated on a parked vehicle detection scenario using the i-LIDS dataset. Automatic selection is 10% more accurate than power-optimised selection, at the cost of 12 W higher average power consumption in a desktop system.
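A run-time selection rule of this kind might look like the sketch below. It is an illustration only: the `Profile` record, the `select_platform` function, the threshold of 0.5 and every number in `PROFILES` are placeholders invented here, not measurements or policies from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    platform: str
    time_ms: float    # processing time per frame
    power_w: float    # average power draw
    accuracy: float   # detection accuracy in [0, 1]


# Placeholder figures for illustration only; NOT taken from the paper.
PROFILES = [
    Profile("FPGA", 20.0, 5.0, 0.80),
    Profile("GPU", 10.0, 60.0, 0.90),
    Profile("CPU", 80.0, 40.0, 0.90),
]


def select_platform(anomaly_level, threshold=0.5, profiles=PROFILES):
    """Pick a platform for the next frames from the current anomaly level."""
    if anomaly_level >= threshold:
        # Elevated anomaly: favour accuracy, then speed.
        return min(profiles, key=lambda p: (-p.accuracy, p.time_ms))
    # Routine scene: favour low power.
    return min(profiles, key=lambda p: p.power_w)


print(select_platform(0.9).platform)  # prints GPU (with the placeholder data)
print(select_platform(0.1).platform)  # prints FPGA (with the placeholder data)
```

A real system would also need to account for the cost of moving a task between processors, which this sketch ignores.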
ISBN: 0780391284 (print)
A novel parallel PN code acquisition algorithm for CDMA communication is presented. Using a methodology for mapping algorithms onto architectures, the algorithm is successfully implemented in FPGA chips to achieve fast PN code acquisition. Finally, simulation results for the algorithm are presented.
ISBN: 9781728123455 (digital)
ISBN: 9781728123462 (print)
This paper introduces an effective parallel processing method for designing an on-board SAR (Synthetic Aperture Radar) real-time imaging processor using FPGA+DSP, based on a high-resolution imaging algorithm. The architecture of the processor is derived from an analysis of the algorithm's operation characteristics and inherent timing relationships. To reduce processing time, a joint pipelined and parallel processing method is applied. In addition, the system combines floating-point and fixed-point operations, which meets the imaging accuracy requirements while reducing the hardware scale of the system. The system requires 24 s to focus GF-3 stripmap SAR raw data with a granularity of 16384 × 16384 when operating at 100 MHz. The results demonstrate that the method is effective and that the imaging quality meets the requirements.
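The reported figures imply a rough throughput budget, worked out in the short Python calculation below. It assumes that the 16384 × 16384 granularity is the number of samples processed and that a single 100 MHz clock applies for the whole 24 s; the variable names are chosen here for illustration.

```python
samples = 16384 * 16384   # assumed sample count of the raw data block
seconds = 24.0            # reported focusing time
clock_hz = 100e6          # reported clock frequency

samples_per_second = samples / seconds
cycles_per_sample = clock_hz * seconds / samples

print(f"{samples} samples")
print(f"{samples_per_second / 1e6:.2f} Msamples/s")
print(f"{cycles_per_sample:.2f} clock cycles per sample")
# prints:
# 268435456 samples
# 11.18 Msamples/s
# 8.94 clock cycles per sample
```

Under these assumptions the processor has on average about nine clock cycles of wall-clock time per sample across all processing stages.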
ISBN: 9781479976867 (print)
In surveillance and scene awareness applications using power-constrained or battery-powered equipment, performance characteristics of processing hardware must be considered. We describe a novel framework for moving processing platform selection from a single design-time choice to a continuous run-time one, greatly increasing flexibility and responsiveness. Using Histogram of Oriented Gradients (HOG) object detectors and Mixture of Gaussians (MoG) motion detectors running on 3 platforms (FPGA, GPU, CPU), we characterise processing time, power consumption and accuracy of each task. Using a dynamic anomaly measure based on contextual object behaviour, we reallocate these tasks between processors to provide faster, more accurate detections when an increased anomaly level is seen, and reduced power consumption in routine or static scenes. We compare power- and speed-optimised processing arrangements with automatic event-driven platform selection, showing the power and accuracy tradeoffs between each. Real-time performance is evaluated on a parked vehicle detection scenario using the i-LIDS dataset. Automatic selection is 10% more accurate than power-optimised selection, at the cost of 12 W higher average power consumption in a desktop system.