ISBN: 9791092279016 (print)
While research on the design of heterogeneous concurrent systems has a long and rich history, a unified design methodology and tool support have not yet emerged, and thus the creation of such systems remains a difficult, time-consuming and error-prone process. The absence of principled support for system evaluation and optimization at high abstraction levels makes the quality of the resulting implementation highly dependent on the experience or prejudices of the designer. In this work we present TURNUS, a unified dataflow design space exploration framework for heterogeneous parallel systems. It provides high-level modelling and simulation methods and tools for system-level performance estimation and optimization. TURNUS represents the outcome of several years of research in the area of co-design exploration for multimedia stream applications. The presentation will demonstrate how the initial high-level abstraction of the design facilitates the use of different analysis and optimization heuristics. These guide the designer during the validation and optimization stages without requiring low-level implementations of parts of the application. Our framework currently yields exploration and optimization results in terms of algorithmic optimization, rapid performance estimation, application throughput, buffer size dimensioning, and power optimization.
ISBN: 9791092279016 (print)
One of the main concerns of evolvable and adaptive systems is the need for a training mechanism, which normally relies on a training reference and a test input. The fitness function to be optimized during the evolution (training) phase is obtained by comparing the output of the candidate systems against the reference. The adaptivity that this type of system may provide by re-evolving during operation is especially important for applications with variable runtime conditions. However, fully automated self-adaptivity poses additional problems. For instance, in some cases no such reference is available, because the changes in the environmental conditions are unknown. It then becomes difficult to identify autonomously which problem needs to be solved and, hence, which conditions are representative enough for an adequate re-evolution. In this paper, a solution that removes this dependency is presented and analyzed. The system consists of an image filter application mapped on an evolvable hardware platform, able to evolve using two consecutive frames from a camera as both test and reference images. The system is entirely mapped in an FPGA, and native dynamic and partial reconfiguration is used for evolution. It is also shown that using such images, both of them noisy, as input and reference images in the evolution phase of the system is equivalent to, or even better than, evolving the filter with offline images. The combination of both techniques results in the completely autonomous, noise type/level agnostic filtering system without reference image requirement described throughout the paper.
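The fitness evaluation described above can be illustrated with a minimal sketch. It assumes a mean absolute error fitness, a static scene, Gaussian noise, and a 3x3 mean filter as a stand-in candidate; none of these choices come from the paper, and the function names are hypothetical. The idea it shows is only that a candidate filter applied to one noisy frame can be scored against the next noisy frame, with no clean reference.

```python
import random

def mean_abs_error(output, reference):
    """Mean absolute pixel difference between two equally sized images."""
    total = 0
    count = 0
    for out_row, ref_row in zip(output, reference):
        for o, r in zip(out_row, ref_row):
            total += abs(o - r)
            count += 1
    return total / count

def identity_filter(image):
    """Candidate that leaves the frame untouched (baseline)."""
    return [row[:] for row in image]

def box_filter_3x3(image):
    """Hypothetical candidate: 3x3 mean filter with clamped borders."""
    h, w = len(image), len(image[0])
    result = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += image[yy][xx]
            result[y][x] = acc // 9
    return result

def noisy_frame(clean, rng, sigma=20):
    """Simulated camera frame: clean scene plus Gaussian noise (assumed model)."""
    return [[min(255, max(0, int(p + rng.gauss(0, sigma)))) for p in row]
            for row in clean]

def fitness(candidate, frame_a, frame_b):
    """Lower is better: filter frame_a and compare against noisy frame_b."""
    return mean_abs_error(candidate(frame_a), frame_b)

rng = random.Random(0)
clean = [[128] * 16 for _ in range(16)]   # static scene, never seen by the fitness
frame_a = noisy_frame(clean, rng)         # test input
frame_b = noisy_frame(clean, rng)         # noisy "reference"
fit_identity = fitness(identity_filter, frame_a, frame_b)
fit_box = fitness(box_filter_3x3, frame_a, frame_b)
```

Because the noise in the two frames is independent, a candidate that suppresses noise in `frame_a` tends to score better against `frame_b` than the unfiltered frame does, which is why the noisy reference can still drive evolution in this toy setting.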
Graphics Processing Unit (GPU) architectures are widely used for resource-intensive computation. Initially dedicated to imaging, vision and graphics, these architectures nowadays serve a wide range of multi-purpo...
ISBN: 9791092279016 (print)
Nowadays, computer vision algorithms have countless application domains. On the one hand, these algorithms are typically computationally demanding; on the other hand, they are often used in embedded systems, which have stringent constraints on, e.g., size or power. In this work, we present the benefits of mapping compute-intensive imaging algorithms onto programmable massively parallel processor arrays. More specifically, we propose different variants of a combined corner and edge detection algorithm, the Harris Corner Detector (HCD), map these variants onto tightly-coupled processor arrays (TCPAs), and prototype the TCPA architecture, executing the different HCD implementations, in FPGA technology. Because floating-point operations are very costly in FPGAs, we use fixed-point arithmetic in our design, and evaluate the accuracy and performance of our implementation against two state-of-the-art implementations: (a) the OpenCV library of programming functions for real-time computer vision, using 64-bit floating-point precision, and (b) a 32-bit fixed-point DSP-based embedded system. The accuracy of our work is evaluated by considering the number of corners detected. Here, our approach achieves an average error of less than 1.5% when compared with a reference implementation. Our different variants, trading accuracy for performance, are mapped to the programmable processor elements of a TCPA. Here, the fastest TCPA implementation achieves a 55 times higher frame rate than a state-of-the-art implementation of the HCD on a digital signal processor. Finally, we show how our implementation can be used in the context of a new resource-aware parallel computing paradigm, called invasive computing. Here, an application can adapt itself at run-time in order to satisfy different quality and throughput requirements.
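To make the fixed-point idea concrete, the following is a minimal integer-only sketch of the standard Harris response, R = det(M) - k * trace(M)^2. It is not the TCPA mapping from the paper: the central-difference gradients, the unweighted 3x3 window, and the constant k approximated as 10/256 (multiply, then shift) are all assumptions chosen for illustration, and `harris_response` is a hypothetical name.

```python
K_NUM, K_SHIFT = 10, 8  # assumed: k ~ 10 / 2**8 ~ 0.039, applied as multiply + shift

def harris_response(img):
    """Integer-only Harris response for a 2D list of pixel values."""
    h, w = len(img), len(img[0])
    ix = [[0] * w for _ in range(h)]
    iy = [[0] * w for _ in range(h)]
    # Central-difference gradients (assumed; Sobel kernels are a common alternative).
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = img[y][x + 1] - img[y][x - 1]
            iy[y][x] = img[y + 1][x] - img[y - 1][x]
    resp = [[0] * w for _ in range(h)]
    for y in range(2, h - 2):
        for x in range(2, w - 2):
            sxx = syy = sxy = 0
            # Structure tensor entries summed over an unweighted 3x3 window.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx = ix[y + dy][x + dx]
                    gy = iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            # Python integers do not overflow; hardware would need explicit bit widths.
            resp[y][x] = det - ((trace * trace * K_NUM) >> K_SHIFT)
    return resp

# Synthetic test image: a bright square whose top-left corner sits at row 4, column 4.
image = [[100 if (y >= 4 and x >= 4) else 0 for x in range(12)] for y in range(12)]
response = harris_response(image)
corner_value = response[4][4]   # at the corner of the square
edge_value = response[8][4]     # on the vertical edge
flat_value = response[2][2]     # in the flat background
```

In this toy image the response is positive at the corner, negative along the straight edge, and zero in the flat region, which is the behaviour a corner threshold relies on.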
This paper analyzes the application of different machine learning techniques to objective Image Quality Assessment (IQA), and proposes a Field Programmable Gate Array (FPGA) implementation of the final model generated by one of these techniques. The TID2013 quality database used to construct the models contains, for each image, a set of independent variables (quality metrics) and the human-rated Mean Opinion Score (MOS). The first step in the modeling process is the selection of an accurate set of image metrics to be used as the input data of the model. The selected metrics, together with the MOS, are fed to the machine learning methods to produce the final models. Different machine learning methods are evaluated and their image quality prediction performance is compared. The proposed methods consist of two classification techniques (Linear Discriminant Analysis and k-Nearest Neighbors) and four nonlinear regression approaches (Artificial Neural Network, Non-Linear Polynomial Regression, decision tree and fuzzy logic). Both the stability and the robustness of the designed models are evaluated using a variant of Monte-Carlo cross validation (MCCV) with 1000 randomly chosen validation sets. The simulation results demonstrate that the fuzzy logic model has the most stable behavior and the best agreement with human visual perception. The implemented models are therefore the final models produced by fuzzy logic modeling using Gaussian and Generalized Bell membership functions. The proposed implementation targets a Kintex-7 FPGA and uses the Xilinx Vivado and Vivado HLS tools.
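The Monte-Carlo cross validation step can be sketched as repeated random train/validation splits whose scores are then summarised. The abstract only says that a variant of MCCV with 1000 validation sets is used, so everything else here is assumed: the 20% validation fraction, the RMSE score, the synthetic data, and a one-variable least-squares line standing in for the actual models.

```python
import random
import statistics

def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept (stand-in for a real model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def rmse(model, xs, ys):
    """Root mean squared prediction error of the fitted line."""
    slope, intercept = model
    squared = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(squared)) ** 0.5

def monte_carlo_cv(xs, ys, n_splits=1000, val_fraction=0.2, seed=0):
    """Score the model on n_splits independently drawn validation sets."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_val = max(1, int(len(xs) * val_fraction))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(indices)  # a fresh random split each round
        val, train = indices[:n_val], indices[n_val:]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(rmse(model, [xs[i] for i in val], [ys[i] for i in val]))
    return scores

# Synthetic stand-in for (metric, MOS) pairs: a noisy linear relation.
data_rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [2 * x + 1 + data_rng.gauss(0, 0.1) for x in xs]

scores = monte_carlo_cv(xs, ys)
mean_score = statistics.fmean(scores)    # average validation error
score_spread = statistics.stdev(scores)  # spread across splits, a stability indicator
```

A small spread across the splits indicates that the model's error does not depend strongly on which samples were held out, which is one way to read "stability" in this context.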
ISBN: 9788362065301 (print)
The paper presents a new method of vehicle speed estimation using image data processing. The presented method converts greyscale input images into binary form. The conversion is based on small gradients in the input images. The contents of the resulting binary images correspond to the traffic scenes shown in the input images. Vehicle speed is estimated from the difference between the ordinal numbers of the relevant images in the input sequence. These ordinal image numbers are determined according to the changes of state of the initial detection field and the final detection field. The state of each detection field is determined by analysis of its features. Changes in the features of the detection fields are caused by passing vehicles. Experimental results of vehicle speed estimation are provided.
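The arithmetic behind the frame-number approach can be sketched as follows, under the assumption that the frame rate and the real-world distance between the two detection fields are known. Both values, the boolean field states, and the function names are hypothetical; the abstract does not give them.

```python
def first_change(states):
    """Ordinal number of the first frame whose field state differs from frame 0."""
    for index, state in enumerate(states):
        if state != states[0]:
            return index
    return None

def speed_kmh(n_initial, n_final, frame_rate_hz, field_distance_m):
    """Speed from the frame-number difference between the two detection fields."""
    elapsed_s = (n_final - n_initial) / frame_rate_hz
    if elapsed_s <= 0:
        raise ValueError("final field must change state after the initial field")
    return field_distance_m / elapsed_s * 3.6  # m/s to km/h

# Hypothetical per-frame states (True = vehicle present in the field).
initial_field = [False] * 100 + [True] * 20 + [False] * 30
final_field = [False] * 112 + [True] * 20 + [False] * 18

n_initial = first_change(initial_field)
n_final = first_change(final_field)
# Assumed setup: 25 frames per second, fields 10 m apart.
estimated = speed_kmh(n_initial, n_final, frame_rate_hz=25.0, field_distance_m=10.0)
```

With these assumed values the vehicle needs 12 frames (0.48 s) to travel 10 m, which corresponds to about 75 km/h. The resolution of the estimate is limited by the frame rate, since the frame difference is an integer.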
ISBN: 9781509038008 (print)
Multipliers are basic building blocks of many arithmetic logic units, digital signal processors, coding theory units, communication systems, image processing systems, etc. High-speed, power-efficient multipliers are therefore essential for high-performance processing units. The speed of a multiplier is limited by the propagation delay of its adders, so the design of efficient adders is critical in high-performance multipliers. Vedic multipliers are among the fastest multipliers and have recently attracted attention. In this paper, we propose a power-delay efficient design of a Vedic multiplier using adaptable Manchester Carry Chain (MCC) adders in a hierarchical approach. The proposed Vedic multiplier design using MCC adders is evaluated and analyzed in terms of power, delay and area in a standard 45 nm CMOS technology in CADENCE. The proposed design has a lower power-delay product than existing Vedic multiplier architectures.
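A hierarchical Vedic multiplier is commonly described as a 2x2 base block composed recursively: an N-bit product is built from four N/2-bit products that are then combined by adders. The sketch below is a behavioural model of that common textbook construction, not the circuit proposed in the paper. In particular, the ordinary Python `+` stands in for the adder stage, where the paper uses Manchester Carry Chain adders, and the function names are hypothetical.

```python
def vedic_2x2(a, b):
    """2-bit by 2-bit product from AND gates and two half adders."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    p0 = a0 & b0                     # vertical product of the low bits
    t1, t2 = a1 & b0, a0 & b1        # crosswise products
    p1, c1 = t1 ^ t2, t1 & t2        # half adder on the crosswise terms
    t3 = a1 & b1                     # vertical product of the high bits
    p2, p3 = t3 ^ c1, t3 & c1        # half adder absorbing the carry
    return p0 | (p1 << 1) | (p2 << 2) | (p3 << 3)

def vedic_multiply(a, b, width):
    """Hierarchical product of two unsigned width-bit operands.

    width must be a power of two and at least 2.
    """
    if width < 2 or width & (width - 1):
        raise ValueError("width must be a power of two >= 2")
    if width == 2:
        return vedic_2x2(a, b)
    half = width // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    # Four half-width multiplier blocks.
    ll = vedic_multiply(a_lo, b_lo, half)
    lh = vedic_multiply(a_lo, b_hi, half)
    hl = vedic_multiply(a_hi, b_lo, half)
    hh = vedic_multiply(a_hi, b_hi, half)
    # Adder stage: this is where the adder delay dominates in hardware.
    return ll + ((lh + hl) << half) + (hh << width)

product = vedic_multiply(200, 123, 8)
```

The model makes the abstract's point visible: every level of the hierarchy adds an adder stage on the critical path, so the adder design largely determines the delay of the whole multiplier.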