This paper presents a study of the design space of a Support Vector Machine (SVM) classifier with a linear kernel running on a manycore MPPA (Massively Parallel Processor Array) platform. This architecture gathers 256 cores distributed across 16 clusters working in parallel. The study aims to implement a real-time hyperspectral SVM classifier, where real-time is defined as the time required to capture a hyperspectral image. To do so, two aspects of the SVM classifier have been analyzed: the classification algorithm and the system parallelization. On the one hand, concerning the classification algorithm, the classification model has first been optimized to fit the MPPA structure and, second, a probability estimation stage has been included to refine the classification results. On the other hand, the system parallelization has been addressed at two levels: first, the parallelism of the classification has been exploited by taking advantage of the pixel-wise classification methodology supported by the SVM algorithm and, second, a double-buffer communication procedure has been implemented to overlap the image transmission and the cluster classification stages. Experimenting with medical images, an average speedup of 9 has been obtained using a single-cluster, double-buffer implementation with 16 cores working in parallel. As a result, a system whose processing time grows linearly with the number of pixels composing the scene has been implemented. Specifically, only 3 µs are required to process each pixel within the captured scene, independently of the spatial resolution of the image. (C) 2017 Elsevier B.V. All rights reserved.
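The pixel-wise nature of the linear-kernel classification is what enables the per-cluster parallelization: each pixel is scored independently with the same weight vector. As a rough illustration only (not the paper's actual code; all names, the data layout and the single-precision types are assumptions), a per-core slice of the image could be classified as follows:

    // Minimal sketch of pixel-wise classification with a linear SVM model.
    // The weight vector `w` and bias `b` are assumed to come from an offline
    // training stage; names and data layout are illustrative only.
    #include <vector>
    #include <cstddef>

    struct LinearSvmModel {
        std::vector<float> w;  // one weight per spectral band
        float b;               // bias term
    };

    // Classify one hyperspectral pixel (a vector of band values) by the sign
    // of the decision function f(x) = w.x + b.
    int classify_pixel(const LinearSvmModel& m, const float* pixel) {
        float score = m.b;
        for (std::size_t k = 0; k < m.w.size(); ++k)
            score += m.w[k] * pixel[k];
        return score >= 0.0f ? 1 : -1;
    }

    // Each core can run this loop over its own slice of the image, since
    // pixels are classified independently of one another.
    void classify_slice(const LinearSvmModel& m, const float* pixels,
                        std::size_t n_pixels, std::size_t n_bands, int* labels) {
        for (std::size_t i = 0; i < n_pixels; ++i)
            labels[i] = classify_pixel(m, pixels + i * n_bands);
    }

Because classify_slice has no dependencies between pixels, the 16 cores of a cluster can each process one slice while the double buffer brings in the next block of pixels.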
ISBN (print): 9791092279016
One of the main concerns of evolvable and adaptive systems is the need for a training mechanism, which is normally realized by using a training reference and a test input. The fitness function to be optimized during the evolution (training) phase is obtained by comparing the output of the candidate systems against the reference. The adaptivity that this type of system may provide by re-evolving during operation is especially important for applications with runtime-variable conditions. However, fully automated self-adaptivity poses additional problems. For instance, in some cases it is not possible to have such a reference, because the changes in the environment conditions are unknown, so it becomes difficult to autonomously identify which problem needs to be solved and, hence, which conditions should be representative for an adequate re-evolution. In this paper, a solution to remove this dependency is presented and analyzed. The system consists of an image filter application mapped on an evolvable hardware platform, able to evolve using two consecutive frames from a camera as both test and reference images. The system is entirely mapped in an FPGA, and native dynamic and partial reconfiguration is used for evolution. It is also shown that using such images, both of them being noisy, as input and reference images in the evolution phase of the system is equivalent to, or even better than, evolving the filter with offline images. The combination of both techniques results in a completely autonomous filtering system, agnostic to noise type and level and requiring no reference image, as described throughout the paper.
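The key idea is that a candidate filter's fitness can be computed without a clean reference by comparing the filtered version of one frame against the next, equally noisy, frame. A minimal sketch of such a frame-to-frame fitness measure is given below; the sum-of-absolute-differences metric and all names are illustrative assumptions, not the platform's actual fitness function:

    // Minimal sketch of the frame-to-frame fitness idea: the candidate filter
    // processes frame t, and the (also noisy) frame t+1 serves as the reference.
    // Function and type names are illustrative, not the paper's actual API.
    #include <cstdint>
    #include <cstdlib>
    #include <cstddef>

    // Lower is better: sum of absolute differences between the filtered
    // frame t and the next captured frame, used as the evolutionary fitness.
    std::uint64_t frame_pair_fitness(const std::uint8_t* filtered_frame_t,
                                     const std::uint8_t* raw_frame_t1,
                                     std::size_t n_pixels) {
        std::uint64_t sad = 0;
        for (std::size_t i = 0; i < n_pixels; ++i)
            sad += static_cast<std::uint64_t>(
                std::abs(static_cast<int>(filtered_frame_t[i]) -
                         static_cast<int>(raw_frame_t1[i])));
        return sad;
    }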
Graphics Processing Unit (GPU) architectures are widely used for resource-intensive computation. Initially dedicated to imaging, vision and graphics, these architectures nowadays serve a wide range of multi-purpo...
ISBN (print): 9791092279016
Nowadays, computer vision algorithms have countless application domains. On the one hand, these algorithms are typically computationally demanding; on the other hand, they are often used in embedded systems, which have stringent constraints on, e.g., size or power. In this work, we present the benefits of mapping compute-intensive imaging algorithms onto programmable massively parallel processor arrays. More specifically, we propose different variants of a combined corner and edge detection algorithm, the Harris Corner Detector (HCD), map these variants onto tightly-coupled processor arrays (TCPAs), and prototype the TCPA architecture, executing the different HCD implementations, in FPGA technology. Because floating-point operations are very costly in FPGAs, we use fixed-point arithmetic in our design, and evaluate our implementation in terms of accuracy and performance against two state-of-the-art implementations: (a) the OpenCV library of programming functions for real-time computer vision, using 64-bit floating-point precision, and (b) a 32-bit fixed-point DSP-based embedded system. The accuracy of our work is evaluated by considering the number of corners detected. Here, our approach achieves an average error of less than 1.5% when compared with a reference implementation. Our different variants, trading accuracy for performance, are mapped to the programmable processor elements of a TCPA. Here, the fastest TCPA implementation achieves a frame rate 55 times higher than a state-of-the-art implementation of the HCD on a digital signal processor. Finally, we show how our implementation can be used in the context of a new resource-aware parallel computing paradigm, called invasive computing. Here, an application can adapt itself at run-time in order to satisfy different quality and throughput requirements.
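For readers unfamiliar with the HCD, the per-window corner response that would be computed in fixed point is R = det(M) - k*trace(M)^2, where M accumulates windowed products of the image gradients. The sketch below illustrates one way this can look in integer arithmetic; the Q24.8 format, the value of k and the function name are assumptions, not the bit widths actually used on the TCPA:

    // Fixed-point sketch of the Harris corner response
    // R = det(M) - k * trace(M)^2. Bit widths and the Q-format are
    // illustrative choices only.
    #include <cstdint>

    constexpr int kFracBits = 8;  // Q24.8 fixed-point format
    constexpr int32_t kHarrisK =
        static_cast<int32_t>(0.04 * (1 << kFracBits));  // k ~= 0.04

    // Sxx, Syy, Sxy are window sums of Ix*Ix, Iy*Iy, Ix*Iy (already in Q24.8).
    int64_t harris_response_fixed(int64_t Sxx, int64_t Syy, int64_t Sxy) {
        int64_t det    = (Sxx * Syy - Sxy * Sxy) >> kFracBits;
        int64_t trace  = Sxx + Syy;
        int64_t trace2 = (trace * trace) >> kFracBits;
        return det - ((kHarrisK * trace2) >> kFracBits);
    }

Pixels whose response exceeds a threshold (after non-maximum suppression) are reported as corners, which is the quantity the accuracy comparison above is based on.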
This paper analyzes the application of different machine learning techniques for objective image Quality Assessment (IQA), and proposes an implementation on a Field Programmable Gate Array (FPGA) of the final model generated by one of these techniques. The TID2013 quality database used for model construction contains a set of independent variables (quality metrics) and human-rated Mean Opinion Score (MOS) values extracted from the images. The first step in the modeling process deals with the selection of an accurate set of image metrics that are used as the input data of the model. The selected input metric data are used together with the MOS as inputs to the machine learning methods to produce the final models. Different machine learning methods are evaluated and their performances in terms of image quality prediction are compared. The proposed methods consist of two classification techniques (Linear Discriminant Analysis and k-Nearest Neighbors) and four nonlinear regression approaches (Artificial Neural Network, Non-Linear Polynomial Regression, decision tree and fuzzy logic). Both the stability and the robustness of the designed models are evaluated by using a variant of Monte-Carlo cross validation (MCCV) with 1000 randomly chosen validation sets. The simulation results demonstrate that the fuzzy logic model has the most stable behavior and the best agreement with human visual perception. Thus, the implemented models consist of the final models produced by fuzzy logic modeling using Gaussian and Generalized Bell membership functions. The proposed implementation is done on a Kintex-7 FPGA using the Xilinx Vivado and Vivado HLS tools.
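As a point of reference for the two membership-function shapes named above, the standard Gaussian and generalized bell definitions are sketched below; the parameterization follows common fuzzy-logic textbook conventions and does not reflect the paper's fitted parameter values:

    // Standard membership-function shapes used in fuzzy-logic models.
    #include <cmath>

    // Gaussian membership: exp(-(x - c)^2 / (2 * sigma^2)),
    // centered at c with width sigma.
    double gaussian_mf(double x, double c, double sigma) {
        double d = (x - c) / sigma;
        return std::exp(-0.5 * d * d);
    }

    // Generalized bell membership: 1 / (1 + |(x - c) / a|^(2b)),
    // where a controls the width, b the slope, and c the center.
    double gbell_mf(double x, double a, double b, double c) {
        return 1.0 / (1.0 + std::pow(std::fabs((x - c) / a), 2.0 * b));
    }

In the fuzzy model, each selected quality metric is mapped through such membership functions before the rule base combines them into a predicted MOS, which is what the HLS implementation has to evaluate per image.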
Computing image processing algorithms on blocks or tiles is a solution to achieve efficient parallelization on multi-processor systems. In this session, a GPU implementation of the Block-Matching and 3-D filtering (BM3D) denoising algorithm is compared to a CPU implementation. In addition, a method to determine optimal 2D image tile sizes using constraint programming is implemented on a multi-core processor to optimize system performance.
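To make the block/tile idea concrete, a generic tile-traversal sketch is shown below; it only illustrates how an image can be split into tiles for per-tile processing and does not stand in for the BM3D kernel or the constraint-programming tile-size search (the template, names and the hypothetical per-tile callback are assumptions):

    // Generic tile traversal: visit the image in tiles of tile_w x tile_h,
    // clamping the last tiles at the image borders. Each tile can be
    // dispatched to a different core or GPU block.
    #include <algorithm>
    #include <cstddef>

    template <typename PerTileFn>
    void for_each_tile(std::size_t width, std::size_t height,
                       std::size_t tile_w, std::size_t tile_h, PerTileFn fn) {
        for (std::size_t y = 0; y < height; y += tile_h)
            for (std::size_t x = 0; x < width; x += tile_w)
                fn(x, y, std::min(tile_w, width - x), std::min(tile_h, height - y));
    }

    // Example usage with a hypothetical per-tile denoising kernel:
    // for_each_tile(w, h, 64, 64,
    //     [&](std::size_t x, std::size_t y, std::size_t tw, std::size_t th) {
    //         denoise_tile(image, x, y, tw, th);
    //     });

The tile dimensions passed to such a routine are exactly the quantities that the constraint-programming step mentioned above would choose to balance cache usage and parallel load.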
Image processing algorithms in today's electronics market demand increasing computation capabilities at a limited power budget. Modern applications in the robotics and cyber-physical systems domains require image acquisition, analysis and information extraction to be executed on embedded low-power, portable and autonomous devices, calling for novel architecture solutions to be studied and implemented. The papers in this session present three design cases targeting cutting-edge applications, exploiting heterogeneous embedded computing platforms.