Developing FPGA-based, network-enabled embedded systems in register-transfer-level (RTL) hardware description languages is tedious. Although numerous EDA tools automate parts of this process, no well-established design patterns exist. Moreover, the entire production cycle requires theoretical background and hardware design intuition from the developer, which discourages the software community. To improve productivity and minimize time to market when assembling such systems, we propose a new hardware/software co-design approach to building reconfigurable hardware web services. The proposed integrated development platform features a programmable FPGA board on which computations of different nature and purpose are logically distributed among a sequential soft-core processor program, a massively parallel accelerator, and an independent communication module that handles remote clients' requests. Our second contribution is a set of tools that make the development of such services essentially a software design undertaking, with extensive use of high-level programming languages. The platform has been tuned to act as a flexible runtime environment for image-processing services, thus providing the functionality of an intelligent camera. Two example services built from scratch according to the new methodology are discussed. The observed reduction in development time and the significant performance gain demonstrate the validity of the proposed approach and reveal the large potential of the assembled prototype.
In this paper, an autonomous, application-specific system architecture, called KYDON, is presented. The KYDON architecture consists of k layers of array processors. The lowest four layers compose KYDON's low-level image-processing group, and the remaining layers compose the higher-level processing groups. The interconnectivity at each array processor is based on a full hexagonal mesh structure. Each processing element (PE) of an array processor is a simple autonomous unit with its own control unit (CU). This paper deals with the internal structural design of the PEs at the lower layers. It also describes the low-level image-processing tasks performed by KYDON's lower array processors. More specifically, the lowest layer is a 2-D photoarray, which captures images from the environment. The next three layers perform image processing and generate the graphic forms of the objects extracted from the input image. The higher layers process the graphs provided by the lower layers in order to achieve an understanding of the input image. An important feature of KYDON is that it has no host computer or control processor to handle I/O and other miscellaneous tasks.
In the last two decades, we have seen remarkable development of image-processing techniques targeted at medical applications. We propose multi-GPU-based parallel real-time algorithms for segmentation and shape-based object detection, aiming to accelerate two medical image-processing methods: automated blood detection in wireless capsule endoscopy (WCE) images and automated bright lesion detection in retinal fundus images. In the former method, we identified segmentation and object detection as responsible for most of the overall processing time. In the latter, where segmentation was not used, shape-based object detection was identified as the compute-intensive task. Experimental results show that the accelerated method for blood detection in WCE images, running on multi-GPU systems, is on average 265 times faster than the original CPU version and is able to process 344 frames per second. By applying the multi-GPU framework to bright lesion detection in fundus images, we are able to process 62 frames per second, an average speedup of 667 times over the equivalent CPU version.
Active contour models (snakes) are commonly used for locating the boundary of an object in computer vision applications. The minimisation procedure is the key problem to solve in the technique of active contour models. In this paper, a minimisation method for an active contour model using Hopfield networks is proposed. Owing to its network structure, the method lends itself well to parallel implementation and is potentially faster than conventional methods. In addition, it retains the stability of the snake model and the possibility of including hard constraints. Experimental results are given to demonstrate the feasibility of the proposed method in applications of industrial pattern recognition and medical image processing.
Feature extraction is an important vision task in many applications, such as simultaneous localization and mapping (SLAM). In recent computing systems, FPGA-based acceleration has become a strong competitor to GPU-based acceleration owing to its high computation capabilities and lower energy consumption. In this paper, we present a high-level synthesis implementation on a SoC-FPGA of a feature extraction algorithm dedicated to SLAM applications. We choose the HOOFR extraction algorithm, which provides robust performance but requires significant computation on an embedded CPU. Because our system is dedicated to SLAM applications, we also integrated a bucketing detection method in order to obtain a homogeneous distribution of keypoints in the image. Moreover, instead of optimizing performance by simplifying the original algorithm, as in many other studies, we preserved the complexity of the HOOFR extractor and parallelized the processing operations. The design has been validated on an Intel Arria 10 SoC-FPGA with a throughput of 54 fps at 1226 x 370 pixels (handling 1750 features) or 14 fps at 1920 x 1080 pixels (handling 6929 features).
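As a rough illustration of what keypoint bucketing can look like, the following minimal Python sketch splits the image into a grid and keeps only the strongest few keypoints in each cell. It is not the HOOFR implementation described in the abstract: the function name `bucket_keypoints`, the `(x, y, score)` tuple layout, and the grid and per-cell parameters are all assumptions made for illustration.

```python
from collections import defaultdict


def bucket_keypoints(keypoints, width, height, cols, rows, per_cell):
    """Keep at most `per_cell` strongest keypoints in each grid cell.

    keypoints: iterable of (x, y, score) tuples (hypothetical layout),
               with 0 <= x < width and 0 <= y < height.
    """
    cell_w = width / cols
    cell_h = height / rows
    cells = defaultdict(list)
    for x, y, score in keypoints:
        # Clamp so that points on the far border fall into the last cell.
        col = min(int(x / cell_w), cols - 1)
        row = min(int(y / cell_h), rows - 1)
        cells[(row, col)].append((x, y, score))

    kept = []
    for key in sorted(cells):
        # Strongest responses first, then truncate to the per-cell budget.
        strongest = sorted(cells[key], key=lambda kp: kp[2], reverse=True)
        kept.extend(strongest[:per_cell])
    return kept
```

Capping the count per cell is what spreads the surviving keypoints across the image: a highly textured region can no longer consume the whole feature budget.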
The main result of this paper shows that the block-based digital medial axis transform can be computed in parallel by a constant number of calls to scan (parallel prefix) operations. This gives time- and/or work-optimal parallel implementations for the distance-based and the block-based medial axis transform on a wide variety of parallel architectures. Since only eight scan operations plus a dozen local operations are performed, the algorithm is very easy to program and use. The originality of our approach lies in the use of the notion of a derived grid and the oversampling of the image in order to reduce the computation of the block-based medial axis transform in the original grid to the much easier task of computing the distance-based medial axis transform of the oversampled image on the derived grid.
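For readers unfamiliar with the scan primitive, here is a minimal Python sketch of an inclusive scan in the Hillis-Steele style, followed by one illustrative use: a max-scan that gives, for every pixel of a binary row, the distance to the nearest background pixel at or to its left. This is only a building-block example under assumed names (`inclusive_scan`, `distance_to_left_background`); it is not the paper's medial axis algorithm, and the sequential list comprehension merely simulates what would be one data-parallel step per round.

```python
def inclusive_scan(values, op):
    """Inclusive scan with an associative `op`, in about log2(n) rounds."""
    result = list(values)
    step = 1
    while step < len(result):
        # Within a round, every element depends only on the previous round,
        # so all positions could be updated simultaneously on parallel hardware.
        result = [
            result[i] if i < step else op(result[i - step], result[i])
            for i in range(len(result))
        ]
        step *= 2
    return result


def distance_to_left_background(row):
    """Distance from each pixel to the nearest 0 at or left of it (None if absent)."""
    # Mark background pixels with their index, everything else with -1.
    marks = [i if pixel == 0 else -1 for i, pixel in enumerate(row)]
    # A max-scan propagates the index of the most recent background pixel.
    last_zero = inclusive_scan(marks, max)
    return [i - z if z >= 0 else None for i, z in enumerate(last_zero)]
```

For example, `inclusive_scan([1, 2, 3, 4], lambda a, b: a + b)` yields the running sums `[1, 3, 6, 10]`.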
Detection of spatial symmetry is useful in several computer vision applications. Because these applications are real-time in nature, it is important that symmetry detection algorithms be computationally efficient. Sequential algorithms for finding various types of planar symmetries in images are CPU-intensive, prompting us to look for fast parallel implementations. In this paper, we propose parallel algorithms for symmetry detection and report an implementation in a distributed computing environment consisting of a network of Sun workstations. Experiments revealed close-to-linear speedup on the test images considered.
A new approach for using the Hough transform to detect line segments is presented. This approach is efficient in both space and time. Strategies combining the features of the intersection point [Ben-Tzvi, Leavers and Sandler, Proc. 5th Intl. Conf. Image Anal. 152-159 (1990); Xu, Oja and Kultanen, Pattern Recognition Lett. 11, 331-338 (1990)] and dual plane [Conker, Comput. Vis. Graphics Image Process. 43, 115-132 (1988)] methods are used to calculate the Hough transform. A dense set of small overlapping windows is used to restrict the pairs of image pixels that are evaluated. Experimental results indicate that this method reduces the time and space requirements significantly.
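The idea of voting with pixel pairs restricted to a small neighbourhood can be sketched as follows. This Python example is a simplified stand-in, not the method of the paper: it replaces the dense set of overlapping windows with a plain distance test between the two pixels, it does not use the dual-plane parameterization, and the function name `windowed_pair_hough`, the one-degree angle bins, and the `rho_step` parameter are assumptions. It expects integer pixel coordinates.

```python
import math
from collections import Counter


def windowed_pair_hough(points, window, rho_step=1.0):
    """Cast one vote per pixel pair lying within `window` pixels of each other.

    points: list of (x, y) integer edge-pixel coordinates.
    Returns a Counter mapping (theta_degrees, rho_bin) -> number of votes,
    where the line is x*cos(theta) + y*sin(theta) = rho.
    """
    votes = Counter()
    for i, (x1, y1) in enumerate(points):
        for x2, y2 in points[i + 1:]:
            dx, dy = x2 - x1, y2 - y1
            # Skip duplicates and pairs farther apart than the window size.
            if (dx == 0 and dy == 0) or max(abs(dx), abs(dy)) > window:
                continue
            # Angle of the line's normal (-dy, dx), folded into [0, pi).
            theta = math.atan2(dx, -dy) % math.pi
            rho = x1 * math.cos(theta) + y1 * math.sin(theta)
            theta_bin = round(math.degrees(theta)) % 180
            votes[(theta_bin, round(rho / rho_step))] += 1
    return votes
```

Because each pair determines a single line, every pair casts exactly one vote instead of tracing a full sinusoid per pixel, and the window test bounds the number of pairs examined; the accumulator can therefore be stored sparsely, as the `Counter` does here.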
A new, to our knowledge, subtracted joint transform correlator (SJTC) is proposed that requires no digital processing in a computer. All processing for obtaining correlation signals between an object and multiple reference patterns was performed optically by use of a joint transform correlator with a holographic interferometer similar to the Mach-Zehnder one. The joint power spectrum of the reference patterns was subtracted from that of the input image (the object pattern plus the reference patterns), and the spurious correlation signals between the different reference patterns were removed. Because the Fourier spectra and the subtraction are computed optically in parallel, a real-time SJTC can be achieved by use of only an optical system. An experimental arrangement of the system and its performance in terms of shift invariance and discriminability are described. The results show the good performance of this system. (C) 1997 Optical Society of America.
In this paper we use and extend a parallel optoelectronic processor for image preprocessing and implement software tools for testing and evaluating the presented algorithms. After briefly introducing the processor and showing how images can be stored in it, we adapt a number of local image preprocessing algorithms for smoothing, edge detection, and corner detection so that they can be executed on the processor in parallel. These algorithms are performed on all pixels of the input image in parallel and, as a result, in a number of steps independent of the image dimensions. We also develop a compiler and a simulator for evaluating and verifying the correctness of our implementations. (C) 2015 Elsevier Ltd. All rights reserved.
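To see why a local operator needs a number of steps that does not grow with image size, consider this minimal Python sketch of 3x3 mean smoothing. It is an illustrative software analogue, not code for the optoelectronic processor; the names `smooth_pixel` and `smooth` and the border handling (averaging only the neighbours that exist) are assumptions.

```python
def smooth_pixel(image, r, c):
    """Mean of the 3x3 neighbourhood around (r, c), clipped at the image border."""
    rows, cols = len(image), len(image[0])
    neighbours = [
        image[rr][cc]
        for rr in range(max(r - 1, 0), min(r + 2, rows))
        for cc in range(max(c - 1, 0), min(c + 2, cols))
    ]
    return sum(neighbours) / len(neighbours)


def smooth(image):
    """Apply smooth_pixel to every pixel of a list-of-lists grayscale image."""
    # Each output pixel reads only the input image, so all calls are independent
    # and could run at the same time, one per processing element.
    return [
        [smooth_pixel(image, r, c) for c in range(len(image[0]))]
        for r in range(len(image))
    ]
```

Each output value touches at most nine input pixels, so with one processing element per pixel the work per element is constant regardless of the image dimensions.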