ISBN (print): 9780769530895
The correlation between two signals (cross-correlation) is a standard approach to feature detection. Its normalized form, the normalized correlation coefficient, is used in particular for template matching; here, the two-dimensional correlation of images is considered. One of its biggest drawbacks is its high computational cost, especially when many correlation coefficients must be computed. This paper presents a new method for high-performance, thread- and data-parallel computation of normalized cross-correlation in the spatial domain. It is shown that a speedup of up to 5 can be achieved solely through sophisticated programming of the SIMD unit of a standard microprocessor. Furthermore, the new data-parallel implementation in the spatial domain can even outperform an (also data-parallel) frequency-domain implementation.
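A minimal sketch of the quantity being computed, assuming images are small lists of rows: the normalized correlation coefficient for every placement of the template inside the image, evaluated directly in the spatial domain. This is scalar Python, not the paper's SIMD or multithreaded code; the per-window multiply-accumulate loop is simply the kind of independent arithmetic a SIMD unit can process several elements at a time. The function name `ncc_map` is hypothetical.

```python
import math


def ncc_map(image, template):
    """Normalized correlation coefficient for every 'valid' placement of
    `template` inside `image` (both given as lists of equal-length rows).

    Returns a 2-D list of values in [-1, 1]; placements where the image
    window or the template has zero variance are reported as 0.0.
    """
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    n = th * tw
    # The zero-mean template and its energy are computed once and reused.
    t_mean = sum(sum(row) for row in template) / n
    t0 = [[v - t_mean for v in row] for row in template]
    t_energy = sum(v * v for row in t0 for v in row)

    result = []
    for y in range(ih - th + 1):
        out_row = []
        for x in range(iw - tw + 1):
            window = [image[y + j][x:x + tw] for j in range(th)]
            w_mean = sum(sum(r) for r in window) / n
            num = 0.0       # sum of (window - mean) * (template - mean)
            w_energy = 0.0  # sum of (window - mean)^2
            for j in range(th):
                for i in range(tw):
                    d = window[j][i] - w_mean
                    num += d * t0[j][i]
                    w_energy += d * d
            denom = math.sqrt(w_energy * t_energy)
            out_row.append(num / denom if denom > 0 else 0.0)
        result.append(out_row)
    return result
```

The best match is the placement with the largest coefficient; because the coefficient is normalized, it is insensitive to uniform brightness and contrast changes of the window.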
Manual methods of measuring defects in roads show poor repeatability and reproducibility. Cracking is a principal indicator of defect progression in road pavements, and the authors' overall objective is to develop a practical, automatic, repeatable, and reproducible method of determining the extent of cracking. Their research uses a distributed array of processors to achieve practical speeds when processing digitized images of road surfaces to detect cracks. The algorithms described here provide two processes. The first converts a gray-scale image into a binary image that represents most of the cracks and eliminates most of the noise from the surface texture; this initial screening might suffice for the bulk of a road that has few cracks. The second combines the crack fragments in the binary image into continuous cracks and gives the highway engineer an appropriate output. The article includes results in which individual images were judged to contain or not contain cracks both by eight independent observers and by processing on the DAP (distributed array processor) up to the end of the initial screening process. The authors found that single images can be processed to the initial screening stage within the 40-millisecond limit for real-time processing set by the British TV standard.
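The abstract does not say how the first (screening) process binarizes the image, so the following is only a hypothetical stand-in that illustrates the idea of keeping crack pixels while dropping texture noise. It assumes cracks are darker than the surrounding surface: a pixel becomes a candidate if it lies more than `k` standard deviations below the image mean, and candidates with too few candidate neighbours are discarded as isolated noise. The function name and both parameters are assumptions.

```python
import statistics


def screen_cracks(gray, k=1.5, min_neighbors=1):
    """Hypothetical initial screening: gray-scale image -> binary crack mask.

    Assumes cracks are darker than the surface texture. A pixel is a crack
    candidate if it is more than `k` standard deviations below the image
    mean; candidates with fewer than `min_neighbors` candidate 8-neighbours
    are dropped as isolated texture noise.
    """
    h, w = len(gray), len(gray[0])
    values = [v for row in gray for v in row]
    threshold = statistics.fmean(values) - k * statistics.pstdev(values)
    cand = [[1 if gray[y][x] < threshold else 0 for x in range(w)] for y in range(h)]

    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not cand[y][x]:
                continue
            # Count candidate pixels in the 3x3 neighbourhood, excluding the centre.
            neighbours = sum(
                cand[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
                if (yy, xx) != (y, x)
            )
            if neighbours >= min_neighbors:
                mask[y][x] = 1
    return mask
```

Each output pixel depends only on a small neighbourhood, which is what makes this kind of screening a good fit for an array of processors that each own a patch of the image.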
Fast image compression is a requirement in many applications such as TV production, teleconferencing, and digital cinema. Many of the algorithms employed in current image compression standards are inherently sequential. High-performance implementations of such algorithms often require specialized hardware such as field-programmable gate arrays (FPGAs). Graphics processing units (GPUs) do not commonly achieve high performance on these algorithms because the algorithms do not exhibit fine-grain parallelism. Our previous work introduced a new core algorithm for wavelet-based image coding systems, tailored to massively parallel architectures, called bitplane coding with parallel coefficient processing (BPC-PaCo). This paper introduces the first high-performance, GPU-based implementation of BPC-PaCo. A detailed analysis of the algorithm guides its implementation on the GPU. The main insights behind the proposed codec are an efficient thread-to-data mapping, smart memory management, and efficient cooperation mechanisms that enable inter-thread communication. Experimental results indicate that the proposed implementation meets the requirements of high-resolution (4K) digital cinema in real time, yielding speedups of 30x with respect to the fastest implementations of current compression standards. In addition, a power consumption evaluation shows that, for equivalent performance, our implementation consumes 40x less energy than state-of-the-art methods.
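To make "bitplane coding" concrete, here is a minimal sketch of the underlying decomposition only: quantized wavelet coefficients are split into a sign and magnitude bitplanes, most significant plane first. Within one plane, every coefficient's bit is computed independently, which is the kind of per-coefficient parallelism a GPU can exploit. It is not BPC-PaCo itself: the context modelling and entropy coding that a real bitplane coder adds are omitted, and the function names are hypothetical.

```python
def bitplanes(coeffs):
    """Split integer (quantized wavelet) coefficients into signs and magnitude
    bitplanes, most significant plane first.

    Within a plane each coefficient is handled independently, so a GPU could
    assign one coefficient per thread. Context modelling and entropy coding,
    which a real bitplane coder adds on top, are not shown.
    """
    mags = [abs(c) for c in coeffs]
    n_planes = max(mags).bit_length() if mags else 0
    signs = [1 if c < 0 else 0 for c in coeffs]
    planes = [[(m >> p) & 1 for m in mags] for p in range(n_planes - 1, -1, -1)]
    return signs, planes


def reconstruct(signs, planes, keep=None):
    """Rebuild coefficients from the first `keep` planes (all planes by default).

    Keeping fewer planes gives a coarser approximation, which is what makes
    bitplane-coded streams progressively decodable.
    """
    n_planes = len(planes)
    keep = n_planes if keep is None else keep
    mags = [0] * len(signs)
    for idx, plane in enumerate(planes[:keep]):
        weight = 1 << (n_planes - 1 - idx)
        mags = [m + bit * weight for m, bit in zip(mags, plane)]
    return [-m if s else m for m, s in zip(mags, signs)]
```

For example, `[5, -3, 0, 12]` needs four planes; decoding all of them returns the input, while decoding only the top two returns `[4, 0, 0, 12]`.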
Image matching based on image feature pixels involves heavily iterated computation and repeated memory access. In our previous work, interest point detection was reported as an efficient pre-processing step for extracting binary images, which are then matched using a distance measure. This paper presents our extension to a parallel implementation of this matching scheme for object recognition on a low-cost heterogeneous PVM (Parallel Virtual Machine) network. Most of the sequential execution time is spent on image feature extraction, the distance transform, and match measurement; our investigation shows that a distributed-memory multicomputer can best meet the high computational and memory access demands of image processing. Performance is evaluated in terms of execution time. We conclude that parallel image processing can be implemented on a general distributed system to achieve speedup without specific hardware requirements.
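The abstract names the distance transform and a distance-based match measure but not their exact form, so the sketch below uses a common stand-in, city-block chamfer matching: a two-pass distance transform of the binary scene, then, for every placement of the model's feature points, the mean distance to the nearest scene feature. All function names are hypothetical. Every placement is scored independently, which is one natural way to split the work among hosts.

```python
INF = float("inf")


def distance_transform(binary):
    """Two-pass city-block (L1) distance transform: each pixel receives the
    distance to the nearest feature pixel (value 1) in `binary`."""
    h, w = len(binary), len(binary[0])
    d = [[0 if binary[y][x] else INF for x in range(w)] for y in range(h)]
    # Forward pass, top-left to bottom-right: propagate from up and left.
    for y in range(h):
        for x in range(w):
            if y > 0:
                d[y][x] = min(d[y][x], d[y - 1][x] + 1)
            if x > 0:
                d[y][x] = min(d[y][x], d[y][x - 1] + 1)
    # Backward pass, bottom-right to top-left: propagate from down and right.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                d[y][x] = min(d[y][x], d[y + 1][x] + 1)
            if x < w - 1:
                d[y][x] = min(d[y][x], d[y][x + 1] + 1)
    return d


def chamfer_score(dist, model_points, dy, dx):
    """Mean distance from the model's feature points, shifted by (dy, dx),
    to the nearest scene feature. Lower means a better match."""
    return sum(dist[y + dy][x + dx] for y, x in model_points) / len(model_points)


def best_offset(scene, model_points, model_h, model_w):
    """Exhaustive search over placements that keep the model inside the scene.
    Returns (score, dy, dx) for the best placement."""
    dist = distance_transform(scene)
    h, w = len(scene), len(scene[0])
    return min(
        (chamfer_score(dist, model_points, dy, dx), dy, dx)
        for dy in range(h - model_h + 1)
        for dx in range(w - model_w + 1)
    )
```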
ISBN (print): 3540288694
Image processing applications are computationally demanding, and much attention has long been paid to the use of parallel processing. Emerging distributed and Grid-based architectures are new, well-suited platforms that promise to provide the required computational power. To use them, image processing has to evolve toward heterogeneous environments, where a crucial aspect is the interoperability and reuse of existing high-performance code. This paper describes our experience in developing PIMA(GE)², the parallel image processing GEnoa server, obtained by wrapping a library using the CORBA framework. Our aim is to achieve a high level of flexibility and dynamicity in the server architecture with as little overhead as possible. The design of a hierarchy of image processing operation objects and the development of the server interface are discussed.
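The abstract mentions a hierarchy of image processing operation objects behind a server interface but gives no details, so the following is purely a hypothetical illustration of that design idea in plain Python, not PIMA(GE)²'s classes or its CORBA interface. An abstract operation class is specialized into point operations, and a small server object looks operations up by name, standing in for the remote interface a client would call. Every class and method name here is an assumption.

```python
from abc import ABC, abstractmethod


class ImageOperation(ABC):
    """Root of a hypothetical hierarchy of image processing operation objects."""
    name = "abstract"

    @abstractmethod
    def apply(self, image):
        """Return a new image; `image` is a list of rows of numbers."""


class PointOperation(ImageOperation):
    """Operations that transform each pixel independently."""

    def apply(self, image):
        return [[self.pixel(v) for v in row] for row in image]

    @abstractmethod
    def pixel(self, value):
        ...


class Threshold(PointOperation):
    name = "threshold"

    def __init__(self, level):
        self.level = level

    def pixel(self, value):
        return 1 if value >= self.level else 0


class Invert(PointOperation):
    name = "invert"

    def __init__(self, max_value=255):
        self.max_value = max_value

    def pixel(self, value):
        return self.max_value - value


class OperationServer:
    """Stand-in for a remote server interface: clients name an operation and
    its parameters; the server instantiates the matching class and runs it."""

    def __init__(self, *operation_classes):
        self.registry = {cls.name: cls for cls in operation_classes}

    def execute(self, op_name, image, **params):
        if op_name not in self.registry:
            raise KeyError(f"unknown operation: {op_name}")
        return self.registry[op_name](**params).apply(image)
```

Adding an operation then means adding a subclass and registering it, without changing the client-facing `execute` entry point; that separation is one plausible reading of the flexibility the authors aim for.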
This paper describes the first use of a Network Processing Unit (NPU) to perform hardware-based image composition in a distributed rendering system. The image composition step is a notorious bottleneck in clustered rendering systems. Furthermore, image compositing algorithms do not necessarily scale as data size and the number of nodes increase. Previous researchers have addressed the composition problem with software and/or custom-built hardware. We used the heterogeneous multicore architecture of the Intel IXP28XX NPU, a fully programmable commercial off-the-shelf (COTS) technology, to perform the image composition step. With this design, we attained a nearly fourfold performance increase over traditional software-based compositing methods, achieving sustained compositing rates of 22-28 fps on a 1,024 x 1,024 image. The system is fully scalable with a negligible penalty in frame rate, is entirely COTS, and is flexible with regard to operating system, rendering software, graphics cards, and node architecture. The NPU-based compositor has the additional advantage of being a modular compositing component that is eminently suitable for integration into existing distributed software visualization packages.
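The abstract does not specify the compositing operator, so this sketch assumes the common sort-last depth (z-buffer) case: every render node produces a color image and a depth image, and for each pixel the fragment closest to the viewer wins. It is a plain software illustration, not the NPU implementation; the function name is hypothetical. Because each pixel is decided independently, the work can be split across many processing elements.

```python
def depth_composite(layers):
    """Sort-last depth compositing of partial renderings (assumed operator).

    `layers` is a list of (color, depth) pairs, one per render node; color
    and depth are equal-sized 2-D lists. For every pixel, the fragment with
    the smallest depth (closest to the viewer) is kept.
    """
    color0, depth0 = layers[0]
    out_color = [row[:] for row in color0]
    out_depth = [row[:] for row in depth0]
    for color, depth in layers[1:]:
        for y, row in enumerate(depth):
            for x, z in enumerate(row):
                if z < out_depth[y][x]:
                    out_depth[y][x] = z
                    out_color[y][x] = color[y][x]
    return out_color, out_depth
```

Pixels a node did not render can be given infinite depth so that they never win.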
ISBN (print): 9781932415582
Image compression continues to be an important field of research; the ability to compress images quickly and accurately benefits many areas, including high-speed imaging, space exploration, defense, and multimedia applications. This paper presents a proposed parallel implementation of fractal image compression on a grid of parallel computing elements. Each processing node compresses one region of the image, and the nodes are arranged so that the entire domain is searched completely in parallel.
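A minimal sketch of classic fractal block coding, to show what each node would be searching: every range block is approximated by a 2x-downsampled domain block under a least-squares contrast/brightness map `s * D + o`, chosen by exhaustive search over domains. The block size, the domain step, and all names are assumptions, and the paper's grid arrangement is not reproduced; the point is only that range blocks are encoded independently, so regions can be handed to separate nodes. Real encoders also clamp |s| < 1 and try the eight block isometries; both are omitted here.

```python
def block(img, y, x, size):
    """Pixels of the size x size block at (y, x), flattened row by row."""
    return [v for row in img[y:y + size] for v in row[x:x + size]]


def downsample(img, y, x, size):
    """Average 2x2 pixels of the (2*size x 2*size) domain block at (y, x)."""
    return [
        (img[y + 2 * j][x + 2 * i] + img[y + 2 * j][x + 2 * i + 1]
         + img[y + 2 * j + 1][x + 2 * i] + img[y + 2 * j + 1][x + 2 * i + 1]) / 4
        for j in range(size) for i in range(size)
    ]


def fit(d, r):
    """Least-squares contrast s and brightness o so that s*d + o approximates r.
    Returns (squared error, s, o)."""
    n = len(d)
    sd, sr = sum(d), sum(r)
    sdd = sum(a * a for a in d)
    sdr = sum(a * b for a, b in zip(d, r))
    denom = n * sdd - sd * sd
    s = (n * sdr - sd * sr) / denom if denom else 0.0
    o = (sr - s * sd) / n
    err = sum((s * a + o - b) ** 2 for a, b in zip(d, r))
    return err, s, o


def encode_range(img, ry, rx, size):
    """Search every domain block for the best map onto one range block.
    Returns (error, domain_y, domain_x, s, o)."""
    r = block(img, ry, rx, size)
    h, w = len(img), len(img[0])
    best = None
    for dy in range(0, h - 2 * size + 1, size):
        for dx in range(0, w - 2 * size + 1, size):
            err, s, o = fit(downsample(img, dy, dx, size), r)
            if best is None or err < best[0]:
                best = (err, dy, dx, s, o)
    return best


def encode(img, size):
    """Encode every range block (image sides assumed divisible by `size`).
    Each call to encode_range is independent of the others."""
    h, w = len(img), len(img[0])
    return {(ry, rx): encode_range(img, ry, rx, size)
            for ry in range(0, h, size) for rx in range(0, w, size)}
```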
Clustering is a basic operation in image processing and computer vision, and it plays an important role in unsupervised pattern recognition and image segmentation. Among the many clustering methods, single-link hierarchical clustering is one of the most popular. In this paper, exploiting the advantages of both optical transmission and electronic computation, we design efficient parallel hierarchical clustering algorithms on arrays with reconfigurable optical buses (AROB). We first design three efficient basic operations: multiplication of two N x N matrices, finding the minimum spanning tree of a graph with N vertices, and identifying the connected component containing a specified vertex. Based on these three operations, an O(log N) time parallel hierarchical clustering algorithm is proposed using N^3 processors. Furthermore, if four-port connections are allowed on the AROB, two constant-time clustering algorithms can also be derived, using N^4 and N^3 processors, respectively. These results improve on previously known algorithms developed on various parallel computational models. (C) 2000 Academic Press.
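The connection between single-link clustering and the building blocks listed above can be shown sequentially: the single-link clusters at a distance threshold are the connected components of the minimum spanning tree after removing every edge longer than that threshold. The sketch below uses Prim's algorithm and union-find on a single processor; it illustrates the equivalence, not the AROB algorithms. Function names are hypothetical.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete graph of `points` (Euclidean weights).
    Returns the N-1 MST edges as (weight, i, j)."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight connecting i to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def single_link_clusters(points, threshold):
    """Single-link clusters at `threshold`: connected components of the MST
    after dropping every edge longer than `threshold`."""
    root = list(range(len(points)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving keeps the trees shallow
            i = root[i]
        return i

    for w, i, j in mst_edges(points):
        if w <= threshold:
            root[find(i)] = find(j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Sweeping `threshold` over the sorted MST edge weights produces the full single-link hierarchy, one merge per edge.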
ISBN (print): 0819442836
In this paper, we introduce a new hierarchical interconnection network for massively parallel systems, named the Fully Connected Cubic Network (FCCN). FCCN can emulate the popular hypercube. FCCN has a constant nodal degree of 4 and therefore eliminates the hypercube's problem of large fanout. Moreover, a constant degree is an important requirement for efficiently fabricating a parallel image processing architecture. FCCN is also highly scalable, in that existing links remain intact when new nodes are added. FCCN is maximally fault tolerant and has a reasonably low diameter, growth in the number of links, and average internodal distance. Finally, FCCN is used as the interconnection network of a parallel image processing system. The computational results show that FCCN is a highly efficient interconnection network for parallel image processing.
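The abstract does not describe FCCN's wiring, so this small sketch only illustrates the hypercube side of the fanout comparison: in a hypercube of 2^d nodes, neighbours differ in exactly one address bit, so each node's degree is d = log2 N and grows with the machine, whereas FCCN's degree stays at 4. The function name is hypothetical.

```python
def hypercube_neighbours(node, dim):
    """Neighbours of `node` in a `dim`-dimensional hypercube of 2**dim nodes:
    addresses that differ from `node` in exactly one bit."""
    return [node ^ (1 << bit) for bit in range(dim)]


# Hypercube degree grows with log2 of the node count; FCCN's stays at 4.
for dim in (4, 10, 16):
    nodes = 2 ** dim
    degree = len(hypercube_neighbours(0, dim))
    print(f"{nodes:>6} nodes: hypercube degree {degree}, FCCN degree 4")
```

The loop reports degrees 4, 10, and 16 for 16, 1,024, and 65,536 nodes, which is the growing fanout that a constant-degree network avoids.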
Fourier inversion is an efficient method for image reconstruction in a variety of applications, for example, computed tomography and magnetic resonance imaging. Fourier inversion normally consists of two steps: interpolation of the data onto a rectilinear grid, if necessary, and inverse Fourier transformation. This paper presents interpolation by the scan-line method, in which the interpolation algorithm is implemented using only row operations and data transposes. The two-dimensional inverse Fourier transform can also be implemented with only row operations and data transposes. Accordingly, Fourier inversion can easily be implemented on a parallel computer that supports row operations and data transposes on row-distributed data. The conditions under which the scan-line implementations are algorithmically equivalent to the original serial-computer implementation are described, and methods for improving accuracy outside those conditions are presented. The scan-line algorithm is implemented on the iWarp parallel computer using the Adapt language for parallel image processing. This implementation is applied to magnetic resonance data acquired along radial lines and spiral trajectories through Fourier transform space.
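The claim that the two-dimensional inverse Fourier transform needs only row operations and transposes follows from the separability of the DFT: transform every row, transpose, transform every row again (these are the former columns), and transpose back. The sketch below shows that structure with a naive O(n^2) row transform on a single machine; a real implementation would use an FFT per row, and on row-distributed data the transposes become the communication steps. The scan-line interpolation step is not shown, and the function names are hypothetical.

```python
import cmath


def idft_row(row):
    """Naive 1-D inverse DFT of one row (an FFT would be used in practice)."""
    n = len(row)
    return [
        sum(row[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in range(n)) / n
        for m in range(n)
    ]


def transpose(mat):
    """Swap rows and columns of a 2-D list."""
    return [list(col) for col in zip(*mat)]


def idft2_rows_only(spectrum):
    """2-D inverse DFT built only from row operations and transposes."""
    step1 = [idft_row(r) for r in spectrum]             # inverse-transform each row
    step2 = [idft_row(r) for r in transpose(step1)]     # then each former column
    return transpose(step2)                             # restore the original layout
```

As a quick check, a constant spectrum of ones inverts to a single impulse at the origin.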