AIAC algorithms (Asynchronous Iterations Asynchronous Communications) are a particular class of parallel iterative algorithms. Their asynchronous nature makes them more efficient than their synchronous counterparts in numerous cases, as has already been shown in previous works. The first goal of this article is to compare several parallel programming environments in order to see whether one of them is best suited to implementing AIAC algorithms efficiently. The main criterion for this comparison is the performance achieved in a global grid computing context for two classical scientific problems. Nevertheless, we also take into account two secondary criteria: the ease of programming and the ease of deployment. The second goal of this study is to extract from this comparison the important features that a parallel programming environment must have in order to be suited to the implementation of AIAC algorithms.
ISBN: (print) 0769526829
Massively parallel processor array architectures can be used as hardware accelerators for a wide range of dataflow-dominant applications. Bilateral filtering is an example of a state-of-the-art algorithm in medical imaging that falls into the class of 2D adaptive filter algorithms. In this paper, we propose a semi-automatic mapping methodology for the generation of hardware accelerators for such a generic class of adaptive filtering applications in image processing. The final architecture delivers synthesis results similar to those of a hand-tuned design.
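To make the term "2D adaptive filter" concrete, the following is a minimal software sketch of a bilateral filter. It is not the hardware mapping methodology of the paper; the function name `bilateral_filter` and the parameters `radius`, `sigma_s`, and `sigma_r` are illustrative assumptions. The filter is adaptive because each neighbor's weight depends on how much its intensity differs from the center pixel, not only on its distance.

```python
import math

def bilateral_filter(image, radius=1, sigma_s=1.0, sigma_r=10.0):
    """Apply a bilateral filter to a 2D list of numbers (hypothetical helper).

    sigma_s controls the spatial Gaussian, sigma_r the intensity (range) Gaussian.
    Neighbors outside the image are simply skipped.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            center = image[y][x]
            total = 0.0
            norm = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < h and 0 <= nx < w):
                        continue
                    diff = image[ny][nx] - center
                    # Weight = spatial closeness * intensity similarity.
                    spatial = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
                    similar = math.exp(-(diff * diff) / (2.0 * sigma_r ** 2))
                    weight = spatial * similar
                    total += weight * image[ny][nx]
                    norm += weight
            # norm >= 1 because the center pixel always has weight 1.
            out[y][x] = total / norm
    return out
```

Every output pixel depends only on a fixed local window of the input, which is the kind of regular dataflow that a processor array can exploit.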
ISBN: (print) 0769526373
Embedded computing architectures can be designed to meet a variety of application-specific requirements. However, optimized hardware can require compiler support to realize its potential. This is especially true for embedded image processing systems, where significant architectural variation is possible and targeted software can change drastically based on that variation. This paper presents methods to compile a single high-level source given a fundamental variation in data-parallel target architectures: processor granularity ranging from a single processor to a massively parallel processor array. The approach uses single PPE virtualization, which supports pixel-level data-parallel expressions that operate on a virtual one pixel per processing element (PPE) network, and applies pixel-locating transformations to retarget the code to a given target PPE. Unlike mainstream parallel computing techniques, this technique can be applied to lightweight SIMD targets that do not provide global communication hardware or shared memory.
ISBN: (print) 9781424403127
This paper describes an architecture dedicated to the real-time processing of census correlation for building passive stereovision sensors. Although DSP circuits have dramatically improved in terms of frequency (about 600 MHz today), DSP cores (several multiplier-accumulators), and pipelines (Super Harvard architectures, for example), FPGA circuits remain the best way to design massively parallel architectures when ultra-fast algorithm computation is needed, as is the case in real-time vision systems for collision avoidance.
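As a software illustration of census correlation, here is a minimal sketch, assuming a square window and a per-pixel Hamming cost with no window aggregation. The names `census_transform`, `hamming`, and `best_disparity` are hypothetical and do not come from the paper, whose FPGA design is not reproduced here. The census transform encodes each pixel as a bit string recording which neighbors are darker than it; matching then reduces to counting differing bits.

```python
def census_transform(image, radius=1):
    """Return one census code (an int) per pixel; border pixels are left at 0."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            center = image[y][x]
            bits = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dy == 0 and dx == 0:
                        continue
                    # Append 1 if the neighbor is darker than the center pixel.
                    bits = (bits << 1) | (1 if image[y + dy][x + dx] < center else 0)
            out[y][x] = bits
    return out

def hamming(a, b):
    """Number of differing bits between two census codes."""
    return bin(a ^ b).count("1")

def best_disparity(left_census, right_census, y, x, max_disparity):
    """Pick the horizontal shift d minimizing the Hamming cost at pixel (y, x)."""
    best_d, best_cost = 0, None
    for d in range(max_disparity + 1):
        if x - d < 0:
            break
        cost = hamming(left_census[y][x], right_census[y][x - d])
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

The whole pipeline uses only comparisons, XORs, and bit counts, with no multiplications, which is one reason this kind of correlation maps naturally onto bit-level parallel hardware.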
ISBN: (print) 0769526365
We present the first parallel algorithm for building a Hausdorff Voronoi diagram (HVD). Our algorithm is targeted towards cluster computing architectures and computes the Hausdorff Voronoi diagram for non-crossing objects in time O(n log^4 n / p) for input size n and p processors. In addition, our parallel algorithm also implies a new sequential HVD algorithm that constructs HVDs for non-crossing objects in time O(n log^4 n). This improves on previous sequential results and solves an open problem posed by Papadopoulou and Lee [18].
ISBN: (print) 9780780397361
A wavelet-based parallel implementation is presented for image encoding on a multi-DSP system. The implementation uses the discrete wavelet transform (DWT) and is realized on a parallel processor architecture. It has a very flexible architecture, which allows extra slave processors (SPs) to be added to the system whenever more computational power is needed. The performance of the implementation is measured and compared to a sequential reference implementation. Experimental results show that the parallel implementation is very efficient and considerably outperforms its sequential counterpart.
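The abstract does not say which wavelet is used, so the following sketch assumes the simplest one, the Haar wavelet in its averaging form, purely for illustration. The names `haar_1d` and `haar_2d` and the `workers` parameter are hypothetical. It shows why the 2D DWT splits easily across workers: each row, and then each column, can be transformed independently.

```python
from concurrent.futures import ThreadPoolExecutor

def haar_1d(signal):
    """One level of the averaging Haar transform on an even-length sequence.

    Returns the approximation coefficients followed by the detail coefficients.
    """
    half = len(signal) // 2
    approx = [(signal[2 * i] + signal[2 * i + 1]) / 2 for i in range(half)]
    detail = [(signal[2 * i] - signal[2 * i + 1]) / 2 for i in range(half)]
    return approx + detail

def haar_2d(image, workers=2):
    """One level of the 2D transform: rows first, then columns.

    The thread pool stands in for the slave processors; in CPython it
    illustrates the work split rather than delivering a real speedup.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows = list(pool.map(haar_1d, image))
        cols = list(pool.map(haar_1d, [list(c) for c in zip(*rows)]))
    # Transpose back so the result is indexed as [row][column].
    return [list(r) for r in zip(*cols)]
```

Adding a worker only changes how the independent rows and columns are divided, which mirrors the flexibility the abstract attributes to adding slave processors.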
ISBN: (print) 9780780397361
Methods to accurately measure phase-locked loop (PLL) lock time in a multisite production environment are presented and explained. The methods are applicable to testing transceiver frequency settling times, as well as frequency and phase errors after settling, for multiple devices under test in parallel, using either on-board frequency mixers and RF signal generators or the RF receivers of automated testers. An inverse FFT was used to measure the PLL lock time in the case where a PLL frequency error exists.
ISBN: (print) 0769526373
A motion panorama is an efficient and compact representation of the underlying video. However, the motion panorama construction process is computationally intensive and hence extremely time-consuming. Addressing this issue is crucial when one considers using motion panoramas in a real-time environment such as live video transmission. We present two parallel algorithms for motion panorama construction, namely the shared memory parallel algorithm (SMPA), which uses POSIX threads, and the distributed memory parallel algorithm (DMPA), which uses MPI. The parallel algorithms are tested on real videos. Experimental results show that the SMPA achieves linear speedup in most cases, whereas the DMPA suffers from reduced efficiency when the number of processors exceeds 8.
ISBN: (print) 0769526365
This paper discusses fast parallel algorithms for evaluating several centrality indices frequently used in complex network analysis. These algorithms have been optimized to exploit properties typically observed in real-world large-scale networks, such as low average distance, high local density, and heavy-tailed power-law degree distributions. We test our implementations on real datasets such as the web graph, protein-interaction networks, and movie-actor and citation networks, and report impressive parallel performance for evaluation of the computationally intensive centrality metrics (betweenness and closeness centrality) on high-end shared memory symmetric multiprocessor and multithreaded architectures. To our knowledge, these are the first parallel implementations of these widely used social network analysis metrics. We demonstrate that it is possible to rigorously analyze networks three orders of magnitude larger than instances that can be handled by existing social network analysis (SNA) software packages. For instance, we compute the exact betweenness centrality value for each vertex in a large US patent citation network (3 million patents, 16 million citations) in 42 minutes on 16 processors, utilizing 20 GB of RAM on the IBM p5 570. Current SNA packages, on the other hand, cannot handle graphs with more than a hundred thousand edges.
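For readers unfamiliar with the metric, the sketch below computes exact betweenness centrality on a small unweighted, undirected graph using Brandes' algorithm, a standard sequential method. The abstract does not state which algorithm the authors parallelize, so treating Brandes' method as the baseline is an assumption, and the function name `betweenness_centrality` is hypothetical.

```python
from collections import deque

def betweenness_centrality(adj):
    """Exact betweenness for an undirected, unweighted graph.

    adj maps each vertex to an iterable of its neighbors.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths (sigma) to every vertex.
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies in order of decreasing distance.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2.0 for v, c in bc.items()}
```

The iterations of the outer loop over source vertices are independent of one another, so one plausible way to parallelize the computation is to distribute sources across processors and sum the partial scores.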
ISBN: (print) 0780397371
Addressing the dimension limit of the serial feature fusion method and the quantity limit of the parallel complex vector feature fusion method, an evolution of the parallel vector feature fusion method based on quater