Dynamic networks are characterized by frequent topology changes due to the unpredictable appearance and disappearance of mobile devices and/or communication links. In this paper, we propose a correct-by-construction a...
Thinning is an important task in many image processing applications, including remote sensing, photogrammetry, optical character recognition, and medical imaging. In this study, we compare the performance of thinning algorithms on parallel hardware. Grayscale thinning involves a substantial amount of computation per pixel, and may be accelerated in several ways: algorithmic improvements, code optimization, and parallelization. We describe an algorithmic improvement that speeds up grayscale thinning several-fold, and demonstrate scalable acceleration from multi-core CPU concurrency libraries (such as OpenMP), coprocessor hardware (such as the Xeon Phi), and GPUs (such as CUDA-enabled NVIDIA graphics cards). GPU processing appears to offer the most cost-effective approach for high performance grayscale thinning applications.
Tag anticollision is critical to the performance of many radio frequency identification systems in industrial applications. Tree-based schemes are popular arbitration algorithms for tag collision, as they are scalable and easy to implement. In these schemes, collided tags are recursively divided into two groups until each group contains only one tag. However, we have determined that previous pure tree-based schemes do not fully utilize the information that can be collected; they allow only the responding tags to be involved in the splitting process. In this paper, we propose a novel pure tree-based method, which we name parallel splitting with retrieve (PSR), that allows the nonresponding and unidentified tags to be involved in the splitting process as well. PSR consists of two mechanisms: parallel splitting (PS) and retrieve. The PS mechanism can reduce the total number of tag collisions, and the retrieve mechanism can reduce the extra idle slots caused by PS. In comparison with conventional pure tree-based schemes, the proposed PSR algorithm achieves higher efficiency without the need to estimate the number of tags; simulation results indicate an efficiency level of 0.41. The PSR algorithm also has similar performance to state-of-the-art algorithms that do not need to estimate the number of tags, but it is simpler to implement and is thus preferred for resource-constrained systems.
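The recursive splitting described above can be sketched as a small simulation. This is a minimal illustration of conventional random binary splitting only, not of PSR; the function name, the slot categories, and the use of a fair coin for the split are assumptions made for the example.

```python
import random

def binary_tree_splitting(tag_ids, rng):
    """Simulate conventional binary splitting (assumed model, not PSR).

    Returns the identified tags and a count of each slot type.
    """
    slots = {"collision": 0, "success": 0, "idle": 0}
    identified = []
    stack = [list(tag_ids)]  # groups waiting to respond, most recent on top
    while stack:
        group = stack.pop()
        if len(group) == 0:
            slots["idle"] += 1          # nobody answered in this slot
        elif len(group) == 1:
            slots["success"] += 1       # exactly one tag answered
            identified.append(group[0])
        else:
            slots["collision"] += 1     # several tags answered: split in two
            left, right = [], []
            for tag in group:
                (left if rng.random() < 0.5 else right).append(tag)
            stack.append(right)
            stack.append(left)
    return identified, slots

tags = list(range(50))
identified, slots = binary_tree_splitting(tags, random.Random(0))
# Efficiency here means identified tags divided by total slots used.
efficiency = slots["success"] / sum(slots.values())
```

Every collision slot spawns exactly two child slots, so idle slots are the price of splitting without knowing how many tags are present; that overhead is what the abstract's retrieve mechanism targets.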
Fluid animation often appears in applications such as games, films and cartoons. How to animate photo-realistic fluid motion efficiently is an important issue. We present an efficient parallel method for photo-realist...
The Hamming code is a well-known error correction code and can correct a single error in an input vector of size n bits by adding log n parity checks. A new parallel implementation of the code is presented, using a hierarchical structure of n processors in log n layers. All the processors perform similar simple tasks, and need only a few bytes of internal memory.
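As a concrete instance of single-error correction, the sketch below encodes 4 data bits into a 7-bit Hamming codeword and repairs one flipped bit. It is a sequential illustration under the usual convention that parity bits sit at positions 1, 2, and 4; it does not reproduce the hierarchical processor structure of the abstract, and the function names are hypothetical.

```python
def hamming_encode(data_bits):
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    code = [0] * 8  # index 0 is unused so that indices match positions
    for pos, bit in zip((3, 5, 6, 7), data_bits):
        code[pos] = bit
    for p in (1, 2, 4):
        # Parity bit p covers every position whose binary index has bit p set.
        for pos in range(1, 8):
            if pos != p and pos & p:
                code[p] ^= code[pos]
    return code[1:]

def hamming_correct(codeword):
    """Return the corrected codeword and the syndrome (0 means no error)."""
    code = [0] + list(codeword)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of the positions holding a 1
    if syndrome:
        code[syndrome] ^= 1  # the syndrome is the position of the flipped bit
    return code[1:], syndrome

data = [1, 0, 1, 1]
codeword = hamming_encode(data)
corrupted = list(codeword)
corrupted[4] ^= 1  # flip position 5
repaired, syndrome = hamming_correct(corrupted)
```

Each parity check depends only on its own subset of positions, so the three checks could be evaluated independently of one another.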
A parallel algorithm for prefix computation on N data elements mapped on a Multi Mesh (MM) network of $N = n^4$ processing elements is presented here. The time required by the proposed algorithm is significantly less than that of any existing algorithm for prefix computation on mesh-like architectures, owing to the specific interconnection pattern used in the MM network. The proposed technique requires $O(N^{1/4})$ time for data communication and $O(\log N^{1/4})$ time for computation when mapped on an MM network constituted by $N^{1/2}$ meshes, each of size $N^{1/4} \times N^{1/4}$. The data communication time of the proposed algorithm is less than that of the prefix sum algorithm proposed for the extended Multi Mesh. To be precise, instead of $(13N^{1/4} - 5)\tau$ communication time, the proposed algorithm requires a data communication time of only $7.5N^{1/4}\tau$. Moreover, the proposed parallel algorithm does not need any extra inter-block links as used in the extended Multi Mesh.
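For readers unfamiliar with prefix computation, the sketch below shows a generic data-parallel inclusive scan that finishes in about log2(N) rounds. It is a textbook doubling scheme simulated sequentially, not the Multi Mesh algorithm of the abstract, and it ignores the communication costs that the abstract analyses; the function name is hypothetical.

```python
from itertools import accumulate

def parallel_prefix_sum(values):
    """Inclusive prefix sum in ceil(log2 N) rounds (sequential simulation)."""
    result = list(values)
    n = len(result)
    step = 1
    while step < n:
        # Every position reads the previous round's snapshot, so all updates
        # within a round are independent and could run at the same time.
        prev = result[:]
        for i in range(step, n):
            result[i] = prev[i] + prev[i - step]
        step *= 2
    return result

sample = [3, 1, 4, 1, 5, 9, 2, 6]
scanned = parallel_prefix_sum(sample)
reference = list(accumulate(sample))  # sequential prefix sum for comparison
```

On a real machine the cost of each round is dominated by moving `prev[i - step]` between processing elements, which is why the interconnection pattern matters.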
In this paper, we focus on applying parallel processing techniques to the HEVC encoder in order to significantly reduce its computational requirements without degrading its coding efficiency. We propose several synchronous and asynchronous parallelization approaches that work at a coarse-grained level based on the Group Of Pictures (GOP), which we call the GOP-based level. GOP-based approaches encode several groups of consecutive frames simultaneously. How these GOPs are formed and distributed is critical to obtaining good parallel performance. The results show that near-ideal efficiencies are obtained using up to 10 cores. Furthermore, when the computational load is unbalanced, the asynchronous versions outperform the synchronous ones. The parallel algorithms developed in this work support all standard coding modes proposed by the reference software. (C) 2016 Civil-Comp Ltd. and Elsevier Ltd. All rights reserved.
Although the resultant elimination method can find all possible solutions of the selective harmonic elimination (SHE) problem without requiring initial values, it still has some fatal shortcomings, such as the high computational burden and the huge memory consumption caused by intermediate expression swell when computing the symbolic determinant of the Sylvester matrix. On the basis of the principle of polynomial interpolation, an algorithm framework is proposed to compute the resultant polynomials. It contains two major steps: the evaluation of numerical interpolation points and the solution of linear equations. This approach avoids symbolic computation, whose complexity is usually very high. Furthermore, both steps are suitable for parallel implementation, which can speed up the computation tremendously. Using the extended n-dimensional Bjorck-Pereyra algorithm, this framework is implemented on a parallel computing system, and it has been used to solve the SHE equations for two-level, three-level, and multilevel inverters. As all possible solutions can be found by this algorithm, the optimal solutions with the lowest total harmonic distortion can be identified. Experimental results verify the correctness and effectiveness of the proposed method.
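The two steps named above, evaluating at interpolation points and then solving a linear system, can be illustrated on a toy pair of polynomials. This sketch makes several assumptions: the polynomials f = y^2 - x and g = y - 1 are invented for the example, the degree bound of the resultant is taken as known, exact fractions replace floating point, and plain Gaussian elimination on a Vandermonde system stands in for the Bjorck-Pereyra algorithm mentioned in the abstract.

```python
from fractions import Fraction

def det(matrix):
    """Determinant by Gaussian elimination over exact fractions."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result

def sylvester(f, g):
    """Sylvester matrix of two coefficient lists, highest degree first."""
    m, n = len(f) - 1, len(g) - 1
    size = m + n
    rows = [[0] * i + list(f) + [0] * (size - m - 1 - i) for i in range(n)]
    rows += [[0] * i + list(g) + [0] * (size - n - 1 - i) for i in range(m)]
    return rows

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination (a assumed nonsingular)."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def resultant_by_interpolation(f_at, g_at, degree_bound):
    """Coefficients (lowest degree first) of the resultant as a polynomial in x."""
    points = list(range(degree_bound + 1))
    # Step 1: numeric determinants at each point; these are independent.
    values = [det(sylvester(f_at(x), g_at(x))) for x in points]
    # Step 2: recover the coefficients from a Vandermonde linear system.
    vandermonde = [[Fraction(x) ** k for k in range(degree_bound + 1)] for x in points]
    return solve(vandermonde, values)

# f(x, y) = y^2 - x and g(x, y) = y - 1, viewed as polynomials in y.
coeffs = resultant_by_interpolation(lambda x: [1, 0, -x], lambda x: [1, -1], 1)
```

Here `coeffs` encodes 1 - x, which matches substituting y = 1 into f. No symbolic determinant is ever formed, and the determinant evaluations in step 1 share no state, so they can be distributed across workers.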
In [18], a membrane parallel theoretical framework for computing (co)homology information of the foreground or background of binary digital images is developed. Starting from this work, we progress here in two senses: (a) providing advanced topological information, such as (co)homology torsion, and efficiently answering any decision or classification problem about whether a sum of k-xels is a (co)cycle or a (co)boundary; (b) optimizing the previous framework so that it can be implemented using GPGPU computing. Discrete Morse theory, Effective Homology Theory, and parallel computing techniques are suitably combined to obtain a homological encoding, called an algebraic minimal model, of a Region-Of-Interest (seen as a cubical complex) of a presegmented k-D digital image. (C) 2016 Elsevier B.V. All rights reserved.
Frequent sequence mining is a well-known and well-studied problem in data mining. The output of the algorithm is used in many other areas, such as bioinformatics, chemistry, and market basket analysis. Unfortunately, frequent sequence mining is computationally quite expensive. In this paper, we present a novel parallel algorithm for mining frequent sequences based on static load-balancing. The static load-balancing is done by estimating the computational time using a probabilistic algorithm. For instances of reasonable size, the algorithm achieves speedups of up to approximately 3/4 · P, where P is the number of processors. In the experimental evaluation, we show that our method performs significantly better than the current state-of-the-art methods. The presented approach is general: it can be used for static load-balancing of other pattern mining algorithms, such as itemset, tree, and graph mining algorithms.
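To make the idea of static load-balancing from estimated running times concrete, the sketch below distributes work units across processors before any mining starts. The greedy longest-processing-time rule, the task names, and the cost figures are all assumptions chosen for illustration; the abstract does not say how its estimates are turned into an assignment.

```python
import heapq

def static_load_balance(estimated_costs, num_processors):
    """Assign tasks to processors using the greedy longest-first rule."""
    heap = [(0.0, p) for p in range(num_processors)]  # (current load, processor)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_processors)]
    # Place the most expensive tasks first, each on the least loaded processor.
    for task, cost in sorted(estimated_costs.items(), key=lambda kv: -kv[1]):
        load, p = heapq.heappop(heap)
        assignment[p].append(task)
        heapq.heappush(heap, (load + cost, p))
    return assignment

# Hypothetical estimated costs for five partitions of the search space.
costs = {"A": 8.0, "B": 7.0, "C": 6.0, "D": 5.0, "E": 4.0}
assignment = static_load_balance(costs, 2)
loads = [sum(costs[t] for t in tasks) for tasks in assignment]
# Ideal speedup if the estimates were exact: total work over the busiest load.
speedup = sum(costs.values()) / max(loads)
```

Because the assignment is fixed up front, the achievable speedup is limited by how well the estimates match the real running times and by how evenly the units can be packed.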