The deblocking filter (DF) reduces blocking artifacts in encoded video sequences and thereby significantly improves the subjective and objective quality of videos. Statistics show that the DF accounts for 5-18% of the total decoding time in High Efficiency Video Coding (HEVC). Speeding up the DF therefore improves codec performance, especially on the decoder side. In view of the rapid development of multicore technology, we analyze the data dependencies between adjacent coding tree units (CTUs) and propose a parallel DF scheme based on a modified CTU access order. This lets the DF run in parallel, providing acceleration, more flexibility in the degree of parallelism, and finer parallel granularity. We also solve the problems of variable privatization and thread synchronization that arise when parallelizing the DF. Finally, the DF module is parallelized in the HM16.1 reference software using OpenMP. The speedup is measured experimentally for various numbers of cores, and the results show that the proposed scheme is very effective at accelerating the DF.
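One common way to turn a CTU dependency analysis into a parallel schedule is a wavefront: each CTU gets the earliest step at which all CTUs it depends on are finished, and all CTUs with the same step form a wave that can be filtered concurrently. The Python sketch below assumes, purely for illustration, that a CTU can be filtered once its left and top-right neighbours are done; the dependency set actually derived in the paper may differ. The names `wavefront_levels`, `schedule`, and `DEFAULT_DEPS` are hypothetical.

```python
from collections import defaultdict

# Hypothetical dependency set (an assumption, not taken from the paper):
# a CTU waits for its left and top-right neighbours. Offsets are
# (row, column) displacements and must point to CTUs earlier in raster order.
DEFAULT_DEPS = ((0, -1), (-1, 1))


def wavefront_levels(rows, cols, deps=DEFAULT_DEPS):
    """Earliest parallel step for each CTU, given its dependency offsets."""
    level = [[0] * cols for _ in range(rows)]
    for r in range(rows):  # raster order visits every dependency first
        for c in range(cols):
            preds = [level[r + dr][c + dc] + 1
                     for dr, dc in deps
                     if 0 <= r + dr < rows and 0 <= c + dc < cols]
            level[r][c] = max(preds, default=0)
    return level


def schedule(rows, cols, deps=DEFAULT_DEPS):
    """Group CTUs into waves; CTUs in the same wave can be filtered concurrently."""
    waves = defaultdict(list)
    for r, row in enumerate(wavefront_levels(rows, cols, deps)):
        for c, lvl in enumerate(row):
            waves[lvl].append((r, c))
    return [waves[k] for k in sorted(waves)]


# With the default dependencies, CTU (r, c) lands in wave c + 2r.
print(schedule(2, 3))
```

In an OpenMP setting, each inner list would correspond to one parallel loop over the CTUs of a wave, followed by a barrier before the next wave. Passing a different `deps` tuple lets the same scheduler express other dependency patterns, as long as every offset points to an earlier CTU in raster order.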
The Bulk Synchronous Parallel (BSP) model simplifies the construction and evaluation of parallel algorithms through its simple synchronization structure and cost model. Nevertheless, imperative BSP programs can suffer from synchronization errors. Programs with textually aligned barriers are free from such errors, and this structure also eases program comprehension. We propose a simplified formalization of barrier inference as a data flow analysis that statically verifies whether an imperative BSP program has replicated synchronization, which is a sufficient condition for textual barrier alignment.
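To make replicated synchronization concrete, here is a deliberately simplified sketch, not the paper's formalization: a forward data flow analysis over a toy statement language (hypothetical tuple encoding) that tracks which variables may depend on the process identifier `pid` and flags every barrier reached under control flow that may differ between processes. Like the property described above, the check is conservative: programs it accepts synchronize identically on all processes in this toy sense, but some correct programs may be rejected. All names are hypothetical.

```python
# Toy imperative BSP language (hypothetical, for illustration only):
#   ("assign", target, reads)          target := f(variables in `reads`)
#   ("sync", label)                    barrier
#   ("if", reads, then_block, else_block)
#   ("while", reads, body)
PID = "pid"  # the only built-in value that differs between processes


def _varies(reads, nonrep):
    """True if an expression reading `reads` may differ between processes."""
    return any(v == PID or v in nonrep for v in reads)


def _analyze(block, nonrep, divergent, errors):
    """Forward pass; returns the set of non-replicated variables after `block`."""
    nonrep = set(nonrep)
    for stmt in block:
        kind = stmt[0]
        if kind == "assign":
            _, target, reads = stmt
            # Implicit flow: an assignment under divergent control also taints.
            if divergent or _varies(reads, nonrep):
                nonrep.add(target)
            else:
                nonrep.discard(target)  # strong update with a replicated value
        elif kind == "sync":
            if divergent:
                errors.append(stmt[1])
        elif kind == "if":
            _, reads, then_b, else_b = stmt
            inner = divergent or _varies(reads, nonrep)
            nonrep = (_analyze(then_b, nonrep, inner, errors)
                      | _analyze(else_b, nonrep, inner, errors))
        elif kind == "while":
            _, reads, body = stmt
            head = set(nonrep)
            while True:  # grow the loop-head state until it is a fixpoint
                inner = divergent or _varies(reads, head)
                new_head = head | _analyze(body, head, inner, [])
                if new_head == head:
                    break
                head = new_head
            _analyze(body, head, inner, errors)  # report barriers only once
            nonrep = head
        else:
            raise ValueError(f"unknown statement {kind!r}")
    return nonrep


def unaligned_barriers(program):
    """Labels of barriers that may be reached by only some processes."""
    errors = []
    _analyze(program, set(), False, errors)
    return errors


# A barrier guarded by a pid-dependent condition is flagged.
print(unaligned_barriers([("assign", "me", ["pid"]),
                          ("if", ["me"], [("sync", "s2")], [])]))
```

Loops are handled by re-analyzing the body until the set of non-replicated variables at the loop head stops growing; this terminates because the set only grows and the program has finitely many variables.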
Most industrial induction motors in current use employ simple winding patterns, which are commonly designed to meet the fundamental magnetizing flux and torque requirements while disregarding the spatial harmonic content of the air-gap magnetomotive force (MMF). However, it is well known that the lower-order MMF spatial harmonics have a negative impact on motor efficiency, vibration, noise, and torque production. Using different numbers of turns per coil in the winding design is one possible way to mitigate the problem. In this paper, a novel winding optimization algorithm is fully described. The air gap is modelled as a linear function of the current sheet created by the conductors in the slots. Winding patterns for several pole numbers and stator slot counts are optimized, and the resulting turns-per-coil patterns are tabulated for single- and double-layer windings with optimal coil-pitch shortening. These tables can serve as a reference in winding design projects. An application example of winding optimization is also presented.
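The harmonic content that such an optimization targets can be estimated with a standard Fourier approach: treat the slot conductors as a current sheet of point sources, take the Fourier coefficients of that distribution, and integrate once around the air gap, which divides each coefficient by the harmonic order h. The sketch below (hypothetical function `mmf_harmonics`; one phase, unit current, equally spaced slots) is a textbook illustration, not the paper's algorithm. Varying the signed conductor count per slot mirrors the freedom that different turns per coil provide.

```python
import cmath
import math


def mmf_harmonics(slot_turns, harmonics):
    """Relative amplitudes of the air-gap MMF spatial harmonics of one phase.

    slot_turns -- signed conductor count in each slot (unit phase current);
                  slots are equally spaced over 2*pi mechanical radians
    harmonics  -- mechanical harmonic orders h >= 1; for p pole pairs the
                  fundamental is h = p, and it should be listed first
    Returns {h: amplitude} normalised to the first listed harmonic.
    """
    n_slots = len(slot_turns)
    amplitude = {}
    for h in harmonics:
        # Fourier coefficient of the point-conductor current sheet ...
        coeff = sum(n * cmath.exp(-1j * h * 2 * math.pi * k / n_slots)
                    for k, n in enumerate(slot_turns))
        # ... integrated once around the air gap: the MMF term scales as 1/h.
        amplitude[h] = abs(coeff) / h
    ref = amplitude[harmonics[0]]
    return {h: a / ref for h, a in amplitude.items()}


# A 2-pole, full-pitch coil in two slots gives the square-wave 1/h spectrum.
print(mmf_harmonics([10, -10], [1, 2, 3, 5]))
```

For this full-pitch example the relative amplitudes are about 1, 0, 1/3, and 0.2. An optimizer could score candidate turns-per-coil patterns by a weighted sum of the low-order amplitudes returned here, subject to keeping the fundamental; the weighting is an assumption left to the designer.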
Near-duplicate document detection has attracted much research attention because the volume of documents being produced is growing rapidly. The main problem in duplicate and near-duplicate document detection is the very high dimensionality of the data, which increases the time and space required to process it. As new documents keep being produced, detecting similarity among documents becomes almost impracticable. We propose a new approach to this problem that reduces the dimensionality of the data and uses parallel programming efficiently to make full use of the available hardware. Our intuition is that several processors or cores will outperform a single processor if they are managed well. We implemented our method and tested it empirically; the experimental results show that our algorithm performs better than other All Pairs Similarity Search (APSS) methods that use multi-core processing and multiprogramming to compute document similarity. Our method reduces the number of terms used in similarity computation by up to 65%, and its execution time is better than that of the Partition-based Similarity Search method, which also uses parallel processing for document similarity.
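The abstract does not say which dimensionality reduction is used, so the sketch below substitutes one simple, exact rule as an assumption: a term that occurs in only one document can never contribute to a pairwise dot product, so it can be dropped once the vector norms have been computed, leaving cosine scores unchanged. The remaining pairwise computations are split statically across workers. Function names are hypothetical, and `ThreadPoolExecutor` is used only to keep the example self-contained; in CPython, real CPU speedups would need a `ProcessPoolExecutor` with module-level functions.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def prune_terms(docs):
    """Drop terms that occur in a single document (docs: list of {term: weight}).

    Such terms add to a vector's norm but never to a pairwise dot product,
    so norms are computed before pruning and cosine scores stay exact.
    """
    df = Counter(term for doc in docs for term in doc)
    norms = [math.sqrt(sum(w * w for w in doc.values())) for doc in docs]
    pruned = [{t: w for t, w in doc.items() if df[t] > 1} for doc in docs]
    return pruned, norms


def all_pairs_similar(docs, threshold, workers=4):
    """Return sorted (i, j, cosine) tuples for pairs with cosine >= threshold."""
    pruned, norms = prune_terms(docs)
    pairs = list(combinations(range(len(docs)), 2))

    def score(i, j):
        a, b = pruned[i], pruned[j]
        if len(a) > len(b):
            a, b = b, a  # iterate over the shorter vector
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        denom = norms[i] * norms[j]
        return dot / denom if denom else 0.0

    def run(chunk):
        return [(i, j, score(i, j)) for i, j in chunk]

    # Static partition: worker k takes pairs k, k + workers, k + 2 * workers, ...
    chunks = [pairs[k::workers] for k in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = [r for part in pool.map(run, chunks) for r in part]
    return sorted(r for r in results if r[2] >= threshold)


docs = [{"a": 1.0, "b": 1.0}, {"a": 1.0, "b": 1.0}, {"c": 1.0}]
print(all_pairs_similar(docs, 0.9))
```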
As data volumes grow, parallelizing data processing has become crucial in numerous domains. For non-specialists, however, it remains difficult to handle parallelism technicalities such as data distribution, communication, and load balancing. For the geoscience domain, we propose a solution based on implicit parallel patterns. These patterns are abstract models for a class of algorithms that can be customized and automatically transformed into a parallel execution. In this paper, we describe a pattern for stencil computation and a novel pattern for computations that follow a predefined order. They are used especially in geosciences, and we illustrate them with the flow direction and flow accumulation computations.
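The two patterns can be illustrated with sequential versions of the computations they target. D8 flow direction is a stencil: each cell's receiver (its steepest-descent neighbour, with diagonal distances of √2) depends only on its 3x3 neighbourhood, so cells can be processed in any order or in parallel. Flow accumulation instead follows a predefined order: visiting cells from highest to lowest guarantees that every upstream contribution has arrived before a cell passes its total downstream. This is an illustrative sketch on a plain list-of-lists elevation grid, not the paper's pattern API; the function names are hypothetical, and flat areas and pits are not resolved.

```python
import math

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def flow_direction(dem):
    """Stencil pass: D8 steepest-descent receiver of each cell, or None."""
    rows, cols = len(dem), len(dem[0])
    receiver = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best, best_slope = None, 0.0
            for dr, dc in NEIGHBOURS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    slope = (dem[r][c] - dem[rr][cc]) / math.hypot(dr, dc)
                    if slope > best_slope:  # only strictly downhill neighbours
                        best, best_slope = (rr, cc), slope
            receiver[r][c] = best
    return receiver


def flow_accumulation(dem, receiver):
    """Ordered pass: visit cells from highest to lowest elevation."""
    rows, cols = len(dem), len(dem[0])
    acc = [[1] * cols for _ in range(rows)]  # each cell contributes itself
    order = sorted(((r, c) for r in range(rows) for c in range(cols)),
                   key=lambda rc: dem[rc[0]][rc[1]], reverse=True)
    for r, c in order:
        target = receiver[r][c]
        if target is not None:
            acc[target[0]][target[1]] += acc[r][c]
    return acc


dem = [[3, 2, 1]]
print(flow_accumulation(dem, flow_direction(dem)))  # → [[1, 2, 3]]
```

Because receivers are always strictly lower, the descending-elevation order is a valid topological order of the flow graph; a parallel version would have to preserve that ordering constraint, which is what distinguishes this computation from the stencil.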
Divide-and-conquer is a common parallel programming skeleton supported by many cross-platform multithreaded libraries, and most commonly used by programmers for parallelization. The challenges of producing (manually o...
GPU programming models enable and encourage massively parallel programming with over a million threads, requiring extreme parallelism to achieve good performance. Massive parallelism brings significant correctness cha...
The long-term behavior of a dynamical system is usually analyzed by means of basins of attraction (BOA), most often with cell mapping methods, which provide a straightforward approximation technique. Unfortunately, constructing BOA requires large resources, both in computational time and in memory, especially for higher-dimensional systems. In this paper, cell mapping methods are implemented for distributed computing: a new, efficient parallel algorithm for computing large-scale BOA is presented, which also addresses issues arising from the inherent seriality of BOA construction. A cell mapping core is wrapped in a management shell that administers the core and splits the computing domain over a multicore environment, making efficient use of distributed memory. The proposed approach uses a two-step algorithm that first generates the multidimensional BOA of the system and then evaluates arbitrary 2D Poincaré sections of the hypercube that stores the information. A test system is analyzed on grids of different dimensions; the effort of a parallel implementation for medium and large clusters is repaid by large gains in computational speed. Performance depends not only on the number of cores used to run the code but, in particular, on how they are instructed: to get the best from a massively parallel architecture, the processes must be properly balanced between memory operations and numerical integrations. A significant improvement in processing time for a large computing domain is shown, and a comparison with a serial code demonstrates the great potential of the application. The advantages of parallel reading/writing are also discussed with respect to the BOA grid dimension.
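The split between a parallel part and an inherently serial part of cell mapping can be seen in a one-dimensional simple cell mapping. Computing the image cell of every cell is independent per cell, so contiguous chunks of the domain can go to different workers; labelling basins by following trajectories is serial, because a trajectory may end in a group found by an earlier one. The domain, cell count, and contracting map `toy_map` below are assumptions chosen for illustration, and all names are hypothetical; the paper's core works on multidimensional grids of a continuous system instead.

```python
import math
from concurrent.futures import ThreadPoolExecutor

SINK = -1  # image (and label) of any cell mapped outside the domain


def image_cells(lo, hi, n, step, workers=4):
    """Parallel step: map each cell centre once; chunks go to different workers."""
    h = (hi - lo) / n

    def map_chunk(cells):
        out = []
        for k in cells:
            x = step(lo + (k + 0.5) * h)
            out.append(min(int((x - lo) // h), n - 1) if lo <= x < hi else SINK)
        return out

    size = math.ceil(n / workers)
    chunks = [range(s, min(s + size, n)) for s in range(0, n, size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [z for part in pool.map(map_chunk, chunks) for z in part]


def basins(image):
    """Serial step: follow each trajectory and label the basin it ends in."""
    group = [None] * len(image)
    next_group = 0
    for start in range(len(image)):
        path, on_path, z = [], set(), start
        while z != SINK and group[z] is None and z not in on_path:
            path.append(z)
            on_path.add(z)
            z = image[z]
        if z == SINK:
            label = SINK
        elif group[z] is not None:
            label = group[z]      # joined an already labelled trajectory
        else:
            label = next_group    # closed a new periodic group
            next_group += 1
        for cell in path:
            group[cell] = label
    return group


def toy_map(x):
    # Hypothetical contracting map with attractors at -1 and +1.
    return 0.1 * x + math.copysign(0.9, x)


print(basins(image_cells(-1.5, 1.5, 20, toy_map)))
```

With 20 cells on [-1.5, 1.5], every negative cell ends in the group around -1 and every positive cell in the group around +1, so the result is ten 0s followed by ten 1s.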
We design an invariance proof method for concurrent programs parameterised by a weak consistency model. The calculational design of the invariance proof method is by abstract interpretation of a truly parallel analyti...
Bulk synchronous parallelism (BSP) offers an abstract and simple model of parallelism, yet allows the communication costs of parallel algorithms to be taken into account realistically. BSP has been used in many application domains. BSPlib and its variants are programming libraries for the C language that support the BSP style. Bulk Synchronous Parallel ML (BSML) is a library for BSP programming in the functional language OCaml. It offers parallel operations on a data structure called the parallel vector. BSML provides a global view of programs: BSML programs can be seen as sequential programs working on a parallel data structure (seq of par), whereas a BSPlib program is written in SPMD style and understood as a parallel composition of communicating sequential programs (par of seq). The communication styles of BSML and BSPlib are also quite different. The contribution of this paper is a BSPlib-style communication API implemented on top of BSML. It was designed without extending BSML, using only the imperative features of the underlying functional language OCaml. Programs written with this API are syntactically very close to programs written with a BSPlib library for C. It therefore shows that BSML is universal for the BSP model.
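The difference between the two communication styles can be sketched with a toy, sequentially simulated BSP machine (hypothetical class `ToyBsp`, not the BSPlib or BSML API). In the spirit of BSPlib's `bsp_put`, whose effect on remote memory becomes visible only after `bsp_sync`, writes issued during a superstep are buffered and delivered at the barrier. `superstep` runs the same body on every process (par of seq), while `vector` reads back one value per process, loosely like a BSML parallel vector (seq of par).

```python
class ToyBsp:
    """Toy, sequentially simulated BSP machine (hypothetical, illustration only).

    Each of the p processes owns a private dict; put requests are buffered
    and become visible at the next barrier.
    """

    def __init__(self, p):
        self.p = p
        self.mem = [{} for _ in range(p)]
        self._pending = []

    def put(self, dst, name, value):
        # Record the request; the destination memory is unchanged until sync().
        self._pending.append((dst, name, value))

    def sync(self):
        # Barrier: deliver every buffered put, then start a new superstep.
        for dst, name, value in self._pending:
            self.mem[dst][name] = value
        self._pending.clear()

    def superstep(self, body):
        # SPMD ("par of seq") style: the same body runs on every process.
        for pid in range(self.p):
            body(self, pid)
        self.sync()

    def vector(self, name):
        # Global ("seq of par") view: one value per process.
        return [self.mem[pid].get(name) for pid in range(self.p)]


machine = ToyBsp(4)
for pid in range(machine.p):
    machine.mem[pid]["x"] = pid


def shift_right(m, pid):
    # Each process sends its value to its right neighbour on a ring.
    m.put((pid + 1) % m.p, "x", m.mem[pid]["x"])


machine.superstep(shift_right)
print(machine.vector("x"))  # → [3, 0, 1, 2]
```

Because the puts are buffered, each process reads its own original value during the shift. Delivering puts immediately in this sequential simulation would instead propagate process 0's value around the whole ring.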