Search Results - Inner Mongolia University Library

Parallel Deblocking Filter Based on Modified Order of Accessing the Coding Tree Units for HEVC on Multicore Processor

KSII Transactions on Internet and Information Systems, 2017, vol. 11, no. 3, pp. 1684-1699

Authors: Lei, Haiwei; Liu, Wenyi; Wang, Anhong. Affiliations: North Univ China, Key Lab Instrumentat Sci & Dynam Measurement, Minist Educ, Taiyuan 030051, Peoples R China; Taiyuan Univ Sci & Technol, Sch Elect Informat Engn, Taiyuan 030024, Peoples R China

The deblocking filter (DF) reduces blocking artifacts in encoded video sequences, and thereby significantly improves the subjective and objective quality of videos. Statistics show that the DF accounts for 5-18% of the total decoding time in high-efficiency video coding. Therefore, speeding up the DF will improve codec performance, especially for the decoder. In view of the rapid development of multicore technology, we propose a parallel DF scheme based on a modified order of accessing the coding tree units (CTUs) by analyzing the data dependencies between adjacent CTUs. This enables the DF to run in parallel, providing accelerated performance and more flexibility in the degree of parallelism, as well as finer parallel granularity. We additionally solve the problems of variable privatization and thread synchronization in the parallelization of the DF. Finally, the DF module is parallelized based on the HM16.1 reference software using OpenMP technology. The acceleration performance is experimentally tested under various numbers of cores, and the results show that the proposed scheme is very effective at speeding up the DF.

Keywords: Deblocking filter; parallel programming; multicore processor; high-efficiency video coding (HEVC)
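
The abstract does not spell out the modified CTU access order, so the following is only a generic illustration of how reordering units of work can expose parallelism. It assumes, hypothetically, that a CTU may be processed once its left and upper neighbours are done; under that assumption all CTUs on the same anti-diagonal are independent of each other. The function name `wavefront_order` and the dependency pattern are assumptions made for this sketch and are not taken from the paper or from the HM reference software.

```python
def wavefront_order(rows, cols):
    """Group CTU coordinates (row, col) into waves that could run in parallel.

    Assumed (hypothetical) dependency: CTU (r, c) needs (r, c-1) and (r-1, c).
    Both neighbours lie on anti-diagonal r + c - 1, so every CTU on
    anti-diagonal d = r + c can start once wave d - 1 has finished.
    """
    waves = []
    for d in range(rows + cols - 1):
        first_row = max(0, d - cols + 1)   # keep the column index below cols
        last_row = min(rows, d + 1)        # keep the column index at or above 0
        waves.append([(r, d - r) for r in range(first_row, last_row)])
    return waves


if __name__ == "__main__":
    for index, wave in enumerate(wavefront_order(2, 3)):
        print(index, wave)
```

In an OpenMP-style implementation, each wave would correspond to one parallel loop followed by a barrier; the sketch only computes the schedule and does no filtering.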

Replicated Synchronization for Imperative BSP Programs

Procedia Computer Science, 2017, vol. 108, pp. 535-544

Authors: Arvid Jakobsson; Frédéric Dabrowski; Wadoud Bousdira; Frédéric Loulergue; Gaetan Hains. Affiliations: Huawei Technologies France Research Center; Univ. Orléans, INSA Centre Val de Loire, LIFO EA 4022, Orléans, France; School of Informatics, Computing and Cyber Systems, Northern Arizona University, USA

The BSP model (Bulk Synchronous Parallel) simplifies the construction and evaluation of parallel algorithms, with its simplified synchronization structure and cost model. Nevertheless, imperative BSP programs can suffer from synchronization errors. Programs with textually aligned barriers are free from such errors, and this structure eases program comprehension. We propose a simplified formalization of barrier inference as data flow analysis, which verifies statically whether an imperative BSP program has replicated synchronization, which is a sufficient condition for textual barrier alignment.

Keywords: parallel programming; bulk synchronous parallelism; static analysis; barrier inference
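
To make the idea of replicated synchronization more concrete, here is a toy, flow-insensitive check over a made-up miniature language. It is not the authors' analysis: the tuple-based program encoding, the variable name `pid`, and the function names are all assumptions for this sketch. A variable is treated as processor-dependent if it is computed from `pid`, directly or through control flow, and a program passes only if no barrier (`sync`) sits under a condition that reads such a variable.

```python
PID = "pid"  # assumed to be the only source of processor-dependent data


def walk(block, guards=()):
    """Yield (statement, enclosing_condition_variable_sets) for a block."""
    for stmt in block:
        yield stmt, guards
        if stmt[0] == "if":
            _, cond, then_block, else_block = stmt
            yield from walk(then_block, guards + (cond,))
            yield from walk(else_block, guards + (cond,))
        elif stmt[0] == "while":
            _, cond, body = stmt
            yield from walk(body, guards + (cond,))


def non_replicated_vars(program):
    """Conservatively collect variables whose value may differ per processor."""
    tainted = {PID}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for stmt, guards in walk(program):
            if stmt[0] != "assign":
                continue
            _, target, reads = stmt
            # Data flow (variables read) plus implicit flow (enclosing guards).
            sources = set(reads).union(*guards)
            if target not in tainted and sources & tainted:
                tainted.add(target)
                changed = True
    return tainted


def has_replicated_sync(program):
    """True if no sync is guarded by a processor-dependent condition."""
    tainted = non_replicated_vars(program)
    return not any(
        stmt[0] == "sync" and any(set(g) & tainted for g in guards)
        for stmt, guards in walk(program)
    )


# Every processor runs the loop the same number of times: barriers line up.
aligned = [("assign", "n", {"size"}),
           ("while", {"n"}, [("sync",), ("assign", "n", {"n"})])]
# Only some processors reach the barrier.
misaligned = [("assign", "x", {"pid"}),
              ("if", {"x"}, [("sync",)], [])]
# y is set under a pid-dependent branch, then guards a barrier.
implicit = [("if", {"pid"}, [("assign", "y", set())], []),
            ("while", {"y"}, [("sync",)])]
```

Because the check ignores statement order, it can reject programs that a flow-sensitive analysis would accept; it errs on the side of reporting a possible misalignment.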

Novel Method to Minimize the Air-Gap MMF Spatial Harmonic Content in Three-Phase Windings

International Conference on Electrical Machines

Authors: Andre M. Silva; Fernando J. T. E. Ferreira; Gabriel Falcao; Manuel Rodrigues. Affiliation: Department of Electrical and Computer Engineering, University of Coimbra, Coimbra, Portugal

Most industrial induction motors currently in use employ simple winding patterns, which are commonly designed to fulfil the fundamental magnetizing flux and torque requirements, disregarding the spatial harmonic content of the air-gap magnetomotive force (MMF). However, it is well known that the lower-order MMF spatial harmonics have a negative impact on motor efficiency, vibration, noise, and torque production. The use of different turns per coil in the winding design is a possible solution to mitigate the problem. In this paper, a novel winding optimizing algorithm is fully described. The air-gap is modelled as a linear function of the current-sheet created by the conductors in the slots. Several winding patterns with different poles for stators with different slots are optimized, and the turns per coil pattern is presented in tables for single and double layer windings with optimal coil pitch shortening. These tables can be used as a reference in winding design projects. An application example of winding optimization is also presented.

Keywords: Stator; Winding design; Winding optimization; Magnetomotive force; Spatial harmonic content; Parasitic torque; parallel programming

Efficient Parallel Method for Documents Similarity in a Large Dataset

电脑学刊 (Journal of Computers), 2017, vol. 28, no. 3, pp. 251-264

Authors: Niyigena Papias; Zuping Zhang

Near-duplicate document detection attracts much attention from researchers because document production is growing very quickly. The main problem in detecting duplicate or near-duplicate documents is the very high dimensionality of the data, which increases the time and space required for processing. With the current trend in the production of new documents, a system to detect similarity among documents becomes almost impracticable. We propose a new approach to this problem, which consists of reducing the dimensionality of the data and using parallel programming efficiently to make full use of the available hardware capacity. The intuition behind using parallel programming is that more processors or cores will perform better than a single processor if they are managed well. We have implemented our method and tested it empirically, and experimental results demonstrate that our algorithm performs better than other methods for All Pairs Similarity Search (APSS) that employ multi-core and multi-programming to deduce the similarity of the documents. The results show that our method can reduce the terms used in similarity computation by up to 65%, and that its execution time is better than that of the Partition-based Similarity Search method, which uses parallel processing for document similarity.

Keywords: dimensionality reduction; document similarity algorithm; pairwise document similarity; parallel programming; query likelihood

Efficient Implicit Parallel Patterns for Geographic Information System

Procedia Computer Science, 2017, vol. 108, pp. 545-554

Authors: Kevin Bourgeois; Sophie Robert; Sébastien Limet; Victor Essayan. Affiliations: Univ. Orléans, INSA Centre Val de Loire, LIFO EA4022, Orléans, France; Géo-Hyd (Antea Group), Olivet, France

With the growth of data, the need to parallelize treatments becomes crucial in numerous domains. But for non-specialists it is still difficult to tackle parallelism technicalities such as data distribution, communications or load balancing. For the geoscience domain we propose a solution based on implicit parallel patterns. These patterns are abstract models for a class of algorithms which can be customized and automatically transformed into a parallel execution. In this paper, we describe a pattern for stencil computation and a novel pattern dealing with computation following a pre-defined order. They are particularly used in geosciences and we illustrate them with the flow direction and the flow accumulation computations.

Keywords: parallel programming; Implicit parallelism; Performance; GIS

Synthesis of divide and conquer parallelism for loops

ACM SIGPLAN Notices, 2017, vol. 52, no. 6, pp. 540-555

Authors: Farzan, Azadeh; Nicolet, Victor. Affiliation: University of Toronto, Canada

Divide-and-conquer is a common parallel programming skeleton supported by many cross-platform multithreaded libraries, and most commonly used by programmers for parallelization. The challenges of producing (manually or automatically) a correct divide-and-conquer parallel program from a given sequential code are two-fold: (1) assuming that a good solution exists where individual worker threads execute a code identical to the sequential one, the programmer has to provide the extra code for dividing the tasks and combining the partial results (i.e. joins), and (2) the sequential code may not be suitable for divide-and-conquer parallelization as is, and may need to be modified to become a part of a good solution. We address both challenges in this paper. We present an automated synthesis technique to synthesize correct joins and an algorithm for modifying the sequential code to make it suitable for parallelization when necessary. This paper focuses on a class of loops that traverse a read-only collection and compute a scalar function over that collection. We present theoretical results for when the necessary modifications to sequential code are possible, theoretical guarantees for the algorithmic solutions presented here, and experimental evaluation of the approach's success in practice and the quality of the produced parallel programs. © 2017 ACM.

Keywords: parallel programming
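
As a small illustration of what a "join" means for a loop over a read-only collection, the sketch below uses the maximum prefix sum as the scalar function. The join is written by hand here, not synthesized, and the function names and the choice of example are assumptions made for this listing. The point to notice is that the maximum prefix sum of two concatenated chunks cannot be computed from the two partial maxima alone; the running sum of the left chunk is also needed, which is why each worker returns a pair.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def mps_sequential(xs):
    """Sequential loop: return (sum, maximum prefix sum), empty prefix allowed."""
    total, best = 0, 0
    for x in xs:
        total += x
        best = max(best, total)
    return total, best


def join(left, right):
    """Combine the results of two adjacent chunks (left comes first)."""
    sum_l, mps_l = left
    sum_r, mps_r = right
    # The best prefix either ends inside the left chunk, or spans the whole
    # left chunk and ends inside the right one.
    return sum_l + sum_r, max(mps_l, sum_l + mps_r)


def mps_parallel(xs, chunks=4):
    """Split xs into chunks, run the sequential loop on each, then join."""
    if not xs:
        return 0, 0
    size = -(-len(xs) // chunks)  # ceiling division
    parts = [xs[i:i + size] for i in range(0, len(xs), size)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(mps_sequential, parts))  # order is preserved
    return reduce(join, partials)


if __name__ == "__main__":
    data = [3, -5, 4, 2, -1, 6, -8, 1]
    print(mps_sequential(data), mps_parallel(data, chunks=3))
```

Each worker runs code identical to the sequential loop, and only `join` is extra. Threads are used only to keep the example short; in CPython they will not speed up this arithmetic.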

BARRACUDA: Binary-level analysis of runtime RAces in CUDA programs

ACM SIGPLAN Notices, 2017, vol. 52, no. 6, pp. 126-140

Authors: Eizenberg, Ariel; Peng, Yuanfeng; Pigli, Toma; Mansky, William; Devietti, Joseph. Affiliations: University of Pennsylvania, United States; Princeton University, United States

GPU programming models enable and encourage massively parallel programming with over a million threads, requiring extreme parallelism to achieve good performance. Massive parallelism brings significant correctness challenges by increasing the possibility for bugs as the number of thread interleavings balloons. Conventional dynamic safety analyses struggle to run at this scale. We present BARRACUDA, a concurrency bug detector for GPU programs written in Nvidia's CUDA language. BARRACUDA handles a wider range of parallelism constructs than previous work, including branch operations, low-level atomics and memory fences, which allows BARRACUDA to detect new classes of concurrency bugs. BARRACUDA operates at the binary level for increased compatibility with existing code, leveraging a new binary instrumentation framework that is extensible to other dynamic analyses. BARRACUDA incorporates a number of novel optimizations that are crucial for scaling concurrency bug detection to over a million threads. © 2017 Owner/Author.

Keywords: parallel programming

An efficient parallel implementation of cell mapping methods for MDOF systems

Nonlinear Dynamics, 2016, vol. 86, no. 4, pp. 2279-2290

Authors: Belardinelli, Pierpaolo; Lenci, Stefano. Affiliation: Polytech Univ Marche, DICEA, Ancona, Italy

The long-term behavior of dynamical systems is usually analyzed by means of basins of attraction (BOA), most often with cell mapping methods, which provide a straightforward approximation technique. Unfortunately, the construction of BOA requires large resources, especially for higher-dimensional systems, both in terms of computational time and memory space. In this paper, the implementation of cell mapping methods for distributed computing is undertaken; a new efficient parallel algorithm for the computation of large-scale BOA is presented, which also addresses issues arising from the inherent seriality of the BOA construction. A cell mapping core is wrapped in a management shell that is in charge of core administration; it allows the computing domain to be split over a multicore environment while making efficient use of the distributed memory. The proposed approach makes use of a double-step algorithm in order to generate, first, the multidimensional BOA of the system and then to evaluate arbitrary 2D Poincaré sections of the hypercube that stores the information. An analysis of a test system is performed by considering grids of different dimensions; the effort of a parallel implementation for medium and large clusters is balanced by great results in terms of computational speed. The performance is strongly affected not only by the number of cores used to run the code, but in particular by the way they are instructed. To get the best from an implementation on a massively parallel architecture, the processes must be properly balanced between memory operations and numerical integrations. A significant improvement in the elaboration time for a large computing domain is shown, and a comparison with a serial code demonstrates the great potential of the application; the advantages given by the use of parallel reading/writing are also discussed with respect to the BOA grid dimension.

Keywords: parallel programming; Cell mapping methods; Basins of attraction

Ogre and Pythia: An invariance proof method for weak consistency models

ACM SIGPLAN Notices, 2017, vol. 52, no. 1, pp. 3-18

Authors: Alglave, Jade; Cousot, Patrick. Affiliations: University College London; Microsoft Research Cambridge, United Kingdom; New York University, United States; École Normale Supérieure, PSL, France

We design an invariance proof method for concurrent programs parameterised by a weak consistency model. The calculational design of the invariance proof method is by abstract interpretation of a truly parallel analytic semantics. This generalises the methods by Lamport and Owicki-Gries for sequential consistency. We use cat as an example of language to write consistency specifications of both concurrent programs and machine architectures. © 2017 ACM.

Keywords: parallel programming

A BSPLIB-STYLE API FOR BULK SYNCHRONOUS PARALLEL ML

Scalable Computing: Practice and Experience, 2017, vol. 18, no. 3, pp. 261-274

Author: Loulergue, Frederic. Affiliation: No Arizona Univ, Sch Informat Comp & Cyber Syst, Flagstaff, AZ 86001, USA

Bulk synchronous parallelism (BSP) offers an abstract and simple model of parallelism, yet allows the communication costs of parallel algorithms to be taken into account realistically. BSP has been used in many application domains. BSPlib and its variants are programming libraries for the C language that support the BSP style. Bulk Synchronous Parallel ML (BSML) is a library for BSP programming with the functional language OCaml. It offers parallel operations on a data structure named parallel vector. BSML provides a global view of programs, i.e. BSML programs can be seen as sequential programs working on a parallel data structure (seq of par), while a BSPlib program is written in the SPMD style and understood as a parallel composition of communicating sequential programs (par of seq). The communication styles of BSML and BSPlib are also quite different. The contribution of this paper is a BSPlib-style communication API implemented on top of BSML. It has been designed without extending BSML, using only the imperative features of the underlying functional language OCaml. Programs implemented using this API are syntactically very close to programs implemented using a BSPlib library for the C language. It therefore shows that BSML is universal for the BSP model.

Keywords: bulk synchronous parallelism; parallel programming; functional programming
