We study the effect of adaptive mesh refinement on a parallel domain decomposition solver of a linear system of algebraic equations. These concepts need to be combined within a parallel adaptive finite element softwar...
详细信息
We study the effect of adaptive mesh refinement on a parallel domain decomposition solver of a linear system of algebraic equations. These concepts need to be combined within a parallel adaptive finite element software. A prototype implementation is presented for this purpose. It uses adaptive mesh refinement with one level of hanging nodes. Two and three-level versions of the Balancing Domain Decomposition based on Constraints (BDDC) method are used to solve the arising system of algebraic equations. The basic concepts are recalled and components necessary for the combination are studied in detail. Of particular interest is the effect of disconnected subdomains, a typical output of the employed mesh partitioning based on space-filling curves, on the convergence and solution time of the BDDC method. It is demonstrated using a large set of experiments that while both refined meshes and disconnected subdomains have a negative effect on the convergence of BDDC, the number of iterations remains acceptable. In addition, scalability of the three-level BDDC solver remains good on up to a few thousands of processor cores. The largest presented problem using adaptive mesh refinement has over 109 unknowns and is solved on 2048 cores. (C) 2017 Elsevier Ltd. All rights reserved.
We describe a parallel algorithmic framework for optimizing the shape of elements in a simplicial volume mesh. Using fine-grained parallelism and asymmetric multiprocessing on multi-core CPU and modern graphics proces...
详细信息
We describe a parallel algorithmic framework for optimizing the shape of elements in a simplicial volume mesh. Using fine-grained parallelism and asymmetric multiprocessing on multi-core CPU and modern graphics processing unit hardware simultaneously, we achieve speedups of more than tenfold over current state-of-the-art serial methods. In addition, improved mesh quality is obtained by optimizing both the surface and the interior vertex positions in a single pass, using feature preservation to maintain fidelity to the original mesh geometry. The framework is flexible in terms of the core numerical optimization method employed, and we provide performance results for both gradient-based and derivative-free optimization methods.
A parallel algorithm for the parametric synthesis of a family of controlled linearized combined dynamic systems in which the configuration of the stability regions in the feedback parameter space depends on some slowl...
详细信息
A parallel algorithm for the parametric synthesis of a family of controlled linearized combined dynamic systems in which the configuration of the stability regions in the feedback parameter space depends on some slowly varying design parameters is proposed. The parametric synthesis of a system for the angular stabilization of an elastic beam rotating about a longitudinal axis under the action of a longitudinal acceleration is implemented. This kind of synthesis can substantially reduce the errors of the stabilization system and the typical regulation time in the entire range of the slow increase in the rotation velocity of the beam.
The HEVC video coding standard launched on 2013, is able to reduce to the half, on average, the bit stream size produced by H.264/AVC encoder at the same video quality, but it requires nearly 70 % more time than H.264...
详细信息
The HEVC video coding standard launched on 2013, is able to reduce to the half, on average, the bit stream size produced by H.264/AVC encoder at the same video quality, but it requires nearly 70 % more time than H.264/AVC to encode a video sequence. In this paper we propose several parallelization approaches to the HEVC encoder. Our proposals, for distributed memory platforms, work at a coarse grain level parallelization, being one group of pictures (GOP) the basic structure. These approaches encode simultaneously several GOPs. To obtain good parallel performance, a right GOP conformation and distribution should be applied.
We present an efficient distributed memory parallel algorithm for computing connected components in undirected graphs based on Shiloach-Vishkin's PRAM approach. We discuss multiple optimization techniques that red...
详细信息
We present an efficient distributed memory parallel algorithm for computing connected components in undirected graphs based on Shiloach-Vishkin's PRAM approach. We discuss multiple optimization techniques that reduce communication volume as well as load-balance the algorithm. We also note that the efficiency of the parallel graph connectivity algorithm depends on the underlying graph topology. Particularly for short diameter graph components, we observe that parallel Breadth First Search (BFS) method offers better performance. However, running parallel BFS is not efficient for computing large diameter components or large number of small components. To address this challenge, we employ a heuristic that allows the algorithm to quickly predict the type of the network by computing the degree distribution and follow the optimal hybrid route. Using large graphs with diverse topologies from domains including metagenomics, web crawl, social graph and road networks, we show that our hybrid implementation is efficient and scalable for each of the graph types. Our approach achieves a runtime of 215 seconds using 32 K cores of Cray XC30 for a metagenomic graph with over 50 billion edges. When compared against the previous state-of-the-art method, we see performance improvements up to 24 x.
The new video coding standard HEVC includes two concepts that allow to partition a frame into regions that can be independently encoded and decoded. These two concepts are named "Tiles" and "Slices"...
详细信息
The new video coding standard HEVC includes two concepts that allow to partition a frame into regions that can be independently encoded and decoded. These two concepts are named "Tiles" and "Slices". In this paper, we present and analyze optimized parallel versions of the HEVC encoder based on tile and slice partitioning. We have evaluated the benefits and drawbacks of both approaches in terms of computational times and rate distortion performance. The results show that both approaches obtain good speed-ups, being the parallel version based on tiles the one that obtains the best trade-off between speed-up achieved (up to 9.3) and rate distortion performance loss (1.6% BD rate for AI mode and 2.2% for LB mode on average).
Presently, a tetrahedral mesher based on the Delaunay triangulation approach may outperform a tetrahedral improver based on local smoothing and flip operations by nearly one order in terms of computing time. paralleli...
详细信息
Presently, a tetrahedral mesher based on the Delaunay triangulation approach may outperform a tetrahedral improver based on local smoothing and flip operations by nearly one order in terms of computing time. parallelization is a feasible way to speed up the improver and enable it to handle large-scale meshes. In this study, a novel domain decomposition approach is proposed for parallel mesh improvement. It analyses the dual graph of the input mesh to build an inter-domain boundary that avoids small dihedral angles and poorly shaped faces. Consequently, the parallel improver can fit this boundary without compromising the mesh quality. Meanwhile, the new method does not involve any inter-processor communications and therefore runs very efficiently. A parallel pre-processing pipeline that combines the proposed improver and existing parallel surface and volume meshers can prepare a quality mesh containing hundreds of millions of elements in minutes. Experiments are presented to show that the developed system is robust and applicable to models of a complication level experienced in industry. (C) 2017 Elsevier Inc. All rights reserved.
It is difficult to obtain high performance when computing matchings on parallel processors because matching algorithms explicitly or implicitly search for paths in the graph, and when these paths become long, there is...
详细信息
It is difficult to obtain high performance when computing matchings on parallel processors because matching algorithms explicitly or implicitly search for paths in the graph, and when these paths become long, there is little concurrency. In spite of this limitation, we present a new algorithm and its shared-memory parallelization that achieves good performance and scalability in computing maximum cardinality matchings in bipartite graphs. Our algorithm searches for augmenting paths via specialized breadth-first searches (BFS) from multiple source vertices, hence creating more parallelism than single source algorithms. algorithms that employ multiple-source searches cannot discard a search tree once no augmenting path is discovered from the tree, unlike algorithms that rely on single-source searches. We describe a novel tree-grafting method that eliminates most of the redundant edge traversals resulting from this property of multiple-source searches. We also employ the recent direction-optimizing BFS algorithm as a subroutine to discover augmenting paths faster. Our algorithm compares favorably with the current best algorithms in terms of the number of edges traversed, the average augmenting path length, and the number of iterations. We provide a proof of correctness for our algorithm. Our NUMA-aware implementation is scalable to 80 threads of an Intel multiprocessor and to 240 threads on an Intel Knights Corner coprocessor. On average, our parallel algorithm runs an order of magnitude faster than the fastest algorithms available. The performance improvement is more significant on graphs with small matching number.
The watershed transform is considered as the most appropriate method for image segmentation in the field of mathematical morphology. In the following paper, we present an adapted topological watershed algorithm suited...
详细信息
The watershed transform is considered as the most appropriate method for image segmentation in the field of mathematical morphology. In the following paper, we present an adapted topological watershed algorithm suited for a rapid and effective implementation on Shared Memory parallel Machine (SMPM). The introduced algorithm allows a parallel watershed computing while preserving the given topology. No prior minima extraction is needed, nor the use of any sorting step or hierarchical queue. The strategy that guides the parallel watershed computing, labeled SDM-Strategy (equivalent to Split-Distributes and Merge), is also presented. Experimental analyses such as execution time, performance enhancement, cache consumption, efficiency and scalability are also presented and discussed. (C) 2017 Elsevier B.V. All rights reserved.
Coverage problem is essential to Wireless Sensor Networks on energy efficient deployment and monitoring. In this paper, we propose a distributed Čech complex algorithm for coverage hole detection in WSNs. Based on our...
详细信息
暂无评论