Computer-aided design (CAD) models in industrial design are often plagued by a series of defects, including minute features, gaps, self-intersections, and misalignments. Schemes for automatic defeaturing, gap repair, and intersection removal usually require a discrete representation of the geometry. However, existing surface meshing methods cannot effectively handle sliver surfaces and assemblies with complex contact conditions, including multiple misaligned curves/surfaces and degenerate or free-form contact interfaces. A surface meshing method based on a meshing-and-synchronizing strategy is proposed, which tackles the mesh generation of sliver surfaces and the cleanup of misaligned assemblies by means of mesh alignment. Mesh generation is performed hierarchically, with curves and surfaces meshed in sequence. By incorporating the synchronization strategy into the curve/surface meshing process, curve/surface mesh alignment is achieved automatically without compromising mesh quality. Thanks to this alignment, the curve mesh generated by aligned curve meshing (ACM) is free of intersections, which facilitates the meshing of sliver surfaces. By enforcing mesh alignment, aligned surface meshing (ASM) can handle misaligned assemblies with complex contact conditions. ASM is parallelized using OpenMP, and various assemblies characterized by difficult misaligned features and contacts are processed successfully by the parallel ASM.
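As an illustration of the kind of OpenMP parallelization the last sentence refers to, the minimal sketch below distributes independent per-face meshing work across threads once the shared boundary curves have been discretized. It is only a schematic under assumed placeholder types (Face, CurveMesh, SurfaceMesh) and a hypothetical mesh_face routine; it is not the authors' ASM implementation.

// A minimal sketch (not the authors' ASM code): meshing the faces of a model in
// parallel with OpenMP once the shared boundary curves have been discretized.
// Face, CurveMesh, SurfaceMesh and mesh_face() are hypothetical placeholders.
#include <omp.h>
#include <vector>

struct Face { /* a trimmed CAD surface */ };
struct CurveMesh { /* pre-aligned discretization of all boundary curves */ };
struct SurfaceMesh { /* triangles generated for one face */ };

// Placeholder for a per-face mesher that reuses the already-aligned curve mesh,
// so neighbouring faces agree on their common boundary by construction.
SurfaceMesh mesh_face(const Face& /*f*/, const CurveMesh& /*boundary*/) {
    return SurfaceMesh{};
}

std::vector<SurfaceMesh> mesh_all_faces(const std::vector<Face>& faces,
                                        const CurveMesh& boundary) {
    std::vector<SurfaceMesh> out(faces.size());
    // Each face only reads the shared curve mesh, so faces can be meshed
    // independently; dynamic scheduling absorbs the irregular per-face cost.
    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < (long)faces.size(); ++i)
        out[i] = mesh_face(faces[i], boundary);
    return out;
}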
An important combinatorial problem is subgraph isomorphism, which formalizes the task of searching for occurrences of a known substructure within a larger structure represented by a graph; applications arise in chemistry, biology, medicine, databases, and social network analysis. Subgraph isomorphism has been proven NP-complete in the general case, but several algorithms use heuristics to achieve an affordable run time on common classes of graphs. The need to work with larger and larger graphs makes the idea of parallelizing this task attractive; however, no consensus has yet been reached on the best strategy for doing so. In this paper, we present two versions of a new parallel algorithm based on a re-design of the well-known VF3 algorithm. We discuss the changes made to efficiently distribute the work among multiple processors. The algorithms have been evaluated through comprehensive experimentation on several publicly available graph datasets to demonstrate their effectiveness in exploiting parallelism. (c) 2021 Elsevier B.V. All rights reserved.
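To make the parallelization idea concrete, here is a minimal sketch of one common strategy: splitting the first level of a VF-style search tree across OpenMP threads, each running an independent backtracking search. It counts edge-preserving injective matches (monomorphisms) on adjacency matrices and is not the authors' re-designed VF3.

// Illustrative sketch only: parallel subgraph matching by distributing the
// root candidates of the search tree among OpenMP threads.
#include <omp.h>
#include <atomic>
#include <vector>

using Graph = std::vector<std::vector<bool>>;  // adjacency matrix

// Can pattern vertex pu be mapped to target vertex tv, given the current map?
static bool feasible(const Graph& P, const Graph& T,
                     const std::vector<int>& map, int pu, int tv) {
    for (int pw = 0; pw < (int)P.size(); ++pw) {
        int tw = map[pw];
        if (tw < 0) continue;                       // pw not mapped yet
        if (tw == tv) return false;                 // injectivity
        if (P[pu][pw] && !T[tv][tw]) return false;  // pattern edge must exist in target
    }
    return true;
}

static void dfs(const Graph& P, const Graph& T, std::vector<int>& map,
                int depth, std::atomic<long>& count) {
    if (depth == (int)P.size()) { ++count; return; }
    for (int tv = 0; tv < (int)T.size(); ++tv)
        if (feasible(P, T, map, depth, tv)) {
            map[depth] = tv;
            dfs(P, T, map, depth + 1, count);
            map[depth] = -1;
        }
}

long count_embeddings(const Graph& P, const Graph& T) {
    std::atomic<long> count{0};
    // Each thread owns a disjoint set of root candidates and its own mapping,
    // so no synchronization is needed beyond the shared counter.
    #pragma omp parallel for schedule(dynamic)
    for (int tv = 0; tv < (int)T.size(); ++tv) {
        std::vector<int> map(P.size(), -1);
        if (feasible(P, T, map, 0, tv)) {
            map[0] = tv;
            dfs(P, T, map, 1, count);
        }
    }
    return count.load();
}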
Herein, a parallel implementation of discrete orthogonal moments on block-represented images is investigated. Moments and moment functions have been used widely as features for image analysis and pattern recognition tasks. The main disadvantage of all moment sets is their high computational cost, which increases as higher-order moments are involved in the computations. In image block representation (IBR), the image is represented by homogeneous areas called blocks. IBR allows moment computation with zero computational error for binary images, low computational error for gray-scale images, and low computational complexity, while achieving high processing rates. Results from a parallel implementation on a multicore computer using OpenMP show a significant performance improvement.
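A minimal sketch of the block-based idea, under assumptions: for a binary image stored as foreground rectangles, each block's contribution to a geometric moment factorizes into two one-dimensional power sums, and the per-block loop parallelizes with an OpenMP reduction. The Block struct is a stand-in, and the conversion to discrete orthogonal (e.g., Tchebichef) moments is omitted, so this is not the paper's exact formulation.

// Illustrative sketch: geometric moment m_pq of a binary image from its block
// (rectangle) representation, parallelized with an OpenMP reduction.
#include <omp.h>
#include <cmath>
#include <vector>

struct Block { int x1, x2, y1, y2; };  // inclusive pixel ranges of a foreground block

// Sum of k-th powers over the inclusive integer interval [a, b].
static double power_sum(int a, int b, int k) {
    double s = 0.0;
    for (int v = a; v <= b; ++v) s += std::pow((double)v, k);
    return s;
}

// Because a block is a Cartesian product of intervals, its contribution to
// m_pq factorizes into two independent 1-D sums.
double moment(const std::vector<Block>& blocks, int p, int q) {
    double m = 0.0;
    #pragma omp parallel for reduction(+ : m)
    for (long i = 0; i < (long)blocks.size(); ++i) {
        const Block& b = blocks[i];
        m += power_sum(b.x1, b.x2, p) * power_sum(b.y1, b.y2, q);
    }
    return m;
}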
ISBN (print): 9783030975494; 9783030975487
The minimum spanning tree is a critical problem for many applications in network analysis, communication network design, and computer science. Parallel implementations of minimum spanning tree algorithms increase the simulation performance of large graph problems using high-performance computational resources. Minimum spanning tree algorithms generally use traditional parallel programming models for distributed and shared memory systems, such as the Message Passing Interface or OpenMP. Furthermore, the partitioned global address space model offers new capabilities in the form of asynchronous computations on distributed shared memory, positively affecting the performance and scalability of the algorithms. This paper presents a new minimum spanning tree algorithm implemented in a partitioned global address space model. Experiments with diverse parameters have been conducted to study the efficiency of the asynchronous implementation of the algorithm.
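As a shared-memory illustration of the parallel structure that MST algorithms expose (the paper's contribution is the PGAS/asynchronous implementation, which is not reproduced here), the sketch below parallelizes the cheapest-outgoing-edge scan of Boruvka's algorithm with OpenMP, with index tie-breaking and a union-find cycle check.

// Illustrative sketch: weight of a minimum spanning forest via Boruvka's
// algorithm; the scan for each component's cheapest outgoing edge is parallel,
// contraction is serial per round.
#include <omp.h>
#include <numeric>
#include <vector>

struct Edge { int u, v; double w; };

// Union-find with path halving.
struct DSU {
    std::vector<int> parent;
    explicit DSU(int n) : parent(n) { std::iota(parent.begin(), parent.end(), 0); }
    int find(int x) { while (parent[x] != x) x = parent[x] = parent[parent[x]]; return x; }
    bool unite(int a, int b) { a = find(a); b = find(b); if (a == b) return false; parent[a] = b; return true; }
};

double boruvka_msf_weight(int n, const std::vector<Edge>& edges) {
    DSU dsu(n);
    double total = 0.0;
    // Strictly "better" candidate edge, with index tie-breaking to avoid cycles.
    auto better = [&edges](long cand, long cur) {
        return cur < 0 || edges[cand].w < edges[cur].w ||
               (edges[cand].w == edges[cur].w && cand < cur);
    };
    while (true) {
        std::vector<int> comp(n);
        for (int v = 0; v < n; ++v) comp[v] = dsu.find(v);   // frozen labels for this round
        std::vector<long> best(n, -1);                       // cheapest outgoing edge per component
        #pragma omp parallel
        {
            std::vector<long> local(n, -1);                  // thread-private candidates
            #pragma omp for nowait
            for (long i = 0; i < (long)edges.size(); ++i) {
                int cu = comp[edges[i].u], cv = comp[edges[i].v];
                if (cu == cv) continue;                      // edge is internal to a component
                for (int c : {cu, cv})
                    if (better(i, local[c])) local[c] = i;
            }
            #pragma omp critical                             // merge the per-thread winners
            for (int c = 0; c < n; ++c)
                if (local[c] >= 0 && better(local[c], best[c])) best[c] = local[c];
        }
        bool merged = false;
        for (int c = 0; c < n; ++c)                          // contract: add each winning edge once
            if (best[c] >= 0 && dsu.unite(edges[best[c]].u, edges[best[c]].v)) {
                total += edges[best[c]].w;
                merged = true;
            }
        if (!merged) return total;                           // no component can grow further
    }
}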
The electric and magnetic fields around power lines carry an immense amount of information about the power grid and can be used to improve stability, balance loads, conserve power, and reduce outages. To study this, an extremely large model of transmission lines over a 70-km² tract of land near Washington, DC, has been built. The terrain was modeled accurately using 1-m-resolution LIDAR data. The 140-million-element power-line model was solved using the boundary element method, and the solvers were parallelized across DEVCOM Army Research Laboratory's Centennial supercomputer using a modified version of the domain decomposition method. The code on each node was accelerated using the fast multipole method and, when available, GPUs. Additionally, larger test models were used to characterize the scalability of the code. The largest test model had 10,010,944,000 elements, and was solved on 1,024 nodes in 4.3 hours.
The kd-tree is one of the most widely used data structures for managing multi-dimensional data. Due to the ever-growing data volume, it is imperative to consider parallelism in kd-trees. However, we observed challenges in existing parallel kd-tree implementations, for both construction and queries. The goal of this paper is to develop efficient in-memory kd-trees by supporting high parallelism and cache-efficiency. We propose the Pkd-tree (parallel kd-tree), a parallel kd-tree that is efficient both in theory and in practice. The Pkd-tree supports parallel tree construction, batch update (insertion and deletion), and various queries including k-nearest neighbor search, range query, and range count. We prove that our algorithms have strong theoretical bounds in work (sequential time complexity), span (parallelism), and cache complexity. Our key techniques include 1) an efficient construction algorithm that optimizes work, span, and cache complexity simultaneously, and 2) reconstruction-based update algorithms that guarantee the tree to be weight-balanced. With these new algorithmic insights and careful engineering effort, we achieved a highly optimized implementation of the Pkd-tree. We tested the Pkd-tree on various synthetic and real-world datasets, including both uniform and highly skewed data, and compared it with state-of-the-art parallel kd-tree implementations. In all tests, with better or competitive query performance, the Pkd-tree is consistently much faster in construction and updates than all baselines. We have released our code.
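For orientation, a bare-bones parallel kd-tree construction is sketched below using median splits and OpenMP tasks. The actual Pkd-tree construction is far more sophisticated (cache-efficient, with strong work/span bounds and reconstruction-based updates), so this only conveys the basic parallel recursion.

// Illustrative sketch: median-split kd-tree built with OpenMP tasks, so the
// two subtrees of every node are constructed in parallel.
#include <omp.h>
#include <algorithm>
#include <array>
#include <memory>
#include <vector>

constexpr int DIM = 3;
using Point = std::array<double, DIM>;

struct Node {
    Point split;                       // point stored at this node
    int axis;                          // splitting dimension
    std::unique_ptr<Node> left, right;
};

static std::unique_ptr<Node> build(std::vector<Point>& pts, long lo, long hi, int depth) {
    if (lo >= hi) return nullptr;
    int axis = depth % DIM;
    long mid = lo + (hi - lo) / 2;
    // Partition around the median coordinate on the current axis.
    std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
                     [axis](const Point& a, const Point& b) { return a[axis] < b[axis]; });
    auto node = std::make_unique<Node>();
    node->split = pts[mid];
    node->axis = axis;
    // Build both subtrees as independent tasks; small ranges run inline so
    // task overhead does not dominate.
    #pragma omp task shared(pts, node) if (hi - lo > 2048)
    node->left = build(pts, lo, mid, depth + 1);
    node->right = build(pts, mid + 1, hi, depth + 1);
    #pragma omp taskwait
    return node;
}

std::unique_ptr<Node> build_kdtree(std::vector<Point> pts) {
    std::unique_ptr<Node> root;
    #pragma omp parallel
    #pragma omp single
    root = build(pts, 0, (long)pts.size(), 0);
    return root;
}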
ISBN (print): 9798400704161
Currently, the best known tradeoff between approximation ratio and complexity for the Sparsest Cut problem is achieved by the algorithm in [Sherman, FOCS 2009]: it computes an O(√(log n)/ε)-approximation using O(n^ε log^{O(1)} n) maxflows for any ε ∈ [Θ(1/log n), Θ(1)]. It works by solving the SDP relaxation of [Arora-Rao-Vazirani, STOC 2004] using the Multiplicative Weights Update algorithm (MW) of [Arora-Kale, JACM 2016]. To implement one MW step, Sherman approximately solves a multicommodity flow problem using another application of MW. Nested MW steps are solved via a certain "chaining" algorithm that combines the results of multiple calls to the maxflow algorithm. We present an alternative approach that avoids solving the multicommodity flow problem and instead computes "violating paths". This simplifies Sherman's algorithm by removing the need for a nested application of MW, and also allows parallelization: we show how to compute an O(√(log n)/ε)-approximation via O(log^{O(1)} n) maxflows using O(n^ε) processors. We also revisit Sherman's chaining algorithm and present a simpler version together with a new analysis.
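For readers who need a reminder of what a single MW step is, the generic (scalar) multiplicative weights update is recalled below; the Arora-Kale framework used here is its matrix analogue, and the specific gain/loss terms fed into it by Sherman's algorithm are not reproduced. This is the textbook rule, not a description of this paper's construction.

w_{t+1}(i) = w_t(i)\,\bigl(1 - \eta\, m_t(i)\bigr), \qquad p_{t+1}(i) = \frac{w_{t+1}(i)}{\sum_j w_{t+1}(j)},

where m_t(i) ∈ [-1, 1] is the cost incurred by expert i in round t and η is the step size; with a suitable η, the average cost of the MW strategy over T rounds exceeds that of the best fixed expert by at most O(√(log N / T)) for N experts.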
We propose the analysis of a scalable parallel MCMC algorithm for graph coloring aimed at balancing the color class sizes, provided that a suitable number of colors is made available. First, it is shown that the Markov chain converges to the target distribution by repeatedly sampling from suitable proposal distributions over the neighboring colors of each node, independently and hence in a parallel manner. We prove that the number of conflicts in the improper colorings generated throughout the iterations of the algorithm rapidly converges in probability to 0. As for the balancing, given the complexity of the distributions involved, we propose a qualitative analysis of the balancing level achieved. Based on a collection of multinoulli distributions arising from the color occurrences within every node neighborhood, we provide evidence about the character of the final color balancing, which turns out to be nearly uniform over the color classes. Numerical simulations on big social graphs confirm the fast convergence and the balancing trend, which is finally validated through a statistical hypothesis test. (c) 2021 Elsevier B.V. All rights reserved.
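A minimal sketch of the "resample every node independently, in parallel" pattern is shown below as one synchronous sweep of a Glauber-style recoloring chain. The proposal weights used here (inverse neighborhood occurrence of each color) are only a stand-in for the paper's actual proposal distributions and balancing analysis.

// Illustrative sketch: one parallel sweep in which every node resamples its
// color from a distribution that down-weights colors already used nearby.
#include <omp.h>
#include <random>
#include <vector>

using Adj = std::vector<std::vector<int>>;  // adjacency lists

void parallel_sweep(const Adj& g, int num_colors, std::vector<int>& color) {
    std::vector<int> next(color.size());
    #pragma omp parallel
    {
        // One RNG per thread, seeded differently to avoid identical streams.
        std::mt19937 rng(12345 + omp_get_thread_num());
        #pragma omp for
        for (long v = 0; v < (long)g.size(); ++v) {
            // Count how often each color appears among v's neighbors
            // (reads the *old* coloring, so iterations are independent).
            std::vector<int> occ(num_colors, 0);
            for (int u : g[v]) ++occ[color[u]];
            // Weight each color inversely to its neighborhood occurrence;
            // conflict-free colors get the largest weight.
            std::vector<double> w(num_colors);
            for (int c = 0; c < num_colors; ++c) w[c] = 1.0 / (1.0 + occ[c]);
            std::discrete_distribution<int> proposal(w.begin(), w.end());
            next[v] = proposal(rng);
        }
    }
    color.swap(next);
}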
The Fast Fourier Transform (FFT) is a fundamental operation for 2D data in various applications. To accelerate large-scale 2D-FFT computation, we propose HI-FFT, a heterogeneous parallel in-place 2D-FFT algorithm. Our novel work-decomposition method makes it possible to run our parallel algorithm on the original data (i.e., in-place), unlike prior parallel algorithms that require additional memory space (i.e., out-of-place) to guarantee independence among sub-tasks. Our work-decomposition method also removes the duplicated operations of the out-of-place approaches. Using this decomposition, we introduce an in-place heterogeneous parallel algorithm that utilizes both a multi-core CPU and a GPU simultaneously. To maximize the utilization of the computing resources, we also propose a priority-based dynamic scheduling method. We compared the performance of seven different 2D-FFT algorithms, including ours, on large-scale 2D-FFT problems whose sizes varied from 20K² to 120K². Our method achieved up to 2.92 and 4.42 times higher performance than conventional homogeneous parallel algorithms based on state-of-the-art CPU and GPU libraries, respectively. It also showed up to 2.27 times higher performance than prior heterogeneous algorithms while requiring half the memory space. To check the benefit of HI-FFT in an actual application, we applied it to a CGH (Computer-Generated Holography) pipeline and found that it successfully reduces the hologram generation time. These results demonstrate the advantage of our approach for large-scale 2D-FFT computation.
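For reference, the classical row-column decomposition that underlies most parallel 2D-FFT codes is sketched below for a square, power-of-two, CPU-only case with OpenMP. HI-FFT's in-place heterogeneous CPU+GPU decomposition and its priority-based scheduling are not reproduced here.

// Illustrative sketch: row-column 2D FFT. Rows are independent, so each row
// pass is trivially parallel; a transpose turns the column pass into a row pass.
#include <omp.h>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Iterative radix-2 Cooley-Tukey FFT, in place; n must be a power of two.
static void fft_inplace(std::complex<double>* a, std::size_t n) {
    // Bit-reversal permutation.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    const double PI = std::acos(-1.0);
    // Butterfly passes of increasing length.
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const std::complex<double> wlen(std::cos(-2.0 * PI / (double)len),
                                        std::sin(-2.0 * PI / (double)len));
        for (std::size_t i = 0; i < n; i += len) {
            std::complex<double> w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k, w *= wlen) {
                const std::complex<double> u = a[i + k];
                const std::complex<double> v = a[i + k + len / 2] * w;
                a[i + k] = u + v;
                a[i + k + len / 2] = u - v;
            }
        }
    }
}

// 2D FFT of an n x n row-major array: FFT every row, transpose, FFT every row
// again (i.e., the original columns), transpose back.
void fft2d(std::vector<std::complex<double>>& img, std::size_t n) {
    auto row_pass = [&]() {
        #pragma omp parallel for
        for (long r = 0; r < (long)n; ++r) fft_inplace(&img[(std::size_t)r * n], n);
    };
    auto transpose = [&]() {
        #pragma omp parallel for
        for (long r = 0; r < (long)n; ++r)
            for (std::size_t c = (std::size_t)r + 1; c < n; ++c)
                std::swap(img[(std::size_t)r * n + c], img[c * n + (std::size_t)r]);
    };
    row_pass();
    transpose();
    row_pass();
    transpose();
}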
Edge detection is an important process in image segmentation, object recognition, template matching, etc. It computes gradients in both the horizontal and vertical directions of the image at each pixel position to find the image boundaries. Conventional edge detectors take significant time to detect the edges in an image. To reduce the computational time, this paper proposes parallel algorithms for edge detection with the Sobel, Prewitt, and Roberts first-order derivatives using a Shared Memory - Single Instruction Multiple Data (SM-SIMD) parallel architecture. Experimental results show that the proposed parallel algorithms for edge detection are faster than the conventional methods.
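A minimal sketch of the data-parallel pattern: the Sobel gradient magnitude with the pixel loop parallelized via OpenMP. The paper's SM-SIMD kernels and its Prewitt/Roberts variants follow the same structure, but their exact implementation details may differ from this sketch.

// Illustrative sketch: Sobel gradient magnitude, pixel loop parallelized.
#include <omp.h>
#include <cmath>
#include <cstdint>
#include <vector>

// img is a grayscale image of size w x h in row-major order; the output holds
// the gradient magnitude, with a one-pixel border left at zero.
std::vector<float> sobel(const std::vector<uint8_t>& img, int w, int h) {
    std::vector<float> mag((size_t)w * h, 0.0f);
    #pragma omp parallel for collapse(2)
    for (int y = 1; y < h - 1; ++y)
        for (int x = 1; x < w - 1; ++x) {
            auto p = [&](int dx, int dy) { return (float)img[(size_t)(y + dy) * w + (x + dx)]; };
            // Horizontal and vertical 3x3 Sobel responses.
            float gx = -p(-1,-1) - 2*p(-1,0) - p(-1,1) + p(1,-1) + 2*p(1,0) + p(1,1);
            float gy = -p(-1,-1) - 2*p(0,-1) - p(1,-1) + p(-1,1) + 2*p(0,1) + p(1,1);
            mag[(size_t)y * w + x] = std::sqrt(gx * gx + gy * gy);
        }
    return mag;
}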