For an n-gon P, we say P is weakly visible from segment s if any point on P is visible from at least one point of the segment. In this paper, we present an optimal preprocessing algorithm which runs in O(log n) time u...
ISBN: 9781612841021 (print)
In this paper, we briefly introduce the basic theory and method of tight-binding molecular dynamics (TBMD) and study the quantum oscillation of graphene near absolute zero. Using the TBMD method and a parallel program to simulate graphene, and analyzing the simulation results, we propose some improvements to the force computation based on perturbation and sparse-matrix methods.
This thesis deals with OpenCL technology and its use for object detection. The first part describes the principles of OpenCL and the basic theory of object detection. An analysis chapter follows, in which a processing method is designed with regard to the capabilities of OpenCL. The next part describes the implementation of the detection application itself and experimentally evaluates the detector's performance. The final chapter summarizes the results achieved.
In this work PFI (Perturbed Functional Iterations) has been extended to solve large-scale nonlinear models by applying parallel computations. PFI partially linearizes a given nonlinear system, and irrespective of the ...
An effective strategy for accelerating the calculation of convex hulls is to filter the input points by discarding interior points. In this paper, we present such a straightforward preprocessing approach, which discards the points located inside a convex polygon formed by 16 extreme points. The extreme points of a planar point set do not change when all points are rotated by the same angle in the plane. Four groups of four extreme points with minimum or maximum x or y coordinates can be found for the original point set and three rotated point sets. These 16 extreme points are used to form a planar convex polygon. We discard the points located inside the convex polygon and calculate the desired convex hull of the remaining points. The proposed preprocessing algorithm is evaluated on two computational platforms. Experiments show that, on computational platform 1, the proposed preprocessing algorithm achieves speedups of approximately 4x to 5x on average and 5x to 6x in the best cases over runs without it, while on computational platform 2 the speedups are approximately 6x to 9x on average and 9x to 14x in the best cases. Moreover, more than 99% of the input points can be discarded in most cases.
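The filtering idea described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the rotation angles (0°, 22.5°, 45°, 67.5°), the function names, and the use of Andrew's monotone chain for the hull are all assumptions, since the abstract does not specify them.

```python
import math

def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive if o->a->b turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def extreme_points(points, angles_deg=(0.0, 22.5, 45.0, 67.5)):
    """Collect up to 16 extreme points: min/max x and y of the original set
    and of three rotated copies. The angles are an assumed choice."""
    found = set()
    for deg in angles_deg:
        t = math.radians(deg)
        c, s = math.cos(t), math.sin(t)

        def rx(p):  # x coordinate after rotating p by t
            return p[0] * c - p[1] * s

        def ry(p):  # y coordinate after rotating p by t
            return p[0] * s + p[1] * c

        found.update((min(points, key=rx), max(points, key=rx),
                      min(points, key=ry), max(points, key=ry)))
    return list(found)

def strictly_inside(p, poly):
    """True if p lies strictly inside the counter-clockwise convex polygon poly."""
    n = len(poly)
    return all(cross(poly[i], poly[(i + 1) % n], p) > 0 for i in range(n))

def filter_points(points):
    """Discard points strictly inside the polygon spanned by the extreme points."""
    poly = convex_hull(extreme_points(points))
    if len(poly) < 3:  # degenerate input: nothing can be discarded safely
        return list(points)
    return [p for p in points if not strictly_inside(p, poly)]

# Usage: the hull of the filtered set equals the hull of the full set.
grid = [(float(x), float(y)) for x in range(10) for y in range(10)]
kept = filter_points(grid)
hull = convex_hull(kept)
```

Because the polygon is built only from points of the input set, it lies inside the true convex hull, so any point strictly inside it cannot be a hull vertex. Points on the polygon boundary are kept, which makes the sketch conservative rather than maximally aggressive.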
In this paper, parallelization techniques are proposed for the branch-and-bound algorithm OTClique for the maximum weight clique problem. OTClique consists of a precomputation phase and a branch-and-bound phase. The proposed algorithm parallelizes both of them. In the precomputation phase, the construction of optimal tables is parallelized. In the branch-and-bound phase, the proposed algorithm generates small subproblems and assigns them to threads. A technique to share lower and upper bounds is also proposed. Experiments using some benchmarks show that the proposed parallelization techniques improve the performance of OTClique. With an 8-core CPU, the computation time of OTClique becomes on average 6.91 times shorter on random graphs and 5.38 times shorter on DIMACS benchmarks. (C) 2021 Elsevier B.V. All rights reserved.
In recent years, high-utility pattern mining has been studied extensively. However, most of these studies have addressed mining high-utility patterns (HUPs) without consideration for their frequencies, leading to the mining of meaningless HUPs. One approach to solving this problem is to use HUP mining with strong affinity frequencies. In this paper, we propose two algorithms to discover HUPs with strong affinity frequencies: DHUP-Miner (Discriminative High-Utility Pattern Miner) and its parallel version, DHUP-Miner*. Several novel pruning strategies are applied to reduce the search space for potential DHUPs. Experimental results show that the proposed algorithms are faster than the state-of-the-art algorithm (FDHUP) on both sparse and dense benchmark datasets. Moreover, the parallel algorithm (DHUP-Miner*) was found to handle large datasets well.
The compressible, three-dimensional, time-dependent Navier-Stokes equations are solved on a 20 processor Flex/32 computer. The code is a parallel implementation of an existing code operational on the Cray-2 at NASA Am...
GPUs have become important solutions for accelerating scientific applications. Most of the existing work on climate models now uses code rewritten in CUDA to achieve a limited speedup. This restriction also greatly limits follow-up development and applications. In this paper, we designed and implemented a GPU-based acceleration of the LASG/IAP climate system ocean model (LICOM) version 2, called LICOM2-GPU. Considering the extremely large codebase of the model and the occasional need to modify the code, we implemented the model completely in OpenACC. Several acceleration methods, including OpenACC data locality optimization, loop optimization, and interprocess communication optimization, are presented. Developing for GPUs using OpenACC is substantially simpler than developing a CUDA port. Thus, OpenACC is a suitable GPU programming model for complex systems, such as the earth system model and its components. Our experimental results using 4 NVIDIA K80 cards achieved up to a 6.6x speedup compared with 4 Intel(R) Xeon(R) E5-2690 v2 CPUs.