In this paper, the Laplace transform combined with the local discontinuous Galerkin method is used to solve the distributed-order time-fractional diffusion-wave equation. First, we convert the equation into a set of time-independent problems by the Laplace transform. Then, we solve these stationary equations by the local discontinuous Galerkin method to discretize the diffusion operators at the same time. Next, by using a numerical inversion of the Laplace transform, we recover the solution of the original equation. One advantage of this procedure is that it can be implemented in a parallel environment. Another advantage is that the number of stationary problems to be solved is much smaller than the number needed in time-marching methods. Finally, some numerical experiments are provided to show the accuracy and efficiency of the method.
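The transform-solve-invert pipeline described above can be illustrated with a minimal sketch. The abstract does not say which numerical inversion is used, so the Gaver-Stehfest formula here is an assumption chosen for brevity, and `solve_stationary` is a hypothetical scalar stand-in for one local discontinuous Galerkin solve (it returns the transform of the solution of u' = -u, u(0) = 1).

```python
import math
from fractions import Fraction


def stehfest_weights(n):
    """Return the Gaver-Stehfest weights V_1..V_n (n must be even)."""
    half = n // 2
    weights = []
    for k in range(1, n + 1):
        total = Fraction(0)
        for j in range((k + 1) // 2, min(k, half) + 1):
            num = j ** half * math.factorial(2 * j)
            den = (math.factorial(half - j) * math.factorial(j)
                   * math.factorial(j - 1) * math.factorial(k - j)
                   * math.factorial(2 * j - k))
            total += Fraction(num, den)
        weights.append(float((-1) ** (k + half) * total))
    return weights


def solve_stationary(s):
    """Hypothetical stand-in for one stationary solve: (s + 1) U(s) = 1."""
    return 1.0 / (s + 1.0)


def invert_laplace(solve, t, n=12):
    """Approximate u(t) from n independent evaluations of its transform."""
    a = math.log(2.0) / t
    nodes = [k * a for k in range(1, n + 1)]
    # Each solve depends only on its own node s, so these calls are
    # independent of one another and could be distributed over processes.
    values = [solve(s) for s in nodes]
    return a * sum(w * v for w, v in zip(stehfest_weights(n), values))


if __name__ == "__main__":
    print(invert_laplace(solve_stationary, 1.0), math.exp(-1.0))
```

Only `n` stationary problems are solved per output time in this sketch, which is the property the abstract contrasts with time-marching schemes.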
The obnoxious p-median problem consists of selecting p locations, considered facilities, in a way that the sum of the distances from each nonfacility location, called customers, to its nearest facility is maximized. This is an NP-hard problem that can be formulated as an integer linear program. In this paper, we propose the application of a variable neighborhood search (VNS) method to effectively tackle this problem. First, we develop new and fast local search procedures to be integrated into the basic VNS methodology. Then, some parameters of the algorithm are tuned in order to improve its performance. The best VNS variant is parallelized and compared with the best previous methods, namely branch and cut, tabu search, and GRASP over a wide set of instances. Experimental results show that the proposed VNS outperforms previous methods in the state of the art. This fact is finally confirmed by conducting nonparametric statistical tests.
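To make the objective concrete, the following sketch evaluates the obnoxious p-median objective on a distance matrix and runs a plain best-improvement swap local search. This is an illustrative assumption, not the paper's fast local search procedures or its VNS; the function names are hypothetical.

```python
def objective(dist, facilities):
    """Sum over customers of the distance to their nearest facility."""
    return sum(min(dist[c][f] for f in facilities)
               for c in range(len(dist)) if c not in facilities)


def swap_local_search(dist, facilities):
    """Repeatedly apply the best facility/customer swap that raises the objective."""
    current = set(facilities)
    best_value = objective(dist, current)
    improved = True
    while improved:
        improved = False
        best_move = None
        for out in sorted(current):
            for inn in range(len(dist)):
                if inn in current:
                    continue
                candidate = (current - {out}) | {inn}
                value = objective(dist, candidate)
                if value > best_value:  # maximization problem
                    best_value, best_move = value, candidate
        if best_move is not None:
            current = best_move
            improved = True
    return current, best_value


if __name__ == "__main__":
    xs = [0, 1, 2, 10]  # four locations on a line, p = 2
    dist = [[abs(a - b) for b in xs] for a in xs]
    print(swap_local_search(dist, {2, 3}))
```

A VNS would wrap such a local search in a loop that perturbs the incumbent with progressively larger neighborhoods; that outer loop is omitted here.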
Networks are very important in the world. In signal processing, the towers are modeled as nodes (vertices) and if two towers communicate, then they have an arc (edge) between them or precisely, they are adjacent. The ...
Point density is an important property that dictates the usability of a point cloud data set. This paper introduces an efficient, scalable, parallel algorithm for computing the local point density index, a sophisticated point cloud density metric. Computing the local point density index is non-trivial, because this computation involves a neighbour search that is required for each individual point in the potentially large input point cloud. Most existing algorithms and software are incapable of computing point density at scale. Therefore, the algorithm introduced in this paper aims to address both the needed computational efficiency and scalability for considering this factor in large, modern point clouds such as those collected in national or regional scans. The proposed algorithm is composed of two stages. In stage 1, a point-level, parallel processing step is performed to partition an unstructured input point cloud into partially overlapping, buffered tiles. A buffer is provided around each tile so that the data partitioning does not introduce spatial discontinuity into the final results. In stage 2, the buffered tiles are distributed to different processors for computing the local point density index in parallel. That tile-level parallel processing step is performed using a conventional algorithm with an R-tree data structure. While straightforward, the proposed algorithm is efficient and particularly suitable for processing large point clouds. Experiments conducted using a 1.4 billion point data set acquired over part of Dublin, Ireland demonstrated an efficiency factor of up to 14.8/16. More specifically, the computational time was reduced by a factor of 14.8 when the number of processes (i.e. executors) was increased by a factor of 16. Computing the local point density index for the 1.4 billion point data set took just over 5 minutes with 16 executors and 8 cores per executor. The reduction in computational time was nearly 70 times compared to the 6 hours required without
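The two-stage idea (buffered tiles, then independent per-tile neighbour searches) can be sketched in 2-D as follows. Several things here are assumptions for illustration: the abstract does not define the local point density index, so a plain count of neighbours within radius `r` stands in for it; a brute-force search replaces the R-tree; and the tiles are processed in a serial loop rather than on separate executors.

```python
import math
from collections import defaultdict


def neighbour_counts_brute(points, r):
    """Reference result: neighbours within r of each point, no tiling."""
    return [sum(1 for j, q in enumerate(points)
                if j != i and math.hypot(p[0] - q[0], p[1] - q[1]) <= r)
            for i, p in enumerate(points)]


def build_buffered_tiles(points, tile, r):
    """Stage 1: give each tile its core points plus all points within its buffer."""
    tiles = defaultdict(lambda: {"core": [], "all": []})
    for idx, (x, y) in enumerate(points):
        tiles[(math.floor(x / tile), math.floor(y / tile))]["core"].append(idx)
        # The point falls inside the buffered extent of every tile in this range.
        for tx in range(math.floor((x - r) / tile), math.floor((x + r) / tile) + 1):
            for ty in range(math.floor((y - r) / tile), math.floor((y + r) / tile) + 1):
                tiles[(tx, ty)]["all"].append(idx)
    return tiles


def tile_counts(points, data, r):
    """Stage 2 worker: counts for the core points of one tile only."""
    out = {}
    for i in data["core"]:
        xi, yi = points[i]
        out[i] = sum(1 for j in data["all"]
                     if j != i and math.hypot(xi - points[j][0], yi - points[j][1]) <= r)
    return out


def neighbour_counts_tiled(points, tile, r):
    counts = [0] * len(points)
    # Each tile is self-contained thanks to its buffer, so this loop is the
    # part that could be distributed across processors.
    for data in build_buffered_tiles(points, tile, r).values():
        for i, c in tile_counts(points, data, r).items():
            counts[i] = c
    return counts
```

Because the buffer width equals the search radius, every neighbour of a core point is present in that tile, so the tiled result matches the untiled one.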
High Performance Computing (HPC) systems are employed to solve hard problems and rely on parallel algorithms which present very long execution times, up to several days. These systems are expensive in terms of the comp...
In order to improve the computational efficiency of DPM (Decomposition Projective Method) based on orthogonal complement space, this paper reformulates the computing process of that method as a parallel algorithm....
When analyzing power systems, it is often desirable to visualize the network of buses and branches. Here, a new algorithm for producing 2-D network layouts is proposed. The method consists of two steps: first, a matrix of desired distances between all bus pairs is computed based on base voltages and branch reactances and, second, coordinates that minimize the errors between desired and actual distances are found. The parallelization used in the latter step is particularly beneficial for interpreted languages; it is shown that layouts for relatively large systems (a few thousand buses) can be produced within seconds on a standard laptop computer using Python or Matlab. Predefined coordinates for selected buses can optionally be given as input. This can be useful, e.g., when one wants to retain some geographical aspects of the system or wishes to compare a full and a reduced network model. Although the focus here is on power systems, the algorithm can also be used for other types of networks.
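The second step, fitting coordinates to a desired-distance matrix, can be sketched with stress majorization (the SMACOF Guttman transform with unit weights). The abstract does not state which minimizer the authors use, so this choice is an assumption, and the desired distances are supplied directly rather than derived from base voltages and reactances.

```python
import math


def stress(coords, desired):
    """Sum of squared errors between actual and desired pairwise distances."""
    n = len(coords)
    return sum((math.hypot(coords[i][0] - coords[j][0],
                           coords[i][1] - coords[j][1]) - desired[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def guttman_step(coords, desired):
    """One majorization update; every node uses only the previous iterate."""
    n = len(coords)
    new = []
    for i in range(n):
        sx = sy = 0.0
        for j in range(n):
            if j == i:
                continue
            dx = coords[i][0] - coords[j][0]
            dy = coords[i][1] - coords[j][1]
            dist = math.hypot(dx, dy)
            if dist > 0.0:  # coincident points contribute nothing
                sx += desired[i][j] * dx / dist
                sy += desired[i][j] * dy / dist
        new.append((sx / n, sy / n))
    return new


def layout(desired, start, iterations=200):
    coords = list(start)
    for _ in range(iterations):
        coords = guttman_step(coords, desired)
    return coords


if __name__ == "__main__":
    d = math.sqrt(2.0)
    desired = [[0, 1, d, 1], [1, 0, 1, d], [d, 1, 0, 1], [1, d, 1, 0]]  # unit square
    start = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.4), (0.5, 0.5)]
    final = layout(desired, start)
    print(stress(start, desired), stress(final, desired))
```

Since each node's new position depends only on the previous positions, the inner loops map naturally onto array operations, which is one plausible reading of why such a step suits interpreted languages.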
The speedup and efficiency of two parallel algorithms for calculating the dynamics of the current distribution when the surface of a tungsten sample is heated by an electron beam pulse are presented. The algorithms are im...
We propose a decomposition framework for the parallel optimization of the sum of a differentiable (possibly nonconvex) function and a nonsmooth (separable), convex one. The latter term is usually employed to enforce structure in the solution, typically sparsity. The main contribution of this work is a novel parallel, hybrid random/deterministic decomposition scheme wherein, at each iteration, a subset of (block) variables is updated at the same time by minimizing local convex approximations of the original nonconvex function. To tackle huge-scale problems, the (block) variables to be updated are chosen according to a mixed random and deterministic procedure, which captures the advantages of both pure deterministic and random update-based schemes. Almost sure convergence of the proposed scheme is established. Numerical results on huge-scale problems show that the proposed algorithm outperforms current schemes.
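A minimal sketch of the hybrid selection idea, specialized to a small LASSO problem with scalar blocks, is given here. The details are assumptions rather than the paper's scheme: the local convex approximation is a proximal-linear model with constant `tau` (the trace of AᵀA, an upper bound on the gradient's Lipschitz constant), the random stage samples `sample_size` coordinates, and the deterministic stage keeps those whose proposed change is at least `rho` times the largest sampled change.

```python
import random


def soft_threshold(v, t):
    """Proximal operator of t * |.|."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def objective(A, b, x, lam):
    res = [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
    return 0.5 * sum(r * r for r in res) + lam * sum(abs(v) for v in x)


def hybrid_block_descent(A, b, lam, iters=200, sample_size=2, rho=0.5, seed=0):
    rng = random.Random(seed)
    n = len(A[0])
    tau = sum(a * a for row in A for a in row)  # trace(A^T A) >= Lipschitz constant
    x = [0.0] * n
    for _ in range(iters):
        res = [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
        grad = [sum(row[i] * r for row, r in zip(A, res)) for i in range(n)]
        sampled = rng.sample(range(n), sample_size)        # random stage
        best = {i: soft_threshold(x[i] - grad[i] / tau, lam / tau) for i in sampled}
        largest = max(abs(best[i] - x[i]) for i in sampled)
        chosen = [i for i in sampled
                  if abs(best[i] - x[i]) >= rho * largest]  # greedy stage
        # The chosen coordinates are updated simultaneously from the same iterate.
        for i in chosen:
            x[i] = best[i]
    return x


if __name__ == "__main__":
    A = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]]
    b = [1.0, 2.0, 3.0]
    x = hybrid_block_descent(A, b, lam=0.1)
    print(x, objective(A, b, x, 0.1))
```

With `tau` chosen this way, each simultaneous update minimizes a separable upper bound of the smooth term over the chosen coordinates, so the objective does not increase in this convex toy setting.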
Back-Projection is the major algorithm in Computed Tomography to reconstruct images from a set of recorded projections. It is used for both fast analytical methods and high-quality iterative techniques. X-ray imaging facilities rely on Back-Projection to reconstruct internal structures in material samples and living organisms with high spatial and temporal resolution. Fast image reconstruction is also essential to track and control processes under study in real time. In this article, we present efficient implementations of the Back-Projection algorithm for parallel hardware. We survey a range of parallel architectures presented by the major hardware vendors during the last 10 years. Similarities and differences between these architectures are analyzed and we highlight how specific features can be used to enhance the reconstruction performance. In particular, we build a performance model to find hardware hotspots and propose several optimizations to balance the load between the texture engine, computational and special function units, as well as different types of memory, maximizing the utilization of all GPU subsystems in parallel. We further show that targeting architecture-specific features allows one to boost the performance by a factor of 2 to 7 compared to the current state-of-the-art algorithms used in standard reconstruction codes. The suggested load-balancing approach is not limited to Back-Projection but can be used as a general optimization strategy for implementing parallel algorithms.
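For readers unfamiliar with the kernel being optimized, here is a minimal, unoptimized sketch of unfiltered parallel-beam back-projection with linear interpolation along the detector. It is a plain serial reference under assumed conventions (square image, detector centred on the rotation axis, unit pixel and detector spacing), not one of the GPU implementations the article describes.

```python
import math


def back_project(sinogram, angles, size):
    """Accumulate each projection back along its rays into a size x size image."""
    n_det = len(sinogram[0])
    det_center = (n_det - 1) / 2.0
    img_center = (size - 1) / 2.0
    image = [[0.0] * size for _ in range(size)]
    for proj, theta in zip(sinogram, angles):
        c, s = math.cos(theta), math.sin(theta)
        for row in range(size):
            y = row - img_center
            for col in range(size):
                x = col - img_center
                # Detector coordinate hit by the ray through pixel (x, y).
                t = x * c + y * s + det_center
                lo = math.floor(t)
                frac = t - lo
                if 0 <= lo < n_det - 1:
                    # Linear interpolation between the two nearest detector bins.
                    image[row][col] += (1.0 - frac) * proj[lo] + frac * proj[lo + 1]
    return image


if __name__ == "__main__":
    n_angles, n_det, size = 8, 9, 5
    angles = [k * math.pi / n_angles for k in range(n_angles)]
    # Sinogram of a single point on the rotation axis: it always hits the middle bin.
    sinogram = [[1.0 if d == n_det // 2 else 0.0 for d in range(n_det)] for _ in angles]
    image = back_project(sinogram, angles, size)
    print(image[size // 2][size // 2])
```

Every pixel accumulates independently of the others, and the per-pixel work is dominated by the interpolated detector lookup, which is the operation the abstract's load balancing between texture and compute units concerns.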