To solve the multi-objective vehicle routing problem with time windows (VRPTW), a Spark-based parallel Adaptive Large Neighborhood Search algorithm (Spark-ALNS) is proposed. The design rests on four strategies: (1) a new cooling strategy for the simulated annealing component, which helps the search escape local optima; (2) CW initialization to accelerate convergence; (3) three destruction operators and three repair operators for local route optimization; (4) a new parallel strategy to improve the algorithm's accuracy and reduce its running time. The algorithm's effectiveness is illustrated on the Solomon benchmark instances. The experimental results show that the proposed Spark-ALNS finds better solutions: it obtains the known optimal solutions for 41 out of 56 instances and finds new optimal solutions for 31 algorithms, outperforming other evolutionary algorithms. Its runtime is 3-5 times shorter than that of other parallel algorithms, and it solves VRPTW effectively.
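The adaptive part of an ALNS loop can be sketched in a few lines. The following is a minimal illustration, assuming the common textbook formulation (roulette-wheel operator selection, simulated-annealing acceptance, exponential weight smoothing); it does not reproduce the abstract's new cooling strategy or its Spark parallelization, and the names `select_operator`, `accept`, `update_weight`, and the `reaction` parameter are hypothetical.

```python
import math
import random


def select_operator(weights, rng):
    """Roulette-wheel selection: index i is chosen with probability
    weights[i] / sum(weights)."""
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(weights) - 1  # guard against floating-point round-off


def accept(current_cost, candidate_cost, temperature, rng):
    """Simulated-annealing acceptance: improvements are always taken,
    worse candidates are taken with probability exp(-delta / T).
    Assumes temperature > 0."""
    if candidate_cost <= current_cost:
        return True
    delta = candidate_cost - current_cost
    return rng.random() < math.exp(-delta / temperature)


def update_weight(weight, score, reaction=0.2):
    """Blend the old operator weight with the score earned this round.
    The reaction factor is an assumed tuning parameter."""
    return (1.0 - reaction) * weight + reaction * score
```

In a full loop, one destroy and one repair operator would each be drawn with `select_operator`, the repaired solution would be passed through `accept`, and both operators' weights would then be refreshed with `update_weight`.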
This paper studies a parallel algorithm for real Toeplitz systems based on the block Jacobi iteration and the GMRES method. The algorithm requires fewer floating-point operations, converges quickly, and is especially suitable for parallel computation. We first use the block Jacobi iterative method to obtain the iterative process, and then nest the GMRES method inside it to obtain the iterative sequence {x(k)}. In this way, a parallel algorithm for solving symmetric positive definite Toeplitz systems is constructed. The convergence of the algorithm is also briefly discussed. Finally, numerical examples illustrate the effectiveness of the parallel algorithm.
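The outer block Jacobi iteration can be sketched as follows. This is a simplified illustration under stated assumptions: the diagonal blocks are solved directly by Gaussian elimination as a stand-in for the nested GMRES solve described in the abstract, the blocks are processed in a plain loop rather than in parallel, and the helper names `toeplitz_sym`, `solve_dense`, and `block_jacobi` are hypothetical.

```python
def toeplitz_sym(first_col):
    """Build a symmetric Toeplitz matrix from its first column."""
    n = len(first_col)
    return [[first_col[abs(i - j)] for j in range(n)] for i in range(n)]


def solve_dense(a, b):
    """Solve a small dense system by Gaussian elimination with partial
    pivoting (stand-in for the inner GMRES solve)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def block_jacobi(a, b, block, iters=50):
    """Block Jacobi: every diagonal block is solved against the previous
    iterate, so the block solves are independent of one another."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        new_x = x[:]
        for s in range(0, n, block):  # independent per block
            e = min(s + block, n)
            # Right-hand side minus the off-block part of A times old x.
            rhs = [
                b[i] - sum(a[i][j] * x[j] for j in range(n) if not s <= j < e)
                for i in range(s, e)
            ]
            diag = [a[i][s:e] for i in range(s, e)]
            new_x[s:e] = solve_dense(diag, rhs)
        x = new_x
    return x
```

Because each block update reads only the previous iterate, the inner loop over blocks is the natural place to distribute work across processors.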
Solving systems of linear equations has applications in oil exploration, structural vibration analysis, computational fluid dynamics, and other fields. In-depth analysis of large or very large complex structures requires parallel algorithms running on high-performance computers. This paper describes how a parallel solver is implemented for sparse systems of linear equations.
Approximate string matching (ASM) has applications in many disciplines, ranging from information retrieval to gene matching. The conventional solution to this problem is a dynamic programming strategy with quadratic space and time complexity. This complexity makes it impractical to search for queries in huge sequences of billions of characters. Many studies have therefore proposed heuristic, filtration, and index-based solutions that reduce the space and time requirements of the basic solution. These existing solutions obtain better performance by compromising on the completeness of the search. In this paper, we propose a linear-space algorithm for the approximate string matching problem that retains the time complexity of the conventional solution. The proposed method works in linear space without omitting any region of the given text; hence, it finds all possible matches. The conventional dynamic programming solution is modified so that storing the complete traceback table is avoided: only a running count of each edit operation is kept in memory. Several properties of the classical dynamic programming table are established to support this. We also present a parallel version of the proposed algorithm to improve its running time. The algorithm is evaluated on CUDA-enabled GPUs using DNA sequences of between 250 and 970 Mbp. Experiments on natural language text are also performed to highlight the broader applicability of the proposed algorithm. The results show that the algorithm substantially outperforms state-of-the-art algorithms in terms of performance and scalability.
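The idea of dropping the full table can be illustrated with the classic column-by-column (Sellers-style) recurrence, which keeps a single column of length proportional to the pattern. This is a sketch of that well-known technique, not the paper's algorithm: it reports only match end positions and does not track the per-operation running counts the abstract mentions. The function name `approx_match_ends` is hypothetical.

```python
def approx_match_ends(pattern, text, k):
    """Return the 1-based end positions in `text` where some substring
    matches `pattern` with edit distance <= k.

    Only one DP column is stored, so memory is O(len(pattern)) while
    time stays O(len(pattern) * len(text)).
    """
    m = len(pattern)
    col = list(range(m + 1))  # column for the empty text prefix
    ends = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]  # value of D[0][j-1]
        col[0] = 0          # a match may start at any text position
        for i in range(1, m + 1):
            old = col[i]    # value of D[i][j-1]
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         old + 1,           # extra character in text
                         col[i - 1] + 1)    # missing character in text
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends
```

For example, `approx_match_ends("abc", "xxabdxx", 1)` reports the positions where a substring within one edit of `"abc"` ends.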
With the explosive growth of intelligent devices and the rapid development of wireless network communication technology, most people prefer to use video applications on smart devices. However, video codec technology on mobile devices faces three main challenges: 1) the explosive growth of multimedia applications has made the allocation of computing resources an important issue; 2) power consumption is high while battery capacity is limited; 3) high CPU utilization makes the system unresponsive. In this paper, a GPU-based High Definition Parallel Video Codec (GHPVC) is proposed, a low-energy, high-efficiency video codec for mobile devices. First, a Frame Data Management model and a Prediction Model Selector model are proposed to achieve higher data transmission efficiency and parallel execution efficiency. Second, a GPU-based parallel motion estimation (ME) module is proposed, because ME is the most power-consuming and computationally intensive module in a video codec. GHPVC conforms to the H.264 standard and is experimentally evaluated with different GPUs on different mobile devices. Experimental results show that, compared with the existing H.264 scheme, the proposed GHPVC not only significantly improves codec performance but also effectively reduces energy consumption and CPU utilization.
ISBN: (print) 9781728151359
Intelligent optimization algorithms, such as the genetic algorithm (GA) and particle swarm optimization (PSO), have been widely used for harmonic minimization in power converters. However, these algorithms usually evaluate the fitness function for hundreds of population members or more, which leads to a huge computing burden and memory consumption. Hence, in practical applications, it is difficult to run these algorithms in real time on traditional central processing units (CPUs). On the other hand, although the population is large, each member performs the same calculations independently, which makes the computation very suitable for parallelization on graphics processing units (GPUs). In this paper, a parallel version of differential evolution (DE) on GPUs is proposed. Compared to the traditional CPU-based DE algorithm, the GPU-based parallel DE algorithm executes hundreds of times faster in solving the harmonic minimization problem. Computational results for 21 switching angles of three-level inverters are given. Guidelines for selecting parameters that affect algorithm parallelization, such as the population size and the grid allocation, are also discussed and summarized.
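The structure that makes DE attractive for GPUs can be seen in a plain sequential sketch. The following assumes the standard DE/rand/1/bin variant with a synchronous generation update; it runs on the CPU, uses a generic fitness function rather than the harmonic minimization objective, and the function name and default parameter values are illustrative assumptions, not the paper's settings.

```python
import random


def differential_evolution(fitness, bounds, pop_size=20, f=0.5, cr=0.9,
                           generations=100, seed=0):
    """Minimise `fitness` over the box `bounds` with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [fitness(ind) for ind in pop]
    for _ in range(generations):
        trials = []
        for i in range(pop_size):
            # Three distinct donors, all different from the target i.
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)  # forces at least one mutated gene
            trial = list(pop[i])
            for j, (lo, hi) in enumerate(bounds):
                if j == j_rand or rng.random() < cr:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    trial[j] = min(max(v, lo), hi)  # clip to the box
            trials.append(trial)
        # Each trial is evaluated independently of the others; on a GPU
        # this is the step that would map to one thread per individual.
        trial_cost = [fitness(t) for t in trials]
        for i in range(pop_size):
            if trial_cost[i] <= cost[i]:
                pop[i], cost[i] = trials[i], trial_cost[i]
    best = min(range(pop_size), key=lambda i: cost[i])
    return pop[best], cost[best]
```

Building all trial vectors from the previous generation before any selection keeps the fitness evaluations free of data dependencies, which is the property a GPU implementation relies on.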
ISBN: (print) 9781665410168
Centrality measures on graphs have found applications in a large number of domains, including modeling the spread of an infection or disease, social network analysis, and transportation networks. As a result, parallel algorithms for computing various centrality metrics on graphs have gained significant research attention in recent years. In this paper, we study parallel algorithms for the percolation centrality measure, which extends the betweenness centrality measure by associating a time-dependent state variable with every node. We present parallel algorithms that compute the source-based and source-destination variants of the percolation centrality values of nodes in a network. Our algorithms extend the algorithm of Brandes, introduce optimizations aimed at exploiting the structural properties of graphs, and extend the algorithmic techniques introduced by Sariyuce et al. [26] in the context of centrality computation. Experimental studies of our algorithms on an Intel Xeon(R) Silver 4116 CPU and an Nvidia Tesla V100 GPU on a collection of 12 real-world graphs indicate that our algorithmic techniques offer a significant speedup.
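For orientation, the baseline that these algorithms extend can be written compactly. The sketch below is Brandes' algorithm for plain betweenness centrality on an unweighted, undirected graph; it does not include the percolation states or the optimizations described in the abstract. The adjacency-dictionary input format and the function name `betweenness` are assumptions made for illustration.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm. `adj` maps each node to an iterable of its
    neighbours (undirected, unweighted)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:  # each source is independent of the others
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:  # forward phase: BFS from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):  # backward phase: accumulate dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair is counted from both endpoints.
    return {v: c / 2.0 for v, c in bc.items()}
```

The outer loop over sources carries no dependency between iterations apart from the final accumulation into `bc`, which is why source-level parallelism is the usual starting point on both CPUs and GPUs.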
ISBN: (print) 9781665407595
Complex networks are large, and their analysis requires significantly different methods from those used for small networks. Parallel processing is needed to analyze these networks in a timely manner. Graph centrality measures provide convenient methods for assessing the structure of these networks. We review the main centrality algorithms, describe an implementation of closed centrality in Python, propose a simple parallel algorithm for closed centrality, and show its Python implementation together with the results obtained.
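A per-source parallel scheme of this kind might look as follows. This sketch assumes that "closed centrality" refers to closeness centrality, defined here as the number of reachable nodes divided by the sum of shortest-path distances to them; the abstract does not give the definition or the parallelization used, so the function names and the thread-pool choice are illustrative only.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def closeness_of(adj, source):
    """Closeness of one node: (reachable nodes) / (sum of BFS distances)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


def closeness_parallel(adj, workers=4):
    """Compute closeness for every node, one independent BFS per node."""
    nodes = list(adj)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = list(pool.map(lambda s: closeness_of(adj, s), nodes))
    return dict(zip(nodes, values))
```

A thread pool keeps the sketch short, but CPU-bound pure-Python work in CPython is limited by the global interpreter lock; a process pool or a distributed framework would be the likely substitute when real speedup is the goal.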
Huge data volumes and the emergence of new parallel architectures, e.g. multicore CPUs, have led to classic computer science topics such as in-place sequence rotation being revisited. In-place sequence rotation is a basic step in several fundamental computing tasks. The sequential algorithms for in-place sequence rotation are classic and well studied, and fall into three classes. Recently, Intel introduced a parallel standard template library (STL) implementation for multicore CPU systems; it has a rotation function based on rotation by copy, but its space complexity is O(n). In this work, we propose the blend rotation, a parallel-friendly, in-place algorithm that combines the merits of these three classes of rotation algorithms. In addition, we propose a set of parallel In-place SeQuence RoTation (PI-sqrt) implementations. The performance of PI-sqrt is examined through several experiments. To the best of our knowledge, the obtained running times show that the implementations of the blend and reversal rotations are by far the fastest parallel implementations; averaged over the different experiments, they are 7.85x and 5.52x faster, respectively, than the parallel rotation function of Intel Parallel STL.
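Of the rotation classes mentioned, rotation by reversal is the simplest to show. The sketch below is the classic sequential three-reversal algorithm with O(1) extra space; it is not the blend rotation proposed in the abstract, and the function names are hypothetical.

```python
def reverse_range(seq, lo, hi):
    """Reverse seq[lo:hi] in place by swapping from both ends."""
    hi -= 1
    while lo < hi:
        seq[lo], seq[hi] = seq[hi], seq[lo]
        lo += 1
        hi -= 1


def rotate_left(seq, k):
    """Rotate `seq` left by k positions in place using three reversals."""
    n = len(seq)
    if n == 0:
        return
    k %= n
    reverse_range(seq, 0, k)  # reverse the first k elements
    reverse_range(seq, k, n)  # reverse the remaining n - k elements
    reverse_range(seq, 0, n)  # reverse the whole sequence
```

Each reversal consists of independent swaps, so the swap loop is the part that could be divided among threads in a parallel variant.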
An efficient parallel finite element method is introduced for solving the steady-state Smagorinsky model, in which a fully overlapping domain decomposition is used for parallelization. The crucial idea of the method is that each processor computes a local finite element solution on a locally refined multiscale mesh that is fine around its own subdomain and coarse elsewhere. Built on an existing Smagorinsky solver, the introduced method is easy to implement and avoids massive recoding. Using a duality argument, errors of the standard finite element approximations are derived for the velocity in the L^2 norm and for the pressure in the H^{-1} norm. Error bounds for the solutions of the introduced method are estimated. Moreover, four parallel iterative algorithms are presented, and numerical tests are given to verify the theoretical predictions and demonstrate the effectiveness of the algorithms. The numerical results show that the parallel algorithms substantially decrease the CPU time while keeping the accuracy of the solutions comparable to that of the serial algorithm.