ISBN: 9781509046010 (print)
Although several sequential heuristics have been proposed for the Unconstrained Binary Quadratic Programming (UBQP) problem, very little effort has gone into designing parallel algorithms for it. This paper proposes a novel decentralized parallel search algorithm, called Parallel Elite Biased Tabu Search (PEBTS). It is based on D²TS, a state-of-the-art sequential UBQP metaheuristic. The key strategies in the PEBTS algorithm include: (i) a lazy distributed cooperation procedure to maintain diversity among different search processes and (ii) finely tuned bit-flip operators that help the search escape local optima efficiently. Our experiments on the Tianhe-2 supercomputer with up to 24 computing cores show the accuracy and efficiency of PEBTS compared with a straightforward parallel algorithm running multiple independent, non-cooperating D²TS processes.
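As a rough illustration of the bit-flip moves this abstract refers to, the sketch below shows a generic one-flip tabu search for UBQP. It is not PEBTS or D²TS: the maximization convention, the symmetric matrix `Q`, the fixed `tenure`, and all function names are assumptions made for this example only.

```python
def objective(Q, x):
    """f(x) = sum_ij Q[i][j] * x[i] * x[j] for a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def flip_gain(Q, x, i):
    """Change in f(x) if bit i is flipped, assuming Q is symmetric."""
    s = sum(Q[i][j] * x[j] for j in range(len(x)) if j != i)
    return (1 - 2 * x[i]) * (Q[i][i] + 2 * s)


def tabu_search(Q, x0, iters=100, tenure=3):
    """Plain one-flip tabu search (maximization) with a fixed tabu tenure."""
    n = len(x0)
    x = list(x0)
    cur_f = objective(Q, x)
    best_x, best_f = list(x), cur_f
    tabu_until = [0] * n  # bit i is tabu while tabu_until[i] >= iteration
    for it in range(1, iters + 1):
        candidates = []
        for i in range(n):
            gain = flip_gain(Q, x, i)
            # Aspiration: a tabu move is allowed if it beats the best so far.
            if tabu_until[i] < it or cur_f + gain > best_f:
                candidates.append((gain, i))
        if not candidates:
            continue  # every move is tabu in this iteration
        gain, i = max(candidates)
        x[i] = 1 - x[i]
        cur_f += gain
        tabu_until[i] = it + tenure
        if cur_f > best_f:
            best_x, best_f = list(x), cur_f
    return best_x, best_f


Q = [[2, -3, 1],
     [-3, 4, -2],
     [1, -2, -1]]
best_x, best_f = tabu_search(Q, [0, 0, 0])
```

The incremental `flip_gain` is what keeps each move cheap: it costs O(n) per bit rather than the O(n²) of re-evaluating `objective`. A parallel variant would run several such searches and exchange solutions between them, which this sketch does not attempt.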
ISBN: 9781538615423 (print)
An acceleration method for interpolation-based super-resolution (SR) methods using convolutional neural networks (CNNs), represented by SRCNN and VDSR, is proposed. In this paper, estimated pixels are classified into a number of types according to the upscaling factor, and SR images are then generated using CNNs optimized for each type. This allows the CNNs to use smaller filter sizes than conventional ones, so that the computational complexity is reduced in both the running phase and the training phase. In addition, it is shown that the CNNs optimized for some types are closely related to those for other types, and this relation provides a way to further reduce the computational complexity of the training phase. A number of experiments are carried out to demonstrate the effectiveness of the proposed method. The proposed method outperforms conventional ones in processing speed while maintaining the quality of the SR images.
ISBN: 9781450351140 (digital)
ISBN: 9781450351140 (print)
Computed Tomographic (CT) image reconstruction is an important technique used in a wide range of applications. Among reconstruction methods, Model-Based Iterative Reconstruction (MBIR) is known to produce much higher quality CT images; however, the high computational requirements of MBIR greatly restrict its application. Currently, MBIR speed is primarily limited by irregular data access patterns, the difficulty of effective parallelization, and slow algorithmic convergence. This paper presents a new algorithm for MBIR, the Non-Uniform Parallel Super-Voxel (NU-PSV) algorithm, that regularizes the data access pattern, enables massive parallelism, and ensures fast convergence. We compare the NU-PSV algorithm with two state-of-the-art implementations on a 69632-core distributed system. Results indicate that the NU-PSV algorithm has an average speedup of 1665 compared to the fastest state-of-the-art implementations.
ISBN: 9781538619681 (print)
We present a parallel algorithm to compute promising candidate states for modifying the state space of a pseudo-random number generator in order to increase its cycle length. This is important for generators in low-power devices, where enlarging the state space is not an option. The runtime of the parallel algorithm is improved by an analogy to ant colony behavior: if two paths meet, the resulting path is followed at accelerated speed, just as ants tend to reinforce paths that have been used by other ants. We evaluate our algorithm with simulations and demonstrate high parallel efficiency, which makes the algorithm well suited even for massively parallel systems such as GPUs. Furthermore, the accelerated-path variant of the algorithm achieves a runtime improvement of up to 4% over the straightforward implementation.
ISBN: 9781538621653 (print)
Gene regulatory networks (GRNs) are an important tool in the post-genomic era, and algorithms for constructing them have attracted the attention of many researchers. However, most of these algorithms have high computational complexity and cannot produce results in a satisfactory time, so designing a structure to accelerate them remains an open problem. This paper develops a parallel algorithm, based on the Message Passing Interface (MPI), that accelerates gene regulatory network inference under a time-delayed mass action model. Experiments on three well-known motifs and a real biological GRN data set show that the proposed approach makes full use of the computational resources of existing multi-core computers and improves the efficiency of network construction.
ISBN: 9788890701870 (print)
The simulation of EM (electromagnetic) wave propagation requires considerable computation time, as it analyzes a large number of propagation paths. To overcome this problem, we propose a GPU (graphics processing unit)-based parallel algorithm for VPL (vertical plane launch)-approximated EM wave propagation. The conventional algorithm computes the gain along propagation paths with irregular memory access, which results in low GPU performance. In our proposed algorithm, the CPU reorders the irregular propagation paths into a GPU-suitable linear order in CPU memory at each receiving point. We hide the reordering time behind CPU-GPU communication and the GPU-based computation of gain on the reordered memory. We found that our proposed algorithm with a quad GPU is up to 30 times faster than the conventional algorithm with a 16-threaded dual CPU.
ISBN: 9781538634721 (print)
The scale of data used in graph analytics grows at an unprecedented rate. More than ever, domain experts require efficient and parallel algorithms for tasks in graph analytics. One such task is the truss decomposition, which is a hierarchical decomposition of the edges of a graph and is closely related to the task of triangle enumeration. As evidenced by the recent GraphChallenge, existing algorithms and implementations for truss decomposition are insufficient for the scale of modern datasets. In this work, we propose a parallel algorithm for computing the truss decomposition of massive graphs on a shared-memory system. Our algorithm breaks a computation-efficient serial algorithm into several bulk-synchronous parallel steps which do not rely on atomics or other fine-grained synchronization. We evaluate our algorithm across a variety of synthetic and real-world datasets on a 56-core Intel Xeon system. Our serial implementation achieves over 1400x speedup over the provided GraphChallenge serial benchmark implementation and is up to 28x faster than the state-of-the-art shared-memory parallel algorithm.
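For readers unfamiliar with truss decomposition, the following is a minimal serial sketch of the textbook peeling approach: count the triangles each edge takes part in (its support), then repeatedly remove the edges whose support is too low for the current k. It is not the algorithm of the abstract above; the function name and the edge-list input format are assumptions for illustration.

```python
def truss_decomposition(edges):
    """Return {edge: trussness} for an undirected simple graph.

    An edge has trussness k if it belongs to the k-truss (every edge lies in
    at least k - 2 triangles) but not to the (k + 1)-truss.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def key(u, v):
        return (u, v) if u < v else (v, u)

    # Support of an edge = number of triangles containing it.
    support = {key(u, v): len(adj[u] & adj[v]) for u, v in edges}
    truss = {}
    k = 2
    while support:
        # Batch of edges that cannot be part of the (k + 1)-truss.
        low = [e for e, s in support.items() if s <= k - 2]
        if not low:
            k += 1
            continue
        for u, v in low:
            # Removing (u, v) destroys one triangle per common neighbor.
            for w in adj[u] & adj[v]:
                support[key(u, w)] -= 1
                support[key(v, w)] -= 1
            adj[u].discard(v)
            adj[v].discard(u)
            del support[(u, v)]
            truss[(u, v)] = k
    return truss


# A 4-clique on vertices 0..3 plus a pendant edge (3, 4).
example = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
result = truss_decomposition(example)
```

In the example, the six clique edges end up with trussness 4 and the pendant edge with trussness 2. The batch structure (collect all low-support edges, then remove them) hints at why the problem lends itself to bulk-synchronous steps, although a real parallel implementation has to handle concurrent support updates, which this sketch ignores.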
The report presents an approach to the simulation of acoustic fields in enclosed media. The method is based on the use of Rayleigh's integral to calculate the secondary sources generated by a wave falling onto the media boundaries. The implementing algorithm is highly parallelizable: it consists of loosely coupled parallel branches with only a few points of inter-thread communication. On the other hand, the algorithm is exponential in the average number of reflections undergone by a single wave element emitted by a primary source, although for practical applications this number can be reduced enough to provide accurate results with reasonable time and space consumption. The proposed algorithm is based on the approximate superposition of acoustic fields and provides adequate results as long as the acoustic equations used are linear. To calculate the scattering properties of reflecting boundaries, the algorithm represents the geometric model of the sound propagation medium as a set of small flat vibrating pistons. Each wave element falling onto such a piston makes it radiate reflected sound in all directions, which makes it possible to construct an algorithm that accepts sets of sources and reflecting surfaces and yields a field distribution over specified points. Each source, primary or secondary, can be associated with an element of parallel execution and managed via a list of polymorphic sources that implements a task list. The report covers a mathematical formulation of the problem, defines an object model used to implement the algorithm, and provides some analysis of the algorithm in sequential and parallel forms. (C) 2017 The Authors. Published by Elsevier B.V.
ISBN: 9781538630778 (print)
A small dominating set is useful in several settings, including wireless networks, document summarization, and secure system design. In this paper, we start by studying three distributed algorithms that produce small dominating sets in a few rounds. We interpret these algorithms in the natural shared-memory setting and experiment with them on a multi-core CPU. Based on observations from these experimental results, we propose variations of the three algorithms and show how the proposed variations offer interesting trade-offs between the size of the dominating set produced and the time taken.
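To make the notion of a dominating set concrete, here is a minimal sketch of the classic sequential greedy heuristic: repeatedly pick the vertex whose closed neighborhood covers the most vertices that are not yet dominated. It is not one of the three distributed algorithms studied in the abstract, and the adjacency-dictionary input and function names are assumptions for this example.

```python
def greedy_dominating_set(adj):
    """Greedy heuristic: adj maps each vertex to the set of its neighbors."""
    undominated = set(adj)
    chosen = []
    while undominated:
        # Pick the vertex covering the most undominated vertices
        # (ties go to the smallest vertex because of the sorted order).
        best = max(sorted(adj),
                   key=lambda v: len(({v} | adj[v]) & undominated))
        chosen.append(best)
        undominated -= {best} | adj[best]
    return chosen


def is_dominating(adj, candidates):
    """Check that every vertex is in the set or adjacent to a member of it."""
    members = set(candidates)
    return all(v in members or adj[v] & members for v in adj)


# Path graph 0 - 1 - 2 - 3 - 4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
ds = greedy_dominating_set(path)
```

On the path above the heuristic selects vertices 1 and 3. The trade-off mentioned in the abstract shows up even here: a greedy choice needs global information in every step, whereas round-based distributed algorithms decide locally and may accept a larger set in exchange for fewer synchronization points.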
ISBN: 9781538622933 (print)
Learning the structure of Bayesian networks, even in the static case, is NP-hard, compelling much of the research to focus on heuristic-based approaches. However, there are instances where exact solutions are desirable, especially for small network sizes. In this work, we present a dynamic-programming-based exact solution for learning dynamic Bayesian network structure. Our method simultaneously learns intra- as well as higher-order inter-time-slice interactions in the network. For $n$ variables, our exact solution requires $O(n^2 \cdot 2^{n(M+1)})$ computations to learn an $M$-th order network. To handle such high computational requirements, we present a parallel exact solution to push the limit on the size of the networks that can be learned. Given $p = 2^k$ processors, the parallel algorithm runs in $O(n^2 \cdot 2^{nM} \cdot (2^{n-k} + k))$ time and achieves optimal parallel efficiency when $2^{n-k} > k$. Using the MPI+X parallel programming model, the parallel algorithm scales linearly to 1,024 cores of a 64-node Intel Xeon InfiniBand cluster, sustaining > 99% parallel efficiency. We also show that the networks learned on gene network datasets are of high fidelity compared to heuristic-based techniques.
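The efficiency condition can be checked with a short calculation. The sketch below treats the constants hidden in the two $O(\cdot)$ bounds as equal, which is an assumption made here for illustration and is not stated in the abstract; $W$, $T_p$, and $E$ are names introduced only for this example.

```latex
\begin{align*}
W   &= n^2 \cdot 2^{n(M+1)}
      && \text{(serial work)} \\
T_p &= n^2 \cdot 2^{nM} \cdot \left(2^{n-k} + k\right),
      \qquad p = 2^k
      && \text{(parallel time)} \\
E   &= \frac{W}{p \, T_p}
     = \frac{n^2 \cdot 2^{nM} \cdot 2^{n-k}}
            {n^2 \cdot 2^{nM} \cdot \left(2^{n-k} + k\right)}
     = \frac{1}{1 + k / 2^{n-k}}
      && \text{(parallel efficiency)}
\end{align*}
```

Under that assumption, $2^{n-k} > k$ gives $k / 2^{n-k} < 1$ and hence $E > 1/2$, so the efficiency stays bounded below by a constant. The additive $k$ term only becomes significant once the number of processors approaches $2^n$.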