The proceedings contain 367 papers. The topics discussed include: inference of disease-specific gene interaction network using a Bayesian network learned by genetic algorithm;P-SaMI: a data-flow pattern to perform mas...
ISBN:
(纸本)9781450331968
The proceedings contain 367 papers. The topics discussed include: inference of disease-specific gene interaction network using a Bayesian network learned by genetic algorithm;P-SaMI: a data-flow pattern to perform massively-parallel molecular docking experiments using a fully-flexible receptor model;a new approach to biometric recognition based on hand geometry;shape description based on bag of salience points;OR-PCA with dynamic feature selection for robust background subtraction;compact and discriminative approach for encoding spatial-relationship of visual words;an architecture of recommender system for scientific paper;evolving decision-tree induction algorithms with a multi-objective hyper-heuristic;collective preferences in evolutionary multi-objective optimization: techniques and potential contributions of collective intelligence;color image quantization using interactive genetic algorithm;and benchmarking motion sensing devices for rehabilitative gaming.
A fundamental challenge in designing concurrent data structures is obtaining efficient wait-free implementations, in which each operation completes regardless of the behavior of other operations in the system. The mos...
详细信息
ISBN:
(纸本)9781450336178
A fundamental challenge in designing concurrent data structures is obtaining efficient wait-free implementations, in which each operation completes regardless of the behavior of other operations in the system. The most common paradigm for guaranteeing wait-freedom is to employ a helping mechanism, in which, intuitively, fast processes help slow processes complete their operations. Curiously, despite its abundant use, to date, helping has not been formally defined nor was its necessity rigorously studied. In this paper we initiate a rigorous study of the interaction between wait-freedom and helping. We start with presenting a formal definition of help, capturing the intuition of one thread helping another to make progress. Next, we present families of object types for which help is necessary in order to obtain wait-freedom. In other words, we prove that for some types there are no linearizable wait-free help-free implementations. In contrast, we show that other, simple types, can be implemented in a linearizable wait-free manner without employing help. Finally, we provide a universal strong primitive for implementing wait-free data structures without using help. Specifically, given a wait-free help-free fetch&cons object, one can implement any type in a wait-free help-free manner.
We show that simple sequential randomized iterative algorithms for random permutation, list contraction, and tree contraction are highly parallel. In particular, if iterations of the algorithms are run as soon as all ...
ISBN:
(纸本)9781510813311
We show that simple sequential randomized iterative algorithms for random permutation, list contraction, and tree contraction are highly parallel. In particular, if iterations of the algorithms are run as soon as all of their dependencies have been resolved, the resulting computations have logarithmic depth (parallel time) with high probability. Our proofs make an interesting connection between the dependence structure of two of the problems and random binary trees. Building upon this analysis, we describe linear-work, polylogarithmic-depth algorithms for the three problems. Although asymptotically no better than the many prior parallelalgorithms for the given problems, their advantages include very simple and fast implementations, and returning the same result as the sequential algorithm. Experiments on a 40-core machine show reasonably good performance relative to the sequential algorithms.
We study the design of nearly-linear-time algorithms for approximately solving positive linear programs. Both the parallel and the sequential deterministic versions of these algorithms require O(ε~(-4)) iterations, a...
详细信息
ISBN:
(纸本)9781510813311
We study the design of nearly-linear-time algorithms for approximately solving positive linear programs. Both the parallel and the sequential deterministic versions of these algorithms require O(ε~(-4)) iterations, a dependence that has not been improved since the introduction of these methods in 1993 by Luby and Nisan. Moreover, previous algorithms and their analyses rely on update steps and convergence arguments that are combinatorial in nature, and do not seem to arise naturally from an optimization viewpoint. In this paper, we leverage insights from optimization theory to construct a novel algorithm that breaks the longstanding O(ε~(-4)) barrier. Our algorithm has a simple analysis and a clear motivation. Our work introduces a number of novel techniques, such as the combined application of gradient descent and mirror descent, and a truncated, smoothed version of the standard multiplicative weight update, which may be of independent interest.
Clustering large numbers of data points is a very computationally demanding task that often needs to be accelerated in order to be useful in practical applications. This work focuses on the Density-Based Spatial Clust...
详细信息
Clustering large numbers of data points is a very computationally demanding task that often needs to be accelerated in order to be useful in practical applications. This work focuses on the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, which is one of the state-of-the-art clustering algorithms, and targets its acceleration using an FPGA device. The article presents an optimized, scalable, and parameterizable architecture that takes advantage of the internal memory structure of modern FPGAs in order to deliver a high-performance clustering system. Post-synthesis simulation results show that the developed system can obtain mean speedups of 31 x in real-world tests and 202 x in synthetic tests when compared to state-of-the-art software counterparts running on a quad-core 3.4GHz Intel i7-2600k. Additionally, this implementation is also capable of clustering data with any number of dimensions without impacting the performance.
Data-parallelarchitectures must provide efficient support for complex control-flow constructs to support sophisticated applications coded in modern single-program multiple-data languages. As these architectures have ...
详细信息
ISBN:
(纸本)9781479969982
Data-parallelarchitectures must provide efficient support for complex control-flow constructs to support sophisticated applications coded in modern single-program multiple-data languages. As these architectures have wide datapaths that process a single instruction across parallel threads, a mechanism is needed to track and sequence threads as they traverse potentially divergent control paths through the program. The design space for divergence management ranges from software-only approaches where divergence is explicitly managed by the compiler, to hardware solutions where divergence is managed implicitly by the microarchitecture. In this paper, we explore this space and propose a new predication-based approach for handling control-flow structures in data-parallelarchitectures. Unlike prior predication algorithms, our new compiler analyses and hardware instructions consider the commonality of predication conditions across threads to improve efficiency. We prototype our algorithms in a production compiler and evaluate the tradeoffs between software and hardware divergence management on current GPU silicon. We show that our compiler algorithms make a predication-only architecture competitive in performance to one with hardware support for tracking divergence.
The performance of parallelalgorithms for sparse matrix-matrix multiplication is typically determined by the amount of interprocessor communication performed, which in turn depends on the nonzero structure of the inp...
详细信息
ISBN:
(纸本)9781450335881
The performance of parallelalgorithms for sparse matrix-matrix multiplication is typically determined by the amount of interprocessor communication performed, which in turn depends on the nonzero structure of the input matrices. In this paper, we characterize the communication cost of a sparse matrix-matrix multiplication algorithm in terms of the size of a cut of an associated hypergraph that encodes the computation for a given input nonzero structure. Obtaining an optimal algorithm corresponds to solving a hypergraph partitioning problem. Our hypergraph model generalizes several existing models for sparse matrix-vector multiplication, and we can leverage hypergraph partitioners developed for that computation to improve application-specific algorithms for multiplying sparse matrices.
We describe a simple algorithm for spectral graph sparsification, based on iterative computations of weighted spanners and uniform sampling. Leveraging the algorithms of Baswana and Sen for computing spanners, we obta...
详细信息
ISBN:
(纸本)9781450328210
We describe a simple algorithm for spectral graph sparsification, based on iterative computations of weighted spanners and uniform sampling. Leveraging the algorithms of Baswana and Sen for computing spanners, we obtain the first distributed spectral sparsification algorithm. We also obtain a parallel algorithm with improved work and time guarantees. Combining this algorithm with the parallel framework of Peng and Spielman for solving symmetric diagonally dominant linear systems, we get a parallel solver which is much closer to being practical and significantly more efficient in terms of the total work.
The Lopsided Lovasz Local Lemma (LLLL) is a powerful probabilistic principle which has been used in a variety of combinatorial constructions. While this principle began as a general statement about probability spaces,...
详细信息
ISBN:
(纸本)9781510813311
The Lopsided Lovasz Local Lemma (LLLL) is a powerful probabilistic principle which has been used in a variety of combinatorial constructions. While this principle began as a general statement about probability spaces, it has recently been transformed into a variety of polynomial-time algorithms. The resampling algorithm of Moser & Tardos is the most well-known example of this. A variety of criteria have been shown for the LLLL; the strongest possible criterion was shown by Shearer, and other criteria which are easier to use computationally have been shown by Bissacot et al, Pegden, and Kolipaka & Szegedy. We show a new criterion for the Moser-Tardos algorithm to converge. This criterion is stronger than the LLLL criterion, and in fact can yield better results even than the full Shearer criterion. This is possible because it does not apply in the same generality as the original LLLL; yet, it is strong enough to cover many applications of the LLLL in combinatorics. We show a variety of new bounds and algorithms. A noteworthy application is for k-SAT, with bounded occurences of variables. As shown in Gebauer, Szabo, and Tardos, a k-SAT instance in which every variable appears L ≤ 2~(k+1)/(e(k+1)) times, is satisfiable. Although this bound is asymptotically tight (in k), we improve it to L ≤ (2~(k+1)(1-1/k)~k)/(k-1) - 2/k which can be significantly stronger when k is small. We introduce a new parallel algorithm for the LLLL. While Moser & Tardos described a simple parallel algorithm for the Lovasz Local Lemma, and described a simple sequential algorithm for a form of the Lopsided Lemma, they were not able to combine the two. Our new algorithm applies in nearly all settings in which the sequential algorithm works - this includes settings covered by our new stronger LLLL criterion.
暂无评论