Tree search algorithms play an important role in many applications in the field of artificial intelligence. When playing board games like chess etc., computers use game tree search algorithms to evaluate a position. I...
详细信息
ISBN:
(纸本)9781581134094
Tree search algorithms play an important role in many applications in the field of artificial intelligence. When playing board games like chess etc., computers use game tree search algorithms to evaluate a position. In this paper, we present a procedure that we call parallel Controlled Conspiracy Number Search (parallel CCNS). Shortly, we describe the principles of the sequential CCNS algorithm, which bases its approximation-results on irregular subtrees of the entire game tree. We have parallelized CCNS and implemented it in our chess program ***, which now is the first in the world, that could win a high-ranked Grandmaster chess-tournament. We add experiments that show a speedup of about 50 on 159 processors running on an SCI-workstation-cluster.
In a breakthrough result, Spielman and Teng (2004) developed a nearly-linear time solver for Laplacian linear equations, i.e. equations where the coefficient matrix is symmetric with non-negative diagonals and zero ro...
详细信息
ISBN:
(纸本)9798400704161
In a breakthrough result, Spielman and Teng (2004) developed a nearly-linear time solver for Laplacian linear equations, i.e. equations where the coefficient matrix is symmetric with non-negative diagonals and zero rowsums. Since the development of the Spielman-Teng solver, there has been substantial progress, simplifying and improving their result, but obtaining a fast practical, parallel Laplacian solver remains an open problem. We present a framework for obtaining extremely simple, parallel Laplacian linear equation solvers with nearly-linear work and sub-linear depth. Our framework allows us to parallelize any Laplacian solver based on repeated single-vertex approximate Gaussian elimination. We demonstrate this by parallelizing both the algorithm of Kyng and Sachdeva (2016) and the practical variant by Gao, Kyng, and Spielman (2023). Our framework is work-efficient in the sense of matching the sequential work of these algorithms. Our parallelization framework is very simple: We sample a subset of the current low-degree vertices (sparse columns), and in parallel we eliminate all vertices that are isolated in the resulting induced subgraph. This approach can be combined with any parallelizable approximate single-vertex elimination subroutine with sparse output. Given the simplicity of the approach, we believe that using it to parallelize the solver of Gao, Kyng, and Spielman (2023) is the most promising direction for obtaining practical parallel Laplacian solvers. If we additionally use a parallel spectral sparsification routine, our approach can be modified to work in polylogarithmic depth and nearly-linear work.
We consider the natural extension of the single disk caching problem to parallel disk I/O model. We close the existing gap between lower and upper bounds and achieve optimal competitive ratio of O(√D) when lookahead ...
详细信息
We consider the natural extension of the single disk caching problem to parallel disk I/O model. We close the existing gap between lower and upper bounds and achieve optimal competitive ratio of O(√D) when lookahead is more than the memory size M. When lookahead is smaller, we derive various upper bounds and lower bounds on the competitive ratio under various adversarial models.
Data-parallelarchitectures must provide efficient support for complex control-flow constructs to support sophisticated applications coded in modern single-program multiple-data languages. As these architectures have ...
详细信息
ISBN:
(纸本)9781479969982
Data-parallelarchitectures must provide efficient support for complex control-flow constructs to support sophisticated applications coded in modern single-program multiple-data languages. As these architectures have wide datapaths that process a single instruction across parallel threads, a mechanism is needed to track and sequence threads as they traverse potentially divergent control paths through the program. The design space for divergence management ranges from software-only approaches where divergence is explicitly managed by the compiler, to hardware solutions where divergence is managed implicitly by the microarchitecture. In this paper, we explore this space and propose a new predication-based approach for handling control-flow structures in data-parallelarchitectures. Unlike prior predication algorithms, our new compiler analyses and hardware instructions consider the commonality of predication conditions across threads to improve efficiency. We prototype our algorithms in a production compiler and evaluate the tradeoffs between software and hardware divergence management on current GPU silicon. We show that our compiler algorithms make a predication-only architecture competitive in performance to one with hardware support for tracking divergence.
This brief announcement presents parallelalgorithms with a tradeoff between work and span for single source reachability and approximate shortest paths on directed graphs. Both algorithms have ∼O(mρ2 + nρ4) work a...
详细信息
A hierarchical communication model was presented to study the scheduling tasks with small communication delays for cluster of processors. The model captured hierarchical nature of communication in parallel computers. ...
详细信息
ISBN:
(纸本)9781581134094
A hierarchical communication model was presented to study the scheduling tasks with small communication delays for cluster of processors. The model captured hierarchical nature of communication in parallel computers. An approximation algorithm was proposed for the small communication time (SCT) hierarchical model with an unbounded number of bi-processor machines. It was found that the communication overhead did not interfere with the availability of processors and allowed all processors to execute other tasks.
We present new BSP algorithms for deterministic sorting and randomized median finding. We sort n general keys by using a partitioning scheme that achieves the requirements of efficiency (one-optimality) and insensitiv...
详细信息
ISBN:
(纸本)9780897918091
We present new BSP algorithms for deterministic sorting and randomized median finding. We sort n general keys by using a partitioning scheme that achieves the requirements of efficiency (one-optimality) and insensitivity against data skew (the accuracy of the splitting keys depends solely on the step distance, which can be adapted to meet the worst-case requirements of our application). Although we employ sampling in order to realize efficiency, we can give a precise worst-case estimation of the maximum imbalance which might occur. We also investigate optimal randomized BSP algorithms for the problem of finding the median of n elements that require, with high-probability, 3n/(2p)+o(n/p) number of comparisons, for a wide range of values of n and p. Experimental results for the two algorithms are also presented.
This paper describes a new parallel algorithm for optimizing costly nonconvex functions with box constraints when derivatives are unavailable. MAPO (Multi-Algorithm parallel Optimization) iteratively uses a committee ...
详细信息
ISBN:
(纸本)9781581136616
This paper describes a new parallel algorithm for optimizing costly nonconvex functions with box constraints when derivatives are unavailable. MAPO (Multi-Algorithm parallel Optimization) iteratively uses a committee of optimization algorithms based on response surfaces to generate candidate points for function evaluation. In each iteration, the evaluation points are selected from multiple rankings of all candidate points. Good numerical results for MAPO with 4 radial basis function methods are reported for 8 processors.
The prefetch schedulng problem and output scheduling problem with respect to parallel disk input and output, respectively, are considered. A fixed size pool of empty memory buffers and the sequence of block write requ...
详细信息
ISBN:
(纸本)9781581134094
The prefetch schedulng problem and output scheduling problem with respect to parallel disk input and output, respectively, are considered. A fixed size pool of empty memory buffers and the sequence of block write requests are taken as the input by the output scheduling problem. The prefetch scheduling problem takes the pool of empty memory buffers and the sequence of block read requests as the input. The duality between the two problems and the analysis of applications in one domain with respect to the applications in an another domain is also illusttrated.
A quantitative comparison of the BSP and LogP models for parallel computation is developed. Very efficient cross simulations between the two models are derived, showing their substantial equivalence for algorithmic de...
详细信息
ISBN:
(纸本)9780897918091
A quantitative comparison of the BSP and LogP models for parallel computation is developed. Very efficient cross simulations between the two models are derived, showing their substantial equivalence for algorithmic design guided by asymptotic analysis. It is also shown that the two models can be implemented with similar performance on most point-to-point networks. In conclusion, within the limits of our analysis that is mainly of asymptotic nature, BSP and LogP can be viewed as closely related variants within the bandwidth-latency framework for modeling parallel computation. BSP seems somewhat preferable due to greater simplicity and portability, and slightly greater power.
暂无评论