In computing with explorable uncertainty, one considers problems where the values of some input elements are uncertain, typically represented as intervals, but can be obtained using queries. Previous work has considered query minimization in the settings where queries are asked sequentially (adaptive model) or all at once (non-adaptive model). We introduce a new model where k queries can be made in parallel in each round, and the goal is to minimize the number of query rounds. Using competitive analysis, we present upper and lower bounds on the number of query rounds required by any algorithm in comparison with the optimal number of query rounds for the given instance. Given a set of uncertain elements and a family of m subsets of that set, we study the problems of sorting all m subsets and of determining the minimum value (or the minimum element(s)) of each subset. We also study the selection problem, i.e., the problem of determining the i-th smallest value and identifying all elements with that value in a given set of uncertain elements. Our results include 2-round-competitive algorithms for sorting and selection, and an algorithm for the minimum value problem that uses at most (2 + epsilon) * opt(k) + O((1/epsilon) * lg m) query rounds for every 0 < epsilon < 1, where opt(k) is the optimal number of query rounds.
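The round-based query model can be illustrated with a toy greedy (our own sketch, not the paper's competitive algorithm): two elements cannot be ordered while their uncertainty intervals still overlap, so in each round we query up to k conflicting elements, collapsing their intervals to their true values.

```python
def overlaps(a, b):
    """True if intervals a, b cannot be ordered without further queries."""
    if a[0] == a[1] and b[0] == b[1]:
        return False                      # both values already known
    if a[0] == a[1]:                      # a is a known value inside b?
        return b[0] < a[0] < b[1]
    if b[0] == b[1]:
        return a[0] < b[0] < a[1]
    return max(a[0], b[0]) < min(a[1], b[1])

def sort_uncertain(intervals, values, k):
    """Query in rounds of up to k elements until every pair is orderable.
    intervals: dict id -> (lo, hi); values: dict id -> true value.
    Returns the sorted order of element ids and the number of rounds used."""
    rounds = 0
    while True:
        ids = list(intervals)
        conflicted = sorted(
            {i for i in ids for j in ids
             if i != j and overlaps(intervals[i], intervals[j])})
        if not conflicted:
            break
        rounds += 1
        for e in conflicted[:k]:          # one parallel round of k queries
            v = values[e]
            intervals[e] = (v, v)         # interval collapses to true value
    return sorted(intervals, key=lambda e: intervals[e][0]), rounds
```

Here the choice of which k conflicting elements to query first is arbitrary; the paper's contribution is precisely in making that choice competitively.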
In this article, some local and parallel finite element algorithms are proposed and investigated for magnetohydrodynamic flows with low electromagnetic Reynolds number. A solution to this problem comprises two main components: the low-frequency components and the high-frequency components. Motivated by this, we obtain the low-frequency components globally on a relatively coarse grid and capture the high-frequency components locally on a fine grid via local and parallel procedures. Some local a priori estimates that are crucial for our theoretical analysis are derived. Optimal error estimates are rigorously derived, and some numerical tests are reported to support our theoretical findings.
A geometric approach and the concept of streamlines are used to reformulate the equations governing multiphase multicomponent flow through porous media with allowance for phase transitions. With the use of the streamline simulation technology, the three-dimensional problem is split into a set of one-dimensional subproblems, for which efficient parallel algorithms can be constructed. A new geometric formulation of the streamline simulation technology is presented, which is in a sense rigorous. It is demonstrated how the accuracy of the method can be estimated with the help of this formulation.
ISBN (Print): 9781450335881
We use exponential start time clustering to design faster parallel graph algorithms involving distances. Previous algorithms usually rely on graph decomposition routines with strict restrictions on the diameters of the decomposed pieces. We weaken these bounds in favor of stronger local probabilistic guarantees. This allows more direct analyses of the overall process, giving: (1) linear-work parallel algorithms that construct spanners with O(k) stretch and size O(n^(1+1/k)) in unweighted graphs and size O(n^(1+1/k) log k) in weighted graphs; (2) hopsets that lead to the first parallel algorithm for approximating shortest paths in undirected graphs with O(m polylog n) work.
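The core idea of exponential start time clustering can be sketched in a few lines: each vertex v draws a shift delta_v from an exponential distribution, and every vertex joins the cluster of whichever source "reaches" it first when source v starts at time -delta_v. The sequential multi-source search below is our own simplification of the parallel routine, for an unweighted graph given as an adjacency dict:

```python
import heapq
import random

def est_cluster(adj, beta, seed=0):
    """Exponential start time clustering sketch: each vertex draws
    delta_v ~ Exp(beta); vertex u joins the cluster of the source v
    minimizing dist(u, v) - delta_v, computed here by a multi-source
    Dijkstra over shifted start times (sequential stand-in for the
    parallel version)."""
    rng = random.Random(seed)
    shift = {v: rng.expovariate(beta) for v in adj}
    dist, owner = {}, {}
    pq = [(-shift[v], v, v) for v in adj]   # (start time, vertex, source)
    heapq.heapify(pq)
    while pq:
        d, u, src = heapq.heappop(pq)
        if u in dist:
            continue                        # u already claimed earlier
        dist[u], owner[u] = d, src
        for w in adj[u]:
            if w not in dist:
                heapq.heappush(pq, (d + 1, w, src))
    return owner
```

Larger beta means smaller shifts and hence smaller clusters; the probabilistic guarantees in the paper concern how often an edge straddles two clusters.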
ISBN (Print): 9781509020881
This poster paper deals with algorithmic solutions for the alignment of a large number of data strings, which is a challenging and computationally demanding task belonging to the class of NP-hard problems. In order to obtain practical solutions in acceptable time, the development of heuristic approaches and the employment of parallelism are inevitable. Two parallel algorithms are presented for solving the alignment problem in the context of extraction of data from web pages. Both algorithms have a sequential counterpart, which represents an efficient heuristic approach based on the estimation of local parameters for symbol distances. The algorithms are characterized by different assignments of data to the processors of the parallel system: in the first, the processed data sets are distributed according to symbols; in the second, by columns. The algorithms are formulated in pseudocode and implemented with MPI. Their computational and space complexities are estimated and compared. Moreover, they are examined on a parallel cluster with multi-core nodes for a pool of test data coming from web pages. The results of the experiments show that the variant with distribution by columns is less efficient because of high communication costs. The algorithm with data distribution according to symbols delivers acceptable timings, especially when the matrix of string data is long and thin.
ISBN (Print): 9781479984909
The Hierarchical Memory Machine (HMM) is a theoretical parallel computing model that captures the essence of CUDA-enabled GPU architecture. It has multiple streaming multiprocessors, each with a shared memory, and a global memory that can be accessed by all threads. The HMM has several parameters: the number d of streaming multiprocessors, the number p of threads per streaming multiprocessor, the number w of memory banks of each shared memory and the global memory, the shared memory latency l, and the global memory latency L. The main purpose of this paper is to discuss the optimality of fundamental parallel algorithms running on the HMM. We first show that image convolution for an image with n x n pixels using a filter of size (2v+1) x (2v+1) can be done in O(n^2/w + n^2 L/(dp) + n^2 v^2/(dw) + n^2 v^2 l/(dp)) time units on the HMM. Further, we show that this parallel implementation is time optimal by proving a lower bound on the running time. We then go on to show that the product of two n x n matrices can be computed in O(n^3/(mw) + n^3 L/(mdp) + n^3/(dw) + n^3 l/(dp)) time units on the HMM if the capacity of the shared memory in each streaming multiprocessor is O(m^2). This implementation is also proved to be time optimal. We further clarify the conditions under which image convolution and matrix multiplication hide the memory access latency overhead and maximize the global memory throughput and the parallelism. Finally, we provide experimental results on a GeForce GTX Titan to support our theoretical analysis.
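As a point of reference for the n^2 v^2 term in the bound, a direct sequential convolution of an n x n image with a (2v+1) x (2v+1) filter looks as follows (our illustration of the computation being analyzed, not an HMM implementation); the HMM analysis concerns how these O(n^2 v^2) multiply-adds map onto memory banks and threads:

```python
def convolve(image, filt, v):
    """Direct convolution of an n x n image with a (2v+1) x (2v+1) filter,
    treating out-of-bounds pixels as zero: O(n^2 v^2) multiply-adds."""
    n = len(image)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for di in range(-v, v + 1):
                for dj in range(-v, v + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < n and 0 <= x < n:   # zero padding
                        acc += image[y][x] * filt[di + v][dj + v]
            out[i][j] = acc
    return out
```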
ISBN (Print): 9783642401633
beta-skeletons, prominent members of the neighborhood graph family, have interesting geometric properties and various applications ranging from geographic networks to archeology. This paper focuses on computing the beta-spectrum, a labeling of the edges of the Delaunay triangulation DT(V), which makes it possible to quickly find the lune-based beta-skeleton of V for any query value beta in [1, 2]. We consider planar n-point sets V with the L_p metric, 1 < p < infinity. We present an O(n log^2 n) time sequential, and an O(log^4 n) time parallel, beta-spectrum labeling. We also show a parallel algorithm which, for a given beta in [1, 2], finds the lune-based beta-skeleton in O(log^2 n) time. The parallel algorithms use O(n) processors in the CREW-PRAM model. (C) 2015 Elsevier B.V. All rights reserved.
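For intuition about what the spectrum labels encode, here is a brute-force O(n^3) membership check for the lune-based beta-skeleton in the Euclidean (L_2) case, purely illustrative and far from the paper's O(log^2 n) parallel bound: for beta in [1, 2], the lune of edge (p, q) is the intersection of two disks of radius beta*|pq|/2 centered at c1 = (1-beta/2)p + (beta/2)q and c2 symmetrically, and the edge survives iff no third point lies strictly inside the lune.

```python
import math

def beta_skeleton_edges(pts, beta):
    """Brute-force lune-based beta-skeleton for beta in [1, 2],
    Euclidean metric; pts is a list of (x, y) tuples.
    beta = 1 gives the Gabriel graph, beta = 2 the relative
    neighborhood graph."""
    def inside(c, r, x):
        return math.dist(c, x) < r - 1e-12    # strictly inside the disk
    edges = []
    n = len(pts)
    for i in range(n):
        for j in range(i + 1, n):
            p, q = pts[i], pts[j]
            r = beta * math.dist(p, q) / 2
            c1 = ((1 - beta/2)*p[0] + (beta/2)*q[0],
                  (1 - beta/2)*p[1] + (beta/2)*q[1])
            c2 = ((beta/2)*p[0] + (1 - beta/2)*q[0],
                  (beta/2)*p[1] + (1 - beta/2)*q[1])
            if not any(inside(c1, r, x) and inside(c2, r, x)
                       for k, x in enumerate(pts) if k not in (i, j)):
                edges.append((i, j))
    return edges
```

The beta-spectrum of an edge is then the largest beta for which the edge survives, which is what the paper's labeling precomputes for all Delaunay edges at once.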
ISBN (Print): 9781611977578
We present new shared-memory parallel algorithms for the bi-core decomposition problem, which discovers dense subgraphs in bipartite graphs and is the bipartite analogue of the classic k-core decomposition problem. We develop a theoretically efficient parallel bi-core decomposition algorithm that discovers a hierarchy by peeling vertices from the graph in parallel. Our algorithm improves the span (parallel running time) over the state-of-the-art parallel bi-core decomposition algorithm, while matching the state-of-the-art sequential algorithm in work. We additionally prove the bi-core decomposition problem to be P-complete, meaning that a polylogarithmic-span solution is unlikely under standard assumptions. We also devise a theoretically efficient parallel bi-core index structure to allow for fast parallel queries of vertices in given cores. Finally, we propose a novel practical optimization that prunes unnecessary computations, and we provide optimized parallel implementations of our bi-core decomposition algorithms that are scalable and fast. Using 30 cores with two-way hyper-threading, our implementation achieves up to a 4.9x speedup over the state-of-the-art parallel algorithm. Our parallel index structure can be constructed up to 27.7x faster than the state-of-the-art sequential counterpart. Due to the improved storage format of our index structure, our parallel queries are up to 116.3x faster than the state-of-the-art sequential queries.
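For readers less familiar with the peeling paradigm, the classic sequential k-core decomposition that bi-core generalizes fits in a few lines (our illustration; the paper's contribution is the parallel, bipartite version): repeatedly remove a minimum-degree vertex, and the running maximum of removal-time degrees gives each vertex's core number.

```python
def core_numbers(adj):
    """Sequential k-core peeling sketch; adj is a dict mapping each
    vertex to a list of its neighbors. Returns vertex -> core number."""
    deg = {v: len(ns) for v, ns in adj.items()}
    core, removed = {}, set()
    k = 0
    while len(removed) < len(adj):
        v = min((u for u in adj if u not in removed), key=lambda u: deg[u])
        k = max(k, deg[v])          # core number never decreases
        core[v] = k
        removed.add(v)
        for w in adj[v]:            # peeling v lowers neighbors' degrees
            if w not in removed:
                deg[w] -= 1
    return core
```

Parallel peeling removes all current minimum-degree vertices in one step; the paper's P-completeness result says that for bi-cores this peeling order is inherently sequential in the worst case.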
ISBN (Print): 9783000503375
Computer technology is used for constructing maps of the regional and local anomalies of the magnetic field for the Northern Eurasia sector within an area confined between 48°-72° E and 60°-68° N. The algorithm for separating the anomalies into different wavelength intervals is based on subsequent upward and downward continuation of the magnetic data. Since the downward continuation procedure is an ill-posed problem, regularization is applied. For selecting the regularization parameter, we used the results of the interpretation of the magnetic anomalies along DSS profiles. To recalculate the magnetic field, parallel algorithms and software for multiprocessor computers were used. In this work, we describe the mathematical apparatus and the algorithms of parallel computations that are used for designing the computer technology. We used the computed data on the magnetic field at heights of 5 km and 20 km for constructing the maps of local anomalies in the upper lithosphere. The map of the regional components was obtained by the upward continuation of the field to a height of 40 km and its subsequent recalculation back to the zero level.
This paper describes the use of a CUDA parallelization scheme for the forward and inverse gravity problems for structural boundaries. The forward problem is calculated using a finite elements approach: the whole calculation volume is split into parallelepipeds, and the gravity effect of each is calculated using a known formula. The inverse problem solution is found using the iterative local corrections method. This method requires only a forward problem calculation at each iteration and does not use operator inversion. The obtained results show that even cheap consumer video cards are highly effective for parallelizing the algorithm.
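The structure of the forward problem, a sum of independent per-cell contributions over all stations, is what makes it embarrassingly parallel on a GPU. A minimal CPU-side sketch, using a point-mass approximation for each parallelepiped cell (our simplification; the paper uses a closed-form formula for the cell's gravity effect):

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def forward_gravity(cells, stations):
    """Point-mass sketch of the forward problem: sum each cell's vertical
    gravity effect at every station. cells: (cx, cy, cz, volume, density)
    with depth cz positive downward; stations: (sx, sy, sz). Each station's
    sum is independent, which is what a CUDA kernel would exploit."""
    out = []
    for sx, sy, sz in stations:
        gz = 0.0
        for cx, cy, cz, volume, density in cells:
            dx, dy, dz = cx - sx, cy - sy, cz - sz
            r = math.sqrt(dx*dx + dy*dy + dz*dz)
            gz += G * density * volume * dz / r**3   # vertical component
        out.append(gz)
    return out
```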