The authors recently introduced the technique of sparse mesh refinement to produce the first near-optimal sequential time bounds of O(n lg L/s+m) for inputs in any fixed dimension with piecewise-linear constraining (P...
详细信息
ISBN:
(纸本)9781595936677
The authors recently introduced the technique of sparse mesh refinement to produce the first near-optimal sequential time bounds of O(n lg L/s+m) for inputs in any fixed dimension with piecewise-linear constraining (PLC) features. This paper extends that work to the parallel case, refining the same inputs in time O(lg(L/s)lg m) on an EREW PRAM while maintaining the work bound;in practice, this means we expect linear speedup for any practical number of processors. This is faster than the best previously known parallel Delaunay mesh refinement algorithms in two dimensions. It is the first technique with work bounds equal to the sequential case. In higher dimension, it is the first provably fast parallel technique for any kind of quality mesh refinement with PLC inputs. Furthermore, the algorithm's implementation is straightforward enough that it is likely to be extremely fast in practice.
Mining association rules from large databases is an important problem in data mining. There is a need to develop parallel algorithm for this problem because it is a very costly computation process. However, all propos...
详细信息
ISBN:
(纸本)9780897919890
Mining association rules from large databases is an important problem in data mining. There is a need to develop parallel algorithm for this problem because it is a very costly computation process. However, all proposed parallelalgorithms for mining association rules follow the conventional level-wise approach. On a shared-memory multi-processors, they will impose a synchronization in every iteration which degrades greatly their performance. The deficiency comes from the contention on the shared I/O channel when all processors are accessing their database partitions in the shared storage synchronously. An asynchronous algorithm APM has been proposed for mining association rules on shared-memory multiprocessors. All participating processors in APM generate candidates and count their supports independently without synchronization. Furthermore, it can finish the computation with less I/O than required in the level-wise approach. The algorithm has been implemented on a Sun Enterprise 4000 multi-processors with 12 nodes. The experiments show that APM has super performance than other proposed synchronous algorithms.
NEighborhood MOdeling (NEMO) provides transparent parallelism to develop parallel spatial data processing. It addresses the following parallel issues: architecture and machine independence;communication bottlenecks;da...
详细信息
NEighborhood MOdeling (NEMO) provides transparent parallelism to develop parallel spatial data processing. It addresses the following parallel issues: architecture and machine independence;communication bottlenecks;data visualization;casualty errors;load balancing;and data coherence. NEMO is capable of processing three types of time consuming raster neighborhood models: cellular automata;propagation;and neighborhood analysis. NEMO achieves this flexibility by including five components to its design: the three application drivers such as the cellular automata driver, propagation driver, and neighborhood analysis automata driver;and the display manager and raster database manager.
Cache-oblivious algorithms have the advantage of achieving good sequential cache complexity across all levels of a multi-level cache hierarchy, regardless of the specifies (cache size and cache line size) of each leve...
详细信息
ISBN:
(纸本)9781605586069
Cache-oblivious algorithms have the advantage of achieving good sequential cache complexity across all levels of a multi-level cache hierarchy, regardless of the specifies (cache size and cache line size) of each level. In this paper, we describe cache-oblivious sorting algorithms with optimal work, optimal cache complexity and polylogarithmic depth. Using known mappings, these lead to low cache complexities on shared-memory multiprocessors with a single level of private caches or a single shared cache. Moreover, the low cache complexities extend to shared-memory multiprocessors with common configurations of multi-level caches. The key factor in the low cache complexity on multiprocessors is the low depth of the algorithms we propose.
We present a randomized parallel algorithm for constructing the 3D convex hull on a generic p-processor coarse grained multicomputer with arbitrary interconnection network and n/p local memory per processor, where n/p...
详细信息
We present a randomized parallel algorithm for constructing the 3D convex hull on a generic p-processor coarse grained multicomputer with arbitrary interconnection network and n/p local memory per processor, where n/p ≥ p2+Ε (for some arbitrarily small Ε > 0). For any given set of n points in 3-space, the algorithm computes the 3D convex hull, with high probability, in O(n log n÷p) local computation time and O(1) communication phases with at most O(n÷p) data sent/received by each processor. That is, with high probability, the algorithm computes the 3D convex hull of an arbitrary point set in time O(n log n÷p + Γn,p), where Γn,p denotes the time complexity of one communication phase. In the terminology of the BSP model, our algorithm requires, with high probability, O(1) supersteps and a synchronization period Θ(n log n÷p). In the LogP model, the execution time of our algorithm is asymptotically optimal for several architectures.
We study the parallel query complexity of reconstructing binary trees from simple queries involving their nodes. We show that a querier can efficiently reconstruct a binary tree with a logarithmic number of rounds and...
详细信息
The original LogP model for parallel computation use four parameters;the communication latency (L), overhead (n), bandwidth (g), and the number of processors (P). This basic model is extended with a linear model for l...
详细信息
The original LogP model for parallel computation use four parameters;the communication latency (L), overhead (n), bandwidth (g), and the number of processors (P). This basic model is extended with a linear model for long messages. The resulting combination which is called the LogGP model for parallel computation has one additional parameter, G, which captures the bandwidth obtained for long messages. In this paper, the algorithm design and analysis under the new model are discussed, and the all-to-all remap, FFT, and radix sort are examined. In addition, the single node scatter problem is, examined. Finally, solutions are derived for the problem, and the their optimality under the LogGP model is proven.
Padded sorting requires n input keys to be output in sorted order in an array with slightly more than n locations, unused locations being filled with a special null value. We show that a deterministic CRCW PRAM with k...
详细信息
The computational requirements for an adaptive solution of unsteady problems change as the simulation progresses. This causes workload imbalance among processors on a parallel machine which, in turn, requires signific...
详细信息
The computational requirements for an adaptive solution of unsteady problems change as the simulation progresses. This causes workload imbalance among processors on a parallel machine which, in turn, requires significant data movement at runtime. We present a dynamic load-balancing framework, called JOVE, that balances the workload across all processors with a global view each time the computational mesh is adapted. JOVE has been implemented on an SP2 in MPI for portability. Experimental results for two model meshes demonstrate that mesh adaption with load balancing gives more than a sixfold improvement over one without load balancing. Furthermore, JOVE gives a 24-fold speedup on 64 processors compared to sequential execution.
We present a parallel algorithm on a CRCW PRAM for three-dimensional pattern matching, i.e., finding all occurrences of a pattern of size m3 in a text of size n3. The text search of the algorithm is alphabet-independe...
详细信息
ISBN:
(纸本)9780897918909
We present a parallel algorithm on a CRCW PRAM for three-dimensional pattern matching, i.e., finding all occurrences of a pattern of size m3 in a text of size n3. The text search of the algorithm is alphabet-independent and runs in optimal constant time. The preprocessing computes a witness array for the pattern, and it runs in O(log m) time using m3 processors. The witness computation uses multilayer suffix trees and it works for any dimensions. Our result is based on a new characterization of three-dimensional periodicity. The parallel algorithm can be translated into a sequential algorithm for three-dimensional pattern matching whose text search is alphabet-independent and runs in O(n3) time and whose preprocessing runs in O(m3 log σ) time, where σ = min(|Σ|, m3) and Σ is the alphabet of symbols.
暂无评论