Two novel variations on sample sort, one using only two rounds of regular all-to-all personalized communication in a scheme that yields very good load balancing with virtually no overhead and another using regular sampling for choosing splitters, were studied. The two were coded in Split-C and were run on a variety of platforms. Results were consistent with theoretical analysis and illustrated the scalability and efficiency of the algorithms.
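To make the splitter-selection idea concrete, here is a minimal sequential sketch of sample sort with regular sampling. It is not the Split-C code from the paper: the function name `sample_sort_regular`, the block layout, and the fallback for tiny inputs are assumptions made for illustration, and the bucket-routing loop merely stands in for the all-to-all personalized communication step.

```python
from bisect import bisect_right

def sample_sort_regular(data, p):
    """Sequentially simulate sample sort with regular sampling on p 'processors'."""
    n = len(data)
    # Assumption: fall back to a plain sort when blocks would be too small to sample.
    if p <= 1 or n < p * p:
        return sorted(data)
    # Step 1: split the input into p nearly equal blocks and sort each locally.
    blocks = [sorted(data[i * n // p:(i + 1) * n // p]) for i in range(p)]
    # Step 2: take p evenly spaced samples from each sorted block.
    samples = sorted(b[j * len(b) // p] for b in blocks for j in range(p))
    # Step 3: choose p - 1 splitters at regular positions in the sorted sample.
    splitters = [samples[i * p] for i in range(1, p)]
    # Step 4: route each element to its bucket (stands in for the all-to-all exchange).
    buckets = [[] for _ in range(p)]
    for block in blocks:
        for x in block:
            buckets[bisect_right(splitters, x)].append(x)
    # Step 5: sort each bucket locally and concatenate the buckets in order.
    return [x for bucket in buckets for x in sorted(bucket)]
```

Because the samples are drawn at regular intervals from already sorted blocks, the splitters tend to divide the input into buckets of similar size, which is the load-balancing property the abstract refers to.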
Enormous amounts of river basin information have been collected for high-resolution, physically based distributed hydrological models, while the scale of the computational domain is often restricted by the intensive calcula...
This paper is concerned with parallel algorithms for determining the Convex Hull of N points on a plane, for a Shared Memory SIMD Computer. First, simple algorithms with read conflicts are described. It is then shown ...
Each vertex of an undirected graph possesses a piece of information which must be sent to every other vertex. The method of communication is to send bounded size packets of messages from one vertex to another. We desc...
Identifying long pairwise maximal common substrings among a large set of sequences is a frequently used construct in computational biology, with applications in DNA sequence clustering and assembly. Due to errors made by sequencers, algorithms that can accommodate a small number of differences are of particular interest. Formally, let D be a collection of n sequences of total length N, phi be a length threshold, and k be a mismatch threshold. The goal is to identify and report all k-mismatch maximal common substrings of length at least phi over all pairs of strings in D. Heuristics based on seed-and-extend style filtering techniques are often employed in such applications. However, such methods cannot provide any provably efficient run time guarantees. To this end, we present a sequential algorithm with an expected run time of O(N log^k N + occ), where occ is the output size. We then present a distributed memory parallel algorithm with an expected run time of O((N/p log N + occ) log^k N) using O(log^(k+1) N) expected rounds of global communications, under some realistic assumptions, where p is the number of processors. Finally, we demonstrate the performance and scalability of our algorithms using experiments on large high throughput sequencing data. (C) 2020 Elsevier Inc. All rights reserved.
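The problem definition above can be illustrated with a brute-force reference for a single pair of strings. This is not the algorithm from the paper: it is a quadratic-time sketch that scans every diagonal with a sliding window, and the function name `k_mismatch_mcs` and the `(i, j, length)` output format are assumptions chosen for illustration.

```python
def k_mismatch_mcs(s, t, k, phi):
    """Return (i, j, length) for every k-mismatch maximal common substring
    of s and t with length >= phi, where s[i:i+length] aligns with t[j:j+length]."""
    results = []
    # Each diagonal d = j - i pairs s[i] with t[i + d].
    for d in range(-(len(s) - 1), len(t)):
        i0, j0 = max(0, -d), max(0, d)
        length = min(len(s) - i0, len(t) - j0)
        mism = [s[i0 + x] != t[j0 + x] for x in range(length)]
        right = 0        # exclusive end of the current window
        count = 0        # mismatches inside [left, right)
        prev_right = -1  # furthest end reached by any earlier window
        for left in range(length):
            if right < left:
                right, count = left, 0
            # Extend right as far as the mismatch budget allows (right-maximal).
            while right < length and count + mism[right] <= k:
                count += mism[right]
                right += 1
            # Left-maximal only if this window reaches further than every earlier one.
            if right > prev_right and right > left and right - left >= phi:
                results.append((i0 + left, j0 + left, right - left))
            prev_right = max(prev_right, right)
            if right > left:
                count -= mism[left]  # drop s[left] before the window start advances
    return results
```

Running such a reference over all pairs in D would cost time quadratic in the sequence lengths, which is the kind of cost the expected-time bounds quoted above are meant to avoid. A brute-force version is mainly useful for checking a faster implementation on small inputs.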
One of the important problems in the use of remote sensing from satellites is three-dimensional modeling of surface fragments, both dynamic (e.g., the ocean surface) and slowly varying. Some researchers propose the u...
Expressed sequence tags, abbreviated as ESTs, are DNA molecules experimentally derived from expressed portions of genes. Clustering of ESTs is essential for gene recognition and for understanding important genetic variations such as those resulting in diseases. In this paper, we present the algorithmic foundations and implementation of PaCE, a parallel software system we developed for large-scale EST clustering. The novel features of our approach include 1) design of space-efficient algorithms to limit the space required to linear in the size of the input data set, 2) a combination of algorithmic techniques to reduce the total work without sacrificing the quality of EST clustering, and 3) use of parallel processing to reduce runtime and facilitate clustering of large data sets. Using a combination of these techniques, we report the clustering of 327,632 rat ESTs in 47 minutes, and 420,694 Triticum aestivum ESTs in 3 hours and 15 minutes, using a 60-processor IBM xSeries cluster. These problems are well beyond the capabilities of state-of-the-art sequential software. We also present a thorough experimental evaluation of our software, including quality assessment using benchmark Arabidopsis EST data.
In this paper we describe a technique for finding efficient parallel algorithms for problems on directed graphs that involve checking the existence of certain kinds of paths in the graph. This technique provides efficient algorithms for finding dominators in flow graphs, performing interval and loop analysis on reducible flow graphs, and finding the feedback vertices of a digraph. Each of these algorithms takes O(log^2 n) time using the same number of processors needed for fast matrix multiplication. All of these bounds are for an EREW PRAM.
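The link between path existence and matrix multiplication can be sketched with the standard reduction of reachability to repeated boolean matrix squaring. This is a sequential illustration only, not the paper's technique for dominators or loop analysis, and the function name `transitive_closure` is an assumption. The point is that about log2(n) matrix products suffice, and each product is a step that parallelizes well.

```python
def transitive_closure(adj):
    """Return reach, where reach[u][v] is True iff v is reachable from u.

    adj is an n x n adjacency matrix of truthy/falsy entries.
    """
    n = len(adj)
    # Start with paths of length 0 or 1: self-loops plus the edges themselves.
    reach = [[bool(adj[u][v]) or u == v for v in range(n)] for u in range(n)]
    covered = 1  # reach currently covers all paths of length <= covered
    while covered < n:
        # One boolean matrix product: u reaches v if some w links u to v.
        reach = [
            [any(reach[u][w] and reach[w][v] for w in range(n)) for v in range(n)]
            for u in range(n)
        ]
        covered *= 2  # squaring doubles the path length that is covered
    return reach
```

For a chain 0 -> 1 -> 2 -> 3, two squarings are enough to discover that vertex 3 is reachable from vertex 0.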
We describe the first parallel algorithm with optimal speedup for constructing minimum-width tree decompositions of graphs of bounded treewidth. On n-vertex input graphs, the algorithm works in O((log n)^2) time using O...
ISBN: (print) 9780889867741
Current visual text mining platforms are still focused on small or medium-scale datasets and sequential algorithms. However, as document collections increase in size and complexity, more computing resources are required in order to achieve the expected interactive experience. In order to address the scalability problem, this paper proposes and evaluates parallel implementations for three critical visual text mining algorithms. Experiments with the parallel solutions were conducted for varying dataset sizes and different numbers of processors. The results show a good speedup for the proposed solutions and indicate the potential benefits of exploring task parallelism in critical algorithms to improve scalability of an interactive visual text mining platform.