This paper presents efficient and optimal parallel algorithms for multidimensional image template matching on the CREW PRAM model. For an N^d image and an M^d window, we present an optimal (respectively, efficient) algorithm that runs in O(log M) time with O(M^d × N^d / log M) (respectively, O(M^d × N^d)) processors. We also present efficient and optimal algorithms for solving the multidimensional array and pattern matching problems.
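The parallelism here comes from the fact that every window position is an independent comparison. A minimal sequential sketch of the underlying check (for d = 2, square image and window; not the paper's PRAM algorithm) makes the work being distributed explicit:

```python
def match_positions(image, window):
    """Return all (row, col) offsets where `window` occurs in `image`.

    Naive sequential check: each of the (N-M+1)^2 positions is an
    independent O(M^2) comparison, which is exactly the work a PRAM
    algorithm can spread across processors.
    """
    n, m = len(image), len(window)
    hits = []
    for r in range(n - m + 1):
        for c in range(n - m + 1):
            if all(image[r + i][c + j] == window[i][j]
                   for i in range(m) for j in range(m)):
                hits.append((r, c))
    return hits
```

For example, the 2x2 window `[[1,0],[0,1]]` occurs twice in the 3x3 checkerboard `[[0,1,0],[1,0,1],[0,1,0]]`, at offsets (0,1) and (1,0).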
ISBN (print): 0780393635
As more research centers embark on sequencing new genomes, the problem of DNA fragment assembly for shotgun sequencing is growing in importance and complexity. Accurate and fast assembly is a crucial part of any sequencing project, and since the DNA fragment assembly problem is NP-hard, exact solutions are very difficult to obtain. Various heuristics, including genetic algorithms, were designed for solving the fragment assembly problem. While the sequential genetic algorithm has given good results, it is unable to sequence very large DNA molecules. In this work, we present two parallel methods, a distributed genetic algorithm and a parallel simulated annealing, to accurately solve problem instances that are 77K base pairs long.
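To make the simulated-annealing formulation concrete: fragment assembly can be cast as searching for an ordering of fragments that maximizes the total overlap between adjacent fragments. The sketch below is a generic, single-threaded annealer over toy fragments, an illustration of the problem encoding rather than the paper's parallel implementation (the parameter names and cooling schedule are assumptions):

```python
import math
import random

def overlap(a, b):
    """Length of the longest suffix of `a` equal to a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0

def layout_score(order, frags):
    """Total overlap between adjacent fragments in the layout."""
    return sum(overlap(frags[order[i]], frags[order[i + 1]])
               for i in range(len(order) - 1))

def anneal(frags, steps=2000, t0=2.0, cooling=0.999, seed=0):
    """Anneal over fragment orderings, tracking the best layout seen."""
    rng = random.Random(seed)
    order = list(range(len(frags)))
    cur = layout_score(order, frags)
    best, best_score = order[:], cur
    t = t0
    for _ in range(steps):
        i, j = rng.sample(range(len(frags)), 2)   # propose a swap
        order[i], order[j] = order[j], order[i]
        cand = layout_score(order, frags)
        if cand >= cur or rng.random() < math.exp((cand - cur) / t):
            cur = cand                             # accept the move
            if cur > best_score:
                best_score, best = cur, order[:]
        else:
            order[i], order[j] = order[j], order[i]  # undo the swap
        t *= cooling
    return best, best_score
```

On the toy fragments `["ATTAGAC", "GACCTA", "CTAAG"]` the optimal layout chains the 3-base overlaps GAC and CTA for a score of 6.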
Nucleus decompositions have been shown to be a useful tool for finding dense subgraphs. The coreness value of a clique represents its density based on the number of other cliques it is adjacent to. One useful output of nucleus decomposition is a hierarchy among dense subgraphs at different resolutions. However, existing parallel algorithms for nucleus decomposition do not generate this hierarchy, and only compute the coreness values. This paper presents a scalable parallel algorithm for hierarchy construction, with practical optimizations, such as interleaving the coreness computation with hierarchy construction and using a concurrent union-find data structure in an innovative way to generate the hierarchy. We also introduce a parallel approximation algorithm for nucleus decomposition, which achieves much lower span in theory and better performance in practice. We prove strong theoretical bounds on the work and span (parallel time) of our algorithms. On a 30-core machine with two-way hyper-threading, our parallel hierarchy construction algorithm achieves up to a 58.84x speedup over the state-of-the-art sequential hierarchy construction algorithm by Sariyuce et al. and up to a 30.96x self-relative parallel speedup. On the same machine, our approximation algorithm achieves a 3.3x speedup over our exact algorithm, while generating coreness estimates with a multiplicative error of 1.33x on average.
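To illustrate how a union-find structure can thread a hierarchy through a decomposition: sweeping vertices in decreasing coreness order and recording a fresh dendrogram node at every union places denser components deeper in the tree. The sketch below is a simplified sequential binary-dendrogram version (the paper's algorithm works on cliques, groups equal-coreness merges, and uses a concurrent union-find), so treat it only as a picture of the idea:

```python
class UnionFind:
    """Union-find with path halving; the paper uses a concurrent variant."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb
        return rb

def build_hierarchy(n, edges, coreness):
    """Sweep vertices in decreasing coreness order; every union of two
    live components creates a fresh internal dendrogram node, so denser
    subgraphs end up deeper in the hierarchy."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    uf = UnionFind(n)
    node_of = {}   # component root -> its current hierarchy node
    parent = {}    # dendrogram edges: child node -> parent node
    next_id = n    # ids 0..n-1 are leaves; internal nodes follow
    seen = set()
    for v in sorted(range(n), key=lambda v: -coreness[v]):
        seen.add(v)
        node_of[v] = v
        for u in adj[v]:
            if u not in seen:
                continue
            ru, rv = uf.find(u), uf.find(v)
            if ru == rv:
                continue
            merged, next_id = next_id, next_id + 1
            parent[node_of[ru]] = merged
            parent[node_of[rv]] = merged
            node_of[uf.union(ru, rv)] = merged
    return parent
```

On a triangle {0,1,2} with a pendant vertex 3, the triangle's leaves merge first, and the pendant joins only at the root of the dendrogram.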
ISBN (print): 3540633073
Consider a set P of points in the plane sorted by x-coordinate. A point p in P is said to be a proximate point if there exists a point q on the x-axis such that p is the closest point to q over all points in P. The proximate points problem is to determine all proximate points in P. We propose optimal sequential and parallel algorithms for the proximate points problem. Our sequential algorithm runs in O(n) time. Our parallel algorithms run in O(log log n) time using n/log log n Common-CRCW processors, and in O(log n) time using n/log n EREW processors. We show that both parallel algorithms are work-time optimal; the EREW algorithm is also time-optimal. As it turns out, the proximate points problem finds interesting and highly nontrivial applications to pattern analysis, digital geometry, and image processing.
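One way to see why O(n) is achievable when P is sorted: the squared distance from q = (t, 0) to p = (x, y) is t² − 2xt + (x² + y²), and the shared t² term can be dropped, so each point becomes a line and the proximate points are exactly the lines on the lower envelope. With x-coordinates sorted, the slopes −2x arrive in sorted order, and a single stack pass builds the envelope. The sketch below assumes distinct x-coordinates and is an illustration of this reduction, not the paper's algorithm:

```python
def proximate_points(points):
    """points: sorted by strictly increasing x-coordinate.
    The squared distance from q = (t, 0) to p = (x, y) is
    t^2 - 2*x*t + (x^2 + y^2); the t^2 term is shared by all points,
    so p is closest to some q iff its line g_p(t) = -2*x*t + x^2 + y^2
    lies on the lower envelope of all such lines. Slopes -2*x arrive
    strictly decreasing, so one stack pass builds the envelope."""
    stack = []  # lines (m, b, p) currently on the envelope
    for x, y in points:
        m3, b3 = -2.0 * x, float(x * x + y * y)
        while len(stack) >= 2:
            m1, b1, _ = stack[-2]
            m2, b2, _ = stack[-1]
            # middle line is nowhere strictly below both others: drop it
            if (b3 - b1) * (m1 - m2) <= (b2 - b1) * (m1 - m3):
                stack.pop()
            else:
                break
        stack.append((m3, b3, (x, y)))
    return [p for _, _, p in stack]
```

For instance, with P = {(0,1), (1,5), (2,1)}, the middle point is never the closest to any point on the x-axis and is popped from the stack.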
ISBN (print): 9780769529837
As the size of today's high performance computers continues to grow, node failures in these computers are becoming frequent events. Although checkpointing is the typical technique to tolerate such failures, it often introduces a considerable overhead and has shown poor scalability on today's large scale systems. In this paper we define a new term, fault-tolerant parallel algorithm, meaning an algorithm that obtains the correct answer despite the failure of nodes. This fault-tolerance approach, in which the data of failed processes is recovered by modifying the application to recompute it on all surviving processes, is checkpoint-free. In particular, if no failure occurs, the fault-tolerant parallel algorithms are identical to the original algorithms. We show the practicality of this technique by applying it to parallel dense matrix-matrix multiplication and Gaussian elimination to tolerate single-process failure. Experimental results demonstrate that a process failure can be tolerated with good scalability for the two fault-tolerant parallel algorithms, and the proposed fault-tolerant parallel dense matrix-matrix multiplication is able to survive process failure with a very low performance overhead. The main drawback of this approach is that it is non-transparent and algorithm-dependent.
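A common way to make matrix multiplication recoverable without checkpoints (and consistent with the recompute-on-survivors idea above, though this single-process sketch is an illustration, not the paper's distributed implementation) is a checksum encoding: append to A a row holding its column sums; the product then carries a checksum row equal to the sum of the result's rows, so any one lost row can be rebuilt by subtraction:

```python
def matmul(A, B):
    """Plain triple-loop matrix product."""
    k = len(B)
    return [[sum(A[i][t] * B[t][j] for t in range(k))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def with_checksum_row(A):
    """Append a row holding the column sums of A."""
    return A + [[sum(col) for col in zip(*A)]]

def recover_row(Cf, lost):
    """Rebuild row `lost` of C = A @ B from the checksum row of
    Cf = with_checksum_row(A) @ B (the last row of Cf)."""
    n = len(Cf) - 1
    return [Cf[n][j] - sum(Cf[i][j] for i in range(n) if i != lost)
            for j in range(len(Cf[0]))]
```

Because matrix multiplication is linear, the checksum relationship is preserved through the computation itself, which is what lets surviving processes reconstruct a failed process's block without rolling back.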
In this paper we present two parallel versions of the bisection method to compute the spectrum of symmetric Toeplitz matrices. Both parallel algorithms have been implemented and analysed on a virtual shared memory multiprocessor using a portable message-passing environment. The algorithms parallelize the sequential method very efficiently, and applying a dynamic strategy to distribute the computations produces better results than a static method. We also improve the performance of the original sequential algorithm by applying Newton's method for the final approximation of the eigenvalues. However, the poor performance of the original sequential algorithm produces low speedups when we compare the parallel methods with the best available sequential algorithm.
Finding biconnected components (BCs) of graphs is one of the fundamental problems in graph theory, with many practical applications. If n and m are the numbers of nodes and edges, respectively, in a graph G, find...
Efficiently storing and processing massive graph data sets is a challenging problem as researchers seek to leverage "Big Data" to answer next-generation scientific questions. New techniques are required to process large scale-free graphs in shared, distributed, and external memory. This dissertation develops new techniques to parallelize the storage, computation, and communication for scale-free graphs with high-degree vertices. Our work facilitates the processing of large real-world graph datasets through the development of parallel algorithms and tools that scale to large computational and memory resources, overcoming challenges not addressed by existing techniques. Our aim is to scale to trillions of edges, and our research is targeted at leadership class supercomputers, clusters with local non-volatile memory, and shared memory systems.

We present three novel techniques to address scaling challenges in processing large scale-free graphs. We apply an asynchronous graph traversal technique using prioritized visitor queues that is capable of tolerating data latencies to the external graph storage media and message passing communication. To accommodate large high-degree vertices, we present an edge list partitioning technique that evenly partitions graphs containing high-degree vertices. Finally, we propose a technique we call distributed delegates that distributes and parallelizes the storage, computation, and communication when processing high-degree vertices. The edges of high-degree vertices are distributed, providing additional opportunities for parallelism not present in existing methods.

We apply our techniques to multiple graph algorithms: Breadth-First Search, Single Source Shortest Path, Connected Components, K-Core decomposition, Triangle Counting, and Page Rank. Our experimental study of these algorithms demonstrates excellent scalability on supercomputers, clusters with non-volatile memory, and shared memory systems. Our study includes multi
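The key contrast with vertex partitioning is that edge-list partitioning balances by edge count, letting a high-degree vertex's edge list spill across several parts instead of overloading whichever part owns the vertex. A toy single-machine sketch of this balancing rule (an illustration of the idea, not the dissertation's distributed implementation):

```python
def partition_edges(adj, nparts):
    """Pack directed edges into nparts parts of near-equal size; a
    high-degree vertex's edge list simply spills across consecutive
    parts instead of overloading the part that owns the vertex."""
    m = sum(len(nbrs) for nbrs in adj.values())
    target = -(-m // nparts)          # ceil(m / nparts) edges per part
    parts = [[] for _ in range(nparts)]
    p = 0
    for v, nbrs in adj.items():
        for u in nbrs:
            if len(parts[p]) >= target and p < nparts - 1:
                p += 1
            parts[p].append((v, u))
    return parts
```

On a star graph, a vertex-based split would put all the hub's edges on one part; here the hub's edge list is divided evenly, which is also the precondition for the distributed-delegates technique described above.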
Next-generation sequencing technologies have led to a big data age in biology. Since the sequencing of the human genome, the primary bottleneck has steadily moved from collection to storage and analysis of the data. The primary contributions of this dissertation are the design and implementation of novel parallel algorithms for two important problems in bioinformatics: error-correction and transcriptome assembly. For error-correction, we focused on a k-mer spectrum based error-correction application called Reptile. We designed a novel distributed memory algorithm that divides the k-mers and tiles among the processing ranks. This allows hardware with any memory size per node to be employed for error-correction using Reptile's algorithm, irrespective of the size of the dataset. Our implementation achieved highly scalable results for the E. coli and Drosophila datasets as well as a human dataset consisting of 1.55 billion reads. Besides the algorithm that distributes k-mers and tiles between ranks, we have also implemented numerous heuristics that adjust the algorithm to the hardware traits. We further extended our parallel algorithm by pre-generating tiles and using collective messages to reduce the number of point-to-point messages for error-correction. Further extensions of this work have focused on creating a library for distributed k-mer processing, which has applications to problems in metagenomics. For transcriptome assembly, we have implemented a hybrid MPI-OpenMP approach for Chrysalis, which is part of the Trinity pipeline. Chrysalis clusters minimally overlapping contigs obtained from the prior module in Trinity called Inchworm. With this parallelization, we were able to reduce the runtime of the Chrysalis step of the Trinity workflow from over 50 hours to less than 5 hours for the sugarbeet dataset. We also employed this implementation to complete the transcriptome of a 1.5 billion reads dataset pooled from different brea
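A standard way to divide a k-mer spectrum among ranks (an assumption here for illustration; the dissertation's exact partitioning scheme may differ) is to hash each k-mer so that every rank owns a disjoint slice of the spectrum and counting needs no cross-rank reconciliation:

```python
import hashlib
from collections import Counter

def kmers(read, k):
    """All length-k substrings of a read."""
    return (read[i:i + k] for i in range(len(read) - k + 1))

def owner(kmer, nranks):
    """Deterministic rank assignment by hashing the k-mer, so every
    occurrence of the same k-mer lands on the same rank."""
    return int(hashlib.md5(kmer.encode()).hexdigest(), 16) % nranks

def distribute_spectrum(reads, k, nranks):
    """Per-rank k-mer counts: each rank owns a disjoint slice of the
    spectrum; in an MPI setting the inner append would be a message."""
    spectra = [Counter() for _ in range(nranks)]
    for read in reads:
        for km in kmers(read, k):
            spectra[owner(km, nranks)][km] += 1
    return spectra
```

Because ownership is a pure function of the k-mer, pre-generating tiles and batching them into collective messages (as described above) changes only the communication pattern, not which rank holds which count.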