Transmission, storage, and archival of high-throughput sequencing (HTS) short-read datasets pose significant challenges due to the large size of such datasets. Constant improvements to HTS technology, in the form of i...
详细信息
ISBN:
(纸本)9781450366663
Transmission, storage, and archival of high-throughput sequencing (HTS) short-read datasets pose significant challenges due to the large size of such datasets. Constant improvements to HTS technology, in the form of increasing throughput and decreasing cost, and its increasing adoption amplify the problem. General-purpose compression algorithms have been widely adopted for representing read datasets in a compact form. However, they are unable to fully leverage the domain-specific properties of read datasets. In response, researchers proposed special-purpose compression algorithms which improve upon the compression efficiency of generalpurpose compression algorithms. In this paper, we present Par-RefCom, a parallel reference-based algorithm for compressing HTS genomics short-read datasets. HTS instruments are typically used to generate paired-end reads as they hold significance for biological analysis. In contrast to existing special-purpose compression algorithms, ParRefCom treats paired-end reads as first-class citizens. Owing to this treatment of paired-end reads, our algorithm is able to significantly improve compression efficiency over the state-of-the-art. More specifically, for a benchmark human dataset, the size of the compressed output is 21% smaller than that produced by the current best algorithm. Further, ParRefCom is scalable and its compression and decompression speeds are better than those of reference-free methods.
Sketches are probabilistic data structures that can provide approximate results within mathematically proven error bounds while using orders of magnitude less memory than traditional approaches. They are tailored for ...
详细信息
ISBN:
(纸本)9783030294007;9783030293994
Sketches are probabilistic data structures that can provide approximate results within mathematically proven error bounds while using orders of magnitude less memory than traditional approaches. They are tailored for streaming data analysis on architectures even with limited memory such as single-board computers that are widely exploited for IoT and edge computing. Since these devices offer multiple cores, with efficient parallel sketching schemes, they are able to manage high volumes of data streams. However, since their caches are relatively small, a careful parallelization is required. In this work, we focus on the frequency estimation problem and evaluate the performance of a high-end server, a 4-core Raspberry Pi and an 8-core Odroid. As a sketch, we employed the widely used Count-Min Sketch. To hash the stream in parallel and in a cache-friendly way, we applied a novel tabulation approach and rearranged the auxiliary tables into a single one. To parallelize the process with performance, we modified the workflow and applied a form of buffering between hash computations and sketch updates. Today, many single-board computers have heterogeneous processors in which slow and fast cores are equipped together. To utilize all these cores to their full potential, we proposed a dynamic load-balancing mechanism which significantly increased the performance of frequency estimation.
Recently, there have been more chances to calculate matrix-vector multiplication due to the growing use of the neural network. We have proposed the method to automatically synthesize the optimum parallel algorithm for...
详细信息
The paper considers the mixed-integer global optimization problems. A novel parallel algorithm for solving the problems of this class based on the index algorithm for solving the continuous global optimization problem...
详细信息
ISBN:
(纸本)9783030365929;9783030365912
The paper considers the mixed-integer global optimization problems. A novel parallel algorithm for solving the problems of this class based on the index algorithm for solving the continuous global optimization problems has been proposed. The comparison of this algorithm with known analogs demonstrates the efficiency of the developed approach. The proposed algorithm allows an efficient parallelization including the employment of the graphics accelerators. The results of performed numerical experiments (solving a series of 100 multiextremal mixedinteger problems) confirm a good speedup of the algorithm with the use of GPU.
In this paper, a parallel subdomain-level discontinuous Galerkin time domain (DGTD) method based on the Message Passing Interface (MPI) library has been proposed for simulating complex structures or multiscale electro...
详细信息
ISBN:
(纸本)9781728153049
In this paper, a parallel subdomain-level discontinuous Galerkin time domain (DGTD) method based on the Message Passing Interface (MPI) library has been proposed for simulating complex structures or multiscale electromagnetic problems. The efficiency of parallel algorithm is greatly affected by load distribution, so an automatic load balancing strategy is proposed to reduce the load difference between processes in order to show the advantages of parallel algorithms. First, the relationship between the time required to solve system matrix and the degree of freedom (DoF) of the subdomains applying tetrahedral or hexahedral elements is obtained by some numerical experiments. Then the partition is adjusted to achieve load balancing and reduce the data exchange between processes by relationship above, so that the parallel algorithm approaches the linear speedup ratio. Finally, some numerical cases have been simulated to demonstrate the reliability and efficiency of the algorithm.
Deduplication is one of the most effective and efficient techniques to save memory space. It is widely used in data centers and cloud storage systems. Multi-stream concurrency is expected to increase the throughput of...
详细信息
ISBN:
(纸本)9781728143286
Deduplication is one of the most effective and efficient techniques to save memory space. It is widely used in data centers and cloud storage systems. Multi-stream concurrency is expected to increase the throughput of deduplication. However, multiple data streams hurt the locality of accessed data and weaken the benefit of data concurrency, which forms a challenge for data deduplication. Usually, the ordered index can reshape the locality of data streams, which can improve the cache hit rate during deduplication. In this paper, we first propose an efficient parallel deduplication algorithm by clustering scattered fingerprints, called CSF, to exploit the data locality as much as possible. It tries to improve the utilization rate of the fingerprint page by the clustered fingerprints. Moreover, it retains the scattered fingerprint to next round fingerprint comparison by re-using the fingerprints on the same page. Thus the number of the fingerprint pages to read is reduced. We further optimize the proposed algorithm by a scheduling strategy, which effectively schedules the task of part streams ahead while ensuring the overall performance. Finally, we evaluated the performance of our algorithm with various data sets in experiments. The experimental results show that our proposed algorithm achieves better performance than the state-of-the-art method.
In parallel computation domain, graph coloring is widely studied in its own and represents a reference problem for scheduling of parallel tasks. Unfortunately, common graph coloring strategies usually focus on minimiz...
详细信息
ISBN:
(数字)9783030200817
ISBN:
(纸本)9783030200817;9783030200800
In parallel computation domain, graph coloring is widely studied in its own and represents a reference problem for scheduling of parallel tasks. Unfortunately, common graph coloring strategies usually focus on minimizing the number of colors without any concern for the sizes of each color class, thus producing highly skewed color class distributions. However, to guarantee efficiency in parallel computations, but also in other application contexts, it is important to keep the color classes highly balanced in their sizes. In this paper we address this challenging issue for large scale graphs, proposing a fast parallel MCMC heuristic for sparse graphs that randomly generates good balanced colorings provided that a sufficient number of colors are made available. We show its effectiveness through some numerical simulations on random graphs.
With the machine learning applications processing larger and more complex data, people tend to use multiple computing nodes to execute the machine learning tasks in distributed way. However, in real world, people alwa...
详细信息
ISBN:
(纸本)9781450360913
With the machine learning applications processing larger and more complex data, people tend to use multiple computing nodes to execute the machine learning tasks in distributed way. However, in real world, people always encounter a problem that a few nodes in system exhibit poor performance and drag down the efficiency of the whole system. In existing parallel strategies such as bulk synchronous parallel and stale synchronous parallel, these nodes with poor performance may not be monitored and found out in time. To address this problem, we proposed a free stale synchronous parallel (FSSP) strategy to free the system from the negative impact of those nodes. Our experimental results on some classical machine leaning algorithms and datasets demonstrated that FSSP strategy outperformed other existing parallel computing strategy.
Decompressing a file made by the gzip program at an arbitrary location is in principle impossible, due to the nature of the DEFLATE compression algorithm. Consequently, no existing program can take advantage of parall...
详细信息
ISBN:
(纸本)9781538655559
Decompressing a file made by the gzip program at an arbitrary location is in principle impossible, due to the nature of the DEFLATE compression algorithm. Consequently, no existing program can take advantage of parallelism to rapidly decompress large gzip-compressed files. This is an unsatisfactory bottleneck, especially for the analysis of large sequencing data experiments. Here we propose a parallel algorithm and an implementation, pugz, that performs fast and exact decompression of any text file. We show that pugz is an order of magnitude faster than gunzip, and 5x faster than a highly-optimized sequential implementation (libdeflate). We also study the related problem of random access to compressed data. We give simple models and experimental results that shed light on the structure of gzip-compressed files containing DNA sequences. Preliminary results show that random access to sequences within a gzip-compressed FASTQ file is almost always feasible at low compression levels, yet is approximate at higher compression levels.
In this paper, a parallel subdomain level discontinuous Galerkin time domain (DGTD) method is achieved via Message Passing Interface (MPI) library based on upwind flux and Runge-Kutta explicit time integration. A load...
详细信息
ISBN:
(纸本)9781728106922
In this paper, a parallel subdomain level discontinuous Galerkin time domain (DGTD) method is achieved via Message Passing Interface (MPI) library based on upwind flux and Runge-Kutta explicit time integration. A load balancing scheme is proposed for subdomains with multiple kinds of elements in different orders. Numerical examples demonstrate the effectiveness of the proposed scheme and good speedup is achieved.
暂无评论