Recently, there have been more chances to calculate matrix-vector multiplication due to the growing use of the neural network. We have proposed the method to automatically synthesize the optimum parallel algorithm for...
详细信息
The paper considers the mixed-integer global optimization problems. A novel parallel algorithm for solving the problems of this class based on the index algorithm for solving the continuous global optimization problem...
详细信息
ISBN:
(纸本)9783030365929;9783030365912
The paper considers the mixed-integer global optimization problems. A novel parallel algorithm for solving the problems of this class based on the index algorithm for solving the continuous global optimization problems has been proposed. The comparison of this algorithm with known analogs demonstrates the efficiency of the developed approach. The proposed algorithm allows an efficient parallelization including the employment of the graphics accelerators. The results of performed numerical experiments (solving a series of 100 multiextremal mixedinteger problems) confirm a good speedup of the algorithm with the use of GPU.
In this paper, a parallel subdomain-level discontinuous Galerkin time domain (DGTD) method based on the Message Passing Interface (MPI) library has been proposed for simulating complex structures or multiscale electro...
详细信息
ISBN:
(纸本)9781728153049
In this paper, a parallel subdomain-level discontinuous Galerkin time domain (DGTD) method based on the Message Passing Interface (MPI) library has been proposed for simulating complex structures or multiscale electromagnetic problems. The efficiency of parallel algorithm is greatly affected by load distribution, so an automatic load balancing strategy is proposed to reduce the load difference between processes in order to show the advantages of parallel algorithms. First, the relationship between the time required to solve system matrix and the degree of freedom (DoF) of the subdomains applying tetrahedral or hexahedral elements is obtained by some numerical experiments. Then the partition is adjusted to achieve load balancing and reduce the data exchange between processes by relationship above, so that the parallel algorithm approaches the linear speedup ratio. Finally, some numerical cases have been simulated to demonstrate the reliability and efficiency of the algorithm.
Deduplication is one of the most effective and efficient techniques to save memory space. It is widely used in data centers and cloud storage systems. Multi-stream concurrency is expected to increase the throughput of...
详细信息
ISBN:
(纸本)9781728143286
Deduplication is one of the most effective and efficient techniques to save memory space. It is widely used in data centers and cloud storage systems. Multi-stream concurrency is expected to increase the throughput of deduplication. However, multiple data streams hurt the locality of accessed data and weaken the benefit of data concurrency, which forms a challenge for data deduplication. Usually, the ordered index can reshape the locality of data streams, which can improve the cache hit rate during deduplication. In this paper, we first propose an efficient parallel deduplication algorithm by clustering scattered fingerprints, called CSF, to exploit the data locality as much as possible. It tries to improve the utilization rate of the fingerprint page by the clustered fingerprints. Moreover, it retains the scattered fingerprint to next round fingerprint comparison by re-using the fingerprints on the same page. Thus the number of the fingerprint pages to read is reduced. We further optimize the proposed algorithm by a scheduling strategy, which effectively schedules the task of part streams ahead while ensuring the overall performance. Finally, we evaluated the performance of our algorithm with various data sets in experiments. The experimental results show that our proposed algorithm achieves better performance than the state-of-the-art method.
In parallel computation domain, graph coloring is widely studied in its own and represents a reference problem for scheduling of parallel tasks. Unfortunately, common graph coloring strategies usually focus on minimiz...
详细信息
ISBN:
(数字)9783030200817
ISBN:
(纸本)9783030200817;9783030200800
In parallel computation domain, graph coloring is widely studied in its own and represents a reference problem for scheduling of parallel tasks. Unfortunately, common graph coloring strategies usually focus on minimizing the number of colors without any concern for the sizes of each color class, thus producing highly skewed color class distributions. However, to guarantee efficiency in parallel computations, but also in other application contexts, it is important to keep the color classes highly balanced in their sizes. In this paper we address this challenging issue for large scale graphs, proposing a fast parallel MCMC heuristic for sparse graphs that randomly generates good balanced colorings provided that a sufficient number of colors are made available. We show its effectiveness through some numerical simulations on random graphs.
With the machine learning applications processing larger and more complex data, people tend to use multiple computing nodes to execute the machine learning tasks in distributed way. However, in real world, people alwa...
详细信息
ISBN:
(纸本)9781450360913
With the machine learning applications processing larger and more complex data, people tend to use multiple computing nodes to execute the machine learning tasks in distributed way. However, in real world, people always encounter a problem that a few nodes in system exhibit poor performance and drag down the efficiency of the whole system. In existing parallel strategies such as bulk synchronous parallel and stale synchronous parallel, these nodes with poor performance may not be monitored and found out in time. To address this problem, we proposed a free stale synchronous parallel (FSSP) strategy to free the system from the negative impact of those nodes. Our experimental results on some classical machine leaning algorithms and datasets demonstrated that FSSP strategy outperformed other existing parallel computing strategy.
In this paper, a parallel subdomain level discontinuous Galerkin time domain (DGTD) method is achieved via Message Passing Interface (MPI) library based on upwind flux and Runge-Kutta explicit time integration. A load...
详细信息
ISBN:
(纸本)9781728106922
In this paper, a parallel subdomain level discontinuous Galerkin time domain (DGTD) method is achieved via Message Passing Interface (MPI) library based on upwind flux and Runge-Kutta explicit time integration. A load balancing scheme is proposed for subdomains with multiple kinds of elements in different orders. Numerical examples demonstrate the effectiveness of the proposed scheme and good speedup is achieved.
Decompressing a file made by the gzip program at an arbitrary location is in principle impossible, due to the nature of the DEFLATE compression algorithm. Consequently, no existing program can take advantage of parall...
详细信息
ISBN:
(纸本)9781538655559
Decompressing a file made by the gzip program at an arbitrary location is in principle impossible, due to the nature of the DEFLATE compression algorithm. Consequently, no existing program can take advantage of parallelism to rapidly decompress large gzip-compressed files. This is an unsatisfactory bottleneck, especially for the analysis of large sequencing data experiments. Here we propose a parallel algorithm and an implementation, pugz, that performs fast and exact decompression of any text file. We show that pugz is an order of magnitude faster than gunzip, and 5x faster than a highly-optimized sequential implementation (libdeflate). We also study the related problem of random access to compressed data. We give simple models and experimental results that shed light on the structure of gzip-compressed files containing DNA sequences. Preliminary results show that random access to sequences within a gzip-compressed FASTQ file is almost always feasible at low compression levels, yet is approximate at higher compression levels.
We present the first conditional hardness results for massively parallel algorithms for some central graph problems including (approximating) maximum matching, vertex cover, maximal independent set, and coloring. In s...
详细信息
ISBN:
(纸本)9781728149523
We present the first conditional hardness results for massively parallel algorithms for some central graph problems including (approximating) maximum matching, vertex cover, maximal independent set, and coloring. In some cases, these hardness results match or get close to the state of the art algorithms. Our hardness results are conditioned on a widely believed conjecture in massively parallel computation about the complexity of the connectivity problem. We also note that it is known that an unconditional variant of such hardness results might be somewhat out of reach for now, as it would lead to considerably improved circuit complexity lower bounds and would concretely imply that NC1 is a proper subset of P. We obtain our conditional hardness result via a general method that lifts unconditional lower bounds from the well-studied LOCAL model of distributed computing to the massively parallel computation setting.
Past research has evidently proved that public key cryptosystems are usually slower than symmetric key cryptosystems due to the reason that they use one additional cryptographic key and different methods for encryptio...
详细信息
Past research has evidently proved that public key cryptosystems are usually slower than symmetric key cryptosystems due to the reason that they use one additional cryptographic key and different methods for encryption and decryption. RSA is one of the most common asymmetric key cryptography algorithms. Recent research has focused on speeding up RSA using various techniques. With the introduction of distributed computing, parallelization of algorithms enables them to run on multiple cores concurrently at a time. RSA consists of two resource intensive operations namely Modular Exponentiation of up to 1024-bit exponents and repeated calculation of Greatest common divisor. Thus, RSA lays the perfect base for application of Montgomery Reduction algorithm to optimize the Repeated Modular multiplication in exponentiation. In this paper we proposed a parallel scheme for RSA using a new parallel data structure known as Concurrent Indexed List of character blocks. The aim of our research was to improve the speed of RSA encryption and decryption using parallelism and also make it compatible with leading industry cryptography standards. We have simulated four different approaches namely both parallel and sequential with and without Montgomery. We have also integrated our parallel paradigm with renowned C++ Crypto library and achieved a speed-up of upto four times than sequential approach. Unlike any other previous approaches, our implementation got easily integrated with any external library and thus can be adopted by any other algorithmic scheme.
暂无评论