We provide time lower bounds for sequential and parallel algorithms deciding bisimulation on labelled transition systems that use partition refinement. For sequential algorithms this is Ω((m+n) log n) and for parallel a...
ISBN: (Print) 9781665490832
The K-nearest neighbor (KNN) classification algorithm can quickly handle the classification problem addressed in this paper, but when computing similarity it assigns the same weight to all distances, ignoring the greater influence that small distances have on classification accuracy. In addition, KNN's efficiency degrades as the number of samples and dimensions grows. We therefore propose an improved weighted KNN classification algorithm based on the Spark framework, which improves runtime efficiency by pruning the sample data and reducing its dimensionality. Experimental results show that the algorithm achieves better accuracy and speedup than a parallel algorithm based on the Hadoop platform, and can process large-scale text data quickly and accurately.
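As a rough illustration of the distance weighting this abstract describes, here is a minimal sketch of inverse-distance-weighted KNN. All names are hypothetical, and the paper's Spark-based parallelization, data pruning, and dimensionality reduction are deliberately omitted.

```python
import math
from collections import defaultdict

def weighted_knn_predict(train, query, k=3, eps=1e-9):
    """Distance-weighted KNN: closer neighbors cast larger votes,
    with weight 1/(d + eps) instead of the uniform weights of plain KNN."""
    # train: list of (feature_tuple, label) pairs
    nearest = sorted((math.dist(x, query), y) for x, y in train)[:k]
    votes = defaultdict(float)
    for d, y in nearest:
        votes[y] += 1.0 / (d + eps)   # small distance -> large weight
    return max(votes, key=votes.get)

# Tiny example: two clusters on a line
train = [((0.0,), 'a'), ((0.2,), 'a'), ((1.0,), 'b'), ((1.1,), 'b')]
print(weighted_knn_predict(train, (0.3,), k=3))  # nearest points dominate -> 'a'
```

With uniform weights and k=3 the same query would still pick 'a' here, but the weighted vote makes the decision far more robust when the k neighbors straddle two classes at very different distances.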
In 2016, 73% of total Internet traffic came from video transmission, and this share is expected to reach 82% by 2021. These figures show the importance of video compression standards that maximize video quality while minimizing the required bandwidth. In 2013, the HEVC standard was released, delivering an approximately 50% bit-rate saving over H.264/AVC at the same reconstruction quality. To address the growth in video IP traffic, a new generation of video coding techniques achieving higher compression rates is required. These compression improvements are being implemented in a software package known as the Joint Exploration Test Model (JEM). In this work, we present two parallel JEM solutions specifically designed for distributed-memory platforms, covering both the All Intra and Random Access coding modes. The proposed parallel algorithms achieve high efficiency, particularly for the All Intra mode, and also show great scalability.
We present a randomized O(m log² n)-work, O(polylog n)-depth parallel algorithm for minimum cut. This algorithm matches the work bounds of a recent sequential algorithm by Gawrychowski, Mozes, and Weimann [ICALP'20]...
The aim of this article is to show that solvers for tridiagonal Toeplitz systems of linear equations can be efficiently implemented for a variety of modern GPU-accelerated and multicore architectures using OpenACC. We consider two parallel algorithms for solving such systems with special assumptions about coefficient matrices. As the first algorithm, we propose a new, faster implementation of the divide and conquer method. The next algorithm is a new, vectorizable algorithm based on a recently introduced sequential method. We consider the use of both column-wise and row-wise storage formats for two-dimensional arrays and show how to efficiently convert between these two formats using cache memory and improve the overall performance of our implementations. We also show how to tune the performance by predicting the best values of the methods' parameters. Numerical experiments performed on Intel CPUs and Nvidia GPUs show that our new implementations achieve relatively good performance and accuracy.
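For reference, the sequential baseline that such solvers improve on is the classical Thomas algorithm, shown here specialized to the Toeplitz case. This is a sketch under the usual assumption that the matrix is diagonally dominant (so no pivoting is needed); the article's divide-and-conquer and vectorizable parallel variants are not reproduced here.

```python
def solve_tridiag_toeplitz(a, b, c, d):
    """Thomas algorithm for a Toeplitz tridiagonal system:
    scalar sub-diagonal a, main diagonal b, super-diagonal c; rhs vector d.
    Sequential O(n) baseline; assumes diagonal dominance (no pivoting)."""
    n = len(d)
    cp = [0.0] * n            # modified super-diagonal
    dp = [0.0] * n            # modified right-hand side
    cp[0] = c / b
    dp[0] = d[0] / b
    for i in range(1, n):     # forward elimination
        denom = b - a * cp[i - 1]
        cp[i] = c / denom
        dp[i] = (d[i] - a * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):   # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Example: 4x4 system with diagonals (-1, 2, -1); rhs chosen so x = [1, 1, 1, 1]
x = solve_tridiag_toeplitz(-1.0, 2.0, -1.0, [1.0, 0.0, 0.0, 1.0])
```

The forward-elimination recurrence is inherently sequential, which is precisely why parallel solvers like those in the article must restructure the computation (e.g. by divide and conquer) rather than simply parallelizing this loop.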
In this paper we show a deterministic parallel all-pairs shortest paths algorithm for real-weighted directed graphs. The algorithm has Õ(nm + (n/d)³) work and Õ(d) depth for any depth parameter d ∈ [1, n]. To the best of our knowledge, such a trade-off has only been previously described for the real-weighted single-source shortest paths problem using randomization [Bringmann et al., ICALP'17]. Moreover, our result improves upon the parallelism of the state-of-the-art randomized parallel algorithm for computing transitive closure, which has Õ(nm + n³/d²) work and Õ(d) depth [Ullman and Yannakakis, SIAM J. Comput.'91]. Our APSP algorithm turns out to be a powerful tool for designing efficient planar graph algorithms in both parallel and sequential regimes. By suitably adjusting the depth parameter d and applying known techniques, we obtain: (1) nearly work-efficient Õ(n^{1/6})-depth parallel algorithms for the real-weighted single-source shortest paths problem and finding a bipartite perfect matching in a planar graph, (2) an Õ(n^{9/8})-time sequential strongly polynomial algorithm for computing a minimum mean cycle or a minimum cost-to-time-ratio cycle of a planar graph, (3) a slightly faster algorithm for computing so-called external dense distance graphs of all pieces of a recursive decomposition of a planar graph. One notable ingredient of our parallel APSP algorithm is a simple deterministic Õ(nm)-work Õ(d)-depth procedure for computing Õ(n/d)-size hitting sets of shortest d-hop paths between all pairs of vertices of a real-weighted digraph. Such hitting sets have also been called d-hub sets. Hub sets have previously proved especially useful in designing parallel or dynamic shortest paths algorithms and are typically obtained via random sampling. Our procedure implies, for example, an Õ(nm)-time deterministic algorithm for finding a shortest negative cycle of a real-weighted digraph. Such a near-optimal bound for this problem has been so far only achieved usi...
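The randomized hub-set construction that this abstract contrasts with its deterministic procedure can be sketched as follows. The parameters and names are illustrative only: each vertex is kept with probability roughly c·ln(n)/d, so a fixed path on d vertices is missed with probability about n^{-c}, and the expected hub-set size is O((n/d) log n).

```python
import math
import random

def sample_hub_set(n, d, c=3.0, rng=None):
    """Classic randomized d-hub construction (the standard alternative to
    the paper's deterministic procedure): keep each vertex independently
    with probability min(1, c*ln(n)/d)."""
    rng = rng or random.Random(0)
    p = min(1.0, c * math.log(n) / d)
    return {v for v in range(n) if rng.random() < p}

# Demo: a hub set for n=10000, d=50 hits a fixed 50-vertex path
# with overwhelming probability (miss probability ~ (1-p)^d ~ n^-3).
rng = random.Random(42)
hubs = sample_hub_set(10_000, 50, rng=rng)
path = rng.sample(range(10_000), 50)   # stand-in for a d-hop shortest path
print(any(v in hubs for v in path))
```

The point of the paper's deterministic Õ(nm)-work procedure is to obtain the same Õ(n/d)-size hitting-set guarantee without this failure probability, which is what enables the deterministic negative-cycle result quoted above.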
An extensive study of population control techniques (PCTs) for time-dependent and eigenvalue Monte Carlo (MC) neutron transport calculations is presented. We define PCT as a technique that takes a censused population ...
A novel parallel algorithm is introduced for electromagnetic transient analysis in the design of plasmonic devices. We have established time-division parallel computation for the finite-difference time-domain (FDTD) method. This completely parallel technique is extremely useful, since the computational task can be distributed evenly across many processors and no data communication is required during a computation. The key idea of the parallel algorithm is as follows: (i) coarse field values at temporal sampling points are obtained independently using a finite-difference complex-frequency-domain method with a fast inverse Laplace transform; (ii) these values are transferred as the initial responses of the FDTD frames running on many computational processors; (iii) conventional FDTD computation is then performed in a completely parallel fashion. The computational time of the sequential part can be reduced to a fraction determined by the number of processors. We apply the proposed technique to the design of a plasmonic antenna for all-optical magnetic recording.
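A minimal 1-D FDTD frame, of the kind each processor would run in step (iii) above, might look like the sketch below. The normalized units (c = 1, Courant number 1), grid size, and Gaussian soft source are assumptions for illustration, not details taken from the paper; the Laplace-domain seeding of initial responses from step (i) is omitted.

```python
import math

def fdtd_1d(nz=200, steps=400, src=50):
    """Minimal 1-D FDTD (Yee leapfrog) in normalized units.
    In a time-division scheme, each processor would run a frame like this,
    seeded with coarse initial fields from the complex-frequency solver."""
    ez = [0.0] * nz          # electric field on integer grid points
    hy = [0.0] * nz          # magnetic field on half-integer grid points
    for t in range(steps):
        for k in range(nz - 1):          # update H from the curl of E
            hy[k] += ez[k + 1] - ez[k]
        for k in range(1, nz):           # update E from the curl of H
            ez[k] += hy[k] - hy[k - 1]
        ez[src] += math.exp(-((t - 30) / 10) ** 2)   # Gaussian soft source
    return ez

field = fdtd_1d()
```

Because each time step depends on the previous one, the leapfrog loop itself cannot be parallelized across time without the paper's trick of independently precomputing coarse initial states for each temporal segment.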
We present a parallel algorithm for the permanent mod 2^k of a matrix of univariate integer polynomials. It places the problem in ⊕L ⊆ NC². This extends the techniques of Valiant [26], Braverman, Kulkarni and Roy [3] and ...
Similarity search is one of the most fundamental computations regularly performed on ever-growing protein datasets. Scalability is of paramount importance for uncovering novel phenomena that occur at very large scales. We unleash the power of over 20,000 GPUs on the Summit system to perform all-vs-all protein similarity search on one of the largest publicly available datasets, with 405 million proteins, in less than 3.5 hours, reducing the time-to-solution for many use cases from weeks to hours. The variability of protein sequence lengths, as well as the sparsity of the space of pairwise comparisons, makes this a challenging problem in distributed memory. Because of the need to construct and maintain a data structure holding indices to all other sequences, this application has a huge memory footprint that makes it hard to scale to larger problem sizes. We overcome this memory limitation with innovative matrix-based blocking techniques, without introducing additional load imbalance.
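The memory-bounding idea behind such matrix-based blocking can be sketched as a toy tile schedule: the n × n all-vs-all comparison matrix is cut into 2-D tiles, and each tile touches only two sub-ranges of the sequence set at a time. This is an illustrative sketch only; the actual system distributes tiles across GPUs and handles load balance and sparsity, which this omits.

```python
from itertools import product

def block_schedule(n, block):
    """Yield (rows, cols) index ranges for the upper-triangle tiles of an
    n x n all-vs-all comparison matrix (similarity is symmetric, so
    lower-triangle tiles are skipped). Each tile needs only its two
    sequence sub-ranges in memory."""
    nb = (n + block - 1) // block        # number of blocks per dimension
    for bi, bj in product(range(nb), repeat=2):
        if bj < bi:
            continue                     # symmetric: skip mirror tiles
        rows = range(bi * block, min((bi + 1) * block, n))
        cols = range(bj * block, min((bj + 1) * block, n))
        yield rows, cols

# Demo: every unordered pair (i, j) with i <= j is covered exactly once.
covered = [(i, j)
           for rows, cols in block_schedule(10, 4)
           for i in rows for j in cols if i <= j]
print(len(covered))  # 55 = 10*11/2, each pair exactly once
```

Capping the tile size caps the per-worker memory footprint independently of the total dataset size, which is the essence of trading a giant global index for blocked sub-problems.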