We provide time lower bounds for sequential and parallel algorithms deciding bisimulation on labelled transition systems that use partition refinement. For sequential algorithms this is Ω((m+n) log n) and for parallel a...
ISBN: (Print) 9781665490832
The K-nearest neighbor (KNN) classification algorithm can quickly handle the classification problem addressed in this paper, but when computing similarity it assigns the same weight to all distances, ignoring the greater influence that small distances have on classification accuracy. In addition, KNN's efficiency degrades as the number of samples and dimensions grows. We therefore propose an improved weighted KNN classification algorithm based on the Spark framework, which improves runtime efficiency by pruning the sample data and reducing its dimensionality. Experimental results show that the algorithm achieves better accuracy and speedup than a parallel algorithm based on the Hadoop platform, and can process large-scale text data quickly and accurately.
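As a rough illustration of the distance weighting this abstract describes, here is a minimal sketch of inverse-distance-weighted KNN. All names are hypothetical, and the paper's Spark-based parallelization, data pruning, and dimensionality reduction are deliberately omitted.

```python
import math
from collections import defaultdict

def weighted_knn_predict(train, query, k=3, eps=1e-9):
    """Distance-weighted KNN: closer neighbors cast larger votes,
    with weight 1/(d + eps) instead of the uniform weights of plain KNN."""
    # train: list of (feature_tuple, label) pairs
    nearest = sorted((math.dist(x, query), y) for x, y in train)[:k]
    votes = defaultdict(float)
    for d, y in nearest:
        votes[y] += 1.0 / (d + eps)   # small distance -> large weight
    return max(votes, key=votes.get)

# Tiny example: two clusters on a line
train = [((0.0,), 'a'), ((0.2,), 'a'), ((1.0,), 'b'), ((1.1,), 'b')]
print(weighted_knn_predict(train, (0.3,), k=3))  # nearest points dominate -> 'a'
```

With uniform weights and k=3 the same query would still pick 'a' here, but the weighted vote makes the decision far more robust when the k neighbors straddle two classes at very different distances.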
In 2016, 73% of total Internet traffic came from video transmission, and this share is expected to reach 82% by 2021. These figures show the importance of video compression standards that maximize video quality while minimizing the required bandwidth. In 2013, the HEVC standard was released, delivering an approximately 50% bit-rate saving over H.264/AVC at the same reconstruction quality. To address the growth in video IP traffic, a new generation of video coding techniques achieving higher compression rates is required. These compression improvements are being implemented in a software package known as the Joint Exploration Test Model (JEM). In this work, we present two parallel JEM solutions specifically designed for distributed-memory platforms, covering both the All Intra and Random Access coding modes. The proposed parallel algorithms achieve high efficiency, particularly for the All Intra mode, and also show great scalability.
We present a randomized O(m log² n)-work, O(polylog n)-depth parallel algorithm for minimum cut. This algorithm matches the work bounds of a recent sequential algorithm by Gawrychowski, Mozes, and Weimann [ICALP'20]...
The aim of this article is to show that solvers for tridiagonal Toeplitz systems of linear equations can be efficiently implemented for a variety of modern GPU-accelerated and multicore architectures using OpenACC. We consider two parallel algorithms for solving such systems with special assumptions about coefficient matrices. As the first algorithm, we propose a new, faster implementation of the divide and conquer method. The next algorithm is a new, vectorizable algorithm based on a recently introduced sequential method. We consider the use of both column-wise and row-wise storage formats for two-dimensional arrays and show how to efficiently convert between these two formats using cache memory and improve the overall performance of our implementations. We also show how to tune the performance by predicting the best values of the methods' parameters. Numerical experiments performed on Intel CPUs and Nvidia GPUs show that our new implementations achieve relatively good performance and accuracy.
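For reference, the sequential baseline that such solvers improve on is the classical Thomas algorithm, shown here specialized to the Toeplitz case. This is a sketch under the usual assumption that the matrix is diagonally dominant (so no pivoting is needed); the article's divide-and-conquer and vectorizable parallel variants are not reproduced here.

```python
def solve_tridiag_toeplitz(a, b, c, d):
    """Thomas algorithm for a Toeplitz tridiagonal system:
    scalar sub-diagonal a, main diagonal b, super-diagonal c; rhs vector d.
    Sequential O(n) baseline; assumes diagonal dominance (no pivoting)."""
    n = len(d)
    cp = [0.0] * n            # modified super-diagonal
    dp = [0.0] * n            # modified right-hand side
    cp[0] = c / b
    dp[0] = d[0] / b
    for i in range(1, n):     # forward elimination
        denom = b - a * cp[i - 1]
        cp[i] = c / denom
        dp[i] = (d[i] - a * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):   # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Example: 4x4 system with diagonals (-1, 2, -1); rhs chosen so x = [1, 1, 1, 1]
x = solve_tridiag_toeplitz(-1.0, 2.0, -1.0, [1.0, 0.0, 0.0, 1.0])
```

The forward-elimination recurrence is inherently sequential, which is precisely why parallel solvers like those in the article must restructure the computation (e.g. by divide and conquer) rather than simply parallelizing this loop.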
In this paper we show a deterministic parallel all-pairs shortest paths algorithm for real-weighted directed graphs. The algorithm has Õ(nm + (n/d)³) work and Õ(d) depth for any depth parameter d ∈ [1, n]. To the best of our knowledge, such a trade-off has only been previously described for the real-weighted single-source shortest paths problem using randomization [Bringmann et al., ICALP'17]. Moreover, our result improves upon the parallelism of the state-of-the-art randomized parallel algorithm for computing transitive closure, which has Õ(nm + n³/d²) work and Õ(d) depth [Ullman and Yannakakis, SIAM J. Comput.'91]. Our APSP algorithm turns out to be a powerful tool for designing efficient planar graph algorithms in both parallel and sequential regimes. By suitably adjusting the depth parameter d and applying known techniques, we obtain: (1) nearly work-efficient Õ(n^{1/6})-depth parallel algorithms for the real-weighted single-source shortest paths problem and finding a bipartite perfect matching in a planar graph, (2) an Õ(n^{9/8})-time sequential strongly polynomial algorithm for computing a minimum mean cycle or a minimum cost-to-time-ratio cycle of a planar graph, (3) a slightly faster algorithm for computing so-called external dense distance graphs of all pieces of a recursive decomposition of a planar graph. One notable ingredient of our parallel APSP algorithm is a simple deterministic Õ(nm)-work Õ(d)-depth procedure for computing Õ(n/d)-size hitting sets of shortest d-hop paths between all pairs of vertices of a real-weighted digraph. Such hitting sets have also been called d-hub sets. Hub sets have previously proved especially useful in designing parallel or dynamic shortest paths algorithms and are typically obtained via random sampling. Our procedure implies, for example, an Õ(nm)-time deterministic algorithm for finding a shortest negative cycle of a real-weighted digraph. Such a near-optimal bound for this problem has been so far only achieved usi...
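The randomized hub-set construction that this abstract contrasts with its deterministic procedure can be sketched as follows. The parameters and names are illustrative only: each vertex is kept with probability roughly c·ln(n)/d, so a fixed path on d vertices is missed with probability about n^{-c}, and the expected hub-set size is O((n/d) log n).

```python
import math
import random

def sample_hub_set(n, d, c=3.0, rng=None):
    """Classic randomized d-hub construction (the standard alternative to
    the paper's deterministic procedure): keep each vertex independently
    with probability min(1, c*ln(n)/d)."""
    rng = rng or random.Random(0)
    p = min(1.0, c * math.log(n) / d)
    return {v for v in range(n) if rng.random() < p}

# Demo: a hub set for n=10000, d=50 hits a fixed 50-vertex path
# with overwhelming probability (miss probability ~ (1-p)^d ~ n^-3).
rng = random.Random(42)
hubs = sample_hub_set(10_000, 50, rng=rng)
path = rng.sample(range(10_000), 50)   # stand-in for a d-hop shortest path
print(any(v in hubs for v in path))
```

The point of the paper's deterministic Õ(nm)-work procedure is to obtain the same Õ(n/d)-size hitting-set guarantee without this failure probability, which is what enables the deterministic negative-cycle result quoted above.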
An extensive study of population control techniques (PCTs) for time-dependent and eigenvalue Monte Carlo (MC) neutron transport calculations is presented. We define PCT as a technique that takes a censused population ...
A novel parallel algorithm is introduced for electromagnetic transient analysis in the design of plasmonic devices. We have established time-division parallel computation for the finite-difference time-domain (FDTD) method. This completely parallel technique is extremely useful, since the computational task can be distributed evenly across many processors and no data communication is required during a computation. The key idea of the parallel algorithm is as follows: (i) coarse field values at temporal sampling points are obtained independently using a finite-difference complex-frequency-domain method with a fast inverse Laplace transform; (ii) these values are transferred as the initial responses of the FDTD frames running on many computational processors; (iii) conventional FDTD computation is then performed in a completely parallel fashion. The computational time of the sequential part can be reduced to a fraction determined by the number of processors. We apply the proposed technique to the design of a plasmonic antenna for all-optical magnetic recording.
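A minimal 1-D FDTD frame, of the kind each processor would run in step (iii) above, might look like the sketch below. The normalized units (c = 1, Courant number 1), grid size, and Gaussian soft source are assumptions for illustration, not details taken from the paper; the Laplace-domain seeding of initial responses from step (i) is omitted.

```python
import math

def fdtd_1d(nz=200, steps=400, src=50):
    """Minimal 1-D FDTD (Yee leapfrog) in normalized units.
    In a time-division scheme, each processor would run a frame like this,
    seeded with coarse initial fields from the complex-frequency solver."""
    ez = [0.0] * nz          # electric field on integer grid points
    hy = [0.0] * nz          # magnetic field on half-integer grid points
    for t in range(steps):
        for k in range(nz - 1):          # update H from the curl of E
            hy[k] += ez[k + 1] - ez[k]
        for k in range(1, nz):           # update E from the curl of H
            ez[k] += hy[k] - hy[k - 1]
        ez[src] += math.exp(-((t - 30) / 10) ** 2)   # Gaussian soft source
    return ez

field = fdtd_1d()
```

Because each time step depends on the previous one, the leapfrog loop itself cannot be parallelized across time without the paper's trick of independently precomputing coarse initial states for each temporal segment.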
We present a parallel algorithm for the permanent mod 2^k of a matrix of univariate integer polynomials. It places the problem in ⊕L ⊆ NC². This extends the techniques of Valiant [26], Braverman, Kulkarni and Roy [3] and ...
Similarity search is one of the most fundamental computations regularly performed on ever-growing protein datasets. Scalability is of paramount importance for uncovering novel phenomena that occur at very large scales. We unleash the power of over 20,000 GPUs on the Summit system to perform all-vs-all protein similarity search on one of the largest publicly available datasets, with 405 million proteins, in less than 3.5 hours, reducing the time-to-solution for many use cases from weeks to hours. The variability of protein sequence lengths, as well as the sparsity of the space of pairwise comparisons, makes this a challenging problem in distributed memory. Because of the need to construct and maintain a data structure holding indices to all other sequences, this application has a huge memory footprint that makes it hard to scale to larger problem sizes. We overcome this memory limitation with innovative matrix-based blocking techniques, without introducing additional load imbalance.
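The memory-bounding idea behind such matrix-based blocking can be sketched as a toy tile schedule: the n × n all-vs-all comparison matrix is cut into 2-D tiles, and each tile touches only two sub-ranges of the sequence set at a time. This is an illustrative sketch only; the actual system distributes tiles across GPUs and handles load balance and sparsity, which this omits.

```python
from itertools import product

def block_schedule(n, block):
    """Yield (rows, cols) index ranges for the upper-triangle tiles of an
    n x n all-vs-all comparison matrix (similarity is symmetric, so
    lower-triangle tiles are skipped). Each tile needs only its two
    sequence sub-ranges in memory."""
    nb = (n + block - 1) // block        # number of blocks per dimension
    for bi, bj in product(range(nb), repeat=2):
        if bj < bi:
            continue                     # symmetric: skip mirror tiles
        rows = range(bi * block, min((bi + 1) * block, n))
        cols = range(bj * block, min((bj + 1) * block, n))
        yield rows, cols

# Demo: every unordered pair (i, j) with i <= j is covered exactly once.
covered = [(i, j)
           for rows, cols in block_schedule(10, 4)
           for i in rows for j in cols if i <= j]
print(len(covered))  # 55 = 10*11/2, each pair exactly once
```

Capping the tile size caps the per-worker memory footprint independently of the total dataset size, which is the essence of trading a giant global index for blocked sub-problems.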