检索结果-内蒙古大学图书馆

46th International Conference on parallel Processing Workshops (ICPPW)

作者： Sundar, Hari Khurd, Parmeshwar Univ Utah Sch Comp Salt Lake City UT 84112 USA Netflix Inc Scotts Valley CA USA

ISBN: (纸本)9781538610428

We present parallel algorithms for computing cycle orders and cycle perimeters in relative neighborhood graphs. This parallel algorithm has wide-ranging applications from microscopic to macroscopic domains, e.g., in histopathological image analysis and wireless network routing. Our algorithm consists of the following steps (sub-algorithms): (1) Uniform partitioning of the graph vertices across processes, (2) parallel Delaunay triangulation and (3) parallel computation of the relative neighborhood graph and the cycle orders and perimeters. We evaluated our algorithm on a large dataset with 6.5 Million points and demonstrate excellent fixed-size scalability. We also demonstrate excellent isogranular scalability up to 131K processes. Our largest run was on a dataset with 13 billion points on 131K processes on ORNL's Cray XK7 "Titan" supercomputer.

关键词： Feature extraction Partitioning algorithms parallel algorithms Wireless networks Microscopy Scalability Three-dimensional displays

来源：评论

学校读者我要写书评

暂无评论

parallel algorithms and concentration bounds for the Lovasz Local Lemma via witness-DAGs 28

Parallel algorithms and concentration bounds for the Lovasz ...

引用

28th Annual ACM-SIAM Symposium on Discrete algorithms (SODA)

作者： Haeupler, Bernhard Harris, David G. Carnegie Mellon Univ Sch Comp Sci Pittsburgh PA 15213 USA Univ Maryland Dept Comp Sci College Pk MD 20742 USA

ISBN: (纸本)9781611974782

The Lovasz Local Lemma (LLL) is a cornerstone principle in the probabilistic method of combinatorics, and a seminal algorithm of Moser & Tardos (2010) provides an efficient randomized algorithm to implement it. This algorithm can be parallelized to give an algorithm that uses polynomially many processors and runs in O(log(3) n) time, stemming from O(log n) adaptive computations of a maximal independent set (MIS). Chung et al. (2014) developed faster local and parallel algorithms, potentially running in time O(log(2) n), but these algorithms work under significantly more stringent conditions than the LLL. We give a new parallel algorithm that works under essentially the same conditions as the original algorithm of Moser & Tardos but uses only a single MIS computation, thus running in O(log(2) n) time. This conceptually new algorithm also gives a clean combinatorial description of a satisfying assignment which might be of independent interest. Our techniques extend to the deterministic LLL algorithm given by Chandrasekaran et al. (2013) leading to an NC-algorithm running in time O(log(2) n) as well. We also provide improved bounds on the runtimes of the sequential and parallel resampling-based algorithms originally developed by Moser & Tardos. Our bounds extend to any problem instance in which the tighter Shearer LLL criterion is satisfied. We also improve on the analysis of Kolipaka & Szegedy (2011) to give tighter concentration results.

关键词： parallel algorithms

来源：评论

学校读者我要写书评

暂无评论

parallel algorithms for Switching Edges and Generating Random Graphs from Given Degree Sequences using HPC Platforms

Parallel Algorithms for Switching Edges and Generating Rando...

引用

作者： Md Hasanuzzaman Bhuiyan Virginia Polytechnic Institute and State University

学位级别：博士

Networks (or graphs) are an effective abstraction for representing many real-world complex systems. Analyzing various structural properties of and dynamics on such networks reveal valuable insights about the behavior of such systems. In today's data-rich world, we are deluged by the massive amount of heterogeneous data from various sources, such as the web, infrastructure, and online social media. Analyzing this huge amount of data may take a prohibitively long time and even may not fit into the main memory of a single processing unit, thus motivating the necessity of efficient parallel algorithms in various high-performance computing (HPC) platforms. In this dissertation, we present distributed and shared memory parallel algorithms for some important network analytic problems. First, we present distributed memory parallel algorithms for switching edges in a network. Edge switch is an operation on a network, where two edges are selected randomly, and one of their end vertices are swapped with each other. This operation is repeated either a given number of times or until a specified criterion is satisfied. It has diverse real-world applications such as in generating simple random networks with a given degree sequence and in modeling and studying various dynamic networks. One of the steps in our edge switch algorithm requires generating multinomial random variables in parallel. We also present the first non-trivial parallel algorithm for generating multinomial random variables. Next, we present efficient algorithms for assortative edge switch in a labeled network. Assuming each vertex has a label, an assortative edge switch operation imposes an extra constraint, i. e., two edges are randomly selected and one of their end vertices are swapped with each other if the labels of the end vertices of the edges remain the same as before. It can be used to study the effect of the network structural properties on dynamics over a network. Although the problem of assortative edge

关键词： Network Science parallel algorithms High Performance Computing Edge Switch Random Networks

来源：评论

学校读者我要写书评

暂无评论

Efficient parallel GCD algorithms for Multicore Shared Memory Architectures

Efficient Parallel GCD Algorithms for Multicore Shared Memor...

引用

International Conference on Advances in ICT for Emerging Regions

作者： Gihan Tharaka Pathirana Sittampalam Sotheeswaran Nagulan Ratnarajah Faculty of Science Eastern University Sri Lanka Faculty of Applied Science Vavuniya Campus of the University of Jaffna Sri Lanka

ISBN: (数字)9781728186559

ISBN: (纸本)9781728186566

The greatest common divisor (GCD) is used for numerous applications in number theory, modular arithmetic, encryption algorithms such as RSA, random number generation, and solving linear Diophantine equations. High-performance algorithms, which are efficiently and accurately find the GCD of two integers and n (>2) integers, are needed in the modern world for many applications in science and engineering. parallel hardware and parallel programming solve such emerging challenges that are not possible in a serial world. Modern desktop and laptop computers are equipped with multicore processors with shared memory architecture. In this paper, we develop novel efficient parallel n integers GCD algorithms for multicore shared memory architecture. The brute force, divide-and-conquer, linear recursive and finding minimum first techniques are adopted in our novel algorithms to reduce the size and the complexity of the n integers GCD problem. Various working models of OpenMP, such as the thread-centric, loop-centric and task-centric models are utilized, which promised a more natural way of exploiting and expressing regular and irregular algorithms. A comprehensive performance analysis applies to prove the efficiency of the proposed algorithms.

关键词： parallel algorithms Task analysis Multicore processing Time complexity Software algorithms Memory architecture Heuristic algorithms

来源：评论

学校读者我要写书评

暂无评论

parallel algorithms for Online Track Finding for the PaNDA Experiment at FAIR 22

Parallel Algorithms for Online Track Finding for the PaNDA E...

引用

22nd International Conference on Computing in High Energy and Nuclear Physics, CHEP 2016

作者： Bianchi, L. Herten, A. Ritman, J. Stockmanns, T. IKP Germany JSC Forschungszentrum Jülich Germany

PANDA is a future hadron and nuclear physics experiment at the FAIR facility in construction in Darmstadt, Germany. Unlike the majority of current experiments, PANDA's strategy for data acquisition is based on online event reconstruction from free-streaming data, performed in real time entirely by software algorithms using global detector information. This paper reports on the status of the development of algorithms for the reconstruction of charged particle tracks, targeted towards online data processing applications, designed for execution on data-parallel processors such as GPUs (Graphic Processing Units). Two parallel algorithms for track finding, derived from the Circle Hough algorithm, are being developed to extend the parallelism to all stages of the algorithm. The concepts of the algorithms are described, along with preliminary results and considerations about their implementations and performance. © Published under licence by IOP Publishing Ltd.

关键词： parallel algorithms

来源：评论

学校读者我要写书评

暂无评论

Efficient Sequential and parallel algorithms for Estimating Higher Order Spectra

arXiv

引用

arXiv 2018年

作者： Mamun, Abdullah-Al Wang, Zigeng Cai, Xingyu Ravishanker, Nalini Rajasekaran, Sanguthevar Dept. of Computer Science University of Connecticut StorrsCT06269 United States Dept. of Statistics University of Connecticut StorrsCT06269 United States

Polyspectral estimation is a problem of great importance in the analysis of nonlinear time series that has applications in biomedical signal processing, communications, geophysics, image, radar, sonar and speech processing, etc. Higher order spectra (HOS) have been used in unsupervised and supervised clustering in big data scenarios, in testing for Gaussianity, to suppress Gaussian noise, to characterize nonlinearities in time series data, and so on [13]. Any algorithm for computing the kth order spectra of a time series of length n needs ω(nk-1) time since the output size will be ω(nk-1) as well. Given that we live in an era of big data, n could be very large. In this case, sequential algorithms might take unacceptable amounts of time. Thus it is essential to develop parallel algorithms. There is also room for improving existing sequential algorithms. In addition, parallel algorithms in the literature are nongeneric. In this paper we offer generic sequential algorithms for computing higher order spectra that are asymptotically faster than any published algorithm for HOS. Further, we offer memory efficient algorithms. We also present optimal parallel implementations of these algorithms on parallel computing models such as the PRAM and the mesh. We provide experimental results on our sequential and parallel algorithms. Our parallel implementation achieves very good speedups. Copyright © 2018, The Authors. All rights reserved.

关键词： parallel algorithms

来源：评论

学校读者我要写书评

暂无评论

Load-Balancing-Aware parallel algorithms of H-Matrices with Adaptive Cross Approximation for GPUs

Load-Balancing-Aware Parallel Algorithms of H-Matrices with ...

引用

IEEE International Conference on Cluster Computing

作者： Tetsuya Hoshino Akihiro Ida Toshihiro Hanawa Kengo Nakajima Information Technology Center The University of Tokyo Tokyo Japan

Hierarchical matrices (H-matrices) are an approximation technique for dense matrices, such as the coefficient matrix of the boundary element method (BEM). An H-matrix is expressed by a set of low-rank approximated and small dense sub-matrices, each of which has various ranks. The use of H-matrices reduces the required memory footprint of dense matrices from O(N^2) to O(NlogN) and is suitable for many-core processors that have relatively small memory capacities compared to traditional CPUs. However, existing parallel adaptive cross approximation (ACA) algorithms, which are low-rank approximation algorithms used to construct H-matrices, are not designed to exploit many-core processors in terms of load balancing. In existing parallel algorithms, the ACA process is independently applied to each sub-matrix. The computational load of the ACA process for each sub-matrix depends on the sub-matrix's rank; however, the rank is defined after the ACA process is applied. This makes it difficult to balance the load. We propose load-balancing-aware parallel ACA algorithms for H-matrices that focus on many-core processors. We implemented the proposed algorithms into HACApK, which is an open-source H-matrix library originally developed for CPU-based clusters. The proposed algorithms were evaluated using BEM problems on an NVIDIA Tesla P100 GPU (P100) and an Intel Xeon Broadwell processor. The evaluation results demonstrate the improved performance of the proposed algorithms in all GPU cases. For example, in a case where it is difficult for existing parallel algorithms to balance the load, the proposed algorithms achieved a 12.9 times performance improvement for the P100.

关键词： Approximation algorithms Clustering algorithms Libraries parallel algorithms Load management Instruction sets

来源：评论

学校读者我要写书评

暂无评论

Coarse-Grained Multicomputer Based-parallel algorithms for the Longest Common Subsequence Problem with a String-Exclusion Constraint

Coarse-Grained Multicomputer Based-Parallel Algorithms for t...

引用

International Conference on High Performance Computing and Simulation

作者： Jean Frederic Myoupo Armel Nkonjoh Ngomade Vianney Kengne Tchendji Computer Science Lab.- MIS University of Picardie Jules Verne Department of Mathematics and Computer Science University of Dschang

ISBN: (纸本)9781538678800

In this paper, new parallel algorithms based on the Coarse-Grained Multicomputer (CGM) model for solving the Longest Common Subsequence (LCS) problem with a string exclusion constraint (STR-EC-LCS) is presented. Based on a previous sequential algorithm, we propose two CGM parallel algorithms for STR-EC-LCS problem. We perform an experimental study of our two solutions to validate our theoretical predictions, and conclude that our first algorithm that minimizes idleness of the processors is better than the second. To the best of our knowledge, these algorithms are the first CGM-based parallel algorithms for the generalized-constrained-LCS problem.

关键词： parallel algorithms Constrained Longest Common Subsequence Dynamic Programming Coarse-Grained Multicomputer

来源：评论

学校读者我要写书评

暂无评论

Simple and Fast parallel algorithms for the Voronoi Map and the Euclidean Distance Map, with GPU implementations 46

Simple and Fast Parallel Algorithms for the Voronoi Map and ...

引用

46th International Conference on parallel Processing Workshops (ICPPW)

作者： Honda, Takumi Yamamoto, Shinnosuke Honda, Hiroaki Nakano, Koji Ito, Yasuaki Hiroshima Univ Dept Informat Engn Kagamiyama 1-4-1 Higashihiroshima 7398527 Japan

ISBN: (纸本)9781538610428

The complete Voronoi map of a binary image with black and white pixels is a matrix of the same size such that each element is the closest black pixel of the corresponding pixel. The complete Voronoi map visualizes the influence region of each black pixel. However, each region may not be connected due to exclave pixels. The connected Voronoi map is a modification of the complete Voronoi map so that all regions are connected. The Euclidean distance map of a binary image is a matrix, in which each element is the distance to the closest black pixel. It has many applications of image processing such as dilation, erosion, blurring effects, skeletonization and matching. The main contribution of this paper is to present simple and fast parallel algorithms for computing the complete/connected Voronoi maps and the Euclidean distance map and implement them in the GPU. Our parallel algorithm first computes the mixed Voronoi map, which is a mixture of the complete and connected Voronoi maps, and then converts it into the complete/connected Voronoi by exposing/hiding all exclave pixels. After that, the complete Voronoi map is converted into the Euclidean distance map by computing the distance to the closest black pixel for every pixel in an obvious way. The experimental results on GeForce GTX 1080 GPU show that the computing time for these conversions is relatively small. The throughput of our GPU implementation for computing the Euclidean distance maps of 2K x 2K binary images is up to 2.08 times larger than the previously published best GPU implementation, and up to 172 times larger than CPU implementation using Intel Core i7-4790.

关键词： Euclidean distance Graphics processing units parallel algorithms Instruction sets Phase change random access memory

来源：评论

学校读者我要写书评

暂无评论

Deterministic parallel algorithms for fooling polylogarithmic juntas and the Lovasz Local Lemma 28

Deterministic parallel algorithms for fooling polylogarithmi...

引用

28th Annual ACM-SIAM Symposium on Discrete algorithms (SODA)

作者： Harris, David G. Univ Maryland Dept Comp Sci College Pk MD 20742 USA

ISBN: (纸本)9781611974782

Many randomized algorithms can be derandomized efficiently using either the method of conditional expectations or probability spaces with low (almost-) independence. A series of papers, beginning with work by Luby (1988) and continuing with Berger & Rompel (1991) and Chari et al. (1994), showed that these techniques can be combined to give deterministic parallel algorithms for combinatorial optimization problems involving sums of w-juntas. We improve these algorithms through derandomized variable partitioning. This reduces the processor complexity to essentially independent of w while the running time is reduced from exponential in w to linear in w. For example, we improve the time complexity of an algorithm of Berger & Rompel (1991) for rainbow hypergraph coloring by a factor of approximately log(2) n and the processor complexity by a factor of approximately m(ln2). As a major application of this, we give an NC algorithm for the Lovasz Local Lemma Previous NC algorithms, including the seminal algorithm of Moser & Tardos (2010) and the work of Chandrasekaran et. al (2013), required that (essentially) the bad-events could span only O(log n) variables;we relax this to allowing polylog(n) variables. As two applications of our new algorithm, we give algorithms for defective vertex coloring and domatic graph partition. One main sub-problem encountered in these algorithms is to generate a probability space which can "fool" a given list of GF(2) Fourier characters. Schulman (1992) gave an NC algorithm for this;we dramatically improve its efficiency to near-optimal time and processor complexity and code dimension. This leads to a new algorithm to solve the heavy-codeword problem, introduced by Naor & Naor (1993), with a near-linear processor complexity (mn)(l+o(1)).

关键词： parallel algorithms

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：