Computing distance fields is fundamental to many scientific and engineering applications. Distance fields can be used to direct analysis and reduce data. In this paper, we present a highly scalable method for computing 3D distance fields on massively parallel distributed-memory machines. A new distributed spatial data structure, named parallel distance tree, is introduced to manage the level sets of data and facilitate surface tracking over time, resulting in significantly reduced computation and communication costs for calculating the distance to the surface of interest from any spatial location. Our method supports several data types and distance metrics from real-world applications. We demonstrate its efficiency and scalability on state-of-the-art supercomputers using both large-scale volume datasets and surface models. We also demonstrate in-situ distance field computation on dynamic turbulent flame surfaces for a petascale combustion simulation. Our work greatly extends the usability of distance fields for demanding applications.
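To make concrete what such a distance field is, the sketch below computes, for every point of a regular grid, the distance to the nearest sample of a surface. It is a serial, brute-force illustration using a k-d tree; it is not the paper's parallel distance tree, and the grid resolution and the sphere example are assumptions made only for this sketch.

```python
# Minimal sketch: a 3D distance field from a point-sampled surface (serial illustration).
import numpy as np
from scipy.spatial import cKDTree

def distance_field(surface_points, grid_res=64, bounds=(0.0, 1.0)):
    """Return a grid_res^3 array of Euclidean distances to the surface samples."""
    lo, hi = bounds
    axis = np.linspace(lo, hi, grid_res)
    # All grid points as an (N, 3) array.
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)
    tree = cKDTree(surface_points)       # spatial index over the surface samples
    dist, _ = tree.query(grid, k=1)      # nearest-surface distance for every grid point
    return dist.reshape(grid_res, grid_res, grid_res)

# Example: distance field around a sphere of radius 0.3 centred at (0.5, 0.5, 0.5).
theta, phi = np.random.rand(2, 10000) * np.array([[np.pi], [2 * np.pi]])
sphere = 0.5 + 0.3 * np.stack([np.sin(theta) * np.cos(phi),
                               np.sin(theta) * np.sin(phi),
                               np.cos(theta)], axis=-1)
field = distance_field(sphere)
```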
For a given algorithm, the energy consumed in executing the algorithm has a nonlinear relationship with performance. In the case of parallel algorithms, energy use and performance are functions of the structure of the algorithm. We define the asymptotic energy complexity of algorithms, which models the minimum energy required to execute a parallel algorithm within a given execution time as a function of input size. Our methodology provides a way of comparing the orders of (minimal) energy required by different algorithms and can be used to define energy complexity classes of parallel algorithms.
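One way to phrase the kind of definition the abstract describes is as a constrained minimization over schedules; the notation below is my own illustration, not necessarily the authors' formulation.

```latex
% Illustrative notation only; the paper's exact definition may differ.
% S_A(n): valid parallel schedules of algorithm A on an input of size n
% T(s):   execution time of schedule s;  t: the given time budget
% P_k(s), \tau_k(s): power drawn and duration of the k-th phase of schedule s
% The asymptotic energy complexity is the order of growth of E_A(n, t) in n for fixed t.
E_A(n, t) \;=\; \min_{\substack{s \in S_A(n) \\ T(s) \le t}} \;\sum_{k} P_k(s)\,\tau_k(s)
```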
ISBN (Print): 9781509042982
We live in an era of big data, and the analysis of these data is becoming a bottleneck in many domains, including biology and the internet. To make these analyses feasible in practice, we need efficient data reduction algorithms. The Singular Value Decomposition (SVD) is a data reduction technique that has been used in many different applications; for example, SVDs have been extensively used in text analysis. The best known sequential algorithms for computing the SVD take cubic time, which may not be acceptable in practice. As a result, many parallel algorithms have been proposed in the literature. There are two main families of algorithms for the SVD, namely QR decomposition and Jacobi iterations. Researchers have found that even though QR is sequentially faster than Jacobi iterations, QR is difficult to parallelize. As a result, most of the parallel algorithms in the literature are based on Jacobi iterations. For example, the Jacobi Relaxation Scheme (JRS) variant of the classical Jacobi algorithm has been shown to be very effective in parallel. In this paper we propose a novel variant of the classical Jacobi algorithm that is more efficient than the JRS algorithm, and our experimental results confirm this assertion. The key idea behind our algorithm is to select the pivot elements for each sweep appropriately. We also show how to efficiently implement our algorithm on such parallel models as the PRAM and the mesh.
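For orientation, the sketch below shows the classical one-sided (Hestenes) Jacobi iteration that underlies such SVD algorithms: column pairs are orthogonalized by plane rotations until all pairs are numerically orthogonal. It is a plain serial cyclic sweep, not the JRS scheme or the pivot-selection strategy proposed in the paper.

```python
# One-sided (Hestenes) Jacobi SVD sketch: serial, cyclic pivot order.
import numpy as np

def jacobi_svd(A, tol=1e-12, max_sweeps=30):
    A = A.astype(float).copy()          # columns are rotated in place
    m, n = A.shape
    V = np.eye(n)
    for _ in range(max_sweeps):
        off = 0.0
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = A[:, p] @ A[:, p]
                beta = A[:, q] @ A[:, q]
                gamma = A[:, p] @ A[:, q]
                off = max(off, abs(gamma) / np.sqrt(alpha * beta))
                if abs(gamma) < tol * np.sqrt(alpha * beta):
                    continue                      # columns already orthogonal
                zeta = (beta - alpha) / (2.0 * gamma)
                sign = 1.0 if zeta >= 0 else -1.0
                t = sign / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                # Rotate the column pair (p, q) of A, and accumulate the same rotation in V.
                R = np.array([[c, s], [-s, c]])
                A[:, [p, q]] = A[:, [p, q]] @ R
                V[:, [p, q]] = V[:, [p, q]] @ R
        if off < tol:
            break
    sigma = np.linalg.norm(A, axis=0)             # singular values
    U = A / sigma                                  # left singular vectors
    return U, sigma, V

A0 = np.random.rand(50, 8)
U, s, V = jacobi_svd(A0)
print(np.allclose(U * s @ V.T, A0, atol=1e-8))     # reconstruct A = U diag(s) V^T
```

Because every rotation touches only two columns, disjoint column pairs within a sweep are independent; this is exactly what parallel Jacobi variants exploit by choosing which pairs (pivots) to process concurrently.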
We discuss the design and implementation of new highly-scalable distributed-memory parallel algorithms for two prototypical graph problems, edge-weighted matching and distance-1 vertex coloring. Graph algorithms in general have low concurrency, poor data locality, and high ratio of data access to computation costs, making it challenging to achieve scalability on massively parallel machines. We overcome this challenge by employing a variety of techniques, including speculation and iteration, optimized communication, and randomization. We present preliminary results on weak and strong scalability studies conducted on an IBM Blue Gene/P machine employing up to tens of thousands of processors. The results show that the algorithms hold strong potential for computing at petascale.
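To make the "speculation and iteration" idea concrete, here is a hedged serial simulation of the scheme commonly used for parallel distance-1 coloring: all pending vertices are colored greedily against a possibly stale view of their neighbors' colors, conflicting edge endpoints are detected, and only the conflicting vertices are recolored in the next round. It sketches the general technique, not the authors' distributed-memory implementation.

```python
# Speculative, iterative distance-1 coloring (serial simulation of the parallel scheme).
def speculative_coloring(adj):
    """adj: dict vertex -> set of neighbors. Returns dict vertex -> color."""
    color = {}
    to_color = set(adj)
    while to_color:
        snapshot = dict(color)                     # stale view, as a parallel round would see
        # Speculation: color every pending vertex with the smallest color
        # not used by neighbors in the snapshot.
        for v in to_color:
            forbidden = {snapshot[u] for u in adj[v] if u in snapshot}
            c = 0
            while c in forbidden:
                c += 1
            color[v] = c
        # Conflict detection: adjacent vertices that ended up with the same color.
        conflicts = set()
        for v in to_color:
            for u in adj[v]:
                if u != v and color.get(u) == color[v]:
                    conflicts.add(max(u, v))       # deterministically keep one endpoint
        to_color = conflicts                        # iterate until no conflicts remain
    return color

# Tiny example: a 4-cycle.
graph = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(speculative_coloring(graph))
```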
The article presents an algorithmic model of sound propagation in rooms to run on parallel and distributed computer systems. This algorithm is used by the authors in an implementation of an adaptable high-performance computer system simulating various fields and providing scalability on an arbitrary number of parallel central and graphics processors as well as distributed computer clusters. Many general-purpose computer simulation systems have limited usability when it comes to high-precision simulation associated with large numbers of elementary computations, due to their lack of scalability on various parallel and distributed platforms. The higher the required adequacy of the model, the larger the number of steps of the simulation algorithms. Scalability permits the use of hybrid parallel computer systems and improves the efficiency of the simulation with respect to adequacy, time consumption, and total cost of simulation. The report covers such an algorithm, which is based on an approximate superposition of acoustical fields and provides adequate results as long as the underlying equations of acoustics are linear. The algorithm represents reflecting surfaces as sets of vibrating pistons and uses the Rayleigh integral to calculate their scattering properties. The article also provides a parallel form of the algorithm and an analysis of its properties in parallel and sequential forms.
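For reference, the Rayleigh integral mentioned above gives the pressure radiated by a baffled vibrating surface; the standard first form is shown below (sign conventions vary with the assumed time dependence). Discretizing the surface into pistons turns the integral into a sum with one term per piston, which is what makes the evaluation embarrassingly parallel.

```latex
% First Rayleigh integral: pressure at field point r radiated by a planar
% surface S vibrating with normal velocity v_n in an infinite rigid baffle.
p(\mathbf{r}, \omega) \;=\; \frac{j \omega \rho_0}{2\pi}
  \int_{S} v_n(\mathbf{r}_s)\,
  \frac{e^{-j k \lVert \mathbf{r}-\mathbf{r}_s \rVert}}
       {\lVert \mathbf{r}-\mathbf{r}_s \rVert}\, \mathrm{d}S,
\qquad k = \frac{\omega}{c}.
```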
Community detection has become an important operation in numerous graph-based applications. It is used to reveal groups that exist within real-world networks without imposing prior size or cardinality constraints on the set of communities. Despite its potential, support for parallel computers is rather limited, largely because the algorithm is irregular and the underlying heuristics imply a sequential nature. In this paper I present parallelization heuristics for fast community detection using the Louvain method as it is applied on GPUs. The Louvain method is a multi-phase, iterative heuristic for modularity optimization. Originally developed by Blondel et al. (2008), the method has become increasingly popular owing to its ability to detect high-modularity community partitions in a fast and memory-efficient manner. The parallel heuristics used were first introduced by Hao Lu et al. (2015). As the Louvain method is inherently sequential, it limits the possibility of scalable usage; the proposed parallel heuristics let me observe how this method behaves on GPUs. For evaluation I implemented the heuristics using CUDA on a GeForce GTX 980M GPU, and for testing I used organization landscapes from the CERN-developed Collaboration Spotting project, which uses patents and publications to visualize the connections in technologies among its collaborators. Compared to the parallel Louvain implementation running on 8 threads on the same machine that hosts the GPU, the CUDA implementation is able to produce community outputs comparable to the CPU-generated results while providing absolute speedups of up to 12 using the GeForce GTX 980M mobile GPU.
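The per-vertex quantity that such GPU heuristics evaluate concurrently is the modularity gain of moving a vertex into a neighboring community. Below is a hedged sketch of that inner step using the standard gain expression from Blondel et al. (2008); the function names and data layout are invented for the example and are not the CUDA kernels described above.

```python
# Modularity gain of moving an isolated vertex i into community C (Blondel et al., 2008).
# sigma_in:  sum of weights of edges inside C
# sigma_tot: sum of weights of edges incident to vertices of C
# k_i:       weighted degree of vertex i
# k_i_in:    sum of weights of edges from i to vertices of C
# m:         total edge weight of the graph
def modularity_gain(sigma_in, sigma_tot, k_i, k_i_in, m):
    after = (sigma_in + 2.0 * k_i_in) / (2.0 * m) - ((sigma_tot + k_i) / (2.0 * m)) ** 2
    before = sigma_in / (2.0 * m) - (sigma_tot / (2.0 * m)) ** 2 - (k_i / (2.0 * m)) ** 2
    return after - before

def best_move(i, neighbors, weights, community, sigma_in, sigma_tot, k, m):
    """Pick the neighboring community with the largest positive gain for vertex i.
    In the GPU heuristics, one such evaluation runs per vertex in parallel."""
    k_i_in = {}
    for j, w in zip(neighbors[i], weights[i]):
        k_i_in[community[j]] = k_i_in.get(community[j], 0.0) + w
    best, best_gain = community[i], 0.0
    for c, kin in k_i_in.items():
        gain = modularity_gain(sigma_in[c], sigma_tot[c], k[i], kin, m)
        if gain > best_gain:
            best, best_gain = c, gain
    return best
```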
One of the most important constraints of today's architectures for data-intensive applications is the limited bandwidth due to the memory-processor communication bottleneck. This significantly impacts performance and energy; for instance, the energy consumption share of communication and memory access may exceed 80%. Recently, the concept of Computation-in-Memory (CIM) was proposed, which is based on the integration of storage and computation in the same physical location using a crossbar topology and non-volatile resistive-switching memristor technology. To illustrate the tremendous potential of the CIM architecture in exploiting massively parallel computation while reducing the communication overhead, we present a communication-efficient mapping of a large-scale matrix multiplication algorithm onto the CIM architecture. The experimental results show that, depending on the matrix size, the CIM architecture achieves several orders of magnitude higher performance in terms of total execution time, and two orders of magnitude lower total energy consumption, than a multicore system based on a shared-memory architecture.
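The reason a memristive crossbar suits matrix multiplication is that Ohm's and Kirchhoff's laws evaluate a full matrix-vector product in one step: the matrix entries are programmed as conductances, the input vector is applied as row voltages, and the column currents are the dot products. The following is an idealized numerical sketch of that mapping, ignoring device non-idealities and the paper's actual mapping scheme.

```python
# Idealized memristor-crossbar matrix multiply: every column wire computes a dot product.
import numpy as np

def crossbar_matvec(G, v):
    """G[i, j]: conductance of the device at row wire i / column wire j.
    v[i]: voltage applied to row wire i.
    Returns the currents collected on the column wires (Kirchhoff's current law)."""
    return G.T @ v            # I_j = sum_i G[i, j] * v[i]

def crossbar_matmul(A, B):
    """C = A @ B, column by column: each column of B is applied as a voltage vector
    to a crossbar programmed with A^T; all dot products of one matvec happen
    'in memory' simultaneously."""
    G = A.T                                          # program A^T as conductances
    return np.stack([crossbar_matvec(G, b) for b in B.T], axis=1)

A, B = np.random.rand(4, 5), np.random.rand(5, 3)
assert np.allclose(crossbar_matmul(A, B), A @ B)
```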
ISBN (Print): 9781509028153
Hashing algorithms are used widely in the information security area. Having studied the characteristics of traditional cryptographic hash functions and considered the features of multi-core cryptographic processors, this paper proposes a parallel algorithm for hash computation well suited to a multi-core cryptographic processor. The algorithm breaks the chain dependencies of the standard hash function by implementing a recursive hash, yielding a faster hash implementation. We discuss the theoretical foundation for our mapping framework, including security and performance measures. The experiments are performed on a PC with a PCIe card containing a multi-core cryptographic processor as the cipher processing engine. The results show a performance gain by an approximate factor of 7.8 when running on the 8-core cryptographic processor.
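As an illustration of how a recursive hash removes the chaining dependency, the sketch below splits the message into chunks, hashes the chunks independently (the part that can run on separate cores), and then hashes the concatenated digests. It uses SHA-256 from Python's hashlib for concreteness; the chunk size, the two-level structure, and the domain-separation prefixes are assumptions of this example, not the construction or the cryptographic processor from the paper.

```python
# Two-level recursive (tree) hash: leaf hashes are independent, so they can be
# computed in parallel; only the short root hash is sequential.
import hashlib
from concurrent.futures import ProcessPoolExecutor

CHUNK = 1 << 20   # 1 MiB leaves (an arbitrary choice for this sketch)

def _leaf(chunk: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + chunk).digest()      # 0x00 = leaf prefix

def recursive_hash(message: bytes) -> bytes:
    chunks = [message[i:i + CHUNK] for i in range(0, len(message), CHUNK)] or [b""]
    with ProcessPoolExecutor() as pool:                   # leaves hashed concurrently
        digests = list(pool.map(_leaf, chunks))
    root = hashlib.sha256(b"\x01" + b"".join(digests))    # 0x01 = root prefix
    return root.digest()

if __name__ == "__main__":
    print(recursive_hash(b"x" * (10 * CHUNK)).hex())
```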
Computing problems that handle large amounts of data necessitate the use of lossless data compression for efficient storage and transmission. We present a novel lossless universal data compression algorithm that uses parallel computational units to increase the throughput. The length-N input sequence is partitioned into B blocks. Processing each block independently of the other blocks can accelerate the computation by a factor of B but degrades the compression quality. Instead, our approach is to first estimate the minimum description length (MDL) context tree source underlying the entire input, and then encode each of the B blocks in parallel based on the MDL source. With this two-pass approach, the compression loss incurred by using more parallel units is insignificant. Our algorithm is work-efficient, i.e., its computational complexity is O(N/B). Its redundancy is approximately B log(N/B) bits above Rissanen's lower bound on universal compression performance, with respect to any context tree source whose maximal depth is at most log(N/B). We improve the compression by using different quantizers for states of the context tree based on the number of symbols corresponding to those states. Numerical results from a prototype implementation suggest that our algorithm offers a better trade-off between compression and throughput than competing universal data compression algorithms.
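A much-simplified sketch of the two-pass structure described above: the first pass builds one shared statistical model from the whole input, and the second pass processes the B blocks independently against that frozen model. To stay short, the sketch uses a fixed-order context model and reports ideal code lengths (-log2 of the model probabilities) per block instead of performing actual arithmetic coding, and it does not implement the MDL context-tree estimation of the paper.

```python
# Two-pass parallel compression skeleton: shared model, independent blocks.
import math
from collections import Counter

ORDER = 2  # fixed context length; the paper instead estimates an MDL context tree

def build_model(data: bytes):
    """Pass 1 (over the whole input): counts of each symbol given its ORDER-byte context."""
    counts, ctx_totals = Counter(), Counter()
    for i in range(ORDER, len(data)):
        ctx = data[i - ORDER:i]
        counts[(ctx, data[i])] += 1
        ctx_totals[ctx] += 1
    return counts, ctx_totals

def block_code_length(block: bytes, prefix: bytes, model) -> float:
    """Pass 2 (independent per block): ideal code length in bits under the shared model.
    `prefix` supplies the context for the block's first symbols."""
    counts, ctx_totals = model
    data = prefix[-ORDER:] + block
    bits = 0.0
    for i in range(ORDER, len(data)):
        ctx = data[i - ORDER:i]
        p = (counts[(ctx, data[i])] + 1) / (ctx_totals[ctx] + 256)  # Laplace smoothing
        bits += -math.log2(p)
    return bits

text = b"abracadabra " * 4000
B = 4
size = len(text) // B
blocks = [text[i * size:(i + 1) * size] for i in range(B)]
model = build_model(text)
# Each call below depends only on the shared model and its own block,
# so the B blocks could be encoded by B workers in parallel.
lengths = [block_code_length(blk, text[max(0, i * size - ORDER):i * size], model)
           for i, blk in enumerate(blocks)]
print(sum(lengths) / 8, "bytes (ideal, excluding the model description)")
```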
ISBN (Print): 9781479953424
Many high-dimensional data mining applications involve nearest neighbor search (NNS) on a KD-tree. A randomized KD-tree forest enables fast medium- and large-scale NNS among high-dimensional data points. In this paper, we present massively parallel algorithms for the construction of a KD-tree forest and for NNS on a cluster equipped with massively parallel architecture (MPA) devices, namely graphics processing units (GPUs). This design accelerates KD-tree forest construction and NNS significantly for the signature of histograms of orientations (SHOT) 3D local descriptors, by factors of up to 5.27 and 20.44, respectively. Our implementations will potentially benefit real-time high-dimensional descriptor matching.
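The following CPU sketch illustrates the randomized KD-tree forest idea: several trees are built over randomly rotated copies of the data, each is queried with a small candidate budget, and the union of candidates is re-ranked exactly. It is a simplified stand-in (random rotations instead of FLANN-style randomized split dimensions, SciPy's cKDTree instead of a GPU kernel) meant only to show the structure of the search; the 352-dimensional random data merely stands in for SHOT descriptors.

```python
# Randomized KD-tree "forest" for approximate nearest-neighbour search (CPU sketch).
import numpy as np
from scipy.spatial import cKDTree

class KDForest:
    def __init__(self, points, n_trees=4, seed=0):
        rng = np.random.default_rng(seed)
        d = points.shape[1]
        # One random rotation per tree decorrelates the axis-aligned splits.
        self.rotations = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(n_trees)]
        self.trees = [cKDTree(points @ R) for R in self.rotations]
        self.points = points

    def query(self, q, k=1, leaf_budget=8):
        """Approximate k-NN: each tree proposes a few candidates, their union is re-ranked."""
        candidates = set()
        for R, tree in zip(self.rotations, self.trees):
            _, idx = tree.query(q @ R, k=leaf_budget)
            candidates.update(np.atleast_1d(idx).tolist())
        cand = np.fromiter(candidates, dtype=int)
        dists = np.linalg.norm(self.points[cand] - q, axis=1)
        order = np.argsort(dists)[:k]
        return cand[order], dists[order]

data = np.random.rand(20000, 352).astype(np.float32)
forest = KDForest(data, n_trees=4)
idx, dist = forest.query(data[123])
print(idx, dist)
```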