The ChainMail algorithm is a physically based deformation algorithm that has been successfully used in virtual surgery simulators, where computation time is a critical factor. In this paper, we present a parallel algorithm based on ChainMail, together with an efficient implementation that reduces the time required to compute deformations over large 3D medical datasets by means of modern GPU capabilities. We also present a 3D blocking scheme that reduces the number of unnecessary processing threads. For this purpose, the paper describes a new parallel Boolean reduction scheme used to efficiently decide which blocks are computed. Finally, through an extensive analysis, we show the performance improvement achieved by our implementation of the proposed algorithm and by the proposed blocking scheme, owing to the high spatial and temporal locality of our approach.
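As an illustration of the block-level Boolean reduction idea described in this abstract, the following is a minimal sketch, not the authors' CUDA implementation: the deformation volume is split into fixed-size 3D blocks, a per-voxel flag marks voxels that still violate the ChainMail constraints, and an OR-reduction over each block selects the blocks that need processing threads in the next pass. The names violation, block, and active_blocks are illustrative.

import numpy as np

def active_blocks(violation, block=(8, 8, 8)):
    """Reduce a per-voxel violation mask to a per-block 'needs work' mask.

    violation: boolean array (nx, ny, nz), True where a voxel still
               violates the ChainMail min/max distance constraints.
    block:     block edge lengths; the grid is assumed to be an exact
               multiple of the block size for simplicity.
    """
    nx, ny, nz = violation.shape
    bx, by, bz = block
    # Reshape so each block becomes its own axis group, then OR-reduce it.
    v = violation.reshape(nx // bx, bx, ny // by, by, nz // bz, bz)
    return v.any(axis=(1, 3, 5))          # shape: (nx/bx, ny/by, nz/bz)

# Only blocks whose flag is True would be assigned processing threads
# in the next relaxation pass, which is what avoids idle threads.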
Next Generation Sequencing (NGS) assemblers are challenged with the problem of handling a massive number of reads. The bi-directed de Bruijn graph is the most fundamental data structure on which numerous NGS assemblers have been built (e.g., Velvet, ABySS). Most of these assemblers differ only in the heuristics they employ to operate on this de Bruijn graph. These heuristics are composed of several fundamental operations such as construction, compaction, and pruning of the underlying bi-directed de Bruijn graph. Unfortunately, the current algorithms for these fundamental operations on the de Bruijn graph are computationally inefficient and have become a bottleneck to scaling NGS assemblers. In this talk, recent results that provide computationally efficient algorithms for these fundamental bi-directed de Bruijn graph operations are discussed. The algorithms are based on sorting and are efficient in sequential, out-of-core, and parallel settings.
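To make the sorting-based construction concrete, here is a minimal sketch for a plain (uni-directed) de Bruijn graph; the bi-directed, out-of-core, and parallel variants discussed in the talk additionally handle reverse complements and external-memory sorting, which this sketch omits. The function name de_bruijn_edges and the in-memory list sort are illustrative.

def de_bruijn_edges(reads, k):
    """Sorting-based construction of de Bruijn graph edges.

    Each length-(k+1) substring of a read contributes one edge between
    its k-prefix and k-suffix.  Sorting the (k+1)-mers groups identical
    edges so duplicates can be merged in one linear scan; the same
    pattern carries over to external-memory and parallel sorts.
    """
    kmers = []
    for r in reads:
        for i in range(len(r) - k):
            kmers.append(r[i:i + k + 1])
    kmers.sort()                      # external or parallel sort in practice
    edges = []                        # (u, v, multiplicity)
    i = 0
    while i < len(kmers):
        j = i
        while j < len(kmers) and kmers[j] == kmers[i]:
            j += 1
        edges.append((kmers[i][:-1], kmers[i][1:], j - i))
        i = j
    return edges

# Example: de_bruijn_edges(["ACGTACG"], 3) yields the edges
# (ACG->CGT), (CGT->GTA), (GTA->TAC), (TAC->ACG), each with multiplicity 1.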
We discuss the design and implementation of new, highly scalable distributed-memory parallel algorithms for two prototypical graph problems, edge-weighted matching and distance-1 vertex coloring. Graph algorithms in general have low concurrency, poor data locality, and a high ratio of data-access to computation costs, making it challenging to achieve scalability on massively parallel machines. We overcome this challenge by employing a variety of techniques, including speculation and iteration, optimized communication, and randomization. We present preliminary results of weak and strong scalability studies conducted on an IBM Blue Gene/P machine employing up to tens of thousands of processors. The results show that the algorithms hold strong potential for computing at petascale.
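The "speculation and iteration" technique mentioned here can be illustrated with a minimal shared-memory sketch of distance-1 coloring; the paper's distributed-memory version with optimized communication and randomization is not reproduced. Each round colors all pending vertices optimistically against a snapshot of the previous colors, then recolors only one endpoint of every conflicting edge. Vertex ids are assumed comparable and the adjacency symmetric.

def speculative_coloring(adj):
    """Iterative speculative distance-1 coloring.

    adj: dict mapping each vertex id to the set of its neighbours.
    Returns a dict vertex -> colour (non-negative int).
    """
    color = {v: -1 for v in adj}
    pending = set(adj)
    while pending:
        # Phase 1 (speculation): every pending vertex picks, "simultaneously",
        # the smallest colour unused by its neighbours' previous colours.
        snapshot = dict(color)
        for v in pending:
            used = {snapshot[u] for u in adj[v]}
            c = 0
            while c in used:
                c += 1
            color[v] = c
        # Phase 2 (conflict resolution): for each improperly coloured edge,
        # the endpoint with the smaller id is recoloured in the next round.
        conflicts = set()
        for v in pending:
            for u in adj[v]:
                if color[u] == color[v] and v < u:
                    conflicts.add(v)
        pending = conflicts
    return color

# Example: speculative_coloring({0: {1, 2}, 1: {0, 2}, 2: {0, 1}})
# converges to a proper 3-colouring of the triangle after a few rounds.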
The article presents an algorithmic model of sound propagation in rooms designed to run on parallel and distributed computer systems. This algorithm is used by the authors in the implementation of an adaptable high-performance computer system that simulates various fields and scales over an arbitrary number of parallel central and graphics processors as well as distributed computer clusters. Many general-purpose computer simulation systems have limited usability for high-precision simulation involving large numbers of elementary computations because they lack scalability on various parallel and distributed platforms. The higher the required adequacy of the model, the larger the number of steps in the simulation algorithms. Scalability permits the use of hybrid parallel computer systems and improves the efficiency of the simulation with respect to adequacy, time consumption, and total simulation cost. The report covers such an algorithm, which is based on an approximate superposition of acoustic fields and provides adequate results as long as the acoustic equations used are linear. The algorithm represents reflecting surfaces as sets of vibrating pistons and uses the Rayleigh integral to calculate their scattering properties. The article also provides a parallel form of the algorithm and an analysis of its properties in parallel and sequential forms.
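A minimal sketch of the piston-superposition step, assuming a monochromatic field and the standard Rayleigh first integral: each surface patch acts as a vibrating piston with a complex normal velocity, and the pressure at a receiver is the sum of the patch contributions. The discretization, reflection handling, and parallel decomposition of the article are not reproduced; rho and c default to rough values for air.

import numpy as np

def rayleigh_pressure(receiver, piston_centers, piston_areas, piston_velocities,
                      k, rho=1.21, c=343.0):
    """Pressure at 'receiver' as a superposition of vibrating pistons
    (discretized Rayleigh first integral, single frequency).

    receiver:          (3,) point in space
    piston_centers:    (N, 3) patch centres on the reflecting surfaces
    piston_areas:      (N,)   patch areas
    piston_velocities: (N,)   complex normal velocities of the patches
    k:                 wavenumber omega / c
    """
    omega = k * c
    r = np.linalg.norm(piston_centers - receiver, axis=1)        # distances
    contrib = piston_velocities * piston_areas * np.exp(-1j * k * r) / r
    return 1j * omega * rho / (2.0 * np.pi) * np.sum(contrib)

# Because the acoustic equations are linear, the contributions of different
# pistons can be summed independently, which is what makes the field easy
# to split across parallel and distributed processors.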
ISBN (print): 9781509028153
Hashing algorithms are widely used in the information security area. Having studied the characteristics of traditional cryptographic hash functions and considered the features of a multi-core cryptographic processor, this paper proposes a parallel algorithm for hash computation well suited to multi-core cryptographic processors. The algorithm breaks the chain dependencies of the standard hash function by implementing a recursive hash, yielding a faster hash implementation. We discuss the theoretical foundation for our mapping framework, including security and performance measures. The experiments are performed on a PC with a PCIe card containing a multi-core cryptographic processor as the cipher processing engine. The results show a performance gain of approximately 7.8x when running on the 8-core cryptographic processor.
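A minimal sketch of the chain-breaking idea, under the assumption that the recursive hash works like a two-level tree hash: the message is split into chunks, the chunks are hashed independently (one per core), and the concatenated digests are hashed once more. SHA-256 from hashlib stands in for the cryptographic processor's primitive; the paper's actual construction, padding, and security analysis are not reproduced, and the result differs from a plain SHA-256 of the message.

import hashlib
from concurrent.futures import ProcessPoolExecutor

def _leaf_hash(chunk: bytes) -> bytes:
    return hashlib.sha256(chunk).digest()

def recursive_hash(message: bytes, cores: int = 8, chunk_size: int = 1 << 20) -> bytes:
    """Two-level tree hash: independent leaf hashes, then one root hash.

    Splitting the message removes the chaining dependency of a standard
    Merkle-Damgard hash, so the leaves can be computed on separate cores.
    """
    chunks = [message[i:i + chunk_size] for i in range(0, len(message), chunk_size)]
    if not chunks:
        chunks = [b""]
    with ProcessPoolExecutor(max_workers=cores) as pool:
        leaves = list(pool.map(_leaf_hash, chunks))
    return hashlib.sha256(b"".join(leaves)).digest()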
We present a parallel time-domain simulator to solve the acoustic wave equation for large acoustic spaces on a distributed memory architecture. Our formulation is based on the adaptive rectangular decomposition (ARD) algorithm, which performs acoustic wave propagation in three dimensions for homogeneous media. We propose an efficient parallelization of the different stages of the ARD pipeline; using a novel load balancing scheme and overlapping communication with computation, we achieve scalable performance on distributed memory architectures. Our solver can handle the full frequency range of human hearing (20 Hz-20 kHz) and scenes with volumes of thousands of cubic meters. We highlight the performance of our parallel simulator on a CPU cluster with up to a thousand cores and terabytes of memory. To the best of our knowledge, this is the fastest time-domain simulator for acoustic wave propagation in large, complex 3D scenes such as outdoor or architectural environments. (C) 2015 Published by Elsevier Ltd.
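As a rough illustration of one possible load-balancing step of the kind mentioned above (not the paper's actual scheme), assume the cost of each rectangular ARD partition is proportional to its number of grid cells and assign partitions to ranks greedily, largest first, always to the currently least-loaded rank. The overlap of communication with computation is not shown.

import heapq

def assign_partitions(partition_cells, num_ranks):
    """Greedy longest-processing-time assignment of ARD partitions to ranks.

    partition_cells: number of grid cells of each rectangular partition,
                     used as a proxy for its per-timestep update cost.
    Returns a list mapping partition index -> rank.
    """
    loads = [(0, r) for r in range(num_ranks)]   # min-heap of (load, rank)
    heapq.heapify(loads)
    owner = [0] * len(partition_cells)
    # Place expensive partitions first so small ones can fill the gaps.
    for idx in sorted(range(len(partition_cells)),
                      key=lambda i: partition_cells[i], reverse=True):
        load, rank = heapq.heappop(loads)
        owner[idx] = rank
        heapq.heappush(loads, (load + partition_cells[idx], rank))
    return owner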
ISBN (print): 9781509021413
In 2011 we published a practical algorithm for short division (division of a multiple-precision dividend by a single-precision divisor) on a parallel processor (HiPC 2011) with a run time of O(n/p + log p). Our algorithm, based on parallel computation of remainder sequences, improves on Takahashi's earlier work (LSSC 2007), which has a run time of O((n/p) log p). Here we prove that Omega(n/p + log p) is a tight lower bound for short division (using a conventional fixed-radix number system) on EREW and CREW PRAMs when the divisor d is not simply a power of two. The proof is based on an application of Cook, Dwork, and Reischuk's work on Boolean function complexity. The result is especially significant because it establishes a novel tight lower bound for two fundamental arithmetic operations, short division and division by a fixed constant, on an important class of parallel machines.
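To show why remainder sequences parallelize, here is a sketch (sequential Python, just exposing the data dependencies, not the PRAM algorithm or the lower-bound argument): with dividend digits in radix B, most significant first, the prefix remainders r[i] = (digits 0..i as a number) mod d can be computed with a parallel prefix/scan over precomputed powers of B mod d, and each quotient digit then depends only on the preceding prefix remainder.

def short_division(digits, d, B=2**32):
    """Short division via prefix remainders.

    digits: dividend digits in radix B, most significant first.
    Returns (quotient_digits, remainder).
    """
    n = len(digits)
    r = [0] * n
    acc = 0
    for i in range(n):
        # In the parallel setting this loop becomes a prefix scan,
        # which is where the O(n/p + log p) run time comes from.
        acc = (acc * B + digits[i]) % d
        r[i] = acc
    # Once r[] is known, every quotient digit can be produced independently.
    q = [0] * n
    for i in range(n):
        prev = r[i - 1] if i > 0 else 0
        q[i] = (prev * B + digits[i]) // d
    return q, r[n - 1]

# Example: short_division([7, 3], 5, B=10) -> ([1, 4], 3), since 73 = 5*14 + 3.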
Computing problems that handle large amounts of data necessitate the use of lossless data compression for efficient storage and transmission. We present a novel lossless universal data compression algorithm that uses parallel computational units to increase the throughput. The length-N input sequence is partitioned into B blocks. Processing each block independently of the other blocks can accelerate the computation by a factor of B but degrades the compression quality. Instead, our approach is to first estimate the minimum description length (MDL) context tree source underlying the entire input, and then encode each of the B blocks in parallel based on the MDL source. With this two-pass approach, the compression loss incurred by using more parallel units is insignificant. Our algorithm is work-efficient, i.e., its computational complexity is O(N/B). Its redundancy is approximately B log(N/B) bits above Rissanen's lower bound on universal compression performance, with respect to any context tree source whose maximal depth is at most log(N/B). We improve the compression by using different quantizers for states of the context tree based on the number of symbols corresponding to those states. Numerical results from a prototype implementation suggest that our algorithm offers a better trade-off between compression and throughput than competing universal data compression algorithms.
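A minimal sketch of the two-pass control flow only, with heavy simplification: the MDL context-tree estimation and arithmetic coding are replaced here by a toy order-0 model and ideal code lengths, purely to show why fixing one shared model in pass 1 makes the B blocks independent in pass 2. All names are illustrative.

from collections import Counter
from concurrent.futures import ProcessPoolExecutor
import math

def _encode_block(args):
    block, probs = args
    # Toy "encoder": ideal code length (in bits) under the shared model,
    # standing in for arithmetic coding driven by the MDL context-tree source.
    return sum(-math.log2(probs[s]) for s in block)

def two_pass_parallel_compress(data: bytes, num_blocks: int):
    """Pass 1: fit one model on the whole input.
       Pass 2: encode every block independently under that fixed model."""
    if not data:
        return 0.0
    counts = Counter(data)                              # pass 1 (model fit)
    total = len(data)
    probs = {s: c / total for s, c in counts.items()}
    size = -(-total // num_blocks)                      # ceiling division
    blocks = [data[i:i + size] for i in range(0, total, size)]
    with ProcessPoolExecutor() as pool:                 # pass 2 (parallel)
        lengths = list(pool.map(_encode_block, [(b, probs) for b in blocks]))
    return sum(lengths)                                 # total code length in bits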
We present the development of a scalable parallel algorithm and solver for computational electromagnetics based on a double higher order method of moments in the surface integral equation formulation in conjunction with a direct hierarchically semiseparable structures solver. Multiscale modeling using the new method, for electrically very large structures that also include electrically very small details, is discussed, with several advancement strategies.
ISBN (print): 9781479999897
With the emergence of GPU computing, deep neural networks have become a widely used technique for advancing research in the field of image and speech processing. In the context of object and event detection, sliding-window classifiers require choosing the best among all positively discriminated candidate windows. In this paper, we introduce the first GPU-based non-maximum suppression (NMS) algorithm for embedded GPU architectures. The obtained results show that the proposed parallel algorithm reduces the NMS latency by a wide margin when compared to CPUs, even when clocking the GPU at 50% of its maximum frequency on an NVIDIA Tegra K1. In this paper, we show results for object detection in images. The proposed technique is directly applicable to speech segmentation tasks such as speaker diarization.
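For reference, the operation being parallelized is standard greedy NMS; a minimal CPU/NumPy sketch follows, with boxes given as (x1, y1, x2, y2) and a score per box. The embedded-GPU kernel and its parallel reduction structure from the paper are not reproduced.

import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression (reference CPU version).

    boxes:  (N, 4) array of (x1, y1, x2, y2)
    scores: (N,)   detection scores
    Returns indices of the kept boxes, highest score first.
    """
    order = np.argsort(scores)[::-1]
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # Intersection of the current best box with all remaining candidates.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_threshold]   # suppress heavily overlapping boxes
    return keep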