Search results - Inner Mongolia University Library

GPU-based acceleration of an RNA tertiary structure prediction algorithm

COMPUTERS IN BIOLOGY AND MEDICINE, 2013, vol. 43, no. 8, pp. 1011-1022

Authors: Jeon, Yongkweon; Jung, Eesuk; Min, Hyeyoung; Chung, Eui-Young; Yoon, Sungroh. Affiliations: Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 151744, South Korea; Yonsei Univ, Dept Elect & Elect Engn, Seoul 120749, South Korea; Chung Ang Univ, Coll Pharm, RNA Biopharm Lab, Seoul 156756, South Korea; Seoul Natl Univ, Bioinformat Inst, Seoul 151747, South Korea

Experimental techniques such as X-ray crystallography and nuclear magnetic resonance have been useful for the accurate determination of RNA tertiary structures. However, high-throughput structure determination using such methods often becomes difficult, due to the need for a large quantity of pure samples. Computational techniques for the prediction of RNA tertiary structures are thus becoming increasingly popular. Most of the existing prediction algorithms are computationally intensive, and there is a clear need for acceleration. In this paper, we propose a parallelization methodology for the fragment assembly of RNA (FARNA) algorithm, one of the most effective methods for computational prediction of RNA tertiary structure. The proposed parallelization scheme exploits multi-core CPUs and GPUs in harmony to maximize their utilization. We tested our approach with a number of RNA sequences and confirmed that it allows the time required for structure prediction to be significantly reduced. With respect to the baseline architecture equipped with a single CPU core, we achieved a speedup of up to approximately 24x (roughly 4x by multi-core CPUs and 20x by GPUs). Compared with a quad-core CPU setup, the proposed approach delivers an additional 12x speedup by utilizing GPU devices. Given that most PCs these days have a multi-core CPU and a GPU card, our methodology will be very helpful for accelerating algorithms in a cost-effective manner. (C) 2013 Elsevier Ltd. All rights reserved.

Keywords: RNA; RNA structure prediction; Parallel algorithm; Multi-core CPU; GPGPU

Scalable matrix decompositions with multiple cores on FPGAs

MICROPROCESSORS AND MICROSYSTEMS, 2013, vol. 37, no. 8, pp. 887-898

Authors: Tai, Yi-Gang; Lo, Chia-Tien Dan; Psarris, Kleanthis. Affiliations: Univ Texas San Antonio, Dept Comp Sci, San Antonio, TX 78249, USA; Southern Polytech State Univ, Dept Comp Sci & Software Engn, Marietta, GA 30060, USA; CUNY Brooklyn Coll, Sch Nat & Behav Sci, Brooklyn, NY 11210, USA

Hardware accelerators are getting increasingly important in heterogeneous systems for many applications, including those that employ matrix decompositions. In recent years, a class of tiled matrix decomposition algorithms has been proposed for out-of-memory computations and multi-core architectures including GPU-based heterogeneous systems. However, on FPGAs these scalable solutions for large matrices are rarely found. In this paper we use the latest tiled decomposition algorithms from high performance linear algebra for off-chip memory access and loop mapping on multiple processing cores for on-chip computation to perform scalable and high performance QR and LU matrix decompositions on FPGAs. (C) 2012 Elsevier B.V. All rights reserved.

Keywords: Matrix decomposition; FPGA; Hardware accelerator; Linear algebra; Parallel algorithm; Multi-core architecture
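
As a rough illustration of the tiled decomposition idea mentioned in the abstract above, the following Python/NumPy sketch factors a matrix tile by tile with a right-looking blocked LU. It is not the authors' FPGA design: the names `lu_unblocked` and `tiled_lu`, the `tile` parameter, and the omission of pivoting are assumptions made for brevity, so the sketch only applies to matrices that need no row exchanges (for example, strictly diagonally dominant ones).

```python
import numpy as np


def lu_unblocked(a):
    """In-place LU of a small square block, without pivoting (assumption)."""
    n = a.shape[0]
    for k in range(n - 1):
        a[k + 1:, k] /= a[k, k]  # column of L below the diagonal
        a[k + 1:, k + 1:] -= np.outer(a[k + 1:, k], a[k, k + 1:])
    return a


def tiled_lu(matrix, tile):
    """Blocked right-looking LU; returns (L, U) with L @ U == matrix."""
    a = np.array(matrix, dtype=float)  # work on a floating-point copy
    n = a.shape[0]
    for k in range(0, n, tile):
        e = min(k + tile, n)
        lu_unblocked(a[k:e, k:e])  # factor the diagonal tile in place
        if e < n:
            l11 = np.tril(a[k:e, k:e], -1) + np.eye(e - k)
            u11 = np.triu(a[k:e, k:e])
            # Row panel: U12 = L11^{-1} A12
            a[k:e, e:] = np.linalg.solve(l11, a[k:e, e:])
            # Column panel: L21 = A21 U11^{-1}
            a[e:, k:e] = np.linalg.solve(u11.T, a[e:, k:e].T).T
            # Trailing update: independent per tile, hence parallelizable
            a[e:, e:] -= a[e:, k:e] @ a[k:e, e:]
    lower = np.tril(a, -1) + np.eye(n)
    upper = np.triu(a)
    return lower, upper


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = rng.random((7, 7)) + 7.0 * np.eye(7)  # strictly diagonally dominant
    lower, upper = tiled_lu(m, tile=3)
    print(np.allclose(lower @ upper, m))
```

The trailing update touches each remaining tile independently, which is the part a multi-core or accelerator implementation would typically distribute across processing cores.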

Parallel High Throughput Soft-Output Sphere Decoding Algorithm

JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2012, vol. 68, no. 2, pp. 217-231

Authors: Qi, Qi; Chakrabarti, Chaitali. Affiliation: Arizona State Univ, Sch Elect Comp & Energy Engn, Tempe, AZ 85287, USA

Multiple-Input-Multiple-Output communication systems demand fast sphere decoding with high performance. To speed up the computation, we propose a scheme with multiple fixed complexity sphere decoders to construct a parallel soft-output fixed complexity sphere decoder (PFSD). The proposed decoder is highly parallel and has performance comparable to the soft-output list fixed complexity sphere decoder (LFSD) and the K-best sphere decoder. In addition, we propose a parallel QR decomposition algorithm to lower the preprocessing overhead, and a low complexity LLR algorithm to allow parallel update of LLR values. We demonstrate that the PFSD algorithm can increase the throughput and reduce the bit error rate of a soft-output solution in a 4 x 4 16-QAM system, and has superior performance compared to other soft decoders with comparable throughput and computation complexity. The PFSD algorithm has been mapped onto a Xilinx XC4VLX160 FPGA. The resulting PFSD decoder can achieve up to 75 Mbps throughput for a 4 x 4 64-QAM configuration at 100 MHz with low control overhead.

Keywords: Soft-output sphere decoding; Parallel algorithm; Fixed complexity

AN INEXACT PERTURBED PATH-FOLLOWING METHOD FOR LAGRANGIAN DECOMPOSITION IN LARGE-SCALE SEPARABLE CONVEX OPTIMIZATION

SIAM JOURNAL ON OPTIMIZATION, 2013, vol. 23, no. 1, pp. 95-125

Authors: Quoc Tran Dinh; Necoara, Ion; Savorgnan, Carlo; Diehl, Moritz. Affiliations: Katholieke Univ Leuven, Dept Elect Engn ESAT SCD, B-3001 Louvain, Belgium; Katholieke Univ Leuven, Optimizat Engn Ctr OPTEC, B-3001 Louvain, Belgium; Univ Politehn Bucuresti, Automat & Syst Engn Dept, Bucharest 060042, Romania

This paper studies an inexact perturbed path-following algorithm in the framework of Lagrangian dual decomposition for solving large-scale separable convex programming problems. Unlike the exact versions considered in the literature, we propose solving the primal subproblems inexactly up to a given accuracy. This leads to an inexactness of the gradient vector and the Hessian matrix of the smoothed dual function. Then an inexact perturbed algorithm is applied to minimize the smoothed dual function. The algorithm consists of two phases, and both make use of the inexact derivative information of the smoothed dual problem. The convergence of the algorithm is analyzed, and the worst-case complexity is estimated. As a special case, an exact path-following decomposition algorithm is obtained and its worst-case complexity is given. Implementation details are discussed, and preliminary numerical results are reported.

Keywords: smoothing technique; self-concordant barrier; Lagrangian decomposition; inexact perturbed Newton-type method; separable convex optimization; parallel algorithm
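
To make the idea of Lagrangian dual decomposition in the abstract above concrete, here is a minimal Python sketch on a toy separable problem: minimize the sum of 0.5 * (x_i - c_i)^2 subject to sum(x_i) = b. It uses plain dual gradient ascent, not the inexact perturbed path-following method of the paper, and the function name `dual_decomposition`, the toy objective, and the `step` and `iters` parameters are all illustrative assumptions.

```python
def dual_decomposition(c, b, step=0.1, iters=200):
    """Dual gradient ascent for: min sum 0.5*(x_i - c_i)^2  s.t. sum(x_i) = b.

    Hypothetical toy problem; converges when 0 < step < 2 / len(c).
    """
    y = 0.0  # multiplier of the coupling constraint
    x = list(c)
    for _ in range(iters):
        # Each subproblem min_x 0.5*(x - c_i)^2 + y*x has the closed-form
        # solution x_i = c_i - y and could be solved on a separate processor.
        x = [ci - y for ci in c]
        # The gradient of the dual function is the constraint residual.
        y += step * (sum(x) - b)
    return x, y


if __name__ == "__main__":
    x, y = dual_decomposition([1.0, 2.0, 3.0, 6.0], b=4.0)
    print(x, y)
```

Only the multiplier update couples the subproblems, which is why the per-block solves can run in parallel.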

A parallel Oseen-linearized algorithm for the stationary Navier-Stokes equations

COMPUTER METHODS IN APPLIED MECHANICS AND ENGINEERING, 2012, vol. 209, pp. 172-183

Authors: Shang, Yueqiang; He, Yinnian. Affiliations: Guizhou Normal Univ, Sch Math & Comp Sci, Guiyang 550001, Peoples R China; Xi An Jiao Tong Univ, Fac Sci, State Key Lab Multiphase Flow Power Engn, Xian 710049, Peoples R China

Based on two-grid discretizations and domain decomposition, a parallel Oseen-linearized finite element algorithm for the stationary Navier-Stokes equations with moderate or large viscosity parameter is proposed and analyzed. The key idea of the algorithm is to first solve a nonlinear problem by Picard iterative method on a coarse grid, and then to solve an Oseen problem in parallel on a fine grid to correct the coarse grid solution. By using local a priori error estimate for the finite element solution and under the uniqueness condition, error bounds of the corresponding finite element solution are analyzed. Numerical results are also given to demonstrate the high efficiency of the algorithm. (C) 2011 Elsevier B.V. All rights reserved.

Keywords: Navier-Stokes equations; Finite element; Picard iteration; Two-grid method; Parallel algorithm

Parallelizing a state exchange strategy for noncooperative distributed NMPC

SYSTEMS & CONTROL LETTERS, 2013, vol. 62, no. 1, pp. 29-36

Authors: Pannek, Juergen. Affiliation: Univ Fed Armed Forces, Fac Aerosp Engn, D-85577 Munich, Germany

We consider a distributed noncooperative control setting in which systems are interconnected via state constraints. Each of these systems is governed by an agent which is responsible for exchanging information with its neighbours and computing a feedback law using a nonlinear model predictive controller to avoid violations of constraints. For this setting we present an algorithm which generates a parallelizable hierarchy among the systems. Moreover, we show both feasibility and stability of the closed loop using only abstract properties of this algorithm. To this end, we utilize a trajectory based stability result which we extend to the distributed setting. (C) 2012 Elsevier B.V. All rights reserved.

Keywords: Nonlinear model predictive control; Stability; Parallel algorithm

SYSTEMS OF STRUCTURED MONOTONE INCLUSIONS: DUALITY, ALGORITHMS, AND APPLICATIONS

SIAM JOURNAL ON OPTIMIZATION, 2013, vol. 23, no. 4, pp. 2420-2447

Authors: Combettes, Patrick L. Affiliations: Univ Paris 06, Lab Jacques Louis Lions, UMR 7598, F-75005 Paris, France; King Abdulaziz Univ (KAU), Dept Math, Jeddah, Saudi Arabia

A general primal-dual splitting algorithm for solving systems of structured coupled monotone inclusions in Hilbert spaces is introduced and its asymptotic behavior is analyzed. Each inclusion in the primal system features compositions with linear operators, parallel sums, and Lipschitzian operators. All the operators involved in this structured model are used separately in the proposed algorithm, most steps of which can be executed in parallel. This provides a flexible solution method applicable to a variety of problems beyond the reach of the state-of-the-art. Several applications are discussed to illustrate this point.

Keywords: convex minimization; coupled system; infimal convolution; monotone inclusion; monotone operator; operator splitting; parallel algorithm; structured minimization problem

Finding All Maximal Contiguous Subsequences of a Sequence of Numbers in O(1) Communication Rounds

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2013, vol. 24, no. 4, pp. 724-733

Authors: Rodrigues Alves, Carlos Eduardo; Caceres, Edson Norberto; Song, Siang Wun. Affiliations: Univ Sao Judas Tadeu, BR-05416010 Sao Paulo, Brazil; Univ Fed Mato Grosso do Sul, Fac Comp, BR-79070900 Campo Grande, MS, Brazil; Univ Sao Paulo, BR-05416010 Sao Paulo, Brazil; Univ Fed ABC, Sao Paulo, Brazil

Given a sequence A of real numbers, we wish to find a list of all nonoverlapping contiguous subsequences of A that are maximal. A maximal subsequence M of A has the property that no proper subsequence of M has a greater sum of values. Furthermore, M may not be contained properly within any subsequence of A with this property. This problem has several applications in Computational Biology and can be solved sequentially in linear time. We present a BSP/CGM algorithm that solves this problem using p processors in O(|A|/p) time and O(|A|/p) space per processor. The algorithm uses a constant number of communication rounds of size at most O(|A|/p). Thus, the algorithm achieves linear speedup and is highly scalable. To our knowledge, there are no previously known parallel BSP/CGM algorithms to solve this problem.

Keywords: All maximal subsequences problem; maximum subsequence problem; parallel algorithm; coarse-grained multicomputer; communication rounds
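
The problem defined in the abstract above can be illustrated with a short sequential sketch in Python. It follows the well-known cumulative-sum approach of Ruzzo and Tompa for the sequential case, not the BSP/CGM algorithm of the paper. The function name `all_maximal_subsequences` is an assumption, and for simplicity the candidate list is scanned linearly from the right, so this version is not guaranteed to run in linear time.

```python
def all_maximal_subsequences(scores):
    """Return half-open index ranges (start, end) of all maximal subsequences."""
    # Each candidate is [start, end, left_total, right_total], where
    # left_total is the running sum before `start` and right_total is the
    # running sum through `end - 1`.
    candidates = []
    total = 0
    for i, score in enumerate(scores):
        before = total
        total += score
        if score <= 0:
            continue  # non-positive scores never start a candidate
        current = [i, i + 1, before, total]
        while True:
            # Rightmost earlier candidate whose left_total is strictly smaller.
            j = len(candidates) - 1
            while j >= 0 and candidates[j][2] >= current[2]:
                j -= 1
            if j < 0 or candidates[j][3] >= current[3]:
                candidates.append(current)
                break
            # Otherwise merge candidates j..end into the current one and retry.
            current = [candidates[j][0], current[1], candidates[j][2], current[3]]
            del candidates[j:]
    return [(start, end) for start, end, _, _ in candidates]


if __name__ == "__main__":
    data = [4, -5, 3, -3, 1, 2, -2, 2, -2, 1, 5]
    for start, end in all_maximal_subsequences(data):
        print(data[start:end])
```

For the sample input, the ranges cover `[4]`, `[3]`, and `[1, 2, -2, 2, -2, 1, 5]`.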

An efficient algorithm for solving a multi-layer convection-diffusion problem applied to air pollution problems

ADVANCES IN ENGINEERING SOFTWARE, 2013, vol. 65, pp. 191-199

Authors: Ferragut, L.; Asensio, M. I.; Cascon, J. M.; Prieto, D.; Ramirez, J. Affiliations: Inst Univ Fis Fundamental & Matemat, Salamanca 37008, Spain; Univ Salamanca, Dept Matemat Aplicada, E-37008 Salamanca, Spain; Univ Salamanca, Dept Econ & Hist Econ, Salamanca 37007, Spain; Tecnosylva Sl, Leon 24009, Spain

An urban scale Eulerian non-reactive multilayer air pollution model is proposed, describing convection, turbulent diffusion and emission. A mass-consistent wind field model developed by the authors is included in the air pollution model. An Adaptive Finite Element Method with characteristics in the horizontal directions and Finite Differences in the vertical direction, combined through splitting techniques, is proposed to numerically solve the corresponding PDE problem. A parallel version of the algorithm improves the precision of the solution while keeping computation time below the real time of the simulation. A numerical example illustrates the whole problem. (C) 2013 Elsevier Ltd. All rights reserved.

Keywords: Air pollution modeling; Splitting methods; Adaptive Finite Element Method; Parallel algorithm; Parabolic convection-diffusion PDE; PDE numerical methods

Benefits of using parallelized non-progressive network coding

JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 2013, vol. 36, no. 1, pp. 293-305

Authors: Kim, Minwoo; Park, Karam; Ro, Won W. Affiliations: Yonsei Univ, Sch Elect & Elect Engn, Seoul 120749, South Korea; Samsung Elect, Platform R&D Team, Mobile Commun, Suwon, South Korea

Network coding helps improve communication rate and save bandwidth by performing a special coding at the sending or intermediate nodes. However, encoding and decoding at the nodes create computation overhead on large input data, which causes coding delays. Previous works therefore proposed the progressive method, which can hide the decoding delay in waiting time. However, network speed has increased greatly, and progressive schemes are no longer the most efficient decoding method. Thus, we present a non-progressive decoding algorithm that can be parallelized more aggressively than progressive network coding; by utilizing multi-core processors, it can offset the advantage that progressive methods gain from hidden decoding time. Moreover, the block algorithm implemented by non-progressive decoding helps to reduce cache misses. In experiments on multi-core systems, our scheme, which relies on matrix inversion and multiplication, shows a 46.0% improvement in execution time and an 89.2% reduction in last-level cache misses compared to the progressive method. (C) 2012 Elsevier Ltd. All rights reserved.

Keywords: Network coding; Parallel algorithm; Non-progressive decoder; Tiling algorithm; Matrix inversion; Matrix multiplication
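
The abstract above describes non-progressive decoding as relying on matrix inversion and multiplication. A minimal Python sketch of that idea follows: wait until all coded blocks have arrived, invert the coefficient matrix once, then multiply. It is not the authors' implementation. The prime field GF(257), the helper names `mat_inv_mod` and `mat_mul_mod`, and the sample data are assumptions chosen to keep the arithmetic simple; practical systems often use GF(2^8) instead.

```python
P = 257  # assumed prime modulus, so every non-zero element has an inverse


def mat_inv_mod(m, p=P):
    """Invert a square matrix over GF(p) by Gauss-Jordan elimination."""
    n = len(m)
    aug = [[v % p for v in row] + [int(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col]), None)
        if pivot is None:
            raise ValueError("coefficient matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, p)  # modular inverse (Python 3.8+)
        aug[col] = [(v * inv) % p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(a - factor * b) % p for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mat_mul_mod(a, b, p=P):
    """Matrix product over GF(p); each output row is independent work."""
    return [[sum(x * y for x, y in zip(row, col)) % p for col in zip(*b)]
            for row in a]


if __name__ == "__main__":
    source = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]      # three source blocks
    coeffs = [[1, 2, 3], [0, 1, 4], [5, 6, 0]]      # coding coefficients
    coded = mat_mul_mod(coeffs, source)             # what the receiver gets
    decoded = mat_mul_mod(mat_inv_mod(coeffs), coded)
    print(decoded == source)
```

Because every row of the final product can be computed independently, the multiplication step is the natural place to apply tiling and multi-core parallelism.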
