The Jacobi-Davidson (JD) algorithm was recently proposed for evaluating a number of the eigenvalues of a matrix. JD goes beyond pure Krylov-space techniques; it cleverly expands its search space by solving the so-called correction equation, thus in principle providing a more powerful method. Preconditioning the Jacobi-Davidson correction equation is mandatory when large, sparse matrices are analyzed. We considered several preconditioners: classical block-Jacobi and IC(0), together with approximate inverse (AINV or FSAI) preconditioners. The rationale for using approximate inverse preconditioners is their high parallelization potential, combined with their efficiency in accelerating the iterative solution of the correction equation. An analysis was carried out on the sequential performance of preconditioned JD for the spectral decomposition of large, sparse matrices that originate in the numerical integration of partial differential equations arising in physical and engineering problems. It was found that JD is highly sensitive to preconditioning and can display an irregular convergence behavior. We parallelized JD by data-splitting techniques, combined with techniques to reduce the amount of communicated data. Our parallel, preconditioned code was executed on a dedicated parallel machine, and we present the results of our experiments. Our JD code provides an appreciable degree of parallelism. Its performance was also compared with that of PARPACK and parallel DACG. (C) 2003 Elsevier Science B.V. All rights reserved.
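For reference, the correction equation mentioned above has, in its standard textbook form for a real eigenproblem Ax = lambda*x (this is the usual Jacobi-Davidson formulation, not notation taken from the paper itself), the following projected structure: given the current Ritz pair $(\theta, u)$ with $\|u\| = 1$ and residual $r = Au - \theta u$, one solves approximately

$$(I - uu^{T})\,(A - \theta I)\,(I - uu^{T})\,t = -r, \qquad t \perp u,$$

and expands the search space with the correction $t$. The block-Jacobi, IC(0), and approximate inverse preconditioners discussed in the abstract are applied to the iterative solution of this system.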
This study empirically compares two approaches to parallel 3-D OSEM that differ as to whether calculations are assigned to nodes by projection number or by transaxial plane number. For projection space decomposition (PSD), the forward projection is completely parallel, but backprojection requires a slow image synchronization. For image space decomposition (ISD), the communication associated with forward projection can be overlapped with calculation, and the communication associated with backprojection is more efficient. To compare these methods, an implementation of 3-D OSEM for three PET scanners is developed that runs on an experimental 9-node, 18-processor cluster computer. For selected benchmarks, both methods exhibit speedups in excess of eight on the nine-node cluster and comparable performance over the tested range of cluster sizes.
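For context, the OSEM update that both decompositions parallelize can be written in its standard form (the usual ordered-subsets EM iteration, not notation taken from this study): with image estimate $x$, system matrix elements $a_{ij}$, measured data $y$, and projection subset $S_b$,

$$x_j \;\leftarrow\; \frac{x_j}{\sum_{i \in S_b} a_{ij}} \sum_{i \in S_b} a_{ij}\, \frac{y_i}{\sum_{k} a_{ik}\, x_k}.$$

The forward projection $\sum_k a_{ik} x_k$ parallelizes naturally over projections $i$ (the PSD view), while the backprojection sum over $i$ accumulates into image voxels $j$, which is what ISD partitions by transaxial plane.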
ISBN (print): 0780366573
This paper proposes a hybrid parallelization of evolutionary algorithms (EAs) that utilizes PC clustering environments to solve constrained numerical optimization problems. In the proposed parallel structure, coarse-grained parallel EAs (PEAs) were employed at the upper level and fine-grained PEAs at the lower level. The key to designing effective EAs is to obtain a proper balance between exploration and exploitation. This balance can be controlled by the spread rate and by the migration of the best individuals. In the hybrid structure, the spread rate is high in the lower-level fine-grained structure and low in the upper-level coarse-grained (global) structure. Diversity is promoted by dividing the individuals into several groups and migrating individuals between them. By utilizing a large number of processors, both the optimization performance and the computation time were improved. The simulation results indicate that hybrid parallel EAs using the proposed structure perform better on constrained numerical optimization problems than the coarse-grained or fine-grained parallel EAs used as dedicated parallelization methods in previous work.
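A minimal, serial toy sketch of the two-level idea (coarse-grained islands whose members also evolve locally, with the best individuals migrating between islands) is given below. The population sizes, mutation scheme, and ring migration topology are illustrative assumptions; the paper's constraint handling, spread-rate control, and PC-cluster parallelism are not reproduced.

    import random

    def evolve_island(pop, fitness, generations=10):
        # Placeholder local step: mutate each individual and keep the better one (minimization).
        for _ in range(generations):
            for i in range(len(pop)):
                child = [g + random.gauss(0, 0.1) for g in pop[i]]
                if fitness(child) < fitness(pop[i]):
                    pop[i] = child
        return pop

    def migrate(islands, fitness):
        # Ring migration: each island receives the previous island's best individual.
        bests = [min(isl, key=fitness) for isl in islands]
        for k, isl in enumerate(islands):
            worst = max(range(len(isl)), key=lambda i: fitness(isl[i]))
            isl[worst] = bests[(k - 1) % len(islands)]

    def hybrid_pea(fitness, dim=5, n_islands=4, island_size=20, epochs=10):
        islands = [[[random.uniform(-5, 5) for _ in range(dim)]
                    for _ in range(island_size)] for _ in range(n_islands)]
        for _ in range(epochs):
            islands = [evolve_island(isl, fitness) for isl in islands]
            migrate(islands, fitness)
        return min((ind for isl in islands for ind in isl), key=fitness)

    # Example: minimize the (unconstrained) sphere function.
    best = hybrid_pea(lambda x: sum(g * g for g in x))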
It is known that the Dixon matrix can be constructed in parallel either by entry or by diagonal. This paper presents another parallel matrix construction, this time by bracket. The parallel-by-bracket algorithm is the fastest among the three, but not surprisingly it requires the highest number of processors. The method also shows analytically that the Dixon matrix has a total of m(m+1)^2(m+2) n(n+1)^2(n+2)/36 brackets, but only mn(m+1)(n+1)(mn+2m+2n+1)/6 of them are distinct.
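As a quick numeric illustration of the two counts above, the following snippet evaluates both formulas for given degrees m and n (both expressions are integers for all positive m, n, so integer division is safe):

    def total_brackets(m, n):
        # m(m+1)^2(m+2) n(n+1)^2(n+2) / 36: total number of brackets
        return m * (m + 1)**2 * (m + 2) * n * (n + 1)**2 * (n + 2) // 36

    def distinct_brackets(m, n):
        # mn(m+1)(n+1)(mn+2m+2n+1) / 6: number of distinct brackets
        return m * n * (m + 1) * (n + 1) * (m*n + 2*m + 2*n + 1) // 6

    print(total_brackets(2, 3), distinct_brackets(2, 3))   # 480 204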
Summary form only given, as follows. We focus on implementing high-level functional algorithms in reconfigurable hardware. The approach adopts the transformational programming paradigm for deriving massively parallel algorithms from functional specifications. It extends previous work by systematically generating efficient circuits and mapping them onto reconfigurable hardware. The massive parallelisation of the algorithm works by carefully composing "off the shelf" highly parallel implementations of each of the basic building blocks involved in the algorithm. These basic building blocks are a small collection of well-known higher-order functions such as map, fold, and zipWith. By using function decomposition and data refinement techniques, these powerful functions are refined into highly parallel implementations described in Hoare's CSP. The CSP descriptions are very closely associated with Handel-C program fragments. Handel-C is a programming language based on C and extended with parallelism and communication primitives taken from CSP. In the final stage the circuit description is generated by compiling the Handel-C programs, which are then mapped onto the targeted reconfigurable hardware, such as the RC-1000 FPGA system from Celoxica. This approach is illustrated by a case study involving the generation of several versions of the matrix multiplication algorithm.
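To make the composition of building blocks concrete, here is a small illustration, written in Python rather than in the paper's functional specification, CSP, or Handel-C, of matrix multiplication expressed purely with map, zipWith, and fold:

    from functools import reduce

    def zip_with(f, xs, ys):
        return [f(x, y) for x, y in zip(xs, ys)]

    def fold(f, init, xs):
        return reduce(f, xs, init)

    def dot(row, col):
        # fold (+) 0 (zipWith (*) row col)
        return fold(lambda a, b: a + b, 0, zip_with(lambda x, y: x * y, row, col))

    def mat_mul(A, B):
        Bt = list(map(list, zip(*B)))                         # transpose B
        return [list(map(lambda col: dot(row, col), Bt)) for row in A]

    print(mat_mul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))        # [[19, 22], [43, 50]]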
We present an efficient parallel algorithm for scheduling n unit-length tasks on m identical processors when the precedence graphs are interval orders. Our algorithm requires O(log^2 v + (n log n)/v) time and O(nv^2 + n^2) operations on the CREW PRAM, where v can be any number between 1 and n. By choosing v = √n, we obtain an O(√n log n)-time algorithm with O(n^2) operations. For v = n/log n, we have an O(log^2 n)-time algorithm with O(n^3/log^2 n) operations. The previous solution takes O(log^2 n) time with O(n^3 log^2 n) operations on the CREW PRAM. Our improvement is mainly due to a simple dynamic programming recurrence for computing the lengths of optimal schedules and a reduction of the m-processor scheduling problem for interval orders to that of finding a maximum matching in a convex bipartite graph. (C) 2003 Elsevier Science (USA). All rights reserved.
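The two trade-off points quoted above follow directly from substituting the chosen v into the general bounds of $O(\log^2 v + (n\log n)/v)$ time and $O(nv^2 + n^2)$ operations:

$$v=\sqrt{n}:\quad O\!\left(\log^2\sqrt{n} + \tfrac{n\log n}{\sqrt{n}}\right)=O(\sqrt{n}\,\log n)\ \text{time},\qquad O(n\cdot n + n^2)=O(n^2)\ \text{operations};$$

$$v=\tfrac{n}{\log n}:\quad O\!\left(\log^2 n + \log^2 n\right)=O(\log^2 n)\ \text{time},\qquad O\!\left(\tfrac{n^3}{\log^2 n} + n^2\right)=O\!\left(\tfrac{n^3}{\log^2 n}\right)\ \text{operations}.$$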
This paper presents a natural and efficient implementation of the classical broadcast message-passing routine that optimizes performance on Ethernet-based clusters. A simple algorithm for parallel matrix multiplication is specifically designed to take advantage both of the parallel computing facilities (CPUs) provided by clusters and of the optimized performance of broadcast messages on Ethernet-based clusters. This simple parallel matrix multiplication algorithm also takes into account possibly heterogeneous computing hardware and maintains a balanced workload across computers according to their relative computing power. Performance tests are presented on a heterogeneous cluster as well as on a homogeneous cluster, where the algorithm is compared with the parallel matrix multiplication provided by the ScaLAPACK library. Another simple parallel algorithm is proposed for LU matrix factorization (a general method for solving dense systems of equations), following the same guidelines used for the parallel matrix multiplication algorithm. Some performance tests are presented on a homogeneous cluster.
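A minimal sketch of the workload-balancing idea (rows of A assigned in proportion to each machine's relative computing power, with B conceptually broadcast to all machines) is shown below; the relative-speed vector and function names are illustrative assumptions, and the paper's actual broadcast implementation and message-passing layer are not reproduced.

    import numpy as np

    def partition_rows(n_rows, rel_speed):
        # Row counts per machine, roughly proportional to relative speed.
        shares = np.asarray(rel_speed, dtype=float)
        counts = np.floor(n_rows * shares / shares.sum()).astype(int)
        counts[:n_rows - counts.sum()] += 1      # hand out the leftover rows
        return counts

    def heterogeneous_matmul(A, B, rel_speed):
        # Each "machine" multiplies its row block of A by the broadcast matrix B.
        counts = partition_rows(A.shape[0], rel_speed)
        bounds = np.concatenate(([0], np.cumsum(counts)))
        blocks = [A[bounds[i]:bounds[i + 1]] @ B for i in range(len(counts))]
        return np.vstack(blocks)

    A = np.random.rand(6, 4)
    B = np.random.rand(4, 3)
    C = heterogeneous_matmul(A, B, rel_speed=[2.0, 1.0, 1.0])   # hypothetical speeds
    assert np.allclose(C, A @ B)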
We present a parallel algorithm for performing multipoint linkage analysis of genetic marker data on large family pedigrees. The algorithm effectively distributes both the computation and memory requirements of the analysis. We discuss an implementation of the algorithm in the Genehunter linkage analysis package (version 2.1), enabling Genehunter to run on distributed-memory platforms for the first time. Our preliminary benchmarks indicate reasonable scalability of the algorithm even for fixed-size problems, with parallel efficiencies of 75% or more on up to 128 processors. In addition, we have extended the hard-coded limit of 16 non-founding individuals in Genehunter 2.1 to a new limit of 32 non-founding individuals. (C) 2003 Elsevier Inc. All rights reserved.
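For reference, the parallel efficiency figures quoted above presumably follow the standard definition (this is the conventional formula, not one stated in the abstract):

$$E(p) = \frac{T_1}{p\,T_p},$$

so an efficiency of 75% on 128 processors corresponds to a speedup $T_1/T_p$ of about 96.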
We present practical parallel algorithms using prefix computations for various problems that arise in pairwise comparison of biological sequences. We consider both constant and affine gap penalty functions, full-sequence and subsequence matching, and space-saving algorithms. Commonly used sequential algorithms solve the sequence comparison problems in O(mn) time and O(m + n) space, where m and n are the lengths of the sequences being compared. All the algorithms presented in this paper are time-optimal with respect to the sequential algorithms and can use O(n/log n) processors, where n is the length of the larger sequence. While optimal parallel algorithms for many of these problems are known, we use a simple framework and demonstrate how these problems can be solved systematically using repeated parallel prefix operations. We also present a space-saving algorithm that uses O(m + n/p) space and runs in optimal time, where p is the number of processors used. We implemented the parallel space-saving algorithm and provide experimental results on an IBM SP-2 and a Pentium cluster. (C) 2003 Elsevier Science (USA). All rights reserved.
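The primitive underlying the approach above is the prefix (scan) operation: given x1, ..., xn and an associative operator, compute all prefixes x1, x1⊕x2, x1⊕x2⊕x3, and so on. The generic block-wise formulation below (plain Python, shown serially, with hypothetical names) exposes the parallelism: each of p blocks is scanned independently, the block totals are scanned, and the resulting offsets are applied back. It illustrates the primitive only, not the paper's alignment algorithms.

    from itertools import accumulate
    import operator

    def blocked_prefix(xs, op, p):
        # Step 1: split into p blocks and scan each block independently.
        n = len(xs)
        size = -(-n // p)                                   # ceil(n / p)
        blocks = [xs[i:i + size] for i in range(0, n, size)]
        local = [list(accumulate(b, op)) for b in blocks]
        # Step 2: scan the block totals to get an offset for each later block.
        offsets = list(accumulate([b[-1] for b in local[:-1]], op))
        # Step 3: combine each later block with its offset.
        out = list(local[0])
        for off, b in zip(offsets, local[1:]):
            out.extend(op(off, v) for v in b)
        return out

    print(blocked_prefix([1, 2, 3, 4, 5, 6, 7], operator.add, p=3))
    # [1, 3, 6, 10, 15, 21, 28]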
Author: Kepner, J
MIT Lincoln Lab, Lexington, MA 02420, USA
2D convolution is a staple of digital image processing. The advent of large-format imagers makes it possible to literally "pave" the focal plane of an optical sensor with silicon, which results in very large images that can require a significant amount of computation to process. Filtering of large images via 2D convolutions is often complicated by a variety of effects (e.g., non-uniformities found in wide field-of-view instruments) which must be compensated for in the filtering process by changing the filter across the image. This paper describes a fast (FFT-based) method for convolving images with slowly varying filters. A parallel version of the method is implemented using a multi-threaded approach, which allows more efficient load balancing and a simpler software architecture. The method has been implemented within a high-level interpreted language (IDL), while also exploiting open-standards vector libraries (VSIPL) and open-standards parallel directives (OpenMP). The parallel approach and software architecture are generally applicable to a variety of algorithms and have the advantage of enabling users to obtain the convenience of an easy operating environment while also delivering high performance using fully portable code. (C) 2003 Elsevier Science (USA). All rights reserved.
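As a small, self-contained illustration of the FFT-based convolution primitive the method builds on (plain NumPy, not the paper's IDL/VSIPL/OpenMP implementation, and without its handling of spatially varying filters):

    import numpy as np

    def fft_convolve2d(image, kernel):
        # Full linear 2D convolution computed via zero-padded FFTs.
        s0 = image.shape[0] + kernel.shape[0] - 1
        s1 = image.shape[1] + kernel.shape[1] - 1
        F = np.fft.rfft2(image, s=(s0, s1))
        G = np.fft.rfft2(kernel, s=(s0, s1))
        return np.fft.irfft2(F * G, s=(s0, s1))

    img = np.random.rand(256, 256)
    filt = np.outer(np.hanning(15), np.hanning(15))   # an arbitrary smoothing filter
    out = fft_convolve2d(img, filt)                   # result has shape (270, 270)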