A unified framework is presented for a fully parallel solution of large, sparse, nonsymmetric linear systems on distributed-memory multiprocessors. Unlike earlier work, both the symbolic and the numeric steps are parallelized. Parallel Cartesian nested dissection is used to compute a fill-reducing ordering of A from a compact representation of the column intersection graph, and the resulting separator tree is used to estimate the structure of the factor, to distribute data, and to perform multifrontal numeric computations. When the matrix is nonsymmetric but square, the numeric computations involve Gaussian elimination with partial pivoting; when the matrix is overdetermined, row-oriented Householder transformations are applied to compute the triangular factor of an orthogonal factorization. Extensive empirical results demonstrate that the approach is effective both in preserving sparsity and in achieving good parallel performance on an Intel iPSC/860.
A new parallel normalized explicit preconditioned conjugate gradient method, used in conjunction with normalized approximate inverse matrix techniques, is presented for efficiently solving sparse linear systems on multi-computer systems. Application of the proposed method to a three-dimensional boundary value problem is discussed and numerical results are given. The implementation and performance on a distributed-memory MIMD machine, using the Message Passing Interface (MPI), are also investigated.
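As a concrete reference point, the following is a minimal serial sketch of explicit preconditioned conjugate gradient iteration; a simple Jacobi (diagonal) approximate inverse stands in here for the paper's normalized approximate inverse preconditioner, and the code makes no attempt at MPI parallelism:

```python
# Explicit preconditioned conjugate gradient (PCG) sketch.
# The preconditioner is applied as an explicit matrix-vector product;
# a Jacobi (diagonal) approximate inverse is used as a stand-in.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg(A, b, tol=1e-10, max_iter=100):
    n = len(b)
    Minv = [1.0 / A[i][i] for i in range(n)]      # approximate inverse (diagonal)
    x = [0.0] * n
    r = b[:]                                      # residual r = b - A x, with x = 0
    z = [Minv[i] * r[i] for i in range(n)]        # preconditioned residual
    p = z[:]
    rz = dot(r, z)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = [Minv[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

# Small symmetric positive definite test system.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = pcg(A, b)
```

In the parallel setting of the abstract, the matrix-vector products and inner products are the operations distributed across processors.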
We have employed evolutionary computation to solve the optimization problem of sensor deployment in battlefield environments. A genetic algorithm has the advantage of delivering higher-quality results than simple computational algorithms, but it has the drawback of requiring a great deal of computing time. This study aimed not only to shorten the computing time to as close to real time as possible by using the Compute Unified Device Architecture (CUDA), but also to maintain a solution quality as good as or better than that obtained without it. In the proposed genetic algorithm, parallelization was applied to speed up the fitness evaluation, which requires heavy computation time. The proposed CUDA-based design approach for complex and varied sensor deployments is validated by means of simulation. We parallelized two parts of the Monte Carlo simulation used for the fitness evaluation: moving the many test vehicles and calculating the probability of detection (POD) for each vehicle. The experiments were divided into CPU and GPU experiments depending on the arithmetic unit type. In the GPU experiment, the results showed detection probabilities at similar levels for the GPU and the CPU, while the computing time decreased by a factor of approximately 55-56.
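The parallel-fitness-evaluation pattern described above can be sketched as follows; a Python thread pool stands in for the paper's CUDA kernels, and the fitness function is a toy placeholder rather than the Monte Carlo probability-of-detection model:

```python
# Genetic algorithm with parallel fitness evaluation.
# The GA loop stays serial; only the (expensive) fitness calls are
# farmed out in parallel. ThreadPoolExecutor stands in for GPU kernels.
from concurrent.futures import ThreadPoolExecutor
import random

def fitness(individual):
    # Placeholder: coverage-like score for a bit-string "deployment".
    return sum(individual)

def evolve(pop_size=20, genome_len=16, generations=10, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    with ThreadPoolExecutor() as pool:
        for _ in range(generations):
            scores = list(pool.map(fitness, pop))        # parallel evaluation
            ranked = [ind for _, ind in sorted(zip(scores, pop), reverse=True)]
            parents = ranked[: pop_size // 2]            # truncation selection
            children = []
            while len(children) < pop_size:
                a, b = rng.sample(range(len(parents)), 2)
                cut = rng.randrange(1, genome_len)       # one-point crossover
                child = parents[a][:cut] + parents[b][cut:]
                if rng.random() < 0.1:                   # bit-flip mutation
                    i = rng.randrange(genome_len)
                    child[i] ^= 1
                children.append(child)
            pop = children
        scores = list(pool.map(fitness, pop))
    return max(scores)
```

The speedup reported in the abstract comes from the fact that, as here, each individual's fitness can be evaluated independently of the others.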
In this paper, we give two algorithms for the 1-1 routing problem on a mesh-connected computer. The first algorithm, with queue size 28, solves the 1-1 routing problem on an n × n mesh-connected computer in 2n + O(1) steps. This improves on the previous queue size of 75. The second algorithm solves the 1-1 routing problem in 2n - 2 steps with queue size 12t_s/s, where t_s is the time for sorting an s × s mesh into row-major order, for all s ≥ 1. This result improves on the previous queue size of 18.67t_s/s.
We propose a multiprocessor structure for solving a dense system of n linear equations. The solution is obtained in two stages. First, the matrix of coefficients is reduced to upper triangular form via Givens rotations. Second, a back substitution process is applied to the triangular system. A two-dimensional array of Θ(n²) …
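A serial sketch of the two-stage solve (Givens triangularization followed by back substitution) may help make the stages concrete; the paper's contribution is mapping these operations onto a two-dimensional processor array, which is not reflected here:

```python
# Two-stage dense solve: (1) reduce the augmented matrix [A | b] to
# upper triangular form with Givens rotations, (2) back-substitute.
import math

def solve_givens(A, b):
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for j in range(n):                             # annihilate below the diagonal
        for i in range(j + 1, n):
            if M[i][j] == 0.0:
                continue
            r = math.hypot(M[j][j], M[i][j])
            c, s = M[j][j] / r, M[i][j] / r        # rotation zeroing M[i][j]
            for k in range(j, n + 1):
                t = c * M[j][k] + s * M[i][k]
                M[i][k] = -s * M[j][k] + c * M[i][k]
                M[j][k] = t
    x = [0.0] * n                                  # back substitution
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x
```

The rotations that zero different entries are largely independent, which is what makes the first stage amenable to a processor array.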
The use of high-performance libraries for dense linear algebra operations is of great importance in many numerical scientific applications. The most common operations form the backbone of the Basic Linear Algebra Subroutines (BLAS) library. In this paper, we consider the performance and auto-tuning of level 1 and level 2 BLAS routines on graphics processing units. As examples, we develop single-precision Compute Unified Device Architecture kernels for three of the most popular operations: the Euclidean norm (SNRM2), the matrix-vector multiplication (SGEMV), and the triangular solve (STRSV). The target hardware is the most recent Nvidia (Santa Clara, CA, USA) Tesla 20-series (Fermi architecture), which is designed from the ground up for high-performance computing. We show that achieving high performance for level 1 and level 2 BLAS operations is essentially a matter of fully utilizing the fine-grained parallelism of the many-core graphics processing unit. We also show that auto-tuning can be successfully applied to the kernels for these operations so that they perform well for all input sizes. Copyright (c) 2012 John Wiley & Sons, Ltd.
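For reference, straightforward serial versions of the three operations look as follows; the paper's contribution is fast, auto-tuned CUDA kernels for them, not the definitions themselves:

```python
# Reference (serial) semantics of the three BLAS routines discussed.
import math

def snrm2(x):
    # Level 1: Euclidean norm ||x||_2.
    return math.sqrt(sum(xi * xi for xi in x))

def sgemv(alpha, A, x, beta, y):
    # Level 2: y := alpha * A @ x + beta * y.
    return [alpha * sum(a * xi for a, xi in zip(row, x)) + beta * yi
            for row, yi in zip(A, y)]

def strsv_lower(L, b):
    # Level 2: solve L @ x = b with L lower triangular (forward substitution).
    x = []
    for i, row in enumerate(L):
        s = sum(row[j] * x[j] for j in range(i))
        x.append((b[i] - s) / row[i])
    return x
```

SNRM2 is a reduction, SGEMV is embarrassingly parallel over rows, and STRSV carries a sequential dependence between rows, which is why the three stress a GPU in quite different ways.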
This paper describes an iterative method for reducing a general matrix to upper triangular form by unitary similarity transformations. The method is similar to Jacobi’s method for the symmetric eigenvalue problem in that it uses plane rotations to annihilate off-diagonal elements, and when the matrix is Hermitian it reduces to a variant of Jacobi’s method. Although the method cannot compete with the QR algorithm in serial implementation, it admits of a parallel implementation in which a double sweep of the matrix can be done in time proportional to the order of the matrix.
The routing problem in VLSI layout design is very compute-intensive. Consequently, the routing task often turns out to be a bottleneck in the layout design of large circuits. Parallel processing of the routing problem holds promise for mitigating this situation. In this context, we present a parallel channel routing algorithm that is targeted to run on loosely coupled computers such as hypercubes. The proposed parallel algorithm employs the simulated annealing technique to obtain near-optimum solutions. Initially, the number of tracks in the channel is made equal to the number of nets, and partitions of the channel are appropriately assigned to the nodes of the hypercube. Each node carries out concurrent perturbations to obtain new channel states that satisfy the constraints for a given net list. The algorithm minimizes the number of tracks iteratively by using the simulated annealing technique. For efficient execution, we attempt to reduce the communication overheads by restricting broadcast updates to cases of interprocessor net transfers only. Performance evaluation studies of the algorithm show promising results.
Authors:
M. NIVAT, L.I.T.P., Université Paris VII, 2 Place Jussieu, 75251 Paris Cedex 05, France
A. SAOUDI, L.I.T.P., Université Paris VII, Centre Scientifique et Polytechnique, Avenue J. B. Clément, 93400 Villetaneuse, France
We investigate the complexity of the recognition of images generated by a class of context-free image grammars. We show that the sequential time complexity of recognizing an n × n image generated by a context-free grammar is O(nM(n)), where M(n) is the time to multiply two Boolean n × n matrices. The space complexity of this recognition is O(n³). Using a parallel random access machine (PRAM), the recognition can be done in O(log²(n)) time with n⁷ processors or in O(n log²(n)) time with n⁶ processors. We also introduce high-dimensional context-free grammars and prove that their recognition problem is polylogarithmic.
In this paper we describe a general framework for parallel optimization based on the island model of evolutionary algorithms. The framework runs a number of optimization methods in parallel with periodic communication, essentially creating a parallel ensemble of optimization methods. At the same time, the system contains a planner that decides which of the available optimization methods should be used to solve the given optimization problem and changes the distribution of these methods during the run. Thus, the system effectively solves the problem of online parallel portfolio selection. The proposed system is evaluated on a number of common benchmarks with various problem encodings as well as on two real-life problems: optimization in recommender systems and the training of neural networks for controlling electric vehicle charging.
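The island model itself can be illustrated with a minimal sketch: several populations evolve independently and periodically exchange their best individuals. Simple hill climbers on a toy one-dimensional objective stand in here for the framework's heterogeneous portfolio of optimization methods and its planner:

```python
# Island-model sketch: independent populations with periodic ring
# migration of the best individual. Mutation-based hill climbers are
# a stand-in for the framework's portfolio of optimization methods.
import random

def step(island, objective, rng):
    # One mutation-based hill-climbing step per individual.
    out = []
    for x in island:
        y = x + rng.uniform(-0.5, 0.5)
        out.append(y if objective(y) > objective(x) else x)
    return out

def island_model(objective, n_islands=4, island_size=5,
                 epochs=20, migrate_every=5, seed=0):
    rng = random.Random(seed)
    islands = [[rng.uniform(-10.0, 10.0) for _ in range(island_size)]
               for _ in range(n_islands)]
    for epoch in range(1, epochs + 1):
        islands = [step(isl, objective, rng) for isl in islands]
        if epoch % migrate_every == 0:          # ring migration of the best
            bests = [max(isl, key=objective) for isl in islands]
            for i, isl in enumerate(islands):
                isl[0] = bests[(i - 1) % n_islands]
    return max((x for isl in islands for x in isl), key=objective)

# Toy objective: maximized at x = 3.
best = island_model(lambda x: -(x - 3.0) ** 2)
```

In the full framework the islands would run different optimization methods, and the planner would reassign methods to islands between migration epochs.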