Search Results - Inner Mongolia University Library

Physically based simulation of cloth on distributed memory architectures

PARALLEL COMPUTING, 2007, Vol. 33, No. 6, pp. 377-390

Authors: Thomaszewski, Bernhard; Blochinger, Wolfgang. Affiliations: Univ Tubingen, WSI SR, D-72076 Tubingen, Germany; Univ Tubingen, WSI GRIS, D-72076 Tubingen, Germany

Physically based simulation of cloth in virtual environments is a computationally demanding problem. It involves modeling the internal material properties of the textile (physical modeling) and also treating interactions with the surrounding scene (collision handling). In this paper, we present an approach to parallel cloth simulation designed for distributed memory parallel architectures, particularly clusters built of commodity components. We discuss parallel techniques for the physical modeling phase as well as for the collision handling phase which can significantly reduce the respective computation times. To deal with the very fine granularity of the physical modeling phase, we apply a static data decomposition approach based on graph partitioning. To cope with the high irregularity of the collision handling phase, we employ task-parallel techniques based on fully dynamic problem decomposition. We show how both techniques can be integrated into a robust parallel cloth simulation method which can deal with considerably complex scenes. (c) 2007 Elsevier B.V. All rights reserved.

Keywords: parallel cloth simulation; parallel collision handling; irregular problems; distributed memory architectures

A sparse nonsymmetric eigensolver for distributed memory architectures

INTERNATIONAL JOURNAL OF PARALLEL EMERGENT AND DISTRIBUTED SYSTEMS, 2008, Vol. 23, No. 3, pp. 259-270

Authors: Guarracino, Mario R.; Perla, Francesca; Zanetti, Paolo. Affiliations: Italian Natl Res Council, Inst High Performance Comp & Networking, Via P Castellino 111, I-80131 Naples, Italy; Univ Naples Parthenope, I-80133 Naples, Italy

In this work, we propose an efficient parallel implementation of the nonsymmetric block Lanczos algorithm for computing a few extreme eigenvalues, and the corresponding eigenvectors, of real nonhermitian matrices on distributed memory multicomputers. The reorganisation of the block Lanczos algorithm makes it possible to exploit coarse-grained parallelism and to harness the computational power of the target architectures. The computational kernels of the algorithm are matrix-matrix multiplications, with dense and sparse factors, QR factorisation and singular value decomposition. To reduce the total amount of communication involved in the matrix-matrix multiplication with a sparse factor, we substitute each matrix appearing in the algorithm with its transpose. Then, we develop an efficient parallelisation of the matrix-matrix multiplication when the second factor is sparse. Some other linear algebra operations are performed using the ScaLAPACK library. The parallel eigensolver has been tested on a cluster of PCs. All reported results show that the proposed algorithm is efficient on the target architectures for problems of adequate dimension.

Keywords: nonsymmetric eigensolver; parallel block Lanczos algorithm; distributed memory architectures; matrix-matrix multiplication

Task clustering and scheduling for distributed memory parallel architectures

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1996, Vol. 7, No. 1, pp. 46-55

Authors: Palis, MA; Liou, JC; Wei, DSL. Affiliation: UNIV AIZU, SCH ENGN & COMP SCI, AIZU WAKAMATSU, FUKUSHIMA 965, JAPAN

This paper addresses the problem of scheduling parallel programs represented as directed acyclic task graphs for execution on distributed memory parallel architectures. Because of the high communication overhead in existing parallel machines, a crucial step in scheduling is task clustering, the process of coalescing fine grain tasks into single coarser ones so that the overall execution time is minimized. The task clustering problem is NP-hard, even when the number of processors is unbounded and task duplication is allowed. A simple greedy algorithm is presented for this problem which, for a task graph with arbitrary granularity, produces a schedule whose makespan is at most twice optimal. Indeed, the quality of the schedule improves as the granularity of the task graph becomes larger. For example, if the granularity is at least 1/2, the makespan of the schedule is at most 5/3 times optimal. For a task graph with n tasks and e inter-task communication constraints, the algorithm runs in O(n(n lg n + e)) time, which is n times faster than the currently best known algorithm for this problem. Similar algorithms are developed that produce: (1) optimal schedules for coarse grain graphs; (2) 2-optimal schedules for trees with no task duplication; and (3) optimal schedules for coarse grain trees with no task duplication.

Keywords: program task graph; task granularity; task scheduling; distributed memory architectures; approximation algorithms

PARALLEL ALGORITHMS FOR FINDING A SUBOPTIMAL FUNDAMENTAL-CYCLE SET IN A GRAPH

PARALLEL COMPUTING, 1993, Vol. 19, No. 9, pp. 961-971

Authors: CZECH, ZJ; KONOPKA, M; MAJEWSKI, BS. Affiliations: POLISH ACAD SCI, INST COMP SCI, PL-44100 GLIWICE, POLAND; UNIV QUEENSLAND, KEY CTR SOFTWARE TECHNOL, DEPT COMP SCI, ST LUCIA, QLD 4072, AUSTRALIA

The NP-complete problem of finding a fundamental-cycle set of a graph G with minimum total length is considered. Two parallel algorithms of O(n^2/p + n log n log p) and O(m + n^2/p + n log(n/p) + n log p) costs to find a suboptimal solution to this problem are presented (p is the number of processors, n is the number of vertices, and m is the number of edges of G). The algorithms partition the edge set and the vertex set of G among processors, respectively, and use a new heuristic method to solve the problem. A message-based tree-connected MIMD computer is assumed as a model of parallel computations. The algorithms were implemented for a binary tree of 15 transputers, and the experiments were conducted on a wide range of random graphs. The results show that the vertex set partition algorithm, despite its inferior theoretical cost, gives better speedups and finds fundamental-cycle sets of shorter total length than the edge set partition algorithm.
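As a rough illustration of what a fundamental-cycle set is, the sketch below builds one serially from a breadth-first spanning tree: every edge outside the tree closes exactly one cycle. This is not the heuristic or the parallel scheme of the paper, which the abstract does not describe; the function name `fundamental_cycles` and the choice of a BFS tree rooted at vertex 0 are assumptions made for the example, and the input is assumed to be a connected simple graph.

```python
from collections import deque


def fundamental_cycles(n, edges):
    """Fundamental cycles of a connected simple graph w.r.t. a BFS tree.

    Hypothetical helper for illustration only. Vertices are 0..n-1 and
    `edges` is a list of undirected pairs (u, v) without duplicates.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    parent = [-1] * n
    depth = [0] * n
    seen = [False] * n
    seen[0] = True
    tree = set()  # spanning-tree edges, stored as (min, max)
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v] = u
                depth[v] = depth[u] + 1
                tree.add((min(u, v), max(u, v)))
                queue.append(v)

    cycles = []
    for u, v in edges:
        if (min(u, v), max(u, v)) in tree:
            continue
        # Climb from both endpoints until they meet at a common ancestor.
        left, right = [u], [v]
        a, b = u, v
        while a != b:
            if depth[a] >= depth[b]:
                a = parent[a]
                left.append(a)
            else:
                b = parent[b]
                right.append(b)
        # u ... ancestor ... v; the non-tree edge (v, u) closes the cycle.
        cycles.append(left + list(reversed(right[:-1])))
    return cycles


# A square 0-1-2-3 with the diagonal 0-2.
square = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
cycles = fundamental_cycles(4, square)
total_length = sum(len(c) for c in cycles)
```

A connected graph with n vertices and m edges always yields m - n + 1 fundamental cycles; the total length depends on which spanning tree is chosen, which is what makes the minimization problem hard.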

Keywords: DESIGN AND ANALYSIS OF PARALLEL ALGORITHMS; COMPLEXITY OF PARALLEL COMPUTATIONS; DISTRIBUTED MEMORY ARCHITECTURES; TRANSPUTER-BASED SYSTEMS

An efficient parallel algorithm to solve block-Toeplitz systems

JOURNAL OF SUPERCOMPUTING, 2005, Vol. 32, No. 3, pp. 251-278

Authors: Alonso, P; Badía, JM; Vidal, AM. Affiliations: Univ Politecn Valencia, Dept Sistemas Informat & Computac, E-46071 Valencia, Spain; Univ Jaume 1, Dept Ingn & Ciencia Comp, Castellon, Spain

In this paper, we present an efficient parallel algorithm to solve Toeplitz-block and block-Toeplitz systems in distributed memory multicomputers. This algorithm parallelizes the Generalized Schur Algorithm to obtain the semi-normal equations. Our parallel implementation reduces the communication cost and optimizes the memory access. The experimental analysis on a cluster of personal computers shows the scalability of the implementation. The algorithm is portable because it is based on standard tools and libraries, such as ScaLAPACK and MPI.

Keywords: block Toeplitz matrices; Toeplitz block matrices; generalized Schur algorithm; distributed memory architectures

PHR: A parallel hierarchical radiosity system with dynamic load balancing

JOURNAL OF SUPERCOMPUTING, 2005, Vol. 31, No. 3, pp. 249-263

Authors: Sinop, AK; Abaci, T; Akkus, Ü; Gürsoy, A; Güdükbay, U. Affiliations: Bilkent Univ, Dept Comp Engn, TR-06800 Bilkent, Ankara, Turkey; Koc Univ, Dept Comp Engn, TR-34450 Istanbul, Turkey

In this paper, we present a parallel system called PHR for computing hierarchical radiosity solutions of complex scenes. The system is targeted at multi-processor architectures with distributed memory. The system evaluates and subdivides the interactions level by level in a breadth-first fashion, and the interactions are redistributed at the end of each level to keep the load balanced. To allow interactions to travel freely across processors, all the patch data is replicated on all the processors. Hence, the system favors load balancing at the expense of increased communication volume. However, the results show that the overhead of communication is negligible compared with total execution time. We obtained a speed-up of 25 for 32 processors in our test scenes.

Keywords: hierarchical radiosity; distributed memory architectures; load balancing

Exploiting the symmetry in the parallelization of the Jacobi method

PARALLEL COMPUTING, 1997, Vol. 23, No. 1-2, pp. 137-151

Authors: Daoudi, EM; Lakhouaja, A. Affiliation: Université Mohamed Ier, Faculté des Sciences, Département de Mathématiques et d'Informatique, Oujda, Morocco

In this paper, we propose a new parallel algorithm which exploits the symmetry of the Jacobi method for computing the eigenvalues of a real and symmetric square matrix A on a distributed memory multiprocessor.
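For orientation, a minimal serial sketch of the classical cyclic Jacobi eigenvalue method is given here. It only shows the plane rotations that such algorithms are built on; it does not reproduce the parallel scheme of the paper or the way it exploits symmetry, neither of which the abstract spells out. The function name `jacobi_eigenvalues` and the parameters `tol` and `max_sweeps` are assumptions made for the example.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations.

    Hypothetical helper for illustration only; `matrix` is a list of rows.
    """
    n = len(matrix)
    a = [row[:] for row in matrix]  # work on a copy
    for _ in range(max_sweeps):
        # Stop once the squared off-diagonal mass is negligible.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle chosen so that the rotated entry (p, q) becomes zero.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                # A <- A J: update columns p and q.
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                # A <- J^T A: update rows p and q.
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


small = jacobi_eigenvalues([[2.0, 1.0], [1.0, 2.0]])
block = jacobi_eigenvalues([[2.0, 0.0, 0.0], [0.0, 3.0, 4.0], [0.0, 4.0, 9.0]])
```

The two test matrices have eigenvalues 1, 3 and 1, 2, 11 respectively, so the returned lists should match those values up to rounding.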

Keywords: Jacobi method; eigenvalues problem; parallel algorithms; distributed memory architectures

BROADCASTING IN WRAPAROUND MESHES WITH PARALLEL MONODIRECTIONAL LINKS

PARALLEL COMPUTING, 1992, Vol. 18, No. 6, pp. 639-648

Authors: BERMOND, JC; MICHALLON, P; TRYSTRAM, D. Affiliations: INST MATH APPL GRENOBLE, LMC, 46 AVE FELIX VIALLET, F-38031 GRENOBLE, FRANCE; CNRS, I3S, F-06560 VALBONNE, FRANCE

In this paper we give an algorithm to broadcast a message in a wraparound mesh distributed-memory parallel architecture with parallel monodirectional links. This algorithm uses a general strategy based on the diffusion of the message in edge-disjoint spanning trees. We first present in this setting the results of Saad and Schultz and the improvements obtained by Simmen. We then give an asymptotically optimal broadcasting algorithm improving the preceding results. It uses in the wraparound mesh the constructions of two edge-disjoint spanning trees rooted at a given node and of minimum depth.

Keywords: COMMUNICATION; INTERCONNECTION NETWORKS; DISTRIBUTED MEMORY ARCHITECTURES; WRAPAROUND MESH; TOROIDAL GRID; BROADCASTING; SPANNING TREES

Reliable performance prediction for multigrid software on distributed memory systems

ADVANCES IN ENGINEERING SOFTWARE, 2011, Vol. 42, No. 5, pp. 247-258

Authors: Romanazzi, Giuseppe; Jimack, Peter K.; Goodyer, Christopher E. Affiliations: Univ Coimbra, Dept Matemat, CMUC, P-3001454 Coimbra, Portugal; Univ Leeds, Sch Comp, Leeds LS2 9JT, W Yorkshire, England

We propose a model for describing and predicting the parallel performance of a broad class of parallel numerical software on distributed memory architectures. The purpose of this model is to allow reliable predictions to be made for the performance of the software on large numbers of processors of a given parallel system, by only benchmarking the code on small numbers of processors. Having described the methods used, and emphasized the simplicity of their implementation, the approach is tested on a range of engineering software applications that are built upon the use of multigrid algorithms. Despite their simplicity, the models are demonstrated to provide both accurate and robust predictions across a range of different parallel architectures, partitioning strategies and multigrid codes. In particular, the effectiveness of the predictive methodology is shown for a practical engineering software implementation of an elastohydrodynamic lubrication solver. (C) 2010 Civil-Comp Ltd and Elsevier Ltd. All rights reserved.

Keywords: Performance prediction; Parallel engineering software; Multigrid algorithms; Partial differential equations; Distributed memory architectures; Parallel distributed algorithms

Integrating load balancing and locality in the parallelization of irregular problems

FUTURE GENERATION COMPUTER SYSTEMS, 2001, Vol. 17, No. 8, pp. 969-975

Authors: Baiardi, F; Chiti, S; Mori, P; Ricci, L. Affiliation: Univ Pisa, Dipartimento Informat, I-56125 Pisa, Italy

An irregular problem models the evolution of a system where several elements are irregularly distributed in a domain. The evolution modifies this distribution in a way that cannot be foreseen and the behavior of each element depends upon the elements close to it according to a problem dependent relation. Starting from a hierarchical representation of the domain, we define a parallelization methodology that includes a load balancing strategy that preserves this locality property and a strategy to collect information distributed onto the processing nodes. (C) 2001 Elsevier Science B.V. All rights reserved.

Keywords: irregular problems; distributed memory architectures; adaptive multigrid methods; load balancing; locality
