This paper presents the parallel implementation of a boundary element code for the solution of 2D elastostatic problems using linear elements. The original code is described in detail in a reference text in the area [Boundary Element Techniques: Theory and Applications in Engineering, 1984]. The Fortran code is reviewed and rewritten to run on shared and distributed memory systems using standard and portable libraries: OpenMP, LAPACK and ScaLAPACK. The implementation process provides guidelines for developing parallel applications of the Boundary Element Method, applicable to many science and engineering problems. Numerical experiments on an SGI Origin 2000 show the effectiveness of the proposed approach. (C) 2004 Elsevier Ltd. All rights reserved.
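As a rough illustration of the distributed-memory side of such a parallelization, the sketch below partitions the assembly of the dense BEM influence matrix by blocks of rows (collocation points) across MPI ranks. assemble_row(), the dummy coefficients, and the problem size are illustrative placeholders, and the paper itself hands the distributed system to ScaLAPACK rather than to the hand-rolled layout shown here.

```c
/* Hypothetical sketch: row-block distribution of the dense BEM influence
 * matrix across MPI ranks.  Every row depends only on the (replicated)
 * mesh geometry, so the assembly itself needs no communication. */
#include <mpi.h>
#include <stdlib.h>

/* Stand-in for the boundary integration that fills one matrix row. */
static void assemble_row(int i, int n, double *row)
{
    for (int j = 0; j < n; ++j)
        row[j] = (i == j) ? n : 1.0 / (1.0 + abs(i - j));
}

int main(int argc, char **argv)
{
    int rank, size, n_nodes = 4096;           /* illustrative problem size */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Contiguous block of rows owned by this rank. */
    int rows_per_rank = (n_nodes + size - 1) / size;
    int first = rank * rows_per_rank;
    int last  = first + rows_per_rank;
    if (first > n_nodes) first = n_nodes;
    if (last  > n_nodes) last  = n_nodes;

    double *local_H = malloc((size_t)(last - first) * n_nodes * sizeof *local_H);

    for (int i = first; i < last; ++i)
        assemble_row(i, n_nodes, &local_H[(size_t)(i - first) * n_nodes]);

    /* ... pass the distributed matrix to a parallel solver (the paper
     * uses ScaLAPACK, with its own block-cyclic layout) ... */

    free(local_H);
    MPI_Finalize();
    return 0;
}
```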
Large-scale parallelized distributed computing has been implemented in the message passing interface (MPI) environment to numerically solve eight reaction-diffusion equations representing the anatomy and treatment of breast cancer. The numerical algorithm is perturbed functional iterations (PFI), which is completely matrix-free. Fully distributed computations with multiple processors have been implemented on a large scale in the serial PFI code in the MPI environment. The technique of implementation is general and can be applied to any serial code. This has been validated by comparing the computed results from the serial code with those from the MPI version of the parallel code.
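The abstract does not give the PFI iteration itself, but the distributed, matrix-free structure it describes follows the familiar slab-decomposition pattern sketched below: each rank owns a strip of grid points and exchanges one-cell halos with its neighbours before every sweep. The explicit update, the single 1D equation, and all constants are assumptions made purely for illustration; they are not the paper's eight-equation model or its PFI scheme.

```c
/* Minimal sketch of a distributed, matrix-free update: 1D domain
 * decomposition with ghost-cell (halo) exchange between neighbours. */
#include <mpi.h>
#include <stdlib.h>

#define NLOC 1000          /* interior points per rank (assumed)   */
#define DT   1e-5          /* time step (assumed, stable)          */
#define DX   1e-2          /* grid spacing (assumed)               */
#define DIFF 1.0           /* diffusion coefficient (assumed)      */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    /* u[0] and u[NLOC+1] are ghost cells. */
    double *u    = calloc(NLOC + 2, sizeof *u);
    double *unew = calloc(NLOC + 2, sizeof *unew);
    for (int i = 1; i <= NLOC; ++i) u[i] = 0.1;   /* dummy initial data */

    for (int step = 0; step < 1000; ++step) {
        /* Halo exchange with both neighbours before the sweep. */
        MPI_Sendrecv(&u[1],      1, MPI_DOUBLE, left,  0,
                     &u[NLOC+1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[NLOC],   1, MPI_DOUBLE, right, 1,
                     &u[0],      1, MPI_DOUBLE, left,  1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* Matrix-free update; the reaction term u*(1-u) is a stand-in. */
        for (int i = 1; i <= NLOC; ++i)
            unew[i] = u[i] + DT * (DIFF * (u[i-1] - 2.0*u[i] + u[i+1]) / (DX*DX)
                                   + u[i] * (1.0 - u[i]));
        double *tmp = u; u = unew; unew = tmp;
    }

    free(u); free(unew);
    MPI_Finalize();
    return 0;
}
```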
Beowulf systems, and other proprietary approaches, are placing systems with four or more CPUs in the hands of many researchers and commercial users. In the near future, systems with hundreds of CPUs will become commonly available, with some programmers dealing with tens of thousands of CPUs. The debugging methods used on these systems are a combination of the traditional methods used for debugging single processes and ad-hoc methods to help the user cope with the multitude of processes. Programmers are usually familiar with a single-process debugger and would like to use it (with minimal user-visible extensions) to debug their distributed programs. We present a set of modifications to a traditional debugger that makes it capable of debugging applications running on thousands of processes. Our parallel debugger is composed of individual fully functional debuggers connected with an n-ary aggregating network. This permits us to present to users the results from each debugger at the same time in an aggregated fashion. Users get a global view of the application and can easily see if a given parameter has a value different from what they expect or from the other processes. Users can then focus on the process sets of interest and investigate the problem. One challenge when debugging thousands of processes is dealing with the amount of output coming from all the debuggers. We present methods to aggregate the overwhelming amount of output from the debuggers into a more manageable subset, which is presented to the user without losing information. Experiments show that the debugger is scalable to thousands of processors. Both the startup mechanism and the response time to user commands scale well. The conclusions presented regarding the architecture and the new parallel debugger's scalability are not specific to the serial debugger used in our example implementation. (C) 2004 Elsevier Inc. All rights reserved.
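A toy version of the output-aggregation idea is sketched below: every process contributes one line of "debugger output", and the root prints each distinct line once together with the ranks that produced it, so a single deviating process stands out. The flat MPI_Gather, the fixed message length, and the fake variable values are assumptions made for the sake of a short example; the actual tool aggregates hierarchically inside its n-ary tree network.

```c
/* Toy output aggregation: group identical per-process lines and report
 * each distinct line once with the list of ranks that emitted it. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MSGLEN 64

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Pretend these are variable values reported by each debugger;
     * rank 3 deliberately differs from the others. */
    char line[MSGLEN];
    snprintf(line, MSGLEN, "x = %d", (rank == 3) ? 99 : 42);

    char *all = NULL;
    if (rank == 0) all = malloc((size_t)size * MSGLEN);
    MPI_Gather(line, MSGLEN, MPI_CHAR, all, MSGLEN, MPI_CHAR, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        int *done = calloc(size, sizeof *done);
        for (int i = 0; i < size; ++i) {
            if (done[i]) continue;
            printf("%s  <- ranks:", &all[i * MSGLEN]);
            for (int j = i; j < size; ++j)
                if (!done[j] && strcmp(&all[i * MSGLEN], &all[j * MSGLEN]) == 0) {
                    printf(" %d", j);
                    done[j] = 1;
                }
            printf("\n");
        }
        free(done); free(all);
    }
    MPI_Finalize();
    return 0;
}
```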
We present solutions to statically load-balance scatter operations in parallel codes run on grids. Our load-balancing strategy is based on modifying the data distributions used in scatter operations. We study the replacement of scatter operations with parameterized scatters, allowing custom distributions of data. The paper presents: (1) a general algorithm which finds an optimal distribution of data across processors; (2) a quicker heuristic with performance guarantees, relying on hypotheses about the communications and computations; (3) a policy on the ordering of the processors. Experimental results with an MPI scientific code illustrate the benefits obtained from our load-balancing strategy. (C) 2004 Elsevier B.V. All rights reserved.
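The core of such a parameterized scatter can be expressed directly with MPI_Scatterv, as in the minimal sketch below: the per-processor counts are made proportional to assumed relative processor speeds instead of being uniform. The speed values, the item count, and the remainder policy are all illustrative; the paper's algorithm and heuristic compute the distribution (and the processor ordering) far more carefully.

```c
/* Replace a uniform MPI_Scatter with a parameterized MPI_Scatterv whose
 * counts follow the (assumed) relative speeds of the processors. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    const int N = 100000;                 /* total number of data items */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Assumed relative speed of each processor (dummy values). */
    double *speed = malloc(size * sizeof *speed);
    double total = 0.0;
    for (int p = 0; p < size; ++p) { speed[p] = 1.0 + p % 3; total += speed[p]; }

    /* Counts proportional to speed; displacements are the running sum. */
    int *counts = malloc(size * sizeof *counts);
    int *displs = malloc(size * sizeof *displs);
    int assigned = 0;
    for (int p = 0; p < size; ++p) {
        counts[p] = (int)(N * speed[p] / total);
        displs[p] = assigned;
        assigned += counts[p];
    }
    counts[size - 1] += N - assigned;     /* remainder goes to the last rank */

    double *sendbuf = NULL;
    if (rank == 0) sendbuf = calloc(N, sizeof *sendbuf);
    double *recvbuf = malloc(counts[rank] * sizeof *recvbuf);

    MPI_Scatterv(sendbuf, counts, displs, MPI_DOUBLE,
                 recvbuf, counts[rank], MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* ... each rank now works on counts[rank] items ... */

    free(speed); free(counts); free(displs); free(recvbuf);
    if (rank == 0) free(sendbuf);
    MPI_Finalize();
    return 0;
}
```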
The execution of a client/server application involving database access requires a sequence of database transaction events (or, T-events), called a transaction sequence (or, T-sequence). A client/server database application may have nondeterministic behavior in that multiple executions thereof with the same input may produce different T-sequences. We present a framework for testing all possible T-sequences of a client/server database application. We first show how to define a T-sequence in order to provide sufficient information to detect race conditions between T-events. Second, we design algorithms to change the outcomes of race conditions in order to derive race variants, which are prefixes of other T-sequences. Third, we develop a prefix-based replay technique for race variants derived from T-sequences. We prove that our framework can derive all the possible T-sequences in cases where every execution of the application terminates. A formal proof and an analysis of the proposed framework are given. We describe a prototype implementation of the framework and present experimental results obtained from it.
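To make the notion of a race variant concrete, the toy sketch below reduces a T-event to the issuing client and the data item it touches, flags two events of different clients on the same item as racing, and emits, for each race, the prefix of the recorded T-sequence with the later racing event scheduled in place of the earlier one. Every structure and the race rule itself are invented for illustration and are much simpler than the framework's actual definitions and replay mechanism.

```c
/* Toy derivation of race variants (prefixes for controlled replay)
 * from one recorded T-sequence. */
#include <stdio.h>

typedef struct { int client; char item; } TEvent;

/* Toy race rule: different clients touching the same data item. */
static int races(TEvent a, TEvent b)
{
    return a.client != b.client && a.item == b.item;
}

int main(void)
{
    /* A recorded T-sequence from one execution. */
    TEvent seq[] = { {1, 'x'}, {2, 'y'}, {2, 'x'}, {1, 'y'} };
    int n = 4;

    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            if (races(seq[i], seq[j])) {
                /* Variant: prefix seq[0..i-1] followed by seq[j]. */
                printf("variant for race (%d,%d):", i, j);
                for (int k = 0; k < i; ++k)
                    printf(" (c%d,%c)", seq[k].client, seq[k].item);
                printf(" (c%d,%c)\n", seq[j].client, seq[j].item);
            }
    return 0;
}
```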
With the progress of research on cluster computing, many universities have begun to offer various courses covering cluster computing. A wide variety of content can be taught in these courses. Because of this variation, a difficulty that arises is the selection of appropriate course material. The selection is complicated because some content in cluster computing may also be covered by other courses in the undergraduate curriculum, and the background of students enrolled in cluster computing courses varies. These aspects of cluster computing make the development of good course material difficult. Combining experiences in teaching cluster computing at universities in the United States and Australia, this paper presents prospective topics in cluster computing and a wide variety of information sources from which instructors can choose. The course material is described in relation to the knowledge units of the joint IEEE Computer Society and Association for Computing Machinery (ACM) Computing Curricula 2001, and includes system architecture, parallel programming, algorithms, and applications. Instructors can select units in each of the topical areas and develop their own syllabi to meet course objectives. The authors share their experiences in teaching cluster computing and the topics chosen, depending on course objectives.
The present paper introduces the main steps towards the parallelization of existing boundary element codes, using standard and portable libraries for writing shared memory parallel programs: OpenMP and LAPACK. Parallel programming techniques can have a great impact on application performance, and OpenMP facilitates these improvements. Since such procedures are not widespread among BEM practitioners, the authors introduce these techniques into a well-known BEM program, described in detail by Brebbia and Dominguez [Boundary Elements: An Introductory Course. CMP, Southampton, 1992]. The code is herein reviewed and rewritten to achieve high performance on shared memory systems. The step-by-step implementation process provides guidelines to develop efficient parallel BEM codes, applicable to many science and engineering problems. Numerical experiments on an SGI Origin 2000 and a NEC SX-6 show the effectiveness of the proposed approach. (C) 2004 Elsevier Ltd. All rights reserved.
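A minimal sketch of that shared-memory pattern is given below: the collocation-point loop that fills the dense system is embarrassingly parallel and gets an OpenMP directive, after which the system is handed to LAPACK (dgesv, called here through LAPACKE from C rather than from the original Fortran). fill_row() and its coefficients are placeholders for the real element integration.

```c
/* OpenMP-parallel assembly of a dense system followed by a LAPACK solve. */
#include <omp.h>
#include <lapacke.h>
#include <stdlib.h>

/* Stand-in for the boundary integration that fills one row and its RHS. */
static void fill_row(int i, int n, double *row, double *rhs_i)
{
    for (int j = 0; j < n; ++j)
        row[j] = (i == j) ? n : 1.0 / (1.0 + abs(i - j));
    *rhs_i = 1.0;
}

int main(void)
{
    int n = 2000;                                    /* illustrative size */
    double *A = malloc((size_t)n * n * sizeof *A);   /* row-major storage */
    double *b = malloc(n * sizeof *b);
    lapack_int *ipiv = malloc(n * sizeof *ipiv);

    /* Rows are independent, so the assembly loop parallelizes directly. */
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n; ++i)
        fill_row(i, n, &A[(size_t)i * n], &b[i]);

    /* LU factorization and solve via LAPACK. */
    LAPACKE_dgesv(LAPACK_ROW_MAJOR, n, 1, A, n, ipiv, b, 1);

    free(A); free(b); free(ipiv);
    return 0;
}
```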
Processor idling due to communication delays and load imbalances is among the major factors that affect the performance of parallel programs. The need to optimize performance often forces programmers to sacrifice modularity. This paper focuses on the performance benefits of message-driven execution, particularly for large parallel programs composed of multiple libraries and modules. We examine message-driven execution in the context of a parallel object-based language, but the analysis applies to other models, such as multithreading, as well. We argue that modularity and efficiency, in the form of overlapping communication latencies and processor idle times, can be achieved much more easily in message-driven execution than in the message-passing SPMD style. Message-driven libraries are easier to compose into larger programs, and they do not require one to sacrifice performance in order to break a program into multiple modules. One can overlap the idle times across multiple independent modules. We demonstrate the performance and modularity benefits of message-driven execution with simulation studies, and we show why it is not adequate to emulate message-driven execution with the message-passing SPMD style. During these studies, it became clear that the usual criteria of minimizing the completion time and reducing the critical path that are used in SPMD programs are not exactly suitable for message-driven programs. (C) 2004 Elsevier Inc. All rights reserved.
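The scheduling idea can be caricatured in a few lines of C, as below: work from two independent "modules" arrives as messages that carry their own handlers, and a single scheduler loop simply dispatches whatever is available next, so one module's wait for remote data is filled with the other module's work. This is a conceptual sketch only, not the object-based runtime studied in the paper.

```c
/* Toy message-driven scheduler: messages carry their own handlers and
 * are dispatched in arrival order, interleaving independent modules. */
#include <stdio.h>

typedef struct Msg {
    void (*handler)(int);
    int payload;
    struct Msg *next;
} Msg;

static Msg *head = NULL, *tail = NULL;

static void enqueue(Msg *m)
{
    m->next = NULL;
    if (tail) tail->next = m; else head = m;
    tail = m;
}

static void module_a(int x) { printf("A handles %d\n", x); }
static void module_b(int x) { printf("B handles %d\n", x); }

int main(void)
{
    /* Messages "arrive" interleaved, as remote replies for A and B
     * would come back at different times. */
    Msg msgs[] = { {module_a, 1}, {module_b, 10}, {module_b, 11}, {module_a, 2} };
    for (int i = 0; i < 4; ++i) enqueue(&msgs[i]);

    /* The scheduler never blocks on a particular module: it simply
     * dispatches the next available message. */
    while (head) {
        Msg *m = head;
        head = head->next;
        if (!head) tail = NULL;
        m->handler(m->payload);
    }
    return 0;
}
```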
The FETI method with the natural coarse grid is combined with the penalty method to develop an efficient solver for elliptic variational inequalities. A proof is given that a prescribed bound on the norm of the feasibility of the solution may be achieved with a value of the penalty parameter that does not depend on the discretization parameter, and that an approximate solution with the prescribed bound on the violation of the Karush-Kuhn-Tucker conditions may be found in a number of steps that does not depend on the discretization parameter. Results of numerical experiments with the parallel solution of a model problem discretized by more than eight million nodal variables are in agreement with the theory and demonstrate numerically both the optimality of the penalty and the scalability of the algorithm presented. Copyright (C) 2004 John Wiley & Sons, Ltd.
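For orientation only, the generic shape of a penalty reformulation of such a constrained problem is sketched below; the quadratic objective, the feasible set K, and the distance-based penalty term are textbook notation, not the paper's actual FETI dual formulation with its natural coarse grid.

```latex
\[
  \min_{u \in K} \tfrac12\, u^{\top} A u - b^{\top} u
  \quad\longrightarrow\quad
  \min_{u} \tfrac12\, u^{\top} A u - b^{\top} u
           + \tfrac{\rho}{2}\,\operatorname{dist}(u, K)^{2}
\]
```

The result summarized above then says that the penalty parameter needed to reach a prescribed feasibility tolerance does not grow as the mesh is refined.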
Although the InfiniBand Architecture is relatively new in the high performance computing area, it offers many features which help us to improve the performance of communication subsystems. One of these features is Remote Direct Memory Access (RDMA) operations. In this paper, we propose a new design of MPI over InfiniBand which brings the benefit of RDMA not only to large messages, but also to small and control messages. We also achieve better scalability by exploiting application communication patterns and combining send/receive operations with RDMA operations. Our RDMA-based MPI implementation achieves a latency of 6.8 μs for small messages and a peak bandwidth of 871 million bytes/sec. Performance evaluation shows that for small messages, our RDMA-based design can reduce the latency by 24%, increase the bandwidth by over 104%, and reduce the host overhead by up to 22% compared with the original design. For large data transfers, we improve performance by reducing the time for transferring control messages. We have also shown that our new design is beneficial to MPI collective communication and the NAS Parallel Benchmarks.
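Latency and bandwidth figures of this kind are typically obtained with a ping-pong microbenchmark such as the sketch below, which times round trips between two ranks. It measures whatever MPI library it is linked against and does not itself touch the InfiniBand verbs layer; the message size and iteration count are arbitrary.

```c
/* Ping-pong microbenchmark: one-way latency and bandwidth between
 * ranks 0 and 1.  Run with at least two MPI processes. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, iters = 1000, nbytes = 4;   /* small-message case */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "needs at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    char *buf = malloc(nbytes);
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    for (int i = 0; i < iters; ++i) {
        if (rank == 0) {
            MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    double t1 = MPI_Wtime();
    if (rank == 0) {
        double one_way = (t1 - t0) / (2.0 * iters);          /* seconds */
        printf("latency: %.2f us, bandwidth: %.1f MB/s\n",
               one_way * 1e6, nbytes / one_way / 1e6);
        /* rerun with a large nbytes (e.g. 1 MB) for the bandwidth figure */
    }
    free(buf);
    MPI_Finalize();
    return 0;
}
```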