Search results - Inner Mongolia University Library

A multi-level frontal algorithm for finite element analysis and its implementation on parallel computation

ENGINEERING COMPUTATIONS, 1999, vol. 16, no. 4, pp. 405-427

Authors: Wang, XC; Baggio, P; Schrefler, BA. Affiliations: Dalian Univ Technol, Res Inst Engn Mech, Dalian, Peoples R China; Univ Trent, Dipartimento Ingn Civile & Ambientale, Trent, Italy; Univ Padua, Dipartimento Costruz & Trasporti, Padua, Italy

This paper presents a multi-level frontal algorithm and its implementation and application in parallel computation. A multi-frontal program is given which may be used for unsymmetric finite element matrix equations. The parallel program is developed on a cluster of workstations. The PVM (parallel virtual machine) system is used to handle communications among networked workstations. The method has advantages such as allowing the finite element mesh to be numbered in an arbitrary manner, simple programming organisation, and smaller core requirements and computation times. An implementation of this parallel method on workstations is discussed, and numerical examples demonstrate its speedup and efficiency in comparison with a general domain decomposition method based on band matrix methods.

Keywords: algorithms; finite element method; heat transfer; parallel computing; parallel programming

CoCa: a parallelization model for high-energy physics

IEEE CONCURRENCY, 1999, vol. 7, no. 2, pp. 38-46

Authors: van der Stok, P; Argante, E; Willers, I. Affiliations: Eindhoven Univ Technol, Dept Comp Sci, NL-5600 MB Eindhoven, Netherlands; CERN, European Lab Particle Phys, Div EP CMC, CH-1211 Geneva 23, Switzerland

Software parallelization is required to contend with the increasing scale and complexity of High-Energy Physics experiments. The authors have developed a programming model, Communication Capability (CoCa), which allow...

Keywords: Delay; Throughput; parallel programming; Hardware; Transaction databases; Detectors; Collision mitigation; Concurrent computing; Physics computing; Mesons

Hierarchical Bulk Synchronous Parallel Model and Performance Optimization

Journal of Computer Science & Technology, 1999, vol. 14, no. 3, pp. 224-233

Authors: 黄林鹏 (Huang Linpeng); 孙永强 (Sun Yongqiang); 袁伟 (Yuan Wei). Affiliation: Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200030, P.R. China

Based on the framework of BSP, a Hierarchical Bulk Synchronous Parallel (HBSP) performance model is introduced in this paper to capture the performance optimization problem for various stages in parallel program development and to accurately predict the performance of a parallel program by considering factors causing variance at local computation and global communication. The related methodology has been applied to several real applications and the results show that HBSP is a suitable model for optimizing parallel programs.

Keywords: parallel programming; bulk synchronous parallel model; performance optimization

Empirical performance modeling for parallel weather prediction codes

PARALLEL COMPUTING, 1999, vol. 25, no. 13-14, pp. 2135-2148

Authors: Mierendorff, H; Joppich, W. Affiliation: German Natl Res Ctr Informat Technol GMD, Inst Algorithms & Sci Comp SCAI, D-53754 St Augustin, Germany

Performance modeling for large industrial or scientific codes is of value for program tuning or for selection of new machines when benchmarking is not yet possible. We discuss an empirical method of estimating runtime for certain large parallel programs, where computational work is estimated by regression functions based on measurements and the time cost of communication is modeled by program analysis and benchmarks for communication primitives. The method is demonstrated with the local weather model (LM) of the German Weather Service (DWD) on SP-2, T3E, and SX-4. The method is an economic way of developing performance models because only a moderate number of measurements is required. The resulting model is sufficiently accurate even for very large test cases. (C) 1999 Elsevier Science B.V. All rights reserved.

Keywords: parallel programming; performance modeling; weather prediction code

Communication performance optimisation requires minimising variance

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 1999, vol. 15, no. 3, pp. 453-459

Authors: Donaldson, SR; Hill, JMD; Skillicorn, DB. Affiliations: Univ Oxford, Comp Lab, Programming Res Grp, Oxford OX1 3QD, England; Queens Univ, Dept Comp & Informat Sci, Kingston, ON K7L 3N6, Canada

The cost of communication in message-passing systems can only be computed based on a large number of low-level details. Consequently, the only architectural measure they naturally suggest is a first-order one, latency. We show that a second-order property, the standard deviation of the delivery times, is also of interest. Most importantly, the average performance of a large communication system depends not only on the average performance of its components, but also on the standard deviation of these performances. In other words, building a high-performance system requires components that are themselves high-performance, but their performance must also have small variance. We illustrate this effect using distributions of the BSP g parameter. Lower bounds on the time per unit transfer of communication in large systems can be derived from data measured over single links. (C) 1999 Elsevier Science B.V. All rights reserved.

Keywords: parallel programming; high-performance computing; communication performance; machine architecture; BSP

Automatic model generation for performance estimation of parallel programs

PARALLEL COMPUTING, 1999, vol. 25, no. 6, pp. 667-680

Authors: Mierendorff, H; Schwamborn, H. Affiliation: GMD German Natl Res Ctr Informat Technol, SCAI, D-53754 St Augustin, Germany

A hybrid method for performance modeling of parallel programs is considered where the runtime of large sequential segments is estimated statically and the parallel program structure is evaluated by simulation. The present paper describes a way to generate a model of a given program automatically from the source code, where the user has to provide only values for a small number of variables. This model contains the control structure of the original program and timing information for generalized basic blocks. We consider Fortran programs which are parallelized using the message passing paradigm. A prototype of a tool for automatic model generation has been developed which is able to treat examples of moderate size. (C) 1999 Elsevier Science B.V. All rights reserved.

Keywords: automatic performance modeling; parallel programming

Profiling techniques for communication in fine-grained parallel languages

SOFTWARE-PRACTICE & EXPERIENCE, 1999, vol. 29, no. 6, pp. 519-550

Authors: Scheiman, CJ; Haake, B; Ibel, M; Schauser, KE. Affiliations: Univ Calif Santa Barbara, Dept Comp Sci, Santa Barbara, CA 93106, USA; Calif Polytech State Univ San Luis Obispo, Dept Comp Sci, San Luis Obispo, CA 93407, USA

Fine tuning the performance of large parallel programs is a very difficult task. A profiling tool can provide detailed insight into the utilization and communication of the different processors, which helps identify performance bottlenecks. In this paper we present two profiling techniques for the fine-grained parallel programming language Split-C, which provides a simple global address space memory model. One profiler provides a detailed analysis of a program's execution. The other profiler collects cumulative information. As our experience shows, it is quite challenging to profile programs that make use of efficient, low-overhead communication. We incorporated techniques which minimize profiling effects on the running program, and quantified the profiling overhead. We present several Split-C applications showing that the profiler is useful in determining performance bottlenecks. Copyright (C) 1999 John Wiley & Sons, Ltd.

Keywords: parallel programming; performance analysis; profiling; fine-grained communication; Split-C; Active Messages

Class Act in parallel programming

IEEE Software, 1997, vol. 14, no. 6, p. 107

Author: Schaller, Nan C. Affiliation: Rochester Institute of Technology, United States

Compiling High Performance Fortran for distributed-memory architectures

PARALLEL COMPUTING, 1999, vol. 25, no. 13-14, pp. 1785-1825

Authors: Benkner, S; Zima, H. Affiliations: NEC Europe Ltd, C&C Res Labs, D-53757 St Augustin, Germany; Univ Vienna, Inst Software Technol & Parallel Syst, A-1090 Vienna, Austria

High Performance Fortran (HPF) is a data-parallel language that provides a high-level interface for programming scientific applications, while delegating to the compiler the task of generating explicitly parallel message-passing programs. This paper provides an overview of HPF compilation and runtime technology for distributed-memory architectures, and deals with a number of topics in some detail. In particular, we discuss distribution and alignment processing, the basic compilation scheme and methods for the optimization of regular computations. A separate section is devoted to the transformation and optimization of independent loops with irregular data accesses. The paper concludes with a discussion of research issues and outlines potential future development paths of the language. (C) 1999 Elsevier Science B.V. All rights reserved.

Keywords: High Performance Fortran (HPF); parallel programming; parallelization; code generation; irregular problems; distributed-memory architectures

Parallel implementation of simulated annealing using transaction processing

IEE PROCEEDINGS-COMPUTERS AND DIGITAL TECHNIQUES, 1999, vol. 146, no. 2, pp. 107-113

Authors: Pao, DCW; Lam, SP; Fong, AS. Affiliation: City Univ Hong Kong, Dept Elect Engn, Tat Chee Ave, Kowloon, Peoples R China

Simulated annealing is an effective method for solving large combinatorial optimisation problems. Because of its iterative nature, the annealing process requires a substantial amount of computation time. A new parallel implementation based on the concurrency control theory of database systems is presented; the parallelised annealing process is serialisable. Concurrent updates to the base solution are allowed provided that they do not have data conflicts. Using the travelling salesman problem as the example application, the parallel simulated annealing algorithm is implemented on a Motorola Delta 3000 shared-memory multiprocessor system with eight processors. With a moderate problem size of 400 cities, a speedup efficiency of over 90% is achieved at a high annealing temperature and close to 100% at a low annealing temperature.

Keywords: concurrency control; simulated annealing; parallel implementation; parallel programming; combinatorial optimisation; parallel algorithms; transaction processing; parallelised; Optimisation techniques; shared-memory