检索结果-内蒙古大学图书馆

5th International Workshop on Applied parallel Computing, PARA 2000

作者： Brunst, Holger Nagel, Wolfgang E. Seidl, Stephan Center for High Performance Computing Dresden University of Technology Dresden01062 Germany

ISBN: (纸本)354041729X

Performance tuning of parallel programs, considering the current status and future developments in parallel programming paradigms and parallel system architectures, remains an important topic even if the single CPU performance is doubling every 18 months. Based on a brief summary of state of the art parallel programming techniques, new performance tuning aspects will be identified. The main part of the paper concentrates on how to deal with these aspects by means of new performance analysis and tuning concepts. First tool developments are presented where part of these concepts are already implemented. Finally, an existing scientific parallel application will be presented with respect to its performance tuning stages which were carried out at our center. © Springer-Verlag Berlin Heidelberg 2001.

关键词： parallel programming

来源：评论

学校读者我要写书评

暂无评论

Nepal – Nested data parallelism in haskell 7

引用

7th European Conference on parallel Computing, Euro-Par 2001

作者： Chakravarty, Manuel M. T. Keller, Gabriele Lechtchinsky, Roman Pfannenstiel, Wolf University of New South Wales Australia University of Technology Sydney Australia Technische Universität Berlin Germany IT Service Omikron GmbH Berlin Germany

ISBN: (纸本)3540424954

This paper discusses an extension of Haskell by support for nested data-parallel programming in the style of the special-purpose language Nesl. The extension consists of a parallel array type, array comprehensions, and primitive parallel array operations. This extension brings a hitherto unsupported style of parallel programming to Haskell. Moreover, nested data parallelism should receive wider attention when available in a standardised language like Haskell. © Springer-Verlag Berlin Heidelberg 2001.

关键词： parallel programming

来源：评论

学校读者我要写书评

暂无评论

A low-cost approach towards mixed task and data parallel scheduling

A low-cost approach towards mixed task and data parallel sch...

引用

International Conference on parallel Processing, ICPP 2001

作者： Rədulescu, A. Van Gemund, A.J.C. Faculty of Information Technology and Systems Delft University of Technology Netherlands

ISBN: (纸本)0769512577

A relatively new trend in parallel programming scheduling is the so-called mixed task and data scheduling. It has been shown that mixing task and data parallelism to solve large computational applications often yields better speedups compared to either applying more task parallelism or pure data parallelism. In this paper we present a new compile-time heuristic, named critical path and allocation (CPA), for scheduling data-parallel task graphs. Designed to have a very low cost, its complexity is much lower compared to existing approaches, such as TSAS, TwoL or CPR, by one order of magnitude or even more. Experimental results based on graphs derived from real problems as well as synthetic graphs, show that the performance loss of CPA relative to the above algorithms does not exceed 50%. These results are also confirmed by performance measurements of two real applications (i.e., complex matrix multiplication and Strassen matrix multiplication) running on a cluster of workstations. © 2001 IEEE.

关键词： parallel programming

来源：评论

学校读者我要写书评

暂无评论

Mechanisms of parallel computing organization for neurocluster 6

引用

6th International Conference on parallel Computing Technologies, PaCT 2001

作者： Babenko, L.K. Chefranov, A.G. Fedorov, P.A. Korobko, A. Yu Makarevich, O.B. Taganrog State University of Radio Engineering Taganrog347928 Russia

ISBN: (纸本)9783540425229

Neurocluster based on NM6403 neuroprocessors architecture, system software and programming technology are discussed. Special attention was paid to operating system structure, data and control flow between subsystems, internal data structures, system topology, programming language and general parallel programming ideas. © Springer-Verlag Berlin Heidelberg 2001.

关键词： parallel programming

来源：评论

学校读者我要写书评

暂无评论

parallel application experience with replicated method invocation

引用

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE 2001年第8-9期13卷 681-712页

作者： Maassen, J Kielmann, T Bal, HE Vrije Univ Amsterdam Div Math & Comp Sci Amsterdam Netherlands

We describe and evaluate a new approach to object replication in Java, aimed at improving the performance of parallel programs. Our programming model allows the programmer to define groups of objects that can be replicated and updated as a whole, using reliable, totally-ordered broadcast to send update methods to all machines containing a copy, The model has been implemented in the Manta high-performance Java system. We evaluate system performance both with microbenchmarks and with a set of five parallel applications. For the applications, we also evaluate ease of programming, compared to RMI implementations. We present performance results for a Myrinet-based workstation cluster as well as for a wide-area distributed system consisting of four such clusters. The microbenchmarks show that updating a replicated object on 64 machines only takes about three times the RMI latency in Manta, Applications using Manta's object replication mechanism perform at least as fast as manually optimized versions based on RMI, while keeping the application code as simple as with naive versions that use shared objects without taking locality into account. Using a replication mechanism in Manta's runtime system enables several unmodified applications to run efficiently even on the wide-area system. Copyright (C) 2001 John Wiley & Sons, Ltd.

关键词： Java object replication RMI parallel programming

来源：评论

学校读者我要写书评

暂无评论

parallel programming and performance evaluation with the URSA tool family

引用

INTERNATIONAL JOURNAL OF parallel programming 1998年第5期26卷 541-561页

作者： Park, I Voss, M Armstrong, B Eigenmann, R Purdue Univ Sch Elect & Comp Engn W Lafayette IN 47907 USA

This paper contributes to the solution of several open problems with parallel programming tools and their integration with performance evaluation environments. First, we propose interactive compilation scenarios instead of the usual black-box-oriented use of compiler tools. In such scenarios, information gathered by the compiler and the compiler's reasoning are presented to the user in meaningful ways and on-demand. Second, a tight integration of compilation and performance analysis tools is advocated. Many of the existing, advanced instruments for gathering performance results are being used in the presented environment and their results are combined in integrated views with compiler information and data from other tools. Initial instruments that assist users in "data mining" this information are presented and the need for much stronger facilities is explained. The URSA Family provides two tools addressing these issues. URSA MINOR supports a group of users at a specific site, such as a research or development project. URSA MAJOR complements this tool by making available the gathered results to the user community at large via the World-wide Web. This paper presents objectives, functionality, experience, and next development steps of the URSA tool family. Two case studies are presented that illustrate the use of the tools for developing and studying parallel applications and for evaluating parallelizing compilers.

关键词： parallel programming programming environment performance evaluation program optimization web-based tool

来源：评论

学校读者我要写书评

暂无评论

DISTRIBUTED SIMULATION OF HIGH-LEVEL ALGEBRAIC PETRI NETS WITH LIMITED CAPACITY PLACES

引用

parallel Algorithms and Applications 2001年第3期16卷 207-241页

作者： K. Djemame[a] [a] School of Computer Studies University of Leeds Leeds UK

The aim of this paper is to search for techniques to accelerate simulations exploiting the parallelism available in current multicomputers, and to use these techniques to study a class of Petri nets called high-level algebraic nets. These nets exploit the rich theory of algebraic specifications for high-level Petri nets. They also gain a great deal of modelling power by representing dynamically changing items asstructured tokenswhereas algebraic specifications turned out to be an adequate and flexible instrument for handlingstructured items. We focus on ECATNets (Extended Concurrent Algebraic Term Nets), a kind of high-level algebraic Petri nets with limited capacity placesThree distributed simulation techniques have been considered: asynchronous conservative, asynchronous optimistic and synchronous. These algorithms have been implemented in a network of workstations with MPI (Message Passing Interface). The influence that factors such as the characteristics of the simulated models, the organisation of the simulators and the characteristics of the target multicomputer have in the performance of the simulations have been measured and characterizedIt is concluded that distributed simulation of ECATNets on a multicomputer system can in fact gain speedup over the sequential simulation, and this can be achieved even for small scale simulation models.

关键词： Distributed simulation High-level Petri nets Rewriting logic parallel programming

来源：评论

学校读者我要写书评

暂无评论

parallel computation of the time-frequency power spectrum: analysis and comparison to the bispectrum

Parallel computation of the time-frequency power spectrum: a...

引用

International Conferences on Info-tech and Info-net (ICII)

作者： K.N. Le G.K. Egan K.P. Dabke Department of Electrical and Computer Systems Engineering Monash University Melbourne Australia

Experiments of large data sets are computationally expensive. Signal processing analysis on a single CPU leads to unacceptably long execution times. The paper presents initial experiments on calculating the time-frequency power spectrum using the coarse-grained parallel programming technique. Experimental speedup factors are given and discussed. The measured speedup factor of the time-frequency power spectrum parallel calculation process is sublinear which indicates that the time-frequency power spectrum is a suitable application for parallel programming. The parallel efficiency is acceptable with the lowest value of 75.1% occurring at N = 10. The maximum speedup factor of 9.1 is obtained when N = 12 at 75.3% of efficiency.

关键词： Concurrent computing Time frequency analysis Signal processing Power engineering computing Signal analysis Fourier transforms Autocorrelation parallel programming Signal detection Kernel

来源：评论

学校读者我要写书评

暂无评论

A distributed implementation of Structured Gamma

A distributed implementation of Structured Gamma

引用

International Conference on parallel and Distributed Systems (ICPADS)

作者： G.A.L. Paillard F.M.G. Franca J.A.M. Filho Universidade Tiradentes Brazil Progranin de Engenharia de Sistemas e Coniputacão Universidade Federal do Rio de Janeiro Brazil Instituto Politécnico Universidade Estadual do Rio de Janeiro Brazil

ISBN: (纸本)0769511538

Presents a distributed implementation of the Structured Gamma programming language, a language based on the Gamma multi-set rewriting paradigm. Structured Gamma offers, in addition to the advantages introduced by Gamma, implicit concurrent behavior and a type system where not only types themselves are defined but also the automatic verification of user-defined types at compilation time. The problems and mechanisms involved in an MPI-based implementation of Structured Gamma using a type-checking engine based on the most general unifier (MGU) are investigated.

关键词： Computer languages parallel programming Engines Robustness Data models Functional programming Chemical elements Arithmetic

来源：评论

学校读者我要写书评

暂无评论

Limits on speculative module-level parallelism in imperative and object-oriented programs on CMP platforms

Limits on speculative module-level parallelism in imperative...

引用

International Conference on parallel Architecture and Compilation Techniques (PACT)

作者： F. Warg P. Stenstrom Department of Computer Engineering Chalmers University of Technology Goteborg Sweden

We consider program modules, e.g. procedures, functions, and methods as the basic method to exploit speculative parallelism in existing codes. We analyze how much inherent and exploitable parallelism exists in a set of C and Java programs on a set of chip-multiprocessor architecture models, and identify what inherent program features, as well as architectural deficiencies, that limit the speedup. Our data complement previous limit studies by indicating that the programming style-object-oriented versus imperative-does not seem to have any noticeable impact on the achievable speedup. Further, we show that as few as eight processors are enough to exploit all of the inherent parallelism. However, memory-level data dependence resolution and thread management mechanisms of recent CMP proposals may impose overheads that severely limit the speedup obtained.

关键词： Programmable logic arrays Yarn parallel processing Java Object oriented modeling Object oriented programming Concurrent computing Proposals parallel programming Predictive models

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：