With the proliferation of workstation clusters connected by high-speed networks, providing efficient system support for concurrent applications engaging in nontrivial interaction has become an important problem. Two principal barriers to harnessing parallelism are: (1) efficient mechanisms that achieve transparent dependency maintenance while preserving semantic correctness, and (2) scheduling algorithms that match coupled processes to distributed resources while explicitly incorporating their communication costs. This paper describes a set of performance features, their properties, and their implementation in a system support environment called DUNES that achieves transparent dependency maintenance (IPC, file access, memory access, process creation/termination, process relationships) under dynamic load balancing. The two principal performance features are push/pull-based active and passive end-point caching and communication-sensitive load balancing. Collectively, they mitigate the overhead introduced by the transparent dependency maintenance mechanisms. Communication-sensitive load balancing, in addition, governs the assignment of distributed resources to application processes, taking both communication and computation costs explicitly into account. DUNES' architecture endows commodity operating systems with distributed operating system functionality while remaining transparent to their existing application base, and it preserves semantic correctness with respect to single-processor semantics. We present performance measurements of a UNIX-based implementation on Sparc and x86 architectures over high-speed LAN environments, showing that significant gains in system throughput and parallel application speed-up are achievable.
Parallel programming models should attempt to satisfy two conflicting goals. On one hand, they should hide architectural details so that algorithm designers can write simple, portable programs. On the other hand, models must expose architectural details so that designers can evaluate and optimize the performance of their algorithms. In this paper we experimentally examine the trade-offs made by a simple shared-memory model, QSM, to address this dilemma. The results indicate that analysis under the QSM model yields quite accurate results for reasonable input sizes, and that algorithms developed under QSM achieve performance close to that obtainable through more complex models, such as BSP and LogP.
Most parallel programming models for distributed-memory architectures are based on individual threads interacting via send and receive operations. We show that a more structured model, BSP, gains substantial performance improvements by exploiting the extra information implicit in its structure. In particular, each thread learns something about global state whenever it receives a message; this information can be used to modify its own behavior and improve collective use of the communication system. The programming model's semantics also provide implicit knowledge that can be exploited to increase performance. We show that these effects are useful at the application level by comparing the performance of BSP and MPI implementations of the NAS parallel benchmarks.
The main contribution of this work is to propose a number of broadcast efficient VLSI architectures for computing the sum and the prefix sums of a wk-bit, k ≥ 2, binary sequence using, as basic building blocks, linea...
In this paper we present a load-adaptive parallel algorithm and implementation for computing the 2D Discrete Wavelet Transform (DWT) on multithreading machines. In a 2D DWT computation, the problem size shrinks at every decomposition level and the lengths of the emerging computation paths also vary. The parallel algorithm proposed in this paper dynamically scales itself to the varying problem size. Experimental results are reported based on implementations of the proposed algorithm on a 20-node multithreading emulation platform, EARTH-MANNA. We show that multithreading implementations of the proposed algorithm are at least 2 times faster than the MPI-based message-passing implementations reported in the literature. We further show that the proposed algorithm and implementations scale linearly with respect to problem and machine sizes.
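The shrinking per-level workload this abstract refers to can be made concrete with a toy example. The paper does not specify a wavelet here, so the sketch below uses a one-level 2D Haar transform (all names are illustrative, not the paper's code): the next decomposition level recurses only on the LL subband, which holds a quarter of the input's elements.

```python
def haar2d_level(img):
    """One level of a 2D Haar DWT: filter rows, then columns.
    The low-low subband LL, on which the next level recurses, has a
    quarter of the input's elements -- the per-level shrinkage that a
    load-adaptive scheduler must track."""
    def rows(m):
        # Pairwise averages (low-pass) and differences (high-pass) per row.
        lo = [[(r[i] + r[i + 1]) / 2 for i in range(0, len(r), 2)] for r in m]
        hi = [[(r[i] - r[i + 1]) / 2 for i in range(0, len(r), 2)] for r in m]
        return lo, hi

    def transpose(m):
        return [list(c) for c in zip(*m)]

    low, high = rows(img)                                     # horizontal pass
    LL, LH = (transpose(x) for x in rows(transpose(low)))     # vertical pass
    HL, HH = (transpose(x) for x in rows(transpose(high)))
    return LL, LH, HL, HH

img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
LL, LH, HL, HH = haar2d_level(img)   # each subband is 2x2 for a 4x4 input
```

Each level halves both dimensions, so a static processor assignment sized for level 0 is increasingly wasteful at deeper levels, which is the imbalance the paper's dynamic scaling addresses.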
PM-PVM is a portable implementation of PVM designed to work on SMP architectures supporting multithreading. PM-PVM portability is achieved by implementing the PVM functionality on top of a reduced set of parallel programming primitives. Within PM-PVM, PVM tasks are mapped onto threads and the message-passing functions are implemented using shared memory. Three implementation approaches to the PVM message-passing functions have been adopted. In the first, a single message copy in memory is shared by all destination tasks. The second replicates the message for every destination task but requires less synchronization. The third approach combines features of the two previous ones. Experimental results comparing the performance of PM-PVM and PVM applications running on a 4-processor SPARCstation 20 under Solaris 2.5 show that PM-PVM can produce execution times up to 54% smaller than PVM.
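The first two delivery strategies the abstract names can be sketched with tasks as threads and per-task mailboxes as queues. This is a hypothetical illustration of the trade-off, not PM-PVM's actual API:

```python
import queue

# Hypothetical sketch: two message-delivery strategies for a threaded PVM,
# with each task's mailbox modeled as a queue.
mailboxes = {tid: queue.Queue() for tid in ("t1", "t2", "t3")}

def send_shared(msg, dests):
    # Approach 1: a single message copy, shared by all destination tasks.
    # Cheap to send, but receivers must treat the buffer as read-only and
    # must synchronize on when it can be reclaimed.
    for tid in dests:
        mailboxes[tid].put(msg)

def send_replicated(msg, dests):
    # Approach 2: replicate the message for every destination.  More
    # copying, but each receiver owns its buffer, so less synchronization.
    for tid in dests:
        mailboxes[tid].put(bytes(msg))

payload = bytearray(b"hello")
send_shared(payload, ["t1", "t2"])
a, b = mailboxes["t1"].get(), mailboxes["t2"].get()   # same object twice
send_replicated(payload, ["t1", "t2"])
c, d = mailboxes["t1"].get(), mailboxes["t2"].get()   # independent copies
```

The abstract's third approach combines features of both; the details of that combination are given in the paper itself.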
We present practical parallel algorithms using prefix computations for various problems that arise in pairwise comparison of biological sequences. We consider both constant and affine gap penalty functions, full-sequence and subsequence matching, and space-saving algorithms. The best known sequential algorithms solve these problems in O(mn) time and O(m+n) space, where m and n are the lengths of the two sequences. All the algorithms presented in this paper are time-optimal with respect to the best known sequential algorithms and can use O(n/log n) processors, where n is the length of the larger sequence. While optimal parallel algorithms for many of these problems are known, we use a simple framework and demonstrate how these problems can be solved systematically using repeated parallel prefix operations. We also present a space-saving algorithm that uses O(m+n/p) space and runs in optimal time, where p is the number of processors used.
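The primitive this framework reuses is an ordinary inclusive parallel prefix (scan) over an associative operator. A minimal sketch, with the log-depth Hillis–Steele schedule written out sequentially (each round's list comprehension is what would run in parallel across processors):

```python
def inclusive_scan(xs, op):
    """Inclusive prefix over an associative operator `op`.
    Runs ceil(log2(n)) rounds; within a round, every element can be
    updated independently, which is the parallel step."""
    result = list(xs)
    step = 1
    while step < len(result):
        result = [
            result[i] if i < step else op(result[i - step], result[i])
            for i in range(len(result))
        ]
        step *= 2
    return result

sums = inclusive_scan([1, 2, 3, 4], lambda a, b: a + b)   # running sums
peaks = inclusive_scan([3, 1, 4, 1, 5], max)              # running maxima
```

Swapping in operators such as max over (score, position) pairs is the kind of reuse that lets one scan routine serve the different alignment recurrences the paper considers.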
OptSolve++ is a set of C++ class libraries for nonlinear optimization and root-finding. The primary components include TxOptSlv (optimizer and solver classes), TxFunc (functor classes used to wrap user- defined functi...
In this work we present a procedure for automatic parallel code generation in the case of algorithms described through Set of Affine Recurrence Equations (SARE); starting from the original SARE description in an N-dimensional iteration space, the algorithm is converted into a parallel code for an m-dimensional distributed memory parallel machine (m...