This paper explains how efficient support for semi-regular distributions can be incorporated in a uniform compilation framework for hybrid applications. The key focus of this work is showing how, unlike other existing schemes, our scheme is able to minimize preprocessing overheads and maintain sophisticated communication optimizations (such as reduction of inter-processor communication during schedule generation and sharing of communicated information between regular and irregular accesses) even in the presence of semi-regular distributions. It is only natural that the preprocessing overheads associated with semi-regular distributions be intermediate between those involved for regular and irregular distributions. This paper shows how various properties can be inferred for semi-regular distributions. These allow the use of the interval representation, which in turn reduces the preprocessing overhead and makes possible compatible code generation for hybrid references. Experimental results on a 16-processor IBM SP-2 for a number of sparse applications using semi-regular distributions show that our scheme is feasible.
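As a hedged illustration of the interval idea (not the paper's actual compiler machinery), the sketch below keeps a processor's locally owned indices under a semi-regular distribution as sorted, disjoint [lo, hi) intervals instead of an explicit index list; ownership tests and global-to-local translation then reduce to binary searches, which is the kind of preprocessing saving the interval representation enables. All names are illustrative.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative only: a processor's share of a semi-regularly distributed
// array, kept as sorted, disjoint half-open intervals [lo, hi).
struct Interval { int64_t lo, hi; };

class IntervalOwnership {
public:
    explicit IntervalOwnership(std::vector<Interval> ivals)
        : ivals_(std::move(ivals)) {
        // Precompute the local offset at which each interval starts,
        // so global->local translation is O(log n) rather than a scan.
        int64_t off = 0;
        for (const Interval& iv : ivals_) {
            offsets_.push_back(off);
            off += iv.hi - iv.lo;
        }
    }

    // True if this processor owns global index g.
    bool owns(int64_t g) const { return find(g) >= 0; }

    // Translate a global index into a local buffer index (-1 if not owned).
    int64_t toLocal(int64_t g) const {
        int idx = find(g);
        if (idx < 0) return -1;
        return offsets_[idx] + (g - ivals_[idx].lo);
    }

private:
    // Binary search for the interval containing g.
    int find(int64_t g) const {
        auto it = std::upper_bound(ivals_.begin(), ivals_.end(), g,
            [](int64_t v, const Interval& iv) { return v < iv.lo; });
        if (it == ivals_.begin()) return -1;
        --it;
        if (g < it->hi) return static_cast<int>(it - ivals_.begin());
        return -1;
    }

    std::vector<Interval> ivals_;
    std::vector<int64_t> offsets_;
};
```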
The ability to dynamically adapt an unstructured grid (or mesh) is a powerful tool for solving computational problems with evolving physical features; however, an efficient parallel implementation is rather difficult, particularly from the viewpoint of portability on various multiprocessor platforms. We address this problem by developing PLUM, an automatic and architecture-independent framework for adaptive numerical computations in a message-passing environment. Portability is demonstrated by comparing performance on an SP2, an Origin2000, and a T3E, without any code modifications. We also present a general-purpose load balancer that utilizes symmetric broadcast networks (SBN) as the underlying communication pattern, with the goal of providing a global view of system loads across processors. Experiments on an SP2 and an Origin2000 demonstrate the portability of our approach, which achieves superb load balance at the cost of minimal extra overhead.
Some classes of real-time systems function in environments which cannot be modeled with static approaches. In such environments, the arrival rates of events which drive transient computations may be unknown. Also, the periodic computations may be required to process varying numbers of data elements per period, but the number of data elements to be processed in an arbitrary period cannot be known at the time of system engineering, nor can an upper bound be determined for the number of data items; thus, a worst-case execution time cannot be obtained for such periodics. This paper presents middleware services that support such dynamic real-time systems through adaptive resource management. The middleware services have been implemented and employed for components of the experimental Navy system described in [10]. Experimental characterizations show that the services provide timely responses, that they have a low degree of intrusiveness on hardware resources, and that they are scalable.
PM-PVM is a portable implementation of PVM designed to work on SMP architectures supporting multithreading. PM-PVM portability is achieved through the implementation of the PVM functionality on top of a reduced set of parallel programming primitives. Within PM-PVM, PVM tasks are mapped onto threads and the message passing functions are implemented using shared memory. Three implementation approaches of the PVM message passing functions have been adopted. In the first one, a single message copy in memory is shared by all destination tasks. The second one replicates the message for every destination task but requires less synchronization. Finally, the third approach uses a combination of features from the two previous ones. Experimental results comparing the performance of PM-PVM and PVM applications running on a 4-processor Sparcstation 20 under Solaris 2.5 show that PM-PVM can produce execution times up to 54% smaller than PVM.
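A hedged sketch (not PM-PVM's actual code) of the trade-off between the first two delivery approaches: because tasks are threads in one address space, a multicast message can either be kept as a single reference-counted copy shared by all destination tasks, or be replicated once per destination so each receiver owns its buffer and needs no further coordination on the payload. Names are illustrative.

```cpp
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <vector>

// Illustrative shared-memory "mailboxes" for tasks mapped onto threads.
struct Mailbox {
    std::mutex m;
    std::queue<std::shared_ptr<std::string>> q;  // queued message payloads
};

// Approach 1: one copy of the payload, shared (via refcount) by all
// destination tasks; saves memory and copying, but receivers must treat
// the payload as read-only and the refcount is a point of synchronization.
void multicastShared(const std::string& payload, std::vector<Mailbox>& dests) {
    auto msg = std::make_shared<std::string>(payload);  // single copy
    for (Mailbox& mb : dests) {
        std::lock_guard<std::mutex> lk(mb.m);
        mb.q.push(msg);
    }
}

// Approach 2: replicate the payload per destination; more memory and
// copying, but each receiver owns its buffer outright.
void multicastReplicated(const std::string& payload, std::vector<Mailbox>& dests) {
    for (Mailbox& mb : dests) {
        auto copy = std::make_shared<std::string>(payload);  // per-task copy
        std::lock_guard<std::mutex> lk(mb.m);
        mb.q.push(copy);
    }
}
```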
Parallel computing is becoming increasingly central and mainstream, driven both by the widespread availability of commodity SMP and high-performance cluster platforms and by the growing use of parallelism in general-purpose applications such as image recognition, virtual reality, and media processing. In addition to performance requirements, the latter computations impose soft real-time constraints, necessitating efficient, predictable parallel resource management. In this paper, we propose a novel approach for increasing parallel system utilization while meeting application soft real-time deadlines. Our approach exploits the application tunability found in several general-purpose computations. Tunability refers to an application's ability to trade off resource requirements over time while maintaining a desired level of output quality. We first describe language extensions to support tunability in the Calypso system, then characterize the performance benefits of tunability, using a synthetic task system to systematically identify its benefits. Our results show that application tunability is convenient to express and can significantly improve parallel system utilization for computations with predictability requirements.
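As a purely illustrative rendering of the tunability idea (not Calypso's actual language extensions), a tunable computation exposes a quality knob, and a simple feedback loop lowers or raises it each period so the work fits a soft deadline while output quality stays as high as the budget allows. All names and the workload are hypothetical.

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical tunable task: performs `quality` units of refinement per
// period; more units mean better output but more CPU time.
void runPeriod(int quality) {
    volatile long sink = 0;
    for (int q = 0; q < quality; ++q)
        for (long i = 0; i < 200000; ++i) sink += i;   // stand-in for real work
}

int main() {
    using clock = std::chrono::steady_clock;
    const auto deadline = std::chrono::milliseconds(20);  // soft per-period budget
    int quality = 8;                                      // current knob setting

    for (int period = 0; period < 50; ++period) {
        auto start = clock::now();
        runPeriod(quality);
        auto elapsed = clock::now() - start;

        // Feedback: shed quality when the period overruns its budget,
        // claw it back when there is slack.
        if (elapsed > deadline && quality > 1) --quality;
        else if (elapsed < deadline / 2) ++quality;

        std::printf("period %d: quality %d, %lld us\n", period, quality,
            (long long)std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count());
    }
}
```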
Out-of-core rendering techniques are necessary for viewing large, disk-resident volume data sets produced by many scientific applications or high-resolution imaging systems. Traditional visualizers can provide real-time performance but require all of the data to be viewed to reside in RAM. We describe a multithreaded implementation of an out-of-core isosurface renderer that does not impose such restrictions and yet provides performance that scales well with the size of the data. Our renderer uses an interval tree data structure on disk with a layout that reduces disk seeks so as to read only the relevant data from the disk. The resulting low disk latencies are hidden by using prefetching and multithreading to overlap the rendering computations and disk accesses. Our renderer outperforms the out-of-core isosurface renderer of the well-known vtk toolkit by about one order of magnitude, and by several orders of magnitude when compared against the vtk toolkit's optimized in-core algorithm, on large representative CT scan data. The multithreaded version also scales well with the number of threads.
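A hedged, in-memory sketch of the interval-tree query at the heart of such a renderer (the paper's version is disk-resident with a seek-reducing layout): each cell contributes the interval [min scalar, max scalar] of its values, and extracting an isosurface at value v amounts to a stabbing query that returns only the cells whose interval contains v. Names are illustrative.

```cpp
#include <algorithm>
#include <memory>
#include <vector>

// A cell's scalar range; only cells with lo <= isovalue <= hi can
// contribute triangles to the isosurface.
struct CellRange { float lo, hi; int cellId; };

// Minimal centered interval tree for stabbing queries.
struct Node {
    float center;
    std::vector<CellRange> byLo;   // intervals crossing center, sorted by lo
    std::vector<CellRange> byHi;   // same intervals, sorted by hi descending
    std::unique_ptr<Node> left, right;
};

std::unique_ptr<Node> build(std::vector<CellRange> cells) {
    if (cells.empty()) return nullptr;
    auto node = std::make_unique<Node>();
    // Split around the median of the interval midpoints.
    std::vector<float> mids;
    for (const auto& c : cells) mids.push_back(0.5f * (c.lo + c.hi));
    std::nth_element(mids.begin(), mids.begin() + mids.size() / 2, mids.end());
    node->center = mids[mids.size() / 2];

    std::vector<CellRange> leftCells, rightCells;
    for (const auto& c : cells) {
        if (c.hi < node->center) leftCells.push_back(c);
        else if (c.lo > node->center) rightCells.push_back(c);
        else node->byLo.push_back(c);            // crosses the center
    }
    node->byHi = node->byLo;
    std::sort(node->byLo.begin(), node->byLo.end(),
              [](const CellRange& a, const CellRange& b) { return a.lo < b.lo; });
    std::sort(node->byHi.begin(), node->byHi.end(),
              [](const CellRange& a, const CellRange& b) { return a.hi > b.hi; });
    node->left = build(std::move(leftCells));
    node->right = build(std::move(rightCells));
    return node;
}

// Collect all cells whose [lo, hi] contains the isovalue v.
void stab(const Node* n, float v, std::vector<int>& out) {
    if (!n) return;
    if (v < n->center) {
        for (const auto& c : n->byLo) {          // stop at first lo > v
            if (c.lo > v) break;
            out.push_back(c.cellId);
        }
        stab(n->left.get(), v, out);
    } else {
        for (const auto& c : n->byHi) {          // stop at first hi < v
            if (c.hi < v) break;
            out.push_back(c.cellId);
        }
        stab(n->right.get(), v, out);
    }
}
```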
In this paper, we propose a new, efficient logging protocol, called lazy logging, and a fast crash recovery protocol, called prefetch-based crash recovery (PCR), for software distributed shared memory (SDSM). Our lazy logging protocol minimizes failure-free overhead by logging only data indispensable for correct recovery, while our PCR protocol reduces the recovery time by prefetching data according to the future memory access patterns, thus eliminating the memory miss penalty during the recovery process. We have performed experiments on workstation clusters, comparing our protocols against the earlier reduced-stable logging (RSL) protocol by implementing both protocols in TreadMarks, a state-of-the-art SDSM system. The experimental results show that our lazy logging protocol consistently outperforms the RSL protocol. Our protocol increases the execution time by only 1% to 4% during failure-free execution, while the RSL protocol incurs an execution time overhead of 6% to 21% due to its larger log size and higher disk access frequency. Our PCR protocol also outperforms the widely used simple crash recovery protocol by 18% to 57% on all applications examined.
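To make the prefetch-based recovery idea concrete, here is a hedged, greatly simplified sketch, not the paper's protocol: during failure-free execution the log records only the remote data the computation actually consumed, and at recovery that same log doubles as the future access pattern, so pages can be staged a few records ahead of the replay cursor and the replay never stalls on a miss. All names are illustrative.

```cpp
#include <cstdio>
#include <map>
#include <vector>

// Illustrative only: one record per remote page actually fetched during
// failure-free execution (page id plus a copy of its contents); this is
// the "indispensable" information a deterministic replay will need.
struct LogRecord { int pageId; std::vector<char> data; };

// Recovery walks the log in order, staging pages prefetchDepth records
// ahead of the replay cursor so each replayed access finds its page local.
void recover(const std::vector<LogRecord>& log, int prefetchDepth) {
    std::map<int, std::vector<char>> stagedPages;   // stand-in for local memory
    size_t next = 0;                                // next record to prefetch
    for (size_t i = 0; i < log.size(); ++i) {
        // Stage pages ahead of the replay cursor.
        while (next < log.size() && next <= i + prefetchDepth) {
            stagedPages[log[next].pageId] = log[next].data;
            ++next;
        }
        // Replay step i: the page it needs is already local.
        const auto& page = stagedPages.at(log[i].pageId);
        std::printf("replayed access to page %d (%zu bytes)\n",
                    log[i].pageId, page.size());
    }
}
```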
In this paper we present the first system that implements OpenMP on a network of shared-memory multiprocessors. This system enables the programmer to rely on a single, standard, shared-memory API for parallelization within a multiprocessor and between multiprocessors. It is implemented via a translator that converts OpenMP directives to appropriate calls to a modified version of the TreadMarks software distributed shared memory (SDSM) system. In contrast to previous SDSM systems for SMPs, the modified TreadMarks uses POSIX threads for parallelism within an SMP node. This approach greatly simplifies the changes required to the SDSM in order to exploit the intra-node hardware shared memory. We present performance results for six applications (SPLASH-2 Barnes-Hut and Water; NAS 3D-FFT, SOR, TSP and MGS) running on an SP2 with four four-processor SMP nodes. A comparison between the threaded implementation and the original implementation of TreadMarks shows that using the hardware shared memory within an SMP node significantly reduces the amount of data and the number of messages transmitted between nodes, and consequently achieves speedups up to 30% better than the original versions. We also compare SDSM against message passing. Overall, the speedups of multithreaded TreadMarks programs are within 7-30% of the MPI versions.
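To make the translation step concrete, here is a hedged sketch, not the paper's actual generated code: the OpenMP worksharing directive shown in the comment could be rewritten into an SPMD form in which POSIX threads split this node's share of the iteration space, and a runtime barrier (shown as a hypothetical sdsm_barrier() stand-in for the modified TreadMarks call) synchronizes across nodes.

```cpp
// Original OpenMP source (standard directive):
//   #pragma omp parallel for
//   for (int i = 0; i < n; ++i) a[i] = b[i] + c[i];

#include <pthread.h>
#include <vector>

// Hypothetical stand-in for the SDSM runtime's inter-node barrier.
void sdsm_barrier() { /* inter-node synchronization would happen here */ }

struct LoopArgs { double *a, *b, *c; int lo, hi; };

// Each intra-node thread gets a contiguous slice of this node's iterations.
void* loopWorker(void* p) {
    auto* args = static_cast<LoopArgs*>(p);
    for (int i = args->lo; i < args->hi; ++i)
        args->a[i] = args->b[i] + args->c[i];
    return nullptr;
}

// Illustrative translation: run myNodeLo..myNodeHi (this node's share of
// the global loop) across nthreads POSIX threads, then synchronize nodes.
void translatedParallelFor(double* a, double* b, double* c,
                           int myNodeLo, int myNodeHi, int nthreads) {
    std::vector<pthread_t> tids(nthreads);
    std::vector<LoopArgs> args(nthreads);
    int span = myNodeHi - myNodeLo;
    for (int t = 0; t < nthreads; ++t) {
        args[t] = {a, b, c,
                   myNodeLo + span * t / nthreads,
                   myNodeLo + span * (t + 1) / nthreads};
        pthread_create(&tids[t], nullptr, loopWorker, &args[t]);
    }
    for (int t = 0; t < nthreads; ++t) pthread_join(tids[t], nullptr);
    sdsm_barrier();  // cross-node barrier via the SDSM runtime (hypothetical name)
}
```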
Multidimensional analysis and On-Line Analytical Processing (OLAP) use summary information that requires aggregate operations along one or more dimensions of numerical data values. Query processing for these applications requires different views of data for decision support. The Data Cube operator provides multi-dimensional aggregates, used to calculate and store summary information on a number of dimensions. The multi-dimensionality of the underlying problem can be represented both in relational and in multi-dimensional databases, the latter being a better fit when query performance is the criterion. Relational databases are scalable in size, and efforts are underway to make their performance acceptable. On the other hand, multi-dimensional databases perform well for such queries, although they are not very scalable. Parallel computing is necessary to address the scalability and performance issues for these data sets. In this paper we present a parallel and scalable infrastructure for OLAP and multidimensional analysis. We use chunking to store data either as a dense block using multidimensional arrays (md-arrays) or as a sparse set using a bit-encoded sparse structure (BESS). Chunks provide a multidimensional index structure for efficient dimension-oriented data accesses, much the same as md-arrays do. Operations within and between chunks are a combination of relational and multi-dimensional operations, depending on whether the chunk is sparse or dense. We present performance results for data sets with 3, 5 and 10 dimensions for our implementation on the IBM SP-2, which show good speedup and scalability.
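As a hedged sketch of the storage idea (not the paper's implementation), a chunk can hold its cells either as a dense md-array or, when mostly empty, as a sparse list in which each cell's within-chunk coordinates are bit-encoded into a single integer key, in the spirit of BESS; an aggregation then simply iterates whichever representation the chunk uses. The chunk size, field widths, and names are illustrative.

```cpp
#include <cstdint>
#include <vector>

// Illustrative 3-D chunk of fixed side CHUNK = 16, so each within-chunk
// coordinate fits in 4 bits and a cell's position packs into 12 bits.
constexpr int CHUNK = 16;

inline uint32_t encode(int i, int j, int k) {            // BESS-style key
    return (uint32_t(i) << 8) | (uint32_t(j) << 4) | uint32_t(k);
}
inline void decode(uint32_t key, int& i, int& j, int& k) {
    i = (key >> 8) & 0xF; j = (key >> 4) & 0xF; k = key & 0xF;
}

struct SparseCell { uint32_t key; double value; };

struct Chunk {
    bool dense;
    std::vector<double> grid;          // CHUNK^3 values when dense
    std::vector<SparseCell> cells;     // only nonzero cells when sparse
};

// Aggregate (sum) the chunk along dimension k, producing a CHUNK x CHUNK
// plane indexed by (i, j); the same operation handles both layouts.
std::vector<double> sumOverK(const Chunk& c) {
    std::vector<double> plane(CHUNK * CHUNK, 0.0);
    if (c.dense) {
        for (int i = 0; i < CHUNK; ++i)
            for (int j = 0; j < CHUNK; ++j)
                for (int k = 0; k < CHUNK; ++k)
                    plane[i * CHUNK + j] += c.grid[(i * CHUNK + j) * CHUNK + k];
    } else {
        for (const SparseCell& cell : c.cells) {
            int i, j, k;
            decode(cell.key, i, j, k);
            plane[i * CHUNK + j] += cell.value;
        }
    }
    return plane;
}
```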