Search Results - Inner Mongolia University Library

A 2D algorithm with asymmetric workload for the UPC conjugate gradient method

JOURNAL OF SUPERCOMPUTING 2014, Vol. 70, No. 2, pp. 816-829

Authors: Gonzalez-Dominguez, Jorge; Marques, Osni A.; Martin, Maria J.; Tourino, Juan

Affiliations: Johannes Gutenberg Univ Mainz, Parallel & Distributed Architectures Grp, D-55122 Mainz, Germany; Univ Calif Berkeley, Lawrence Berkeley Natl Lab, Computat Res Div, Berkeley, CA 94720, USA; Univ A Coruna, Comp Architecture Grp, La Coruna, Spain

This paper examines four different strategies, each one with its own data distribution, for implementing the parallel conjugate gradient (CG) method and how they impact communication and overall performance. Firstly, typical 1D and 2D distributions of the matrix involved in CG computations are considered. Then, a new 2D version of the CG method with asymmetric workload, based on leaving some threads idle during part of the computation to reduce communication, is proposed. The four strategies are independent of sparse storage schemes and are implemented using Unified Parallel C (UPC), a Partitioned Global Address Space (PGAS) language. The strategies are evaluated on two different platforms through a set of matrices that exhibit distinct sparse patterns, demonstrating that our asymmetric proposal outperforms the others except for one matrix on one platform.

Keywords: Conjugate gradient; PGAS; UPC; Performance optimization; Data distribution
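For orientation, the conjugate gradient iteration that the abstract refers to can be sketched sequentially as below. This is a minimal illustration in plain Python, not the paper's UPC implementation: the function names (`matvec`, `dot`, `conjugate_gradient`) and the dense list-of-rows matrix are assumptions made for the example. The matrix-vector product inside the loop is the step whose data distribution (1D or 2D) determines the communication a parallel version needs.

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows (illustrative only)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the initial guess x = 0
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break            # residual norm is small enough
        Ap = matvec(A, p)    # the step a parallel version distributes
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    solution = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
    print(solution)
```

In a distributed version each thread would own part of `A` and of the vectors, so `matvec` and `dot` become the points where threads exchange data.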


Optimizing UPC programs for multi-core systems


SCIENTIFIC PROGRAMMING 2010, Vol. 18, No. 3-4, pp. 183-191

Authors: Zheng, Yili

Affiliations: Univ Calif Berkeley, Lawrence Berkeley Lab, Berkeley, CA 94720, USA

The Partitioned Global Address Space (PGAS) model of Unified Parallel C (UPC) can help users express and manage application data locality on non-uniform memory access (NUMA) multi-core shared-memory systems to get good performance. First, we describe several UPC program optimization techniques that are important to achieving good performance on NUMA multi-core computers with examples and quantitative performance results. Second, we use two numerical computing kernels, parallel matrix-matrix multiplication and parallel 3-D FFT, to demonstrate the end-to-end development and optimization for UPC applications. Our results show that the optimized UPC programs achieve very good and scalable performance on current multi-core systems and can even outperform vendor-optimized libraries in some cases.

Keywords: UPC; PGAS


Efficient bandwidth allocation and call admission control for VBR service using UPC parameters


INTERNATIONAL JOURNAL OF COMMUNICATION SYSTEMS 2000, Vol. 13, No. 1, pp. 29-50

Authors: Wu, DP; Chao, HJ

Affiliations: Polytech Inst New York, Dept Elect Engn, Metrotech Ctr 6, Brooklyn, NY 11201, USA

Provision of Quality-of-Service (QoS) guarantees is an important and challenging issue in the design of Asynchronous Transfer Mode (ATM) networks. Call Admission Control (CAC) is an integral part of the challenge and is closely related to other aspects of network designs such as traffic characterization and QoS specification. Since the Usage Parameter Control (UPC) parameters are the only standardized traffic characterizations, developing efficient CAC schemes based on UPC parameters is significant for the implementation of CAC on ATM switches. In this paper, we develop a CAC algorithm called TAP (derived from TAgged Probability) as well as two other CAC algorithms using the UPC parameters. These CAC algorithms are based on our observation that the loss-probability-to-overflow-probability ratio tends to decrease as the number of sources increases. By introducing the loss-probability-to-overflow-probability ratio K, we find that this ratio sheds light on increasing resource utilization while still guaranteeing QoS. Analysis, simulation, and numerical results have shown that the proposed TAP algorithm is simple and efficient. Copyright (C) 2000 John Wiley & Sons, Ltd.

Keywords: CAC; VBR; UPC; loss-probability-to-overflow-probability ratio; effective bandwidth


The relationship between the unsaturated permeability coefficients (UPC) determined and oxidation-reduction potential (ORP) in the subsurface wastewater infiltration system (SWIS)


DESALINATION AND WATER TREATMENT 2019, Vol. 155, No. 0, pp. 155-161

Authors: Li, H. B.; Zhang, X. R.; Li, Y. H.; Bai, X. Y.; Bai, J. N.; Zhao, X. M.; Zhao, S.

Affiliations: Northeastern Univ, Shenyang 110819, Liaoning, Peoples R China

The estimation of unsaturated permeability coefficients (UPC) is of great importance for understanding the properties of the matrix. To study the relationship between the UPC and oxidation-reduction potential (ORP) at different matrix depths in the subsurface wastewater infiltration system (SWIS), and to provide a scientific basis for regulating SWIS and increasing pollutant removal, an experiment simulating the SWIS was designed, with each cycle comprising an inflow period (12 h) and a drying period (12 h), at a hydraulic load of 0.10 m³·(m²·d)⁻¹. Results showed that ORP increased with increasing UPC at the 70 and 115 cm matrix depths and decreased with increasing UPC at the 100 and 130 cm matrix depths. These phenomena indicated that capillary action could clearly affect UPC and ORP. Moreover, the presence of oxygen and low volumetric water contents could influence UPC and ORP. UPC below the 100 cm matrix depth showed that an anaerobic area could be found in an aerobic environment under alternating conditions. UPC at different matrix depths of a satisfactory SWIS could range from 2.49 × 10⁻⁷ to 1.16 × 10⁻³ cm·s⁻¹. Treated water met reuse requirements and no clogging was found.

Keywords: Wet-dry alternation condition; UPC; ORP; SWIS


High Performance Computational Hydrodynamic Simulations: UPC Parallel Architecture as a Future Alternative


18th International Conference on Computational Science (ICCS)

Authors: Chew, Alvin Wei Ze; Tung Thanh Vu; Law, Adrian Wing-Keung

Affiliations: Nanyang Technol Univ, Sch Civil & Environm Engn, N1-01c-9850 Nanyang Ave, Singapore 639798, Singapore; Nanyang Environm & Water Res Inst (NEWRI), Environm Proc Modelling Ctr (EPMC), 1 Cleantech Loop, CleanTech One 06-08, Singapore 637141, Singapore

ISBN: (print) 9783319936987; 9783319936970

Developments in high performance computing (HPC) have transformed how computational hydrodynamic (CHD) simulations are performed. To date, the message passing interface (MPI) remains the common parallelism architecture and has been adopted widely in CHD simulations. However, its bottleneck problem remains for some large-scale simulation cases due to delays during message passing, whereby the total communication time may exceed the total simulation runtime with an increasing number of computer processors. In this study, we utilise an alternative parallelism architecture, known as PGAS-UPC, to develop our own UPC-CHD model with a 2-step explicit scheme from the Lax-Wendroff family of predictors-correctors. The model is evaluated on three incompressible, adiabatic viscous 2D flow cases having moderate flow velocities. Model validation is achieved by the reasonably good agreement between the predicted and respective analytical values. We then compare the computational performance of UPC-CHD with that of MPI in its base design on an SGI UV-2000 server, up to a maximum of 100 processors in this study. The former achieves a near 1:1 speedup, which demonstrates its efficiency potential for very large-scale CHD simulations, while the latter experiences slowdown at some point. Extension of UPC-CHD remains our main objective, which can be achieved by the following additions: (a) inclusion of other numerical schemes to accommodate other types of fluid simulations, and (b) coupling UPC-CHD with Amazon Web Service (AWS) to further exploit its parallelism efficiency as a viable alternative.

Keywords: Parallel computing; Viscous incompressible laminar flow; MPI; UPC; Computational hydrodynamic (CHD) simulations
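As a rough illustration of what a 2-step explicit predictor-corrector scheme from the Lax-Wendroff family looks like, the sketch below applies the Richtmyer two-step variant to 1D linear advection with periodic boundaries. This is an assumption chosen for brevity: the abstract does not state which member of the family the UPC-CHD model uses, and the paper treats 2D viscous flow rather than this toy problem. The function name `lax_wendroff_step` and the parameter `c` (the Courant number a·dt/dx) are hypothetical.

```python
def lax_wendroff_step(u, c):
    """Advance u one time step for u_t + a u_x = 0 on a periodic grid.

    c is the Courant number a * dt / dx (stable for |c| <= 1).
    """
    n = len(u)
    # Predictor: provisional values at the cell interfaces i + 1/2
    # after half a time step.
    half = [
        0.5 * (u[i] + u[(i + 1) % n]) - 0.5 * c * (u[(i + 1) % n] - u[i])
        for i in range(n)
    ]
    # Corrector: full step using the difference of interface values;
    # half[i - 1] wraps around for i = 0, which gives periodicity.
    return [u[i] - c * (half[i] - half[i - 1]) for i in range(n)]


if __name__ == "__main__":
    pulse = [0.0, 1.0, 0.0, 0.0]
    print(lax_wendroff_step(pulse, 1.0))  # with c = 1 the pulse moves one cell
```

Each grid point only reads its immediate neighbours, which is why schemes of this kind partition naturally across threads with communication limited to subdomain edges.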


A Parallel Numerical Library for UPC


15th International Euro-Par Conference on Parallel Computing

Authors: Gonzalez-Dominguez, Jorge; Martin, Maria J.; Taboada, Guillermo L.; Tourino, Juan; Doallo, Ramon; Gomez, Andres

Affiliations: Univ A Coruna, Comp Architecture Grp, La Coruna, Spain; Galicia Supercomp Ctr (CESGA), Santiago De Compostela, Spain

ISBN: (print) 9783642038686

Unified Parallel C (UPC) is a Partitioned Global Address Space (PGAS) language that exhibits high performance and portability on a broad class of shared and distributed memory parallel architectures. This paper describes the design and implementation of a parallel numerical library for UPC built on top of the sequential BLAS routines. The developed library exploits the particularities of the PGAS paradigm, taking into account data locality in order to guarantee good performance. The library was experimentally validated, demonstrating scalability and efficiency.

Keywords: Parallel computing; PGAS; UPC; Numerical libraries; BLAS


Tool-assisted optimization of shared-memory accesses in UPC applications


14th IEEE International Conference on High Performance Computing and Communications (HPCC) / IEEE 9th International Conference on Embedded Software and Systems (ICESS)

Authors: Cong, Guojing; Wen, Huifang; Murata, Hiroki; Negishi, Yasushi

Affiliations: IBM Corp, Thomas J Watson Res Ctr, 1101 Kitchawan Rd, Route 134, Yorktown Hts, NY 10598, USA; IBM Tokyo Res Lab, Yamato 162314, Japan

ISBN: (print) 9780769547497

UPC is designed to improve user productivity when programming distributed-memory machines. Yet the shared-memory abstraction also makes performance analysis hard, as it introduces extra overhead with local accesses and implicit communication with remote ones. As far as we know, there are no mature software utilities for systematic analysis and tuning of shared-memory access performance in UPC programs. We develop a mechanism to track shared-memory accesses and correlate them to the UPC source lines, functions, and data structures. We then apply tool-assisted analysis to a set of UPC programs. For the NAS UPC benchmark we achieve dramatic performance improvement over the unoptimized implementation as well as up to two times speedups over the fully hand-tuned implementation. We expect our approach to be effective in tuning a wide range of UPC programs.

Keywords: PGAS; UPC; performance tools


Optimizing Collective Communication in UPC


28th IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Authors: Jose, Jithin; Hamidouche, Khaled; Zhang, Jie; Venkatesh, Akshay; Panda, Dhabaleswar K. (DK)

Affiliations: Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210, USA

ISBN: (print) 9781479941162

Message Passing Interface (MPI) has been the de facto programming model for scientific parallel applications. However, data-driven applications with irregular communication patterns are harder to implement using MPI. The Partitioned Global Address Space (PGAS) programming models present an alternative approach to improve programmability. PGAS languages like UPC are growing in popularity because of their ability to provide a shared-memory programming model over distributed-memory machines. However, since UPC is an emerging standard, it is unlikely that entire applications will be re-written with it. Instead, unified communication runtimes have paved the way for a new class of hybrid applications that can leverage the benefits of both MPI and PGAS models. Such unified runtimes need to be designed in a high-performance, scalable manner to improve the performance of emerging hybrid applications. Collective communication primitives offer a flexible, portable way to implement group communication operations and are supported in both MPI and PGAS programming models. Owing to their advantages, they are also widely used across various scientific parallel applications. Over the years, MPI libraries have relied upon aggressive software-/hardware-based and kernel-assisted optimizations to deliver low communication latency for various collective operations. However, there is much room for improvement for collective operations in state-of-the-art, open-source implementations of UPC. In this paper, we address the challenges associated with improving the performance of collective primitives in UPC. Further, we also explore design alternatives to enable collective primitives in UPC to directly leverage the designs available in the MVAPICH2 MPI library. Our experimental evaluations show that our designs improve the performance of the UPC broadcast and all-gather operations by 25X and 18X respectively for a 128KB message at 2,048 processes. Our designs improve the performance of the UPC 2D

Keywords: UPC; Collectives; InfiniBand; Programming Models; PGAS


Performance of Parallel Bit-Reversal with Cilk and UPC for Fast Fourier Transform


5th International Conference on Grid and Pervasive Computing

Authors: Weng, Tien-Hsiung; Huang, Sheng-Wei; Liau, Wei-Duen; Li, Kuan-Ching

Affiliations: Providence Univ, Dept Comp Sci & Informat Engn, Taichung 43301, Taiwan

ISBN: (print) 9783642130663

Bit-reversal is widely known to be an important procedure and an essential part of the Fast Fourier Transform. If not carefully designed, it may easily take a large portion of an FFT application's total execution time. In this paper, we present a parallel implementation of bit-reversal for FFT using Cilk and UPC. Based on our previous work creating a parallel bit-reversal using OpenMP in SPMD style from a sequential algorithm, we note that it is possible to keep the existing parallelism by reorganizing the same program using Cilk and UPC libraries while still achieving good performance. Experimental results were obtained by executing these parallel codes on two multi-core SMP platforms, and they are very promising.

Keywords: Shared-memory parallel programming; OpenMP; Cilk; UPC; Bit-reversal; FFT
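The bit-reversal permutation mentioned in the abstract reorders an array so that the element at index i moves to the index obtained by reversing the bits of i. The sketch below is a plain sequential version for illustration only; it is not the Cilk or UPC code from the paper, and the function names are hypothetical.

```python
def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def bit_reversal_permute(data):
    """Return a copy of data in bit-reversed index order.

    The length must be a power of two, as in radix-2 FFTs.
    """
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    out = list(data)
    for i in range(n):
        j = bit_reverse(i, bits)
        if i < j:                      # swap each pair only once
            out[i], out[j] = out[j], out[i]
    return out


if __name__ == "__main__":
    print(bit_reversal_permute(list(range(8))))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Each swap touches a distinct pair of indices, so the loop iterations are independent of one another, which is what makes the permutation a candidate for parallel execution.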


Performance evaluation of sparse matrix products in UPC


JOURNAL OF SUPERCOMPUTING 2013, Vol. 64, No. 1, pp. 100-109

Authors: Gonzalez-Dominguez, Jorge; Garcia-Lopez, Oscar; Taboada, Guillermo L.; Martin, Maria J.; Tourino, Juan

Affiliations: Univ A Coruna, Comp Architecture Grp, La Coruna, Spain

Unified Parallel C (UPC) is a Partitioned Global Address Space (PGAS) language whose popularity has increased in recent years owing to its high programmability and reasonable performance through an efficient exploitation of data locality, especially on hierarchical architectures like multicore clusters. However, the performance issues that arise in this language due to the irregular structure of sparse matrix operations have not yet been studied. Among them, the selection of an adequate storage format for the sparse matrices can significantly improve the efficiency of the parallel codes. This paper presents an evaluation, using UPC, of the most common sparse storage formats with different implementations of the matrix-vector and matrix-matrix products, which are key kernels in many scientific applications.

Keywords: PGAS; UPC; Sparse products; Performance evaluation
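To make the idea of a sparse storage format concrete, the sketch below builds a compressed sparse row (CSR) representation and uses it for a matrix-vector product. CSR is assumed here only as one widely used format; the abstract does not name the formats the paper evaluates, and the function names are hypothetical.

```python
def csr_from_dense(A):
    """Convert a dense list-of-rows matrix to CSR arrays.

    Returns (values, col_idx, row_ptr), where row i occupies
    values[row_ptr[i]:row_ptr[i + 1]].
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in A:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x using only the stored nonzero entries."""
    y = []
    for i in range(len(row_ptr) - 1):
        s = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]   # irregular access into x
        y.append(s)
    return y


if __name__ == "__main__":
    A = [[1, 0, 2],
         [0, 0, 3],
         [4, 5, 0]]
    vals, cols, ptr = csr_from_dense(A)
    print(csr_matvec(vals, cols, ptr, [1, 2, 3]))  # [7, 9, 14]
```

The indirect read `x[col_idx[k]]` is the irregular access pattern that makes data locality, and hence the choice of format and distribution, matter in a PGAS setting.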

