Search Results - Inner Mongolia University Library

Storage-Heterogeneity Aware Task-Based Programming Models to Optimize I/O Intensive Applications

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, Vol. 33, No. 12, pp. 3589-3599

Authors: Elshazly, Hatem; Ejarque, Jorge; Badia, Rosa M. (Barcelona Supercomp Ctr BSC, Barcelona 08034, Spain)

Task-based programming models have enabled the optimized execution of the computational workloads of applications. These programming models can take advantage of large-scale distributed infrastructures by allowing the parallel and distributed execution of applications as high-level work components called tasks. Nevertheless, in the era of Big Data and Exascale, the amount of data produced by modern scientific applications has already surpassed terabytes and is rapidly increasing. Hence, I/O performance has become the bottleneck that must be overcome to achieve further improvements in total performance. New storage technologies offer higher bandwidth and faster solutions than traditional Parallel File Systems (PFS). Such storage devices are deployed in modern-day infrastructures to boost I/O performance by offering a fast layer that absorbs the generated data. Therefore, any programming model that targets higher performance needs to manage this heterogeneity and take advantage of it to improve the I/O performance of applications. Towards this goal, we propose in this article a set of programming model capabilities that we refer to as Storage-Heterogeneity Awareness. Such capabilities include: (i) abstracting the heterogeneity of storage systems, and (ii) optimizing I/O performance by supporting dedicated I/O schedulers and an automatic data flushing technique. The evaluation section of this article presents the performance results of different applications on the MareNostrum CTE-Power heterogeneous storage cluster. Our experiments demonstrate that a storage-heterogeneity aware programming model can achieve up to almost 5x I/O performance speedup and a 48% total time improvement compared to the reference PFS-based usage of the execution infrastructure.

Keywords: Task analysis; Programming; Performance evaluation; Computational modeling; Bandwidth; Proposals; Random access memory; Heterogeneous storage systems; Task-based programming models; I/O intensive applications; I/O scheduling; Task scheduling; Automatic data movement; Heterogeneity abstraction; Resource pooling; Checkpointing

Towards enabling I/O awareness in task-based programming models

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2021, Vol. 121, pp. 74-89

Authors: Elshazly, Hatem; Ejarque, Jorge; Lordan, Francesc; Badia, Rosa M. (Barcelona Supercomp Ctr BSC, Barcelona, Spain)

Storage systems have not kept pace with the technology improvement rate of computing systems. As applications produce more and more data, I/O becomes the limiting factor for increasing application performance. I/O congestion caused by concurrent access to storage devices is one of the main obstacles that cause I/O performance degradation and, consequently, total performance degradation. Although task-based programming models made it possible to achieve higher levels of parallelism by enabling the execution of tasks on large-scale distributed platforms, this parallelism only benefited the compute workload of the application. Previous efforts addressing I/O performance bottlenecks either focused on optimizing fine-grained I/O access patterns using I/O libraries or on avoiding system-wide I/O congestion by minimizing interference between multiple applications. In this paper, we propose enabling I/O Awareness in task-based programming models to improve the total performance of applications. An I/O aware programming model is able to create more parallelism and mitigate the causes of I/O performance degradation. On the one hand, more parallelism can be created by supporting special tasks for executing I/O workloads, called I/O tasks, that can overlap with the execution of compute tasks. On the other hand, I/O congestion can be mitigated by constraining the scheduling of I/O tasks. We propose two approaches for specifying such constraints: explicitly set by the users, or automatically inferred and tuned during the application's execution to optimize the execution of variable I/O workloads on a certain storage infrastructure. We implement our proposal using PyCOMPSs: a task-based programming model for parallelizing Python applications. Our experiments on the MareNostrum 4 Supercomputer demonstrate that using I/O aware PyCOMPSs can achieve significant performance improvement in the total execution time of applications with different I/O workloads. This performance improvement can reach up to

Keywords: I/O awareness; Task-based programming models; I/O intensive applications; I/O congestion; I/O-compute overlap; I/O scheduling; Auto-tunable constraints
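
The abstract above describes two mechanisms: I/O tasks that overlap with compute tasks, and a constraint on how many I/O tasks may run at once to avoid congestion. The sketch below illustrates both ideas with a thread pool. It does not use the PyCOMPSs API. `IOLimiter`, `compute_task`, `io_task`, `run` and `max_io` are hypothetical names. Local threads stand in for distributed workers, and `time.sleep` stands in for a blocking write. Python's GIL keeps the compute tasks from running truly in parallel here, but the overlap with blocking I/O still shows.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class IOLimiter:
    """Caps how many I/O tasks may run at once (a hypothetical user-set constraint)."""

    def __init__(self, max_concurrent):
        self._slots = threading.Semaphore(max_concurrent)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # highest number of I/O tasks observed in flight

    def __enter__(self):
        self._slots.acquire()
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self._active -= 1
        self._slots.release()
        return False


def compute_task(block_id):
    """CPU-bound placeholder that produces one block of results."""
    return sum(k * k for k in range(20_000 + block_id))


def io_task(block_id, value, limiter):
    """I/O placeholder: 'writes' one block while holding an I/O slot."""
    with limiter:
        time.sleep(0.01)  # stand-in for a blocking write to shared storage
    return block_id, value


def run(n_blocks=8, workers=6, max_io=2):
    limiter = IOLimiter(max_io)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        compute = [pool.submit(compute_task, b) for b in range(n_blocks)]
        # The main thread submits each I/O task once its producing compute task
        # has finished, so I/O for early blocks overlaps compute still running.
        io = [pool.submit(io_task, b, f.result(), limiter)
              for b, f in enumerate(compute)]
        written = [f.result() for f in io]
    return written, limiter.peak


if __name__ == "__main__":
    written, peak = run()
    print(len(written), "blocks written; peak concurrent I/O tasks:", peak)
```

The semaphore bounds I/O concurrency regardless of how many pool workers exist. This is the essence of a "constrained" I/O task. An auto-tuning variant would adjust `max_io` at run time instead of fixing it.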

A Hardware Runtime for Task-Based Programming Models

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2019, Vol. 30, No. 9, pp. 1932-1946

Authors: Tan, Xubin; Bosch, Jaume; Alvarez, Carlos; Jimenez-Gonzalez, Daniel; Ayguade, Eduard; Valero, Mateo (UPC, BSC, Barcelona 08034, Catalonia, Spain)

Task-based programming models such as OpenMP 5.0 and OmpSs are simple to use and powerful enough to exploit the task parallelism of applications on multicore, manycore and heterogeneous systems. However, their software-only runtimes introduce significant overhead when targeting fine-grained tasks, resulting in performance losses. To overcome this drawback, we present Picos++, a hardware runtime that accelerates critical runtime functions such as task dependence analysis, nested task support, and heterogeneous task scheduling. As a proof of concept, the Picos++ hardware runtime has been integrated with a compiler infrastructure that supports parallel task-based programming models. An FPGA SoC running Linux has been used to implement the hardware-accelerated part of Picos++, integrated with a heterogeneous system composed of 4 symmetric multiprocessor (SMP) cores and several hardware functional accelerators (HwAccs) for task execution. Results show significant improvements in energy and performance compared to state-of-the-art parallel software-only runtimes. With Picos++, applications can achieve up to 7.6x speedup and save up to 90 percent of energy when using 4 threads and up to 4 HwAccs, and can even reach a speedup of 16x over the software alternative when using 12 HwAccs and small tasks.

Keywords: Fine-grained parallelism; Task-dependence analysis; Nested tasks; Heterogeneous task scheduling; Energy saving; FPGA; Task-based programming models
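
Task dependence analysis, which Picos++ moves into hardware, derives a task graph from the data each task reads and writes. The following is a purely software sketch of that analysis, in the spirit of OmpSs/OpenMP `in`/`out` annotations. It models no Picos++ hardware detail. `DependenceTracker` and `submit` are hypothetical names, and addresses are plain strings. The sketch records true (read-after-write), output (write-after-write) and anti (write-after-read) dependences.

```python
from collections import defaultdict


class DependenceTracker:
    """Builds a task graph from per-task 'in'/'out' address annotations."""

    def __init__(self):
        self.last_writer = {}              # address -> task that last wrote it
        self.readers = defaultdict(list)   # address -> readers since that write
        self.edges = set()                 # (predecessor, successor) pairs

    def submit(self, task, ins=(), outs=()):
        """Register a task and return the set of tasks it must wait for."""
        preds = set()
        for addr in ins:                   # read-after-write (true dependence)
            if addr in self.last_writer:
                preds.add(self.last_writer[addr])
        for addr in outs:
            if addr in self.last_writer:   # write-after-write (output dependence)
                preds.add(self.last_writer[addr])
            preds.update(self.readers[addr])  # write-after-read (anti-dependence)
        preds.discard(task)
        self.edges.update((p, task) for p in preds)
        for addr in ins:
            self.readers[addr].append(task)
        for addr in outs:                  # a new write starts a fresh reader list
            self.last_writer[addr] = task
            self.readers[addr] = []
        return preds


if __name__ == "__main__":
    t = DependenceTracker()
    t.submit("A", outs=["x"])
    t.submit("B", ins=["x"], outs=["y"])
    t.submit("C", ins=["x"], outs=["z"])
    print(t.submit("D", ins=["y", "z"], outs=["x"]))  # waits on A, B and C
```

Here B and C can run concurrently once A finishes. This per-task bookkeeping happens on every task submission, so it becomes overhead for fine-grained tasks in a software-only runtime.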

Boosting Earth System Model Outputs And Saving PetaBytes in Their Storage Using Exascale Climate Emulators

2024 International Conference for High Performance Computing, Networking, Storage and Analysis

Authors: Abdulah, Sameh; Baker, Allison H.; Bosilca, George; Cao, Qinglei; Castruccio, Stefano; Genton, Marc G.; Keyes, David E.; Khalid, Zubair; Ltaief, Hatem; Song, Yan; Stenchikov, Georgiy L.; Sun, Ying (King Abdullah Univ Sci & Technol, Extreme Comp & Stat & Earth Sci, Thuwal, Saudi Arabia; NSF Natl Ctr Atmospher Res, Computat & Informat Sci Lab, Boulder, CO, USA; NVIDIA, Santa Clara, CA, USA; St Louis Univ, Dept Comp Sci, St Louis, MO, USA; Univ Notre Dame, Dept Appl & Computat Math & Stat, Notre Dame, IN, USA; Lahore Univ Management Sci, Dept Elect Engn, Lahore, Pakistan)

ISBN (electronic): 9798350352917

ISBN (print): 9798350352924; 9798350352917

We present the design and scalable implementation of an exascale climate emulator that addresses the escalating computational and storage requirements of high-resolution Earth System Model simulations. We use the spherical harmonic transform to stochastically model spatio-temporal variations in climate data. This provides tunable spatio-temporal resolution and significantly improves the fidelity and granularity of climate emulation, achieving an ultra-high spatial resolution of 0.034 degrees (approximately 3.5 km). Our emulator, trained on 318 billion hourly temperature data points from a 35-year global simulation ensemble and 31 billion daily data points from an 83-year one, generates statistically consistent climate emulations. We extend linear solver software to support mixed-precision arithmetic on GPUs, applying different precisions within a single solver to adapt to different correlation strengths. The PaRSEC runtime system supports efficient parallel matrix operations by optimizing the dynamic balance between computation, communication, and memory requirements. Our BLAS3-rich code is optimized for systems equipped with four different families and generations of GPUs. It scales well, achieving 0.976 EFlop/s on 9,025 nodes (36,100 AMD MI250X multi-chip module (MCM) GPUs) of Frontier (nearly the full system), 0.739 EFlop/s on 1,936 nodes (7,744 Grace-Hopper Superchips (GH200)) of Alps, 0.243 EFlop/s on 1,024 nodes (4,096 A100 GPUs) of Leonardo, and 0.375 EFlop/s on 3,072 nodes (18,432 V100 GPUs) of Summit.

Keywords: Dynamic runtime systems; High-performance computing; Mixed-precision computation; Spatio-temporal climate emulation; Spherical harmonic transform; Task-based programming models

ALPI: Enhancing Portability and Interoperability of Task-Aware Libraries

2nd International Workshop on Asynchronous Many-task Systems and Applications (WAMTA)

Authors: Sala, Kevin; Alvarez, David; Penacoba, Raul; Arias Mallo, Rodrigo; Navarro, Antoni; Roca, Aleix; Beltran, Vicenc (Barcelona Supercomp Ctr BSC, Pl Eusebi Guell 1-3, Barcelona 08034, Spain)

ISBN (electronic): 9783031617638

ISBN (print): 9783031617621; 9783031617638

Task-based programming models are a promising approach to exploiting complex distributed and heterogeneous systems. However, integrating different communication, offloading, and storage APIs within tasks poses performance and deadlock risks. Several task-aware libraries, such as TAMPI, TASIO, and TACUDA, have been developed to efficiently integrate blocking and non-blocking APIs within task-based programming models. In this paper, we introduce the Asynchronous Low-level Programming Interface (ALPI) to enable the interoperability and portability of task-aware libraries across various programming models and runtime systems. We have implemented ALPI in the Nanos6 and nOS-V runtimes, enhancing the integration of task-aware libraries with the OmpSs-2 and OpenMP programming models. This work is a step towards improving the composability of parallel programming models by supporting task-aware libraries across different runtime systems.

Keywords: Task-based programming models; Runtime systems; OpenMP; OmpSs-2; Portability; Interoperability

Reshaping Geostatistical Modeling and Prediction for Extreme-Scale Environmental Applications

International Conference for High Performance Computing, Networking, Storage and Analysis (HPC)

Authors: Cao, Qinglei; Abdulah, Sameh; Alomairy, Rabab; Pei, Yu; Nag, Pratik; Bosilca, George; Dongarra, Jack; Genton, Marc G.; Keyes, David E.; Ltaief, Hatem; Sun, Ying (King Abdullah Univ Sci & Technol, Div Comp Elect & Math Sci & Engn, Extreme Comp Res Ctr, Thuwal, Saudi Arabia; Univ Tennessee, Innovat Comp Lab, Knoxville, TN 37996, USA; Oak Ridge Natl Lab, Oak Ridge, TN, USA; Univ Manchester, Manchester, England)

ISBN (print): 9781665454445

We extend the capability of space-time geostatistical modeling using algebraic approximations, illustrating application-expected accuracy worthy of double precision from majority low-precision computations and low-rank matrix approximations. We exploit the mathematical structure of the dense covariance matrix whose inverse action and determinant are repeatedly required in Gaussian log-likelihood optimization. Geostatistics augments first-principles modeling approaches for the prediction of environmental phenomena given the availability of measurements at a large number of locations; however, traditional Cholesky-based approaches grow cubically in complexity, gating practical extension to continental and global datasets now available. We combine the linear algebraic contributions of mixed-precision and low-rank computations within a tile-based Cholesky solver with on-demand casting of precisions and dynamic runtime support from PaRSEC to orchestrate tasks and data movement. Our adaptive approach scales on various systems and leverages the Fujitsu A64FX nodes of Fugaku to achieve up to 12X performance speedup against the highly optimized dense Cholesky implementation.

Keywords: Space-Time Geospatial Statistics; Climate/Weather Prediction; Task-based programming models; Dynamic Runtime Systems; Mixed-Precision Computations; Low-Rank Matrix Approximations; High Performance Computing

OmpSs@FPGA Framework for High Performance FPGA Computing

IEEE TRANSACTIONS ON COMPUTERS, 2021, Vol. 70, No. 12, pp. 2029-2042

Authors: Miguel de Haro, Juan; Bosch, Jaume; Filgueras, Antonio; Vidal, Miquel; Jimenez-Gonzalez, Daniel; Alvarez, Carlos; Martorell, Xavier; Ayguade, Eduard; Labarta, Jesus (Barcelona Supercomp Ctr BSC, Barcelona 08034, Catalonia, Spain; Univ Politecn Catalunya UPC, Barcelona 08034, Catalonia, Spain)

This article presents the new features of the OmpSs@FPGA framework. OmpSs is a data-flow programming model that supports task nesting and dependencies to target asynchronous parallelism and heterogeneity. OmpSs@FPGA is the extension of the programming model addressed specifically to FPGAs. The OmpSs environment is built on top of the Mercurium source-to-source compiler and the Nanos++ runtime system. To address FPGA specifics, the Mercurium compiler implements several FPGA-related features such as local variable caching, wide memory accesses and accelerator replication. In addition, part of the Nanos++ runtime has been ported to hardware. Driven by the compiler, this new hardware runtime adds new features to FPGA codes, such as task creation and dependence management, providing both performance increases and ease of programming. To demonstrate these new capabilities, different high-performance benchmarks have been evaluated on different FPGA platforms using the OmpSs programming model. The results demonstrate that programs using the OmpSs programming model achieve very competitive performance with low to moderate porting effort compared to other FPGA implementations.

Keywords: Field programmable gate arrays; Task analysis; Hardware; Runtime; Programming; Tools; Random access memory; FPGA; Reconfigurable hardware; Parallel architectures; Task-based programming models; High-level synthesis

Reshaping geostatistical modeling and prediction for extreme-scale environmental applications

Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Authors: Qinglei Cao; Sameh Abdulah; Rabab Alomairy; Yu Pei; Pratik Nag; George Bosilca; Jack Dongarra; Marc G. Genton; David E. Keyes; Hatem Ltaief; Ying Sun (University of Tennessee; King Abdullah University of Science and Technology, Thuwal, KSA; University of Tennessee and The Oak Ridge National Laboratory and University of Manchester)

We extend the capability of space-time geostatistical modeling using algebraic approximations, illustrating application-expected accuracy worthy of double precision from majority low-precision computations and low-rank matrix approximations. We exploit the mathematical structure of the dense covariance matrix whose inverse action and determinant are repeatedly required in Gaussian log-likelihood optimization. Geostatistics augments first-principles modeling approaches for the prediction of environmental phenomena given the availability of measurements at a large number of locations; however, traditional Cholesky-based approaches grow cubically in complexity, gating practical extension to continental and global datasets now available. We combine the linear algebraic contributions of mixed-precision and low-rank computations within a tile-based Cholesky solver with on-demand casting of precisions and dynamic runtime support from PaRSEC to orchestrate tasks and data movement. Our adaptive approach scales on various systems and leverages the Fujitsu A64FX nodes of Fugaku to achieve up to 12X performance speedup against the highly optimized dense Cholesky implementation.

Keywords: High performance computing; Climate/weather prediction; Dynamic runtime systems; Task-based programming models; Mixed-precision computations; Space-time geospatial statistics; Low-rank matrix approximations

A Current Task-Based Programming Paradigms Analysis

20th Annual International Conference on Computational Science (ICCS)

Authors: Gurhem, Jerome; Petiton, Serge G. (Univ Lille, CRIStAL Ctr Rech Informat Signal & Automat Lille, CNRS UMR 9189, F-59000 Lille, France; Univ Paris Saclay, UVSQ, CNRS, CEA, Maison Simulat, F-91191 Gif Sur Yvette, France)

ISBN (print): 9783030867973; 9783030504250

Task-based programming models can be an alternative to MPI. The user defines atomic tasks with defined inputs and outputs, together with the dependencies between them. The runtime can then schedule the tasks and data migrations efficiently over all the available cores while reducing the waiting time between tasks. This paper focuses on comparing several task-based programming models with one another, using LU factorization as the benchmark. HPX, PaRSEC, Legion and YML+XMP are task-based programming models that schedule data movement and computational tasks on the distributed resources allocated to the application. YML+XMP supports parallel and distributed tasks with XcalableMP, a PGAS language. We compared their performance and scalability with that of ScaLAPACK, a highly optimized library that uses MPI to perform communication between processes, on up to 64 nodes. We performed a block-based LU factorization with each task-based programming model on matrices of size up to 49512 x 49512. HPX performs better than PaRSEC, Legion and YML+XMP, but not better than ScaLAPACK. YML+XMP scales better than HPX, Legion and PaRSEC. Regent has trouble scaling from 32 nodes to 64 nodes with our algorithm.

Keywords: Parallel and distributed programming paradigms; Task-based programming models; Supercomputers
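
The benchmark in this comparison is a block-based LU factorization expressed as tasks. The sketch below builds the task graph that such runtimes schedule. It is not the paper's code, and it makes three assumptions: a right-looking tiled LU without pivoting, an `nt x nt` grid of tiles, and unit cost per task. It then reports the number of tasks and the critical-path length. The task names (GETRF, TRSM_U, TRSM_L, GEMM) follow the usual LAPACK/BLAS kernel names, and the helper functions are hypothetical.

```python
from itertools import product


def tiled_lu_tasks(nt):
    """Yield (name, reads, writes) for a right-looking tiled LU without pivoting.

    Tiles are (row, col) pairs; every written tile is also read (inout).
    """
    for k in range(nt):
        yield f"GETRF({k})", [], [(k, k)]
        for j in range(k + 1, nt):
            yield f"TRSM_U({k},{j})", [(k, k)], [(k, j)]
        for i in range(k + 1, nt):
            yield f"TRSM_L({i},{k})", [(k, k)], [(i, k)]
        for i, j in product(range(k + 1, nt), repeat=2):
            yield f"GEMM({i},{j},{k})", [(i, k), (k, j)], [(i, j)]


def task_graph_stats(nt):
    """Return (number of tasks, critical-path length in tasks) for unit costs."""
    depth = {}        # task name -> length of the longest chain ending there
    last_writer = {}  # tile -> task that last wrote it
    readers = {}      # tile -> tasks that read it since the last write
    for name, reads, writes in tiled_lu_tasks(nt):
        # Read-after-write / write-after-write edges come from the last writer;
        # write-after-read edges come from readers since that write.
        preds = [last_writer[t] for t in reads + writes if t in last_writer]
        preds += [r for t in writes for r in readers.get(t, [])]
        depth[name] = 1 + max((depth[p] for p in preds), default=0)
        for t in reads:
            readers.setdefault(t, []).append(name)
        for t in writes:
            last_writer[t] = name
            readers[t] = []
    return len(depth), max(depth.values())


if __name__ == "__main__":
    for nt in (2, 3, 4, 8):
        n_tasks, critical = task_graph_stats(nt)
        print(f"nt={nt}: {n_tasks} tasks, critical path {critical}, "
              f"average parallelism {n_tasks / critical:.1f}")
```

The task count grows as nt(nt+1)(2nt+1)/6, while the critical path grows only as 3nt - 2. This gap gives a task runtime room to overlap work from different elimination steps.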

Runtime Approaches to Improve the Efficiency of Hybrid and Irregular Applications

Author: Seonmyeong Bak (Georgia Institute of Technology)

Degree level: Ph.D.

On-node parallelism has increased significantly in high-performance computing systems. This huge amount of parallelism can be used to speed up regular parallel applications relatively easily, because straightforward approaches usually suffice to map their computation patterns and data layouts onto the available on-node parallelism. However, irregular parallel applications require considerable effort to run on modern processors with massive amounts of intra-node parallelism. Parallel programming models and runtime approaches have been proposed to help programmers write such applications quickly, but it is still not easy to write efficient irregular parallel applications. Two key challenges in mapping irregular applications onto on-node parallelism are load balance and computation-communication overlap. In this thesis proposal, we address these challenges through new runtime approaches and new APIs that enable users to provide minimal information for application-aware scheduling. First, we introduce new algorithms to improve the scheduling of irregular task graphs containing a mix of communication and computation tasks with data parallelism and blocking operations. We combine gang scheduling with work stealing for data-parallel tasks with frequent inter-/intra-node communication in the task graphs, so as to reduce interference and expensive context-switching operations. We also propose improved victim selection policies for work stealing to improve the load balance and overlap of ready tasks that have child tasks. Next, we propose an efficient integrated runtime system to handle load balancing of irregular applications written in hybrid parallel programming models. We introduce a unified runtime system that integrates distributed and shared-memory programming, as exemplified by the combination of Charm++ and OpenMP. In this approach, all processing resources (cores) can be used flexibly across both the distributed and shared-memory levels, thereby enabling mor

Keywords: High Performance Computing; Runtime Systems; Load Balancing; Task-based programming models
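
The thesis abstract mentions victim selection policies for work stealing. The sketch below is a deliberately simplified, single-threaded, unit-time simulation that shows where such a policy plugs in. It is not the thesis's algorithm, and all function names are hypothetical. Each worker owns a deque and runs its newest task (LIFO). An idle worker steals the oldest task (FIFO) from a victim chosen by a pluggable policy. `richest_victim` is an illustrative policy assumed here for comparison with classic random stealing.

```python
import random
from collections import deque


def no_steal(thief, deques, rng):
    """Baseline: idle workers never steal."""
    return None


def random_victim(thief, deques, rng):
    """Classic policy: pick any other worker uniformly at random."""
    others = [v for v in range(len(deques)) if v != thief]
    return rng.choice(others) if others else None


def richest_victim(thief, deques, rng):
    """Illustrative policy: steal from the worker with the most queued tasks
    (ties go to the lowest worker id)."""
    candidates = [v for v in range(len(deques)) if v != thief and deques[v]]
    return max(candidates, key=lambda v: len(deques[v])) if candidates else None


def simulate(queues, choose_victim, seed=0):
    """Run a unit-time work-stealing simulation.

    queues: initial per-worker lists of positive task costs (in time steps).
    Returns (makespan in steps, total units of work executed).
    """
    rng = random.Random(seed)
    deques = [deque(q) for q in queues]
    remaining = [0] * len(deques)  # time left on each worker's current task
    executed = steps = 0
    while any(remaining) or any(deques):
        steps += 1
        for w, own in enumerate(deques):
            if remaining[w] == 0:
                if own:
                    remaining[w] = own.pop()                # owner: newest task (LIFO)
                else:
                    v = choose_victim(w, deques, rng)
                    if v is not None and deques[v]:
                        remaining[w] = deques[v].popleft()  # thief: oldest task (FIFO)
            if remaining[w] > 0:
                remaining[w] -= 1
                executed += 1
    return steps, executed


if __name__ == "__main__":
    work = [[3, 3, 3, 3], [], [], []]  # all tasks start on worker 0
    for policy in (no_steal, random_victim, richest_victim):
        print(policy.__name__, simulate(work, policy))
```

With all work initially on one worker, a random victim choice can waste steal attempts on empty deques. A load-aware policy finds work immediately. Real runtimes must weigh that benefit against the cost of inspecting other workers' queues.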
