Search Results - Inner Mongolia University Library

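Advanced search help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016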
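
The query syntax described above can be assembled programmatically. The following is a minimal sketch only: the helper names (`term`, `any_of`, `all_of`, `exclude`) are hypothetical and are not part of the library's system. It simply joins field terms with uppercase operators and single spaces, as the search rules require, and reproduces the two example queries. The help text does not state operator precedence, so the sketch makes no assumption about how the catalog evaluates a mixed expression.

```python
# Field codes listed in the help text (T, A, K, P, PU, O, L, C, U, Y).
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}


def term(field: str, value: str) -> str:
    """Build one FIELD=value term; reject codes the help text does not list."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def any_of(*terms: str) -> str:
    """Join terms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*terms: str) -> str:
    """Join terms with AND (uppercase, one space on each side)."""
    return " AND ".join(terms)


def exclude(expression: str, unwanted: str) -> str:
    """Append a NOT clause to an existing expression."""
    return f"{expression} NOT {unwanted}"


# Example 1 from the help text.
example_one = all_of(
    any_of(term("K", "图书馆学"), term("K", "情报学")),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)

# Example 2 from the help text.
example_two = all_of(
    exclude(
        all_of(
            term("P", "计算机应用与软件"),
            any_of(term("U", "C++"), term("U", "Basic")),
        ),
        term("K", "Visual"),
    ),
    term("Y", "2011-2016"),
)

print(example_one)
print(example_two)
```

Rejecting unknown field codes up front is a design choice of this sketch; it catches typos such as a lowercase `k` before the query is pasted into the search box.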

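Refine search results

Document type

29 records: Conference papers
7 records: Journal articles
1 record: Theses and dissertations

Collection scope

37 records: Electronic documents
0 titles: Print holdings
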
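Subject classification

34 records: Engineering
- 33 records: Computer Science and Technology...
- 14 records: Software Engineering
- 9 records: Electrical Engineering
- 2 records: Mechanical Engineering
- 2 records: Information and Communication Engineering
- 1 record: Control Science and Engineering
- 1 record: Cyberspace Security
5 records: Natural Sciences
- 4 records: Mathematics
- 1 record: Physics
3 records: Management
- 2 records: Management Science and Engineering (...
- 1 record: Library, Information and Archives Manag...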

Topics

37 records: task-based progr...
4 records: hpx
4 records: multicore
4 records: hpc
4 records: openmp
3 records: plasma
3 records: xeon phi
3 records: eigensolver
3 records: scheduling
2 records: parallelization
2 records: runtime system
2 records: dataflow
2 records: cuda
2 records: coordination lan...
2 records: performance port...
2 records: tile algorithms
2 records: mapping
2 records: exascale computi...
2 records: parsec
2 records: fpga

Institutions

2 records: univ politecn ca...
2 records: univ bayreuth de...
2 records: tech univ chemni...
2 records: oak ridge natl l...
2 records: univ tennessee d...
1 record: erasmus mc dept ...
1 record: univ leeds inst ...
1 record: univ durham inst...
1 record: louisiana state ...
1 record: univ neuchatel i...
1 record: univ durham larg...
1 record: inria le chesnay
1 record: university of te...
1 record: barcelona superc...
1 record: inpt toulouse
1 record: louisiana state ...
1 record: slac natl accele...
1 record: sandia natl labs...
1 record: technical univer...
1 record: univ bordeaux bo...

Authors

3 records: kaiser hartmut
3 records: kurzak jakub
3 records: dongarra jack
3 records: haidar azzam
3 records: rauber thomas
3 records: ruenger gudula
3 records: thibault samuel
3 records: bosilca george
2 records: schuchart joseph
2 records: kalkhof torben
2 records: calandra henri
2 records: koch andreas
2 records: guermouche abdou
2 records: yarkhan asim
2 records: luszczek piotr
2 records: agullo emmanuel
2 records: faverge mathieu
2 records: herault thomas
2 records: diehl patrick
2 records: weinzierl tobias

Language

36 records: English
1 record: Other

Search query: "Subject term=Task-based Programming"

37 records in total; records 11-20 are shown below.

nOS-V: Co-Executing HPC Applications Using System-Wide Task Scheduling

38th International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Alvarez, David; Sala, Kevin; Beltran, Vicenc
Affiliations: Barcelona Supercomp Ctr, Barcelona, Spain

ISBN: (print) 9798350387117; 9798350387124

Future Exascale systems will feature massive parallelism, many-core processors and heterogeneous architectures. In this scenario, it is increasingly difficult for HPC applications to fully and efficiently utilize the resources in system nodes. Moreover, the increased parallelism exacerbates the effects of existing inefficiencies in current applications. Research has shown that co-scheduling applications to share system nodes instead of executing each application exclusively can increase resource utilization and efficiency. Nevertheless, the current oversubscription and co-location techniques to share nodes have several drawbacks which limit their applicability and make them very application-dependent. This paper presents co-execution through system-wide scheduling. Co-execution is a novel fine-grained technique to execute multiple HPC applications simultaneously on the same node, outperforming current state-of-the-art approaches. We implement this technique in nOS-V, a lightweight tasking library that supports co-execution through system-wide task scheduling. Moreover, nOS-V can be easily integrated with existing programming models, requiring no changes to user applications. We showcase how co-execution with nOS-V significantly reduces schedule makespan for several applications on different scenarios, outperforming prior node-sharing techniques.

Keywords: HPC; parallel programming; co-location; coexecution; task-based programming

Speeding-Up LULESH on HPX: Useful Tricks and Lessons Learned using a Many-Task-Based Approach

2024 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC Workshops 2024

Authors: Kalkhof, Torben; Koch, Andreas
Affiliations: Technical University of Darmstadt, Embedded Systems and Applications Group, Darmstadt, Germany

ISBN: (print) 9798350355543

Current programming models face challenges in dealing with modern supercomputers' growing parallelism and heterogeneity. Emerging programming models, like the task-based programming model found in the asynchronous many-task HPX programming framework, offer new ways to express parallelism, enhance scalability, and mask synchronization and communication latency on multi-core and distributed systems. Regular high-performance computing benchmarks are often unsuitable for comparing different programming models due to their limited code complexity. However, real-world scientific applications are usually too complex. As a middle ground, proxy applications model the behavior of actual scientific problems while reducing code complexity. In our research on using HPX to program machines with heterogeneous compute units (e.g., GPU and FPGA/AI Engines), we have also substantially optimized a pure HPX-based software baseline of the LULESH proxy application. This paper discusses the techniques we applied, yielding single-node speed-ups of 1.33x to 2.25x for different problem sizes relative to the LULESH OpenMP reference implementation. © 2024 IEEE.

Keywords: HPC; HPX; LULESH; task-based programming

On the Arithmetic Intensity of Distributed-Memory Dense Matrix Multiplication Involving a Symmetric Input Matrix (SYMM)

37th IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Agullo, Emmanuel; Buttari, Alfredo; Coulaud, Olivier; Eyraud-Dubois, Lionel; Faverge, Mathieu; Franc, Alain; Guermouche, Abdou; Jego, Antoine; Peressoni, Romain; Pruvost, Florent
Affiliations: INRIA, Le Chesnay, France; Univ Bordeaux, Bordeaux, France; Bordeaux INP, Bordeaux, France; LaBRI, Saanichton, BC, Canada; CNRS, Toulouse, France; INPT, Toulouse, France; IRIT, Sunnyvale, CA, USA

ISBN: (print) 9798350337662

Dense matrix multiplication involving a symmetric input matrix (SYMM) is implemented in reference distributed-memory codes with the same data distribution as its general analogue (GEMM). We show that, when the symmetric matrix is dominant, such a 2D block-cyclic (2D BC) scheme leads to a lower arithmetic intensity (AI) of SYMM than that of GEMM by a factor of 2. We propose alternative data distributions that preserve the memory benefit of SYMM, which stores only half of the matrix, while achieving up to the same AI as GEMM. We also show that, when we can afford the same memory footprint as GEMM, SYMM can achieve a higher AI. We propose a task-based design of SYMM that is independent of the data distribution. This design allows for a scalable A-stationary SYMM with which all discussed data distributions, however irregular, can be easily assessed. We have integrated the resulting code in a dimension reduction algorithm involving a randomized singular value decomposition dominated by SYMM. An experimental study shows a compelling impact on performance.

Keywords: Symmetric matrix multiplication; SYMM; GEMM; 2DBC; task-based programming; SBC; TBC; 3D; 2.5D

From Task-Based GPU Work Aggregation to Stellar Mergers: Turning Fine-Grained CPU Tasks into Portable GPU Kernels

5th Annual IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC)

Authors: Daiss, Gregor; Diehl, Patrick; Marcello, Dominic; Kheirkhahan, Alireza; Kaiser, Hartmut; Pflueger, Dirk
Affiliations: Louisiana State Univ, LSU Ctr Computat & Technol, Baton Rouge, LA 70803, USA; Univ Stuttgart, IPVS, Stuttgart, Germany; Louisiana State Univ, Dept Phys & Astron, Baton Rouge, LA, USA

ISBN: (print) 9781665460217

Meeting both scalability and performance portability requirements is a challenge for any HPC application, especially for adaptively refined ones. In Octo-Tiger, an astrophysics application for the simulation of stellar mergers, we approach this with existing solutions: We employ HPX to obtain fine-grained tasks to easily distribute work and finely overlap communication and computation. For the computations themselves, we use Kokkos to turn these tasks into compute kernels capable of running on hardware ranging from a few CPU cores to powerful accelerators. There is a missing link, however: while the fine-grained parallelism exposed by HPX is useful for scalability, it can hinder GPU performance when the tasks become too small to saturate the device, causing low resource utilization. To bridge this gap, we investigate multiple different GPU work aggregation strategies within Octo-Tiger, adding one new strategy, and evaluate the node-level performance impact on recent AMD and NVIDIA GPUs, achieving noticeable speedups.

Keywords: HPX; HIP; CUDA; Kokkos; Work Aggregation; Performance Portability; task-based programming

Dynamic Task Fusion for a Block-Structured Finite Volume Solver over a Dynamically Adaptive Mesh with Local Time Stepping

37th International Supercomputing Conference on High Performance Computing (ISC High Performance Computing)

Authors: Li, Baojiu; Schulz, Holger; Weinzierl, Tobias; Zhang, Han
Affiliations: Univ Durham, Inst Computat Cosmol, Durham DH1 3FE, England; Univ Durham, Dept Comp Sci, Durham DH1 3FE, England; Univ Durham, Large Scale Comp Inst Data Sci, Durham DH1 3FE, England

ISBN: (print) 9783031073120; 9783031073113

Load balancing of generic wave equation solvers over dynamically adaptive meshes with local time stepping is difficult, as the load changes with every time step. Task-based programming promises to mitigate the load balancing problem. We study a Finite Volume code over dynamically adaptive block-structured meshes for two astrophysics simulations, where the patches (blocks) define tasks. They are classified into urgent and low priority tasks. Urgent tasks are algorithmically latency-sensitive. They are processed directly as part of our bulk-synchronous mesh traversals. Non-urgent tasks are held back in an additional task queue on top of the task runtime system. If they lack global side-effects, i.e. do not alter the global solver state, we can generate optimised compute kernels for these tasks. Furthermore, we propose to use the additional queue to merge tasks without side-effects into task assemblies, and to balance out imbalanced bulk synchronous processing phases.

Keywords: task-based programming; Block-structured dynamic adaptive mesh refinement; Local time stepping; Wave equation solvers

RosneT: A Block Tensor Algebra Library for Out-of-Core Quantum Computing Simulation

2nd International Workshop on Quantum Computing Software (QCS)

Authors: Sanchez-Ramirez, Sergio; Conejero, Javier; Lordan, Francesc; Queralt, Anna; Cortes, Toni; Badia, Rosa M.; Garcia-Saez, Artur
Affiliations: Barcelona Supercomp Ctr, QUANTIC, Barcelona, Spain; Barcelona Supercomp Ctr, Workflows & Distributed Comp, Barcelona, Spain; Univ Politecn Cataluna, Barcelona, Spain

ISBN: (print) 9781728186740

With the advent of more powerful Quantum Computers, the need for larger Quantum Simulations has grown. As the amount of resources grows exponentially with the size of the target system, Tensor Networks emerge as an optimal framework with which we represent Quantum States in tensor factorizations. As the extent of a tensor network increases, so does the size of intermediate tensors, requiring HPC tools for their manipulation. Simulations of medium-sized circuits cannot fit in local memory, and solutions for distributed contraction of tensors are scarce. In this work we present RosneT, a library for distributed, out-of-core block tensor algebra. We use the PyCOMPSs programming model to transform tensor operations into a collection of tasks handled by the COMPSs runtime, targeting executions in existing and upcoming Exascale supercomputers. We report results validating our approach, showing good scalability in simulations of Quantum circuits of up to 53 qubits.

Keywords: tensor network; quantum computing; simulation; out-of-core; task-based programming; COMPSs; distributed computing; HPC

DEISA: Dask-Enabled In Situ Analytics

28th Annual IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC)

Authors: Gueroudji, Amal; Bigot, Julien; Raffin, Bruno
Affiliations: Univ Paris Saclay, UVSQ, CNRS, CEA, Maison Simulat, F-91191 Gif Sur Yvette, France; Univ Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG, F-38000 Grenoble, France

ISBN: (print) 9781665410168

A widening performance gap is separating CPU performance and IO bandwidth on large scale systems. In some fields such as weather forecasting and nuclear fusion, numerical models generate such amounts of data that classical post hoc processing is no longer feasible due to the limits in both storage capacity and IO performance. In situ approaches are attractive to bypass disk accesses in these cases and fully leverage the HPC platform. They are however often complex to set up and can require re-developing parallel versions of the analysis from scratch. In this paper we propose a hybrid model that is well suited for in situ workflows that combine regular simulations and irregular analytics. Our model couples the bulk synchronous parallel paradigm for simulation with a distributed task-based one for analysis. This reduces complexity and leverages the best of each of these two powerful paradigms. We validate the model with a prototype, called DEISA, that supports coupling MPI parallel codes with analyses written using Dask. This implementation requires minimal modifications of both the simulation and analysis codes compared to their post hoc counterparts. It gives access to an already existing rich ecosystem to be used in situ, such as the parallel versions of NumPy, Pandas and scikit-learn. Experiments in configurations up to 1024 cores show that DEISA can improve the simulation wallclock time (excluding analysis) by a factor of up to 3 and the total experiment (including analysis) *** cost by a factor of up to 5 compared to parallel post hoc with plain Dask, while requiring the modification of only two lines of Python code, three of YAML, and none at all in a C simulation code already instrumented with PDI Data Interface.

Keywords: In situ processing; code coupling; task-based programming; MPI; Dask

Beyond Fork-Join: Integration of Performance Portable Kokkos Kernels with HPX

35th IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Daiss, Gregor; Simberg, Mikael; Reverdell, Auriane; Biddiscombe, John; Pollinger, Theresa; Kaiser, Hartmut; Pfluger, Dirk
Affiliations: Univ Stuttgart, Inst Parallel & Distributed Syst, Sci Comp, Stuttgart, Germany; Swiss Natl Supercomp Ctr, Porza, Switzerland; Louisiana State Univ, CCT, Baton Rouge, LA 70803, USA

ISBN: (print) 9781665435772

Between a widening range of GPU vendors and the trend of having more GPUs per compute node in supercomputers such as Summit, Perlmutter, Frontier and Aurora, developing performant yet portable distributed HPC applications becomes ever more challenging. Leveraging existing solutions like Kokkos for platform-independent code and HPX for distributing the application in a task-based fashion can alleviate these challenges. However, using such frameworks in the same application requires them to work together seamlessly. In this work we present an HPX Kokkos integration that works both ways: we can integrate CPU and GPU Kokkos kernels as HPX tasks and inversely use HPX worker threads to work on Kokkos kernels. Using HPX futures makes launching and synchronizing Kokkos kernels from multiple threads easy, allowing us to move away from the more traditional fork-join model. To evaluate our integrations we ported existing Vc and CUDA kernels within an existing HPX application, Octo-Tiger, to use Kokkos instead. We achieve comparable, or better, performance than with previous Vc and CUDA kernels, showing both the viability of our HPX Kokkos integration, as well as future-proofing Octo-Tiger for a wider range of potential machines. Furthermore, we introduce event polling for synchronizing CUDA kernels (or Kokkos kernels on the respective backend) achieving speedups over the previous solution using callbacks.

Keywords: Performance Portability; task-based programming; Kokkos; HPX; SIMD; GPU; CUDA

Parallel and Distributed Task-Based Kirchhoff Seismic Pre-Stack Depth Migration Application

20th International Symposium on Parallel and Distributed Computing (ISPDC)

Authors: Gurhem, Jerome; Calandra, Henri; Petiton, Serge G.
Affiliations: Univ Lille, UMR 9189 CRIStAL, CNRS, Lille, France; CNRS, USR 3441, Maison Simulat, Saclay, France; Total SA, Pau, France

ISBN: (print) 9781665432818

Since the middle of the 1990s, message passing libraries have been the most widely used technology for implementing parallel and distributed scientific applications. However, they may not be efficient enough on exascale machines, since scalability issues will appear due to the increase in computing resources. Task-based programming models can be used to avoid collective communications like reductions, broadcast, or gather by transforming them into multiple operations on tasks. Then, these operations can be scheduled by the programming scheduler to place the data and computations in a way that optimizes and reduces the data communications. These properties could help to solve some MPI and exascale computing challenges. Oil and gas applications could also benefit from task-based programming properties. We developed a simplified version of the Kirchhoff seismic pre-stack depth migration, a subsurface exploration application, to experiment with HPX, a task-based programming model, as well as MPI and MPI+OpenMP. Then, we perform strong scaling and weak scaling experiments on Pangea, a Total supercomputer. We also study the variation of the number of OpenMP threads per MPI process. We show that the current task-based programming model schedulers lack the capability to completely manage the memory used and are not efficient enough to reduce the data migrations.

Keywords: Kirchhoff Seismic Pre-Stack Depth Migration; task-based programming; Parallel and Distributed Application

Optimizing Distributed Load Balancing for Workloads with Time-Varying Imbalance

IEEE International Conference on Cluster Computing (Cluster)

Authors: Lifflander, Jonathan; Slattengren, Nicole Lemaster; Pebay, Philippe P.; Miller, Phil; Rizzi, Francesco; Bettencourt, Matthew T.
Affiliations: Sandia Natl Labs, Livermore, CA 94550, USA; NexGen Analyt, Sheridan, WY, USA; Intense Comp, New York, NY, USA

ISBN: (print) 9781728196664

This paper explores dynamic load balancing algorithms used by asynchronous many-task (AMT), or 'task-based', programming models to optimize task placement for scientific applications with dynamic workload imbalances. AMT programming models use overdecomposition of the computational domain. Overdecomposition provides a natural mechanism for domain developers to expose concurrency and break their computational domain into pieces that can be remapped to different hardware. This paper explores fully distributed load balancing strategies that have shown great promise for exascale-level computing but are challenging to theoretically reason about and implement effectively. We present a novel theoretical analysis of a gossip-based load balancing protocol and use it to build an efficient implementation with fast convergence rates and high load balancing quality. We demonstrate our algorithm in a next-generation plasma physics application (EMPIRE) that induces time-varying workload imbalance due to spatial non-uniformity in particle density across the domain. Our highly scalable, novel load balancing algorithm achieves over a 3x speedup (particle work) compared to a bulk-synchronous MPI implementation without load balancing.

Keywords: dynamic load balancing; overdecomposition; exascale computing; asynchronous many-task (AMT); task-based programming; distributed algorithms

No. 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021