Search Results - Inner Mongolia University Library

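Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (publication year, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1 (subject term 图书馆学 or 情报学, author 范并思, years 1982-2016): (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2 (publication 计算机应用与软件, any field C++ or Basic, subject term not Visual, years 2011-2016): P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016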
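The search rules above (field codes, uppercase operators, one space on each side of an operator) can be sketched as a small query builder. This is only an illustration of the syntax: the helper names `term`, `combine`, and `group` are hypothetical and are not part of the library's search system.

```python
# Field codes and operators as listed in the help text above.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field, value):
    """Build one FIELD=value term, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(operator, *parts):
    """Join parts with an uppercase operator and one space on each side."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    return f" {operator} ".join(parts)


def group(expression):
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


# Rebuild the two search examples from the help text.
example_1 = combine(
    "AND",
    group(combine("OR", term("K", "图书馆学"), term("K", "情报学"))),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
example_2 = combine(
    "AND",
    combine(
        "NOT",
        combine(
            "AND",
            term("P", "计算机应用与软件"),
            group(combine("OR", term("U", "C++"), term("U", "Basic"))),
        ),
        term("K", "Visual"),
    ),
    term("Y", "2011-2016"),
)
print(example_1)
print(example_2)
```

Building the string from checked parts keeps the operators uppercase and correctly spaced, which are the two mistakes the rules warn about.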

Search condition: "Subject term=Task-based programming"

37 records in total; records 11-20 are shown below.

DEISA: Dask-Enabled In Situ Analytics

28th Annual IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC)

Authors: Gueroudji, Amal; Bigot, Julien; Raffin, Bruno
Affiliations: Univ Paris Saclay UVSQ CNRS CEA Maison Simulat F-91191 Gif Sur Yvette France; Univ Grenoble Alpes Inria CNRS Grenoble INP LIG F-38000 Grenoble France

ISBN: (print) 9781665410168

A widening performance gap is separating CPU performance and IO bandwidth on large scale systems. In some fields such as weather forecast and nuclear fusion, numerical models generate such amounts of data that classical post hoc processing is not feasible anymore due to the limits in both storage capacity and IO performance. In situ approaches are attractive to bypass disk accesses in these cases and fully leverage the HPC platform. They are however often complex to set up and can require re-developing parallel versions of the analysis from scratch. In this paper we propose a hybrid model that is well suited for in situ workflows that combine regular simulations and irregular analytics. Our model couples the bulk synchronous parallel paradigm for simulation with a distributed task-based one for analysis. This reduces complexity and leverages the best of each of these two powerful paradigms. We validate the model with a prototype, called DEISA, that supports coupling MPI parallel codes with analyses written using Dask. This implementation requires minimal modifications of both the simulation and analysis codes compared to their post hoc counterpart. It gives access to an already existing rich ecosystem to be used in situ such as the parallel versions of NumPy, Pandas and scikit-learn. Experiments in configurations up to 1024 cores show that DEISA can improve the simulation wallclock time (excluding analysis) by a factor up to 3 and the total experiment (including analysis) *** cost by a factor of up to 5 compared to parallel post hoc with plain Dask while requiring the modification of only two lines of Python code, three of YAML, and none at all in a C simulation code already instrumented with PDI Data Interface.

Keywords: In situ processing; code coupling; task-based programming; MPI; Dask
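The coupling described in the abstract above, a bulk-synchronous simulation loop feeding a task-based analysis, can be illustrated with a minimal sketch. This is not DEISA's API and it does not use Dask: `concurrent.futures` from the Python standard library stands in for the task runtime, and `simulation_step`, `analyze`, and `run_in_situ` are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def simulation_step(state):
    """One regular simulation step: periodic three-point averaging."""
    n = len(state)
    return [(state[i - 1] + state[i] + state[(i + 1) % n]) / 3 for i in range(n)]


def analyze(step, snapshot):
    """Analysis task: report the value spread of one snapshot."""
    return step, max(snapshot) - min(snapshot)


def run_in_situ(initial, steps, workers=2):
    """Advance the simulation and hand each snapshot to an analysis task.

    Snapshots are passed in memory instead of being written to disk and
    analyzed post hoc; the analysis tasks run while the loop continues.
    """
    state = list(initial)
    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for step in range(steps):
            state = simulation_step(state)
            # Copy the snapshot so later steps cannot change what is analyzed.
            futures.append(pool.submit(analyze, step, list(state)))
        return [future.result() for future in futures]


in_situ_results = run_in_situ([0.0, 0.0, 9.0, 0.0, 0.0, 0.0], steps=3)
for step, spread in in_situ_results:
    print(f"step {step}: spread {spread:.3f}")
```

The simulation code only gains the one `submit` call per step, which mirrors the abstract's point that the simulation side needs very few changes.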

Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning

19th European Dependable Computing Conference (EDCC)

Authors: Rocha, Isabelly; Felber, Pascal; Martorel, Xavier; Pasin, Marcelo; Schiavoni, Valerio; Unsal, Osman
Affiliations: Univ Neuchatel Inst Comp Sci IIUN Neuchatel Switzerland; Barcelona Supercomputing Ctr Barcelona Spain; Univ Politecn Cataluna Barcelona Spain

ISBN: (print) 9798350360691; 9798350360684

A common way of improving performance of applications for multi-core processors is to exploit parallelism. In deep learning (DL), training or tuning parameters use user's sensitive data, and thus preserving privacy is critical. Hardware-assisted protection mechanisms (i.e., trusted execution environments - TEEs) offer a practical privacy-preserving solution, nowadays available both in private and public data centers. We present SGX-OMPSS, a new approach combining a task-based programming model (i.e., OmpSs) with TEEs (i.e., Intel Software Guard Extensions). SGX-OMPSS supports asynchronous task parallelism and hardware heterogeneity by using the data dependencies between tasks of the application, easily specified by code annotations. We evaluate SGX-OMPSS via several microbenchmarks and state-of-the-art DL applications and datasets (e.g., YOLO and MNIST). SGX-OMPSS achieves 94% gain speedup while offering additional security guarantees.

Keywords: deep learning; Intel SGX; MNIST; OmpSs; task parallelism; task-based programming; YOLO

From Task-Based GPU Work Aggregation to Stellar Mergers: Turning Fine-Grained CPU Tasks into Portable GPU Kernels

5th Annual IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC)

Authors: Daiss, Gregor; Diehl, Patrick; Marcello, Dominic; Kheirkhahan, Alireza; Kaiser, Hartmut; Pflueger, Dirk
Affiliations: Louisiana State Univ LSU Ctr Computat & Technol Baton Rouge LA 70803 USA; Univ Stuttgart IPVS Stuttgart Germany; Louisiana State Univ Dept Phys & Astron Baton Rouge LA USA

ISBN: (print) 9781665460217

Meeting both scalability and performance portability requirements is a challenge for any HPC application, especially for adaptively refined ones. In Octo-Tiger, an astrophysics application for the simulation of stellar mergers, we approach this with existing solutions: We employ HPX to obtain fine-grained tasks to easily distribute work and finely overlap communication and computation. For the computations themselves, we use Kokkos to turn these tasks into compute kernels capable of running on hardware ranging from a few CPU cores to powerful accelerators. There is a missing link, however: while the fine-grained parallelism exposed by HPX is useful for scalability, it can hinder GPU performance when the tasks become too small to saturate the device, causing low resource utilization. To bridge this gap, we investigate multiple different GPU work aggregation strategies within Octo-Tiger, adding one new strategy, and evaluate the node-level performance impact on recent AMD and NVIDIA GPUs, achieving noticeable speedups.

Keywords: HPX; HIP; CUDA; Kokkos; Work Aggregation; Performance Portability; task-based programming

Modeling the Energy Consumption for Concurrent Executions of Parallel Tasks

14th Communications and Networking Symposium (CNS 2011) / Spring Simulation Multiconference (SpringSim '11)

Authors: Rauber, Thomas; Ruenger, Gudula
Affiliations: Univ Bayreuth Bayreuth Germany; Tech Univ Chemnitz Chemnitz Germany

ISBN: (print) 9781617828379

Programming models using parallel tasks provide portable performance and scalability for modular applications on many high-performance systems. This is achieved by the flexibility of a two-level programming structure supporting mixed task and data parallelism. Due to the emerging importance of energy efficiency in high-performance computing, programming models with parallel tasks should be extended to be able to include energy concerns. Based on a well-accepted analytical energy model for a processor's energy consumption, this article explores the energy consumption of parallel tasks with communication that are executed concurrently with other tasks. Simulations show the different energy consumption scenarios for different task cooperations and demonstrate the potential for a flexible energy usage on varying parallel platforms.

Keywords: energy model; task-based programming; communication

Performance Analysis and Optimisation of Two-Sided Factorization Algorithms for Heterogeneous Platform

15th Annual International Conference on Computational Science (ICCS)

Authors: Kabir, Khairul; Haidar, Azzam; Tomov, Stanimire; Dongarra, Jack
Affiliations: Univ Tennessee Knoxville TN USA; Oak Ridge Natl Lab Oak Ridge TN USA; Univ Manchester Manchester Lancs England

Many applications, ranging from big data analytics to nanostructure designs, require the solution of large dense singular value decomposition (SVD) or eigenvalue problems. A first step in the solution methodology for these problems is the reduction of the matrix at hand to condensed form by two-sided orthogonal transformations. This step is standardly used to significantly accelerate the solution process. We present a performance analysis of the main two-sided factorizations used in these reductions: the bidiagonalization, tridiagonalization, and the upper Hessenberg factorizations on heterogeneous systems of multicore CPUs and Xeon Phi coprocessors. We derive a performance model and use it to guide the analysis and to evaluate performance. We develop optimized implementations for these methods that get up to 80% of the optimal performance bounds. Finally, we describe the heterogeneous multicore and coprocessor development considerations and the techniques that enable us to achieve these high-performance results. The work here presents the first highly optimized implementation of these main factorizations for Xeon Phi coprocessors. Compared to the LAPACK versions optimized by Intel for Xeon Phi (in MKL), we achieve up to 50% speedup.

Keywords: Eigensolver; multicore; Xeon Phi; task-based programming

Beyond Fork-Join: Integration of Performance Portable Kokkos Kernels with HPX

35th IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Daiss, Gregor; Simberg, Mikael; Reverdell, Auriane; Biddiscombe, John; Pollinger, Theresa; Kaiser, Hartmut; Pfluger, Dirk
Affiliations: Univ Stuttgart Inst Parallel & Distributed Syst Sci Comp Stuttgart Germany; Swiss Natl Supercomp Ctr Porza Switzerland; Louisiana State Univ CCT Baton Rouge LA 70803 USA

ISBN: (print) 9781665435772

Between a widening range of GPU vendors and the trend of having more GPUs per compute node in supercomputers such as Summit, Perlmutter, Frontier and Aurora, developing performant yet portable distributed HPC applications becomes ever more challenging. Leveraging existing solutions like Kokkos for platform-independent code and HPX for distributing the application in a task-based fashion can alleviate these challenges. However, using such frameworks in the same application requires them to work together seamlessly. In this work we present an HPX Kokkos integration that works both ways: we can integrate CPU and GPU Kokkos kernels as HPX tasks and inversely use HPX worker threads to work on Kokkos kernels. Using HPX futures makes launching and synchronizing Kokkos kernels from multiple threads easy, allowing us to move away from the more traditional fork-join model. To evaluate our integrations we ported existing Vc and CUDA kernels within an existing HPX application, Octo-Tiger, to use Kokkos instead. We achieve comparable, or better, performance than with previous Vc and CUDA kernels, showing both the viability of our HPX Kokkos integration, as well as future-proofing Octo-Tiger for a wider range of potential machines. Furthermore, we introduce event polling for synchronizing CUDA kernels (or Kokkos kernels on the respective backend) achieving speedups over the previous solution using callbacks.

Keywords: Performance Portability; task-based programming; Kokkos; HPX; SIMD; GPU; CUDA

Divide and Conquer Symmetric Tridiagonal Eigensolver for Multicore Architectures

29th IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Pichon, Gregoire; Haidar, Azzam; Faverge, Mathieu; Kurzak, Jakub
Affiliations: Inria Bordeaux Sud Ouest Bordeaux INP Talence France; Univ Tennessee Innovat Comp Lab Knoxville TN USA

ISBN: (print) 9781479986484

Computing eigenpairs of a symmetric matrix is a problem arising in many industrial applications, including quantum physics and finite-elements computation for automobiles. A classical approach is to reduce the matrix to tridiagonal form before computing eigenpairs of the tridiagonal matrix. Then, a back-transformation allows one to obtain the final solution. Parallelism issues of the reduction stage have already been tackled in different shared-memory libraries. In this article, we focus on solving the tridiagonal eigenproblem, and we describe a novel implementation of the Divide and Conquer algorithm. The algorithm is expressed as a sequential task-flow, scheduled in an out-of-order fashion by a dynamic runtime which allows the programmer to play with task granularity. The resulting implementation is between two and five times faster than the equivalent routine from the Intel MKL library, and outperforms the best MRRR implementation for many matrices.

Keywords: Eigensolver; multicore; task-based programming; PLASMA; LAPACK
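The "sequential task-flow, scheduled in an out-of-order fashion" mentioned in the abstract above can be illustrated with a toy runtime. Tasks are inserted in program order together with the data they read and write, the runtime infers the dependencies, and any task whose dependencies are satisfied may run. This is a minimal sketch, not the paper's runtime or the eigensolver itself: the `TaskFlow` class is hypothetical, and the divide-and-conquer example sorts a list rather than solving an eigenproblem.

```python
import heapq
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


class TaskFlow:
    """Toy sequential task flow with dependencies inferred from data access."""

    def __init__(self):
        self.tasks = []          # task functions in insertion order
        self.deps = []           # deps[i]: indices of tasks that must finish first
        self._last_writer = {}   # data name -> index of the last writing task
        self._readers = {}       # data name -> readers since the last write

    def insert(self, func, reads=(), writes=()):
        index = len(self.tasks)
        deps = set()
        for name in reads:       # read-after-write
            if name in self._last_writer:
                deps.add(self._last_writer[name])
        for name in writes:      # write-after-write and write-after-read
            if name in self._last_writer:
                deps.add(self._last_writer[name])
            deps.update(self._readers.get(name, ()))
        for name in reads:
            self._readers.setdefault(name, set()).add(index)
        for name in writes:
            self._last_writer[name] = index
            self._readers[name] = set()
        self.tasks.append(func)
        self.deps.append(deps)

    def run(self, data, workers=4):
        """Run every task once its dependencies are done; return completion order."""
        waiting = {i: set(d) for i, d in enumerate(self.deps)}
        running, completed = {}, []
        with ThreadPoolExecutor(max_workers=workers) as pool:
            while waiting or running:
                for i in [i for i, d in waiting.items() if not d]:
                    del waiting[i]
                    running[pool.submit(self.tasks[i], data)] = i
                done, _ = wait(list(running), return_when=FIRST_COMPLETED)
                for future in done:
                    future.result()  # re-raise any exception from the task
                    finished = running.pop(future)
                    completed.append(finished)
                    for d in waiting.values():
                        d.discard(finished)
        return completed


def split(d):
    mid = len(d["input"]) // 2
    d["left"], d["right"] = d["input"][:mid], d["input"][mid:]

def solve_left(d):
    d["left_sorted"] = sorted(d["left"])

def solve_right(d):
    d["right_sorted"] = sorted(d["right"])

def merge(d):
    d["result"] = list(heapq.merge(d["left_sorted"], d["right_sorted"]))


flow = TaskFlow()
flow.insert(split, reads=["input"], writes=["left", "right"])
flow.insert(solve_left, reads=["left"], writes=["left_sorted"])
flow.insert(solve_right, reads=["right"], writes=["right_sorted"])
flow.insert(merge, reads=["left_sorted", "right_sorted"], writes=["result"])

flow_data = {"input": [5, 2, 9, 1, 7, 3]}
completion_order = flow.run(flow_data)
print(flow.deps, flow_data["result"])
```

The two solve tasks depend only on `split`, so the runtime is free to run them concurrently and in either order, while `merge` waits for both.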

On the Arithmetic Intensity of Distributed-Memory Dense Matrix Multiplication Involving a Symmetric Input Matrix (SYMM)

37th IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Agullo, Emmanuel; Buttari, Alfredo; Coulaud, Olivier; Eyraud-Dubois, Lionel; Faverge, Mathieu; Franc, Alain; Guermouche, Abdou; Jego, Antoine; Peressoni, Romain; Pruvost, Florent
Affiliations: INRIA Le Chesnay France; Univ Bordeaux Bordeaux France; Bordeaux INP Bordeaux France; LaBRI Saanichton BC Canada; CNRS Toulouse France; INPT Toulouse France; IRIT Sunnyvale CA USA

ISBN: (print) 9798350337662

Dense matrix multiplication involving a symmetric input matrix (SYMM) is implemented in reference distributed-memory codes with the same data distribution as its general analogue (GEMM). We show that, when the symmetric matrix is dominant, such a 2D block-cyclic (2D BC) scheme leads to a lower arithmetic intensity (AI) of SYMM than that of GEMM by a factor of 2. We propose alternative data distributions preserving the memory benefit of SYMM of storing only half of the matrix while achieving up to the same AI as GEMM. We also show that, in the case we can afford the same memory footprint as GEMM, SYMM can achieve a higher AI. We propose a task-based design of SYMM independent of the data distribution. This design allows for scalable A-stationary SYMM with which all discussed data distributions, may they be very irregular, can be easily assessed. We have integrated the resulting code in a reduction dimension algorithm involving a randomized singular value decomposition dominated by SYMM. An experimental study shows a compelling impact on performance.

Keywords: Symmetric matrix multiplication; SYMM; GEMM; 2DBC; task-based programming; SBC; TBC; 3D; 2.5D

Dynamic Task Fusion for a Block-Structured Finite Volume Solver over a Dynamically Adaptive Mesh with Local Time Stepping

37th International Supercomputing Conference on High Performance Computing (ISC High Performance Computing)

Authors: Li, Baojiu; Schulz, Holger; Weinzierl, Tobias; Zhang, Han
Affiliations: Univ Durham Inst Computat Cosmol Durham DH1 3FE England; Univ Durham Dept Comp Sci Durham DH1 3FE England; Univ Durham Large Scale Comp Inst Data Sci Durham DH1 3FE England

ISBN: (print) 9783031073120; 9783031073113

Load balancing of generic wave equation solvers over dynamically adaptive meshes with local time stepping is difficult, as the load changes with every time step. Task-based programming promises to mitigate the load balancing problem. We study a Finite Volume code over dynamically adaptive block-structured meshes for two astrophysics simulations, where the patches (blocks) define tasks. They are classified into urgent and low priority tasks. Urgent tasks are algorithmically latency-sensitive. They are processed directly as part of our bulk-synchronous mesh traversals. Non-urgent tasks are held back in an additional task queue on top of the task runtime system. If they lack global side-effects, i.e. do not alter the global solver state, we can generate optimised compute kernels for these tasks. Furthermore, we propose to use the additional queue to merge tasks without side-effects into task assemblies, and to balance out imbalanced bulk synchronous processing phases.

Keywords: task-based programming; Block-structured dynamic adaptive mesh refinement; Local time stepping; Wave equation solvers
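The queueing scheme described in the abstract above (urgent tasks run immediately, non-urgent tasks are held back, and those without side effects are merged into assemblies) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: `PatchTask`, `FusingQueue`, `update_patch`, and `fused_kernel` are hypothetical names, the per-patch update is a placeholder, and a "kernel launch" is just a counted function call.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PatchTask:
    patch_id: int
    values: List[float]
    urgent: bool = False
    has_side_effects: bool = False


def update_patch(values):
    """Placeholder per-patch kernel (assumption: doubles every value)."""
    return [2.0 * v for v in values]


def fused_kernel(batch):
    """Process a whole assembly of patches in one call."""
    return [update_patch(values) for values in batch]


class FusingQueue:
    """Extra queue that holds back non-urgent tasks and fuses the pure ones."""

    def __init__(self, assembly_size=4):
        self.assembly_size = assembly_size
        self.pending = []
        self.results = {}
        self.kernel_launches = 0

    def submit(self, task):
        if task.urgent:
            # Urgent tasks are processed directly during the traversal.
            self.results[task.patch_id] = update_patch(task.values)
            self.kernel_launches += 1
            return
        self.pending.append(task)
        if len(self.pending) >= self.assembly_size:
            self.flush()

    def flush(self):
        """Fuse side-effect-free tasks into one assembly; run the rest singly."""
        batch, self.pending = self.pending, []
        fusable = [t for t in batch if not t.has_side_effects]
        if fusable:
            outputs = fused_kernel([t.values for t in fusable])
            for task, out in zip(fusable, outputs):
                self.results[task.patch_id] = out
            self.kernel_launches += 1
        for task in batch:
            if task.has_side_effects:
                self.results[task.patch_id] = update_patch(task.values)
                self.kernel_launches += 1


demo_queue = FusingQueue(assembly_size=8)
demo_queue.submit(PatchTask(0, [1.0], urgent=True))
for patch_id in range(1, 5):
    demo_queue.submit(PatchTask(patch_id, [float(patch_id)]))
demo_queue.submit(PatchTask(5, [6.0], has_side_effects=True))
demo_queue.flush()
print(demo_queue.kernel_launches, sorted(demo_queue.results))
```

In this run six tasks need only three launches: one for the urgent task, one for the fused assembly of four, and one for the task with side effects.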

Speaking Pygion: Experiences Writing an Exascale Single Particle Imaging Code

2nd International Workshop on Asynchronous Many-Task Systems and Applications (WAMTA)

Authors: Mirchandaney, Seema; Aiken, Alex; Slaughter, Elliott
Affiliations: SLAC Natl Accelerator Lab Menlo Pk CA 94025 USA; Stanford Univ Stanford CA 94305 USA

ISBN: (digital) 9783031617638

ISBN: (print) 9783031617621; 9783031617638

The goal of the SpiniFEL project was to write, from scratch, a single particle imaging code for exascale supercomputers. The original vision was to have two versions of the code, one in MPI and one in Pygion, a Python-based interface to the Legion task-based runtime. We describe the motivation for the project, some of the programming challenges we encountered along the way, what worked and what didn't, and why only the Pygion code eventually succeeded in running at scale.

Keywords: task-based programming; exascale computing; single particle imaging

Address: 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
