Search Results - Inner Mongolia University Library

Search query: "Any field = 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming"

382 records in total; showing records 91-100

Provably and Practically Efficient Granularity Control

24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Acar, Umut A.; Aksenov, Vitaly; Chargueraud, Arthur; Rainey, Mike

Affiliations: Carnegie Mellon Univ, Pittsburgh, PA 15213, USA; INRIA, Paris, France; ITMO Univ, St Petersburg, Russia; Univ Strasbourg, ICube, CNRS, Strasbourg, France; Indiana Univ, Bloomington, IN 47405, USA

ISBN: 9781450362252 (print)

Over the past decade, many programming languages and systems for parallel computing have been developed, e.g., Fork/Join and Habanero Java, Parallel Haskell, Parallel ML, and X10. Although these systems raise the level of abstraction for writing parallel codes, performance continues to require labor-intensive optimizations for coarsening the granularity of parallel executions. In this paper, we present provably and practically efficient techniques for controlling granularity within the run-time system of the language. Our starting point is "oracle-guided scheduling", a result from the functional-programming community that shows that granularity can be controlled by an "oracle" that can predict the execution time of parallel codes. We give an algorithm for implementing such an oracle and prove that it has the desired theoretical properties under the nested-parallel programming model. We implement the oracle in C++ by extending Cilk and evaluate its practical performance. The results show that our techniques can essentially eliminate hand tuning while closely matching the performance of hand-tuned codes.

Keywords: parallel programming languages granularity control
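The idea of oracle-guided granularity control described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm or their Cilk extension: the linear cost model `predict_cost`, the threshold `kappa`, and the `Stats` counters are all hypothetical names chosen for illustration, and the "fork" is only counted rather than executed in parallel.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    spawned: int = 0     # number of points where a runtime would fork
    sequential: int = 0  # number of leaves executed sequentially


def predict_cost(n, unit_cost=1.0):
    """Hypothetical oracle: assume running time is linear in the range size."""
    return n * unit_cost


def controlled_sum(xs, lo, hi, kappa, stats):
    """Sum xs[lo:hi], forking only when the predicted work exceeds kappa."""
    n = hi - lo
    if n <= 1 or predict_cost(n) <= kappa:
        # Predicted work is too small to pay for a fork: run sequentially.
        stats.sequential += 1
        return sum(xs[lo:hi])
    mid = lo + n // 2
    stats.spawned += 1  # a real runtime would run the two halves in parallel
    left = controlled_sum(xs, lo, mid, kappa, stats)
    right = controlled_sum(xs, mid, hi, kappa, stats)
    return left + right
```

With 1000 elements and `kappa=100`, the recursion stops splitting once ranges shrink to about 62 elements, so the number of forks depends on the threshold rather than on hand-written cutoffs in the user code.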

Incremental Flattening for Nested Data Parallelism

24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Henriksen, Troels; Thoroe, Frederik; Elsman, Martin; Oancea, Cosmin

Affiliations: Univ Copenhagen, Copenhagen, Denmark

ISBN: 9781450362252 (print)

Compilation techniques for nested-parallel applications that can adapt to hardware and dataset characteristics are vital for unlocking the power of modern hardware. This paper proposes such a technique, which builds on flattening and is applied in the context of a functional data-parallel language. Our solution uses the degree of utilized parallelism as the driver for generating a multitude of code versions, which together cover all possible mappings of the application's regular nested parallelism to the levels of parallelism supported by the hardware. These code versions are then combined into one program by guarding them with predicates, whose threshold values are automatically tuned to hardware and dataset characteristics. Our unsupervised method of statically clustering datasets to code versions is different from autotuning work that typically searches for the combination of code transformations producing a single version, best suited for a specific dataset or on average for all datasets. By fully integrating our technique into the repertoire of a compiler for the Futhark programming language, we demonstrate significant performance gains on two GPUs for three real-world applications from the financial domain and for six Rodinia benchmarks.

Keywords: functional language parallel compilers GPGPU
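The notion of several code versions guarded by threshold predicates can be sketched in a few lines. This is only an illustration under stated assumptions, not Futhark compiler output: the function names, the row-sum workload, and the default `threshold` are hypothetical, and both versions run sequentially here; the comments indicate which level of parallelism each version would exploit.

```python
def row_sums_outer(matrix):
    """Version 1: exploit only the outer map; each row is reduced sequentially."""
    return [sum(row) for row in matrix]


def row_sums_flat(matrix):
    """Version 2: flatten the rows and perform one segmented reduction."""
    flat = [x for row in matrix for x in row]
    offsets = [0]
    for row in matrix:
        offsets.append(offsets[-1] + len(row))
    return [sum(flat[offsets[i]:offsets[i + 1]]) for i in range(len(matrix))]


def row_sums(matrix, threshold=1024):
    """Pick a version with a predicate on the outer size.

    `threshold` stands in for a value that would be tuned per hardware
    and dataset; the default here is an arbitrary assumption.
    """
    if len(matrix) >= threshold:
        # Enough outer parallelism: keep the inner reduction sequential.
        return row_sums_outer(matrix), "outer"
    # Too few rows: also exploit the inner parallelism.
    return row_sums_flat(matrix), "flat"
```

Both versions compute the same result; only the mapping of the nested parallelism differs, which is why a single tuned predicate can select between them at run time.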

Lock-free channels for programming via communicating sequential processes

24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2019

Authors: Koval, Nikita; Alistarh, Dan; Elizarov, Roman

Affiliations: IST Austria, Austria; JetBrains, Austria

ISBN: 9781450362252 (print)

Traditional concurrent programming involves manipulating shared mutable state. Alternatives to this programming style are the communicating sequential processes (CSP) [1] and actor [2] models, which share data via explicit communication. The rendezvous channel is the common abstraction for communication between several processes, where senders and receivers perform a rendezvous handshake as a part of their protocol (senders wait for receivers and vice versa). In addition, channels support the select expression. In this work, we present the first efficient lock-free channel algorithm, and compare it against Go [3] and Kotlin [4] baseline implementations. © 2019 Copyright held by the owner/author(s).

Keywords: Locks (fasteners)
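The rendezvous semantics mentioned in the abstract (a sender waits until a receiver has taken its value, and a receiver waits until a value is offered) can be shown with a small sketch. Note that this version is deliberately lock-based and is not the lock-free algorithm the paper presents; the class name `RendezvousChannel` and its two internal queues are assumptions made for illustration, and the select expression is not modelled.

```python
import queue
import threading


class RendezvousChannel:
    """Blocking, lock-based rendezvous channel (illustrative only)."""

    def __init__(self):
        self._send_lock = threading.Lock()   # one sender hands off at a time
        self._item = queue.Queue(maxsize=1)  # carries the value being offered
        self._ack = queue.Queue(maxsize=1)   # signals that a receiver took it

    def send(self, value):
        with self._send_lock:
            self._item.put(value)
            self._ack.get()  # block until a receiver confirms the handoff

    def receive(self):
        value = self._item.get()  # block until some sender offers a value
        self._ack.put(None)       # release the waiting sender
        return value
```

A sender thread calling `send` three times and a receiver calling `receive` three times will exchange the values in order, with each `send` returning only after the matching `receive` has taken the value.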

Implementing Parallel and Concurrent Tree Structures

24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Sun, Yihan; Blelloch, Guy

Affiliations: Carnegie Mellon Univ, Pittsburgh, PA 15213, USA

ISBN: 9781450362252 (print)

As one of the most important data structures used in algorithm design and programming, balanced search trees are widely used in real-world applications for organizing data. To meet the challenges posed by modern large-volume and ever-changing data, it is important to consider parallelism, concurrency, and persistence. This tutorial will introduce techniques for supporting functionalities on trees, including various parallel algorithms, concurrency, multiversioning, etc. In particular, this tutorial will focus on an algorithmic framework for parallel balanced binary trees, which works for multiple balancing schemes, including AVL trees, red-black trees, weight-based trees, and treaps. This framework allows for theoretically efficient algorithms. The corresponding implementation is available as a library, which demonstrates good performance both sequentially and in parallel in various use scenarios. This tutorial will focus on the following topics: 1) the algorithms and techniques used in the PAM library; 2) the interface of the library and a hands-on introduction to downloading and installing the library; 3) examples of applying the library to various applications; and 4) an introduction to other useful techniques for parallel tree structures and performance comparisons with PAM.

Keywords: balanced tree augmented map parallel concurrent library PAM ordered set ordered map

Throughput-Oriented GPU Memory Allocation

24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Gelado, Isaac; Garland, Michael

Affiliations: NVIDIA, Santa Clara, CA 95051, USA

ISBN: 9781450362252 (print)

Throughput-oriented architectures, such as GPUs, can sustain three orders of magnitude more concurrent threads than multicore architectures. This level of concurrency pushes typical synchronization primitives (e.g., mutexes) over their scalability limits, creating significant performance bottlenecks in modules, such as memory allocators, that use them. In this paper, we develop concurrent programming techniques and synchronization primitives, in support of a dynamic memory allocator, that are efficient for use with very high levels of concurrency. We formulate resource allocation as a two-stage process that decouples accounting for the number of available resources from the tracking of the available resources themselves. To facilitate the accounting stage, we introduce a novel bulk semaphore abstraction that extends traditional semaphore semantics by optimizing for the case where threads operate on the semaphore simultaneously. We similarly design new collective synchronization primitives that enable groups of cooperating threads to enter critical sections together. Finally, we show that delegation of deferred reclamation to threads already blocked greatly improves efficiency. Using all these techniques, our throughput-oriented memory allocator delivers both high allocation rates and low memory fragmentation on modern GPUs. Our experiments demonstrate that it achieves allocation rates that are on average 16.56 times higher than the counterpart implementation in the CUDA 9 toolkit.

Keywords: Concurrency Memory Allocation GPU programming

VEBO: A vertex- and edge-balanced ordering heuristic to load balance parallel graph processing

24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2019

Authors: Sun, Jiawen; Vandierendonck, Hans; Nikolopoulos, Dimitrios S.

Affiliations: Queen's University of Belfast, United Kingdom

ISBN: 9781450362252 (print)

This work proposes Vertex- and Edge-Balanced Ordering (VEBO), a heuristic that balances the number of edges and the number of unique destinations of those edges. VEBO balances edges and vertices for graphs with a power-law degree distribution, and ensures an equal degree distribution between partitions. Experimental evaluation on three shared-memory graph processing systems (Ligra, Polymer and GraphGrind) shows that VEBO achieves excellent load balance and improves performance by 1.09× over Ligra, 1.41× over Polymer and 1.65× over GraphGrind, compared to their respective partitioning algorithms, averaged across 8 algorithms and 7 graphs. VEBO improves GraphGrind performance with a speedup of 2.9× over Ligra on average. © 2019 Copyright held by the owner/author(s).

Keywords: Graph theory

GOPipe: A granularity-oblivious programming framework for pipelined stencil executions on GPU

24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2019

Authors: Oh, Chanyoung; Zheng, Zhen; Shen, Xipeng; Zhai, Jidong; Yi, Youngmin

Affiliations: University of Seoul, Korea, Republic of; Tsinghua University, China; North Carolina State University, United States

ISBN: 9781450362252 (print)

Recent studies have shown promising performance benefits of pipelined stencil applications. An important factor for the computing efficiency of such pipelines is the granularity of a task. We present GOPipe, the first granularity-oblivious programming framework for efficient pipelined stencil executions. With GOPipe, programmers no longer need to specify the appropriate task granularity. GOPipe automatically finds it, and schedules tasks of that granularity while observing all inter-task and inter-stage data dependencies. In our experiments on four real-life applications, GOPipe outperforms the state of the art by up to 4.57× with much better programming productivity. © 2019 Copyright held by the owner/author(s).

Keywords: Graphics processing unit

SEP-Graph: Finding Shortest Execution Paths for Graph Processing under a Hybrid Framework on GPU

24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Wang, Hao; Geng, Liang; Lee, Rubao; Hou, Kaixi; Zhang, Yanfeng; Zhang, Xiaodong

Affiliations: Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210, USA; Northeastern Univ, Dept Comp Sci & Engn, Shenyang, Peoples R China; United Parallel Comp Corp, Atlanta, DE, USA; Virginia Tech, Dept Comp Sci, Blacksburg, VA, USA

ISBN: 9781450362252 (print)

In general, the performance of parallel graph processing is determined by three pairs of critical parameters, namely synchronous or asynchronous execution mode (Sync or Async), Push or Pull communication mechanism (Push or Pull), and Data-driven or Topology-driven traversing scheme (DD or TD), which increases the complexity and sophistication of programming and system implementation on GPUs. Existing graph-processing frameworks mainly use a single combination in the entire execution for a given application, but we have observed their variable and suboptimal performance. In this paper, we present SEP-Graph, a highly efficient software framework for graph processing on GPUs. The hybrid execution mode is automatically switched among the three pairs of parameters, with the objective of achieving the shortest execution time in each iteration. We also apply a set of optimizations to SEP-Graph, considering the characteristics of graph algorithms and underlying GPU architectures. We show the effectiveness of SEP-Graph based on our intensive and comparative performance evaluation on NVIDIA 1080, P100, and V100 GPUs. Compared with the existing and representative GPU graph-processing frameworks Groute and Gunrock, SEP-Graph can reduce execution time by up to 45.8 times and 39.4 times, respectively.

Keywords: Graph Algorithms GPU Hybrid

Efficient Race Detection with Futures

24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Utterback, Robert; Agrawal, Kunal; Fineman, Jeremy; Lee, I-Ting Angelina

Affiliations: Monmouth Coll, Monmouth, IL 61462, USA; Washington Univ, St Louis, MO 14263, USA; Georgetown Univ, Washington, DC 20057, USA

ISBN: 9781450362252 (print)

This paper addresses the problem of provably efficient and practically good on-the-fly determinacy race detection in task parallel programs that use futures. Prior works on determinacy race detection have mostly focused on either task parallel programs that follow a series-parallel dependence structure or ones with unrestricted use of futures that generate arbitrary dependences. In this work, we consider a restricted use of futures and show that we can detect races more efficiently than with general use of futures. Specifically, we present two algorithms: MultiBags and MultiBags+. MultiBags targets programs that use futures in a restricted fashion and runs in time $O(T_1 \alpha(m, n))$, where $T_1$ is the sequential running time of the program, $\alpha$ is the inverse Ackermann function, $m$ is the total number of memory accesses, and $n$ is the dynamic count of places at which parallelism is created. Since $\alpha$ is a very slowly growing function (upper bounded by 4 for all practical purposes), it can be treated as a close-to-constant overhead. MultiBags+ is an extension of MultiBags that targets programs with general use of futures. It runs in time $O((T_1 + k^2) \alpha(m, n))$, where $T_1$, $\alpha$, $m$, and $n$ are defined as before, and $k$ is the number of future operations in the computation. We implemented both algorithms and empirically demonstrate their efficiency.

Keywords: dynamic program analysis determinacy race race detection series-parallel maintenance
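The inverse Ackermann factor in bounds of this kind usually comes from a disjoint-set (union-find) structure with union by rank and path compression. The sketch below shows that standard structure only; it is an assumption that MultiBags maintains its "bags" this way, and the code is not the race detector itself. The class and method names are illustrative.

```python
class DisjointSet:
    """Union-find with union by rank and path compression.

    A sequence of m operations on n elements costs O(m * alpha(m, n))
    in total, where alpha is the inverse Ackermann function.
    """

    def __init__(self, n):
        self.parent = list(range(n))  # each element starts as its own root
        self.rank = [0] * n           # upper bound on the height of each tree

    def find(self, x):
        # First pass: locate the root of x's set.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        """Merge the sets containing a and b; return False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the shallower tree under the deeper one
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True
```

Because each `find` flattens the path it walks, later queries on the same elements are nearly constant time, which is why the overhead factor can be treated as close to constant in practice.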

Managing application parallelism via parallel efficiency regulation

24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2019

Authors: Srikanthan, Sharanyan; Ferro, Princeton; Chakraborti, Sayak; Dwarkadas, Sandhya

Affiliations: University of Rochester, United States

ISBN: 9781450362252 (print)

Modern multiprocessor systems contain a wealth of compute, memory, and communication network resources, such that multiple applications can often successfully execute on and compete for these resources. Unfortunately, achieving good performance for individual applications in addition to overall system efficiency proves a difficult task, especially for applications with low parallel efficiency (speedup per utilized computational core). Limitations to parallel efficiency arise out of factors such as algorithm design, excess synchronization, limitations in hardware resources, and sub-optimal task placement on CPUs. In this work, we introduce MAPPER, a Manager of Application Parallelism via Parallel Efficiency Regulation. MAPPER monitors and coordinates all participating applications by making two coupled decisions: how much parallelism to afford to each application, and which specific CPU cores to schedule applications on. While MAPPER can work for generic applications without modifying their parallel runtimes, we introduce a simple interface that can be used by parallel runtime systems for a tighter integration, resulting in better task granularity control. Using MAPPER can result in up to 3.3× speedup, with an average performance improvement of 20%. © 2019 Copyright held by the owner/author(s).

Keywords: Program processors
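The abstract defines parallel efficiency as speedup per utilized computational core. A small worked example makes the metric concrete; the function name and the sample timings are hypothetical and are not taken from the paper's measurements or from MAPPER's interface.

```python
def parallel_efficiency(t_serial, t_parallel, cores):
    """Return speedup per utilized core: (t_serial / t_parallel) / cores."""
    if t_parallel <= 0 or cores <= 0:
        raise ValueError("t_parallel and cores must be positive")
    speedup = t_serial / t_parallel
    return speedup / cores


# Hypothetical timings: 100 s on one core, 40 s on four cores.
# Speedup is 100 / 40 = 2.5, so efficiency is 2.5 / 4 = 0.625.
example = parallel_efficiency(100.0, 40.0, 4)
```

An efficiency well below 1.0, as in this example, indicates that some of the allocated cores contribute little, which is the situation in which reducing an application's parallelism can free resources for others.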

Address: No. 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
