检索结果-内蒙古大学图书馆

您好，读者！请登录

内蒙古大学图书馆

首页
概况
党建
资源
服务
科研支持
- 论文收录引用证明
- 科技查新
知识产权
档案馆
帮助

咨询与建议

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

您的常用邮箱：*

您的手机号码：*

问题描述：

当前已输入0个字，您还可以输入200个字

全部搜索
期刊论文
图书
学位论文
标准
纸本馆藏
外文资源发现
数据库导航
超星发现

高级检索

分类表

所选分类

>> <<

限定检索结果

标题

标题
作者
主题词
出版物名称
出版社
机构
学科分类号
摘要
ISBN
ISSN
基金资助
索书号

作者

作者
标题
主题词
出版物名称
出版社
机构
学科分类号
摘要
ISBN
ISSN
基金资助
索书号

文献类型

336 篇 会议
49 篇 期刊文献

馆藏范围

385 篇 电子文献
0 种 纸本馆藏

日期分布

学科分类号

335 篇 工学
- 290 篇 软件工程
- 274 篇 计算机科学与技术...
- 13 篇 电子科学与技术（可...
- 7 篇 信息与通信工程
- 4 篇 机械工程
- 4 篇 控制科学与工程
- 4 篇 生物工程
- 3 篇 电气工程
- 3 篇 生物医学工程（可授...
- 2 篇 力学（可授工学、理...
- 2 篇 动力工程及工程热...
- 1 篇 建筑学
- 1 篇 土木工程
- 1 篇 化学工程与技术
- 1 篇 核科学与技术
- 1 篇 农业工程
- 1 篇 环境科学与工程（可...
63 篇 理学
- 58 篇 数学
- 4 篇 生物学
- 4 篇 系统科学
- 4 篇 统计学（可授理学、...
- 3 篇 化学
- 2 篇 物理学
17 篇 管理学
- 11 篇 管理科学与工程(可...
- 9 篇 工商管理
- 6 篇 图书情报与档案管...
3 篇 经济学
- 3 篇 应用经济学
3 篇 法学
- 3 篇 社会学
1 篇 教育学
- 1 篇 教育学
1 篇 农学
- 1 篇 作物学

主题

73 篇 performance
52 篇 parallel process...
44 篇 parallel program...
43 篇 languages
42 篇 algorithms
35 篇 design
21 篇 gpu
20 篇 parallel algorit...
14 篇 experimentation
12 篇 measurement
11 篇 theory
8 篇 mpi
8 篇 parallel computi...
7 篇 graphics process...
7 篇 parallel
7 篇 concurrency
6 篇 scalability
6 篇 parallelism
6 篇 verification
6 篇 openmp

机构

7 篇 carnegie mellon ...
4 篇 univ wisconsin d...
4 篇 indiana univ blo...
3 篇 univ of tokyo
3 篇 tsinghua univers...
3 篇 univ chinese aca...
3 篇 massachusetts in...
3 篇 univ illinois ur...
3 篇 swiss fed inst t...
3 篇 mit csail united...
3 篇 shanghai jiao to...
3 篇 tsinghua univ pe...
3 篇 univ utah sch co...
3 篇 rice univ housto...
3 篇 univ calif berke...
2 篇 ist austria klos...
2 篇 princeton univ d...
2 篇 georgetown univ ...
2 篇 shanghai key lab...
2 篇 univ of wisconsi...

作者

8 篇 blelloch guy e.
6 篇 hoefler torsten
6 篇 garland michael
6 篇 chen haibo
6 篇 shun julian
5 篇 sun yihan
5 篇 zhai jidong
5 篇 tsigas philippas
4 篇 dhulipala laxman
4 篇 chen wenguang
4 篇 tan guangming
4 篇 wang haojie
4 篇 nikolopoulos dim...
4 篇 long guoping
4 篇 sarkar vivek
4 篇 valero mateo
4 篇 mellor-crummey j...
4 篇 gu yan
4 篇 kennedy ken
3 篇 taura kenjiro

语言

357 篇 英文
28 篇 其他

检索条件"任意字段=9th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming"

共 385 条记录，以下是131-140 订阅

全选清除本页清除全部题录导出标记到"检索档案"

详细简洁

排序：

相关度排序

相关度排序
时效性降序
时效性升序

Copperhead: Compiling an Embedded Data parallel Language 11

Copperhead: Compiling an Embedded Data Parallel Language

引用

16th acm symposium on principles and practice of parallel programming

作者： Catanzaro, Bryan Garland, Michael Keutzer, Kurt Univ Calif Berkeley Berkeley CA 94720 USA

ISBN: (纸本)9781450301190

Modern parallel microprocessors deliver high performance on applications that expose substantial fine-grained data parallelism. Although data parallelism is widely available in many computations, implementing data parallel algorithms in low-level languages is often an unnecessarily difficult task. the characteristics of parallel microprocessors and the limitations of current programming methodologies motivate our design of Copperhead, a high-level data parallel language embedded in Python. the Copperhead programmer describes parallel computations via composition of familiar data parallel primitives supporting both flat and nested data parallel computation on arrays of data. Copperhead programs are expressed in a subset of the widely used Python programming language and interoperate with standard Python modules, including libraries for numeric computation, data visualization, and analysis. In this paper, we discuss the language, compiler, and runtime features that enable Copperhead to efficiently execute data parallel code. We define the restricted subset of Python which Copperhead supports and introduce the program analysis techniques necessary for compiling Copperhead code into efficient low-level implementations. We also outline the runtime support by which Copperhead programs interoperate with standard Python modules. We demonstrate the effectiveness of our techniques with several examples targeting the CUDA platform for parallel programming on GPUs. Copperhead code is concise, on average requiring 3.6 times fewer lines of code than CUDA, and the compiler generates efficient code, yielding 45-100% of the performance of hand-crafted, well optimized CUDA code.

关键词： Python Data parallelism GPU Algorithms Design Performance

来源：评论

学校读者我要写书评

暂无评论

XIndex: A Scalable Learned Index for Multicore Data Storage 20

XIndex: A Scalable Learned Index for Multicore Data Storage

引用

25th acm sigplan symposium on principles and practice of parallel programming (PPoPP)

作者： Tang, Chuzhe Wang, Youyun Dong, Zhiyuan Hu, Gansen Wang, Zhaoguo Wang, Minjie Chen, Haibo Shanghai Jiao Tong Univ Inst Parallel & Distributed Syst Shanghai Peoples R China Shanghai Jiao Tong Univ Shanghai Key Lab Scalable Comp & Syst Shanghai Peoples R China NYU Dept Comp Sci New York NY 10003 USA

ISBN: (纸本)9781450368186

We present XIndex, a concurrent ordered index designed for fast queries. Similar to a recent proposal of the learned index, XIndex uses learned models to optimize index efficiency. Comparing with the learned index, XIndex is able to effectively handle concurrent writes without affecting the query performance by leveraging fine-grained synchronization and a new compaction scheme, Two-Phase Compaction. Furthermore, XIndex adapts its structure according to runtime workload characteristics to support dynamic workload. We demonstrate the advantages of XIndex with both YCSB and TPC-C (KV), a TPC-C variant for key-value stores. XIndex achieves up to 3.2x and 4.4x performance improvement comparing with Masstree and Wormhole, respectively, on a 24-core machine, and it is open-sourced(1).

关键词： Compaction

来源：评论

学校读者我要写书评

暂无评论

PPoPP 2013 - Proceedings of the 2013 acm sigplan symposium on principles and practice of parallel programming

PPoPP 2013 - Proceedings of the 2013 ACM SIGPLAN Symposium o...

引用

18th acm sigplan symposium on principles and practice of parallel programming, PPoPP 2013

ISBN: (纸本)9781450319225

the proceedings contain 45 papers. the topics discussed include: a peta-scalable CPU-GPU algorithm for global atmospheric simulations;adoption protocols for fanout-optimal fault-tolerant termination detection;betweenness centrality: algorithms and implementations;complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU;fast concurrent queues for x86 processors;FASTLANE: improving performance of software transactional memory for low thread counts;Ligra: a lightweight graph processing framework for shared memory;ownership passing: efficient distributed memory programming on multi-core systems;parallel suffix array and least common prefix for the GPU;Streamscan: fast scan algorithms for GPUs without global barrier synchronization;using hardware transactional memory to correct and simplify a readers-writer lock algorithm;and exploring different automata representations for efficient regular expression matching on GPUs.

关键词：

来源：评论

学校读者我要写书评

暂无评论

On the fly MHP Analysis 20

On the fly MHP Analysis

引用

25th acm sigplan symposium on principles and practice of parallel programming (PPoPP)

作者： Saha, Sonali Nandivada, V. Krishna IIT Madras Madras Tamil Nadu India

ISBN: (纸本)9781450368186

May-Happen-in-parallel (MHP) analysis forms the basis for many problems of program analysis and program understanding. MHP analysis can also be used by IDEs (integrateddevelopment-environments) to help programmers to refactor parallel-programs, identify racy programs, understand which parts of the program run in parallel, and so on. Since the code keeps changing in the IDE, re-computing the MHP information after every change can be an expensive affair. In this manuscript, we propose a novel scheme to perform incremental MHP analysis (on the fly) of programs written in task parallel languages like X10 to keep the MHP information up to date, in an IDE environment. the key insight of our proposed approach to maintain the MHP information up to date is that we need not rebuild (from scratch) every data structure related to MHP information, after each modification (addition or deletion of statements) in the source code. the idea is to reuse the old MHP information as much as possible and incrementally recompute the MHP information (of a small set of statements) which depends on the statement added/removed. We introduce two new algorithms that deal with addition and removal of parallel constructs like finish, async, atomic, and sequential constructs like loop, if, if-else and other sequential statements, on the fly. Our evaluation shows that our algorithms run much faster than the repeated invocations of the fastest known MHP analysis for X10 programs [Sankar et al. 2016].

关键词： Concurrent programs may happen in parallel analysis incremental analysis

来源：评论

学校读者我要写书评

暂无评论

Designing and auto-tuning parallel 3-D FFT for computation-communication overlap 14

Designing and auto-tuning parallel 3-D FFT for computation-c...

引用

2014 19th acm sigplan symposium on principles and practice of parallel programming, PPoPP 2014

作者： Song, Sukhyun Hollingsworth, Jeffrey K. Department of Computer Science University of Maryland College Park United States

ISBN: (纸本)9781450326568

this paper presents a method to design and auto-tune a new parallel 3-D FFT code using the non-blocking MPI all-to-all operation. We achieve high performance by optimizing computation-communication overlap. Our code performs fully asynchronous communication without any support from special hardware. We also improve cache performance through loop tiling. To cope with the complex tradeoff regarding our optimization techniques, we parameterize our code and auto-tune the parameters efficiently in a large parameter space. Experimental results from two systems confirm that our code achieves a speedup of up to 1.76× over the FFTW library. Copyright © 2014 acm.

关键词： Fast Fourier transforms

来源：评论

学校读者我要写书评

暂无评论

Scaling LAPACK Panel Operations Using parallel Cache Assignment

Scaling LAPACK Panel Operations Using Parallel Cache Assignm...

引用

15th acm sigplan symposium on principles and practice of parallel programming

作者： Castaldo, Anthony M. Whaley, R. Clint Univ Texas San Antonio Dept Comp Sci San Antonio TX 78249 USA

ISBN: (纸本)9781605587080

In LAPACK many matrix operations are cast as block algorithms which iteratively process a panel using an unblocked algorithm and then update a remainder matrix using the high performance Level 3 BLAS. the Level 3 BLAS have excellent weak scaling, but panel processing tends to be bus bound, and thus scales with bus speed rather than the number of processors (p). Amdahl's law therefore ensures that as p grows, the panel computation will become the dominant cost of these LAPACK routines. Our contribution is a novel parallel cache assignment approach which we show scales well with p. We apply this general approach to the QR and LU panel factorizations on two commodity 8-core platforms with very different cache structures, and demonstrate superlinear panel factorization speedups on both machines. Other approaches to this problem demand complicated reformulations of the computational approach, new kernels to be tuned, new mathematics, an inflation of the high-order flop count, and do not perform as well. By demonstrating a straight-forward alternative that avoids all of these contortions and scales with p, we address a critical stumbling block for dense linear algebra in the age of massive parallelism.

关键词： ATLAS LAPACK QR LU factorization parallel multicore multi-core GPU

来源：评论

学校读者我要写书评

暂无评论

parallel programming with big operators 13

Parallel programming with big operators

引用

18th acm sigplan symposium on principles and practice of parallel programming, PPoPP 2013

作者： Park, Changhee Steele Jr., Guy L. Tristan, Jean-Baptiste KAIST Seoul Korea Republic of Oracle Labs. Burlington MA United States

ISBN: (纸本)9781450319225

In the sciences, it is common to use the so-called "big operator" notation to express the iteration of a binary operator (the reducer) over a collection of values. Such a notation typically assumes that the reducer is associative and abstracts the iteration process. Consequently, from a programming point-of-view, we can organize the reducer operations to minimize the depth of the overall reduction, allowing a potentially parallel evaluation of a big operator expression. We believe that the big operator notation is indeed an effective construct to express parallel computations in the Generate/Map/Reduce programming model, and our goal is to introduce it in programming languages to support parallel programming. the effective definition of such a big operator expression requires a simple way to generate elements, and a simple way to declare algebraic properties of the reducer (such as its identity, or its commutativity). In this poster, we want to present an extension of Scala with support for big operator expressions. We show how big operator expressions are defined and how the API is organized to support the simple definition of reducers with their algebraic properties. © 2013 Authors.

关键词： parallel programming

来源：评论

学校读者我要写书评

暂无评论

the lock-free k-LSM relaxed priority queue 2015

The lock-free k-LSM relaxed priority queue

引用

20th acm sigplan symposium on principles and practice of parallel programming, PPoPP 2015

作者： Wimmer, Martin Gruber, Jakob Träff, Jesper Larsson Tsigas, Philippas Faculty of Informatics Parallel Computing Vienna University of Technology Vienna/Wien1040 Austria Computer Science and Engineering Chalmers University of Technology Göteborg412 96 Sweden

ISBN: (纸本)9781450332057

We present a new, concurrent, lock-free priority queue that relaxes the delete-min operation to allow deletion of any of the ρ+1 smallest keys instead of only a minimal one, where ρ is a parameter that can be configured at runtime. It is built from a logarithmic number of sorted arrays, similar to log-structured merge-trees (LSM). For keys added and removed by the same thread the behavior is identical to a non-relaxed priority queue. We compare to state-of-the-art lock-free priority queues with both relaxed and non-relaxed semantics, showing high performance and good scalability of our approach.

关键词： parallel programming

来源：评论

学校读者我要写书评

暂无评论

parallel Block-Delayed Sequences 22

Parallel Block-Delayed Sequences

引用

27th acm sigplan symposium on principles and practice of parallel programming (PPoPP)

作者： Westrick, Sam Rainey, Mike Anderson, Daniel Blelloch, Guy E. Carnegie Mellon Univ Pittsburgh PA 15213 USA

ISBN: (纸本)9781450392044

programming languages using functions on collections of values, such as map, reduce, scan and filter, have been used for over fifty years. Such collections have proven to be particularly useful in the context of parallelism because such functions are naturally parallel. However, if implemented naively they lead to the generation of temporary intermediate collections that can significantly increase memory usage and runtime. To avoid this pitfall, many approaches use "fusion" to combine operations and avoid temporary results. However, most of these approaches involve significant changes to a compiler and are limited to a small set of functions, such as maps and reduces. In this paper we present a library-based approach that fuses widely used operations such as scans, filters, and flattens. In conjunction with existing techniques, this covers most of the common operations on collections. Our approach is based on a novel technique which parallelizes over blocks, with streams within each block. We demonstrate the approach by implementing libraries targeting multicore parallelism in two languages: parallel ML and C++, which have very different semantics and compilers. To help users understand when to use the approach, we define a cost semantics that indicates when fusion occurs and how it reduces memory allocations. We present experimental results for a dozen benchmarks that demonstrate significant reductions in both time and space. In most cases the approach generates code that is near optimal for the machines it is running on.

关键词： parallel programming fusion collections functional programming

来源：评论

学校读者我要写书评

暂无评论

Scalable GPU Graph Traversal 12

Scalable GPU Graph Traversal

引用

17th acm sigplan symposium on principles and practice of parallel programming

作者： Merrill, Duane Garland, Michael Grimshaw, Andrew Univ Virginia Charlottesville VA 22903 USA NVIDIA Corp Santa Clara CA USA

ISBN: (纸本)9781450311601

Breadth-first search (BFS) is a core primitive for graph traversal and a basis for many higher-level graph analysis algorithms. It is also representative of a class of parallel computations whose memory accesses and work distribution are both irregular and data-dependent. Recent work has demonstrated the plausibility of GPU sparse graph traversal, but has tended to focus on asymptotically inefficient algorithms that perform poorly on graphs with non-trivial diameter. We present a BFS parallelization focused on fine-grained task management constructed from efficient prefix sum that achieves an asymptotically optimal O(|V|+|E|) work complexity. Our implementation delivers excellent performance on diverse graphs, achieving traversal rates in excess of 3.3 billion and 8.3 billion traversed edges per second using single and quad-GPU configurations, respectively. this level of performance is several times faster than state-of-the-art implementations both CPU and GPU platforms.

关键词： Algorithms performance Breadth-first search GPU graph algorithms parallel algorithms prefix sum graph traversal sparse graph

来源：评论

学校读者我要写书评

暂无评论

没有更多数据了...

全选清除本页清除全部题录导出标记到“检索档案”

共39页 << < 10 11 12 13 14 15 16 17 18 19 > >>

检索报告对象比较合并检索0

隐藏清空

合并搜索

回到顶部

执行限定条件

内容：

评分：

请选择保存的检索档案：

请选择收藏分类：

订阅名称：

通借通还

温馨提示：

图书名称：

借书校区：

取书校区：

手机号码：

邮箱地址：

一卡通帐号：

电话和邮箱必须正确填写，我们会与您联系确认。

联系人：

所在院系：

联系邮箱：

联系电话：

内蒙古自治区呼和浩特市赛罕区大学西街235号邮编: 010021

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：