Search results - Inner Mongolia University Library

Search condition: "Any field = 2003 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming"

560 records in total; records 181-190 are shown below.

Cache-oblivious wavefront: Improving parallelism of recursive dynamic programming algorithms without losing cache-efficiency 2015

20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015

Authors: Tang, Yuan; You, Ronghui; Kan, Haibin; Tithi, Jesmin Jahan; Ganapathi, Pramod; Chowdhury, Rezaul A.

Affiliations: Software School, School of Computer Science, Fudan University, Shanghai Key Laboratory of Intelligent Information Processing, Shanghai, China; Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, United States

ISBN (print): 9781450332057

State-of-the-art cache-oblivious parallel algorithms for dynamic programming (DP) problems usually guarantee asymptotically optimal cache performance without any tuning of cache parameters, but they often fail to exploit the theoretically best parallelism at the same time. While these algorithms achieve cache-optimality through the use of a recursive divide-and-conquer (DAC) strategy, scheduling tasks at the granularity of task dependency introduces artificial dependencies in addition to those arising from the defining recurrence equations. We removed the artificial dependencies by scheduling tasks for execution as soon as all their real dependency constraints are satisfied, while preserving cache-optimality by inheriting the DAC strategy. We applied our approach to a set of widely known dynamic programming problems, such as Floyd-Warshall's All-Pairs Shortest Paths, Stencil, and LCS. Theoretical analyses show that our techniques improve the span of the 2-way DAC-based Floyd-Warshall algorithm on an $n$-node graph from $\Theta(n \log^2 n)$ to $\Theta(n)$, stencil computations on a $d$-dimensional hypercubic grid of width $w$ for $h$ time steps from $\Theta((d^2 h)\, w^{\log(d+2)-1})$ to $\Theta(h)$, and LCS on two sequences of length $n$ each from $\Theta(n^{\log_2 3})$ to $\Theta(n)$. In each case, the total work and cache complexity remain asymptotically optimal. Experimental measurements exhibit a 3-5 times improvement in absolute running time, a 10-20 times improvement in burdened span as measured by Cilkview, and approximately the same number of L1/L2 cache misses as measured by PAPI. Copyright 2015 ACM.

Keywords: parallel algorithms
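To make the wavefront idea in the abstract above more concrete, here is a minimal Python sketch that fills an LCS table one anti-diagonal at a time. It is only an illustration of wavefront ordering, not the authors' cache-oblivious algorithm: the function name `lcs_wavefront` is hypothetical, and the inner loop runs sequentially even though the cells of one anti-diagonal are mutually independent.

```python
def lcs_wavefront(a: str, b: str) -> int:
    """Length of the longest common subsequence, filled by anti-diagonals."""
    n, m = len(a), len(b)
    # table[i][j] = LCS length of a[:i] and b[:j]; row 0 and column 0 stay 0.
    table = [[0] * (m + 1) for _ in range(n + 1)]
    # Cells with i + j == d form one wavefront. They only read cells from
    # wavefronts d - 1 and d - 2, so they do not depend on each other.
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]
```

In a parallel setting, the inner loop over `i` is the part that could be distributed across workers, since no cell on a wavefront reads another cell on the same wavefront.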

SYNC or ASYNC: Time to fuse for distributed graph-parallel computation 2015

20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015

Authors: Xie, Chenning; Chen, Rong; Guan, Haibing; Zang, Binyu; Chen, Haibo

Affiliations: Shanghai Key Laboratory of Scalable Computing and Systems, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University, China; Shanghai Key Laboratory of Scalable Computing and Systems, Department of Computer Science, Shanghai Jiao Tong University, China

ISBN (print): 9781450332057

Large-scale graph-structured computation usually exhibits an iterative and convergence-oriented computing nature, where input data is computed iteratively until a convergence condition is reached. Such features have led to the development of two different computation modes for graph-structured programs, namely synchronous (Sync) and asynchronous (Async) modes. Unfortunately, there is currently no in-depth study of their execution properties, and thus programmers have to choose a mode manually, either requiring a deep understanding of the underlying graph engines or suffering from suboptimal performance. This paper makes the first comprehensive characterization of the performance of the two modes on a set of typical graph-parallel applications. Our study shows that the performance of the two modes varies significantly with different graph algorithms, partitioning methods, execution stages, input graphs and cluster scales, and no single mode consistently outperforms the other. To this end, this paper proposes Hsync, a hybrid graph computation mode that adaptively switches a graph-parallel program between the two modes for optimal performance. Hsync constantly collects execution statistics on-the-fly and leverages a set of heuristics to predict future performance and determine when a mode switch could be profitable. We have built online sampling and offline profiling approaches combined with a set of heuristics to accurately predict future performance in the two modes. A prototype called PowerSwitch has been built based on PowerGraph, a state-of-the-art distributed graph-parallel system, to support adaptive execution of graph algorithms. On a 48-node EC2-like cluster, PowerSwitch consistently outperforms the best of both modes, with a speedup ranging from 9% to 73% due to timely switching between the two modes. Copyright 2015 ACM.

Keywords: Graphic methods

A hierarchical approach to reducing communication in parallel graph algorithms 2015

20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015

Authors: Harshvardhan; Amato, Nancy M.; Rauchwerger, Lawrence

Affiliations: Parasol Laboratory, Department of Computer Science and Engineering, Texas A and M University, United States

ISBN (print): 9781450332057

Large-scale graph computing has become critical due to the ever-increasing size of data. However, distributed graph computations are limited in their scalability and performance due to the heavy communication inherent in such computations. This is exacerbated in scale-free networks, such as social and web graphs, which contain hub vertices that have large degrees and therefore send a large number of messages over the network. Furthermore, many graph algorithms and computations send the same data to each of the neighbors of a vertex. Our proposed approach recognizes this and reduces the communication performed by the algorithm, without changes to user code, through a hierarchical machine model imposed upon the input graph. The hierarchical model takes advantage of locality information of the neighboring vertices to reduce communication, both in message volume and in total number of bytes sent. It is also able to better exploit the machine hierarchy to further reduce communication costs by aggregating traffic between different levels of the machine hierarchy. Results of an implementation in the STAPL GL show improved scalability and performance over the traditional level-synchronous approach, with 2.5× to 8× improvement for a variety of graph algorithms at 12,000+ cores.

Keywords: Big data

NUMA-aware graph-structured analytics 2015

20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015

Authors: Zhang, Kaiyuan; Chen, Rong; Chen, Haibo

Affiliations: Shanghai Key Laboratory of Scalable Computing and Systems, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University, China

ISBN (print): 9781450332057

Graph-structured analytics has been widely adopted in a number of big data applications such as social computation, web search and recommendation systems. Though much prior research focuses on scaling graph analytics in distributed environments, the strong desire for performance per core, dollar and joule has generated considerable interest in processing large-scale graphs on a single server-class machine, which may have several terabytes of RAM and 80 or more cores. However, prior graph-analytics systems are largely neutral to NUMA characteristics and thus have suboptimal performance. This paper presents a detailed study of NUMA characteristics and their impact on the efficiency of graph analytics. Our study uncovers two insights: 1) either random or interleaved allocation of graph data will significantly hamper data locality and parallelism; 2) sequential inter-node (i.e., remote) memory accesses have much higher bandwidth than both intra- and inter-node random ones. Based on them, this paper describes Polymer, a NUMA-aware graph-analytics system on multicore with two key design decisions. First, Polymer differentially allocates and places topology data, application-defined data and mutable runtime states of a graph system according to their access patterns to minimize remote accesses. Second, for some remaining random accesses, Polymer carefully converts random remote accesses into sequential remote accesses by using lightweight replication of vertices across NUMA nodes. To improve load balance and vertex convergence, Polymer is further built with a hierarchical barrier to boost parallelism and locality, an edge-oriented balanced partitioning for skewed graphs, and adaptive data structures according to the proportion of active vertices. A detailed evaluation on an 80-core machine shows that Polymer often outperforms the state-of-the-art single-machine graph-analytics systems, including Ligra, X-Stream and Galois, for a set of popular real-world and synthetic grap

Keywords: Random access storage

Provably good scheduling for parallel programs that use data structures through implicit batching 2014

2014 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2014

Authors: Agrawal, Kunal; Fineman, Jeremy T.; Sheridan, Brendan; Sukha, Jim; Utterback, Robert

Affiliations: Washington University in Saint Louis, United States; Georgetown University, United States; Intel Corporation, United States

ISBN (print): 9781450326568

This poster proposes an efficient runtime scheduler that provides provable performance guarantees to parallel programs that use data structures through the use of implicit batching.

Keywords: Data structures

Parallelizing dynamic programming through rank convergence 2014

2014 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2014

Authors: Maleki, Saeed; Musuvathi, Madanlal; Mytkowicz, Todd

Affiliations: University of Illinois at Urbana-Champaign, United States; Microsoft Research, United States

ISBN (print): 9781450326568

This paper proposes an efficient parallel algorithm for an important class of dynamic programming problems that includes Viterbi, Needleman-Wunsch, Smith-Waterman, and Longest Common Subsequence. In dynamic programming, the subproblems that do not depend on each other, and thus can be computed in parallel, form stages or wavefronts. The algorithm presented in this paper provides additional parallelism, allowing multiple stages to be computed in parallel despite dependences among them. The correctness and the performance of the algorithm rely on rank convergence properties of matrix multiplication in the tropical semiring, formed with plus as the multiplicative operation and max as the additive operation. This paper demonstrates the efficiency of the parallel algorithm by showing significant speedups on a variety of important dynamic programming problems. In particular, the parallel Viterbi decoder is up to 24× faster (with 64 processors) than a highly optimized commercial baseline. Copyright © 2014 ACM.

Keywords: Dynamic programming
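The tropical (max-plus) semiring named in the abstract above can be illustrated with a small Python sketch. It only shows the algebra, with `max` as addition and `+` as multiplication, and checks that the matrix product is associative, which is the property that allows several stages to be combined before they are applied. The function names `maxplus_matvec` and `maxplus_matmul` and the sample matrices are hypothetical, and the sketch does not implement the paper's rank-convergence technique.

```python
def maxplus_matvec(matrix, vector):
    """Max-plus product of a matrix and a vector.

    result[i] = max over j of (matrix[i][j] + vector[j])
    """
    return [max(m + v for m, v in zip(row, vector)) for row in matrix]


def maxplus_matmul(a, b):
    """Max-plus product of two matrices.

    result[i][k] = max over j of (a[i][j] + b[j][k])
    """
    columns = list(zip(*b))
    return [[max(x + y for x, y in zip(row, col)) for col in columns]
            for row in a]


# Two illustrative stage matrices and an initial score vector.
stage_a = [[0, 1], [2, 0]]
stage_b = [[1, 0], [0, 3]]
scores = [0, 5]

# Applying the stages one after another ...
sequential = maxplus_matvec(stage_a, maxplus_matvec(stage_b, scores))
# ... gives the same result as combining the stages first.
combined = maxplus_matvec(maxplus_matmul(stage_a, stage_b), scores)
```

Because the two orders agree, the product of several stage matrices can be formed independently of the vector it is later applied to.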

Extracting logical structure and identifying stragglers in parallel execution traces 2014

2014 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2014

Authors: Isaacs, Katherine E.; Gamblin, Todd; Bhatele, Abhinav; Bremer, Peer-Timo; Schulz, Martin; Hamann, Bernd

Affiliations: Department of Computer Science, University of California, Davis, United States; Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, United States

ISBN (print): 9781450326568

We introduce a new approach to automatically extract an idealized logical structure from a parallel execution trace. We use this structure to define intuitive metrics such as the lateness of a process involved in a parallel execution. By analyzing and illustrating traces in terms of logical steps, we leverage a developer's understanding of the happened-before relations in a parallel program. This technique can uncover dependency chains, elucidate communication patterns, and highlight sources and propagation of delays, all of which may be obscured in a traditional trace visualization.

Keywords: Visualization

Initial study of multi-endpoint runtime for MPI+OpenMP hybrid programming model on multi-core systems 2014

2014 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2014

Authors: Luo, Miao; Lu, Xiaoyi; Hamidouche, Khaled; Kandalla, Krishna; Panda, Dhabaleswar K.

Affiliations: Dept. of Computer Science and Engineering, Ohio State University, United States

ISBN (print): 9781450326568

State-of-the-art MPI libraries rely on locks to guarantee thread safety. This discourages application developers from using multiple threads to perform MPI operations. In this paper, we propose a high-performance, lock-free multi-endpoint MPI runtime, which can achieve up to 40% improvement for point-to-point operations and one representative collective operation with minimal or no modifications to existing applications.

Keywords: Locks (fasteners)

Fine-grain parallel megabase sequence comparison with multiple heterogeneous GPUs 2014

2014 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2014

Authors: De Sandes, Edans F.O.; Miranda, Guillermo; Melo, Alba C.M.A.; Martorell, Xavier; Ayguadé, Eduard

Affiliations: University of Brasilia, Brazil; Universitat Politècnica de Catalunya, Barcelona Supercomputing Center, Spain

ISBN (print): 9781450326568

This paper proposes and evaluates a parallel strategy to execute the exact Smith-Waterman (SW) algorithm for megabase DNA sequences on heterogeneous multi-GPU platforms. In our strategy, the computation of a single huge SW matrix is spread over multiple GPUs, which communicate border elements to their neighbours using a circular buffer mechanism that hides the communication overhead. We compared 4 pairs of human-chimpanzee homologous chromosomes using 2 different GPU environments, obtaining a performance of up to 140.36 GCUPS (billions of cells processed per second) with 3 heterogeneous GPUs.

Keywords: Graphics processing unit

Triolet: A programming system that unifies algorithmic skeleton interfaces for high-performance cluster computing 2014

2014 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2014

Authors: Rodrigues, Christopher; Jablin, Thomas; Dakkak, Abdul; Hwu, Wen-Mei

Affiliations: University of Illinois at Urbana-Champaign, United States

ISBN (print): 9781450326568

Functional algorithmic skeletons promise a high-level programming interface for distributed-memory clusters that frees developers from concerns of task decomposition, scheduling, and communication. Unfortunately, prior distributed functional skeleton frameworks do not deliver performance comparable to that achievable in a low-level distributed programming model such as C with MPI and OpenMP, even when used in concert with high-performance array libraries. There are several causes: they do not take advantage of shared memory on each cluster node; they impose a fixed partitioning strategy on input data; and they have limited ability to fuse loops involving skeletons that produce a variable number of outputs per input. We address these shortcomings in the Triolet programming language through a modular library design that separates concerns of parallelism, loop nesting, and data partitioning. We show how Triolet substantially improves the parallel performance of algorithms involving array traversals and nested, variable-size loops over what is achievable in Eden, a distributed variant of Haskell. We further demonstrate how Triolet can substantially simplify parallel programming relative to C with MPI and OpenMP while achieving 23-100% of its performance on a 128-core cluster. Copyright © 2014 ACM.

Keywords: parallel programming
