Search Results - Inner Mongolia University Library

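Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = discipline classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016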
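
The query syntax described above can be assembled programmatically. The following is a minimal sketch, assuming nothing beyond the rules stated in the help text (field code, equals sign, value; uppercase operators with one space on each side). The helper names `field`, `combine`, and `group` are hypothetical and are not part of any library interface.

```python
def field(code: str, value: str) -> str:
    """Format one field term, e.g. field("Y", "1982-2016") -> "Y=1982-2016"."""
    return f"{code}={value}"


def combine(operator: str, *terms: str) -> str:
    """Join terms with an uppercase operator padded by single spaces."""
    op = operator.upper()
    if op not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator}")
    return f" {op} ".join(terms)


def group(expression: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


# Rebuild the first example from the help text.
query = combine(
    "AND",
    group(combine("OR", field("K", "图书馆学"), field("K", "情报学"))),
    field("A", "范并思"),
    field("Y", "1982-2016"),
)
print(query)
```

Passing a lowercase operator such as `"and"` still yields an uppercase `AND`, which is what the search rules require.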

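Refine search results

Document type

336 records: conference papers
46 records: journal articles

Holdings

382 records: electronic documents
0 titles: print holdings

Subject classification

329 records: Engineering
- 284 records: Software Engineering
- 268 records: Computer Science and Technology...
- 12 records: Electronic Science and Technology (may...
- 7 records: Information and Communication Engineering
- 4 records: Mechanical Engineering
- 4 records: Control Science and Engineering
- 4 records: Bioengineering
- 3 records: Biomedical Engineering (may be conferred...
- 1 record: Mechanics (may be conferred in Engineering, ...
- 1 record: Power Engineering and Engineering Thermo...
- 1 record: Electrical Engineering
- 1 record: Architecture
- 1 record: Civil Engineering
- 1 record: Chemical Engineering and Technology
- 1 record: Nuclear Science and Technology
- 1 record: Agricultural Engineering
- 1 record: Environmental Science and Engineering (may...
58 records: Science
- 52 records: Mathematics
- 5 records: Systems Science
- 4 records: Biology
- 4 records: Statistics (may be conferred in Science, ...
- 3 records: Chemistry
15 records: Management
- 10 records: Management Science and Engineering (may...
- 8 records: Business Administration
- 5 records: Library, Information and Archives Manage...
3 records: Economics
- 3 records: Applied Economics
2 records: Law
- 2 records: Sociology
2 records: Education
- 2 records: Education
1 record: Agronomy
- 1 record: Crop Science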

Topic

71 records: performance
49 records: parallel process...
42 records: algorithms
42 records: parallel program...
39 records: languages
34 records: design
21 records: gpu
20 records: parallel algorit...
12 records: experimentation
12 records: measurement
9 records: theory
9 records: parallel computi...
8 records: mpi
8 records: parallel
7 records: parallelism
7 records: graphics process...
7 records: logic programmin...
7 records: concurrency
6 records: openmp
5 records: reliability

Institution

7 records: carnegie mellon ...
5 records: indiana univ blo...
4 records: univ wisconsin d...
3 records: univ of tokyo
3 records: univ chinese aca...
3 records: massachusetts in...
3 records: univ illinois ur...
3 records: swiss fed inst t...
3 records: mit csail united...
3 records: shanghai jiao to...
3 records: tsinghua univ pe...
3 records: univ utah sch co...
3 records: rice univ housto...
3 records: purdue univ w la...
3 records: univ calif berke...
2 records: ist austria klos...
2 records: princeton univ d...
2 records: georgetown univ ...
2 records: yale university ...
2 records: coll william & m...

Author

8 records: blelloch guy e.
6 records: hoefler torsten
6 records: garland michael
6 records: chen haibo
6 records: shun julian
5 records: sun yihan
5 records: zhai jidong
5 records: tsigas philippas
5 records: kennedy ken
4 records: dhulipala laxman
4 records: miller barton p.
4 records: tan guangming
4 records: wang haojie
4 records: nikolopoulos dim...
4 records: long guoping
4 records: valero mateo
4 records: mellor-crummey j...
4 records: agrawal kunal
4 records: gu yan
4 records: leiserson charle...

Language

356 records: English
26 records: other

Search condition: "Any field = 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming"

382 records in total; showing records 171-180

An Adaptive Performance Modeling Tool for GPU Architectures

15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Baghsorkhi, Sara S.; Delahaye, Matthieu; Patel, Sanjay J.; Gropp, William D.; Hwu, Wen-mei W. (Univ Illinois, Urbana, IL 61801, USA)

ISBN: (print) 9781605587080

This paper presents an analytical model to predict the performance of general-purpose applications on a GPU architecture. The model is designed to provide performance information to an auto-tuning compiler and assist it in narrowing down the search to the more promising implementations. It can also be incorporated into a tool to help programmers better assess the performance bottlenecks in their code. We analyze each GPU kernel and identify how the kernel exercises major GPU microarchitecture features. To identify the performance bottlenecks accurately, we introduce an abstract interpretation of a GPU kernel, work flow graph, based on which we estimate the execution time of a GPU kernel. We validated our performance model on the NVIDIA GPUs using CUDA (Compute Unified Device Architecture). For this purpose, we used data parallel benchmarks that stress different GPU microarchitecture events such as uncoalesced memory accesses, scratch-pad memory bank conflicts, and control flow divergence, which must be accurately modeled but represent challenges to the analytical performance models. The proposed model captures full system complexity and shows high accuracy in predicting the performance trends of different optimized kernel implementations. We also describe our approach to extracting the performance model automatically from a kernel code.

Keywords: Design; Measurement; Performance; Analytical model; GPU; Parallel programming; Performance estimation

Throughput-Oriented GPU Memory Allocation

24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Gelado, Isaac; Garland, Michael (NVIDIA, Santa Clara, CA 95051, USA)

ISBN: (print) 9781450362252

Throughput-oriented architectures, such as GPUs, can sustain three orders of magnitude more concurrent threads than multicore architectures. This level of concurrency pushes typical synchronization primitives (e.g., mutexes) over their scalability limits, creating significant performance bottlenecks in modules, such as memory allocators, that use them. In this paper, we develop concurrent programming techniques and synchronization primitives, in support of a dynamic memory allocator, that are efficient for use with very high levels of concurrency. We formulate resource allocation as a two-stage process that decouples accounting for the number of available resources from the tracking of the available resources themselves. To facilitate the accounting stage, we introduce a novel bulk semaphore abstraction that extends traditional semaphore semantics by optimizing for the case where threads operate on the semaphore simultaneously. We also similarly design new collective synchronization primitives that enable groups of cooperating threads to enter critical sections together. Finally, we show that delegation of deferred reclamation to threads already blocked greatly improves efficiency. Using all these techniques, our throughput-oriented memory allocator delivers both high allocation rates and low memory fragmentation on modern GPUs. Our experiments demonstrate that it achieves allocation rates that are on average 16.56 times higher than the counterpart implementation in the CUDA 9 toolkit.

Keywords: Concurrency; Memory Allocation; GPU programming
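
The two-stage idea in the abstract above (first account for how many resources are available, then claim a specific one) can be illustrated on a CPU. This is a minimal sketch, assuming Python's standard `threading.Semaphore` for the accounting stage and a lock-protected free list for the tracking stage. It is not the paper's bulk semaphore or its GPU allocator, and the names `TwoStagePool`, `try_acquire`, and `release` are hypothetical.

```python
import threading
from collections import deque


class TwoStagePool:
    """Illustrative fixed-size resource pool (hypothetical, not from the paper)."""

    def __init__(self, resources):
        # Stage 2 state: the concrete resources that are currently free.
        self._free = deque(resources)
        self._lock = threading.Lock()
        # Stage 1 state: a count of how many resources are available.
        self._available = threading.Semaphore(len(self._free))

    def try_acquire(self):
        # Stage 1: reserve one unit of capacity without blocking.
        if not self._available.acquire(blocking=False):
            return None  # nothing available right now
        # Stage 2: the reservation guarantees the free list is non-empty.
        with self._lock:
            return self._free.popleft()

    def release(self, resource):
        # Return the concrete resource first, then publish its availability.
        with self._lock:
            self._free.append(resource)
        self._available.release()


pool = TwoStagePool(["block-a", "block-b"])
first = pool.try_acquire()
second = pool.try_acquire()
third = pool.try_acquire()   # pool exhausted, so this is None
pool.release(first)
again = pool.try_acquire()   # the released block is handed out again
```

Releasing in the order "free list, then semaphore" keeps the invariant that the count never exceeds the number of entries actually on the free list.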

Distributed data access in AC

Proceedings of the 5th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Carlson, William W.; Draper, Jesse M. (IDA Supercomputing Research Cent, Bowie, United States)

We have modified the C language to support a programming model based on a shared address space with physically distributed memory. With this model users can write programs in which the nodes of a massively parallel processor can access remote memory without message passing. AC provides support for distributed arrays as well as pointers to distributed data. Simple array references and pointer dereferencing are sufficient to generate low-overhead remote reads and writes. We have implemented these ideas in a compiler based on the GNU C compiler and targeted at Cray Research's T3D. Initial performance measurements show that AC generates code for remote accesses which is considerably faster than that of the native compiler for structures up to about 16 words in size and virtually equivalent for larger transfers.

Keywords: Parallel processing systems

A collection-oriented programming model for performance portability

20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015

Authors: Muralidharan, Saurav; Garland, Michael; Catanzaro, Bryan; Sidelnik, Albert; Hall, Mary (University of Utah, Salt Lake City, UT, United States; NVIDIA Corporation, Santa Clara, CA, United States; Baidu Inc., Sunnyvale, CA, United States)

ISBN: (print) 9781450332057

This paper describes Surge, a collection-oriented programming model that enables programmers to compose parallel computations using nested high-level data collections and operators. Surge exposes a code generation interface, decoupled from the core computation, that enables programmers and autotuners to easily generate multiple implementations of the same computation on various parallel architectures such as multi-core CPUs and GPUs. By decoupling computations from architecture-specific implementation, programmers can target multiple architectures more easily, and generate a search space that facilitates optimization and customization for specific architectures. We express in Surge four real-world benchmarks from domains such as sparse linear-algebra and machine learning and, from the same performance-portable specification, generate OpenMP and CUDA C++ implementations. Surge generates efficient, scalable code which achieves up to 1.32x speedup over handcrafted, well-optimized CUDA code.

Keywords: Parallel architectures

High performance Fortran for highly irregular problems

Proceedings of the 1997 6th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Hu, Y. Charlie; Johnsson, S. Lennart; Teng, Shang-Hua (Harvard Univ, Cambridge, United States)

We present a general data parallel formulation for highly irregular problems in High Performance Fortran (HPF). Our formulation consists of (1) a method for linearizing irregular data structures, (2) a data parallel implementation (in HPF) of graph partitioning algorithms applied to the linearized data structure, and (3) techniques for expressing irregular communication and nonuniform computations associated with the elements of linearized data structures. We demonstrate and evaluate our formulation on a parallel, hierarchical N-body method for the evaluation of potentials and forces of nonuniform particle distributions. Our experimental results demonstrate that efficient data parallel (HPF) implementations of highly nonuniform problems are feasible with the proper language/compiler/runtime support. Our data parallel N-body code provides a much needed 'benchmark' code for evaluating and improving HPF compilers.

Keywords: Parallel processing systems

Parallel execution of multi-set constraint rewrite rules

PPDP 2008: 10th International ACM SIGPLAN Symposium on Principles and Practice of Declarative Programming

Authors: Sulzmann, Martin; Lam, Edmund S. L. (Programming Logics and Semantics Group, IT University of Copenhagen, Rued Langgaards Vej 7, 2300 Copenhagen S, Denmark; School of Computing, National University of Singapore, S16 Level 5, 3 Science Drive 2, Singapore 117543, Singapore)

ISBN: (print) 9781605581170

Multi-set constraint rewriting allows for a highly parallel computational model and has been used in a multitude of application domains such as constraint solving, agent specification, etc. Rewriting steps can be applied simultaneously as long as they do not interfere with each other. We want the underlying constraint rewrite implementation to execute rewrite steps in parallel on increasingly popular multi-core architectures. We design and implement efficient algorithms which allow for the parallel execution of multi-set constraint rewrite rules. Our experiments show that we obtain some significant speed-ups on multi-core architectures. Copyright © 2008 ACM.

Keywords: Computer architecture

Improving parallel shear-warp volume rendering on shared address space multiprocessors

Proceedings of the 1997 6th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Jiang, Dongming; Singh, Jaswinder Pal (Princeton Univ, Princeton, NJ, United States)

ISBN: (print) 9780897919067

This paper presents a new parallel volume rendering algorithm and implementation, based on shear warp factorization, for shared address space multiprocessors. Starting from an existing parallel shear-warp renderer, we use increasingly detailed performance measurements on real machines and simulators to understand performance bottlenecks. This leads us to a new parallel implementation that substantially outperforms and out-scales the old one on a range of shared address space platforms, from bus-based centralized memory machines to hardware-coherent distributed memory machines to networks of computers connected by page-based shared virtual memory. The results demonstrate that real time volume rendering is promising on general purpose multiprocessors, and illustrate the utility of tool hierarchies in conjunction with algorithmic and application knowledge to understand memory system interactions and improve parallel algorithms.

Keywords: Parallel algorithms

Compilation of parallel multimedia computations - extending retiming theory and Amdahl's law

Proceedings of the 1997 6th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Prasanna, G. N. Srinivasa (Lucent Technologies, Murray Hill, NJ, United States)

ISBN: (print) 9780897919067

Multimedia applications operate on downstreams. A large class of multimedia applications is described by the macro-dataflow graph model. This study attempted to examine how such multimedia applications can be compiled to run efficiently on parallel machines, by optimizing both throughput (T) and latency (L), using two techniques based on task speedup functions. The first step chooses an appropriate pipeline structure for the system while the second exploits the dataset parallelism intrinsic in the period datastream, and runs multiple datasets in parallel (task/cluster multiplicity) for each clustering. Both techniques were used to compile real-time image-processing problems on an NCUBE-2 multiprocessor. The two techniques showed substantial performance gains.

Keywords: Program compilers

PPDP'08 Proceedings of the 10th International ACM SIGPLAN Symposium on Principles and Practice of Declarative Programming

PPDP 2008: 10th International ACM SIGPLAN Symposium on Principles and Practice of Declarative Programming

ISBN: (print) 9781605581170

The proceedings contain 25 papers. The topics discussed include: order-sorted dependency pairs; macros for context-free grammars; inferring precise polymorphic type dependencies in logic programs; a type system for safe memory management and its proof of correctness; programming with proofs and explicit contexts; towards execution time estimation in abstract machine-based languages; similarity-based reasoning in qualified logic programming; classifying integrity checking methods with regard to inconsistency tolerance; comprehending finite maps for algorithmic debugging of higher-order functional programs; parallel execution of multi-set constraint rewrite rules; a rewriting framework for the composition of access control policies; global difference constraint propagation for finite domain solvers; and dynamic variable elimination during propagation solving.

VEBO: A vertex- and edge-balanced ordering heuristic to load balance parallel graph processing

24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2019

Authors: Sun, Jiawen; Vandierendonck, Hans; Nikolopoulos, Dimitrios S. (Queen's University of Belfast, United Kingdom)

ISBN: (print) 9781450362252

This work proposes Vertex- and Edge-Balanced Ordering (VEBO): balance the number of edges and the number of unique destinations of those edges. VEBO balances edges and vertices for graphs with a power-law degree distribution, and ensures an equal degree distribution between partitions. Experimental evaluation on three shared-memory graph processing systems (Ligra, Polymer and GraphGrind) shows that VEBO achieves excellent load balance and improves performance by 1.09× over Ligra, 1.41× over Polymer and 1.65× over GraphGrind, compared to their respective partitioning algorithms, averaged across 8 algorithms and 7 graphs. VEBO improves GraphGrind performance with a speedup of 2.9× over Ligra on average. © 2019 Copyright held by the owner/author(s).

Keywords: Graph theory
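
The abstract above states the goal (balance both edge counts and vertex counts across partitions) but not the procedure. As a rough illustration of that goal only, here is a simplified greedy sketch. The placement rule (visit vertices in decreasing in-degree order and put each into the partition with the fewest edges so far, breaking ties by fewest vertices) is an assumption made for this example and should not be read as the paper's algorithm. The function name `balanced_partition` is hypothetical.

```python
import heapq


def balanced_partition(in_degree, num_partitions):
    """Greedy sketch: map each vertex to a partition id.

    in_degree: dict of vertex -> number of incoming edges.
    Assumed rule (not taken from the paper): heaviest vertices first, each
    into the partition with the fewest edges, ties broken by vertex count.
    """
    # Heap entries are (edge_count, vertex_count, partition_id).
    heap = [(0, 0, p) for p in range(num_partitions)]
    heapq.heapify(heap)
    assignment = {}
    for v in sorted(in_degree, key=lambda v: (-in_degree[v], v)):
        edges, vertices, p = heapq.heappop(heap)
        assignment[v] = p
        heapq.heappush(heap, (edges + in_degree[v], vertices + 1, p))
    return assignment


# A small skewed degree sequence, loosely in the spirit of a power-law graph.
degrees = {0: 5, 1: 3, 2: 3, 3: 1, 4: 0, 5: 0}
parts = balanced_partition(degrees, 2)
edge_load = [sum(d for v, d in degrees.items() if parts[v] == p) for p in range(2)]
vertex_load = [sum(1 for v in degrees if parts[v] == p) for p in range(2)]
```

The tie-break on vertex count is what spreads the zero-degree vertices evenly once the edge loads are level.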

39 pages in total

235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
