Search Results - Inner Mongolia University Library

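Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = discipline classification number, U = all fields, Y = year (publication year, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject term is library science or information science, author is 范并思, years 1982 to 2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication is 计算机应用与软件, any field contains C++ or Basic, subject term does not contain Visual, years 2011 to 2016)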
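The search rules above can be sketched as a small query builder. This is only an illustration under stated assumptions: the helper names `term`, `combine`, and `group` are hypothetical and are not part of the library's software; only the field codes, the uppercase operators, and the single-space padding come from the help text.

```python
# Field codes and operators as listed in the help text.
# The helper functions below are hypothetical illustrations.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Build a single FIELD=value term, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(operator: str, *parts: str) -> str:
    """Join parts with an uppercase operator, one space on each side."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    return f" {operator} ".join(parts)


def group(expression: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


# Rebuild the first documented example.
example_one = combine(
    "AND",
    group(combine("OR", term("K", "图书馆学"), term("K", "情报学"))),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)

# Rebuild the second documented example, which mixes AND and NOT.
example_two = combine(
    "AND",
    combine(
        "NOT",
        combine(
            "AND",
            term("P", "计算机应用与软件"),
            group(combine("OR", term("U", "C++"), term("U", "Basic"))),
        ),
        term("K", "Visual"),
    ),
    term("Y", "2011-2016"),
)

print(example_one)
print(example_two)
```

Building the string from validated pieces avoids the two mistakes the help text warns about: lowercase operators and missing spaces around them. How the search engine itself parses operator precedence is not stated in the help text, so the sketch only reproduces the documented strings.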


Search query: "Any field = Proceedings of the 5th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming"

340 records in total; showing records 141-150


CUDA-NP: Realizing nested thread-level parallelism in GPGPU applications

2014 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2014

Authors: Yang, Yi; Zhou, Huiyang. Affiliations: Department of Computing Systems Architecture, NEC Laboratories America Inc., United States; Department of Electrical and Computer Engineering, North Carolina State University, United States

ISBN: (print) 9781450326568

Parallel programs consist of a series of code sections with different thread-level parallelism (TLP). As a result, it is rather common that a thread in a parallel program, such as a GPU kernel in CUDA programs, still contains both sequential code and parallel loops. In order to leverage such parallel loops, the latest Nvidia Kepler architecture introduces dynamic parallelism, which allows a GPU thread to start another GPU kernel, thereby reducing the overhead of launching kernels from a CPU. However, with dynamic parallelism, a parent thread can only communicate with its child threads through global memory and the overhead of launching GPU kernels is non-trivial even within GPUs. In this paper, we first study a set of GPGPU benchmarks that contain parallel loops, and highlight that these benchmarks do not have a very high loop count or high degrees of TLP. Consequently, the benefits of leveraging such parallel loops using dynamic parallelism are too limited to offset its overhead. We then present our proposed solution to exploit nested parallelism in CUDA, referred to as CUDA-NP. With CUDA-NP, we initially enable a high number of threads when a GPU program starts, and use control flow to activate different numbers of threads for different code sections. We implemented our proposed CUDA-NP framework using a directive-based compiler approach. For a GPU kernel, an application developer only needs to add OpenMP-like pragmas for parallelizable code sections. Then, our CUDA-NP compiler automatically generates the optimized GPU kernels. It supports both the reduction and the scan primitives, explores different ways to distribute parallel loop iterations into threads, and efficiently manages on-chip resources. Our experiments show that for a set of GPGPU benchmarks, which have already been optimized and contain nested parallelism, our proposed CUDA-NP framework further improves the performance by up to 6.69 times and 2.18 times on average. Copyright © 2014 ACM.

Keywords: Application programming interfaces (API)


Resilient X10: Efficient failure-aware programming

2014 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2014

Authors: Cunningham, David; Grove, David; Herta, Benjamin; Iyengar, Arun; Kawachiya, Kiyokuni; Murata, Hiroki; Saraswat, Vijay; Takeuchi, Mikio; Tardieu, Olivier. Affiliations: IBM T. J. Watson Research Center, Japan; Google Inc., Japan; IBM Research Tokyo, Japan

ISBN: (print) 9781450326568

Scale-out programs run on multiple processes in a cluster. In scale-out systems, processes can fail. Computations using traditional libraries such as MPI fail when any component process fails. The advent of Map Reduce, Resilient Data Sets and MillWheel has shown dramatic improvements in productivity are possible when a high-level programming framework handles scale-out and resilience automatically. We are concerned with the development of general-purpose languages that support resilient programming. In this paper we show how the X10 language and implementation can be extended to support resilience. In Resilient X10, places may fail asynchronously, causing loss of the data and tasks at the failed place. Failure is exposed through exceptions. We identify a Happens Before Invariance Principle and require the runtime to automatically repair the global control structure of the program to maintain this principle. We show this reduces much of the burden of resilient programming. The programmer is only responsible for continuing execution with fewer computational resources and the loss of part of the heap, and can do so while taking advantage of domain knowledge. We build a complete implementation of the language, capable of executing benchmark applications on hundreds of nodes. We describe the algorithms required to make the language runtime resilient. We then give three applications, each with a different approach to fault tolerance (replay, decimation, and domain-level checkpointing). These can be executed at scale and survive node failure. We show that for these programs the overhead of resilience is a small fraction of overall runtime by comparing to equivalent non-resilient X10 programs. On one program we show end-to-end performance of Resilient X10 is ∼100x faster than Hadoop. Copyright © 2014 ACM.

Keywords: Fault tolerance


Real-Time Matching of Antescofo Temporal Patterns

16th International Symposium on Principles and Practice of Declarative Programming (PPDP)

Authors: Giavitto, Jean-Louis; Echeveste, Jose. Affiliations: IRCAM UMR STMS 9912 CNRS, 1 Pl Igor Stravinsky, F-75004 Paris, France; INRIA MuTant Project, 1 Pl Igor Stravinsky, F-75004 Paris, France; Univ Paris 06, Sorbonne Univ, 1 Pl Igor Stravinsky, F-75004 Paris, France

ISBN: (print) 9781450329477

online matching. Antescofo is a real-time system for performance coordination between musicians and computer processes during a live music concert. ATP are used to define complex events that correspond to a combination of perceived events in the musical environment as well as arbitrary logical and metrical temporal conditions. The real-time recognition of such events is used to trigger arbitrary actions in the style of event-condition-action rules. The musical context, the rationales of temporal patterns and several illustrative examples are introduced to motivate the design of ATP. The semantics of ATP matching is defined to parallel the well-known notion of regular expression and Brzozowski's derivatives but extended to handle an infinite alphabet, arbitrary predicates, elapsing time and inhibitory conditions. This approach is compared to those developed in log auditing and for the runtime verification of real-time logics. ATP are implemented by translation into a core subset of the Antescofo domain-specific language. This compilation has proven efficient enough to avoid the extension of the real-time runtime of the language and has been validated with composers in actual pieces.

Keywords: timed regular expressions; event-driven programming; score following; timed and reactive system; domain-specific language; computer music; Antescofo


Session details: Session order 8: programming systems session

Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Author: Kunle Olukotun. Affiliation: Stanford

ISBN: (print) 9781450326568

No abstract available.


High-level dataflow programming for reconfigurable computing

26th IEEE International Symposium on Computer Architecture and High Performance Computing Workshops, SBAC-PADW 2014

Authors: Sérot, J.; Berry, F. Affiliation: Institut Pascal, Université Blaise Pascal / CNRS, Clermont-Ferrand, France

ISBN: (print) 9781479970148

In many application domains, FPGAs are now promoted as a way of getting round the restrictions of specific CPU designs on system scalability. However, in the current state of the art, programming FPGAs remains essentially a hardware-oriented activity, relying on dedicated hardware description languages such as VHDL or Verilog. Using these languages requires expertise in digital design and in practice this limits the applicability of FPGA-based solutions. This is particularly true for stream-processing applications, in which some processing must be carried out "on the fly" on digital data streams. In this context, the dataflow programming model offers a very effective way to reduce the gap between high-level formulations and low-level implementations. To support this claim, the authors have recently introduced CAPH, a domain-specific language offering a fully automated compilation path from high-level dataflow descriptions to FPGA configuration for stream-processing applications. This paper is an introduction to the CAPH language, giving its motivations and main design principles and exposing the basic features of its syntax, semantics and compilation. It also points to experimental results showing that, at least for stream-processing applications, the dataflow model of computation, used jointly as a programming model and an execution model, can offer a very effective way to reconcile abstraction and efficiency when programming FPGAs. © 2014 IEEE.

Keywords: Computer hardware description languages


Beyond parallel programming with domain specific languages

Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Author: Kunle Olukotun. Affiliation: Stanford University, Stanford, CA, USA

ISBN: (print) 9781450326568

Today, almost all computer architectures are parallel and heterogeneous: a combination of multiple CPUs, GPUs and specialized processors. This creates a challenging problem for application developers who want to develop high-performance programs without the effort required to use low-level, architecture-specific parallel programming models (e.g. OpenMP for CMPs, CUDA for GPUs, MPI for clusters). Domain-specific languages (DSLs) are a promising solution to this problem because they can provide an avenue for high-level application-specific abstractions with implicit parallelism to be mapped directly to low-level architecture-specific programming models, providing both high programmer productivity and high execution *** this talk I will describe an approach to building high-performance DSLs, which is based on DSL embedding in a general-purpose programming language, metaprogramming and a DSL infrastructure called Delite. I will describe how we transform DSL programs into efficient first-order low-level code using domain-specific optimization, parallelism and locality optimization with parallel patterns, and architecture-specific code generation. All optimizations and transformations are implemented in Delite: an extensible DSL compiler infrastructure that significantly reduces the effort required to develop new DSLs. Delite DSLs for machine learning, data querying, graph analysis, and scientific computing all achieve performance competitive with manually parallelized C++ code.

Keywords: domain specific languages


Efficient pseudorecursive evaluation schemes for non-adaptive sparse grids

2nd Workshop on Sparse Grids and Applications, SGA 2012

Authors: Buse, Gerrit; Pflüger, Dirk; Jacob, Riko. Affiliations: TU München, München, Germany; Institute for Parallel and Distributed Systems, Universität Stuttgart, Stuttgart, Germany; ETH Zurich, Zurich, Switzerland

ISBN: (digital) 9783319045375

ISBN: (print) 9783319045368

In this work we propose novel algorithms for storing and evaluating sparse grid functions, operating on regular (not spatially adaptive), yet potentially dimensionally adaptive grid types. Besides regular sparse grids our approach includes truncated grids, both with and without boundary grid points. Similar to the implicit data structures proposed in Feuersänger (Dünngitterverfahren für hochdimensionale elliptische partielle Differentialgleichungen. Diploma thesis, Institut für Numerische Simulation, Universität Bonn, 2005) and Murarasu et al. (Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming. Cambridge University Press, New York, 2011, pp. 25–34) we also define a bijective mapping from the multi-dimensional space of grid points to a contiguous index, such that the grid data can be stored in a simple array without overhead. Our approach is especially well-suited to exploit all levels of current commodity hardware, including cache levels and vector extensions. Furthermore, this kind of data structure is extremely attractive for today’s real-time applications, as it gives direct access to the hierarchical structure of the grids, while outperforming other common sparse grid structures (hash maps, etc.) which do not match modern compute platforms as well. For dimensionality d ≤ 10 we achieve good speedups on a 12-core Intel Westmere-EP NUMA platform compared to the results presented in Murarasu et al. (Proceedings of the International Conference on Computational Science, ICCS 2012. Procedia Computer Science, 2012). As we show, this also holds for the results obtained on Nvidia Fermi GPUs, for which we observe speedups over our own CPU implementation of up to 4.5 when dealing with moderate dimensionality. In high-dimensional settings, in the order of tens to hundreds of dimensions, our sparse grid evaluation kernels on the CPU outperform any other known implementation. © Springer International Publishing Switzerland 2014.

Keywords: parallel programming


21st century computer architecture

Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Author: Mark D. Hill. Affiliation: University of Wisconsin - Madison, Madison, WI, USA

ISBN: (print) 9781450326568

This talk has two parts. The first part will discuss possible directions for computer architecture research, including architecture as infrastructure, energy first, impact of new technologies, and cross-layer opportunities. This part is based on a 2012 Computing Community Consortium (CCC) whitepaper effort led by Hill, as well as other recent National Academy and ISAT studies. See: http://***/ccc/docs/init/***. The second part of the talk will discuss one or more examples of cross-layer research advocated in the first part. For example, our analysis shows that many "big-memory" server workloads, such as databases, in-memory caches, and graph analytics, pay a high cost for page-based virtual memory: up to 50% of execution time wasted. Via small changes to the operating system (Linux) and hardware (x86-64 MMU), this work reduces the execution time these workloads waste to less than 0.5%. The key idea is to map part of a process's linear virtual address space with a new incarnation of segmentation, while providing compatibility by mapping the rest of the virtual address space with paging.

Keywords: computer systems; performance; architecture; programming methods; new technology; energy


Programming with Hardware Lock Elision

18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Afek, Yehuda; Levy, Amir; Morrison, Adam. Affiliation: Tel Aviv Univ, Blavatnik Sch Comp Sci, IL-69978 Tel Aviv, Israel

ISBN: (print) 9781450319225

We present a simple yet effective technique for improving performance of lock-based code using the hardware lock elision (HLE) feature in Intel's upcoming Haswell processor. We also describe how to extend Haswell...

Keywords: Haswell; hardware lock elision; speculative execution


Reducing Contention through Priority Updates

18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Shun, Julian; Blelloch, Guy E.; Fineman, Jeremy T.; Gibbons, Phillip B. Affiliations: Carnegie Mellon Univ, Pittsburgh, PA 15213, USA; Georgetown Univ, Washington, DC 20057, USA; Intel Labs, Pittsburgh, PA, USA

ISBN: (print) 9781450319225

No abstract available.

Keywords: Experimentation; Performance; Parallel programming; Contention



Address: 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
