Search results - Inner Mongolia University Library



Search query: "Any field = 16th ACM Symposium on Principles and Practice of Parallel Programming"

407 records in total; showing records 231-240


Accelerating CUDA Graph Algorithms at Maximum Warp

16th ACM Symposium on Principles and Practice of Parallel Programming

Authors: Hong, Sungpack; Kim, Sang Kyun; Oguntebi, Tayo; Olukotun, Kunle
Affiliation: Stanford Univ, Comp Syst Lab, Stanford, CA 94305, USA

ISBN: (print) 9781450301190

Graphs are powerful data representations favored in many computational domains. Modern GPUs have recently shown promising results in accelerating computationally challenging graph problems but their performance suffers heavily when the graph structure is highly irregular, as most real-world graphs tend to be. In this study, we first observe that the poor performance is caused by work imbalance and is an artifact of a discrepancy between the GPU programming model and the underlying GPU architecture. We then propose a novel virtual warp-centric programming method that exposes the traits of underlying GPU architectures to users. Our method significantly improves the performance of applications with heavily imbalanced workloads, and enables trade-offs between workload imbalance and ALU underutilization for fine-tuning the performance. Our evaluation reveals that our method exhibits up to 9x speedup over previous GPU algorithms and 12x over single thread CPU execution on irregular graphs. When properly configured, it also yields up to 30% improvement over previous GPU algorithms on regular graphs. In addition to performance gains on graph algorithms, our programming method achieves 1.3x to 15.1x speedup on a set of GPU benchmark applications. Our study also confirms that the performance gap between GPUs and other multi-threaded CPU graph implementations is primarily due to the large difference in memory bandwidth.

Keywords: Algorithms; Performance; parallel graph algorithms; CUDA; GPGPU
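
The trade-off mentioned in the abstract above, between workload imbalance and ALU underutilization, can be illustrated with a small cost model. This is only a sketch and not the authors' CUDA implementation: it assumes a simplified lockstep model in which a physical warp of `warp_size` lanes is split into virtual warps of `virtual_warp` lanes, each virtual warp processes the edges of one vertex cooperatively, and a physical warp finishes only when its slowest vertex is done. The function names and the degree list in the usage note are hypothetical.

```python
from math import ceil


def lockstep_steps(degrees, warp_size=32, virtual_warp=1):
    """Count SIMD steps under an assumed, simplified lockstep cost model.

    degrees      -- out-degree of each vertex, in processing order
    warp_size    -- lanes in one physical warp
    virtual_warp -- lanes cooperating on a single vertex (must divide warp_size)
    """
    if warp_size % virtual_warp != 0:
        raise ValueError("virtual_warp must divide warp_size")
    vertices_per_warp = warp_size // virtual_warp
    steps = 0
    for start in range(0, len(degrees), vertices_per_warp):
        chunk = degrees[start:start + vertices_per_warp]
        # All lanes run in lockstep, so the chunk costs as much as its
        # slowest vertex; each vertex needs ceil(degree / virtual_warp) steps.
        steps += max(ceil(d / virtual_warp) for d in chunk)
    return steps


def utilization(degrees, warp_size=32, virtual_warp=1):
    """Fraction of lane-steps that perform useful edge work in this model."""
    steps = lockstep_steps(degrees, warp_size, virtual_warp)
    return sum(degrees) / (steps * warp_size) if steps else 0.0
```

For a hypothetical skewed input such as `[64] + [1] * 31`, the model charges 64 steps with one lane per vertex, 15 steps with 8-lane virtual warps, and 33 steps with 32-lane virtual warps. For a regular input such as `[1] * 32`, wider virtual warps only add idle lanes, which is the underutilization side of the trade-off.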


Achieving a Single Compute Device Image in OpenCL for Multiple GPUs

16th ACM Symposium on Principles and Practice of Parallel Programming

Authors: Kim, Jungwon; Kim, Honggyu; Lee, Joo Hwan; Lee, Jaejin
Affiliation: Seoul Natl Univ, Ctr Manycore Programming, Sch Comp Sci & Engn, Seoul 151744, South Korea

ISBN: (print) 9781450301190

In this paper, we propose an OpenCL framework that treats multiple GPUs as a single compute device. Providing the single GPU image makes an OpenCL application written for a single GPU portable to GPGPU systems with multiple GPUs. It also lets the application exploit the full computing power of the multiple GPUs and the entire amount of GPU memory available in the system. Our OpenCL framework automatically distributes, at run time, an OpenCL kernel written for a single GPU into multiple CUDA kernels that execute on the multiple GPUs. It applies a run-time memory access range analysis to the kernel by performing a sampling run and identifies an optimal workload distribution for the kernel. To achieve a single compute device image, the runtime maintains a virtual device memory that is allocated in the main memory of the GPGPU system. The OpenCL runtime treats the memory as if it were the memory of a single GPU device and keeps it consistent with the memories of the multiple GPUs. Our OpenCL-C-to-C translator generates the sampling code from the OpenCL kernel code and our OpenCL-C-to-CUDA-C translator generates the CUDA kernel code for the distributed OpenCL kernel. We show the effectiveness of our OpenCL framework by implementing the OpenCL runtime and the two source-to-source translators. We evaluate its performance with a GPGPU system that contains eight GPUs using eleven OpenCL benchmark applications.

Keywords: Algorithm; Design; Experimentation; Languages; Measurement; Performance; OpenCL; Compilers; Runtime; Access range analysis; Workload distribution; Virtual device memory


Ordered vs. Unordered: a Comparison of Parallelism and Work-efficiency in Irregular Algorithms

16th ACM Symposium on Principles and Practice of Parallel Programming

Authors: Hassaan, M. Amber; Burtscher, Martin; Pingali, Keshav
Affiliations: Texas State Univ San Marcos, Dept Comp Sci, San Marcos, TX, USA; Univ Texas Austin, Dept Comp Sci, Austin, TX 78712, USA

ISBN: (print) 9781450301190

Outside of computational science, most problems are formulated in terms of irregular data structures such as graphs, trees and sets. Unfortunately, we understand relatively little about the structure of parallelism and locality in irregular algorithms. In this paper, we study several algorithms for four such problems: discrete-event simulation, single-source shortest path, breadth-first search, and minimal spanning trees. We show that these algorithms can be classified into two categories that we call unordered and ordered, and demonstrate experimentally that there is a trade-off between parallelism and work efficiency: unordered algorithms usually have more parallelism than their ordered counterparts for the same problem, but they may also perform more work. Nevertheless, our experimental results show that unordered algorithms typically lead to more scalable implementations, demonstrating that less work-efficient irregular algorithms may be better for parallel execution.

Keywords: Irregular Algorithms; Amorphous Data-parallelism; Parallel Breadth-first Search; Single-Source Shortest Path; Discrete-Event Simulation; Minimal Spanning Tree; Multicore processors; Galois system; Algorithms; Languages; Performance
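
The ordered/unordered distinction in the abstract above can be made concrete for single-source shortest path. The sketch below is an illustration under stated assumptions, not the paper's Galois implementations: the ordered variant is a standard priority-queue (Dijkstra-style) algorithm, the unordered variant processes a FIFO worklist in arbitrary order, and "work" is counted as the number of edge relaxations attempted. The graph representation (a dict mapping each vertex to a list of `(neighbor, weight)` pairs with non-negative weights) and the function names are assumptions made for this example.

```python
import heapq
from collections import deque


def sssp_ordered(graph, source):
    """Ordered SSSP: always process the vertex with the smallest distance."""
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    heap = [(0, source)]
    work = 0
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry, the vertex was already improved
        for v, w in graph[u]:
            work += 1
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist, work


def sssp_unordered(graph, source):
    """Unordered SSSP: process active vertices in any order (here FIFO)."""
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    worklist = deque([source])
    work = 0
    while worklist:
        u = worklist.popleft()
        for v, w in graph[u]:
            work += 1
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                worklist.append(v)  # v must be re-examined later
    return dist, work
```

On a small hypothetical graph such as `{"a": [("b", 10), ("c", 1)], "b": [("d", 1)], "c": [("b", 1)], "d": []}`, both functions compute the same distances, but the unordered version attempts one more relaxation because it processes `b` before its final distance is known. In exchange, the unordered worklist imposes no processing order, so its items could be handled by independent workers.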


The STAPL Parallel Container Framework

16th ACM Symposium on Principles and Practice of Parallel Programming

Authors: Tanase, Gabriel; Buss, Antal; Fidel, Adam; Harshvardhan; Papadopoulos, Ioannis; Pearce, Olga; Smith, Timmie; Thomas, Nathan; Xu, Xiabing; Mourad, Nedal; Vu, Jeremy; Bianco, Mauro; Amato, Nancy M.; Rauchwerger, Lawrence
Affiliation: Texas A&M Univ, Parasol Lab, Dept Comp Sci & Engn, College Stn, TX 77843, USA

ISBN: (print) 9781450301190

The Standard Template Adaptive Parallel Library (STAPL) is a parallel programming infrastructure that extends C++ with support for parallelism. It includes a collection of distributed data structures called pContainers that are thread-safe, concurrent objects, i.e., shared objects that provide parallel methods that can be invoked concurrently. In this work, we present the STAPL Parallel Container Framework (PCF), which is designed to facilitate the development of generic parallel containers. We introduce a set of concepts and a methodology for assembling a pContainer from existing sequential or parallel containers, without requiring the programmer to deal with concurrency or data distribution issues. The PCF provides a large number of basic parallel data structures (e.g., pArray, pList, pVector, pMatrix, pGraph, pMap, pSet). The PCF provides a class hierarchy and a composition mechanism that allows users to extend and customize the current container base for improved application expressivity and performance. We evaluate STAPL pContainer performance on a CRAY XT4 massively parallel system and show that pContainer methods, generic pAlgorithms, and different applications provide good scalability on more than 16,000 processors.

Keywords: parallel programming; Languages; Libraries; Data Structures; Languages; Design; Performance


COREMU: A Scalable and Portable Parallel Full-system Emulator

16th ACM Symposium on Principles and Practice of Parallel Programming

Authors: Wang, Zhaoguo; Liu, Ran; Chen, Yufei; Wu, Xi; Chen, Haibo; Zhang, Weihua; Zang, Binyu
Affiliation: Fudan Univ, Parallel Proc Inst, Shanghai, Peoples R China

ISBN: (print) 9781450301190

This paper presents the open-source COREMU, a scalable and portable parallel emulation framework that decouples the complexity of parallelizing full-system emulators from building a mature sequential one. The key observation is that CPU cores and devices in current (and likely future) multiprocessors are loosely coupled and communicate through well-defined interfaces. Based on this observation, COREMU emulates multiple cores by creating multiple instances of existing sequential emulators, and uses a thin library layer to handle the inter-core and device communication and synchronization, to maintain a consistent view of system resources. COREMU also incorporates lightweight memory transactions, feedback-directed scheduling, lazy code invalidation and adaptive signal control to provide scalable performance. To make COREMU useful in practice, we also provide some preliminary tools and APIs that can help programmers to diagnose performance problems and (concurrency) bugs. A working prototype, which reuses the widely used QEMU as the sequential emulator, requires only 2500 lines of code (LOC) of changes to QEMU. It currently supports x64 and ARM platforms, and can emulate up to 255 cores running commodity OSes with practical performance, while QEMU cannot scale above 32 cores. A set of performance evaluations against QEMU indicates that COREMU has negligible uniprocessor emulation overhead and performs and scales significantly better than QEMU. We also show how COREMU could be used to diagnose performance problems and concurrency bugs of both OS kernel and parallel applications.

Keywords: Full-system Emulator; Parallel Emulator; Multicore; Design; Experimentation; Performance


A Domain-Specific Approach To Heterogeneous Parallelism

16th ACM Symposium on Principles and Practice of Parallel Programming

Authors: Chafi, Hassan; Sujeeth, Arvind K.; Brown, Kevin J.; Lee, HyoukJoong; Atreya, Anand R.; Olukotun, Kunle
Affiliation: Stanford Univ, Pervas Parallelism Lab, Stanford, CA 94305, USA

Exploiting heterogeneous parallel hardware currently requires mapping application code to multiple disparate programming models. Unfortunately, general-purpose programming models available today can yield high performance but are too low-level to be accessible to the average programmer. We propose leveraging domain-specific languages (DSLs) to map high-level application code to heterogeneous devices. To demonstrate the potential of this approach we present OptiML, a DSL for machine learning. OptiML programs are implicitly parallel and can achieve high performance on heterogeneous hardware with no modification required to the source code. For such a DSL-based approach to be tractable at large scales, better tools are required for DSL authors to simplify language creation and parallelization. To address this concern, we introduce Delite, a system designed specifically for DSLs that is both a framework for creating an implicitly parallel DSL as well as a dynamic runtime providing automated targeting to heterogeneous parallel hardware. We show that OptiML running on Delite achieves single-threaded, parallel, and GPU performance superior to explicitly parallelized MATLAB code in nearly all cases.

Keywords: parallel programming; Domain-Specific Languages; Dynamic Optimizations; Languages; Performance


Lifeline-based Global Load Balancing

16th ACM Symposium on Principles and Practice of Parallel Programming

Authors: Saraswat, Vijay; Kambadur, Prabhanjan; Kodali, Sreedhar; Grove, David; Krishnamoorthy, Sriram
Affiliation: Pacific NW Natl Lab, Richland, WA 99352, USA

ISBN: (print) 9781450301190

On shared-memory systems, Cilk-style work-stealing [5] has been used to effectively parallelize irregular task-graph based applications such as Unbalanced Tree Search (UTS) [24, 28]. There are two main difficulties in extending this approach to distributed memory. In the shared memory approach, thieves (nodes without work) constantly attempt to asynchronously steal work from randomly chosen victims until they find work. In distributed memory, thieves cannot autonomously steal work from a victim without disrupting its execution. When work is sparse, this results in performance degradation. In essence, a direct extension of traditional work-stealing to distributed memory violates the work-first principle underlying work-stealing. Further, thieves spend useless CPU cycles attacking victims that have no work, resulting in system inefficiencies in multi-programmed contexts. Second, it is non-trivial to detect active distributed termination (detect that programs at all nodes are looking for work, hence there is no work). This problem is well-studied and requires careful design for good performance. Unfortunately, in most existing languages/frameworks, application developers are forced to implement their own distributed termination detection. In this paper, we develop a simple set of ideas that allow work-stealing to be efficiently extended to distributed memory. First, we introduce lifeline graphs: low-degree, low-diameter, fully-connected directed graphs. Such graphs can be constructed from k-dimensional hypercubes. When a node is unable to find work after w unsuccessful steals, it quiesces after informing the outgoing edges in its lifeline graph. Quiescent nodes do not disturb other nodes. A quiesced node is reactivated when work arrives from a lifeline, and itself shares this work with those of its incoming lifelines that are activated. Termination occurs precisely when computation at all nodes has quiesced. In a language such as X10, such passive distributed terminati

Keywords: UTS; global load balancing; distributed computing; X10; work-stealing; parallel programming; Algorithms; Design
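
The abstract above states that lifeline graphs are low-degree, low-diameter graphs that can be constructed from k-dimensional hypercubes. As a rough sketch of that idea, and assuming the simplest case of a binary hypercube with exactly 2^k nodes (the paper's actual construction may differ), each node can be given one outgoing lifeline per dimension by flipping one bit of its index. The function names are hypothetical; the `diameter` helper only checks the degree and diameter properties by breadth-first search.

```python
from collections import deque


def hypercube_lifelines(k):
    """Outgoing lifelines for 2**k nodes arranged as a binary hypercube.

    Assumption for this sketch: node i has a lifeline to every node whose
    index differs from i in exactly one bit, giving out-degree k.
    """
    n = 1 << k
    return {node: [node ^ (1 << bit) for bit in range(k)] for node in range(n)}


def diameter(graph):
    """Longest shortest-path length, checking that all nodes are reachable."""
    best = 0
    for source in graph:
        hops = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in hops:
                    hops[v] = hops[u] + 1
                    queue.append(v)
        if len(hops) != len(graph):
            raise ValueError("graph is not strongly connected")
        best = max(best, max(hops.values()))
    return best
```

With `k = 3` this yields 8 nodes, each with 3 lifelines, and a diameter of 3, so both the degree and the diameter grow only logarithmically with the number of nodes.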


PSP Practice Support System Using Defect Types Based on Phenomenon

16th International Symposium on Artificial Life and Robotics (AROB 16th '11)

Authors: Yamaguchi, Daisuke; Niimi, Ayahiko; Katayama, Fumiyo; Takahashi, Muneo
Affiliations: Toin Univ Yokohama, Aoba Ku, 1614 Kurogane Cho, Yokohama, Kanagawa, Japan; Future Univ Hakodate, Hakodate, Hokkaido, Japan

ISBN: (print) 9784990288051

In this paper, we propose the PSP Practice Support System using Defect Types based on Phenomenon. This system can transmit programming to specific human among many software processes using a Multiagent technology. The system is also synthesized to do parallel and cooperative proposing internally. Applying the proposed method to a personal process-removing task, a flexible programming for quality of software. Software developments depend on information, which is possible to collection of personal process. Agent planning has get use working data on user action and other communication. Therefore collection of all user data is necessary for agent learning. Agent studies the best transmission programming, planning and quality according to the makes planning in the personal process.

Keywords: Multi-Agent System; Personal Software Process; Software Engineering; Artificial Intelligence


Tool demonstration: DrHJ - A lightweight pedagogic IDE for Habanero Java

9th International Conference on Principles and Practice of Programming in Java, PPPJ 2011

Authors: Payne, Jarred; Raman, Raghavan; Cavé, Vincent; Ricken, Mathias; Cartwright, Robert; Sarkar, Vivek
Affiliation: Department of Computer Science, Rice University, United States

ISBN: (print) 9781450309356

The Java language and runtime environment has had a profound worldwide impact on computer software since its introduction nearly two decades ago. It has enabled the creation of a rich ecosystem of libraries, frameworks, and tools that promises to deliver significant value for many years to come. Consequently, a wide range of Interactive Development Environments (IDEs) have emerged to increase the productivity of Java programmers. They vary in functionality based on the expertise level assumed for their target user base. The Eclipse Java Development Tools (JDT) project offers a rich set of power tools for experienced programmers, but can be harder for novice programmers to set up and use. In contrast, IDEs such as DrJava [2] and BlueJ [16] have been developed primarily for use in introductory programming courses. © 2011 ACM.

Keywords: parallel programming


How's the parallel computing revolution going?

Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming

Author: Kathryn S. McKinley
Affiliation: The University of Texas at Austin, Austin, TX, USA

ISBN: (print) 9781450301190

Two trends changed the computing landscape over the past decade: (1) hardware vendors started delivering chip multiprocessors (CMPs) instead of uniprocessors, and (2) software developers increasingly chose managed languages instead of native languages. Unfortunately, the former change is disrupting the virtuous cycle between performance improvements and software innovation. Establishing a new parallel performance virtuous cycle for managed languages will require scalable applications executing on scalable Virtual Machine (VM) services, since the VM schedules, monitors, compiles, optimizes, garbage collects, and executes together with the application. This talk describes current progress, opportunities, and challenges for scalable VM services. The parallel computing revolution urgently needs more innovations.

Keywords: multicore; managed languages
