Search Results - Inner Mongolia University Library


Search query: "Any field = 16th ACM Symposium on Principles and Practice of Parallel Programming"

407 records in total; showing records 161-170


Designing and auto-tuning parallel 3-D FFT for computation-communication overlap 14

2014 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2014

Authors: Song, Sukhyun; Hollingsworth, Jeffrey K. (Department of Computer Science, University of Maryland, College Park, United States)

ISBN: (print) 9781450326568

This paper presents a method to design and auto-tune a new parallel 3-D FFT code using the non-blocking MPI all-to-all operation. We achieve high performance by optimizing computation-communication overlap. Our code performs fully asynchronous communication without any support from special hardware. We also improve cache performance through loop tiling. To cope with the complex tradeoff regarding our optimization techniques, we parameterize our code and auto-tune the parameters efficiently in a large parameter space. Experimental results from two systems confirm that our code achieves a speedup of up to 1.76× over the FFTW library. Copyright © 2014 ACM.

Keywords: Fast Fourier transforms


Well-structured futures and cache locality 14

2014 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2014

Authors: Herlihy, Maurice; Liu, Zhiyu (Computer Science Department, Brown University, United States)

ISBN: (print) 9781450326568

In fork-join parallelism, a sequential program is split into a directed acyclic graph of tasks linked by directed dependency edges, and the tasks are executed, possibly in parallel, in an order consistent with their dependencies. A popular and effective way to extend fork-join parallelism is to allow threads to create futures. A thread creates a future to hold the results of a computation, which may or may not be executed in parallel. That result is returned when some thread touches that future, blocking if necessary until the result is ready. Recent research has shown that while futures can, of course, enhance parallelism in a structured way, they can have a deleterious effect on cache locality. In the worst case, futures can incur Ω(PT∞ + tT∞) deviations, which implies Ω(CPT∞ + CtT∞) additional cache misses, where C is the number of cache lines, P is the number of processors, t is the number of touches, and T∞ is the computation span. Since cache locality has a large impact on software performance on modern multicores, this result is troubling. In this paper, however, we show that if futures are used in a simple, disciplined way, then the situation is much better: if each future is touched only once, either by the thread that created it, or by a later descendant of the thread that created it, then parallel executions with work stealing can incur at most O(CPT∞²) additional cache misses, a substantial improvement. This structured use of futures is characteristic of many (but not all) parallel applications. Copyright © 2014 ACM.

Keywords: parallel programming
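
The disciplined pattern described in this abstract (each future touched exactly once, by the thread that created it) can be illustrated with a minimal Python sketch. It uses the standard-library `concurrent.futures` thread pool, not the work-stealing runtime analyzed in the paper, and the names `fib` and `parallel_fib` are hypothetical, introduced only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def fib(n: int) -> int:
    """Plain sequential Fibonacci, used here as the unit of work."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def parallel_fib(n: int, pool: ThreadPoolExecutor) -> int:
    """Create one future, then touch it exactly once from the creating thread."""
    if n < 2:
        return n
    future = pool.submit(fib, n - 1)  # create the future
    local = fib(n - 2)                # keep working while it may run in parallel
    return future.result() + local    # the single touch; blocks until ready


# Assumption: a pool of two worker threads stands in for the scheduler.
with ThreadPoolExecutor(max_workers=2) as pool:
    result = parallel_fib(20, pool)
```

The sketch shows only the usage discipline; it says nothing about the cache-miss bounds, which depend on the scheduler the paper assumes.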


Efficient deterministic multithreading without global barriers 14

Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Lu, Kai; Zhou, Xu; Bergan, Tom; Wang, Xiaoping

Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China; College of Computer, National University of Defense Technology, Changsha, China; University of Washington, Computer Science and Engineering, United States

ISBN: (print) 9781450326568

Multithreaded programs execute nondeterministically on conventional architectures and operating systems. This complicates many tasks, including debugging and testing. Deterministic multithreading (DMT) makes the output of a multithreaded program depend on its inputs only, which eliminates this problem. However, current DMT implementations suffer from a common inefficiency: they use frequent global barriers to enforce a deterministic ordering on memory accesses. In this paper, we eliminate that inefficiency using an execution model we call deterministic lazy release consistency (DLRC). Our execution model uses the Kendo algorithm to enforce a deterministic ordering on synchronization, and it uses a deterministic version of the lazy release consistency memory model to propagate memory updates across threads. Our approach guarantees that programs execute deterministically even when they contain data races. We implemented a DMT system based on these ideas (RFDet) and evaluated it using 17 parallel applications. Our implementation targets C/C++ programs that use POSIX threads. Results show that RFDet achieves nearly 2x speedup over Dthreads, a state-of-the-art DMT system. Copyright © 2014 ACM.

Keywords: C++ (programming language)


Data structures for task-based priority scheduling 14

2014 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2014

Authors: Wimmer, Martin; Versaci, Francesco; Träff, Jesper Larsson; Cederman, Daniel; Tsigas, Philippas

Affiliations: Faculty of Informatics, Parallel Computing, Vienna University of Technology, 1040 Vienna/Wien, Austria; Computer Science and Engineering, Chalmers University of Technology, 412 96 Göteborg, Sweden

ISBN: (print) 9781450326568

We present three lock-free data structures for priority task scheduling: a priority work-stealing one, a centralized one with ρ-relaxed semantics, and a hybrid one combining both concepts. With the single-source shortest path (SSSP) problem as an example, we show how the different approaches affect the prioritization and provide upper bounds on the number of examined nodes. We argue that priority task scheduling allows for an intuitive and easy way to parallelize the SSSP problem, a notoriously hard task. Experimental evidence supports the good scalability of the resulting algorithm. The larger aim of this work is to understand the trade-offs between scalability and priority guarantees in task scheduling systems. We show that ρ-relaxation is a valuable technique for improving the former, while still allowing semantic constraints to be satisfied: the lock-free, hybrid κ-priority data structure can scale as well as work-stealing, while still providing strong priority scheduling guarantees, which depend on the parameter κ. Our theoretical results open up possibilities for even more scalable data structures by adopting a weaker form of ρ-relaxation, which still enables the semantic constraints to be respected.

Keywords: Scalability
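
To make the idea of relaxed priorities in SSSP concrete, here is a small sequential Python sketch. It assumes one simple reading of relaxation, namely that a pop may return any of the `rho` smallest queued entries; this is an assumption of the sketch and not necessarily the paper's definition of ρ-relaxation, and it is not one of the paper's lock-free data structures. The function name `relaxed_sssp` and the adjacency-list format are hypothetical.

```python
import heapq
import random


def relaxed_sssp(graph, source, rho=2, seed=0):
    """Label-correcting SSSP in which each pop may return any of the
    `rho` smallest queued entries (non-negative edge weights assumed)."""
    rng = random.Random(seed)
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        # Take up to rho smallest entries, pick one at random, push the rest back.
        candidates = [heapq.heappop(heap) for _ in range(min(rho, len(heap)))]
        d, u = candidates.pop(rng.randrange(len(candidates)))
        for item in candidates:
            heapq.heappush(heap, item)
        if d > dist.get(u, float("inf")):
            continue  # stale entry: a shorter path to u was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd  # improved label; v is (re)queued for examination
                heapq.heappush(heap, (nd, v))
    return dist


graph = {"a": [("b", 4), ("c", 1)], "c": [("b", 2), ("d", 5)], "b": [("d", 1)], "d": []}
distances = relaxed_sssp(graph, "a", rho=3)
```

Because a node can be popped before its label is final, it may be examined more than once; the final distances are still exact, and the extra examinations are the cost that bounds on the number of examined nodes are meant to limit.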


CUDA-NP: Realizing nested thread-level parallelism in GPGPU applications 14

2014 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2014

Authors: Yang, Yi; Zhou, Huiyang

Affiliations: Department of Computing Systems Architecture, NEC Laboratories America Inc., United States; Department of Electrical and Computer Engineering, North Carolina State University, United States

ISBN: (print) 9781450326568

Parallel programs consist of series of code sections with different thread-level parallelism (TLP). As a result, it is rather common that a thread in a parallel program, such as a GPU kernel in CUDA programs, still contains both sequential code and parallel loops. In order to leverage such parallel loops, the latest Nvidia Kepler architecture introduces dynamic parallelism, which allows a GPU thread to start another GPU kernel, thereby reducing the overhead of launching kernels from a CPU. However, with dynamic parallelism, a parent thread can only communicate with its child threads through global memory and the overhead of launching GPU kernels is non-trivial even within GPUs. In this paper, we first study a set of GPGPU benchmarks that contain parallel loops, and highlight that these benchmarks do not have a very high loop count or high degrees of TLP. Consequently, the benefits of leveraging such parallel loops using dynamic parallelism are too limited to offset its overhead. We then present our proposed solution to exploit nested parallelism in CUDA, referred to as CUDA-NP. With CUDA-NP, we initially enable a high number of threads when a GPU program starts, and use control flow to activate different numbers of threads for different code sections. We implemented our proposed CUDA-NP framework using a directive-based compiler approach. For a GPU kernel, an application developer only needs to add OpenMP-like pragmas for parallelizable code sections. Then, our CUDA-NP compiler automatically generates the optimized GPU kernels. It supports both the reduction and the scan primitives, explores different ways to distribute parallel loop iterations into threads, and efficiently manages on-chip resources. Our experiments show that for a set of GPGPU benchmarks, which have already been optimized and contain nested parallelism, our proposed CUDA-NP framework further improves the performance by up to 6.69 times and 2.18 times on average. Copyright © 2014 ACM.

Keywords: Application programming interfaces (API)


SCCMulti: An improved parallel strongly connected components algorithm 14

2014 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2014

Authors: Tomkins, Daniel; Smith, Timmie; Amato, Nancy M.; Rauchwerger, Lawrence (Parasol Laboratory, Department of Computer Science and Engineering, Texas A&M University, United States)

ISBN: (print) 9781450326568

Tarjan's famous linear time, sequential algorithm for finding the strongly connected components (SCCs) of a graph relies on depth first search, which is inherently sequential. Deterministic parallel algorithms solve this problem in logarithmic time using matrix multiplication techniques, but matrix multiplication requires a large amount of total work. Randomized algorithms based on reachability - the ability to get from one vertex to another along a directed path - greatly improve the work bound in the average case. However, these algorithms do not always perform well; for instance, Divide-and-Conquer Strong Components (DCSC), a scalable, divide-and-conquer algorithm, has good expected theoretical limits, but can perform very poorly on graphs for which the maximum reachability of any vertex is small. A related algorithm, MultiPivot, gives very high probability guarantees on the total amount of work for all graphs, but this improvement introduces an overhead that increases the average running time. This work introduces SCCMulti, a multi-pivot improvement of DCSC that offers the same consistency as MultiPivot without the time overhead. We provide experimental results demonstrating SCCMulti's scalability; these results also show that SCCMulti is more consistent than DCSC and is always faster than MultiPivot.

Keywords: Matrix algebra
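
The reachability-based divide-and-conquer idea behind DCSC can be sketched sequentially in Python as follows. This is the commonly described single-pivot forward-backward scheme, written here as an illustration under that assumption; it is not the paper's SCCMulti (which the abstract describes as a multi-pivot variant), and the names `reachable` and `dcsc` are hypothetical.

```python
def reachable(adj, start, allowed):
    """Vertices in `allowed` reachable from `start` along edges in `adj`."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v in allowed and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def dcsc(edges, vertices):
    """Return the SCCs as a list of frozensets via pivot-based divide and conquer."""
    fwd, bwd = {}, {}
    for u, v in edges:
        fwd.setdefault(u, []).append(v)  # forward adjacency
        bwd.setdefault(v, []).append(u)  # reversed adjacency
    sccs, work = [], [set(vertices)]
    while work:
        part = work.pop()
        if not part:
            continue
        pivot = next(iter(part))             # single, arbitrarily chosen pivot
        desc = reachable(fwd, pivot, part)   # forward-reachable set
        pred = reachable(bwd, pivot, part)   # backward-reachable set
        sccs.append(frozenset(desc & pred))  # the intersection is the pivot's SCC
        # Every remaining SCC lies wholly inside one of these three subsets,
        # so the subsets are independent subproblems.
        work += [desc - pred, pred - desc, part - (desc | pred)]
    return sccs


edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 4), (6, 6)]
components = dcsc(edges, range(1, 7))
```

When the pivot reaches few vertices, almost everything lands in the third subset and little progress is made per step, which matches the abstract's remark that DCSC can perform poorly when the maximum reachability of any vertex is small.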


An Imperialistic Strategy Approach to Continuous Global Optimization Problem 16

16th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)

Authors: Anescu, George (Univ Politehn Bucuresti, Power Plant Engn Fac, Bucharest 060042, Romania)

ISBN: (print) 9781479984480

The paper introduces the principles of a new global optimization strategy, Imperialistic Strategy (IS), applied to the Continuous Global Optimization Problem (CGOP). Inspired by existing multi-population strategies, like the Island Model (IM) approaches to parallel Evolutionary Algorithms (EA) and the Imperialistic Competitive Algorithm (ICA), the proposed IS method is considered an optimization strategy because it can integrate other well-known optimization methods, which in this context are regarded as sub-methods (although in other contexts they are prominent global optimization methods). Four optimization methods were implemented and tested in the roles of sub-methods: Genetic Algorithm (GA) (a floating-point representation variant), Differential Evolution (DE), Quantum Particle Swarm Optimization (QPSO) and Artificial Bee Colony (ABC). The optimization performance of the proposed methods was compared on a test bed of 9 known multimodal optimization problems by applying an appropriate testing methodology. The increased success rates of the IS multi-population variants compared to the success rates of the optimization sub-methods run separately, combined with the increased computing efficiency attainable in parallel and distributed implementations, demonstrated that IS is a promising approach to CGOP.

Keywords: genetic algorithms; parallel algorithms; quantum computing; ABC; CGOP; DE; EA; GA; ICA; IM; IS; multipopulation variants; QPSO; artificial bee colony; continuous global optimization problem; differential evolution; distributed implementations; floating-point representation variant; genetic algorithm; imperialistic competitive algorithm; imperialistic strategy approach; island model; multimodal optimization problems; multipopulation strategies; parallel evolutionary algorithms; parallel implementations; quantum particle swarm optimization; biological cells; linear programming; optimization; sociology; statistics; vectors; imperialistic strategy; Genetic Algorithm (GA); MT; activity based costing; independent component analysis; islet cell antibody; instant messaging; particle swarm optimization; gallium; cytology; policies and plans; bee colonies


Resilient X10: Efficient failure-aware programming 14

2014 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2014

Authors: Cunningham, David; Grove, David; Herta, Benjamin; Iyengar, Arun; Kawachiya, Kiyokuni; Murata, Hiroki; Saraswat, Vijay; Takeuchi, Mikio; Tardieu, Olivier

Affiliations: IBM T. J. Watson Research Center, Japan; Google Inc., Japan; IBM Research Tokyo, Japan

ISBN: (print) 9781450326568

Scale-out programs run on multiple processes in a cluster. In scale-out systems, processes can fail. Computations using traditional libraries such as MPI fail when any component process fails. The advent of Map Reduce, Resilient Data Sets and MillWheel has shown that dramatic improvements in productivity are possible when a high-level programming framework handles scale-out and resilience automatically. We are concerned with the development of general-purpose languages that support resilient programming. In this paper we show how the X10 language and implementation can be extended to support resilience. In Resilient X10, places may fail asynchronously, causing loss of the data and tasks at the failed place. Failure is exposed through exceptions. We identify a Happens Before Invariance Principle and require the runtime to automatically repair the global control structure of the program to maintain this principle. We show this reduces much of the burden of resilient programming. The programmer is only responsible for continuing execution with fewer computational resources and the loss of part of the heap, and can do so while taking advantage of domain knowledge. We build a complete implementation of the language, capable of executing benchmark applications on hundreds of nodes. We describe the algorithms required to make the language runtime resilient. We then give three applications, each with a different approach to fault tolerance (replay, decimation, and domain-level checkpointing). These can be executed at scale and survive node failure. We show that for these programs the overhead of resilience is a small fraction of overall runtime by comparing to equivalent non-resilient X10 programs. On one program we show end-to-end performance of Resilient X10 is ∼100x faster than Hadoop. Copyright © 2014 ACM.

Keywords: Fault tolerance


Teaching Parallel Design Patterns to Undergraduates in Computer Science 14

45th ACM SIGCSE Technical Symposium on Computer Science Education (SIGCSE)

Authors: Brown, Richard A.; Adams, Joel C.; Ferner, Clayton; Shoop, Elizabeth; Wilkinson, Barry

Affiliations: St Olaf Coll, Northfield, MN 55057, USA; Calvin Coll, Grand Rapids, MI 49506, USA; UNC Wilmington, Wilmington, NC, USA; Macalester Coll, St Paul, MN 55105, USA; UNC Charlotte, Charlotte, NC, USA

ISBN: (print) 9781450326056

The industry shift toward emerging forms of parallel and distributed computing (PDC), including multi-core CPUs, cloud computing, and general-purpose use of GPUs, has naturally led to an increased presence of PDC elements in undergraduate Computer Science curriculum recommendations, such as the new and substantial "PD" knowledge area in the joint ACM/IEEE CS2013 recommendations [1]. How can undergraduate students grasp the extensive and complex range of PDC principles and practices, and apply that knowledge in problem solving, while PDC technologies continue to evolve rapidly? Parallel design patterns are descriptions of effective solutions to recurring parallel programming problems in particular contexts and have emerged from long-standing industry practice. Parallel patterns occur at all computational levels, ranging from low-level concurrent execution patterns (such as message passing or thread pool patterns) to high-level software design patterns suitable for organizing entire systems or their components (such as model-view-control or pipe and filter patterns). The sheer number of parallel patterns, which reflect the full breadth and complexity of PDC, can be quite daunting for a newcomer. However, the ubiquity of parallel patterns in all forms of parallel and distributed computation makes these patterns relevant and illuminating at all undergraduate levels. Knowledge of parallel patterns, being reusable elements of parallel design, guides problem-solving during the creation of parallel programs; and those enduring design patterns remain relevant and useful as new PDC infrastructure emerges in this rapidly evolving field. This panel presents four viewpoints representing various approaches for teaching parallel patterns to CS undergraduates at multiple academic levels. Moderator Dick Brown co-directs (with Adams and Shoop) the CSinParallel project (***), which produces and shares modular materials for incrementally adding parallelism to existing undergraduate comp

Keywords: parallel computing; parallelism; distributed computing; education; parallel design patterns; curriculum; design patterns; Seeds; Paraguin; patternlets; exemplars


Efficient pseudorecursive evaluation schemes for non-adaptive sparse grids 1

2nd Workshop on Sparse Grids and Applications, SGA 2012

Authors: Buse, Gerrit; Pflüger, Dirk; Jacob, Riko

Affiliations: TU München, München, Germany; Institute for Parallel and Distributed Systems, Universität Stuttgart, Stuttgart, Germany; ETH Zurich, Zurich, Switzerland

ISBN: (digital) 9783319045375

ISBN: (print) 9783319045368

In this work we propose novel algorithms for storing and evaluating sparse grid functions, operating on regular (not spatially adaptive), yet potentially dimensionally adaptive grid types. Besides regular sparse grids our approach includes truncated grids, both with and without boundary grid points. Similar to the implicit data structures proposed in Feuersänger (Dünngitterverfahren für hochdimensionale elliptische partielle Differentialgleichungen. Diploma thesis, Institut für Numerische Simulation, Universität Bonn, 2005) and Murarasu et al. (Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming. Cambridge University Press, New York, 2011, pp. 25–34) we also define a bijective mapping from the multi-dimensional space of grid points to a contiguous index, such that the grid data can be stored in a simple array without overhead. Our approach is especially well-suited to exploit all levels of current commodity hardware, including cache levels and vector extensions. Furthermore, this kind of data structure is extremely attractive for today’s real-time applications, as it gives direct access to the hierarchical structure of the grids, while outperforming other common sparse grid structures (hash maps, etc.) which do not match modern compute platforms that well. For dimensionality d ≤ 10 we achieve good speedups on a 12-core Intel Westmere-EP NUMA platform compared to the results presented in Murarasu et al. (Proceedings of the International Conference on Computational Science, ICCS 2012. Procedia Computer Science, 2012). As we show, this also holds for the results obtained on Nvidia Fermi GPUs, for which we observe speedups over our own CPU implementation of up to 4.5 when dealing with moderate dimensionality. In high-dimensional settings, in the order of tens to hundreds of dimensions, our sparse grid evaluation kernels on the CPU outperform any other known implementation. © Springer International Publishing Switzerland 2014.

Keywords: parallel programming



Address: 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region; postal code 010021
