ISBN: (Print) 9781450332057
We present a new, concurrent, lock-free priority queue that relaxes the delete-min operation to allow deletion of any of the ρ+1 smallest keys instead of only a minimal one, where ρ is a parameter that can be configured at runtime. It is built from a logarithmic number of sorted arrays, similar to log-structured merge-trees (LSM). For keys added and removed by the same thread the behavior is identical to a non-relaxed priority queue. We compare to state-of-the-art lock-free priority queues with both relaxed and non-relaxed semantics, showing high performance and good scalability of our approach.
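As a rough illustration of the relaxed semantics, here is a minimal sequential C++ sketch. It ignores lock-freedom and the LSM-style merging of sorted arrays that the actual structure uses, and the class and method names are invented for illustration; the only property it demonstrates is that delete-min may return any of the ρ+1 smallest keys.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <optional>
#include <random>
#include <set>

class RelaxedPQ {
    std::multiset<int> keys_;  // stand-in for the log-many sorted array levels
    std::size_t rho_;          // relaxation parameter, configurable at runtime
    std::mt19937 rng_{42};
public:
    explicit RelaxedPQ(std::size_t rho) : rho_(rho) {}
    void insert(int k) { keys_.insert(k); }

    // Relaxed delete-min: remove and return *some* key among the rho+1
    // smallest, rather than strictly the minimum.
    std::optional<int> delete_min() {
        if (keys_.empty()) return std::nullopt;
        std::size_t window = std::min<std::size_t>(rho_ + 1, keys_.size());
        std::uniform_int_distribution<std::size_t> pick(0, window - 1);
        auto it = std::next(keys_.begin(), pick(rng_));
        int k = *it;
        keys_.erase(it);
        return k;
    }
};
```

With rho = 0 this degenerates to an exact priority queue, which matches the abstract's note that same-thread behavior is unrelaxed.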
ISBN: (Print) 9781450368186
A finite-state machine (FSM) is a key component for many important applications, such as Huffman decoding, regular expression matching and HTML tokenization. Due to its inherent dependencies and unpredictable memory access pattern, FSM computations are considered to be extremely difficult to parallelize. As such, significant research efforts have been made to accelerate FSM computations. Although they achieve promising performance results on multi-core machines, these methods are not scalable for emerging many-core architectures such as GPUs. Based on our experiments, we point out that the bottleneck of achieving scalability on GPUs is the sequential merge inherent to these methods. However, unlike the case for simple reduction loops, parallel merge implementations for FSM computations typically require runtime checks and re-executions, which can also impede performance. Based on these observations, we develop parallel merge techniques that select efficient runtime check implementations and avoid unnecessary re-executions. Further, based on GPU architectural features, we develop optimization techniques to improve performance. We evaluate our parallel merge implementations on a set of representative algorithms. Experimental results show that our parallel merge implementations are 2.02-6.74 times more efficient than corresponding sequential merge implementations and achieve better scalability on an Nvidia V100 GPU.
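The general check-and-re-execute pattern behind speculative FSM parallelization can be sketched as follows. This is a simplified CPU illustration, not the paper's GPU implementation: the toy transition function, the guess-all-chunks-start-in-init policy, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Toy transition function standing in for a real FSM table.
inline int step_fsm(int s, unsigned char b) { return (s + b) % 4; }

// Run the FSM over in[lo, hi) from a given start state.
int run_chunk(int start, const std::string& in, std::size_t lo, std::size_t hi) {
    int s = start;
    for (std::size_t i = lo; i < hi; ++i)
        s = step_fsm(s, static_cast<unsigned char>(in[i]));
    return s;
}

// Speculative phase: every chunk runs in parallel, guessing it starts in
// `init`. Merge phase: a runtime check compares the guess with the true
// incoming state and re-executes the chunk only on a mismatch.
int run_parallel(const std::string& in, int init, int chunks) {
    std::size_t n = in.size(), step = (n + chunks - 1) / chunks;
    std::vector<int> out(chunks);
    std::vector<std::thread> ts;
    for (int c = 0; c < chunks; ++c)
        ts.emplace_back([&, c] {
            std::size_t lo = c * step, hi = std::min(n, lo + step);
            out[c] = run_chunk(init, in, lo, hi);  // speculative guess: init
        });
    for (auto& t : ts) t.join();
    int s = init;
    for (int c = 0; c < chunks; ++c) {             // the merge the paper parallelizes
        std::size_t lo = c * step, hi = std::min(n, lo + step);
        s = (s == init) ? out[c] : run_chunk(s, in, lo, hi);  // check + re-execute
    }
    return s;
}
```

The final loop is the sequential merge the abstract identifies as the scalability bottleneck; the paper's contribution is making that merge itself parallel while keeping checks cheap and re-executions rare.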
ISBN: (Print) 9781450319225
We examine how to synthesize a parallel schedule of structured traversals over trees. In our system, programs are declaratively specified as attribute grammars. Our synthesizer automatically, correctly, and quickly schedules the attribute grammar as a composition of parallel tree traversals. Our downstream compiler optimizes for GPUs and multicore CPUs. We provide support for designing efficient schedules. First, we introduce a declarative language of schedules where programmers may constrain any part of the schedule and the synthesizer will complete and autotune the rest. Furthermore, the synthesizer answers debugging queries about how schedules may be completed. We evaluate our approach with two case studies. First, we created the first parallel schedule for a large fragment of CSS and report a 3X multicore speedup. Second, we created an interactive GPU-accelerated animation of over 100,000 nodes.
ISBN: (Print) 9781450368186
In this paper, we present the first comprehensive performance characterization and optimization of ARM barriers on both mobile and server platforms. We draw a set of observations through several abstracted models and validate them in scenarios where barriers are intensively used. We find that (1) order-preserving approaches without involving the bus significantly outperform other approaches, and (2) the tremendous overhead mostly comes from barriers strictly following remote memory references. Usually, such barriers are inserted when threads are exchanging data, and they are used to ensure the relative order between storing the data to a shared buffer and setting a flag to inform the receiver. Based on these observations, we propose a new mechanism, Pilot, to remove such barriers by leveraging single-copy atomicity to piggyback the flag with the data. Applying Pilot only requires minor changes to applications and provides 10%-360% performance improvements in multiple benchmarks, which are close to the ideal performance without barriers.
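The core idea — making the data and the ready flag a single atomic word so there is nothing left for a barrier to order between them — can be illustrated in portable C++. This is a sketch of the principle only, not the paper's ARM implementation: Pilot exploits the hardware's single-copy atomicity directly, which this sketch models as one atomic store, and the names are invented.

```cpp
#include <atomic>
#include <cstdint>
#include <optional>

// One 64-bit slot holds a 32-bit payload plus a ready flag, so a single
// atomic store publishes both at once.
std::atomic<uint64_t> slot{0};
constexpr uint64_t kReady = uint64_t{1} << 32;

void send(uint32_t data) {
    // Replaces the barrier-based pattern (store data; dmb; store flag):
    // flag and data travel in the same store, so no fence is needed
    // between them.
    slot.store(kReady | data, std::memory_order_relaxed);
}

std::optional<uint32_t> try_recv() {
    uint64_t v = slot.load(std::memory_order_relaxed);
    if (!(v & kReady)) return std::nullopt;  // flag not set: nothing yet
    return static_cast<uint32_t>(v);         // seeing the flag implies the data
}
```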
ISBN: (Print) 9781450368186
The performance of many parallel applications has failed to scale as fast as successive generations of hardware on which these applications execute. To understand the cause of scalability losses, experts use performance tools to monitor and analyze application behavior. Profiles generated by performance tools can usually indicate the presence of scalability losses, while time series data are generally necessary to pinpoint the root causes of such losses. However, manual analysis of time series data can be difficult in executions with a large number of processes, long running times, and deep call chains. This paper describes an automated framework that analyzes sample-based time series data to diagnose scalability losses in parallel executions. The framework's automated diagnosis of scalability losses indicates their symptoms, severity, and causes. Two case studies illustrate the effectiveness of this framework. When compared to a tool that analyzes performance using instrumentation-based traces, our overhead for collecting sample-based time series is 1/28 in time and 1/1600 in space, while our automated analysis takes 1/25 of the time.
ISBN: (Print) 9781450301190
Modern parallel microprocessors deliver high performance on applications that expose substantial fine-grained data parallelism. Although data parallelism is widely available in many computations, implementing data parallel algorithms in low-level languages is often an unnecessarily difficult task. The characteristics of parallel microprocessors and the limitations of current programming methodologies motivate our design of Copperhead, a high-level data parallel language embedded in Python. The Copperhead programmer describes parallel computations via composition of familiar data parallel primitives supporting both flat and nested data parallel computation on arrays of data. Copperhead programs are expressed in a subset of the widely used Python programming language and interoperate with standard Python modules, including libraries for numeric computation, data visualization, and analysis. In this paper, we discuss the language, compiler, and runtime features that enable Copperhead to efficiently execute data parallel code. We define the restricted subset of Python which Copperhead supports and introduce the program analysis techniques necessary for compiling Copperhead code into efficient low-level implementations. We also outline the runtime support by which Copperhead programs interoperate with standard Python modules. We demonstrate the effectiveness of our techniques with several examples targeting the CUDA platform for parallel programming on GPUs. Copperhead code is concise, on average requiring 3.6 times fewer lines of code than CUDA, and the compiler generates efficient code, yielding 45-100% of the performance of hand-crafted, well-optimized CUDA code.
ISBN: (Print) 9798400704352
Processing large-scale graphs with billions to trillions of edges requires efficiently utilizing parallel systems. However, current graph processing engines do not scale well beyond a few tens of computing nodes because they are oblivious to the communication cost variations across the interconnection hierarchy. We introduce GraphCube, a better approach to optimizing graph processing on large-scale parallel systems with complex interconnections. GraphCube features a new graph partitioning approach to achieve better load balancing and minimize communication overhead across multiple levels of the interconnection hierarchy. We evaluate GraphCube by applying it to fundamental graph operations performed on synthetic and real-world graph datasets. Our evaluation used up to 79,024 computing nodes and more than 1.2 million processor cores. Our large-scale experiments show that GraphCube outperforms state-of-the-art parallel graph processing methods in throughput and scalability. Furthermore, GraphCube outperformed the top-ranked systems on the Graph 500 list.
ISBN: (Print) 9781450319225
The proceedings contain 45 papers. The topics discussed include: a peta-scalable CPU-GPU algorithm for global atmospheric simulations; adoption protocols for fanout-optimal fault-tolerant termination detection; betweenness centrality: algorithms and implementations; complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU; fast concurrent queues for x86 processors; FASTLANE: improving performance of software transactional memory for low thread counts; Ligra: a lightweight graph processing framework for shared memory; ownership passing: efficient distributed memory programming on multi-core systems; parallel suffix array and least common prefix for the GPU; StreamScan: fast scan algorithms for GPUs without global barrier synchronization; using hardware transactional memory to correct and simplify a readers-writer lock algorithm; and exploring different automata representations for efficient regular expression matching on GPUs.
ISBN: (Print) 9781605583976
High-productivity languages for parallel computing become more important as parallel environments including multicores become more common. Cilk is such a language. It provides good load balancing for many applications, including irregular ones; that is, it keeps all workers busy by creating plenty of "logical" threads and adopting the oldest-first work stealing strategy. This paper proposes a "logical thread"-free framework called Tascell, which achieves higher performance and supports a wider range of parallel environments, including clusters, without loss of productivity. A Tascell worker spawns a "real" task only when requested by another idle worker. The worker performs the spawning by temporarily "backtracking" and restoring its oldest task-spawnable state. Our approach eliminates the cost of spawning and managing logical threads. It also promotes the reuse of workspaces and improves locality of reference, since it does not need to prepare a workspace for each concurrently runnable logical thread. Furthermore, Tascell enables elegant and highly efficient backtrack search algorithms with delayed workspace copying. For instance, our 16-queens problem solver is 1.86 times faster than Cilk on a system with two dual-core processors. Our approach also enables a single program to run in both shared and distributed memory environments with reasonable efficiency and scalability.
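The request-driven spawning policy can be approximated in a short C++ sketch. Note the simplification: Tascell recovers the oldest task-spawnable state by temporarily backtracking, whereas this sketch keeps an explicit deque of pending choice points; the Choice type and all names are hypothetical.

```cpp
#include <atomic>
#include <deque>
#include <optional>

struct Choice { int row, col; };       // e.g., an untried N-queens branch

class Worker {
    std::deque<Choice> pending_;       // oldest choice points at the front;
                                       // only the owning worker touches it
public:
    std::atomic<bool> steal_requested{false};  // set by an idle peer

    // Recorded as the worker's own depth-first search descends.
    void remember(Choice c) { pending_.push_back(c); }

    // Local work proceeds newest-first (depth-first), as usual.
    std::optional<Choice> next_local() {
        if (pending_.empty()) return std::nullopt;
        Choice c = pending_.back();
        pending_.pop_back();
        return c;
    }

    // Polled occasionally inside the search loop: only when a peer has
    // actually asked does the worker materialize a "real" task, handing
    // over its *oldest* choice point (mirroring oldest-first stealing).
    std::optional<Choice> answer_request() {
        if (!steal_requested.exchange(false) || pending_.empty())
            return std::nullopt;       // no request, or nothing to give
        Choice c = pending_.front();
        pending_.pop_front();
        return c;
    }
};
```

Because tasks are created only on demand, the common case pays no logical-thread bookkeeping, which is the cost the abstract says Tascell eliminates.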
ISBN: (Print) 9781450349826
It is well known that modern functional programming languages are naturally amenable to parallel programming. Achieving efficient parallelism using functional languages, however, remains difficult. Perhaps the most important reason for this is their lack of support for efficient in-place updates, i.e., mutation, which is important for the implementation of both parallel algorithms and the run-time system services (e.g., schedulers and synchronization primitives) used to execute them. In this paper, we propose techniques for efficient mutation in parallel functional languages. To this end, we couple the memory manager with the thread scheduler to make reading and updating data allocated by nested threads efficient. We describe the key algorithms behind our technique, implement them in the MLton Standard ML compiler, and present an empirical evaluation. Our experiments show that the approach performs well, significantly improving efficiency over existing functional language implementations.