Search Results - Inner Mongolia University Library

Search query: "Any field = Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming"

364 records in total; showing records 91-100

On the fly MHP Analysis

25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Saha, Sonali; Nandivada, V. Krishna

Affiliation: IIT Madras, Madras, Tamil Nadu, India

ISBN: (print) 9781450368186

May-Happen-in-Parallel (MHP) analysis forms the basis for many problems of program analysis and program understanding. MHP analysis can also be used by IDEs (integrated development environments) to help programmers refactor parallel programs, identify racy programs, understand which parts of the program run in parallel, and so on. Since the code keeps changing in the IDE, recomputing the MHP information after every change can be expensive. In this manuscript, we propose a novel scheme to perform incremental MHP analysis (on the fly) of programs written in task-parallel languages like X10, to keep the MHP information up to date in an IDE environment. The key insight of our approach is that we need not rebuild from scratch every data structure related to MHP information after each modification (addition or deletion of statements) in the source code. The idea is to reuse the old MHP information as much as possible and incrementally recompute the MHP information (of a small set of statements) that depends on the statement added or removed. We introduce two new algorithms that deal with the addition and removal, on the fly, of parallel constructs like finish, async, and atomic, and of sequential constructs like loop, if, if-else, and other sequential statements. Our evaluation shows that our algorithms run much faster than repeated invocations of the fastest known MHP analysis for X10 programs [Sankar et al. 2016].

Keywords: Concurrent programs may happen in parallel analysis incremental analysis

GPU Initiated OpenSHMEM: Correct and Efficient Intra-Kernel Networking for dGPUs

25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Hamidouche, Khaled; LeBeane, Michael

Affiliation: Adv Micro Devices Inc, Santa Clara, CA 95054, USA

ISBN: (print) 9781450368186

The current state of the art in GPU networking uses a host-centric, kernel-boundary communication model that reduces performance and increases code complexity. To address these concerns, recent works have explored performing network operations from within a GPU kernel itself. However, these approaches typically involve the CPU in the critical path, which leads to high latency and inefficient utilization of network and/or GPU resources. In this work, we introduce GPU Initiated OpenSHMEM (GIO), a new intra-kernel PGAS programming model and runtime that enables GPUs to communicate directly with a NIC without the intervention of the CPU. We accomplish this by exploring the GPU's coarse-grained memory model and correcting semantic mismatches when GPUs wish to directly interact with the network. GIO also reduces latency by relying on a novel template-based design to minimize the overhead of initiating a network operation. We illustrate that for structured applications like a Jacobi 2D stencil, GIO can improve application performance by up to 40% compared to traditional kernel-boundary networking. Furthermore, we demonstrate that on irregular applications like Sparse Triangular Solve (SpTS), GIO provides up to 44% improvement compared to existing intra-kernel networking schemes.

Keywords: GPUs Distributed programming models RDMA networks

Practical Parallel Hypergraph Algorithms

25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Author: Shun, Julian

Affiliation: MIT CSAIL, Cambridge, MA 02139, USA

ISBN: (print) 9781450368186

While there has been significant work on parallel graph processing, there has been surprisingly little work on high-performance hypergraph processing. This paper presents a collection of efficient parallel algorithms for hypergraph processing, including algorithms for betweenness centrality, maximal independent set, k-core decomposition, hypertrees, hyperpaths, connected components, PageRank, and single-source shortest paths. For these problems, we either provide new parallel algorithms or more efficient implementations than prior work. Furthermore, our algorithms are theoretically efficient in terms of work and depth. To implement our algorithms, we extend the Ligra graph processing framework to support hypergraphs, and our implementations benefit from graph optimizations including switching between sparse and dense traversals based on the frontier size, edge-aware parallelization, using buckets to prioritize processing of vertices, and compression. Our experiments on a 72-core machine show that our algorithms obtain excellent parallel speedups, and are *** faster than algorithms in existing hypergraph processing frameworks.

Keywords: parallel algorithms

Scaling Out Speculative Execution of Finite-State Machines with Parallel Merge

25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Xia, Yang; Jiang, Peng; Agrawal, Gagan

Affiliations: Ohio State Univ, Columbus, OH 43210, USA; Univ Iowa, Iowa City, IA, USA

ISBN: (print) 9781450368186

A finite-state machine (FSM) is a key component of many important applications, such as Huffman decoding, regular expression matching, and HTML tokenization. Due to their inherent dependencies and unpredictable memory access patterns, FSM computations are considered to be extremely difficult to parallelize. As such, significant research efforts have been made to accelerate FSM computations. Although they achieve promising performance results on multi-core machines, these methods are not scalable for emerging many-core architectures such as GPUs. Based on our experiments, we point out that the bottleneck to achieving scalability on GPUs is the sequential merge inherent to these methods. However, unlike the case for simple reduction loops, parallel merge implementations for FSM computations typically require runtime checks and re-executions, which can also impede performance. Based on these observations, we develop parallel merge techniques that select efficient runtime check implementations and avoid unnecessary re-executions. Further, based on GPU architectural features, we develop optimization techniques to improve performance. We evaluate our parallel merge implementations on a set of representative algorithms. Experimental results show that our parallel merge implementations are 2.02-6.74 times more efficient than the corresponding sequential merge implementations and achieve better scalability on an Nvidia V100 GPU.

Keywords: Finite-State Machines Speculation GPUs
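
The chunk-then-merge structure that this abstract refers to can be illustrated with a small sketch. It is not the authors' implementation: it uses the simpler enumerative variant (every chunk is run from every possible start state, so no runtime checks or re-executions are needed), and the two-state machine, `step`, `chunk_map`, `merge`, and `run_chunked` are all hypothetical names chosen for illustration. The point it shows is that per-chunk state maps compose associatively, which is what allows the merge to be organized as a parallel reduction instead of a sequential pass.

```python
from functools import reduce

# Hypothetical toy machine: state 0 = even number of 'a' seen, state 1 = odd.
NUM_STATES = 2


def step(state, ch):
    """Transition function of the toy machine."""
    return state ^ 1 if ch == "a" else state


def run_sequential(text, start=0):
    """Reference execution: one pass over the whole input."""
    state = start
    for ch in text:
        state = step(state, ch)
    return state


def chunk_map(chunk):
    """Run the chunk from every possible start state (enumerative speculation)."""
    return tuple(run_sequential(chunk, s) for s in range(NUM_STATES))


def merge(left, right):
    """Compose two state maps: apply `left` first, then `right`."""
    return tuple(right[s] for s in left)


def run_chunked(text, num_chunks, start=0):
    size = max(1, -(-len(text) // num_chunks))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    maps = [chunk_map(c) for c in chunks]  # chunks are independent of each other
    identity = tuple(range(NUM_STATES))
    # `merge` is associative, so this fold could be replaced by a tree reduction.
    return reduce(merge, maps, identity)[start]
```

For any input and any chunk count, `run_chunked` is expected to return the same final state as `run_sequential`; here the fold is written sequentially only to keep the sketch short.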

Using Sample-Based Time Series Data for Automated Diagnosis of Scalability Losses in Parallel Programs

25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Wei, Lai; Mellor-Crummey, John

Affiliations: Rice Univ, Dept Comp Sci, Houston, TX 77005, USA; Rice Univ, Houston, TX, USA; Pony Ai, Fremont, CA, USA

ISBN: (print) 9781450368186

The performance of many parallel applications has failed to scale as fast as successive generations of the hardware on which these applications execute. To understand the cause of scalability losses, experts use performance tools to monitor and analyze application behavior. Profiles generated by performance tools can usually indicate the presence of scalability losses, while time series data are generally necessary to pinpoint the root causes of such losses. However, manual analysis of time series data can be difficult in executions with a large number of processes, long running times, and deep call chains. This paper describes an automated framework that analyzes sample-based time series data to diagnose scalability losses in parallel executions. The framework's automated diagnosis of scalability losses indicates their symptoms, severity, and causes. Two case studies illustrate the effectiveness of this framework. When compared to a tool that analyzes performance using instrumentation-based traces, our overhead for collecting sample-based time series is 1/28 in time and 1/1600 in space, while our automated analysis takes 1/25 of the time.

Keywords: Performance automated diagnosis scalability losses sample-based time series data

No Barrier in the Road: A Comprehensive Study and Optimization of ARM Barriers

25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Liu, Nian; Zang, Binyu; Chen, Haibo

Affiliations: Shanghai Jiao Tong Univ, Inst Parallel & Distributed Syst, Shanghai, Peoples R China; Shanghai Jiao Tong Univ, Shanghai Key Lab Scalable Comp & Syst, Shanghai, Peoples R China

ISBN: (print) 9781450368186

In this paper, we present the first comprehensive performance characterization and optimization of ARM barriers on both mobile and server platforms. We draw a set of observations through several abstracted models and validate them in scenarios where barriers are intensively used. We find that (1) order-preserving approaches without involving the bus significantly outperform other approaches, and (2) the tremendous overhead mostly comes from barriers strictly following remote memory references. Usually, such barriers are inserted when threads are exchanging data, and they are used to ensure the relative order between storing the data to a shared buffer and setting a flag to inform the receiver. Based on the observations, we propose a new mechanism, Pilot, to remove such barriers by leveraging the single-copy atomicity to piggyback the flag with the data. Applying Pilot only requires minor changes to applications and provides 10%-360% performance improvements in multiple benchmarks, which are close to the ideal performance without barriers.

Keywords: barrier synchronization concurrency lock

Identifying Scalability Bottlenecks for Large-Scale Parallel Programs with Graph Analysis

25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Jin, Yuyang; Wang, Haojie; Tang, Xiongchao; Hoefler, Torsten; Liu, Xu; Zhai, Jidong

Affiliations: Tsinghua Univ, Beijing, Peoples R China; Swiss Fed Inst Technol, Zurich, Switzerland; Coll William & Mary, Williamsburg, VA 23187, USA

ISBN: (print) 9781450368186

Scaling a parallel program to modern supercomputers is challenging due to inter-process communication, code serialization, and resource contention. Performance analysis tools for finding such scaling bottlenecks are based on either profiling or tracing. Profiling incurs lower overheads but does not capture the detailed dependencies needed for root-cause analyses. Tracing collects all information at prohibitive overheads. In this work, we develop ScalAna, which uses static analysis techniques to achieve the best of both worlds: it enables the analyzability of traces at a cost similar to profiling. We leverage lightweight compiler and runtime techniques to generate a performance graph and apply a graph analysis algorithm to detect the root cause of scaling issues. We evaluate ScalAna with real applications on the Tianhe-2 supercomputer. Results show that our approach can effectively locate the root cause of scalability bottlenecks for real applications and incurs less than 6.38% overhead (1.89% on average) for up to 2,048 processes.

Keywords: Profiling Tools Scaling Loss Analysis Performance Optimization

Provably and Practically Efficient Granularity Control

24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Acar, Umut A.; Aksenov, Vitaly; Chargueraud, Arthur; Rainey, Mike

Affiliations: Carnegie Mellon Univ, Pittsburgh, PA 15213, USA; INRIA, Paris, France; ITMO Univ, St Petersburg, Russia; Univ Strasbourg, ICube, CNRS, Strasbourg, France; Indiana Univ, Bloomington, IN 47405, USA

ISBN: (print) 9781450362252

Over the past decade, many programming languages and systems for parallel computing have been developed, e.g., Fork/Join and Habanero Java, Parallel Haskell, Parallel ML, and X10. Although these systems raise the level of abstraction for writing parallel codes, performance continues to require labor-intensive optimizations for coarsening the granularity of parallel executions. In this paper, we present provably and practically efficient techniques for controlling granularity within the run-time system of the language. Our starting point is "oracle-guided scheduling", a result from the functional-programming community that shows that granularity can be controlled by an "oracle" that can predict the execution time of parallel codes. We give an algorithm for implementing such an oracle and prove that it has the desired theoretical properties under the nested-parallel programming model. We implement the oracle in C++ by extending Cilk and evaluate its practical performance. The results show that our techniques can essentially eliminate hand tuning while closely matching the performance of hand-tuned codes.

Keywords: parallel programming languages granularity control
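
The idea of an oracle deciding when to stop creating parallel tasks can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm from the paper: the `Oracle` class, its averaging rule, the cost function (number of elements), and `controlled_sum` are all hypothetical, and the two recursive calls run one after the other here where a real runtime would run them as parallel tasks.

```python
import time


class Oracle:
    """Hypothetical oracle: predicts run time as (seconds per unit of cost) * cost."""

    def __init__(self, threshold_seconds):
        self.threshold = threshold_seconds
        self.rate = None  # learned seconds per unit of cost; unknown at first

    def predicts_small(self, cost):
        # With no measurement yet, keep splitting until a leaf gets measured.
        return self.rate is not None and self.rate * cost <= self.threshold

    def record(self, cost, elapsed):
        # Blend each new measurement into the running estimate.
        if cost > 0:
            sample = elapsed / cost
            self.rate = sample if self.rate is None else 0.5 * (self.rate + sample)


def controlled_sum(data, lo, hi, oracle, stats):
    cost = hi - lo  # assumed cost function: number of elements in the range
    if cost <= 1 or oracle.predicts_small(cost):
        # Predicted to be cheap: run sequentially and report the measured time.
        start = time.perf_counter()
        total = sum(data[lo:hi])
        oracle.record(cost, time.perf_counter() - start)
        stats["sequential_runs"] += 1
        return total
    mid = (lo + hi) // 2
    stats["forks"] += 1
    # A real runtime would execute these two calls as parallel tasks.
    return (controlled_sum(data, lo, mid, oracle, stats)
            + controlled_sum(data, mid, hi, oracle, stats))
```

Calling `controlled_sum(data, 0, len(data), Oracle(1e-4), {"forks": 0, "sequential_runs": 0})` returns the same value as `sum(data)`; how many forks occur depends on the measured timings and on the chosen threshold, so it varies between machines and runs.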

Scalable Top-K Retrieval with Sparta

25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Sheffi, Gali; Basin, Dmitry; Bortnikov, Edward; Carmel, David; Keidar, Idit

Affiliations: Technion, Haifa, Israel; Yahoo Res, Haifa, Israel; Amazon, Haifa, Israel

ISBN: (print) 9781450368186

Many big data processing applications rely on a top-k retrieval building block, which selects (or approximates) the k highest-scoring data items based on an aggregation of features. In web search, for instance, a document's score is the sum of its scores for all query terms. Top-k retrieval is often used to sift through massive data and identify a smaller subset of it for further analysis. Because it filters out the bulk of the data, it often constitutes the main performance bottleneck. Beyond the rise in data sizes, today's data processing scenarios also increase the number of features contributing to the overall score. In web search, for example, verbose queries are becoming mainstream, while state-of-the-art algorithms fail to process long queries in real time. We present Sparta, a practical parallel algorithm that exploits multi-core hardware for fast (approximate) top-k retrieval. Thanks to lightweight coordination and judicious context sharing among threads, Sparta scales both in the number of features and in the searched index size. In our web search case study on 50M documents, Sparta processes 12-term queries more than twice as fast as the state of the art. On a tenfold bigger index, Sparta processes queries at the same speed, whereas the average latency of existing algorithms soars to be an order of magnitude larger than Sparta's.

Keywords: parallel computing multi-threading performance information retrieval web search top-k search
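
The scoring rule described in this abstract (a document's score is the sum of its per-term scores, and the k highest-scoring documents are returned) can be sketched as a plain sequential baseline. This is not Sparta: it is an exhaustive, single-threaded version, and the `index` layout (term mapped to per-document scores), the document ids, and the `top_k` function are assumptions made for illustration.

```python
import heapq
from collections import defaultdict


def top_k(index, query_terms, k):
    """Exact top-k: sum per-term scores for each document, keep the k best."""
    totals = defaultdict(float)
    for term in query_terms:
        # Terms missing from the index contribute nothing.
        for doc_id, score in index.get(term, {}).items():
            totals[doc_id] += score
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])


# Hypothetical inverted index: term -> {document id: score for that term}.
index = {
    "parallel": {"d1": 1.0, "d2": 0.5},
    "top-k": {"d2": 2.0, "d3": 0.25},
}

print(top_k(index, ["parallel", "top-k"], 2))  # [('d2', 2.5), ('d1', 1.0)]
```

Every posting of every query term is visited, so the work grows with both the query length and the index size, which are the two dimensions the abstract identifies as the scaling problem.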

Incremental Flattening for Nested Data Parallelism

24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Henriksen, Troels; Thoroe, Frederik; Elsman, Martin; Oancea, Cosmin

Affiliation: Univ Copenhagen, Copenhagen, Denmark

ISBN: (print) 9781450362252

Compilation techniques for nested-parallel applications that can adapt to hardware and dataset characteristics are vital for unlocking the power of modern hardware. This paper proposes such a technique, which builds on flattening and is applied in the context of a functional data-parallel language. Our solution uses the degree of utilized parallelism as the driver for generating a multitude of code versions, which together cover all possible mappings of the application's regular nested parallelism to the levels of parallelism supported by the hardware. These code versions are then combined into one program by guarding them with predicates, whose threshold values are automatically tuned to hardware and dataset characteristics. Our unsupervised method of statically clustering datasets to code versions differs from autotuning work, which typically searches for the combination of code transformations producing a single version, best suited for a specific dataset or on average for all datasets. By fully integrating our technique in the repertoire of a compiler for the Futhark programming language, we demonstrate significant performance gains on two GPUs for three real-world applications from the financial domain and for six Rodinia benchmarks.

Keywords: functional language parallel compilers GPGPU
