Search Results - Inner Mongolia University Library

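Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = discipline classification number, U = all fields, Y = year (publication year, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with a space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016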
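
As a rough illustration of the syntax described above, the sketch below assembles the first example from field codes and Boolean operators. The helper names `term` and `combine` are hypothetical and are not part of the library's system; the code only builds a query string and does not contact the catalog.

```python
# Hypothetical helpers: they only build a query string in the syntax
# described above; nothing here talks to the library catalog.

FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Return a single FIELD=value term, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field}")
    return f"{field}={value}"


def combine(operator: str, *parts: str, group: bool = False) -> str:
    """Join parts with an uppercase operator padded by single spaces."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    joined = f" {operator} ".join(parts)
    return f"({joined})" if group else joined


# Rebuild the first example from the help text.
subjects = combine("OR", term("K", "图书馆学"), term("K", "情报学"), group=True)
query = combine("AND", subjects, term("A", "范并思"), term("Y", "1982-2016"))
print(query)
```

Rejecting lowercase operators in `combine` mirrors the rule that operators must be uppercase; whether the real search box reports such input as an error or silently treats it as a keyword is not stated in the help text.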

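Refine search results

Document type

36 conference papers
3 journal articles

Collection scope

39 electronic documents
0 print holdings

Discipline classification

37 papers: Engineering
- 36 papers: Computer Science and Technology...
- 18 papers: Software Engineering
- 1 paper: Mechanical Engineering
- 1 paper: Electronic Science and Technology (may...
10 papers: Science
- 10 papers: Mathematics
- 1 paper: Chemistry
- 1 paper: Systems Science
- 1 paper: Statistics (may confer Science, ...
6 papers: Management
- 4 papers: Management Science and Engineering (may...
- 3 papers: Business Administration
- 2 papers: Library, Information and Archives Manage...
1 paper: Economics
- 1 paper: Applied Economics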

Subject

5 papers: parallel program...
3 papers: parallel algorit...
3 papers: performance
2 papers: graphic methods
2 papers: concurrency
1 paper: lock
1 paper: compaction
1 paper: parallel process...
1 paper: parallel archite...
1 paper: scalability loss...
1 paper: call-by-need eva...
1 paper: bounded model ch...
1 paper: sequentializatio...
1 paper: performance opti...
1 paper: finite-state mac...
1 paper: gpu programming
1 paper: massively parall...
1 paper: big data
1 paper: supercomputers
1 paper: anomaly detectio...

Institution

3 papers: tsinghua univ pe...
2 papers: shanghai key lab...
2 papers: shanghai jiao to...
2 papers: washington univ ...
2 papers: shanghai jiao to...
2 papers: purdue univ w la...
1 paper: arm res 5707 sou...
1 paper: nvidia corporati...
1 paper: bnrist peoples r...
1 paper: coll william & m...
1 paper: sandia national ...
1 paper: tsinghua univ de...
1 paper: software school ...
1 paper: baidu inc. sunny...
1 paper: computer science...
1 paper: technion haifa
1 paper: department of co...
1 paper: ohio state univ ...
1 paper: llnl united stat...
1 paper: iit madras madra...

Author

4 papers: chen haibo
4 papers: wang haojie
3 papers: zhai jidong
3 papers: jin yuyang
3 papers: mellor-crummey j...
2 papers: li peng
2 papers: chen wenguang
2 papers: zang binyu
2 papers: chen rong
2 papers: iancu costin
2 papers: beard jonathan c...
2 papers: tang xiongchao
1 paper: ballard grey
1 paper: guan haibing
1 paper: meng xiaozhu
1 paper: jin ye
1 paper: milaković sran
1 paper: maydan dror e.
1 paper: klasky scott
1 paper: jiang peng

Language

38 papers: English
1 paper: Other

Search query: "Any field = 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015"

39 records in total; showing 1-10

PERFLOW: A Domain Specific Framework for Automatic Performance Analysis of Parallel Applications (2022)

27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Jin, Yuyang; Wang, Haojie; Zhong, Runxin; Zhang, Chen; Zhai, Jidong
Affiliations: Tsinghua Univ Beijing Peoples R China

ISBN: (print) 9781450392044

Performance analysis is widely used to identify performance issues of parallel applications. However, complex communications and data dependence, as well as the interactions between different kinds of performance issues, make high-efficiency performance analysis even harder. Although a large number of performance tools have been designed, accurately pinpointing root causes for such complex performance issues still needs specific in-depth analysis. To implement each such analysis, significant human effort and domain knowledge are normally required. To reduce the burden of implementing accurate performance analysis, we propose a domain specific programming framework, named PERFLOW. PERFLOW abstracts the step-by-step process of performance analysis as a dataflow graph. This dataflow graph consists of main performance analysis sub-tasks, called passes, which can either be provided by PERFLOW's built-in analysis library, or be implemented by developers to meet their requirements. Moreover, to achieve effective analysis, we propose a Program Abstraction Graph to represent the performance of a program execution and then leverage various graph algorithms to automate the analysis. We demonstrate the efficacy of PERFLOW by three case studies of real-world applications with up to 700K lines of code. Results show that PERFLOW significantly eases the implementation of customized analysis tasks. In addition, PERFLOW is able to perform analysis and locate performance bugs automatically and effectively.

Keywords: Performance Analysis; Domain Specific Framework; Dataflow Graph

Scaling Graph Traversal to 281 Trillion Edges with 40 Million Cores (2022)

27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Cao, Huanqi; Wang, Yuanwei; Wang, Haojie; Lin, Heng; Ma, Zixuan; Yin, Wanwang; Chen, Wenguang
Affiliations: Tsinghua Univ Dept Comp Sci & Technol Beijing Peoples R China; Tsinghua Univ BNRist Beijing Peoples R China; Peking Univ Sch Comp Sci Beijing Peoples R China; Natl Supercomp Ctr Wuxi Wuxi Jiangsu Peoples R China

ISBN: (print) 9781450392044

Graph processing, especially high-performance graph traversal, plays an increasingly important role in data analytics. The successor of Sunway TaihuLight, NEW SUNWAY, is equipped with nearly 10 PB memory and over 40 million cores, which brings the opportunity to process graphs with hundreds of trillions of edges. However, a graph of such unprecedented scale also brings severe performance challenges, including load imbalance, poor locality, and irregular access of graph traversal workload. To address the scalability problem, we propose a novel 3-level degree-aware 1.5D graph partitioning, which benefits from both delegated 1D and 2D partitioning. By delegating extremely heavy vertices globally and other heavy vertices on columns and rows in the processes mesh, we break the scalability wall of previous partitioning methods. Together with sub-iteration direction optimization, core group-aware core subgraph segmenting, and a new on-chip sorting mechanism using RMA, we achieve 180,792 GTEPS on a graph with 281 trillion edges, using 103,912 processors with over 40 million cores, achieving 1.75x performance and 8x capacity compared to the previous state of the art and conforming to the Graph 500 BFS benchmark [14].

Keywords: massively parallel algorithm; breadth-first search; heterogeneous architecture

VAPRO: Performance Variance Detection and Diagnosis for Production-Run Parallel Applications (2022)

27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Zheng, Liyan; Zhai, Jidong; Tang, Xiongchao; Wang, Haojie; Yu, Teng; Jin, Yuyang; Song, Shuaiwen Leon; Chen, Wenguang
Affiliations: Tsinghua Univ Beijing Peoples R China; Sangfor Technol Inc Shenzhen Guangdong Peoples R China; Univ Sydney Sydney NSW Australia; BNRist Beijing Peoples R China

ISBN: (print) 9781450392044

Performance variance is a serious problem for parallel applications, which can cause performance degradation and make applications' behavior hard to understand. Therefore, detecting and diagnosing performance variance are of crucial importance for users and application developers. However, previous detection approaches either bring too large overhead and hurt applications' performance, or rely on nontrivial source code analysis that is impractical for production-run parallel applications. In this work, we propose VAPRO, a performance variance detection and diagnosis framework for production-run parallel applications. Our approach is based on an important observation that most parallel applications contain code snippets that are repeatedly executed with fixed workload, which can be used for performance variance detection. To effectively identify these snippets at runtime even without program source code, we introduce State Transition Graph (STG) to track program execution and then conduct lightweight workload analysis on STG to locate variance. To diagnose the detected variance, VAPRO leverages a progressive diagnosis method based on a hybrid model leveraging variance breakdown and statistical analysis. Results show that the performance overhead of VAPRO is only 1.38% on average. VAPRO can detect the variance in real applications caused by hardware bugs, memory, and IO. After fixing the detected variance, the standard deviation of the execution time is reduced by up to 73.5%. Compared with the state-of-the-art variance detection tool based on source code analysis, VAPRO achieves 30.0% higher detection coverage.

Keywords: Performance Variance; Anomaly Detection; System Noise

High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs (2023)

28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2023

Authors: Moses, William S.; Ivanov, Ivan R.; Domke, Jens; Endo, Toshio; Doerfert, Johannes; Zinenko, Oleksandr
Affiliations: MIT CSAIL United States; Tokyo Tech Japan; RIKEN Japan; LLNL United States; Google France

ISBN: (print) 9798400700156

While parallelism remains the main source of performance, architectural implementations and programming models change with each new hardware generation, often leading to costly application re-engineering. Most tools for performance portability require manual and costly application porting to yet another programming model. We propose an alternative approach that automatically translates programs written in one programming model (CUDA), into another (CPU threads) based on Polygeist/MLIR. Our approach includes a representation of parallel constructs that allows conventional compiler transformations to apply transparently and without modification and enables parallelism-specific optimizations. We evaluate our framework by transpiling and optimizing the CUDA Rodinia benchmark suite for a multi-core CPU and achieve a 58% geomean speedup over handwritten OpenMP code. Further, we show how CUDA kernels from PyTorch can efficiently run and scale on the CPU-only Supercomputer Fugaku without user intervention. Our PyTorch compatibility layer making use of transpiled CUDA PyTorch kernels outperforms the PyTorch CPU native backend by 2.7×. © 2023 Owner/Author.

Keywords: Supercomputers

Advanced synchronization techniques for task-based runtime systems (2021)

26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2021

Authors: Álvarez, David; Sala, Kevin; Maroñas, Marcos; Roca, Aleix; Beltran, Vincenç
Affiliations: Barcelona Supercomputing Center Barcelona Spain

ISBN: (print) 9781450382946

Task-based programming models like OmpSs-2 and OpenMP provide a flexible data-flow execution model to exploit dynamic, irregular and nested parallelism. Providing an efficient implementation that scales well with small granularity tasks remains a challenge, and bottlenecks can manifest in several runtime components. In this paper, we analyze the limiting factors in the scalability of a task-based runtime system and propose individual solutions for each of the challenges, including a wait-free dependency system and a novel scalable scheduler design based on delegation. We evaluate how the optimizations impact the overall performance of the runtime, both individually and in combination. We also compare the resulting runtime against state-of-the-art OpenMP implementations, showing equivalent or better performance, especially for fine-grained tasks. © 2021 ACM.

Keywords: parallel programming

Parallel binary code analysis (2021)

26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2021

Authors: Meng, Xiaozhu; Anderson, Jonathon M.; Mellor-Crummey, John; Krentel, Mark W.; Miller, Barton P.; Milaković, Srđan
Affiliations: Department of Computer Science Rice University Houston TX United States; Computer Sciences Department University of Wisconsin-Madison Madison WI United States

ISBN: (print) 9781450382946

Binary code analysis is widely used to help assess a program's correctness, performance, and provenance. Binary analysis applications often construct control flow graphs, analyze data flow, and use debugging information to understand how machine code relates to source lines, inlined functions, and data types. To date, binary analysis has been single-threaded, which is too slow for convenient use in performance tuning workflows where it is used to help attribute performance to complex applications with large binaries. This paper describes our design and implementation for accelerating the task of constructing control flow graphs (CFGs) from binaries by using multithreading. Prior research focuses on algorithms for analysis of challenging code constructs encountered while constructing CFGs, including functions sharing code, jump tables, non-returning functions, and tail calls. These algorithms are described from a program analysis perspective and are not suitable for direct parallel implementation. We abstract the task of constructing CFGs as repeated applications of several core CFG operations that include creating functions, basic blocks, and edges. We then derive CFG operation dependency, commutativity, and monotonicity. These operation properties guide our design of a new parallel analysis for constructing CFGs. Using 64 threads, we achieved as much as 25× speedup for constructing CFGs and 8× for a performance analysis tool that leverages our new analysis to recover program structure. © 2021 ACM.

Keywords: Graphic methods

Parallel and Distributed Bounded Model Checking of Multi-threaded Programs (2020)

25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Inverso, Omar; Trubiani, Catia
Affiliations: Gran Sasso Sci Inst Laquila Italy

ISBN: (print) 9781450368186

We introduce a structure-aware parallel technique for context-bounded analysis of concurrent programs. The key intuition consists in decomposing the set of concurrent traces into symbolic subsets that are separately explored by multiple instances of the same decision procedure running in parallel. The decision procedures work on different partitions of the search space without cooperating, whence distribution follows effortlessly. Our experiments on a selection of complex multi-threaded programs show significant analysis speedups and scalability, and greater performance gains than with general-purpose parallel solvers.

Keywords: Concurrency; Multithreading; Sequentialization; Software Verification; Parallel Analysis; Bounded Model Checking; SAT

PLUM: Static Parallel Program Locality Analysis under Uniform Multiplexing (2020)

25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Liu, Fangzhou; Chen, Dong; Smith, Wesley; Ding, Chen
Affiliations: Univ Rochester Rochester NY 14627 USA; Natl Univ Def Technol Changsha Peoples R China

ISBN: (print) 9781450368186

Data movement has a significant impact on program performance. For multithread programs, this impact is amplified, since different threads often interfere with each other by competing for shared cache space. However, recent de facto locality metrics consider either sequential execution only, or derive locality for multithread programs in an inefficient way, i.e. exhaustive simulation. This paper presents PLUM, a compiler solution for timescale locality analysis for parallel programs. Experiments demonstrate that the prediction accuracy is 93.97% on average. PLUM is the first tool that analyzes data locality for parallel programs during compile time; in addition, it provides an approach for efficiently studying the representative interleaving pattern for parallel executions.

Keywords: Static analysis; Locality; Multithread

Parallel Determinacy Race Detection for Futures (2020)

25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Xu, Yifan; Singer, Kyle; Lee, I-Ting Angelina
Affiliations: Washington Univ St Louis St Louis MO 63130 USA

ISBN: (print) 9781450368186

The use of futures can generate arbitrary dependences in the computation, making it difficult to detect races efficiently. Algorithms proposed by prior work to detect races on programs with futures all have to execute the program sequentially. We propose F-Order, the first known parallel race detection algorithm that detects races on programs that use futures. Given a computation with work $T_1$ and span $T_\infty$, our algorithm detects races in time $O((T_1 \lg \hat{k} + k^2)/P + T_\infty (k + \lg r \lg \hat{k}))$ on $P$ processors, where $k$ is the number of future operations, $r$ is the maximum number of readers per memory location, and $\hat{k}$ is the maximum number of future operations done by a single future task, which is typically small. We have also implemented a prototype system based on the proposed algorithm and empirically demonstrate its practical efficiency and scalability.

A Parallel Sparse Tensor Benchmark Suite on CPUs and GPUs (2020)

25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Li, Jiajia; Lakshminarasimhan, Mahesh; Wu, Xiaolong; Li, Ang; Olschanowsky, Catherine; Barker, Kevin
Affiliations: Pacific Northwest Natl Lab Richland WA 99352 USA; Univ Utah Salt Lake City UT USA; Purdue Univ W Lafayette IN 47907 USA; Boise State Univ Boise ID 83725 USA

ISBN: (print) 9781450368186

Tensor computations present significant performance challenges that impact a wide spectrum of applications. Efforts on improving the performance of tensor computations include exploring data layout, execution scheduling, and parallelism in common tensor kernels. This work presents a benchmark suite for arbitrary-order sparse tensor kernels using state-of-the-art tensor formats: coordinate (COO) and hierarchical coordinate (HiCOO). It demonstrates a set of reference tensor kernel implementations and some observations on Intel CPUs and NVIDIA GPUs. The full paper is available at http://***/abs/2001.00660.

Keywords: sparse tensors; benchmark; GPU; roofline model

Page 1 of 4

No. 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
