Search Results - Inner Mongolia University Library

Document type

Conference papers (344)
Journal articles (19)
Books (1)

Holdings

Electronic documents (364)
Print holdings (0)

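Subject classification

Engineering (305)
- Software Engineering (261)
- Computer Science and Technology... (250)
- Electronic Science and Technology (may... (13)
- Information and Communication Engineering (9)
- Control Science and Engineering (5)
- Mechanical Engineering (4)
- Bioengineering (4)
- Biomedical Engineering (may confer... (3)
- Mechanics (may confer Engineering, Science... (1)
- Power Engineering and Engineering Thermo... (1)
- Electrical Engineering (1)
- Nuclear Science and Technology (1)
- Agricultural Engineering (1)
- Environmental Science and Engineering (may... (1)
- Cyberspace Security (1)
Science (57)
- Mathematics (53)
- Biology (4)
- Systems Science (4)
- Statistics (may confer Science, ... (4)
- Chemistry (2)
Management (18)
- Management Science and Engineering (may... (12)
- Business Administration (11)
- Library, Information and Archives Manage... (5)
Economics (5)
- Applied Economics (5)
Law (3)
- Sociology (3)
Education (3)
- Education (3)
Agronomy (1)
- Crop Science (1)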

Topics

performance (54)
parallel process... (50)
parallel program... (34)
algorithms (33)
languages (27)
design (25)
parallel algorit... (20)
gpu (20)
experimentation (9)
measurement (9)
parallel (8)
scalability (7)
graphics process... (7)
theory (7)
parallel computi... (7)
parallelism (6)
mpi (6)
concurrency (6)
graph algorithms (5)
logic programmin... (5)

Institutions

carnegie mellon ... (7)
indiana univ blo... (4)
univ of tokyo (3)
tsinghua univ de... (3)
univ chinese aca... (3)
massachusetts in... (3)
univ illinois ur... (3)
swiss fed inst t... (3)
mit csail united... (3)
shanghai jiao to... (3)
tsinghua univ pe... (3)
univ calif berke... (3)
ist austria klos... (2)
georgetown univ ... (2)
univ wisconsin d... (2)
yale university ... (2)
shanghai key lab... (2)
univ of wisconsi... (2)
tsinghua univers... (2)
shanghai jiao to... (2)

Authors

blelloch guy e. (8)
hoefler torsten (6)
garland michael (6)
zhai jidong (6)
chen haibo (6)
shun julian (6)
sun yihan (5)
dhulipala laxman (4)
chen wenguang (4)
tsigas philippas (4)
tan guangming (4)
wang haojie (4)
mellor-crummey j... (4)
gu yan (4)
kennedy ken (4)
taura kenjiro (3)
li jiajia (3)
yonezawa akinori (3)
pingali keshav (3)
kim jungwon (3)

Language

English (361)
Other (3)

Search query: "Any field = Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming"

364 records in total; showing records 51-60

VAPRO: Performance Variance Detection and Diagnosis for Production-Run Parallel Applications

27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '22)

Authors: Zheng, Liyan; Zhai, Jidong; Tang, Xiongchao; Wang, Haojie; Yu, Teng; Jin, Yuyang; Song, Shuaiwen Leon; Chen, Wenguang

Affiliations: Tsinghua Univ, Beijing, Peoples R China; Sangfor Technol Inc, Shenzhen, Guangdong, Peoples R China; Univ Sydney, Sydney, NSW, Australia; BNRist, Beijing, Peoples R China

ISBN: 9781450392044 (print)

Performance variance is a serious problem for parallel applications, which can cause performance degradation and make applications' behavior hard to understand. Therefore, detecting and diagnosing performance variance are of crucial importance for users and application developers. However, previous detection approaches either bring too large overhead and hurt applications' performance, or rely on nontrivial source code analysis that is impractical for production-run parallel applications. In this work, we propose VAPRO, a performance variance detection and diagnosis framework for production-run parallel applications. Our approach is based on an important observation that most parallel applications contain code snippets that are repeatedly executed with fixed workload, which can be used for performance variance detection. To effectively identify these snippets at runtime even without program source code, we introduce State Transition Graph (STG) to track program execution and then conduct lightweight workload analysis on STG to locate variance. To diagnose the detected variance, VAPRO leverages a progressive diagnosis method based on a hybrid model leveraging variance breakdown and statistical analysis. Results show that the performance overhead of VAPRO is only 1.38% on average. VAPRO can detect the variance in real applications caused by hardware bugs, memory, and IO. After fixing the detected variance, the standard deviation of the execution time is reduced by up to 73.5%. Compared with the state-of-the-art variance detection tool based on source code analysis, VAPRO achieves 30.0% higher detection coverage.

Keywords: Performance Variance; Anomaly Detection; System Noise

Scaling Graph Traversal to 281 Trillion Edges with 40 Million Cores

27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '22)

Authors: Cao, Huanqi; Wang, Yuanwei; Wang, Haojie; Lin, Heng; Ma, Zixuan; Yin, Wanwang; Chen, Wenguang

Affiliations: Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China; Tsinghua Univ, BNRist, Beijing, Peoples R China; Peking Univ, Sch Comp Sci, Beijing, Peoples R China; Natl Supercomp Ctr Wuxi, Wuxi, Jiangsu, Peoples R China

ISBN: 9781450392044 (print)

Graph processing, especially high-performance graph traversal, plays a more and more important role in data analytics. The successor of Sunway TaihuLight, NEW SUNWAY, is equipped with nearly 10 PB memory and over 40 million cores, which brings the opportunity to process graphs with hundreds of trillions of edges. However, the graph with an unprecedented scale also brings severe performance challenges, including load imbalance, poor locality, and irregular access of graph traversal workload. To address the scalability problem, we propose a novel 3-level degree-aware 1.5D graph partitioning, which benefits from both delegated 1D and 2D partitioning. By delegating extremely heavy vertices globally and other heavy vertices on columns and rows in the processes mesh, we break the scalability wall of previous partitioning methods. Together with sub-iteration direction optimization, core group-aware core subgraph segmenting, and a new on-chip sorting mechanism using RMA, we achieve 180,792 GTEPS on a graph with 281 trillion edges, using 103,912 processors with over 40 million cores, achieving 1.75x performance and 8x capacity compared to the previous state of the art and conforming to the Graph 500 BFS benchmark [14].

Keywords: massively parallel algorithm; breadth-first search; heterogeneous architecture

The Performance Power of Software Combining in Persistence

27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '22)

Authors: Fatourou, Panagiota; Kallimanis, Nikolaos D.; Kosmas, Eleftherios

Affiliations: Univ Paris, LIPADE, F-75006 Paris, France; Fdn Res Technol Hellas FORTH, Inst Comp Sci, Ilellas, Greece; Univ Crete, Dept Comp Sci, Iraklion, Greece

ISBN: 9781450392044 (print)

The availability of Non-Volatile Main Memory (known as NVMM) enables the design of recoverable concurrent algorithms. We study the power of software combining in achieving recoverable synchronization and designing persistent data structures. Software combining is a general synchronization approach, which attempts to simulate the ideal world when executing synchronization requests (i.e., requests that must be executed in mutual exclusion). A single thread, called the combiner, executes all active requests, while the rest of the threads are waiting for the combiner to notify them that their requests have been applied. Software combining significantly decreases the synchronization cost and outperforms many other synchronization techniques in various cases. We identify three persistence principles, crucial for performance, that an algorithm's designer has to take into consideration when designing highly-efficient recoverable synchronization protocols or data structures. We illustrate how to make the appropriate design decisions in all stages of devising recoverable combining protocols to respect these principles. Specifically, we present two recoverable software combining protocols, satisfying different progress properties, that are many times faster and have much lower persistence cost than a large collection of existing persistent techniques for achieving scalable synchronization. We build fundamental recoverable data structures, such as stacks and queues, based on these protocols that outperform by far existing recoverable implementations of such data structures. We also provide the first recoverable implementation of a concurrent heap and present experiments to show that it has good performance when the size of the heap is not very large.

Keywords: non-volatile memory; NVM-based computing; persistence; recoverable algorithms and data structures; software combining; concurrent data structures; stack; queue; heap; synchronization; wait-freedom; performance principles; performance analysis
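
The combining idea described in the abstract above (one thread, the combiner, applies every published request while the others wait to be notified) can be illustrated with a minimal sketch. This is an assumption-laden toy, not either of the paper's protocols: it runs in ordinary volatile memory with no persistence or recovery, and the class `CombiningCounter` and its method names are hypothetical.

```python
import threading


class CombiningCounter:
    """Toy shared counter using software combining (volatile memory only)."""

    def __init__(self):
        self._value = 0
        self._combiner_lock = threading.Lock()  # held by the current combiner
        self._pending = []                      # published, not yet applied requests
        self._pending_lock = threading.Lock()   # protects the pending list

    @property
    def value(self):
        return self._value

    def add(self, amount):
        # Publish the request so that whichever thread is combiner can apply it.
        request = {"amount": amount, "done": threading.Event(), "result": None}
        with self._pending_lock:
            self._pending.append(request)

        while not request["done"].is_set():
            if self._combiner_lock.acquire(blocking=False):
                # This thread is now the combiner: drain and apply all requests.
                try:
                    with self._pending_lock:
                        batch, self._pending = self._pending, []
                    for r in batch:
                        self._value += r["amount"]
                        r["result"] = self._value
                        r["done"].set()  # notify the waiting owner
                finally:
                    self._combiner_lock.release()
            else:
                # Another thread is combining; wait briefly for its notification.
                request["done"].wait(timeout=0.001)
        return request["result"]
```

Only the combiner ever touches `_value`, so the shared state is updated by one thread at a time while the other threads contend only on the short publication step. A recoverable variant would additionally have to persist requests and results, which this sketch does not attempt.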

Parallel k-Core Decomposition with Batched Updates and Asynchronous Reads

29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2024

Authors: Liu, Quanquan C.; Shun, Julian; Zablotchi, Igor

Affiliations: Yale University, United States; MIT CSAIL, United States; Mysten Labs, Switzerland

ISBN: 9798400704352 (print)

Maintaining a dynamic k-core decomposition is an important problem that identifies dense subgraphs in dynamically changing graphs. Recent work by Liu et al. [SPAA 2022] presents a parallel batch-dynamic algorithm for maintaining an approximate k-core decomposition. In their solution, both reads and updates need to be batched, and therefore each type of operation can incur high latency waiting for the other type to finish. To tackle most real-world workloads, which are dominated by reads, this paper presents a novel hybrid concurrent-parallel dynamic k-core data structure where asynchronous reads can proceed concurrently with batches of updates, leading to significantly lower read latencies. Our approach is based on tracking causal dependencies between updates, so that causally related groups of updates appear atomic to concurrent readers. Our data structure guarantees linearizability and liveness for both reads and updates, and maintains the same approximation guarantees as prior work. Our experimental evaluation on a 30-core machine shows that our approach reduces read latency by orders of magnitude compared to the batch-dynamic algorithm, up to a factor of 4.05 × 10^5. Compared to an unsynchronized (non-linearizable) baseline, our read latency overhead is only up to a 3.21-factor greater, while improving accuracy of coreness estimates by up to a factor of 52.7. © 2024 Copyright held by the owner/author(s).

Keywords: Data structures
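
For readers unfamiliar with the coreness values that the abstract above refers to, the following sketch computes an exact k-core decomposition of a static graph by the standard sequential peeling procedure. It is deliberately not the paper's batch-dynamic, approximate, concurrent data structure; the function name `coreness` and the adjacency-dict input format are assumptions made for illustration.

```python
def coreness(adj):
    """Return {vertex: core number} for an undirected graph.

    `adj` maps each vertex to a list of its neighbours (each edge listed
    in both directions). Simple O(n^2) peeling, written for clarity.
    """
    degree = {v: len(neighbours) for v, neighbours in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the vertex with the smallest remaining degree.
        v = min(remaining, key=degree.__getitem__)
        # The core number never decreases along the peeling order.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core
```

For a triangle `0-1-2` with a pendant vertex `3` attached to `2`, the pendant vertex is peeled first with core number 1 and the triangle vertices all receive core number 2.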

Boosting Performance and QoS for Concurrent GPU B+trees by Combining-Based Synchronization

28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2023

Authors: Zhang, Weihua; Zhao, Chuanlei; Peng, Lu; Lin, Yuzhe; Zhang, Fengzhe; Lu, Yunping

Affiliations: School of Computer Science, Fudan University, China; Institute of Big Data, Fudan University, China; State Key Laboratory of Mathematical Engineering and Advanced Computing, China; Parallel Processing Institute, Fudan University, China; Department of Computer Science, Tulane University, United States

ISBN: 9798400700156 (print)

Concurrent B+trees have been widely used in many systems. With the scale of data requests increasing exponentially, the systems are facing tremendous performance pressure. GPU has shown its potential to accelerate concurrent B+trees performance. When many concurrent requests are processed, the conflicts should be detected and resolved. Prior methods guarantee the correctness of concurrent GPU B+trees through lock-based or software transactional memory (STM)-based approaches. However, these methods complicate the request processing logic, increase the number of memory accesses and bring execution path divergence. They lead to performance degradation and increased variance in response time. Moreover, previous methods do not guarantee linearizability among concurrent requests. In this paper, we design a combining-based concurrency control framework, called Eirene, for GPU B+tree to reduce the overhead of conflict detection and resolution. First, a combining-based synchronization method is designed to combine and issue requests. It combines the requests with the same key, constructs their dependence, decides the issued request, and determines their return values. Since only one request for each key is issued, key conflicts are eliminated. Then, an optimistic STM method is used to reduce structure conflicts. The query and the update requests are partitioned into different kernels. For the update kernels, STM is involved only when the number of the retry reaches a threshold. Finally, a locality-aware warp reorganization optimization is proposed to improve memory behavior and reduce conflicts by exploiting the locality among requests. Evaluations on an NVIDIA A100 GPU show that Eirene is efficient (a throughput of 2.4 billion per second) and can guarantee linearizability. Compared to the state-of-the-art GPU B+tree, it can achieve a speedup of 7.43X and reduce the response time variance from 36% to 5%. © 2023 ACM.

Keywords: Graphics processing unit

High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs

28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2023

Authors: Moses, William S.; Ivanov, Ivan R.; Domke, Jens; Endo, Toshio; Doerfert, Johannes; Zinenko, Oleksandr

Affiliations: MIT CSAIL, United States; Tokyo Tech, Japan; RIKEN, Japan; LLNL, United States; Google, France

ISBN: 9798400700156 (print)

While parallelism remains the main source of performance, architectural implementations and programming models change with each new hardware generation, often leading to costly application re-engineering. Most tools for performance portability require manual and costly application porting to yet another programming model. We propose an alternative approach that automatically translates programs written in one programming model (CUDA), into another (CPU threads) based on Polygeist/MLIR. Our approach includes a representation of parallel constructs that allows conventional compiler transformations to apply transparently and without modification and enables parallelism-specific optimizations. We evaluate our framework by transpiling and optimizing the CUDA Rodinia benchmark suite for a multi-core CPU and achieve a 58% geomean speedup over handwritten OpenMP code. Further, we show how CUDA kernels from PyTorch can efficiently run and scale on the CPU-only Supercomputer Fugaku without user intervention. Our PyTorch compatibility layer making use of transpiled CUDA PyTorch kernels outperforms the PyTorch CPU native backend by 2.7×. © 2023 Owner/Author.

Keywords: Supercomputers

A Monadic Implementation of Functional Logic Programs

24th International Symposium on Principles and Practice of Declarative Programming, PPDP 2022

Authors: Hanus, Michael; Prott, Kai-Oliver; Teegen, Finn

Affiliations: Institut für Informatik, Kiel University, Kiel, Germany

ISBN: 9781450397032 (print)

Functional logic languages are a high-level approach to programming by combining the most important declarative features. They abstract from small-step operational details so that programmers can concentrate on the logical aspects of an application. This is supported by appropriate evaluation strategies. Demand-driven evaluation from functional programming is amalgamated with non-determinism from logic programming so that solutions or values are computed whenever they exist. This frees the programmer from considering the influence of an operational strategy on the success of a computation but it is a challenge to the language implementer. A non-deterministic demand-driven strategy might duplicate unevaluated choices of an expression which could duplicate the computational efforts. In recent implementations, this problem has been tackled by adding a kind of memoization of non-deterministic choices to the expression under evaluation. Since this has been implemented in imperative target languages, it was unclear whether this could also be supported in a functional programming environment, like Haskell. This paper presents a solution to this challenge by transforming functional logic programs into a monadic representation. Although this transformation is not new, we present an implementation of the monadic interface which supports memoization in non-deterministic branches. We demonstrate that our approach yields a promising performance that outperforms current compilers for Curry. © 2022 ACM.

Keywords: Functional programming

Minimizing speculation overhead in a parallel recognizer for regular texts

Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP '25)

Authors: Angelo Borsotti; Luca Breveglieri; Angelo Morzenti; Stefano Crespi Reghizzi

Affiliations: Politecnico di Milano, Milano, Italy; Politecnico di Milano and CNR-IEIIT, Milano, Italy

ISBN: 9798400714436 (print)

Speculative data-parallel algorithms for language recognition have been widely experimented for various types of finite-state automata (FA), deterministic (DFA) and nondeterministic (NFA), often derived from regular expressions (RE). Such an algorithm cuts the input string into chunks, independently recognizes each chunk in parallel by means of identical FAs, and at last joins the chunk results and checks the overall consistency. In chunk recognition, it is necessary to speculatively start the FAs in any state, thus causing an overhead that reduces the speedup over a serial algorithm. The existing data-parallel DFA-based recognizers suffer from an excessive number of starting states, and the NFA-based ones suffer from the number of nondeterministic transitions.

Keywords: data-parallel recognition algorithm
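
The baseline scheme the abstract above describes (cut the input into chunks, run the same DFA on each chunk from every possible start state, then join the per-chunk results) can be sketched as follows. This shows the speculative baseline whose overhead the paper aims to reduce, not the paper's own optimization. The function names, the dict-based transition table, and the use of a thread pool are illustrative assumptions; in CPython the threads only demonstrate that chunks are independent and give no real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def run_chunk(delta, states, chunk):
    """Speculatively run the DFA on `chunk` from every state.

    Returns {start_state: end_state}, with None where the run gets stuck.
    """
    mapping = {}
    for start in states:
        state = start
        for symbol in chunk:
            state = delta.get((state, symbol))
            if state is None:  # no transition defined: this run is dead
                break
        mapping[start] = state
    return mapping


def recognize(delta, states, start, accepting, text, n_chunks=4):
    """Chunked recognition: independent chunk runs, then a serial join."""
    size = max(1, -(-len(text) // n_chunks))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    with ThreadPoolExecutor() as pool:
        maps = list(pool.map(lambda c: run_chunk(delta, states, c), chunks))
    # Join phase: thread the true start state through the chunk mappings.
    state = start
    for mapping in maps:
        state = mapping[state]
        if state is None:
            return False
    return state in accepting
```

Each chunk costs one run per DFA state, which is the speculation overhead the abstract mentions: only one of those runs per chunk turns out to be the real one once the join phase knows the actual entry state.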

PPoPP 2020 - Proceedings of the 2020 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Journal Track at 18th International Semantic Web Conference, JT@ISWC 2019

The proceedings contain 13 papers. The topics discussed include: a guided walk into link key candidate extraction with relational concept analysis; reflections on profiling and cataloguing the content of SPARQL endpoints using SPORTAL; reflections on: modeling linked open statistical data; reflections on: DCAT-AP representation of Czech national open data catalog and its impact; reflections on: deep learning for noise-tolerant RDFS reasoning; reflections on: finding melanoma drugs through a probabilistic knowledge graph; reflections on: knowledge graph fact prediction via knowledge-enriched tensor factorization; the semantic sensor network ontology, revamped; and reflections on: knowmore - knowledge base augmentation with structured web markup.

Simplifying low-level GPU programming with GAS

26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2021

Authors: Yan, Da; Wang, Wei; Chu, Xiaowen

Affiliations: HKUST, Hong Kong; Hong Kong Baptist University, Hong Kong

ISBN: 9781450382946 (print)

Many low-level optimizations for NVIDIA GPU can only be implemented in native hardware assembly (SASS). However, programming in SASS is unproductive and not portable. To simplify low-level GPU programming, we present GAS (Gpu ASsembly), a PTX-like language that provides a stable instruction set across hardware architectures while giving programmers a low-level control of code execution. We demonstrate that GAS can be used with ease for low-level benchmarking and performance tuning in the context of Tensor Core HGEMM. © 2021 Owner/Author.

Keywords: Graphics processing unit
