Search Results - Inner Mongolia University Library

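Document type

Conference papers (361)
Journal articles (46)

Holdings

Electronic documents (407)
Print holdings (0)

Subject classification

Engineering (351)
- Software Engineering (296)
- Computer Science and Technology... (287)
- Electronic Science and Technology (may... (13)
- Information and Communication Engineering (7)
- Control Science and Engineering (7)
- Mechanical Engineering (4)
- Electrical Engineering (4)
- Bioengineering (4)
- Biomedical Engineering (may be awarded... (3)
- Power Engineering and Engineering Thermo... (2)
- Mechanics (may be awarded in Engineering, ... (1)
- Architecture (1)
- Civil Engineering (1)
- Chemical Engineering and Technology (1)
- Nuclear Science and Technology (1)
- Agricultural Engineering (1)
- Environmental Science and Engineering (may... (1)
Science (61)
- Mathematics (55)
- Systems Science (6)
- Biology (4)
- Statistics (may be awarded in Science, ... (4)
- Chemistry (3)
- Physics (1)
Management (17)
- Management Science and Engineering (may... (12)
- Business Administration (9)
- Library, Information and Archives Manage... (5)
Education (4)
- Education (4)
Economics (3)
- Applied Economics (3)
Law (2)
- Sociology (2)
Agronomy (1)
- Crop Science (1)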

Topics

performance (72)
parallel process... (49)
parallel program... (46)
algorithms (43)
languages (40)
design (34)
gpu (22)
parallel algorit... (21)
experimentation (12)
measurement (12)
parallel computi... (10)
theory (9)
mpi (8)
parallelism (7)
graphics process... (7)
parallel (7)
openmp (7)
concurrency (7)
multicore (6)
reliability (5)

Institutions

carnegie mellon ... (7)
univ wisconsin d... (4)
indiana univ blo... (4)
shanghai jiao to... (4)
univ of tokyo (3)
tsinghua univ de... (3)
univ chinese aca... (3)
massachusetts in... (3)
univ illinois ur... (3)
swiss fed inst t... (3)
mit csail united... (3)
tsinghua univ pe... (3)
univ utah sch co... (3)
rice univ housto... (3)
univ calif berke... (3)
univ texas austi... (3)
ist austria klos... (2)
fudan univ sch c... (2)
princeton univ d... (2)
georgetown univ ... (2)

Authors

blelloch guy e. (8)
chen haibo (7)
hoefler torsten (6)
garland michael (6)
zhai jidong (6)
shun julian (6)
sun yihan (5)
tsigas philippas (5)
dhulipala laxman (4)
pingali keshav (4)
chen wenguang (4)
tan guangming (4)
wang haojie (4)
nikolopoulos dim... (4)
long guoping (4)
valero mateo (4)
mellor-crummey j... (4)
gu yan (4)
leiserson charle... (4)
kennedy ken (4)

Language

English (380)
Other (26)
Portuguese (1)

Search query: any field = "16th ACM Symposium on Principles and Practice of Parallel Programming"

407 records in total; showing 11-20

PPoPP 2020 - Proceedings of the 2020 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2020

The proceedings contain 46 papers. The topics discussed include: Kite: efficient and available release consistency for the datacenter; Oak: a scalable off-heap allocated key-value map; optimizing batched Winograd convolution on GPUs; taming unbalanced training workloads in deep learning with partial collective operations; scalable top-K retrieval with Sparta; waveSZ: a hardware-algorithm co-design of efficient lossy compression for scientific data; scaling concurrent queues by using HTM to profit from failed atomic operations; a wait-free universal construction for large objects; using sample-based time series data for automated diagnosis of scalability losses in parallel programs; scaling out speculative execution of finite-state machines with parallel merge; and detecting and reproducing error-code propagation bugs in MPI implementations.


SBMGT: Scaling Bayesian Multinomial Group Testing

30th Symposium on Principles and Practice of Parallel Programming

Authors: Chen, Weicong; Qi, Hao; Tatsuoka, Curtis; Lu, Xiaoyi (Univ Calif Merced Merced CA 95343 USA; Univ Pittsburgh Pittsburgh PA 15260 USA)

ISBN: (print) 9798400714436

Group testing is a widely used binary classification method that efficiently distinguishes between samples with and without a binary-classifiable attribute by pooling and testing subsets of a group. Bayesian Group Testing (BGT) is the state-of-the-art approach, which integrates prior risk information into a Bayesian Boolean Lattice framework to minimize test counts and reduce false classifications. However, BGT, like other existing group testing techniques, struggles with multinomial group testing, where samples have multiple binary-classifiable attributes that can be individually distinguished simultaneously. We address this need by proposing Bayesian Multinomial Group Testing (BMGT), which includes a new Bayesian-based model and supporting theorems for an efficient and precise multinomial pooling strategy. We further design and develop SBMGT, a high-performance and scalable framework to tackle BMGT's computational challenges by proposing three key innovations: 1) a parallel binary-encoded product lattice model with up to 99.8% efficiency; 2) the Bayesian Balanced Partitioning Algorithm (BBPA), a multinomial pooling strategy optimized for parallel computation with up to 97.7% scaling efficiency on 4096 cores; and 3) a scalable multinomial group testing analytics framework, demonstrated in a real-world disease surveillance case study using AIDS and STDs datasets from Uganda, where SBMGT reduced tests by up to 54% and lowered false classification rates by 92% compared to BGT.

Keywords: Multinomial group testing Bayesian methods parallel algorithms Graph algorithms

DORADD: Deterministic Parallel Execution in the Era of Microsecond-Scale Computing

30th Symposium on Principles and Practice of Parallel Programming

Authors: Liu, Zhengqing; Unal, Musa; Parkinson, Matthew J.; Kogias, Marios (Imperial Coll London London England; Ecole Polytech Fed Lausanne Lausanne Switzerland; Azure Res Austin TX USA)

ISBN: (print) 9798400714436

Deterministic parallelism is a key building block for distributed and fault-tolerant systems that offers substantial performance benefits while guaranteeing determinism. By studying existing deterministically parallel systems (DPS), we identify certain design pitfalls, such as batched execution and inefficient runtime synchronization, that preclude them from meeting the demands of μs-scale and high-throughput distributed systems deployed in modern datacenters. We present DORADD, a deterministically parallel runtime with low latency and high throughput, designed for modern datacenter services. DORADD introduces a hybrid scheduling scheme that effectively decouples request dispatching from execution. It employs a single dispatcher to deterministically construct a dynamic dependency graph of incoming requests and worker pools that can independently execute requests in a work-conserving and synchronization-free manner. Furthermore, DORADD overcomes the single-dispatcher throughput bottleneck based on core pipelining. We use DORADD to build an in-memory database and compare it with Caracal, the current state-of-the-art deterministic database, via the YCSB and TPC-C benchmarks. Our evaluation shows up to 2.5x better throughput and more than 150x and 300x better tail latency in non-contended and contended cases, respectively. We also compare DORADD with Caladan, the state-of-the-art non-deterministic remote procedure call (RPC) scheduler, and demonstrate that determinism in DORADD does not incur any performance overhead.

Keywords: parallel execution determinism runtime scheduling
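The dispatcher idea in the abstract above (one dispatcher builds a dependency graph in arrival order, and requests run once their predecessors finish) can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the DORADD runtime: the conflict rule (two requests conflict when they touch the same key), the function names, and the single-threaded "worker" loop are all hypothetical and chosen for illustration.

```python
from collections import deque

def build_dependency_graph(requests):
    """Single-pass dispatcher (hypothetical): requests is a list of
    (request_id, keys) in arrival order. Each request depends on the most
    recent earlier request that touched any of its keys."""
    last_access = {}   # key -> id of the latest request that touched it
    preds = {}
    for rid, keys in requests:
        preds[rid] = {last_access[k] for k in keys if k in last_access}
        for k in keys:
            last_access[k] = rid
    return preds

def run_in_dependency_order(requests):
    """Simulate workers: a request becomes ready as soon as all of its
    predecessors have completed; there are no batch barriers."""
    preds = build_dependency_graph(requests)
    succs = {rid: [] for rid, _ in requests}
    waiting = {}
    for rid, ps in preds.items():
        waiting[rid] = len(ps)
        for p in ps:
            succs[p].append(rid)
    ready = deque(rid for rid, _ in requests if waiting[rid] == 0)
    order = []
    while ready:
        rid = ready.popleft()   # a real worker would execute the request here
        order.append(rid)
        for s in succs[rid]:    # release successors whose inputs are now done
            waiting[s] -= 1
            if waiting[s] == 0:
                ready.append(s)
    return order

requests = [("r1", {"x"}), ("r2", {"y"}), ("r3", {"x", "y"}), ("r4", {"z"})]
print(run_in_dependency_order(requests))
```

Because the graph depends only on arrival order and the declared keys, conflicting requests always run in arrival order, while independent ones (here `r4`) never wait on the others.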

POSTER: TensorMD: Molecular Dynamics Simulation with Ab Initio Accuracy of 50 Billion Atoms

30th Symposium on Principles and Practice of Parallel Programming

Authors: Ouyang, Yucheng; Liu, Yin; Shang, Honghui; Chen, Zhenchuan; Shan, Jiahao; Cui, Huimin; Feng, Xiaobing; Chen, Xin; Gao, Xingyu; Wang, Lifang; Song, Haifeng; Chen, Xin; Lin, Rongfen; Li, Fang (Chinese Acad Sci Inst Comp Technol Beijing Peoples R China; Inst Appl Phys & Computat Math Beijing Peoples R China; Natl Res Ctr Parallel Comp Engn & Technol Beijing Peoples R China)

ISBN: (print) 9798400714436

Molecular dynamics simulation emerges as an important area that HPC+AI helps to investigate the physical properties, with machine-learning interatomic potentials (MLIPs) being used. General-purpose machine-learning (ML) tools have been leveraged in MLIPs, but they are not perfectly matched with each other, since many optimization opportunities in MLIPs have been missed by ML tools. This inefficiency arises from the fact that HPC+AI applications work with far more computational complexity compared with pure AI scenarios. This paper has developed an MLIP, named TensorMD, independently from any ML tool. TensorMD has been evaluated on two supercomputers and scaled to 51.8 billion atoms, i.e., ~3x compared with the state of the art.

Keywords: Machine Learning Interatomic Potentials ManyCore Processor GPU Molecular Dynamics

POSTER: Minimizing speculation overhead in a parallel recognizer for regular texts

30th Symposium on Principles and Practice of Parallel Programming

Authors: Borsotti, Angelo; Breveglieri, Luca; Morzenti, Angelo; Reghizzi, Stefano Crespi (Politecn Milan Milan Italy; CNR IEIIT Milan Italy)

ISBN: (print) 9798400714436

Speculative data-parallel algorithms for language recognition have been widely experimented for various types of finite-state automata (FA), deterministic (DFA) and nondeterministic (NFA), often derived from regular expressions (RE). Such an algorithm cuts the input string into chunks, independently recognizes each chunk in parallel by means of identical FAs, and at last joins the chunk results and checks the overall consistency. In chunk recognition, it is necessary to speculatively start the FAs in any state, thus causing an overhead that reduces the speedup over a serial algorithm. The existing data-parallel DFA-based recognizers suffer from an excessive number of starting states, and the NFA-based ones suffer from the number of nondeterministic transitions. Our data-parallel algorithm is based on the new FA type called reduced-interface DFA (RI-DFA), which minimizes the speculation overhead without incurring the penalty of nondeterministic transitions or of impractically enlarged DFA machines. The algorithm is theoretically efficient, because it combines the state-reduction of an NFA with the speed of deterministic transitions, thus improving on both DFA-based and NFA-based existing implementations. The practical applicability of the RI-DFA approach is confirmed by a quantitative comparison of the number of starting states for a large public benchmark of complex FAs. On multi-core computing architectures, the RI-DFA recognizer is considerably faster than the NFA-based one on all benchmarks, while it matches the DFA-based one on some benchmarks and performs much better on some others. The extra time needed to construct RI-DFA vs DFA is moderate and is compatible with a practical use. The full paper with all details is in [4]. © 2025 Copyright held by the owner/author(s).

Keywords: regular language recognition data-parallel recognition algorithm minimal speculation speedup on multi-core architecture multi-entry DFA reduced-interface DFA
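The baseline scheme the abstract above describes (cut the input into chunks, run each chunk speculatively from every state, then join the per-chunk results) can be sketched as follows. This illustrates only the plain DFA-based speculation, not the RI-DFA construction from the paper; the toy automaton (even number of `a` characters) and all names are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy DFA over {a, b}: accepts strings with an even number of 'a'.
STATES = (0, 1)
START, ACCEPTING = 0, {0}
DELTA = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}

def run_chunk(chunk):
    """Speculation: run the chunk from every possible starting state and
    return a mapping start state -> end state."""
    mapping = {}
    for s in STATES:
        cur = s
        for ch in chunk:
            cur = DELTA[(cur, ch)]
        mapping[s] = cur
    return mapping

def recognize(text, n_chunks=4):
    size = max(1, -(-len(text) // n_chunks))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    with ThreadPoolExecutor() as pool:        # chunks are independent
        mappings = list(pool.map(run_chunk, chunks))
    state = START
    for m in mappings:                        # join: chain the chunk mappings
        state = m[state]
    return state in ACCEPTING

print(recognize("abab"), recognize("a"))
```

Each chunk costs one run per starting state, which is the speculation overhead the abstract refers to; reducing the number of starting states shrinks the inner loop in `run_chunk`.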

Parallel Integer Sort: Theory and Practice

29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Dong, Xiaojun; Dhulipala, Laxman; Gu, Yan; Sun, Yihan (UC Riverside Riverside CA 92521 USA; Univ Maryland Baltimore MD USA)

ISBN: (print) 9798400704352

Integer sorting is a fundamental problem in computer science. This paper studies parallel integer sort both in theory and in practice. In theory, we show tighter bounds for a class of existing practical integer sort algorithms, which provides a solid theoretical foundation for their widespread usage in practice and strong performance. In practice, we design a new integer sorting algorithm, DovetailSort, that is theoretically efficient and has good practical performance. In particular, DovetailSort overcomes a common challenge in existing parallel integer sorting algorithms, which is the difficulty of detecting and taking advantage of duplicate keys. The key insight in DovetailSort is to combine algorithmic ideas from both integer- and comparison-sorting algorithms. In our experiments, DovetailSort achieves competitive or better performance than existing state-of-the-art parallel integer and comparison sorting algorithms on various synthetic and real-world datasets.

Keywords: Integer Sort Radix Sort parallel Algorithms

Pure: Evolving Message Passing To Better Leverage Shared Memory Within Nodes

29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Psota, James; Solar-Lezama, Armando (MIT CSAIL Cambridge MA 02139 USA)

ISBN: (print) 9798400704352

Pure is a new programming model and runtime system explicitly designed to take advantage of shared memory within nodes in the context of a mostly message passing interface enhanced with the ability to use tasks to make use of idle cores. Pure leverages shared memory in two ways: (a) by allowing cores to steal work from each other while waiting on messages to arrive, and (b) by leveraging *** lock-free data structures in shared memory to achieve high-performance messaging and collective operations between the ranks within nodes. We use microbenchmarks to evaluate Pure's key messaging and collective features and also show application speedups up to 2.1× on the CoMD molecular dynamics and the miniAMR adaptive mesh *** applications scaling up to 4,096 cores.

Keywords: parallel programming models distributed runtime systems task-based parallelism concurrent data structures lock-free data structures

PPoPP 2020 - Proceedings of the 2020 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

8th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, HEART 2017

ISBN: (print) 9781450368186

The proceedings contain 46 papers. The topics discussed include: Kite: efficient and available release consistency for the datacenter; Oak: a scalable off-heap allocated key-value map; taming unbalanced training workloads in deep learning with partial collective operations; scalable top-k retrieval with Sparta; waveSZ: a hardware-algorithm co-design of efficient lossy compression for scientific data; scaling concurrent queues by using HTM to profit from failed atomic operations; a wait-free universal construction for large objects; universal wait-free memory reclamation; and using sample-based time series data for automated diagnosis of scalability losses in parallel programs.


POSTER: RadiK: Scalable Radix Top-K Selection on GPUs

29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Li, Yifei; Zhou, Bole; Zhang, Jiejing; Wei, Xuechao; Li, Yinghan; Chen, Yingda (Alibaba Grp Hangzhou Peoples R China)

ISBN: (print) 9798400704352

By identifying the.. largest or smallest elements in a set of data, top-k selection is critical for modern high-performance databases and machine learning systems, especially with large data volumes. However, previous studies on its GPU implementation are mostly merge-based and rely heavily on the high-speed but size-limited on-chip memory, thereby resulting in a restricted upper bound on... This paper introduces RadiK, a highly optimized GPU-parallel radix top-k selection that is scalable with.., input length, and batch size. With a carefully designed optimization framework targeting high memory bandwidth and resource utilization, RadiK supports far larger.. than the prior art, achieving up to 2.5x speedup for non-batch queries and up to 4.8x speedup for batch queries. We also propose a lightweight refinement that strengthens the robustness of RadiK against skewed distributions by adaptively scaling the input elements.

Keywords: Top-K Radix Select GPU-parallel Algorithm
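As a rough illustration of radix-based (rather than merge-based) top-k selection, the sketch below narrows the candidate set one digit at a time, from the most significant digit down, using a histogram per pass. It is a sequential CPU sketch under stated assumptions, not the RadiK GPU kernel: it handles only non-negative integers below `2**bits`, assumes `bits` is a multiple of `radix_bits`, and the function name and parameters are hypothetical.

```python
def radix_top_k(values, k, bits=32, radix_bits=8):
    """Return the k largest values in descending order.
    Assumes non-negative ints < 2**bits and bits % radix_bits == 0."""
    if not 0 < k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    candidates = list(values)
    selected = []       # values already known to belong to the top k
    remaining = k       # how many more values must come from candidates
    mask = (1 << radix_bits) - 1
    for shift in range(bits - radix_bits, -1, -radix_bits):
        # Histogram of the current digit over the surviving candidates.
        hist = [0] * (1 << radix_bits)
        for v in candidates:
            hist[(v >> shift) & mask] += 1
        # Walk buckets from the largest digit down until the bucket that
        # contains the boundary (k-th largest) element is found.
        digit = mask
        while hist[digit] < remaining:
            remaining -= hist[digit]
            digit -= 1
        # Larger digits are certainly in the top k; only the boundary
        # bucket needs further refinement on the next digit.
        selected.extend(v for v in candidates if (v >> shift) & mask > digit)
        candidates = [v for v in candidates if (v >> shift) & mask == digit]
    # After the last digit, all surviving candidates are equal.
    selected.extend(candidates[:remaining])
    return sorted(selected, reverse=True)

print(radix_top_k([5, 1, 9, 3, 9, 7], 3))
```

Each pass only reads the candidates and a fixed-size histogram, so the working set does not grow with `k`, which is the property that distinguishes radix selection from merge-based approaches.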

GraphCube: Interconnection Hierarchy-aware Graph Processing

29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP)

Authors: Gan, Xinbiao; Wu, Guang; Qiu, Shenghao; Xiong, Feng; Si, Jiaqi; Fang, Jianbin; Dong, Dezun; Gong, Chunye; Li, Tiejun; Wang, Zheng (NUDT Beijing Peoples R China; Univ Leeds Leeds W Yorkshire England; Natl Supercomputer Ctr Tianjin Peoples R China)

ISBN: (print) 9798400704352

Processing large-scale graphs with billions to trillions of edges requires efficiently utilizing parallel systems. However, current graph processing engines do not scale well beyond a few tens of computing nodes because they are oblivious to the communication cost variations across the interconnection hierarchy. We introduce GraphCube, a better approach to optimizing graph processing on large-scale parallel systems with complex interconnections. GraphCube features a new graph partitioning approach to achieve better load balancing and minimize communication overhead across multiple levels of the interconnection hierarchy. We evaluate GraphCube by applying it to fundamental graph operations performed on synthetic and real-world graph datasets. Our evaluation used up to 79,024 computing nodes and 1.2+ million processor cores. Our large-scale experiments show that GraphCube outperforms state-of-the-art parallel graph processing methods in throughput and scalability. Furthermore, GraphCube outperformed the top-ranked systems on the Graph 500 list.

Keywords: Graph processing Graph partitioning parallel computing Vectorization Graph500

No. 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
