Search Results - Inner Mongolia University Library

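Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016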
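
As a rough illustration of the rules above, the following Python sketch assembles an expression in this syntax. The helper names `term` and `combine` are hypothetical and not part of the library's system; the sketch only enforces known field codes and uppercase operators separated by single spaces, and it does not contact the catalogue.

```python
# Hypothetical helpers for composing an advanced-search expression.
# Field codes and operators are taken from the help text above.
FIELDS = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Return a single FIELD=value term, e.g. term("Y", "1982-2016")."""
    if field not in FIELDS:
        raise ValueError(f"unknown field code: {field}")
    return f"{field}={value}"


def combine(operator: str, *parts: str, group: bool = False) -> str:
    """Join parts with an uppercase operator that has one space on each side."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    joined = f" {operator} ".join(parts)
    return f"({joined})" if group else joined


# Rebuild Example 1 from the help text.
subject = combine("OR", term("K", "图书馆学"), term("K", "情报学"), group=True)
query = combine("AND", subject, term("A", "范并思"), term("Y", "1982-2016"))
print(query)
```

Grouping is left to the caller through the `group` flag, since the help text does not state operator precedence.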


Search query: "Any field = Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures"

370 records in total; showing records 11-20


Efficient Parallel Reinforcement Learning Framework Using the Reactor Model

36th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)

Authors: Kwok, Jacky; Lohstroh, Marten; Lee, Edward A.

Affiliations: Univ Calif Berkeley, Berkeley, CA 94720, USA

ISBN: (print) 9798400704161

Parallel Reinforcement Learning (RL) frameworks are essential for mapping RL workloads to multiple computational resources, allowing for faster generation of samples, estimation of values, and policy improvement. These computational paradigms require a seamless integration of training, serving, and simulation workloads. Existing frameworks, such as Ray, are not managing this orchestration efficiently, especially in RL tasks that demand intensive input/output and synchronization between actors on a single node. In this study, we have proposed a solution implementing the reactor model, which enforces a set of actors to have a fixed communication pattern. This allows the scheduler to eliminate work needed for synchronization, such as acquiring and releasing locks for each actor or sending and processing coordination-related messages. Our framework, Lingua Franca (LF), a coordination language based on the reactor model, also supports true parallelism in Python and provides a unified interface that allows users to automatically generate dataflow graphs for RL tasks. In comparison to Ray on a single-node multi-core compute platform, LF achieves 1.21x and 11.62x higher simulation throughput in OpenAI Gym and Atari environments, reduces the average training time of synchronized parallel Q-learning by 31.2%, and accelerates multi-agent RL inference by 5.12x.

Keywords: Parallel Computing; Reinforcement Learning; Programming Languages; Machine Learning; Model of Computation


A Simpler and Parallelizable O(√log n)-approximation Algorithm for SPARSEST CUT

36th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)

Authors: Kolmogorov, Vladimir

Affiliations: Inst Sci & Technol Austria ISTA, Klosterneuburg, Austria

ISBN: (print) 9798400704161

Currently, the best known tradeoff between approximation ratio and complexity for the Sparsest Cut problem is achieved by the algorithm in [Sherman, FOCS 2009]: it computes an $O(\sqrt{(\log n)/\varepsilon})$-approximation using $O(n^{\varepsilon} \log^{O(1)} n)$ maxflows for any $\varepsilon \in [\Theta(1/\log n), \Theta(1)]$. It works by solving the SDP relaxation of [Arora-Rao-Vazirani, STOC 2004] using the Multiplicative Weights Update algorithm (MW) of [Arora-Kale, JACM 2016]. To implement one MW step, Sherman approximately solves a multicommodity flow problem using another application of MW. Nested MW steps are solved via a certain "chaining" algorithm that combines results of multiple calls to the maxflow algorithm. We present an alternative approach that avoids solving the multicommodity flow problem and instead computes "violating paths". This simplifies Sherman's algorithm by removing a need for a nested application of MW, and also allows parallelization: we show how to compute an $O(\sqrt{(\log n)/\varepsilon})$-approximation via $O(\log^{O(1)} n)$ maxflows using $O(n^{\varepsilon})$ processors. We also revisit Sherman's chaining algorithm, and present a simpler version together with a new analysis.

Keywords: SPARSEST CUT; approximation algorithms; parallel algorithms


Fault-Tolerant Parallel Integer Multiplication

36th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)

Authors: Nissim, Roy; Schwartz, Oded; Spiizer, Yuval

Affiliations: Hebrew Univ Jerusalem, Dept Comp Sci, Jerusalem, Israel

ISBN: (print) 9798400704161

Exascale machines have a small mean time between failures, necessitating fault tolerance. Out-of-the-box fault-tolerant solutions, such as checkpoint-restart and replication, apply to any algorithm but incur significant overhead costs. Long integer multiplication is a fundamental kernel in numerical linear algebra and cryptography. The naive, schoolbook multiplication algorithm runs in $\Theta(n^2)$ while Toom-Cook algorithms run in $\Theta(n^{\log_k(2k-1)})$ for $k \ge 2$. We obtain the first efficient fault-tolerant parallel Toom-Cook algorithm. While asymptotically faster FFT-based algorithms exist, Toom-Cook algorithms are often favored in practice on small scale and on supercomputers. Our algorithm enables fault tolerance with negligible overhead costs. Compared to existing, general-purpose, fault-tolerant solutions, our algorithm reduces the arithmetic and communication (bandwidth) overhead costs by a factor of $\Theta(P/(2k-1))$ (where $P$ is the number of processors). To this end, we adapt the fault-tolerant BFS-DFS method of Birnbaum et al. (2020) for fast matrix multiplication and combine it with a coding strategy tailored for Toom-Cook. This eliminates the need for recomputations, resulting in a much faster algorithm.

Keywords: Fault Tolerance; Long Integer Multiplication; Toom-Cook; Parallel Computing; I/O Complexity
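
To make the running-time formula in the abstract concrete, here is a minimal sequential sketch of Karatsuba multiplication, which is the k = 2 member of the Toom-Cook family, together with a helper that evaluates the exponent log_k(2k-1). This is not the paper's algorithm: it is neither parallel nor fault-tolerant, and the function names and the base-case cutoff of 16 are illustrative assumptions.

```python
import math


def toom_cook_exponent(k: int) -> float:
    """Exponent log_k(2k - 1) from the Toom-Cook running-time bound (k >= 2)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    return math.log(2 * k - 1, k)


def karatsuba(x: int, y: int) -> int:
    """Multiply non-negative integers with Karatsuba's method (Toom-Cook, k = 2)."""
    if x < 16 or y < 16:  # small operands: fall back to built-in multiplication
        return x * y
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    low = karatsuba(x_lo, y_lo)
    high = karatsuba(x_hi, y_hi)
    # One extra product replaces the two cross terms: 3 recursive calls, not 4.
    cross = karatsuba(x_lo + x_hi, y_lo + y_hi) - low - high
    return (high << (2 * half)) + (cross << half) + low


print(karatsuba(12345678901234567890, 98765432109876543210))
print(round(toom_cook_exponent(2), 3), round(toom_cook_exponent(3), 3))
```

Splitting each operand into 2 parts and using 3 recursive products gives the exponent log_2(3); splitting into k parts and using 2k-1 products gives the general form quoted in the abstract.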


SPAA 2024 - Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures

36th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2024

ISBN: (print) 9798400704161

The proceedings contain 54 papers. The topics discussed include: expediting hazard pointers with bounded RCU critical sections; Alock: asymmetric lock primitive for RDMA systems; when is parallelism fearless and zero-cost with Rust?; efficient parallel reinforcement learning framework using the reactor model; parallel best arm identification in heterogeneous environments; brief announcement: lock-free learned search data structure; brief announcement: LIT: lookup interlocked table for range queries; brief announcement: a fast scalable detectable unrolled lock-based linked list; scheduling out-trees online to optimize maximum flow; optimizing dynamic data center provisioning through speed scaling: a primal-dual perspective; scheduling jobs with work-inefficient parallel solutions; and multi bucket queues: efficient concurrent priority scheduling.


Scheduling Jobs with Work-Inefficient Parallel Solutions

36th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)

Authors: Kuszmaul, William; Westover, Alek

Affiliations: Harvard Univ, Cambridge, MA 02138, USA; MIT, Cambridge, MA, USA

ISBN: (print) 9798400704161

This paper introduces the serial-parallel decision problem. Consider an online scheduler that receives a series of tasks, where each task has both a parallel and a serial implementation. The parallel implementation has the advantage that it can make progress concurrently on multiple processors, but the disadvantage that it is (potentially) work-inefficient. As tasks arrive, the scheduler must decide for each task which implementation to use. We begin by studying total awake time. We give a simple decide-on-arrival scheduler that achieves a competitive ratio of 3 for total awake time; this scheduler makes serial/parallel decisions immediately when jobs arrive. Our second result is a parallel-work-oblivious scheduler that achieves a competitive ratio of 6 for total awake time; this scheduler makes all of its decisions based only on the size of each serial job and without needing to know anything about the parallel implementations. Finally, we prove a lower bound showing that, if a scheduler wishes to achieve a competitive ratio of $O(1)$, it can have at most one of the two aforementioned properties (decide-on-arrival or parallel-work-oblivious). We also prove lower bounds of the form $1 + \Omega(1)$ on the optimal competitive ratio for any scheduler. Next, we turn our attention to optimizing mean response time. Here, we show that it is possible to achieve an $O(1)$ competitive ratio with $O(1)$ speed augmentation. This is the most technically involved of our results. We also prove that, in this setting, it is not possible for a parallel-work-oblivious scheduler to do well. In addition to these results, we present tight bounds on the optimal competitive ratio if we allow for arrival dependencies between tasks (e.g., tasks are components of a single parallel program), and we give an in-depth discussion of the remaining open questions.

Keywords: Scheduling; Parallel; Work-Inefficient; Competitive-Analysis


Massively Parallel Algorithms for Approximate Shortest Paths

36th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)

Authors: Dory, Michal; Matar, Shaked

Affiliations: Univ Haifa, Haifa, Israel; Ben Gurion Univ Negev, Beer Sheva, Israel

ISBN: (print) 9798400704161

We present fast algorithms for approximate shortest paths in the massively parallel computation (MPC) model. We provide randomized algorithms that take $\mathrm{poly}(\log \log n)$ rounds in the near-linear memory MPC model. Our results are for unweighted undirected graphs with $n$ vertices and $m$ edges. Our first contribution is a $(1 + \varepsilon)$-approximation algorithm for Single-Source Shortest Paths (SSSP) that takes $\mathrm{poly}(\log \log n)$ rounds in the near-linear MPC model, where the memory per machine is $\tilde{O}(n)$ and the total memory is $\tilde{O}(mn^{\rho})$, where $\rho$ is a small constant. Our second contribution is a distance oracle that allows approximating the distance between any pair of vertices. The distance oracle is constructed in $\mathrm{poly}(\log \log n)$ rounds and allows querying a $(1+\varepsilon)(2k-1)$-approximate distance between any pair of vertices $u$ and $v$ in $O(1)$ additional rounds. The algorithm is for the near-linear memory MPC model with total memory of size $\tilde{O}((m+n^{1+\rho})n^{1/k})$, where $\rho$ is a small constant. While our algorithms are for the near-linear MPC model, in fact they only use one machine with $\tilde{O}(n)$ memory, where the rest of the machines can have sublinear memory of size $\tilde{O}(n^{\gamma})$ for a small constant $\gamma < 1$. All previous algorithms for approximate shortest paths in the near-linear MPC model either required $\Omega(\log n)$ rounds or had an $\Omega(\log n)$ approximation. Our approach is based on fast construction of near-additive emulators, limited-scale hopsets and limited-scale distance sketches that are tailored for the MPC model. While our end-results are for the near-linear MPC model, many of the tools we construct such as hopsets and emulators are constructed in the more restricted sublinear MPC model.

Keywords: Approximate Shortest Paths; Massively Parallel Computation; Hopsets; Emulators; Distance Oracles


Brief Announcement: Work Stealing through Partial Asynchronous Delegation

36th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)

Authors: Wang, Jiawei; Liu, Yutao; Fu, Ming; Haertig, Hermann; Chen, Haibo

Affiliations: Tech Univ Dresden, Huawei Dresden Res Ctr, Dresden, Germany; Huawei Dresden Res Ctr, Dresden, Germany; Huawei Cent Software Inst, Shenzhen, Peoples R China; Tech Univ Dresden, Dresden, Germany; Shanghai Jiao Tong Univ, Huawei Cent Software Inst, Shanghai, Peoples R China

ISBN: (print) 9798400704161

Work stealing is a well-established technique in multi-core systems that aims to improve load balancing and task scheduling efficiency. Each processing unit maintains its own task queue, and when idle, it steals tasks from other units. Traditional work-stealing approaches face performance bottlenecks due to costly synchronization primitives and contention arising from concurrent access by both the queue owner and thieves. The state-of-the-art solution addresses these issues through coarse-grained synchronization; however, it restricts stealing in specific scenarios, thereby limiting parallelism. We introduce PadWS, a partial and asynchronous delegated work-stealing algorithm. PadWS employs a block-based design in which, under common cases, the queue owner and thieves work on separate blocks, reducing metadata contention. Delegation is partially enabled for the block in which the owner is located, allowing thieves to steal from it, which deviates from the current block-based approach. Additionally, our delegation strategy is asynchronous, which removes the need for thieves to spin-wait after sending a request.

Keywords: parallel processing; scheduling; work stealing; delegation
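
For orientation, the traditional scheme that the abstract takes as its starting point can be sketched as a per-worker deque guarded by a lock. This is only a baseline illustration, not PadWS: there are no blocks and no delegation, and the class and method names are assumptions made for this example.

```python
import threading
from collections import deque


class WorkStealingQueue:
    """Lock-based task queue: the owner works at the tail, thieves take from the head."""

    def __init__(self) -> None:
        self._tasks = deque()
        # Owner and thieves all contend on this one lock, which is the kind of
        # synchronization cost the abstract attributes to traditional designs.
        self._lock = threading.Lock()

    def push(self, task) -> None:
        """Owner side: add a new task at the tail."""
        with self._lock:
            self._tasks.append(task)

    def pop(self):
        """Owner side: take the most recently pushed task, or None if empty."""
        with self._lock:
            return self._tasks.pop() if self._tasks else None

    def steal(self):
        """Thief side: take the oldest task, or None if empty."""
        with self._lock:
            return self._tasks.popleft() if self._tasks else None


queue = WorkStealingQueue()
for task_id in range(4):
    queue.push(task_id)
print(queue.pop(), queue.steal())  # owner gets 3, thief gets 0
```

Having the owner and thieves operate on opposite ends keeps them apart while the queue is long; they only collide on the data itself when few tasks remain, although in this sketch the single lock still serializes every operation.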


Brief Announcement: Red-Blue Pebbling with Multiple Processors: Time, Communication and Memory Trade-offs

36th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)

Authors: Boehnlein, Toni; Papp, Pal Andras; Yzelman, Albert-Jan N.

Affiliations: Huawei Zurich Res Ctr, Comp Syst Lab, Zurich, Switzerland

ISBN: (print) 9798400704161

The well-studied red-blue pebble game models the execution of an arbitrary computational DAG by a single processor over a two-level memory hierarchy. We present a natural generalization to a multiprocessor setting where each processor has its own limited fast memory, and all processors share unlimited slow memory. To our knowledge, this is the first thorough study that combines pebbling and DAG scheduling problems, capturing the computation of general workloads on multiple processors with memory constraints and communication costs. Our pebbling model enables us to analyze trade-offs between workload balancing, communication and memory limitations, and it captures real-world factors such as superlinear speedups due to parallelization. Our results include upper and lower bounds on the pebbling cost, an analysis of a greedy pebbling strategy, and an extension of NP-hardness results for specific DAG classes from simpler models. For our main technical contribution, we show two inapproximability results that already hold for the long-standing problem of standard red-blue pebbling: (i) the optimal I/O cost cannot be approximated to any finite factor, and (ii) the optimal total cost (I/O+computation) can only be approximated to a limited constant factor, i.e., it does not allow for a polynomial-time approximation scheme. These results also carry over naturally to our multiprocessor pebbling model.

Keywords: Red-blue pebble game; Limited memory; Scheduling; Parallel computing; Approximation; Communication costs


POSTER: Minimizing speculation overhead in a parallel recognizer for regular texts

30th Symposium on Principles and Practice of Parallel Programming

Authors: Borsotti, Angelo; Breveglieri, Luca; Morzenti, Angelo; Reghizzi, Stefano Crespi

Affiliations: Politecn Milan, Milan, Italy; CNR IEIIT, Milan, Italy

ISBN: (print) 9798400714436

Speculative data-parallel algorithms for language recognition have been widely experimented with for various types of finite-state automata (FA), deterministic (DFA) and nondeterministic (NFA), often derived from regular expressions (RE). Such an algorithm cuts the input string into chunks, independently recognizes each chunk in parallel by means of identical FAs, and at last joins the chunk results and checks the overall consistency. In chunk recognition, it is necessary to speculatively start the FAs in any state, thus causing an overhead that reduces the speedup over a serial algorithm. The existing data-parallel DFA-based recognizers suffer from an excessive number of starting states, and the NFA-based ones suffer from the number of nondeterministic transitions. Our data-parallel algorithm is based on the new FA type called reduced-interface DFA (RI-DFA), which minimizes the speculation overhead without incurring the penalty of nondeterministic transitions or of impractically enlarged DFA machines. The algorithm is theoretically efficient, because it combines the state-reduction of an NFA with the speed of deterministic transitions, thus improving on both DFA-based and NFA-based existing implementations. The practical applicability of the RI-DFA approach is confirmed by a quantitative comparison of the number of starting states for a large public benchmark of complex FAs. On multi-core computing architectures, the RI-DFA recognizer is considerably faster than the NFA-based one on all benchmarks, while it matches the DFA-based one on some benchmarks and performs much better on some others. The extra time needed to construct RI-DFA vs DFA is moderate and is compatible with a practical use. The full paper with all details is in [4]. © 2025 Copyright held by the owner/author(s).

Keywords: regular language recognition; data-parallel recognition algorithm; minimal speculation; speedup on multi-core architecture; multi-entry DFA; reduced-interface DFA
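
The chunk-and-join scheme described in the abstract can be sketched with an ordinary DFA. The code below is the plain DFA baseline, in which every chunk is run speculatively from every state; it is not the RI-DFA construction, and the function names and the example automaton (strings over {a, b} ending in "ab") are assumptions chosen for illustration. Chunks are processed in a simple loop here, but each call to `run_chunk` is independent and could be dispatched to a separate worker.

```python
def run_chunk(delta, states, chunk):
    """Speculatively run one chunk from every state; return a start -> end map."""
    result = {}
    for start in states:
        state = start
        for symbol in chunk:
            state = delta[state][symbol]
        result[start] = state
    return result


def recognize(delta, start, accepting, text, n_chunks):
    """Chunked recognition: map each chunk independently, then join the maps in order."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    states = list(delta)
    size = max(1, -(-len(text) // n_chunks))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    maps = [run_chunk(delta, states, chunk) for chunk in chunks]
    state = start
    for chunk_map in maps:  # join step: follow the true state through each map
        state = chunk_map[state]
    return state in accepting


# Example DFA: accepts strings over {a, b} that end in "ab".
DELTA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 0},
}
print(recognize(DELTA, 0, {2}, "aabab", n_chunks=3))
```

The inner loop of `run_chunk` repeats the scan once per possible starting state, which is the speculation overhead the abstract refers to: the fewer starting states a chunk must consider, the less redundant work is done.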


The All Nearest Smaller Values Problem Revisited in Practice, Parallel and External Memory

36th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)

Authors: Sitchinava, Nodari; Svenning, Rolf

Affiliations: Univ Hawaii Manoa, Honolulu, HI 96822, USA; Aarhus Univ, Aarhus, Denmark

ISBN: (print) 9798400704161

We present a thorough investigation of the All Nearest Smaller Values (ANSV) problem from a practical perspective. The ANSV problem is defined as follows: given an array $A$ consisting of $n$ values, for each entry $A[i]$ compute the largest index $l < i$ and the smallest index $r > i$ such that $A[i] > A[l]$ and $A[i] > A[r]$, i.e., the indices of the nearest smaller values to the left and to the right of $A[i]$. The ANSV problem was solved by Berkman, Schieber, and Vishkin [J. Algorithms, 1993] in the PRAM model. Their solution in the CREW PRAM model, which we will refer to as the BSV algorithm, achieves optimal $O(n)$ work and $O(\log n)$ span. Until now, the BSV algorithm has been perceived as too complicated for practical use, and we are not aware of any publicly available implementations. Instead, the best existing practical solution to the ANSV problem is the implementation by Shun and Zhao presented at DCC'13. They implemented a simpler $O(n \log n)$-work algorithm with an additional heuristic first proposed by Blelloch and Shun at ALENEX'11. We refer to this implementation as the BSZ algorithm. In this paper, we implement the original BSV algorithm and demonstrate its practical efficiency. Despite its perceived complexity, our results show that its performance is comparable to the BSZ algorithm. We also present the first theoretical analysis of the heuristic implemented in the BSZ algorithm and show that it provides a tunable trade-off between optimal work and optimal span. In particular, we show that it achieves $O(n(1 + \frac{\log n}{k}))$ work and $O(k(1 + \frac{\log n}{k}))$ span, for any integer parameter $1 \le k \le n$. Thus, for $k = \Theta(\log n)$, the BSZ algorithm can be made to be work-optimal, albeit at the expense of increased span compared to BSV. Our discussion includes a detailed examination of different input types, particularly highlighting that for random inputs, the low expected distance between values and their nearest smaller values renders simple algorithms efficient. Finally, we analy

Keywords: Algorithm analysis; parallel algorithms; external memory; PRAM; algorithm engineering; all nearest smaller values problem; ANSV
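
The problem definition in the abstract can be illustrated with the standard sequential stack-based solution, which does linear work. This is neither the BSV nor the BSZ algorithm and has no parallelism; the function name is an assumption, and `None` is used here to mark entries that have no smaller value on that side.

```python
def all_nearest_smaller_values(values):
    """Return (left, right): index of the nearest strictly smaller value on each side."""
    n = len(values)
    left = [None] * n
    right = [None] * n

    stack = []  # indices whose values are strictly increasing from bottom to top
    for i in range(n):
        # Discard candidates that are not strictly smaller than values[i].
        while stack and values[stack[-1]] >= values[i]:
            stack.pop()
        left[i] = stack[-1] if stack else None
        stack.append(i)

    stack.clear()
    for i in range(n - 1, -1, -1):  # same scan, mirrored for the right side
        while stack and values[stack[-1]] >= values[i]:
            stack.pop()
        right[i] = stack[-1] if stack else None
        stack.append(i)

    return left, right


left, right = all_nearest_smaller_values([3, 1, 4, 1, 5])
print(left)   # [None, None, 1, None, 3]
print(right)  # [1, None, 3, None, None]
```

Each index is pushed and popped at most once per scan, so the total work is linear in the array length, but the scan is inherently serial.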

