Search Results - Inner Mongolia University Library

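Help

Field codes:

T = title (book or article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = institution (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (publication year, degree year, standard release year)

Search rules:

AND requires both terms to match; OR accepts either term; NOT excludes records containing the following term. Operators must be written in uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject "library science" or "information science", author Fan Bingsi, years 1982-2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication "Computer Applications and Software", C++ or Basic in any field, subject not Visual, years 2011-2016)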
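
The field codes and operator rules above can be captured in a small query builder. This is a minimal sketch, not part of the library system: the helpers `term`, `combine` and `group` are hypothetical names, and the builder only produces query text. It does not model how the search engine orders AND, OR and NOT when they are mixed without parentheses, which the help text does not specify.

```python
# Field codes listed in the help text; anything else is rejected.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Render one field=value term, e.g. term("Y", "2011-2016") -> "Y=2011-2016"."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(op: str, *parts: str) -> str:
    """Join parts with an uppercase operator and exactly one space on each side."""
    op = op.upper()  # the system requires uppercase operators
    if op not in OPERATORS:
        raise ValueError(f"unknown operator: {op!r}")
    return f" {op} ".join(parts)


def group(expr: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expr})"


# Example 1 rebuilt from its parts.
example1 = combine(
    "AND",
    group(combine("OR", term("K", "图书馆学"), term("K", "情报学"))),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)

# Example 2: the NOT clause is appended to the AND chain, then the year limit.
example2 = combine(
    "AND",
    combine(
        "NOT",
        combine("AND", term("P", "计算机应用与软件"),
                group(combine("OR", term("U", "C++"), term("U", "Basic")))),
        term("K", "Visual"),
    ),
    term("Y", "2011-2016"),
)

print(example1)
print(example2)
```

Building queries this way guarantees the uppercase-and-spaced operator rule, which is the most common source of malformed queries when typing them by hand.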

Refine results

Document type (records)

1,505 Conference papers
105 Journal articles

Collection scope

1,610 Electronic documents
0 Print holdings

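Subject classification (records)

1,169 Engineering
- 1,112 Computer Science and Technology...
- 557 Software Engineering
- 118 Electrical Engineering
- 75 Information and Communication Engineering
- 46 Control Science and Engineering
- 37 Electronic Science and Technology (may...
- 13 Materials Science and Engineering (may...
- 13 Agricultural Engineering
- 11 Mechanical Engineering
- 11 Optical Engineering
- 8 Chemical Engineering and Technology
- 8 Bioengineering
- 7 Architecture
- 7 Biomedical Engineering (may confer...
- 6 Power Engineering and Engineering Thermo...
- 5 Civil Engineering
- 3 Mechanics (may confer Engineering, Sci...
579 Science
- 557 Mathematics
- 55 Statistics (may confer Science, ...
- 16 Physics
- 9 Biology
- 9 Systems Science
- 8 Chemistry
74 Management
- 64 Management Science and Engineering (may...
- 40 Business Administration
- 11 Library, Information and Archives Manage...
16 Agriculture
- 16 Crop Science
6 Economics
- 6 Applied Economics
3 Law
- 3 Sociology
3 Education
- 3 Education
2 Medicine
1 Literature
1 Military Science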

Topic (records)

237 parallel algorit...
175 parallel process...
80 computer archite...
74 parallel process...
57 parallel program...
56 algorithms
47 parallel archite...
41 hardware
30 scheduling
27 computer program...
21 graph algorithms
20 computer systems...
18 approximation al...
18 processor schedu...
18 computational mo...
18 field programmab...
17 parallel computi...
16 performance
16 delay
15 computer science

Institution (records)

32 carnegie mellon ...
15 swiss fed inst t...
15 carnegie mellon ...
11 univ maryland de...
11 stanford univ st...
10 univ maryland co...
10 mit 77 massachus...
10 univ calif berke...
8 eth zurich
7 georgetown univ ...
7 mit cambridge ma...
7 univ texas austi...
6 penn state univ ...
6 mit csail cambri...
5 univ calif river...
5 princeton univer...
5 university of ma...
5 microsoft res re...
5 carnegie mellon ...
5 harvard univ cam...

Author (records)

38 blelloch guy e.
20 gu yan
18 gibbons phillip ...
18 shun julian
18 goodrich michael...
16 fineman jeremy t...
15 sun yihan
14 dhulipala laxman
13 vishkin uzi
12 agrawal kunal
11 leiserson charle...
10 ballard grey
10 hoefler torsten
10 anon
10 miller gary l.
10 harris david g.
9 ghaffari mohsen
9 tangwongsan kana...
9 reif john h.
9 demmel james

Language (records)

1,556 English
54 Other

Search query: Any field = "Annual ACM Symposium on Parallel Algorithms and Architectures"

1,610 records in total; showing records 141-150

Minimizing speculation overhead in a parallel recognizer for regular texts

Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming

Authors: Angelo Borsotti, Luca Breveglieri, Angelo Morzenti, Stefano Crespi Reghizzi. Affiliations: Politecnico di Milano, Milano, Italy; Politecnico di Milano and CNR-IEIIT, Milano, Italy

ISBN: (print) 9798400714436

Speculative data-parallel algorithms for language recognition have been widely experimented with for various types of finite-state automata (FA), deterministic (DFA) and nondeterministic (NFA), often derived from regular expressions (RE). Such an algorithm cuts the input string into chunks, independently recognizes each chunk in parallel by means of identical FAs, and finally joins the chunk results and checks the overall consistency. In chunk recognition, the FAs must be started speculatively in any state, which causes an overhead that reduces the speedup over a serial algorithm. The existing data-parallel DFA-based recognizers suffer from an excessive number of starting states, and the NFA-based ones suffer from the number of nondeterministic transitions.

Keywords: data-parallel recognition algorithm
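
The chunk-and-join scheme described in this abstract can be sketched for a DFA as follows. This is the straightforward variant whose overhead the abstract identifies, where every chunk is run speculatively from every state; it is not the authors' reduced-overhead recognizer. The function names and the even-parity automaton are illustrative assumptions, and Python threads only show the structure (the GIL prevents a real CPU speedup).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Set, Tuple

State = int
Delta = Dict[Tuple[State, str], State]  # a missing entry means "reject"


def run_chunk_from_all_states(delta: Delta, states: Set[State],
                              chunk: str) -> Dict[State, Optional[State]]:
    """Speculatively run the DFA on one chunk from every possible start state.

    Returns a map start_state -> end_state (None if the chunk is rejected)."""
    result: Dict[State, Optional[State]] = {}
    for start in states:
        q: Optional[State] = start
        for ch in chunk:
            q = delta.get((q, ch))
            if q is None:
                break
        result[start] = q
    return result


def parallel_recognize(delta: Delta, states: Set[State], initial: State,
                       finals: Set[State], text: str, n_chunks: int = 4) -> bool:
    # Cut the input into roughly equal chunks.
    size = max(1, -(-len(text) // n_chunks))  # ceiling division
    chunks: List[str] = [text[i:i + size] for i in range(0, len(text), size)] or [""]
    # Recognize all chunks independently (in parallel, in principle).
    with ThreadPoolExecutor() as pool:
        maps = list(pool.map(
            lambda c: run_chunk_from_all_states(delta, states, c), chunks))
    # Join: thread the actual state through the per-chunk maps, in order.
    q: Optional[State] = initial
    for m in maps:
        q = m[q]
        if q is None:
            return False
    return q in finals


# Example DFA: binary strings containing an even number of 1s.
EVEN_ONES: Delta = {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0}
print(parallel_recognize(EVEN_ONES, {0, 1}, 0, {0}, "1100101"))  # True
print(parallel_recognize(EVEN_ONES, {0, 1}, 0, {0}, "1101"))     # False
```

The per-chunk work is multiplied by the number of start states, which is exactly the speculation overhead the paper sets out to reduce.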

Low-span parallel algorithms for the binary-forking model

33rd ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2021

Authors: Zafar Ahmad, Rezaul Chowdhury, Rathish Das, Pramod Ganapathi, Aaron Gregory, Mohammad Mahdi Javanmard. Affiliations: Stony Brook University, Stony Brook, NY, United States; University of Waterloo, Waterloo, ON, Canada

ISBN: (print) 9781450380706

The binary-forking model is a parallel computation model, formally defined by Blelloch et al., in which a thread can fork a concurrent child thread, recursively and asynchronously. The model incurs a cost of Θ(log n) to spawn or synchronize n tasks or threads. The binary-forking model realistically captures the performance of parallel algorithms implemented using modern multithreaded programming languages on multicore shared-memory machines. In contrast, the widely studied theoretical PRAM model does not consider the cost of spawning and synchronizing threads, and as a result, algorithms achieving optimal performance bounds in the PRAM model may not be optimal in the binary-forking model. Often, algorithms need to be redesigned to achieve optimal performance bounds in the binary-forking model, and the non-constant synchronization cost makes the task challenging. In this paper, we show that in the binary-forking model we can achieve optimal or near-optimal span with negligible or no asymptotic blow-up in work for comparison-based sorting, Strassen's matrix multiplication (MM), and the Fast Fourier Transform (FFT). Our major results are as follows: (1) A randomized comparison-based sorting algorithm with optimal O(log n) span and O(n log n) work, both w.h.p. in n. (2) An optimal O(log n) span algorithm for Strassen's matrix multiplication (MM) with only a log log n-factor blow-up in work, as well as a near-optimal O(log n log log log n) span algorithm with no asymptotic blow-up in work. (3) A near-optimal O(log n log log log n) span Fast Fourier Transform (FFT) algorithm with less than a log n-factor blow-up in work for all practical values of n (i.e., n ≤ 10^10,000). © 2021 ACM.

Keywords: Fast Fourier transforms
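
To make the Θ(log n) spawn cost concrete, the sketch below computes work and span for running n tasks with the recursive forking pattern the abstract describes. It is a cost-model calculation under an assumed accounting (one unit per fork/join pair, `leaf_work` units per task), not code from the paper. Under it, the span is ⌈log₂ n⌉ + 1, whereas the PRAM model charges nothing for starting the n tasks.

```python
from typing import Tuple


def binary_fork_cost(n: int, leaf_work: int = 1) -> Tuple[int, int]:
    """Work and span of running n tasks under recursive binary forking.

    Each internal node forks one half of the task range, runs the other half,
    and then joins; we charge 1 unit per fork/join pair and leaf_work per task.
    Returns (work, span)."""
    if n < 1:
        raise ValueError("n must be positive")
    if n == 1:
        return leaf_work, leaf_work
    half = n // 2
    lw, ls = binary_fork_cost(half, leaf_work)
    rw, rs = binary_fork_cost(n - half, leaf_work)
    # The two halves run concurrently: span takes the max, work takes the sum.
    return lw + rw + 1, max(ls, rs) + 1


for n in (1, 8, 1000, 1024):
    work, span = binary_fork_cost(n)
    print(n, work, span)
```

The work stays linear (2n - 1 units here) while the span grows only logarithmically, which is why algorithms that synchronize many times pay a log factor in this model that a PRAM analysis would hide.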

The Dataflow Abstract Machine Simulator Framework

Annual International Symposium on Computer Architecture, ISCA

Authors: Nathan Zhang, Rubens Lacouture, Gina Sohn, Paul Mure, Qizheng Zhang, Fredrik Kjolstad, Kunle Olukotun. Affiliations: Stanford University, Stanford, USA; MIT, Cambridge, USA

ISBN: (electronic) 9798350326581

ISBN: (print) 9798350326598

The growing interest in novel dataflow architectures and streaming execution paradigms has created the need for a simulator optimized for modeling dataflow systems. To fill this need, we present three new techniques that make it feasible to simulate complex systems consisting of thousands of components. First, we introduce an interface based on Communicating Sequential Processes which allows users to simultaneously describe functional and timing characteristics. Second, we introduce a scalable point-to-point synchronization scheme that avoids global synchronization. Finally, we demonstrate a technique to exploit slack in the simulated system, such as FIFOs, to increase simulation parallelism. We implement these techniques in the Dataflow Abstract Machine (DAM), a parallel simulator framework for dataflow systems. We demonstrate the benefits of using DAM through three case studies. First, we use DAM directly as an exploration tool for streaming algorithms on dataflow hardware. We simulate two different implementations of the attention algorithm used in large language models, and use DAM to show that the second implementation requires only a constant amount of local memory. Second, we re-implement a simulator for a sparse tensor algebra accelerator, resulting in 57% less code and a simulation speedup of up to four orders of magnitude. Finally, we demonstrate a general technique for time-multiplexing real hardware to simulate multiple virtual copies of the hardware using DAM.

Keywords: Tensors, Machine learning algorithms, Dams, Large language models, Memory management, Machine learning, Parallel processing
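
The CSP-style interface and point-to-point synchronization described in this abstract can be illustrated with ordinary threads and bounded queues. This is a hypothetical toy, not DAM's API: each simulated component is a sequential process with its own local clock, messages carry timestamps, and a receiver advances its clock only from the messages it actually receives, so no global clock is needed. The bounded queues stand in for FIFOs; the toy models data-arrival timing only, not backpressure from a full FIFO in simulated time.

```python
import queue
import threading
from typing import Callable, List, Optional, Tuple

Msg = Optional[Tuple[int, int]]  # (timestamp, payload); None marks end of stream


def producer(out: "queue.Queue[Msg]", items: List[int], cycles_per_item: int) -> None:
    t = 0  # local simulated time of this process
    for x in items:
        t += cycles_per_item
        out.put((t, x))  # blocks the host thread only if the FIFO is full
    out.put(None)


def stage(inp: "queue.Queue[Msg]", out: "queue.Queue[Msg]",
          fn: Callable[[int], int], latency: int) -> None:
    t = 0
    while True:
        msg = inp.get()
        if msg is None:
            out.put(None)
            return
        ts, x = msg
        # Synchronize only with the sender: start no earlier than the data arrives.
        t = max(t, ts) + latency
        out.put((t, fn(x)))


def consumer(inp: "queue.Queue[Msg]", sink: List[Tuple[int, int]]) -> None:
    while True:
        msg = inp.get()
        if msg is None:
            return
        sink.append(msg)


def simulate(items: List[int], fifo_depth: int = 2) -> List[Tuple[int, int]]:
    a: "queue.Queue[Msg]" = queue.Queue(maxsize=fifo_depth)
    b: "queue.Queue[Msg]" = queue.Queue(maxsize=fifo_depth)
    sink: List[Tuple[int, int]] = []
    threads = [
        threading.Thread(target=producer, args=(a, items, 1)),
        threading.Thread(target=stage, args=(a, b, lambda x: x * x, 3)),
        threading.Thread(target=consumer, args=(b, sink)),
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return sink


print(simulate([1, 2, 3, 4]))  # [(4, 1), (7, 4), (10, 9), (13, 16)]
```

Because every process is sequential and channels are FIFO, the simulated timestamps are deterministic even though the host threads interleave arbitrarily.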

Scalable, Programmable and Dense: The HammerBlade Open-Source RISC-V Manycore

Annual International Symposium on Computer Architecture, ISCA

Authors: Dai Cheol Jung, Max Ruttenberg, Paul Gao, Scott Davidson, Daniel Petrisko, Kangli Li, Aditya K Kamath, Lin Cheng, Shaolin Xie, Peitian Pan, Zhongyuan Zhao, Zichao Yue, Bandhav Veluri, Sripathi Muralitharan, Adrian Sampson, Andrew Lumsdaine, Zhiru Zhang, Christopher Batten, Mark Oskin, Dustin Richmond, Michael Bedford Taylor. Affiliations: University of Washington; Cornell University; PNNL; University of California Santa Cruz

ISBN: (electronic) 9798350326581

ISBN: (print) 9798350326598

Existing tiled manycore architectures propose to convert abundant silicon resources into general-purpose parallel processors with unmatched computational density and programmability. However, as we approach 100K cores in one chip, conventional manycore architectures struggle to navigate three key axes: scalability, programmability, and density. Many manycores sacrifice programmability for density, or scalability for programmability. In this paper, we explore HammerBlade, which simultaneously achieves scalability, programmability, and density. HammerBlade is a fully open-source RISC-V manycore architecture, which has been silicon-validated with a 2048-core ASIC implementation using a 14/16nm process. We evaluate the system using a suite of parallel benchmarks that captures a broad spectrum of computation and communication patterns.

Keywords: Scalability, Instruction sets, Source coding, Computer architecture, Voltage, Parallel processing, Silicon

Massively parallel algorithms for distance approximation and spanners

33rd ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2021

Authors: Amartya Shankha Biswas, Michal Dory, Mohsen Ghaffari, Slobodan Mitrović, Yasamin Nazari. Affiliations: Massachusetts Institute of Technology, Cambridge, United States; ETH Zurich, Zurich, Switzerland; Johns Hopkins University, Baltimore, United States

ISBN: (print) 9781450380706

Over the past decade, there has been increasing interest in distributed/parallel algorithms for processing large-scale graphs. By now, we have quite fast algorithms (usually sublogarithmic-time and often poly(log log n)-time, or even faster) for a number of fundamental graph problems in the massively parallel computation (MPC) model. This model is a widely adopted theoretical abstraction of MapReduce-style settings, where a number of machines communicate in an all-to-all manner to process large-scale data. Contributing to this line of work on MPC graph algorithms, we present poly(log k) ∈ poly(log log n) round MPC algorithms for computing O(k^(1+o(1)))-spanners in the strongly sublinear regime of local memory. To the best of our knowledge, these are the first sublogarithmic-time MPC algorithms for spanner construction. As primary applications of our spanners, we get two important implications: (1) For the MPC setting, we get an O(log² log n)-round algorithm for O(log^(1+o(1)) n)-approximation of all pairs shortest paths (APSP) in the near-linear regime of local memory. To the best of our knowledge, this is the first sublogarithmic-time MPC algorithm for distance approximations. (2) Our result above also extends to the Congested Clique model of distributed computing, with the same round complexity and approximation guarantee. This gives the first sublogarithmic algorithm for approximating APSP in weighted graphs in the Congested Clique model. © 2021 ACM.

Keywords: MapReduce
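
An α-spanner of a weighted graph is a subgraph that preserves every distance up to a factor α. To illustrate the object being computed, the sketch below implements the classic sequential greedy construction of a (2k-1)-spanner; it is not the paper's MPC algorithm, which trades a slightly larger O(k^(1+o(1))) stretch for very few rounds. The function names are hypothetical, and the sketch assumes non-negative edge weights (required by Dijkstra's algorithm).

```python
import heapq
from typing import Dict, List, Tuple

Edge = Tuple[int, int, float]


def bounded_distance(adj: Dict[int, List[Tuple[int, float]]], src: int, dst: int,
                     limit: float) -> float:
    """Dijkstra from src in the current spanner; gives up once distances exceed limit."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > limit:
            break  # every remaining path is already too long
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


def greedy_spanner(n: int, edges: List[Edge], k: int) -> List[Edge]:
    """Greedy (2k-1)-spanner: scan edges by weight and keep an edge only if the
    spanner built so far does not already connect its endpoints within
    stretch 2k-1."""
    stretch = 2 * k - 1
    adj: Dict[int, List[Tuple[int, float]]] = {u: [] for u in range(n)}
    kept: List[Edge] = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if bounded_distance(adj, u, v, stretch * w) > stretch * w:
            adj[u].append((v, w))
            adj[v].append((u, w))
            kept.append((u, v, w))
    return kept


triangle = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.5)]
print(greedy_spanner(3, triangle, 2))  # the 1.5 edge is dropped (detour 2.0 <= 3 * 1.5)
```

Larger k gives sparser spanners at the cost of larger stretch, which is the same trade-off the abstract's k parameter controls.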

A Deterministic Algorithm for the MST Problem in Constant Rounds of Congested Clique

53rd Annual ACM SIGACT Symposium on Theory of Computing (STOC)

Authors: Krzysztof Nowicki. Affiliations: Univ Copenhagen, Copenhagen, Denmark; Univ Wroclaw, Wroclaw, Poland

ISBN: (print) 9781450380539

In this paper we show that the Minimum Spanning Tree problem (MST) can be solved deterministically in O(1) rounds of the Congested Clique model. In the Congested Clique model there are n players that perform computation in synchronous rounds. Each round consists of a phase of local computation and a phase of communication, in which each pair of players is allowed to exchange O(log n)-bit messages. The study of this model began with the MST problem: in the paper by Lotker, Pavlov, Patt-Shamir, and Peleg [SPAA'03, SICOMP'05] that defines the Congested Clique model, the authors give a deterministic O(log log n) round algorithm that improved over a trivial O(log n) round adaptation of Borůvka's algorithm. There was a sequence of gradual improvements to this result: an O(log log log n) round algorithm by Hegeman, Pandurangan, Pemmaraju, Sardeshmukh, and Scquizzato [PODC'15], an O(log* n) round algorithm by Ghaffari and Parter [PODC'16], and an O(1) round algorithm by Jurdzinski and Nowicki [SODA'18], but all those algorithms were randomized. Therefore, the question of the existence of any deterministic o(log log n) round algorithm for the Minimum Spanning Tree problem has remained open since the seminal paper by Lotker, Pavlov, Patt-Shamir, and Peleg [SPAA'03, SICOMP'05]. Our result resolves this question and establishes that O(1) rounds are enough to solve the MST problem in the Congested Clique model, even if we are not allowed to use any randomness. Furthermore, the amount of communication needed by the algorithm makes it applicable to a variant of the MPC model using machines with local memory of size O(n).

Keywords: Minimum Spanning Tree, MST, Deterministic algorithms, Graph algorithms, Distributed algorithms, Congested Clique, Parallel algorithms, Massively parallel algorithms, MapReduce
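
Borůvka's algorithm, which the abstract cites as the basis of the trivial O(log n)-round solution, works in phases: every component picks its cheapest outgoing edge, and all picked edges are added at once, so the number of components at least halves per phase. The sequential sketch below shows that phase structure; it is not the paper's deterministic O(1)-round algorithm. Ties are broken by edge index so that simultaneous picks cannot close a cycle; the function name is hypothetical.

```python
from typing import List, Optional, Tuple

Edge = Tuple[int, int, float]


def boruvka_mst(n: int, edges: List[Edge]) -> List[Edge]:
    """Minimum spanning forest by Borůvka phases (undirected, weighted edges)."""
    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst: List[Edge] = []
    components = n
    while components > 1:
        # cheapest[r] = (weight, edge index) of component r's lightest outgoing edge.
        cheapest: List[Optional[Tuple[float, int]]] = [None] * n
        for i, (u, v, w) in enumerate(edges):
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            for r in (ru, rv):
                if cheapest[r] is None or (w, i) < cheapest[r]:
                    cheapest[r] = (w, i)
        merged = False
        for r in range(n):
            if cheapest[r] is None:
                continue
            u, v, w = edges[cheapest[r][1]]
            ru, rv = find(u), find(v)
            if ru != rv:  # the same edge may have been picked by both endpoints
                parent[ru] = rv
                mst.append((u, v, w))
                components -= 1
                merged = True
        if not merged:
            break  # no outgoing edges left: the graph is disconnected
    return mst


g = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (0, 3, 3.0), (0, 2, 4.0)]
print(boruvka_mst(4, g))
```

Since each phase at least halves the number of components, there are at most ⌈log₂ n⌉ phases, which is where the O(log n) round bound of the naive Congested Clique adaptation comes from.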

A Cloud-Agnostic Serverless Architecture for Distributed Machine Learning

IEEE/ACM International Symposium on Big Data Computing (BDC)

Authors: Ionut Predoaia, Pedro García-López. Affiliations: University of York, United Kingdom; Universitat Rovira i Virgili, Spain

ISBN: (electronic) 9798350367300

ISBN: (print) 9798350367317

Serverless computing has shown vast potential for big data analytics applications, especially those involving machine learning algorithms. Nevertheless, little consideration has been given in the literature to cloud-agnostic serverless architectures that leverage existing parallel implementations of machine learning algorithms. This work bridges this gap by proposing a multicloud serverless architecture for distributed machine learning that enables machine learning engineers without cloud computing expertise to effortlessly port already implemented parallel machine learning algorithms to serverless, whilst overcoming vendor lock-in. In this work, two stateful machine learning algorithms, k-means clustering and logistic regression, have been ported to serverless. The serverless implementation of k-means provided superior performance and scalability compared to a serverful implementation when using a number of workers equal to or slightly lower than the total number of vCPUs available on the VM running the serverful implementation. Additionally, it achieved an 87-fold speedup compared to a sequential implementation. Moreover, two storage designs for the shared state are proposed for the serverless implementations: one that requires locks for updating the shared state, and another that is lock-free. Our experimental evaluation demonstrates that the performance of the lock-free serverless implementation of k-means declines as the number of clusters increases.

Keywords: Bridges, Logistic regression, Machine learning algorithms, Scalability, Serverless computing, Clustering algorithms, Machine learning, Computer architecture, Big Data, Parallel machines
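
The lock-based shared-state design mentioned in this abstract can be sketched with threads standing in for serverless workers: each worker assigns its slice of points locally, then merges its partial sums into the shared state under a lock, once per iteration. This is an illustrative assumption about how such shared state might be organized, not the authors' implementation; a real serverless version would keep the shared state in external storage rather than in process memory. The names `nearest` and `kmeans_shared_state` are hypothetical.

```python
import threading
from typing import List, Tuple

Point = Tuple[float, float]


def nearest(p: Point, centroids: List[Point]) -> int:
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: (p[0] - centroids[j][0]) ** 2 + (p[1] - centroids[j][1]) ** 2)


def kmeans_shared_state(points: List[Point], centroids: List[Point],
                        n_workers: int = 4, max_iters: int = 20) -> List[Point]:
    k = len(centroids)
    current = [tuple(c) for c in centroids]
    for _ in range(max_iters):
        # Shared state for this iteration: per-cluster coordinate sums and counts.
        sums = [[0.0, 0.0] for _ in range(k)]
        counts = [0] * k
        lock = threading.Lock()

        def worker(part: List[Point]) -> None:
            local_sums = [[0.0, 0.0] for _ in range(k)]
            local_counts = [0] * k
            for p in part:
                j = nearest(p, current)
                local_sums[j][0] += p[0]
                local_sums[j][1] += p[1]
                local_counts[j] += 1
            with lock:  # one locked update per worker per iteration
                for j in range(k):
                    sums[j][0] += local_sums[j][0]
                    sums[j][1] += local_sums[j][1]
                    counts[j] += local_counts[j]

        parts = [points[i::n_workers] for i in range(n_workers)]
        threads = [threading.Thread(target=worker, args=(part,)) for part in parts]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        # Empty clusters keep their previous centroid.
        new = [(sums[j][0] / counts[j], sums[j][1] / counts[j]) if counts[j] else current[j]
               for j in range(k)]
        if new == current:
            break  # converged
        current = new
    return current


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(kmeans_shared_state(pts, [(0.0, 0.0), (10.0, 10.0)]))
```

Aggregating locally before taking the lock keeps contention to one update per worker per iteration, regardless of how many points each worker handles.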

A Deterministic Parallel APSP Algorithm and its Applications

32nd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA)

Authors: Adam Karczmarz, Piotr Sankowski. Affiliations: Univ Warsaw, Inst Informat, Warsaw, Poland

ISBN: (print) 9781611976465

In this paper we show a deterministic parallel all-pairs shortest paths algorithm for real-weighted directed graphs. The algorithm has Õ(nm + (n/d)^3) work and Õ(d) depth for any depth parameter d ∈ [1, n]. To the best of our knowledge, such a trade-off has only been previously described for the real-weighted single-source shortest paths problem using randomization [Bringmann et al., ICALP'17]. Moreover, our result improves upon the parallelism of the state-of-the-art randomized parallel algorithm for computing transitive closure, which has Õ(nm + n^3/d^2) work and Õ(d) depth [Ullman and Yannakakis, SIAM J. Comput. '91]. Our APSP algorithm turns out to be a powerful tool for designing efficient planar graph algorithms in both parallel and sequential regimes. By suitably adjusting the depth parameter d and applying known techniques, we obtain: (1) nearly work-efficient Õ(n^(1/6))-depth parallel algorithms for the real-weighted single-source shortest paths problem and finding a bipartite perfect matching in a planar graph, (2) an Õ(n^(9/8))-time sequential strongly polynomial algorithm for computing a minimum mean cycle or a minimum cost-to-time-ratio cycle of a planar graph, (3) a slightly faster algorithm for computing so-called external dense distance graphs of all pieces of a recursive decomposition of a planar graph. One notable ingredient of our parallel APSP algorithm is a simple deterministic Õ(nm)-work, Õ(d)-depth procedure for computing Õ(n/d)-size hitting sets of shortest d-hop paths between all pairs of vertices of a real-weighted digraph. Such hitting sets have also been called d-hub sets. Hub sets have previously proved especially useful in designing parallel or dynamic shortest paths algorithms and are typically obtained via random sampling. Our procedure implies, for example, an Õ(nm)-time deterministic...

Keywords: parallel algorithms
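
The "shortest d-hop paths" in this abstract are shortest paths restricted to at most d edges. The sketch below computes their lengths from a single source with d rounds of Bellman-Ford relaxation, the textbook method; it is not the paper's hitting-set procedure, and the function name is hypothetical. Each round reads only the previous round's distances, so round i accounts for paths of at most i edges, and the relaxations within a round could run in parallel with a per-vertex minimum.

```python
import math
from typing import List, Tuple

Edge = Tuple[int, int, float]  # directed edge u -> v with real weight w


def hop_bounded_distances(n: int, edges: List[Edge], src: int, d: int) -> List[float]:
    """dist[v] = length of a shortest src -> v path with at most d edges (inf if none)."""
    dist = [math.inf] * n
    dist[src] = 0.0
    for _ in range(d):
        prev = dist[:]  # read only last round's values, so round i allows i hops
        for u, v, w in edges:
            if prev[u] + w < dist[v]:
                dist[v] = prev[u] + w
    return dist


edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (0, 3, 10.0)]
print(hop_bounded_distances(4, edges, 0, 1))  # only the direct edge reaches vertex 3
print(hop_bounded_distances(4, edges, 0, 3))  # the 3-hop path becomes available
```

With one hop, vertex 3 is reached only through the weight-10 edge; with three hops, the cheaper path 0 → 1 → 2 → 3 appears. A hub set for such paths is a vertex set that intersects every one of them.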

SmartSAGE: Training Large-scale Graph Neural Networks using In-Storage Processing Architectures

49th IEEE/ACM Annual International Symposium on Computer Architecture (ISCA)

Authors: Yunjae Lee, Jinha Chung, Minsoo Rhu. Affiliations: Korea Adv Inst Sci & Technol, Sch Elect Engn, Daejeon, South Korea

ISBN: (print) 9781450386104

Graph neural networks (GNNs) can extract features by learning both the representation of each object (i.e., graph nodes) and the relationship across different objects (i.e., the edges that connect nodes), achieving state-of-the-art performance in various graph-based tasks. Despite their strengths, utilizing these algorithms in a production environment faces several challenges, as the number of graph nodes and edges reaches billions to hundreds of billions, requiring substantial storage space for training. Unfortunately, state-of-the-art ML frameworks employ an in-memory processing model, which significantly hampers the productivity of ML practitioners because it requires the entire working set to fit within DRAM capacity. In this work, we first conduct a detailed characterization of a state-of-the-art, large-scale GNN training algorithm, GraphSAGE. Based on the characterization, we then explore the feasibility of utilizing capacity-optimized NVMe SSDs for storing memory-hungry GNN data, which enables large-scale GNN training beyond the limits of main memory size. Given the large performance gap between DRAM and SSD, however, blindly utilizing SSDs as a direct substitute for DRAM leads to significant performance loss. We therefore develop SmartSAGE, our software/hardware co-design based on an in-storage processing (ISP) architecture. Our work demonstrates that an ISP-based large-scale GNN training system can achieve both high-capacity storage and high performance, opening up opportunities for ML practitioners to train large GNN datasets without being hampered by the physical limitations of main memory size.

Keywords: Graph neural network, Computational storage device, Near data processing, Solid state drives (SSD)
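
GraphSAGE, the algorithm SmartSAGE characterizes, builds each node's representation from a sampled subset of its neighbors. The sketch below shows one layer with the mean aggregator and uniform neighbor sampling, in plain Python with hand-written weights; it is a simplified illustration, not SmartSAGE's or any framework's implementation, and all function names are hypothetical. The concatenate-then-multiply form of the aggregator is written here as two weight matrices, `w_self` and `w_neigh`, which is equivalent.

```python
import random
from typing import Dict, List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def sample_neighbors(adj: Dict[int, List[int]], v: int, fanout: int,
                     rng: random.Random) -> List[int]:
    """Uniformly sample at most `fanout` neighbors of v, without replacement."""
    nbrs = adj[v]
    return list(nbrs) if len(nbrs) <= fanout else rng.sample(nbrs, fanout)


def sage_mean_layer(adj: Dict[int, List[int]], feats: Dict[int, Vec], nodes: List[int],
                    fanout: int, w_self: List[Vec], w_neigh: List[Vec],
                    rng: random.Random) -> Dict[int, Vec]:
    """One GraphSAGE-style layer with a mean aggregator:
    h_v = ReLU(W_self x_v + W_neigh * mean(x_u for sampled neighbors u))."""
    out: Dict[int, Vec] = {}
    for v in nodes:
        sampled = sample_neighbors(adj, v, fanout, rng)
        dim = len(feats[v])
        # Gathering neighbor feature vectors is the irregular, data-heavy step;
        # for very large graphs, `feats` would no longer fit in DRAM.
        if sampled:
            mean = [sum(feats[u][i] for u in sampled) / len(sampled) for i in range(dim)]
        else:
            mean = [0.0] * dim
        out[v] = [max(0.0, dot(w_self[r], feats[v]) + dot(w_neigh[r], mean))
                  for r in range(len(w_self))]
    return out


adj = {0: [1, 2], 1: [0], 2: [0]}
feats = {0: [1.0, 0.0], 1: [0.0, 2.0], 2: [0.0, 4.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
print(sage_mean_layer(adj, feats, [0, 1], 10, identity, identity, random.Random(0)))
```

The fan-out bounds the work per node, but each sampled neighbor still triggers a feature lookup scattered across the dataset, which is the access pattern that makes simply swapping DRAM for SSD slow.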

ParaBit: Processing Parallel Bitwise Operations in NAND Flash Memory based SSDs

54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

Authors: Congming Gao, Xin Xin, Youyou Lu, Youtao Zhang, Jun Yang, Jiwu Shu. Affiliations: Tsinghua Univ, Dept Comp Sci & Technol, BNRist, Beijing, Peoples R China; Univ Pittsburgh, Dept Elect & Comp Engn, Pittsburgh, PA 15260, USA; Univ Pittsburgh, Dept Comp Sci, Pittsburgh, PA 15260, USA

ISBN: (print) 9781450385572

Processing-in-memory (PIM) and in-storage-computing (ISC) architectures have been constructed to implement computation inside memory and near storage, respectively. While these architectures effectively mitigate the overhead of moving data from memory and storage to the processor, the limited bandwidth of existing systems means they still suffer from a large data movement overhead between storage and memory, particularly when the amount of required data is large. This has become a major constraint on further improving computation efficiency in PIM and ISC architectures. In this paper, we propose ParaBit, a scheme that enables parallel bitwise operations in NAND flash storage, where the data reside. By adjusting the latching circuit control and the sequence of sensing operations, ParaBit enables in-flash bitwise operations with little or no extra hardware, which effectively reduces the overhead of data movement between storage and memory. We exploit the massive parallelism in NAND flash based SSDs to mitigate the long latency of flash operations. Our experimental results show that the proposed ParaBit design achieves significant performance improvements over the state-of-the-art PIM and ISC architectures.

Keywords: flash memory, in-storage computing, near data processing, bitwise operation

161 pages in total.

235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021