Search Results - Inner Mongolia University Library

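Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = discipline classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject "library science" or "information science", author 范并思, years 1982-2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication 计算机应用与软件, "Computer Applications and Software"; any field C++ or Basic; excluding subject Visual; years 2011-2016)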
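The field codes and operator rules above can also be assembled programmatically. The following is a minimal Python sketch, not part of the library's system: the helper names `term`, `group`, and `combine` are hypothetical, and the code only builds the query string while enforcing the uppercase-operator and single-space rules stated in the help text.

```python
# Field codes copied from the help text; everything else here is illustrative.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(code: str, value: str) -> str:
    """Return a single CODE=value clause, e.g. term("Y", "1982-2016")."""
    if code not in FIELD_CODES:
        raise ValueError(f"unknown field code: {code!r}")
    return f"{code}={value}"


def group(expression: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


def combine(first: str, *rest: tuple) -> str:
    """Join clauses with uppercase operators, one space on each side.

    Each item in rest is an (operator, clause) pair.
    """
    parts = [first]
    for operator, clause in rest:
        if operator not in OPERATORS:
            raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
        parts.append(f"{operator} {clause}")
    return " ".join(parts)


# Rebuild Example 1 from the help text.
subjects = group(combine(term("K", "图书馆学"), ("OR", term("K", "情报学"))))
example_one = combine(
    subjects,
    ("AND", term("A", "范并思")),
    ("AND", term("Y", "1982-2016")),
)
print(example_one)
```

Rejecting lowercase operators in `combine` mirrors the rule that operators must be uppercase; whether the search system itself rejects or silently ignores them is not stated in the help text.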


Search query: "Any field=Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures"

366 records in total; showing records 61-70


Towards Optimizing Energy Costs of Algorithms for Shared Memory Architectures (2010)

22nd ACM Symposium on Parallelism in Algorithms and Architectures

Authors: Korthikanti, Vijay Anand; Agha, Gul (Univ Illinois, Dept Comp Sci, Urbana, IL, USA)

ISBN: (print) 9781450300797

Energy consumption by computer systems has emerged as an important concern. However, the energy consumed in executing an algorithm cannot be inferred from its performance alone; it must be modeled explicitly. This paper analyzes energy consumption of parallel algorithms executed on shared memory multicore processors. Specifically, we develop a methodology to evaluate how energy consumption of a given parallel algorithm changes as the number of cores and their frequency is varied. We use this analysis to establish the optimal number of cores to minimize the energy consumed by the execution of a parallel algorithm for a specific problem size while satisfying a given performance requirement. We study the sensitivity of our analysis to changes in parameters such as the ratio of the power consumed by a computation step versus the power consumed in accessing memory. The results show that the relation between the problem size and the optimal number of cores is relatively unaffected for a wide range of these parameters.

Keywords: Energy; Performance; Parallel algorithms; Shared memory architectures

Brief Announcement: A Reinforcement Learning Approach for Dynamic Load-Balancing of Parallel Digital Logic Simulation (2010)

22nd ACM Symposium on Parallelism in Algorithms and Architectures

Authors: Meraji, Sina; Zhang, Wei; Tropper, Carl (McGill Univ, Sch Comp Sci, Montreal, PQ, Canada)

ISBN: (print) 9781450300797

In this paper, we present a dynamic load-balancing algorithm for parallel digital logic simulation making use of reinforcement learning. We first introduce two dynamic load-balancing algorithms oriented towards balancing the computational and communication load respectively, and then utilize reinforcement learning to create an algorithm which is a combination of the first two algorithms. In addition, the algorithm determines the value of two important parameters: the number of processors which participate in the algorithm and the load which is exchanged during its execution. We investigate the algorithms on gate-level simulations of several open source VLSI circuits.

Keywords: Digital logic simulation; Dynamic load-balancing; Reinforcement learning; Time Warp; Verilog

Low Depth Cache-Oblivious Algorithms (2010)

22nd ACM Symposium on Parallelism in Algorithms and Architectures

Authors: Blelloch, Guy E.; Gibbons, Phillip B.; Simhadri, Harsha Vardhan (Carnegie Mellon Univ, Pittsburgh, PA 15213, USA)

ISBN: (print) 9781450300797

In this paper we explore a simple and general approach for developing parallel algorithms that lead to good cache complexity on parallel machines with private or shared caches. The approach is to design nested-parallel algorithms that have low depth (span, critical path length) and for which the natural sequential evaluation order has low cache complexity in the cache-oblivious model. We describe several cache-oblivious algorithms with optimal work, polylogarithmic depth, and sequential cache complexities that match the best sequential algorithms, including the first such algorithms for sorting and for sparse-matrix vector multiply on matrices with good vertex separators. Using known mappings, our results lead to low cache complexities on shared-memory multiprocessors with a single level of private caches or a single shared cache. We generalize these mappings to multi-level cache hierarchies of private or shared caches, implying that our algorithms also have low cache complexities on such hierarchies. The key factor in obtaining these low parallel cache complexities is the low depth of the algorithms we propose.

Keywords: Cache-oblivious algorithms; sorting; sparse-matrix vector multiply; graph algorithms; parallel algorithms; multiprocessors; schedulers

SPAA'10 - Proceedings of the 22nd Annual Symposium on Parallelism in Algorithms and Architectures

22nd ACM Symposium on Parallelism in Algorithms and Architectures, SPAA'10

ISBN: (print) 9781450300797

The proceedings contain 45 papers. The topics discussed include: buffer-space efficient and deadlock-free scheduling of stream applications on multi-core architectures; scheduling to minimize power consumption using submodular functions; collaborative scoring with dishonest participants; securing every bit: authenticated broadcast in radio networks; brief announcement: on speculative replication of transactional systems; data-aware scheduling of legacy kernels on heterogeneous platforms with distributed memory; basic network creation games; on the bit communication complexity of randomized rumor spreading; algorithms and application for grids and clouds; towards optimizing energy costs of algorithms for shared memory architectures; brief announcement: on regenerator placement problems in optical networks; best-effort group service in dynamic networks; and implementing and evaluating nested parallel transactions in software transactional memory.

Brief Announcement: Serial-Parallel Reciprocity in Dynamic Multithreaded Languages (2010)

22nd ACM Symposium on Parallelism in Algorithms and Architectures

Authors: Agrawal, Kunal; Lee, I-Ting Angelina; Sukha, Jim (Washington Univ, St Louis, MO 63130, USA)

ISBN: (print) 9781450300797

In dynamically multithreaded platforms that employ work stealing, there appears to be a fundamental tradeoff between providing provably good time and space bounds and supporting SP-reciprocity, the property of allowing arbitrary calling between parallel and serial code, including legacy serial binaries. Many known dynamically multithreaded platforms either fail to support SP-reciprocity or sacrifice on the provable time and space bounds that an efficient work-stealing scheduler could otherwise guarantee. We describe PR-Cilk, a design of a runtime system that supports SP-reciprocity in Cilk and provides provable bounds on time and space. In order to maintain the space bound, PR-Cilk uses subtree-restricted work stealing. We show that with subtree-restricted work stealing, PR-Cilk provides the same guarantee on stack space usage as ordinary Cilk. The completion time guaranteed by PR-Cilk is slightly worse than ordinary Cilk. Nevertheless, if the number of times a C function calls a Cilk function is small, or if each Cilk function called by a C function is sufficiently parallel, PR-Cilk still guarantees linear speedup.

Keywords: Cilk; dynamic multithreading; Intel Threading Building Blocks; scheduling; work stealing; serial-parallel reciprocity

Implementing and Evaluating Nested Parallel Transactions in Software Transactional Memory (2010)

22nd ACM Symposium on Parallelism in Algorithms and Architectures

Authors: Baek, Woongki; Bronson, Nathan; Kozyrakis, Christos; Olukotun, Kunle (Stanford Univ, Comp Syst Lab, Stanford, CA 94305, USA)

ISBN: (print) 9781450300797

Transactional Memory (TM) is a promising technique that simplifies parallel programming for shared-memory applications. To date, most TM systems have been designed to efficiently support single-level parallelism. To achieve widespread use and maximize performance gains, TM must support nested parallelism available in many applications and supported by several programming models. We present NesTM, a software TM (STM) system that supports closed-nested parallel transactions. NesTM is based on a high-performance, blocking STM that uses eager version management and word-granularity conflict detection. Its algorithm targets the state and runtime overheads of nested parallel transactions. We also describe several subtle correctness issues in supporting nested parallel transactions in NesTM and discuss their performance impact. Through our evaluation, we quantitatively analyze the performance of NesTM using STAMP applications and microbenchmarks based on concurrent data structures. First, we show that the performance overhead of NesTM is reasonable when single-level parallelism is used. Second, we quantify the incremental overhead of NesTM when the parallelism is exploited in deeper nesting levels and draw conclusions that can be useful in designing a nesting-aware TM runtime environment. Finally, we demonstrate a use-case where nested parallelism improves the performance of a transactional microbenchmark.

Keywords: Transactional memory; Nested parallelism; Parallel programming

New Algorithms for Efficient Parallel String Comparison (2010)

22nd ACM Symposium on Parallelism in Algorithms and Architectures

Authors: Krusche, Peter; Tiskin, Alexander (Univ Warwick, Dept Comp Sci, Coventry CV4 7AL, W Midlands, England)

ISBN: (print) 9781450300797

In this paper, we show new parallel algorithms for a set of classical string comparison problems: computation of string alignments, longest common subsequences (LCS) or edit distances, and longest increasing subsequence computation. These problems have a wide range of applications, in particular in computational biology and signal processing. We discuss the scalability of our new parallel algorithms in computation time, in memory, and in communication. Our new algorithms are based on an efficient parallel method for (min, +)-multiplication of distance matrices. The core result of this paper is a scalable parallel algorithm for multiplying implicit simple unit-Monge matrices of size $n \times n$ on $p$ processors using time $O(n \log n / p)$, communication $O(n \log p / p)$ and $O(\log p)$ supersteps. This algorithm allows us to implement scalable LCS computation for two strings of length $n$ using time $O(n^2/p)$ and communication $O(n/\sqrt{p})$, requiring local memory of size $O(n/\sqrt{p})$ on each processor. Furthermore, our algorithm can be used to obtain the first generally work-scalable algorithm for computing the longest increasing subsequence (LIS). Our algorithm for LIS computation requires computation $O(n \log^2 n / p)$, communication $O(n \log^2 p / p)$, and $O(\log^2 p)$ supersteps for computing the LIS of a sequence of length $n$. This is within a $\log n$ factor of work-optimality for the LIS problem, which can be solved sequentially in time $O(n \log n)$ in the comparison-based model. Our LIS algorithm is also within a $\log p$ factor of achieving perfectly scalable communication and furthermore has perfectly scalable memory size requirements of $O(n/p)$ per processor.

Keywords: BSP algorithms; longest common subsequences; longest increasing subsequences
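For reference, the sequential $O(n \log n)$ bound for LIS that the abstract uses as its work-optimality baseline can be met with the standard binary-search technique. The sketch below is that textbook sequential method, not the parallel BSP algorithm of the paper; the function name `lis_length` is chosen here for illustration.

```python
from bisect import bisect_left


def lis_length(sequence):
    """Length of the longest strictly increasing subsequence in O(n log n)."""
    # tails[k] holds the smallest possible last element of an increasing
    # subsequence of length k + 1 found so far; tails is always sorted.
    tails = []
    for value in sequence:
        position = bisect_left(tails, value)  # binary search, O(log n)
        if position == len(tails):
            tails.append(value)      # value extends the longest subsequence
        else:
            tails[position] = value  # value is a smaller tail for this length
    return len(tails)


print(lis_length([3, 1, 4, 1, 5, 9, 2, 6]))  # prints 4
```

Each element costs one binary search over `tails`, which gives the $O(n \log n)$ total.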

The Cilkview Scalability Analyzer (2010)

22nd ACM Symposium on Parallelism in Algorithms and Architectures

Authors: He, Yuxiong; Leiserson, Charles E.; Leiserson, William M. (Microsoft Res, Redmond, WA, USA)

ISBN: (print) 9781450300797

The Cilkview scalability analyzer is a software tool for profiling, estimating scalability, and benchmarking multithreaded Cilk++ applications. Cilkview monitors logical parallelism during an instrumented execution of the Cilk++ application on a single processing core. As Cilkview executes, it analyzes logical dependencies within the computation to determine its work and span (critical-path length). These metrics allow Cilkview to estimate parallelism and predict how the application will scale with the number of processing cores. In addition, Cilkview analyzes scheduling overhead using the concept of a "burdened dag," which allows it to diagnose performance problems in the application due to an insufficient grain size of parallel subcomputations. Cilkview employs the Pin dynamic-instrumentation framework to collect metrics during a serial execution of the application code. It operates directly on the optimized code rather than on a debug version. Metadata embedded by the Cilk++ compiler in the binary executable identifies the parallel control constructs in the executing application. This approach introduces little or no overhead to the program binary in normal runs. Cilkview can perform real-time scalability benchmarking automatically, producing gnuplot-compatible output that allows developers to compare an application's performance with the tool's predictions. If the program performs beneath the range of expectation, the programmer can be confident in seeking a cause such as insufficient memory bandwidth, false sharing, or contention, rather than inadequate parallelism or insufficient grain size.

Keywords: Burdened parallelism; Cilk plus; Cilkview; dag model; multicore programming; multithreading; parallelism; parallel programming; performance; scalability; software tools; span; speedup; work
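The work and span metrics named in the abstract can be illustrated on a small task dag. The sketch below is not Cilkview and does not model its burdened dag or scheduling overhead. It only computes work (total cost), span (longest dependency chain), and the resulting parallelism, work divided by span, for a hypothetical fork-join dag whose task names and costs are made up for the example.

```python
from functools import lru_cache


def work_and_span(costs, successors):
    """Return (work, span) for a dag of tasks.

    costs: dict mapping task name -> execution cost
    successors: dict mapping task name -> list of tasks that depend on it
    """
    work = sum(costs.values())  # total cost of all tasks

    @lru_cache(maxsize=None)
    def longest_path_from(task):
        # Cost of the most expensive chain of dependent tasks starting here.
        tail = max(
            (longest_path_from(nxt) for nxt in successors.get(task, ())),
            default=0,
        )
        return costs[task] + tail

    span = max((longest_path_from(task) for task in costs), default=0)
    return work, span


def speedup_upper_bound(work, span, cores):
    """Speedup on `cores` cores cannot exceed min(cores, work / span)."""
    return min(cores, work / span)


# Hypothetical dag: "fork" spawns three subtasks that all join at "join".
costs = {"fork": 1, "a": 4, "b": 2, "c": 2, "join": 1}
successors = {"fork": ["a", "b", "c"], "a": ["join"], "b": ["join"], "c": ["join"]}

work, span = work_and_span(costs, successors)
print(work, span, work / span)
```

Here the work is 10 and the span is 6, so the parallelism is below 2 and adding cores beyond two cannot help. This is the kind of "inadequate parallelism" diagnosis the abstract contrasts with memory-bandwidth or contention problems.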

A Work-Efficient Parallel Breadth-First Search Algorithm (or How to Cope with the Nondeterminism of Reducers) (2010)

22nd ACM Symposium on Parallelism in Algorithms and Architectures

Authors: Leiserson, Charles E.; Schardl, Tao B. (MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139, USA)

ISBN: (print) 9781450300797

We have developed a multithreaded implementation of breadth-first search (BFS) of a sparse graph using the Cilk++ extensions to C++. Our PBFS program on a single processor runs as quickly as a standard C++ breadth-first search implementation. PBFS achieves high work-efficiency by using a novel implementation of a multiset data structure, called a "bag," in place of the FIFO queue usually employed in serial breadth-first search algorithms. For a variety of benchmark input graphs whose diameters are significantly smaller than the number of vertices, a condition met by many real-world graphs, PBFS demonstrates good speedup with the number of processing cores. Since PBFS employs a nonconstant-time "reducer" (a "hyperobject" feature of Cilk++), the work inherent in a PBFS execution depends nondeterministically on how the underlying work-stealing scheduler load-balances the computation. We provide a general method for analyzing nondeterministic programs that use reducers. PBFS also is nondeterministic in that it contains benign races which affect its performance but not its correctness. Fixing these races with mutual-exclusion locks slows down PBFS empirically, but it makes the algorithm amenable to analysis. In particular, we show that for a graph $G = (V, E)$ with diameter $D$ and bounded out-degree, this data-race-free version of the PBFS algorithm runs in time $O((V + E)/P + D \lg^3(V/D))$ on $P$ processors, which means that it attains near-perfect linear speedup if $P \ll (V + E)/(D \lg^3(V/D))$.

Keywords: Breadth-first search; Cilk; graph algorithms; hyperobjects; multithreading; nondeterminism; parallel algorithms; reducers; work-stealing
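The serial baseline that the abstract compares against, breadth-first search driven by a FIFO queue, can be sketched as follows. This is the standard serial algorithm, shown only for orientation; it is not PBFS, and the "bag" data structure and Cilk++ reducers from the paper are not reproduced here. The graph and the function name `bfs_distances` are illustrative.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Serial BFS; returns the hop distance from source to each reachable vertex."""
    distances = {source: 0}
    queue = deque([source])  # FIFO queue of discovered, not yet expanded vertices
    while queue:
        vertex = queue.popleft()
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in distances:  # first discovery is the shortest distance
                distances[neighbor] = distances[vertex] + 1
                queue.append(neighbor)
    return distances


# Small illustrative directed graph given as adjacency lists.
graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
print(bfs_distances(graph, 0))
```

Every vertex is enqueued once and every edge is examined once, which is the $O(V + E)$ work that a work-efficient parallel version has to match on a single processor.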

Brief Announcement: Performance Potential of an Easy-to-Program PRAM-On-Chip Prototype Versus State-of-the-Art Processor (2009)

21st ACM Symposium on Parallelism in Algorithms and Architectures

Authors: Caragea, George C.; Saybasili, A. Beliz; Wen, Xingzhi; Vishkin, Uzi (Univ Maryland, College Pk, MD 20742, USA)

ISBN: (print) 9781605586069

We compare the Paraleap FPGA computer, a 64-processor hardware prototype of the PRAM-driven XMT architecture, with an Intel Core 2 Duo processor and show that Paraleap outperforms the Intel processor by up to 13.89x in terms of cycle counts. The comparison favors the Intel design, since the silicon area of an ASIC implementation of the 64-processor XMT design is the same as that of a single core.

Keywords: Paraleap; parallel algorithms; XMT; PRAM; explicit multi-threading; ease of programming; on-chip parallel processor


235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
