Search Results - Inner Mongolia University Library


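Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = institution (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, year of degree, year a standard was issued)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016

(In these examples, 图书馆学 is "library science", 情报学 is "information science", 范并思 is an author name, and 计算机应用与软件 is a journal title.)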
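The search rules can be illustrated with a small sketch that composes expressions in this syntax. The helper names `term`, `combine`, and `group` are hypothetical and are not part of the library's system; the sketch only builds strings that follow the stated rules (uppercase operators, one space on each side) and makes no assumption about how the server evaluates operator precedence.

```python
# Hypothetical helpers for composing advanced-search expressions.
# Field codes and operators are taken from the help text; everything
# else (function names, validation) is an assumption for illustration.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Return a single FIELD=value term, e.g. term("Y", "1982-2016")."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(operator: str, *parts: str) -> str:
    """Join two or more sub-expressions with an uppercase operator,
    padded with exactly one space on each side."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    if len(parts) < 2:
        raise ValueError("need at least two sub-expressions")
    return f" {operator} ".join(parts)


def group(expression: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


# Example 1 from the help text.
example_1 = combine(
    "AND",
    group(combine("OR", term("K", "图书馆学"), term("K", "情报学"))),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)

# Example 2 from the help text, built left to right.
example_2 = combine(
    "AND",
    combine(
        "NOT",
        combine(
            "AND",
            term("P", "计算机应用与软件"),
            group(combine("OR", term("U", "C++"), term("U", "Basic"))),
        ),
        term("K", "Visual"),
    ),
    term("Y", "2011-2016"),
)

print(example_1)
print(example_2)
```

Both printed strings reproduce the two examples from the help text character for character.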


Search condition: "Any field = Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures"

366 records in total; showing records 71-80


Brief Announcement: Low Depth Cache-Oblivious Sorting

21st ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '09)

Authors: Blelloch, Guy E.; Gibbons, Phillip B.; Simhadri, Harsha Vardhan

Affiliation: Carnegie Mellon Univ, Pittsburgh, PA 15213, USA

ISBN: (print) 9781605586069

Cache-oblivious algorithms have the advantage of achieving good sequential cache complexity across all levels of a multi-level cache hierarchy, regardless of the specifics (cache size and cache line size) of each level. In this paper, we describe cache-oblivious sorting algorithms with optimal work, optimal cache complexity and polylogarithmic depth. Using known mappings, these lead to low cache complexities on shared-memory multiprocessors with a single level of private caches or a single shared cache. Moreover, the low cache complexities extend to shared-memory multiprocessors with common configurations of multi-level caches. The key factor in the low cache complexity on multiprocessors is the low depth of the algorithms we propose.

Keywords: cache-oblivious algorithms; sorting; parallel algorithms; multiprocessors; schedulers


SPAA'09 - Proceedings of the 21st Annual Symposium on Parallelism in Algorithms and Architectures

21st Annual Symposium on Parallelism in Algorithms and Architectures, SPAA'09

ISBN: (print) 9781605586069

The proceedings contain 43 papers. The topics discussed include: speed scaling of processes with arbitrary speedup curves on a multiprocessor; the bell is ringing in speed-scaled multiprocessor scheduling; mapping filtering streaming applications with communication costs; scheduling to minimize staleness and stretch in real-time data warehouses; parameterized maximum and average degree approximation in topic-based publish-subscribe overlay network design; selfishness in transactional memory; at-most-once semantics in asynchronous shared memory; memory models: a case for rethinking parallel languages and hardware; the life and times of a ZooKeeper; Cassandra - a structured storage system on a P2P network; Pregel: a system for large-scale graph processing; towards transactional memory semantics for C++; on avoiding spare aborts in transactional memory; inherent limitations on disjoint-access parallel implementations of transactional memory; reducers and other Cilk++ hyperobjects; and beyond nested parallelism: tight bounds on work-stealing overheads for parallel futures.


Inherent Limitations on Disjoint-Access Parallel Implementations of Transactional Memory (Preliminary Version)

21st ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '09)

Authors: Attiya, Hagit; Hillel, Eshcar; Milani, Alessia

Affiliation: Technion Israel Inst Technol, Dept Comp Sci, IL-32000 Haifa, Israel

ISBN: (print) 9781605586069

Transactional memory (TM) is a promising approach for designing concurrent data structures, and it is essential to develop a better understanding of the formal properties that can be achieved by TM implementations. Two fundamental properties of TM implementations are disjoint-access parallelism, which is critical for their scalability, and the invisibility of read operations, which reduces memory contention. This paper proves an inherent tradeoff for implementations of transactional memories: they cannot be both disjoint-access parallel and have read-only transactions that are invisible and always terminate successfully. In fact, a lower bound of $\Omega(t)$ is proved on the number of writes needed in order to implement a read-only transaction of $t$ items that successfully terminates in a disjoint-access parallel TM implementation. The results assume strict serializability and thus hold under the assumption of opacity. It is shown how to extend the results to hold also for weaker consistency conditions, serializability and snapshot isolation.

Keywords: transactional memory; disjoint-access parallelism; partial snapshots; lower bound; impossibility result


Parallel Sparse Matrix-Vector and Matrix-Transpose-Vector Multiplication Using Compressed Sparse Blocks

21st ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '09)

Authors: Buluc, Aydin; Fineman, Jeremy T.; Frigo, Matteo; Gilbert, John R.; Leiserson, Charles E.

Affiliation: Univ Calif Santa Barbara, Dept Comp Sci, Santa Barbara, CA 93106, USA

ISBN: (print) 9781605586069

This paper introduces a storage format for sparse matrices, called compressed sparse blocks (CSB), which allows both $Ax$ and $A^{T}x$ to be computed efficiently in parallel, where $A$ is an $n \times n$ sparse matrix with $\mathrm{nnz} \ge n$ nonzeros and $x$ is a dense $n$-vector. Our algorithms use $\Theta(\mathrm{nnz})$ work (serial running time) and $\Theta(\sqrt{n}\,\lg n)$ span (critical-path length), yielding a parallelism of $\Theta(\mathrm{nnz}/(\sqrt{n}\,\lg n))$, which is amply high for virtually any large matrix. The storage requirement for CSB is essentially the same as that for the more-standard compressed-sparse-rows (CSR) format, for which computing $Ax$ in parallel is easy but $A^{T}x$ is difficult. Benchmark results indicate that on one processor, the CSB algorithms for $Ax$ and $A^{T}x$ run just as fast as the CSR algorithm for $Ax$, but the CSB algorithms also scale up linearly with processors until limited by off-chip memory bandwidth.

Keywords: compressed sparse blocks; compressed sparse columns; compressed sparse rows; matrix transpose; matrix-vector multiplication; multithreaded algorithm; parallelism; span; sparse matrix; storage format; work


Field-Split Parallel Architecture for High Performance Multi-Match Packet Classification Using FPGAs

21st ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '09)

Authors: Jiang, Weirong; Prasanna, Viktor K.

Affiliation: Univ So Calif, Ming Hsieh Dept Elect Engn, Los Angeles, CA 90089, USA

ISBN: (print) 9781605586069

Multi-match packet classification is a critical function in network intrusion detection systems (NIDS), where all matching rules for a packet need to be reported. Most of the previous work is based on ternary content addressable memories (TCAMs) which are expensive and are not scalable with respect to clock rate, power consumption, and circuit area. This paper studies the characteristics of real-life Snort NIDS rule sets, and proposes a novel SRAM-based architecture. The proposed architecture is called field-split parallel bit vector (FSBV) where some header fields of a packet are further split into bit-level subfields. Unlike previous multi-match packet classification algorithms which suffer from memory explosion, the memory requirement of FSBV is linear in the number of rules. FPGA technology is exploited to provide high throughput and to support dynamic updates. Implementation results show that our architecture can store on a single Xilinx Virtex-5 FPGA the full set of packet header rules extracted from the latest Snort NIDS and sustains 100 Gbps throughput for minimum size (40 bytes) packets. The design achieves 1.25x improvement in throughput while the power consumption is approximately one fourth that of the state-of-the-art solutions.

Keywords: FPGA; multi-match packet classification; NIDS; SRAM


Beyond Nested Parallelism: Tight Bounds on Work-Stealing Overheads for Parallel Futures

21st ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '09)

Authors: Spoonhower, Daniel; Blelloch, Guy E.; Gibbons, Phillip B.; Harper, Robert

Affiliation: Carnegie Mellon Univ, Pittsburgh, PA 15213, USA

ISBN: (print) 9781605586069

Work stealing is a popular method of scheduling fine-grained parallel tasks. The performance of work stealing has been extensively studied, both theoretically and empirically, but primarily for the restricted class of nested-parallel (or fully strict) computations. We extend this prior work by considering a broader class of programs that also supports pipelined parallelism through the use of parallel futures. Though the overhead of work-stealing schedulers is often quantified in terms of the number of steals, we show that a broader metric, the number of deviations, is a better way to quantify work-stealing overhead for less restrictive forms of parallelism, including parallel futures. For such parallelism, we prove bounds on work-stealing overheads (scheduler time and cache misses) as a function of the number of deviations. Deviations can occur, for example, when work is stolen or when a future is touched. We also show instances where deviations can occur independently of steals and touches. Next, we prove that, under work stealing, the expected number of deviations is $O(Pd + td)$ in a $P$-processor execution of a computation with span $d$ and $t$ touches of futures. Moreover, this bound is existentially tight for any work-stealing scheduler that is parsimonious (those where processors steal only when their queues are empty); this class includes all prior work-stealing schedulers. We also present empirical measurements of the number of deviations incurred by a classic application of futures, Halstead's quicksort, using our parallel implementation of ML. Finally, we identify a family of applications that use futures and, in contrast to quicksort, incur significantly smaller overheads.

Keywords: scheduling; work stealing; futures; performance bounds


Communication-Optimal Parallel and Sequential Cholesky Decomposition

21st ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '09)

Authors: Ballard, Grey; Demmel, James; Holtz, Olga; Schwartz, Oded

Affiliation: Univ Calif Berkeley, Dept Comp Sci, Berkeley, CA 94720, USA

ISBN: (print) 9781605586069

Numerical algorithms have two kinds of costs: arithmetic and communication, by which we mean either moving data between levels of a memory hierarchy (in the sequential case) or over a network connecting processors (in the parallel case). Communication costs often dominate arithmetic costs, so it is of interest to design algorithms minimizing communication. In this paper we first extend known lower bounds on the communication cost (both for bandwidth and for latency) of conventional ($O(n^3)$) matrix multiplication to Cholesky factorization, which is used for solving dense symmetric positive definite linear systems. Second, we compare the cost of various Cholesky decomposition implementations to this lower bound, and draw the following conclusions: (1) "Naive" sequential algorithms for Cholesky attain neither the bandwidth nor latency lower bounds. (2) The sequential blocked algorithm in LAPACK (with the right block size), as well as various recursive algorithms [AP00, GJ01, ACW01, ST04] and one based on work of Toledo [Tol97], can attain the bandwidth lower bound. (3) The LAPACK algorithm can also attain the latency bound if used with blocked data structures rather than column-wise or row-wise matrix data structures, though the Toledo algorithm cannot. (4) The recursive sequential algorithm due to [AP00] attains the bandwidth and latency lower bounds at every level of a multi-level memory hierarchy, in a "cache-oblivious" way. (5) The parallel implementation of Cholesky in the ScaLAPACK library (again with the right block size) attains both the bandwidth and latency lower bounds to within a poly-logarithmic factor. Combined with prior results in [DGHL08a, DGHL08b, DGX08] this gives a complete set of communication-optimal algorithms for $O(n^3)$ implementations of three basic factorizations of dense linear algebra: LU with pivoting, QR and Cholesky. But it goes beyond this prior work on sequential LU and QR by optimizing communication for any number of levels of memo

Keywords: communication avoiding; Cholesky decomposition; lower bound; bandwidth; latency; algorithm


Brief Announcement: Optimal Speedup on a Low-Degree Multi-Core Parallel Architecture (LoPRAM)

20th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '08)

Authors: Dorrigiv, Reza; Lopez-Ortiz, Alejandro; Salinger, Alejandro

Affiliation: Univ Waterloo, Sch Comp Sci, Waterloo, ON N2L 3G1, Canada

ISBN: (print) 9781595939739

Over the last five years, major microprocessor manufacturers have released plans for a rapidly increasing number of cores per microprocessor, with upwards of 64 cores by 2015. In this setting, a sequential RAM computer will no longer accurately reflect the architecture on which algorithms are being executed. In this paper we propose a model of low degree parallelism (LoPRAM) which builds upon the RAM and PRAM models yet better reflects recent advances in parallel (multi-core) architectures. This model supports a high level of abstraction that simplifies the design and analysis of parallel programs. More importantly, we show that in many instances it naturally leads to work-optimal parallel algorithms via simple modifications to sequential algorithms.

Keywords: Algorithms; Theory


Fundamental Parallel Algorithms for Private-Cache Chip Multiprocessors

20th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '08)

Authors: Arge, Lars; Goodrich, Michael T.; Nelson, Michael; Sitchinava, Nodari

Affiliations: Univ Aarhus, MADALGO, Aarhus, Denmark; Univ Calif Irvine, Irvine, CA 92697, USA

ISBN: (print) 9781595939739

In this paper, we study parallel algorithms for private-cache chip multiprocessors (CMPs), focusing on methods for foundational problems that are scalable with the number of cores. By focusing on private-cache CMPs, we show that we can design efficient algorithms that need no additional assumptions about the way cores are interconnected, for we assume that all inter-processor communication occurs through the memory hierarchy. We study several fundamental problems, including prefix sums, selection, and sorting, which often form the building blocks of other parallel algorithms. Indeed, we present two sorting algorithms, a distribution sort and a mergesort. Our algorithms are asymptotically optimal in terms of parallel cache accesses and space complexity under reasonable assumptions about the relationships between the number of processors, the size of memory, and the size of cache blocks. In addition, we study sorting lower bounds in a computational model, which we call the parallel external-memory (PEM) model, that formalizes the essential properties of our algorithms for private-cache CMPs.

Keywords: parallel external memory; PEM; private-cache CMP


Directed Transmission Method, a Fully Asynchronous Approach to Solve Sparse Linear Systems in Parallel

20th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '08)

Authors: Wei, Fei; Yang, Huazhong

Affiliation: Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China

ISBN: (print) 9781595939739

There are many algorithms to solve large sparse linear systems in parallel; however, most of them require synchronization and thus lack scalability. In this paper, we propose a new distributed numerical algorithm, called Directed Transmission Method (DTM). DTM is a fully asynchronous, scalable and continuous-time iterative algorithm to solve arbitrarily large sparse linear systems whose coefficient matrix is symmetric positive definite (SPD). DTM can run freely on a heterogeneous parallel computer with an arbitrary number of processors, which might be manycore microprocessors, clusters, grids, clouds, and the Internet. We proved that DTM is convergent by making use of the final value theorem of the Laplace transform. Numerical experiments show that DTM is efficient.

Keywords: Asynchronous Algorithm; Convergence Theory; Directed Transmission Method (DTM); Distributed Algorithm; Sparse Linear System; Virtual Transmission Method (VTM)


Address: 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
