ISBN: 9798400704161 (Print)
Dynamic trees are a well-studied and fundamental building block of dynamic graph algorithms, dating back to the seminal work of Sleator and Tarjan [STOC'81, (1981), pp. 114-122]. The problem is to maintain a tree subject to online edge insertions and deletions while answering queries about the tree, such as the heaviest weight on a path. In the parallel batch-dynamic setting, the goal is to process batches of edge updates work-efficiently in low (polylog n) span. Two work-efficient algorithms are known: batch-parallel Euler Tour Trees by Tseng et al. [ALENEX'19, (2019), pp. 92-106] and parallel Rake-Compress (RC) Trees by Acar et al. [ESA'20, (2020), pp. 2:1-2:23]. Both, however, are randomized and work-efficient only in expectation. Several downstream results that use these data structures (and indeed, to the best of our knowledge, all known work-efficient parallel batch-dynamic graph algorithms) are therefore also randomized. In this work, we give the first deterministic work-efficient solution to the problem. Our algorithm maintains a parallel RC-Tree on n vertices subject to batches of k edge updates deterministically in worst-case O(k log(1 + n/k)) work and O(log n log log k) span on the Common-CRCW PRAM. We also show how to improve the span of the randomized algorithm from O(log n log* n) to O(log n). Lastly, as a result of our new deterministic algorithm, we also derandomize several downstream results that make use of parallel batch-dynamic trees, for which previously the only efficient solutions were randomized.
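To make the batch-dynamic interface concrete, here is a naive sequential baseline (all names illustrative, not from the paper): it supports batch link, batch cut, and connectivity queries by simply rebuilding a union-find structure after every batch, costing O(n + k) work per batch rather than the paper's O(k log(1 + n/k)). It shows what the data structure must support, not how the RC-Tree algorithm achieves it.

```python
# Naive baseline for the batch-dynamic forest interface. A real
# batch-parallel RC-Tree or Euler Tour Tree avoids the full rebuild;
# this sketch only fixes the semantics of batch_link/batch_cut/connected.

class NaiveBatchForest:
    def __init__(self, n):
        self.n = n
        self.edges = set()
        self._rebuild()

    def _rebuild(self):
        # recompute connected components from scratch via union-find
        self.parent = list(range(self.n))
        for u, v in self.edges:
            self._union(u, v)

    def _find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def _union(self, u, v):
        ru, rv = self._find(u), self._find(v)
        if ru != rv:
            self.parent[ru] = rv

    def batch_link(self, batch):
        self.edges.update(tuple(sorted(e)) for e in batch)
        self._rebuild()

    def batch_cut(self, batch):
        self.edges.difference_update(tuple(sorted(e)) for e in batch)
        self._rebuild()

    def connected(self, u, v):
        return self._find(u) == self._find(v)
```

For example, after `batch_link([(0, 1), (1, 2)])` the query `connected(0, 2)` is true, and it becomes false again after `batch_cut([(1, 2)])`.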
ISBN: 9780897918091 (Print)
We study the well-known problem of throwing m balls into n bins. If each ball in the sequential game is allowed to select more than one bin, the maximum load of the bins can be exponentially reduced compared to the classical 'balls into bins' game. We consider a static and a dynamic variant of a randomized parallel allocation in which each ball can choose a constant number of bins. All results hold with high probability. In the static case, all m balls arrive at the same time. For m = n we analyze a very simple optimal class of protocols achieving maximum load O((log n / log log n)^{1/r}) if r rounds of communication are allowed. This matches the lower bound of [ACMR95]. Furthermore, we generalize the protocols to the case of m > n balls. An optimal load of O(m/n) can be achieved using log log n / log(m/n) rounds of communication. Hence, for m = n log log n / log log log n balls this slack allows us to hide the amount of communication. In the classical 'balls into bins' game this optimal distribution can only be achieved for m = n log n. In the dynamic variant, n of the m balls arrive at the same time and have to be allocated. Each of these initial n balls has a list of m/n successor balls. As soon as a ball is allocated, its successor is processed. We present an optimal parallel process that allocates all m = n log n balls in O(m/n) rounds. Hence, the expected allocation time per ball is constant. The main contribution of this process is that the maximum allocation time is additionally bounded by O(log log n).
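The load reduction from multiple choices is easy to see in a sequential simulation (a sketch of the phenomenon only; the paper's contribution is coordinating the choices in r parallel communication rounds, which this toy ignores):

```python
import random

def throw_balls(m, n, d, rng):
    """Throw m balls into n bins; each ball samples d bins uniformly
    and joins the currently least-loaded of them (d = 1 is the
    classical game)."""
    load = [0] * n
    for _ in range(m):
        choices = [rng.randrange(n) for _ in range(d)]
        best = min(choices, key=lambda b: load[b])
        load[best] += 1
    return load

rng = random.Random(0)  # fixed seed for a reproducible illustration
one = max(throw_balls(10_000, 10_000, 1, rng))  # max load, one choice
two = max(throw_balls(10_000, 10_000, 2, rng))  # max load, two choices
```

With m = n, a single choice gives a maximum load of about log n / log log n, while two choices drop it to about log log n; the run above with a fixed seed exhibits the gap.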
A new block algorithm for the triangularization of regular or singular matrices of dimension m × n is proposed. Taking advantage of fast block multiplication algorithms, it achieves the best known sequential complexity O(m^(ω−1) n) for any sizes and any rank. Moreover, the block strategy improves locality with respect to previous algorithms, as exhibited by its practical performance.
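As a rough point of reference (not the block algorithm of the paper, whose setting is exact block arithmetic reduced to fast matrix multiplication), here is the classical unblocked elimination with partial pivoting that such block methods refine; the numerical zero tolerance is an assumption of this sketch:

```python
def triangularize(A):
    """Row-reduce an m x n matrix (list of lists of floats) to
    upper-triangular form in place and return its rank. Handles
    singular / rectangular input by skipping dead columns."""
    m, n = len(A), len(A[0])
    rank = 0
    for col in range(n):
        if rank == m:
            break
        # partial pivoting: largest entry in this column, at or below 'rank'
        piv = max(range(rank, m), key=lambda r: abs(A[r][col]))
        if abs(A[piv][col]) < 1e-12:
            continue  # column is (numerically) zero below the pivot row
        A[rank], A[piv] = A[piv], A[rank]
        for r in range(rank + 1, m):
            f = A[r][col] / A[rank][col]
            for c in range(col, n):
                A[r][c] -= f * A[rank][c]
        rank += 1
    return rank
```

This does O(m n^2) arithmetic; the point of the blocked variant is to cast the bulk of that work as matrix multiplications, bringing the exponent down to ω.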
ISBN: 1595934529 (Print)
Hydra PPS is a collection of annotations, classes, a run-time, and a compiler designed to provide Java programmers with a fairly simple method of producing programs for Symmetric Multiprocessing (SMP) architectures. This paper introduces the basics of this new system, including the basic constructs of the new programming language and the relationship between the Java VM, the compiler, the runtime, and the parallel program. Hydra exploits parallelism when the underlying architecture supports it and runs as a normal sequential Java program when the architecture has no support for parallelism. Parallelism is expressed through events in Hydra; it is easy to use, and programs run efficiently on parallel architectures.
ISBN: 089791483X (Print)
A straight-line grid embedding of a planar graph is a drawing of the graph on a plane where the vertices are located at grid points and the edges are represented by nonintersecting segments of straight lines joining their incident vertices. Given an n-vertex planar graph with n ≥ 3, a straight-line embedding on a grid of size (n − 2) × (n − 2) can be computed deterministically in O(log n log log n) time with O(n log log n) work on a parallel random access machine. If randomization is used, the complexity is improved to O(log n) expected time with the same work bound. The parallel random access machine used by these algorithms allows concurrent reads and concurrent writes of the shared memory; in case of a write conflict, an arbitrary processor succeeds.
We implemented and measured several methods to perform BMMC permutations on the MasPar MP-2. Our results indicate that, except for certain types of permutations or very high virtual-processor ratios, the best method overall is the naive method, but with virtual-processor numbers computed in Gray-code order. For some permutations, however, the naive method performs very poorly; the best method in these cases is an adaptation of the block BMMC algorithm for parallel disk systems, in which the processor elements are treated as independent devices.
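The Gray-code ordering referred to above is the standard binary-reflected Gray code, under which consecutive virtual-processor numbers differ in exactly one bit; nothing in this sketch is specific to the MP-2:

```python
def gray(i):
    """Binary-reflected Gray code: consecutive values of i map to
    codes that differ in exactly one bit."""
    return i ^ (i >> 1)

# First eight codes: 0, 1, 3, 2, 6, 7, 5, 4 -- each adjacent pair
# differs in a single bit position.
codes = [gray(i) for i in range(8)]
```

Enumerating virtual processors in this order means each step flips a single address bit, which keeps successive memory accesses close in the machine's address mapping.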
ISBN: 9781595939739 (Print)
In this paper, we study parallel algorithms for private-cache chip multiprocessors (CMPs), focusing on methods for foundational problems that are scalable with the number of cores. By focusing on private-cache CMPs, we show that we can design efficient algorithms that need no additional assumptions about the way cores are interconnected, for we assume that all inter-processor communication occurs through the memory hierarchy. We study several fundamental problems, including prefix sums, selection, and sorting, which often form the building blocks of other parallel algorithms. Indeed, we present two sorting algorithms, a distribution sort and a mergesort. Our algorithms are asymptotically optimal in terms of parallel cache accesses and space complexity under reasonable assumptions about the relationships between the number of processors, the size of memory, and the size of cache blocks. In addition, we study sorting lower bounds in a computational model, which we call the parallel external-memory (PEM) model, that formalizes the essential properties of our algorithms for private-cache CMPs.
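Prefix sums, one of the building blocks named above, admit a work-efficient two-phase scan (up-sweep then down-sweep, in the style of Blelloch's scan): O(n) total work in O(log n) parallel steps. The sequential simulation below shows the shape of such a routine while ignoring the PEM model's cache-block accounting:

```python
def exclusive_scan(xs):
    """Work-efficient exclusive prefix sum: up-sweep builds partial
    sums at power-of-two strides, down-sweep pushes prefixes back
    down. Each inner for-loop is a parallel-for in the PEM model."""
    n = len(xs)
    assert n and n & (n - 1) == 0, "power-of-two length for simplicity"
    a = list(xs)
    # up-sweep (reduce phase)
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            a[i] += a[i - d]
        d *= 2
    # down-sweep (distribute phase)
    a[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):
            a[i - d], a[i] = a[i], a[i] + a[i - d]
        d //= 2
    return a
```

For instance, `exclusive_scan([1, 2, 3, 4])` yields `[0, 1, 3, 6]`: each position holds the sum of all earlier elements.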
ISBN: 9798400704161 (Print)
Computing a Single-Linkage Dendrogram (SLD) is a key step in the classic single-linkage hierarchical clustering algorithm. Given an input edge-weighted tree T, the SLD of T is a binary dendrogram that summarizes the n − 1 clusterings obtained by contracting the edges of T in order of weight. Existing algorithms for computing the SLD all require Ω(n log n) work, where n = |T|. Furthermore, to the best of our knowledge no prior work provides a parallel algorithm obtaining non-trivial speedup for this problem. In this paper, we design faster parallel algorithms for computing SLDs both in theory and in practice, based on new structural results about SLDs. In particular, we obtain a deterministic output-sensitive parallel algorithm based on parallel tree contraction that requires O(n log h) work and O(log^2 n log^2 h) depth, where h is the height of the output SLD. We also give a deterministic bottom-up algorithm for the problem inspired by the nearest-neighbor chain algorithm for hierarchical agglomerative clustering, and show that it achieves O(n log h) work and O(h log n) depth. Our results are based on a novel divide-and-conquer framework for building SLDs, inspired by divide-and-conquer algorithms for Cartesian trees. Our new algorithms can quickly compute the SLD on billion-scale trees, and obtain up to 150x speedup over the highly efficient Union-Find algorithm typically used to compute SLDs in practice.
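The Union-Find baseline mentioned at the end can be sketched in a few lines (a simplified illustration, not the paper's algorithm): sort the tree's edges by weight and union endpoints in that order; each union creates one internal dendrogram node, producing the n − 1 merges of the SLD in O(n log n) work.

```python
def single_linkage_dendrogram(n, edges):
    """edges: iterable of (weight, u, v) over a tree on vertices
    0..n-1. Returns the SLD merges in weight order as tuples
    (left_node, right_node, weight, new_node); ids 0..n-1 are
    leaves, ids n..2n-2 are internal merge nodes."""
    # union-find whose roots double as current dendrogram node ids
    parent = list(range(2 * n - 1))
    merges = []

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        new = n + len(merges)          # next internal node id
        parent[ru] = parent[rv] = new  # merge both clusters under it
        merges.append((ru, rv, w, new))
    return merges
```

On the path 0 −(2)− 1 −(1)− 2, the lighter edge merges leaves 1 and 2 into node 3 first, and the heavier edge then merges leaf 0 with node 3 into node 4.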
ISBN: 9798400704161 (Print)
Partitioning a graph into blocks of roughly equal weight while cutting only a few edges is a fundamental problem in computer science with numerous practical applications. While shared-memory parallel partitioners have recently matured to achieve the same quality as widely used sequential partitioners, there is still a pronounced quality gap between distributed partitioners and their sequential counterparts. In this work, we shrink this gap considerably by describing the engineering of an unconstrained local search algorithm suitable for distributed partitioners. We integrate the proposed algorithm in a distributed multilevel partitioner. Our extensive experiments show that the resulting algorithm scales to thousands of PEs while computing cuts that are, on average, only 3.5% larger than those of a state-of-the-art high-quality shared-memory partitioner. Compared to previous distributed partitioners, we obtain on average 6.8% smaller cuts than the best-performing competitor while being more than 9 times faster.
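The core move/gain logic of local search on a 2-way partition can be sketched as follows (a minimal sequential greedy pass, all names illustrative): move a vertex to the other block whenever that reduces the cut, subject to a balance constraint. The unconstrained search engineered in the paper instead relaxes the balance constraint during the search and repairs it afterwards; this toy keeps the constraint throughout.

```python
def refine(adj, side, max_block):
    """adj: {v: [neighbors]}; side: {v: 0 or 1}. Repeatedly apply
    improving single-vertex moves that keep both blocks within
    max_block vertices; return the final cut size."""
    improved = True
    while improved:
        improved = False
        for v in adj:
            same = sum(1 for u in adj[v] if side[u] == side[v])
            other = len(adj[v]) - same
            gain = other - same  # cut edges removed by moving v
            target = 1 - side[v]
            size_target = sum(1 for u in side if side[u] == target)
            if gain > 0 and size_target + 1 <= max_block:
                side[v] = target
                improved = True
    # count remaining cut edges (each undirected edge once)
    return sum(1 for v in adj for u in adj[v]
               if u > v and side[u] != side[v])
```

On a triangle {0, 1, 2} with a pendant vertex 3 attached to 0, starting from side = {0: 0, 1: 1, 2: 1, 3: 0}, the search moves vertex 0 across (gain 1) and the balance bound then blocks vertex 3, leaving a cut of one edge.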