Search results - Inner Mongolia University Library

Special Section on PAR-CAD: Parallel CAD Algorithms and CAD for Parallel Architectures/Systems

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2012, Vol. 31, No. 1, pp. 7-8

Authors: Marculescu, Diana; Li, Peng (Carnegie Mellon Univ., Dept. Elect. & Comp. Engn., Pittsburgh, PA 15213, USA; Texas A&M Univ., Dept. Elect. & Comp. Engn., College Station, TX 77843, USA)

This special section on PAR-CAD (parallel CAD algorithms and CAD for parallel architectures/systems) contains five papers.

Keywords: Special issues and sections; Design automation; Parallel architectures

Clustering Affine Subspaces: Hardness and Algorithms

Symposium on Discrete Algorithms (SODA)

Authors: Euiwoong Lee (CMU); Leonard J. Schulman (Caltech)

ISBN: 9781627484855 (print)

We study a generalization of the famous k-center problem in which each object is an affine subspace of dimension Δ, and give either the first or significantly improved algorithms and hardness results for many combinations of parameters. This generalization from points (Δ = 0) is motivated by the analysis of incomplete data, a pervasive challenge in statistics: incomplete data objects in R^d can be modeled as affine subspaces. We give three algorithmic results for different values of k, under the assumption that all subspaces are axis-parallel, the main case of interest because of the correspondence to missing entries in data tables. 1) k = 1: two polynomial time approximation schemes that run in poly(Δ, 1/ε)nd. 2) k = 2: an O(Δ^(1/4))-approximation algorithm that runs in poly(n, d, Δ). 3) General k: a polynomial time approximation scheme that runs in 2^(O(Δk log k (1 + 1/ε^2)))nd. We also prove nearly matching hardness results: in both the general (not necessarily axis-parallel) case (for k ≥ 2) and the axis-parallel case (for k ≥ 3), the running time of an approximation algorithm with any approximation ratio cannot be polynomial in even one of k and Δ, unless P = NP. Furthermore, assuming that the 3-SAT problem cannot be solved in subexponential time, the dependence on both k and Δ must be exponential in the general case (in the axis-parallel case, only the dependence on k drops to 2^(Ω(k))). The simplicity of the first and third algorithms suggests that they might actually be used in statistical applications. The second algorithm, which demonstrates a theoretical gap between the axis-parallel and general cases for k = 2, displays a strong connection between geometric clustering and classical coloring problems on graphs and hypergraphs, via a new Helly-type theorem.
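
A minimal sketch of the modelling step described above, under our own assumptions: an incomplete record in R^d is a list whose missing entries are `None`, which makes it an axis-parallel affine subspace (each missing coordinate is free). The distance from a candidate center point to such a subspace involves only the observed coordinates, and the k-center objective is the largest distance from any record to its nearest center. The helpers `point_to_subspace` and `k_center_cost` are hypothetical; they only evaluate a candidate solution and are not any of the paper's algorithms.

```python
import math
from typing import Optional, Sequence

# None marks a missing entry, i.e. a free coordinate of the affine subspace.
Record = Sequence[Optional[float]]

def point_to_subspace(center: Sequence[float], record: Record) -> float:
    """Euclidean distance from a point to the axis-parallel subspace of an incomplete record.

    A free coordinate can always be set equal to the center's coordinate,
    so only the observed coordinates contribute to the distance.
    """
    return math.sqrt(sum((c - x) ** 2 for c, x in zip(center, record) if x is not None))

def k_center_cost(centers: Sequence[Sequence[float]], records: Sequence[Record]) -> float:
    """k-center objective: the largest distance from any record to its closest center."""
    return max(min(point_to_subspace(c, r) for c in centers) for r in records)

records = [[0.0, 0.0], [3.0, None], [None, 4.0]]
print(point_to_subspace([0.0, 0.0], [3.0, None]))  # 3.0 (second coordinate is free)
print(k_center_cost([[0.0, 0.0]], records))        # 4.0
```

With a second center at (3, 4), every record lies on some center's subspace, so the cost drops to 0; this illustrates why missing entries make the problem behave differently from ordinary point clustering.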

Keywords: Hardness and algorithms; k-center; Axis-parallel case; Subspace; Affine; Approximation algorithms; Soundness; Hardness; Polynomial time approximation

Boosting Memory Performance of Many-Core FPGA Device through Dynamic Precedence Graph

Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM)

Authors: Yu Bai; Abigail Fuentes; Michael Riera; Mohammed Alawad; Mingjie Lin (Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL, USA)

Emerging FPGA devices, integrated with abundant RAM blocks and high-performance processor cores, offer an unprecedented opportunity to effectively implement single-chip distributed logic-memory (DLM) architectures [1]. Being “memory-centric”, the DLM architecture can significantly improve the overall performance and energy efficiency of many memory-intensive embedded applications, especially those that exhibit irregular array data access patterns at the algorithmic level. However, implementing a DLM architecture poses unique challenges to an FPGA designer in terms of 1) organizing and partitioning diverse on-chip memory resources, and 2) orchestrating effective data transmission between on-chip and off-chip memory. In this paper, we offer our solutions to both of these challenges. Specifically, 1) we propose a stochastic memory partitioning scheme based on the well-known simulated annealing algorithm, which obtains memory partitioning solutions that promote parallelized memory accesses by exploring a large solution space; 2) we augment the proposed DLM architecture with a reconfigurable hardware graph that can dynamically compute precedence relationships between memory partitions, thus effectively exploiting algorithmic-level memory parallelism on a per-application basis. We evaluate the effectiveness of our approach (A3) against two other DLM architecture synthesis methods: an algorithm-centric reconfigurable computing architecture with a single monolithic memory (A1) and a heterogeneous distributed architecture synthesized according to [1] (A2). To make the comparison fair, the data path is the same in all three architectures and only the local memory architecture differs. For each of ten benchmark applications from SPEC2006 and MiBench [2], we break down the performance benefit of using A3 into two parts: the portion due to stochastic local memory partitioning and the portion due to dynamic graph-based memory arbitration. All experiments have been conducted with a V…
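
The abstract names simulated annealing as the basis of its memory partitioning scheme but does not give the cost function. The sketch below assumes a hypothetical model of our own: each "access group" is a set of array elements the datapath would like to read in the same cycle, and two elements of one group placed in the same memory bank count as one conflict. `conflicts` and `anneal_partition`, the single-element move, and the geometric cooling schedule are illustrative choices, not the paper's.

```python
import math
import random
from typing import List, Sequence, Tuple

def conflicts(assign: Sequence[int], groups: Sequence[Sequence[int]]) -> int:
    """Count pairs of elements that share an access group and are mapped to the same bank."""
    total = 0
    for g in groups:
        for i in range(len(g)):
            for j in range(i + 1, len(g)):
                if assign[g[i]] == assign[g[j]]:
                    total += 1
    return total

def anneal_partition(n: int, groups: Sequence[Sequence[int]], banks: int,
                     steps: int = 5000, t0: float = 2.0, alpha: float = 0.999,
                     seed: int = 0) -> Tuple[List[int], int]:
    """Simulated annealing over element-to-bank assignments; returns the best one seen."""
    rng = random.Random(seed)
    assign = [rng.randrange(banks) for _ in range(n)]  # random initial partition
    cost = conflicts(assign, groups)
    best, best_cost = assign[:], cost
    temp = t0
    for _ in range(steps):
        e = rng.randrange(n)               # move: reassign one element to a random bank
        old = assign[e]
        assign[e] = rng.randrange(banks)
        delta = conflicts(assign, groups) - cost
        # Metropolis rule: always accept improvements, accept worse moves with prob. exp(-delta/T)
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost += delta
            if cost < best_cost:
                best, best_cost = assign[:], cost
        else:
            assign[e] = old                # reject: undo the move
        temp *= alpha                      # geometric cooling
    return best, best_cost

groups = [(0, 1, 2, 3), (4, 5, 6, 7), (0, 4), (1, 5)]
best, cost = anneal_partition(8, groups, banks=4)
print(best, cost)
```

Keeping the best assignment seen, rather than the final one, guarantees the result is never worse than the random starting partition even if the walk ends on a worse state.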

Keywords: Random access memory; Field programmable gate arrays; Computer architecture; Heuristic algorithms; Hardware; Partitioning algorithms; Parallel processing

Brief Announcement: Speedups for Parallel Graph Triconnectivity

24th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '12)

Authors: Edwards, James A.; Vishkin, Uzi (University of Maryland, College Park, MD, United States)

ISBN: 9781450312134 (print)

We present a parallel solution to the problem of determining the triconnected components of an undirected graph. We obtain significant speedups over the only published optimal (linear-time) serial implementation of a triconnected components algorithm running on a modern CPU. This is accomplished on the PRAM-inspired XMT many-core architecture. To our knowledge, no other parallel implementation of a triconnected components algorithm has been published for any platform. Copyright is held by the author/owner(s).
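
To pin down the property involved, here is a deliberately naive sketch (our own, not the paper's algorithm): a graph is 3-vertex-connected if it has at least 4 vertices and stays connected after deleting any 2 of them. This brute-force check only answers yes or no; it does not compute the triconnected components, and its O(n^2 (n + m)) running time is far from the linear-time serial bound mentioned above. It can still serve as a small test oracle. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

def _connected(alive: Set[int], adj: Dict[int, Set[int]]) -> bool:
    """Depth-first search restricted to the vertices in `alive`."""
    if not alive:
        return True
    start = next(iter(alive))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in alive and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == alive

def is_triconnected(n: int, edges: Iterable[Tuple[int, int]]) -> bool:
    """Naive 3-vertex-connectivity test on vertices 0..n-1."""
    if n < 4:
        return False
    adj: Dict[int, Set[int]] = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    vertices = set(range(n))
    # No pair of vertices may be a separation pair.
    return all(_connected(vertices - {a, b}, adj) for a, b in combinations(range(n), 2))

k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(is_triconnected(4, k4))                                        # True
print(is_triconnected(5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]))  # False: a cycle has 2-vertex cuts
```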

Keywords: Computer architecture

A Parallel Buffer Tree

24th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '12)

Authors: Sitchinava, Nodari (Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Germany); Zeh, Norbert (Faculty of Computer Science, Dalhousie University, Canada)

ISBN: 9781450312134 (print)

We present the parallel buffer tree, a parallel external memory (PEM) data structure for batched search problems. This data structure is a non-trivial extension of Arge's sequential buffer tree to a private-cache multiprocessor environment; compared to its sequential counterpart, it reduces the number of I/O operations by a factor equal to the number of available processor cores, thereby taking full advantage of multicore parallelism. The parallel buffer tree is a search tree data structure that supports the batched parallel processing of a sequence of N insertions, deletions, membership queries, and range queries in the optimal O(sort_P(N) + K/(PB)) parallel I/O complexity, where K is the size of the output reported in the process and sort_P(N) is the parallel I/O complexity of sorting N elements using P processors. Copyright 2012 ACM.
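
The core idea of buffer trees is lazy, batched processing: operations are collected in buffers and pushed into the structure in batches, so each batch costs roughly one sort instead of one search per operation. The sketch below is our own single-level simplification (one buffer over a Python set, no tree, no range queries, no parallelism); the class `BufferedSet` is hypothetical and only shows why batched answers stay correct: sorting a batch by (key, timestamp) preserves each key's operation order, while operations on different keys commute.

```python
from typing import Dict, List, Set, Tuple

class BufferedSet:
    """Single-level illustration of batched processing (not a buffer tree).

    Membership answers are produced when the batch containing the query is flushed.
    """

    def __init__(self, buffer_size: int) -> None:
        self.buffer_size = buffer_size
        self.buffer: List[Tuple[int, str, int]] = []  # (timestamp, op, key)
        self.base: Set[int] = set()
        self.clock = 0
        self.answers: Dict[int, bool] = {}            # query timestamp -> answer

    def submit(self, op: str, key: int) -> int:
        """Queue an 'insert', 'delete' or 'query'; returns the operation's timestamp."""
        if op not in ("insert", "delete", "query"):
            raise ValueError(f"unknown operation {op!r}")
        stamp = self.clock
        self.clock += 1
        self.buffer.append((stamp, op, key))
        if len(self.buffer) >= self.buffer_size:
            self.flush()
        return stamp

    def flush(self) -> None:
        # Sort by (key, timestamp): each key's operations stay in submission order.
        for stamp, op, key in sorted(self.buffer, key=lambda e: (e[2], e[0])):
            if op == "insert":
                self.base.add(key)
            elif op == "delete":
                self.base.discard(key)
            else:
                self.answers[stamp] = key in self.base
        self.buffer.clear()

s = BufferedSet(buffer_size=3)
for op, key in [("insert", 5), ("query", 5), ("delete", 5), ("query", 5)]:
    s.submit(op, key)
s.flush()
print(s.answers)  # {1: True, 3: False}
```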

Keywords: Data structures

Parallel and I/O Efficient Set Covering Algorithms

24th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '12)

Authors: Blelloch, Guy E.; Simhadri, Harsha Vardhan; Tangwongsan, Kanat (Carnegie Mellon University, United States)

ISBN: 9781450312134 (print)

This paper presents the design, analysis, and implementation of parallel and sequential I/O-efficient algorithms for set cover, tying together the line of work on parallel set cover and the line of work on efficient set cover algorithms for large, disk-resident instances. Our contributions are twofold. First, we design and analyze a parallel cache-oblivious set-cover algorithm that offers essentially the same approximation guarantees as the standard greedy algorithm, which achieves the optimal approximation ratio. Our algorithm is the first efficient external-memory or cache-oblivious algorithm for the case where neither the sets nor the elements fit in memory, leading to an I/O cost (cache complexity) equivalent to sorting in the cache-oblivious or parallel cache-oblivious models. The algorithm also implies low cache misses on parallel hierarchical memories (again, equivalent to sorting). Second, building on this theory, we engineer variants of the theoretical algorithm optimized for different hardware setups. We provide an experimental evaluation showing substantial speedups over existing algorithms without compromising the solution's quality. Copyright 2012 ACM.
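
For reference, the standard sequential greedy algorithm whose guarantee the paper matches can be written in a few lines: repeatedly pick the set that covers the most still-uncovered elements. It is a ln n + 1 (more precisely H_n) approximation. This is only the in-memory baseline, not the paper's parallel cache-oblivious algorithm; the function name and the dict-of-sets input format are our own choices.

```python
from typing import Dict, Hashable, List, Set

def greedy_set_cover(universe: Set[Hashable], sets: Dict[str, Set[Hashable]]) -> List[str]:
    """Classic greedy set cover: repeatedly take the set covering the most uncovered elements."""
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        # max() returns the first set with the largest gain, so ties follow dict order.
        name = max(sets, key=lambda s: len(sets[s] & uncovered))
        gain = sets[name] & uncovered
        if not gain:
            raise ValueError("the sets do not cover the universe")
        chosen.append(name)
        uncovered -= gain
    return chosen

universe = {1, 2, 3, 4, 5, 6}
sets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6}, "D": {1, 4}}
print(greedy_set_cover(universe, sets))  # ['A', 'C']
```

Each round rescans every set against the uncovered elements, which is exactly the access pattern that becomes I/O-bound when neither the sets nor the elements fit in memory.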

Keywords: Approximation algorithms

Parallel Probabilistic Tree Embeddings, k-Median, and Buy-at-Bulk Network Design

24th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '12)

Authors: Blelloch, Guy E.; Gupta, Anupam; Tangwongsan, Kanat (Carnegie Mellon University, United States)

ISBN: 9781450312134 (print)

This paper presents parallel algorithms for embedding an arbitrary n-point metric space into a distribution of dominating trees with O(log n) expected stretch. Such embeddings have proved useful in the design of many approximation algorithms in the sequential setting. We give a parallel algorithm that runs in O(n^2 log n) work and O(log^2 n) depth; these bounds are independent of Δ = max_{x,y} d(x,y) / min_{x≠y} d(x,y), the ratio of the largest to the smallest distance. Moreover, when Δ is exponentially bounded (Δ ≤ 2^(O(n))), our algorithm can be improved to O(n^2) work and O(log^2 n) depth. Using these results, we give an RNC O(log k)-approximation algorithm for k-median and an RNC O(log n)-approximation for buy-at-bulk network design. The k-median algorithm is the first RNC algorithm with non-trivial guarantees for arbitrary values of k, and the buy-at-bulk result is the first parallel algorithm for the problem. Copyright 2012 ACM.
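
To make "a distribution of dominating trees" concrete, here is a sequential sketch in the style of the classic construction of Fakcharoenphol, Rao and Talwar (FRT), which achieves the same O(log n) expected stretch; it is not the paper's parallel algorithm. Assumptions of ours: the input is a distance matrix of a metric with positive distances between distinct points; distances are rescaled so the smallest one is 1; a random permutation and a random radius factor β in [1, 2] define nested clusters with radii β·2^(i-1); and the tree edge from a level-i cluster to each child has length 2^i. The test checks dominance (tree distances never underestimate the metric), which follows from these choices. `frt_tree_metric` is a hypothetical name.

```python
import math
import random
from typing import Callable, Sequence

def frt_tree_metric(d: Sequence[Sequence[float]], seed: int = 0) -> Callable[[int, int], float]:
    """Sample one FRT-style hierarchical decomposition and return its tree distance.

    Assumes d is a metric on points 0..n-1 (n >= 2) with d[u][v] > 0 for u != v.
    """
    n = len(d)
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)                  # random priority order of potential centers
    beta = rng.uniform(1.0, 2.0)       # random radius multiplier
    dmin = min(d[u][v] for u in range(n) for v in range(n) if u != v)
    dmax = max(d[u][v] for u in range(n) for v in range(n))
    top = math.ceil(math.log2(dmax / dmin)) + 1   # 2**(top-1) >= rescaled diameter

    # label[u][i]: first center in perm within radius beta * 2**(i-1) of u (in units of dmin).
    # Level-0 radii are below 1, so level-0 clusters are singletons; at the top level every
    # point is captured by perm[0]. A level-i cluster = points agreeing on labels i..top.
    label = [[next(w for w in perm if d[u][w] / dmin <= beta * 2 ** (i - 1))
              for i in range(top + 1)] for u in range(n)]

    def tree_distance(u: int, v: int) -> float:
        if u == v:
            return 0.0
        # Highest level where u and v get different centers; they first share a cluster
        # one level higher. Path: edges of length 2**1 .. 2**(i_star+1) on each side.
        i_star = max(i for i in range(top + 1) if label[u][i] != label[v][i])
        return 2 * (2 ** (i_star + 2) - 2) * dmin   # undo the rescaling

    return tree_distance

pts = [0.0, 1.0, 3.0, 7.0, 15.0]                  # points on a line
d = [[abs(a - b) for b in pts] for a in pts]
tree = frt_tree_metric(d, seed=42)
print([tree(0, v) for v in range(5)])             # tree distances from point 0
```

Averaging `tree(u, v) / d[u][v]` over many seeds gives an empirical view of the expected stretch; the point of the parallel algorithms in the abstract is to sample such trees with low depth.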

Keywords: Parallel algorithms

A (3/2 + ε) Approximation Algorithm for Scheduling Moldable and Non-Moldable Parallel Tasks

24th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '12)

Authors: Jansen, Klaus (Institut für Informatik, Universität zu Kiel, Olshausenstr. 40, D-24098 Kiel, Germany)

ISBN: 9781450312134 (print)

In this paper we study a scheduling problem with moldable and non-moldable parallel tasks on m processors. A non-moldable parallel task is one that runs in parallel on a specific given number of processors. The goal is to find a non-preemptive schedule on the m processors that minimizes the makespan, i.e., the latest task completion time. The previous best result is the list scheduling algorithm, with an absolute approximation ratio of 2. On the other hand, no approximation algorithm for scheduling non-moldable parallel tasks can achieve a ratio smaller than 1.5 unless P = NP. In this paper we show that a schedule with length (1.5 + ε)OPT can be computed for this scheduling problem in time O(n log n) + f(1/ε). Furthermore, we present a (1.5 + ε)-approximation algorithm for scheduling moldable parallel tasks. Copyright 2012 ACM.
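
For comparison with the (1.5 + ε) result, here is a minimal event-driven sketch of greedy list scheduling for non-moldable (rigid) tasks: whenever processors are released, scan the remaining tasks in list order and start every task that fits. This is a baseline in the spirit of the list scheduling mentioned above, not Jansen's algorithm; the function name and the (processors, time) task encoding are our own.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

def list_schedule(tasks: Sequence[Tuple[int, float]], m: int) -> Tuple[Dict[int, float], float]:
    """Greedy list scheduling of rigid parallel tasks on m processors.

    tasks[j] = (processors required, processing time). Returns (start times, makespan).
    """
    if any(p < 1 or p > m for p, _ in tasks):
        raise ValueError("every task must need between 1 and m processors")
    pending = list(range(len(tasks)))
    running: List[Tuple[float, int]] = []   # min-heap of (finish time, processors held)
    free, now = m, 0.0
    start: Dict[int, float] = {}
    while pending:
        waiting = []
        for j in pending:                   # start every task that fits, in list order
            p, t = tasks[j]
            if p <= free:
                free -= p
                start[j] = now
                heapq.heappush(running, (now + t, p))
            else:
                waiting.append(j)
        pending = waiting
        if pending:                         # jump to the next completion and release processors
            now, p = heapq.heappop(running)
            free += p
            while running and running[0][0] == now:
                free += heapq.heappop(running)[1]
    makespan = max((start[j] + tasks[j][1] for j in start), default=0.0)
    return start, makespan

tasks = [(2, 3.0), (1, 2.0), (1, 2.0), (2, 1.0)]
start, makespan = list_schedule(tasks, m=3)
print(start, makespan)  # {0: 0.0, 1: 0.0, 2: 2.0, 3: 3.0} 4.0
```

Here the total work is 12 processor-time units on 3 processors, so 4.0 is a lower bound and this particular schedule happens to be optimal; on adversarial inputs greedy list scheduling can be up to a factor of 2 away from optimal.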

Keywords: Approximation algorithms

Brief Announcement: Strong Scaling of Matrix Multiplication Algorithms and Memory-Independent Communication Lower Bounds

24th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '12)

Authors: Ballard, Grey; Demmel, James; Holtz, Olga; Lipshitz, Benjamin; Schwartz, Oded (UC Berkeley, United States; TU Berlin, Germany)

ISBN: 9781450312134 (print)

A parallel algorithm has perfect strong scaling if its running time on P processors is linear in 1/P, including all communication costs. Distributed-memory parallel algorithms for matrix multiplication with perfect strong scaling have only recently been found. One is based on classical matrix multiplication (Solomonik and Demmel, 2011), and one is based on Strassen's fast matrix multiplication (Ballard, Demmel, Holtz, Lipshitz, and Schwartz, 2012). Both algorithms scale perfectly, but only up to some number of processors where the inter-processor communication no longer scales. We obtain a memory-independent communication cost lower bound on classical and Strassen-based distributed-memory matrix multiplication algorithms. These bounds imply that no classical or Strassen-based parallel matrix multiplication algorithm can strongly scale perfectly beyond the ranges already attained by the two parallel algorithms mentioned above. The memory-independent bounds and the strong scaling bounds generalize to other algorithms. Copyright is held by the author/owner(s).
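
A short derivation shows where perfect strong scaling must stop for classical (Θ(n^3)-flop) matrix multiplication. It recalls the standard per-processor bandwidth lower bounds for n × n matrices on P processors with M words of local memory each (stated here from the general literature, not quoted from the abstract): a memory-dependent bound that shrinks like 1/P, and a memory-independent bound that shrinks only like P^(-2/3). The memory-dependent bound dominates, and hence linear scaling in 1/P remains possible, only while

```latex
W_{\text{mem-dep}} = \Omega\!\left(\frac{n^{3}}{P\sqrt{M}}\right), \qquad
W_{\text{mem-ind}} = \Omega\!\left(\frac{n^{2}}{P^{2/3}}\right)

\frac{n^{3}}{P\sqrt{M}} \;\ge\; \frac{n^{2}}{P^{2/3}}
\iff P^{1/3} \le \frac{n}{\sqrt{M}}
\iff P \le \frac{n^{3}}{M^{3/2}}
```

Beyond roughly n^3 / M^(3/2) processors, the memory-independent term takes over and communication per processor can no longer fall in proportion to 1/P, which is the sense in which no classical algorithm can strongly scale perfectly past that range.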

Keywords: Parallel algorithms

A Lock-Free B+tree

24th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '12)

Authors: Braginsky, Anastasia; Petrank, Erez (Dept. of Computer Science, Technion - Israel Institute of Technology, Haifa 32000, Israel)

ISBN: 9781450312134 (print)

Lock-free data structures provide a progress guarantee and are known for facilitating scalability, avoiding deadlocks and livelocks, and providing guaranteed system responsiveness. In this paper we present a design for a lock-free balanced tree, specifically, a B+tree. The B+tree data structure has important practical applications and is used in various storage-system products. To the best of our knowledge, this is the first design of a lock-free, dynamic, and balanced tree that employs standard compare-and-swap operations. Copyright 2012 ACM.
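
The building block behind designs like this is the compare-and-swap (CAS) retry loop: read the current node, build a new version privately, and install it only if nobody changed the node in the meantime; otherwise re-read and retry. The sketch below applies that pattern to a single copy-on-write sorted leaf. It is our own illustration, not the paper's B+tree algorithm, which must also handle splits, merges, and rebalancing. Python has no user-level CAS instruction, so the hypothetical `AtomicReference` class emulates one with a lock held only inside `compare_and_set`; this Python code is therefore not truly lock-free. In C++ or Java the same role is played by `std::atomic<T>::compare_exchange_strong` or `java.util.concurrent.atomic.AtomicReference.compareAndSet`.

```python
import threading
from bisect import insort
from typing import Generic, Tuple, TypeVar

T = TypeVar("T")

class AtomicReference(Generic[T]):
    """Emulated CAS cell: the lock only makes compare_and_set atomic."""

    def __init__(self, value: T) -> None:
        self._value = value
        self._lock = threading.Lock()

    def get(self) -> T:
        return self._value

    def compare_and_set(self, expected: T, new: T) -> bool:
        with self._lock:
            if self._value is expected:     # identity check, like a pointer CAS
                self._value = new
                return True
            return False

def insert_key(leaf: AtomicReference[Tuple[int, ...]], key: int) -> None:
    """Copy-on-write insert into an immutable sorted leaf using a CAS retry loop."""
    while True:
        old = leaf.get()
        if key in old:
            return
        new = list(old)
        insort(new, key)                    # build the new version privately
        if leaf.compare_and_set(old, tuple(new)):
            return                          # our version was installed
        # another thread replaced the leaf first: re-read and retry

def worker(first: int) -> None:
    for k in range(first, first + 100):
        insert_key(leaf, k)

leaf: AtomicReference[Tuple[int, ...]] = AtomicReference(())
threads = [threading.Thread(target=worker, args=(s,)) for s in range(0, 800, 100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(leaf.get()))  # 800
```

A failed CAS means some other thread succeeded, so the system as a whole always makes progress; that is the lock-freedom property the abstract refers to, which a real CAS instruction provides without any lock.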

Keywords: Parallel programming
