Search results - Inner Mongolia University Library

12th ACM Symposium on Parallel Algorithms and Architectures

Authors: Elsässer, R; Monien, B; Preis, R. Univ Gesamthsch Paderborn, Dept Math & Comp Sci, D-33102 Paderborn, Germany

Several different diffusion schemes have previously been developed for load balancing on homogeneous processor networks. We generalize existing schemes to deal with heterogeneous networks. Generalized schemes may operate efficiently on networks where each processor can have arbitrary computing power, i.e., the load will be balanced proportionally to these powers. The balancing flow that is calculated by schemes for homogeneous networks is minimal with regard to the l_2-norm, and we prove this to hold true for generalized schemes, too. We demonstrate the usability of generalized schemes by a number of experiments on several heterogeneous networks.

Keywords: parallel processing systems
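
The idea of balancing load in proportion to computing power can be illustrated with a minimal sketch of a first-order diffusion iteration. This is an illustration under stated assumptions, not necessarily one of the schemes analyzed in the paper: the function name `diffuse`, the fixed step size `alpha`, and the example network are all hypothetical. In each step, every edge carries a flow proportional to the difference in load per unit of computing power between its endpoints.

```python
def diffuse(load, power, edges, alpha=0.25, steps=2000):
    """Iterate a simple diffusion step on an undirected network.

    load[i]  -- current load on processor i
    power[i] -- computing power of processor i (assumed positive)
    edges    -- list of (i, j) pairs, one per undirected link
    alpha    -- step size (assumed small enough for the iteration to converge)
    """
    load = list(load)
    for _ in range(steps):
        # Compute all flows from the loads at the start of the step.
        flows = [
            (i, j, alpha * (load[i] / power[i] - load[j] / power[j]))
            for i, j in edges
        ]
        # Apply the flows; what leaves i arrives at j, so total load is conserved.
        for i, j, f in flows:
            load[i] -= f
            load[j] += f
    return load


if __name__ == "__main__":
    # A path of four processors with unequal powers; all load starts on node 0.
    print(diffuse([80.0, 0.0, 0.0, 0.0], [1.0, 2.0, 1.0, 4.0],
                  [(0, 1), (1, 2), (2, 3)]))
```

On this example the loads settle near 10, 20, 10, and 40, that is, in proportion to the powers 1, 2, 1, and 4.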

Online scheduling of parallel programs on heterogeneous systems with applications to Cilk

Theory of Computing Systems, 2002, vol. 35, no. 3, pp. 289-304

Authors: Bender, MA; Rabin, MO. SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY 11794, USA; Harvard Univ, Div Engn & Appl Sci, Cambridge, MA 02138, USA

We study the problem of executing parallel programs, in particular Cilk programs, on a collection of processors of different speeds. We consider a model in which each processor maintains an estimate of its own speed, where communication between processors has a cost, and where all scheduling must be online. This problem has been considered previously in the fields of asynchronous parallel computing and scheduling theory. Our model is a bridge between the assumptions in these fields. We provide a new, more accurate analysis of an old scheduling algorithm called the maximum utilization scheduler. Based on this analysis, we generalize this scheduling policy and define the high utilization scheduler. We next focus on the Cilk platform and introduce a new algorithm for scheduling Cilk multithreaded parallel programs on heterogeneous processors. This scheduler is inspired by the high utilization scheduler and is modified to fit in a Cilk context. A crucial aspect of our algorithm is that it keeps the original spirit of the Cilk scheduler. In fact, when our new algorithm runs on homogeneous processors, it exactly mimics the dynamics of the original Cilk scheduler.

Keywords: computer systems; parallel programming (computer science)

Online scheduling of parallel programs on heterogeneous systems with applications to Cilk

12th ACM Symposium on Parallel Algorithms and Architectures

Authors: Bender, MA; Rabin, MO. SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY 11794, USA; Harvard Univ, Div Engn & Appl Sci, Cambridge, MA 02138, USA

Keywords: parallel processing systems

The data locality of work stealing

12th ACM Symposium on Parallel Algorithms and Architectures

Authors: Acar, UA; Blelloch, GE; Blumofe, RD. Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213, USA; Univ Texas, Dept Comp Sci, Austin, TX 78712, USA

This paper studies the data locality of the work-stealing scheduling algorithm on hardware-controlled shared-memory machines, where movement of data to and from the cache is solely controlled by the hardware. We present lower and upper bounds on the number of cache misses when using work stealing, and introduce a locality-guided work-stealing algorithm and its experimental validation. As a lower bound, we show that a work-stealing application that exhibits good data locality on a uniprocessor may exhibit poor data locality on a multiprocessor. In particular, we show a family of multithreaded computations G(n) whose members perform Θ(n) operations (work) and incur a constant number of cache misses on a uniprocessor, while even on two processors the total number of cache misses soars to Ω(n). On the other hand, we show a tight upper bound on the number of cache misses that nested-parallel computations, a large, important class of computations, incur due to multiprocessing. In particular, for nested-parallel computations, we show that on P processors a multiprocessor execution incurs an expected O(C⌈m/s⌉PT_∞) more misses than the uniprocessor execution. Here m is the execution time of an instruction incurring a cache miss, s is the steal time, C is the size of the cache, and T_∞ is the number of nodes on the longest chain of dependencies. Based on this we give strong execution time bounds for nested-parallel computations using work stealing. For the second part of our results, we present a locality-guided work-stealing algorithm that improves the data locality of multithreaded computations by allowing a thread to have an affinity for a processor. Our initial experiments on iterative data-parallel applications show that the algorithm matches the performance of static partitioning under traditional workloads but improves the performance by up to 50% over static partitioning under multiprogrammed workloads. Furthermore, locality-guided work stealing ...

Keywords: parallel processing systems

(2.954+ε)n oblivious routing algorithm on 2D meshes

Annual ACM Symposium on Parallel Algorithms and Architectures, 2000, pp. 186-195

Authors: Iwama, Kazuo; Miyano, Eiji. Kyoto Univ, Kyoto, Japan

A deterministic, oblivious, permutation routing algorithm on 2D meshes of constant queue size is presented. An optimal, linear-time bound for oblivious permutation routing could be obtained for these meshes. A new algorithm based on bit-reversal permutation for oblivious routing is proposed. A (2.954+ε)n oblivious routing algorithm is proposed, where ε is any positive constant and the queue size is a function of ε. The running time of the algorithm is less than 1.5 times as large as the network diameter of the mesh. The n×n mesh is considered and models on these meshes are presented. The problems in controlling the pipeline of packet flows are discussed.

Keywords: parallel processing systems
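
The abstract names the bit-reversal permutation as a building block. As a minimal sketch of that permutation alone (not of the routing algorithm, whose details the abstract does not give), the hypothetical helper below maps each k-bit index to the index obtained by reversing its binary digits.

```python
def bit_reversal_permutation(k):
    """Return the bit-reversal permutation on 2**k indices.

    Entry i of the result is the integer whose k-bit binary
    representation is that of i read in reverse order.
    """
    n = 1 << k
    perm = []
    for i in range(n):
        r = 0
        for b in range(k):
            # If bit b of i is set, set the mirrored bit (k - 1 - b) of r.
            if i & (1 << b):
                r |= 1 << (k - 1 - b)
        perm.append(r)
    return perm


if __name__ == "__main__":
    print(bit_reversal_permutation(3))
```

For k = 3 this gives [0, 4, 2, 6, 1, 5, 3, 7]. Applying the permutation twice returns every index to its starting position, since reversing the bits twice restores the original number.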

No-busy-wait balanced tree parallel algorithmic paradigm

Annual ACM Symposium on Parallel Algorithms and Architectures, 2000, pp. 147-155

Author: Vishkin, Uzi. Univ of Maryland, United States

Suppose that a parallel algorithm can include any number of parallel threads. Each thread can proceed without ever having to busy-wait on another thread. A thread can proceed till its termination, but no new threads can be formed. What kind of problems can such restrictive algorithms solve and still be competitive in the total number of operations they perform with the fastest serial algorithm for the same problem? Intrigued by this informal question, we considered one of the most elementary parallel algorithmic paradigms, that of balanced binary trees. The main contribution of this paper is a new balanced (not necessarily binary) tree no-busy-wait paradigm for parallel algorithms; applications of the basic paradigm to two problems are presented: building heaps, and executing parallel tree contraction (assuming a preparatory stage); the latter is known to be applicable to evaluating a family of general arithmetic expressions. For putting things in context, we also discuss our 'PRAM-on-chip' vision (actually a small update to it), presented at SPAA98.

Keywords: parallel processing systems

Interprocessor communication with memory constraints

Annual ACM Symposium on Parallel Algorithms and Architectures, 2000, pp. 39-45

Authors: Pinar, Ali; Hendrickson, Bruce. Univ of Illinois, Urbana, IL, United States

Many parallel applications require periodic redistribution of workloads and associated data. In a distributed memory computer, this redistribution can be difficult if limited memory is available for receiving messages. We propose a model for optimizing the exchange of messages under such circumstances, which we call the minimum phase remapping problem. We first show that the problem is NP-complete, and then analyze several methodologies for addressing it. First, we show how the problem can be phrased as an instance of multi-commodity flow. Next, we study a continuous approximation to the problem. We show that this continuous approximation has a solution which requires at most two more phases than the optimal discrete solution, but the question of how to consistently obtain a good discrete solution from the continuous problem remains open. Finally, we devise a simple and practical approximation algorithm for the problem with a bound of 1.5 times the optimal number of phases.

Keywords: parallel processing systems

Algorithmic foundations for a parallel vector access memory system

Annual ACM Symposium on Parallel Algorithms and Architectures, 2000, pp. 156-165

Authors: Mathew, Binu K.; McKee, Sally A.; Carter, John B.; Davis, Al. Univ of Utah, Salt Lake City, UT, United States

This paper presents mathematical foundations for the design of a memory controller subcomponent that helps to bridge the processor/memory performance gap for applications with strided access patterns. The Parallel Vector Access (PVA) unit exploits the regularity of vectors or streams to access them efficiently in parallel on a multi-bank SDRAM memory system. The PVA unit performs scatter/gather operations so that only the elements accessed by the application are transmitted across the system bus. Vector operations are broadcast in parallel to all memory banks, each of which implements an efficient algorithm to determine which vector elements it holds. Earlier performance evaluations have demonstrated that our PVA implementation loads elements up to 32.8 times faster than a conventional memory system and 3.3 times faster than a pipelined vector unit, without hurting the performance of normal cache-line fills. Here we present the underlying PVA algorithms for both word-interleaved and cache-line-interleaved memory systems.

Keywords: pipeline processing systems
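
To make concrete what it means for a bank to determine which elements of a strided vector it holds, here is a minimal sketch. It is not the PVA algorithm from the paper. It assumes a word-interleaved layout in which word address a lives in bank a mod banks, and the function name and parameters are hypothetical. The sketch relies on a standard fact of modular arithmetic: the element indices that fall in one bank, if any, form an arithmetic progression with step banks / gcd(stride, banks).

```python
from math import gcd


def elements_in_bank(base, stride, length, banks, bank):
    """List the vector element indices i (0 <= i < length) stored in `bank`.

    Element i sits at word address base + i * stride, and the assumed
    word-interleaved layout places address a in bank a % banks.
    """
    g = gcd(stride, banks)
    # base + i * stride = bank (mod banks) is solvable only if g divides
    # the required offset between the bank number and the base address.
    if (bank - base) % banks % g != 0:
        return []
    step = banks // g
    # Exactly one solution lies in range(step); find it by direct search.
    first = next(i for i in range(step)
                 if (base + i * stride) % banks == bank)
    return list(range(first, length, step))


if __name__ == "__main__":
    # 16 elements, base address 3, stride 6, 8 banks: contents of bank 5.
    print(elements_in_bank(3, 6, 16, 8, 5))
```

With these example values the bank holds elements 3, 7, 11, and 15. The search for the first index is linear in the step; a hardware-oriented method would likely compute it directly, which this sketch does not attempt.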

Scheduling Cilk multithreaded parallel programs on processors of different speeds

Annual ACM Symposium on Parallel Algorithms and Architectures, 2000, pp. 13-21

Authors: Bender, Michael A.; Rabin, Michael O. State Univ of New York at Stony Brook, Stony Brook, NY, United States

We study the problem of executing parallel programs, in particular Cilk programs, on a collection of processors of different speeds. We consider a model in which each processor maintains an estimate of its own speed, where communication between processors has a cost, and where all scheduling must be online. This problem has been considered previously in the fields of asynchronous parallel computing and scheduling theory. Our model is a bridge between the assumptions in these fields. We provide a new, more accurate analysis of an old scheduling algorithm called the maximum utilization scheduler. Based on this analysis, we generalize this scheduling policy and define the high utilization scheduler. We next focus on the Cilk platform and introduce a new algorithm for scheduling Cilk multithreaded parallel programs on heterogeneous processors. This scheduler is inspired by the high utilization scheduler and is modified to fit in a Cilk context. A crucial aspect of our algorithm is that it keeps the original spirit of the Cilk scheduler. In fact, when our new algorithm runs on homogeneous processors, it exactly mimics the dynamics of the original Cilk scheduler.

Keywords: parallel algorithms

Optimal schedules for data-parallel cycle-stealing in networks of workstations

Annual ACM Symposium on Parallel Algorithms and Architectures, 2000, pp. 22-29

Author: Rosenberg, Arnold L. Univ of Massachusetts, Amherst, United States

We refine the model underlying our prior work on scheduling cycle-stealing opportunities in networks of workstations (NOWs), obtaining a model wherein the scheduling guidelines of [16] produce optimal schedules for every such opportunity. Although computing optimal schedules usually requires the use of general (often inefficient) function-optimizing methods, we show how to compute optimal schedules efficiently for the broad class of opportunities whose durations come from a concave probability distribution. Even when no such efficient computation of an optimal schedule is available, our refined model always suggests a natural notion of approximately optimal schedule, which may be efficiently computable. We illustrate such efficient approximability via the important class of cycle-stealing opportunities whose durations come from a heavy-tailed distribution. Such opportunities do not admit any optimal schedule - nor even a natural notion of approximately optimal schedule - within the model of [5, 16]. Within our refined model, though, we derive computationally simple schedules for heavy-tailed opportunities, which can be 'tuned' to have expected work-output that is arbitrarily close to optimal.

Keywords: computer workstations
