ISBN (print): 9780897918909
It is well known that after placing m ≥ n balls independently and uniformly at random (i.u.r.) into n bins, the fullest bin contains Θ(log n/log log n + m/n) balls, with high probability. It is also known (see [Ste96]) that a maximum load of O(m/n) can be obtained for all m ≥ n if each ball is allocated to one (suitably chosen) of two bins selected i.u.r. Stemann [Ste96] shows that r communication rounds suffice to guarantee a maximum load of max{(log n)^(1/r), O(m/n)}, with high probability. In particular, O(log log n) communication rounds suffice to guarantee optimal load O(m/n) for m ≥ n, with high probability. Adler et al. have shown in [acmR95] that Stemann's protocol is optimal for constant r. In this paper we extend the above results in two directions. We generalize the lower bound to arbitrary r ≤ log log n; this implies that Stemann's protocol is optimal for all r. Our main result is a generalization of Stemann's upper bound to weighted jobs: let W_A (W_M) denote the average (maximum) weight of the balls, and let Δ = W_A/W_M. Note that the maximum load is at least Ω((m/n)·W_A + W_M). We present a protocol that achieves a maximum load of γ·((m/n)·W_A + W_M) using O(log log n / log(γ·((m/n)·Δ + 1))) rounds of communication. For uniform weights this matches the results of Stemann; in particular, for log log n rounds we achieve the optimal load O((m/n)·W_A + W_M). Using the lower bound, it is also shown that our algorithm is optimal in the case of weighted balls for various degrees of uniformity. All these balls-into-bins games model load-balancing problems: the balls are jobs, the bins are resources, and the task is to allocate the jobs to the resources so that they are evenly distributed. Under this interpretation, weighted balls are important because the weights can model, e.g., the running times of jobs. Such load-balancing problems arise, e.g., in client-server networks and in multimedia servers using disk arrays.
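The gap between one random choice and two random choices is easy to see experimentally. The following sketch simulates both processes; the function `max_load` and its parameters are illustrative assumptions, not part of the protocols of [Ste96] or [acmR95]:

```python
import random

def max_load(m, n, choices):
    """Throw m balls into n bins; each ball goes to the least-loaded of
    `choices` bins drawn independently and uniformly at random."""
    load = [0] * n
    for _ in range(m):
        picks = [random.randrange(n) for _ in range(choices)]
        load[min(picks, key=load.__getitem__)] += 1
    return max(load)

random.seed(0)
n = m = 10_000
print(max_load(m, n, 1))  # one choice:  Theta(log n / log log n) w.h.p.
print(max_load(m, n, 2))  # two choices: O(log log n) w.h.p. -- far smaller
```

For m = n = 10,000 the one-choice maximum load is typically around 7-8, while two choices bring it down to about 4, reflecting the exponential improvement discussed above.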
ISBN (print): 0898714532
We study an optimization problem that arises in the context of data placement in multimedia storage systems. We are given a collection of M multimedia data objects that need to be assigned to a storage system consisting of N disks d(1), d(2), ..., d(N). We are also given sets U_1, U_2, ..., U_M such that U_i is the set of clients requesting the i-th data object. Each disk d(j) is characterized by two parameters, namely its storage capacity C_j, which indicates the maximum number of data objects that may be assigned to it, and a load capacity L_j, which indicates the maximum number of clients that it can serve. The goal is to find a placement of data objects on disks and an assignment of clients to disks so as to maximize the total number of clients served, subject to the capacity constraints of the storage system. We study this data placement problem for two natural classes of storage systems, namely homogeneous and uniform-ratio systems. Our first main result is a tight upper and lower bound on the number of items that can always be packed for any input instance of homogeneous as well as uniform-ratio storage systems. We show that an algorithm given in [11] for data placement achieves this bound. Our second main result is a polynomial-time approximation scheme for the data placement problem in homogeneous and uniform-ratio storage systems, answering an open question of [11]. Finally, we also study the problem from an empirical perspective.
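To make the constraint structure concrete, here is a minimal sketch that validates a candidate placement and assignment against the capacities C_j and L_j and counts the clients served. The function name and the data encoding are assumptions of this sketch, not the formulation used in the paper:

```python
def clients_served(placement, assignment, C, L):
    """Validate a candidate solution and count clients served.

    placement[j]  -- set of objects stored on disk d(j)    (size <= C[j])
    assignment[j] -- {object: clients served} on disk d(j) (sum  <= L[j])
    """
    total = 0
    for j, objs in enumerate(placement):
        if len(objs) > C[j]:
            raise ValueError(f"disk {j}: storage capacity exceeded")
        if sum(assignment[j].values()) > L[j]:
            raise ValueError(f"disk {j}: load capacity exceeded")
        if set(assignment[j]) - objs:
            raise ValueError(f"disk {j}: serves an object it does not store")
        total += sum(assignment[j].values())
    return total

# Two disks, three objects with |U_1| = 5, |U_2| = 3, |U_3| = 4:
print(clients_served([{0, 1}, {2}], [{0: 5, 1: 3}, {2: 4}],
                     C=[2, 1], L=[8, 4]))  # -> 12
```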
ISBN (print): 9780897916714
Parallel tree contraction has been found to be a useful and quite powerful tool for the design of a wide class of efficient graph algorithms. We propose a corresponding technique for the parallel solution of incremental problems. As our computational model, we assume a variant of the CRCW PRAM in which processors can be activated dynamically by a forking operation. We consider a dynamic binary tree T of at most n nodes and unbounded depth. We describe a procedure, which we call the dynamic parallel tree contraction algorithm, that incrementally processes various parallel modification requests and queries: (1) parallel requests to add or delete leaves of T, or to modify labels of internal nodes or leaves of T, and (2) parallel tree contraction queries which require recomputing values at specified nodes. Each modification or query is with respect to a set of nodes U in T. The dynamic parallel tree contraction algorithm is a randomized algorithm that takes O(log(|U| log n)) expected parallel time using O(|U| log n / log(|U| log n)) processors. We give a large number of applications (with the same bounds), including: (a) maintaining the usual tree properties (such as number of ancestors, preorder, etc.), (b) Eulerian tour, (c) expression evaluation, (d) least common ancestor, and (e) canonical forms. Previously, there were no known parallel algorithms for incrementally maintaining and solving such problems in parallel time less than Θ(log n). In deriving our incremental algorithms, we solve a key subproblem, namely a processor activation problem, within the same asymptotic bounds, which may be useful in the design of other parallel incremental algorithms. This algorithm uses an interesting persistent parallel data structure involving a non-trivial construction. In a subsequent paper, we apply our dynamic parallel tree contraction technique to various incremental graph problems, maintaining properties such as colorings, minimum covering sets, and maximum matchings.
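As a hedged illustration of the underlying contraction idea (application (c), expression evaluation), the following sequential sketch repeatedly "rakes" internal nodes whose children are both leaves. The PRAM version additionally compresses chains so that O(log n) rounds always suffice; this sketch shows only the rake rounds and is not the paper's dynamic algorithm:

```python
import operator

OPS = {'+': operator.add, '*': operator.mul, '-': operator.sub}

def contract_eval(tree):
    """Evaluate an expression tree by repeated rake rounds.
    A tree is either a number (leaf) or a tuple (op, left, right)."""
    def is_leaf(t):
        return not isinstance(t, tuple)

    def rake(t):
        if is_leaf(t):
            return t
        op, left, right = t
        if is_leaf(left) and is_leaf(right):
            return OPS[op](left, right)   # rake both leaf children into parent
        return (op, rake(left), rake(right))

    rounds = 0
    while not is_leaf(tree):
        tree, rounds = rake(tree), rounds + 1
    return tree, rounds

# (2 * 3) + (10 - (1 + 1)) contracts to 14 in three rake rounds
print(contract_eval(('+', ('*', 2, 3), ('-', 10, ('+', 1, 1)))))  # -> (14, 3)
```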
Efficient scheduling of processes on the processors of a Network of Workstations (NOW) is essential for good system performance. The design of such schedulers involves a complex interaction between several system and workload parameters. Two events, a process waiting for a message and the arrival of a message, can be used to trigger remedial actions that guide the behavior of the system towards coscheduling using only local information. An intensive implementation and evaluation exercise studying these systems is presented.
ISBN (print): 9781450304726
Optimizing the performance of a multi-core microprocessor within a power budget has recently received a lot of attention. However, most existing solutions are centralized and cannot scale well with the rapidly increasing level of core integration. While a few recent studies propose power control algorithms for many-core architectures, those solutions assume that the workload of every core is independent and therefore cannot effectively allocate power based on thread criticality to accelerate multi-threaded parallel applications, which are expected to be the primary workloads of many-core architectures. This paper presents a scalable power control solution for many-core microprocessors that is specifically designed to handle realistic workloads, i.e., a mixed group of single-threaded and multi-threaded applications. Our solution features a three-layer design. First, we adopt control theory to precisely control the power of the entire chip to its chip-level budget by adjusting the aggregated frequency of all the cores on the chip. Second, we dynamically group cores running the same applications and then partition the chip-level aggregated frequency quota among the groups for optimized overall microprocessor performance. Finally, we partition the group-level frequency quota among the cores in each group based on measured thread criticality for shorter application completion time. As a result, our solution can optimize microprocessor performance while precisely limiting chip-level power consumption below the desired budget. Empirical results on a 12-core hardware testbed show that our control solution can provide precise power control, as well as 17% and 11% better application performance than two state-of-the-art solutions, on average, for mixed PARSEC and SPEC benchmarks. Furthermore, our extensive simulation results for 32, 64, and 128 cores, as well as overhead analysis for up to 4,096 cores, demonstrate that our solution is highly scalable to many-core architectures.
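The first layer can be sketched as a simple feedback loop: measure chip power, compare to the budget, and nudge the aggregated frequency by the error. This is a minimal integral-control sketch under a toy linear power model; `measure_power`, `set_freq`, and the gain are assumptions of the sketch, not the paper's controller design or hardware API:

```python
def chip_power_controller(budget, measure_power, set_freq, f_min, f_max,
                          steps=50, gain=0.05):
    """First-layer sketch: steer total chip power toward `budget` by
    adjusting the aggregated core frequency with an integral controller."""
    freq = f_max
    for _ in range(steps):
        err = budget - measure_power(freq)            # watts of headroom left
        freq = min(f_max, max(f_min, freq + gain * err))
        set_freq(freq)                                # apply the new setpoint
    return freq

# Toy linear power model (assumption): P = 2*f + 10 watts.
model = lambda f: 2.0 * f + 10.0
final = chip_power_controller(budget=50.0, measure_power=model,
                              set_freq=lambda f: None, f_min=0.5, f_max=30.0)
print(round(final, 2))  # settles near f = 20, where the model meets 50 W
```

With the linear model the loop is a contraction (f ← 0.9·f + 2), so it converges geometrically to the budget-meeting frequency; a real controller would also need to handle measurement noise and model drift.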
A key issue for parallel systems is the development of useful programming abstractions that can coexist with good performance. We describe a communication library that supports an object-based abstraction with a bulk-synchronous communication style; this is the first time such a library has been proposed and implemented. By restricting the library to the exclusive use of barrier synchronization, we are able to design a simple and easy-to-use object system. By exploiting established techniques based on the bulk-synchronous parallel (BSP) model, we are able to design algorithms and library implementations that work well across platforms.
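The bulk-synchronous style described above can be sketched in a few lines: each superstep is a local computation phase and a communication phase, separated only by barriers. This thread-based sketch is an illustration of the BSP model, not the library's actual API (it leans on CPython's GIL for atomic list appends, as noted in the comments):

```python
import threading

def bsp_run(p, supersteps):
    """Minimal BSP-style sketch: p threads alternate a local computation
    phase and a communication phase, separated only by barriers."""
    barrier = threading.Barrier(p)
    inbox = [[] for _ in range(p)]        # one message list per process
    results = [0] * p

    def worker(pid):
        for step in supersteps:
            outgoing = step(pid, list(inbox[pid]))  # computation phase
            barrier.wait()                          # computation done everywhere
            for dest, msg in outgoing:              # communication phase
                inbox[dest].append(msg)             # append is atomic under the GIL
            barrier.wait()                          # superstep boundary
        results[pid] = sum(inbox[pid])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

p = 4
def scatter(pid, msgs):
    # single superstep: every process sends pid + 1 to all others
    return [(dest, pid + 1) for dest in range(p) if dest != pid]

print(bsp_run(p, [scatter]))  # -> [9, 8, 7, 6]
```

Because all communication is delivered only at the barrier, the result is deterministic regardless of thread interleaving, which is exactly the property that makes BSP-style libraries easy to reason about.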
A simple, asynchronous, space-efficient scheduling algorithm for shared-memory machines was developed. The algorithm combined the low scheduling overheads and good locality of work stealing with the low space requirements of depth-first schedulers. It was applied in the context of a native, user-level implementation of POSIX standard threads (Pthreads), and its performance was evaluated using a set of C-based benchmarks and compared with that of two other schedulers. The new algorithm covered a range of scheduling granularities and space requirements, allowing the user to trade off the space requirement of a program against the scheduling granularity.
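The work-stealing half of this combination rests on a per-worker deque: the owner pushes and pops at the bottom (LIFO, good locality), while idle workers steal the oldest, typically largest, tasks from the top. The following coarse-locked sketch shows that structure only; the depth-first space bound of the algorithm above comes from extra ordering rules not shown here:

```python
import collections
import threading

class WorkStealingDeque:
    """Per-worker task deque sketch: owner at the bottom, thieves at the top."""
    def __init__(self):
        self._dq = collections.deque()
        self._lock = threading.Lock()   # coarse lock; real deques are lock-free

    def push(self, task):               # owner only
        with self._lock:
            self._dq.append(task)

    def pop(self):                      # owner only: newest task first (locality)
        with self._lock:
            return self._dq.pop() if self._dq else None

    def steal(self):                    # any thief: oldest task first
        with self._lock:
            return self._dq.popleft() if self._dq else None

dq = WorkStealingDeque()
for task in ("root", "child", "grandchild"):
    dq.push(task)
print(dq.pop(), dq.steal())  # -> grandchild root
```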
A malleable task is a computational unit that may be executed on any number of processors, its execution time depending on the amount of resources allotted to it. In accordance with the standard behavior of parallel applications, we assume that the malleable tasks are monotonic, i.e. that the execution time decreases with the number of processors while the computational work increases. This paper presents a new approach for scheduling a set of independent malleable tasks that leads to a worst-case guarantee of √3 for the minimization of the parallel execution time, or makespan. It improves on all other existing practical results, including the two-phase method introduced by Turek et al. The main idea is to transfer the difficulty of a two-phase method from the scheduling part to the allotment selection, and we show how to formulate this latter problem as a knapsack optimization problem. The scheduling problem is then solved by a dual approximation, which leads to a simple structure of two consecutive shelves.
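One ingredient such a dual-approximation approach needs is, for a guessed makespan, the smallest allotment that finishes each task in time. Monotonicity makes this a binary search. The function `min_procs` and the perfect-speedup time model below are illustrative assumptions, not the paper's exact formulation:

```python
def min_procs(t, i, deadline, m):
    """Smallest p in 1..m with t(i, p) <= deadline, or None if even m
    processors are too slow. Monotonicity (t non-increasing in p) makes
    the binary search valid."""
    if t(i, m) > deadline:
        return None
    lo, hi = 1, m
    while lo < hi:
        mid = (lo + hi) // 2
        if t(i, mid) <= deadline:
            hi = mid           # mid processors already meet the deadline
        else:
            lo = mid + 1
    return lo

# Toy monotonic model (assumption): perfect speedup, t(i, p) = w_i / p,
# so the time decreases with p while the work p * t(i, p) stays constant.
work = [8.0, 6.0, 4.0]
t = lambda i, p: work[i] / p
print([min_procs(t, i, 4.0, m=8) for i in range(3)])  # -> [2, 2, 1]
```

In the full method, these minimal allotments feed the knapsack-based selection, and the resulting tasks are packed onto the two shelves mentioned above.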