Search Results - Inner Mongolia University Library

Proceedings of the Twelfth Annual ACM Symposium on Parallel Algorithms and Architectures

Authors: Parimala Thulasiraman, Kevin B. Theobald, Ashfaq A. Khokhar, Guang R. Gao (Department of Electrical and Computer Engineering, 140 Evans Hall, University of Delaware, Newark, DE)

ISBN: (print) 9781581131857

In this paper we present fine-grained multithreaded algorithms and implementations for the Fast Fourier Transform (FFT) problem. The FFT problem has been formulated using two distinct approaches based on dataflow concepts. The first approach, referred to as the receiver-initiated algorithm, realizes the FFT iterations as a parent-child relationship while fully exploiting the underlying parallelism. The second approach, referred to as the sender-initiated algorithm, follows a dataflow model based on the producer-consumer style of programming and can be adapted to different architectural parameters to achieve high performance. The proposed algorithms have been implemented on the EARTH (Efficient Architecture for Running THreads) platform. For both algorithms, we analyze the ratio of remote to local threads and study its impact on the experimental results. Our implementation results show that, for certain block sizes with a fixed problem size and machine size, the receiver-initiated approach performs better than the sender-initiated approach. For large numbers of processors, both algorithms perform well, yielding execution times of only 10 ms for an input of 16K data points on a 64-processor machine, assuming each processor runs at a 140 MHz clock speed.

Keywords: parallel algorithms fine-grained non-preemptive multithreading dataflow architecture
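The abstract does not restate the FFT itself. For reference, here is a minimal sequential sketch of the standard iterative radix-2 Cooley-Tukey FFT, assuming the input length is a power of two. Within each of the log2(n) stages all butterflies are independent of one another, which is the kind of parallelism a fine-grained multithreaded version can exploit. The function name `fft` is illustrative; this is not the paper's receiver-initiated or sender-initiated EARTH implementation.

```python
import cmath
import math


def fft(values):
    """Iterative radix-2 Cooley-Tukey FFT (sequential sketch).

    Assumes len(values) is a power of two.
    """
    a = [complex(v) for v in values]
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")

    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # log2(n) stages; within one stage every butterfly is independent,
    # so a parallel version could hand them to separate threads.
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * math.pi / size)
        for start in range(0, n, size):
            w = 1
            for k in range(half):
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t
                a[start + k + half] = u - t
                w *= w_step
        size *= 2
    return a
```

For example, `fft([1, 1, 1, 1])` returns, up to floating-point rounding, `[4, 0, 0, 0]` as complex numbers.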

Scheduling Cilk multithreaded parallel programs on processors of different speeds

Proceedings of the Twelfth Annual ACM Symposium on Parallel Algorithms and Architectures

Authors: Michael A. Bender, Michael O. Rabin (Department of Computer Science, State University of New York at Stony Brook, Stony Brook, NY; Division of Engineering and Applied Sciences, Harvard University, Cambridge, MA)

ISBN: (print) 9781581131857

We study the problem of executing parallel programs, in particular Cilk programs, on a collection of processors of different speeds. We consider a model in which each processor maintains an estimate of its own speed, communication between processors has a cost, and all scheduling must be online. This problem has been considered previously in the fields of asynchronous parallel computing and scheduling theory. Our model is a bridge between the assumptions in these fields. We provide a new, more accurate analysis of an old scheduling algorithm called the maximum utilization scheduler. Based on this analysis, we generalize this scheduling policy and define the high utilization scheduler. We next focus on the Cilk platform and introduce a new algorithm for scheduling Cilk multithreaded parallel programs on heterogeneous processors. This scheduler is inspired by the high utilization scheduler and is modified to fit in a Cilk context. A crucial aspect of our algorithm is that it keeps the original spirit of the Cilk scheduler. In fact, when our new algorithm runs on homogeneous processors, it exactly mimics the dynamics of the original Cilk scheduler.

Optimal schedules for data-parallel cycle-stealing in networks of workstations (extended abstract)

Proceedings of the Twelfth Annual ACM Symposium on Parallel Algorithms and Architectures

Authors: Arnold L. Rosenberg (Department of Computer Science, University of Massachusetts, Amherst, MA)

ISBN: (print) 9781581131857

We refine the model underlying our prior work on scheduling cycle-stealing opportunities in NOWs [5, 16], obtaining a model wherein the scheduling guidelines of [16] produce optimal schedules for every such opportunity. Although computing optimal schedules usually requires the use of general (often inefficient) function-optimizing methods, we show how to compute optimal schedules efficiently for the broad class of opportunities whose durations come from a concave probability distribution. Even when no such efficient computation of an optimal schedule is available, our refined model always suggests a natural notion of approximately optimal schedule, which may be efficiently computable. We illustrate such efficient approximability via the important class of cycle-stealing opportunities whose durations come from a heavy-tailed distribution. Such opportunities do not admit any optimal schedule—nor even a natural notion of approximately optimal schedule—within the model of [5, 16]. Within our refined model, though, we derive computationally simple schedules for heavy-tailed opportunities, which can be “tuned” to have expected work-output that is arbitrarily close to optimal.

Infinite parallel job allocation (extended abstract)

Proceedings of the Twelfth Annual ACM Symposium on Parallel Algorithms and Architectures

Authors: Petra Berenbrink, Artur Czumaj, Tom Friedetzky, Nikita D. Vvedenskaya (Dept. of Mathematics & Computer Science, Paderborn University, D-33095 Paderborn, Germany; Department of Computer and Information Science, New Jersey Institute of Technology, University Heights, Newark, NJ; Institut für Informatik, Technische Universität München, D-80290 München, Germany; Institute of Information Transmission Problems, Russian Academy of Science, Moscow 101447, Russia)

ISBN: (print) 9781581131857

In recent years, the task of allocating jobs to servers has been studied with the “balls and bins” abstraction. Results in this area exploit the large decrease in maximum load that can be achieved by allowing each job (ball) a little freedom in choosing its destination server (bin). In this paper we examine an infinite and parallel allocation process (see [ABS98]) which is related to the “balls and bins” abstraction. The simple process can be used to model many problems arising in applications like load balancing, data accesses for parallel data servers, hashing, and PRAM ***, the parallel allocation process behaves in a highly non-uniform manner, which makes its analysis challenging. Even the typically simple question of the arrival rates for which the process is stable is highly non-trivial. To cope with this non-uniform behavior, we introduce a new sequential process and show (via simulations) that it models the behavior of the parallel process very accurately. We develop a system of ordinary differential equations to describe the behavior of our sequential process and present a thorough analysis of its performance. For example, we show that the queue length distribution decreases double-exponentially. Finally, we present simulation results indicating that the solutions to the differential equations predict very well the queue length distribution of our sequential process and the largest injection rate for which it is ***, we can conclude that in all the performance characteristics we have measured experimentally, the parallel and the sequential processes are closely related. This indicates that the obtained solution of the differential equations and the results presented above apply to the parallel process as well.
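The “little freedom” mentioned above is usually illustrated by the d-choice rule: each ball probes d bins chosen uniformly at random and joins the least loaded one. The following is a minimal sketch of that static, one-ball-at-a-time rule under stated assumptions (a fixed number of balls and bins, a seeded `random.Random`, and a hypothetical function name `max_load`). It is not the infinite parallel process of [ABS98] or the sequential process analysed in the paper.

```python
import random


def max_load(num_balls, num_bins, choices, seed=0):
    """Throw balls into bins; each ball probes `choices` random bins
    and joins the least loaded one (ties go to the earliest probe).

    Returns the maximum bin load. `choices=1` is plain random allocation.
    """
    rng = random.Random(seed)
    loads = [0] * num_bins
    for _ in range(num_balls):
        candidates = [rng.randrange(num_bins) for _ in range(choices)]
        target = min(candidates, key=lambda b: loads[b])
        loads[target] += 1
    return max(loads)


if __name__ == "__main__":
    n = 10_000
    for d in (1, 2, 3):
        print(f"d={d}: max load {max_load(n, n, d)}")
```

With n balls and n bins, the maximum load reported for d = 2 is typically much smaller than for d = 1; the exact numbers depend on the seed.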

Efficient parallel solutions of linear algebraic circuits

Annual ACM Symposium on Parallel Algorithms and Architectures, 1999, pp. 212-221

Authors: Ben-Asher, Yosi; Haber, Gady (Haifa Univ, Haifa, Israel)

The problem of obtaining efficient solutions for the parallel evaluation of linear algebraic circuits (LAC) is considered. The parallelism is obtained through matrix multiplications. A CREW PRAM algorithm is used that, during execution, propagates computed values into future matrix products and uses a special scheduling of the matrix multiplications to reduce the constant factors in the execution time.

Keywords: parallel processing systems
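As a toy illustration of how matrix multiplication exposes parallelism, consider a chain-shaped linear circuit, the first-order recurrence x_i = a_i x_{i-1} + b_i. Each step can be written as the 2x2 matrix [[a_i, b_i], [0, 1]] acting on (x, 1), and because matrix products are associative, the steps can be combined pairwise in a balanced tree of O(log n) levels. This sketch (hypothetical names `mat_mul` and `solve_recurrence`) only shows that idea, evaluated sequentially; it is not the CREW PRAM algorithm or the scheduling scheme from the paper.

```python
def mat_mul(p, q):
    """Multiply 2x2 matrices given as ((a, b), (c, d))."""
    return (
        (p[0][0] * q[0][0] + p[0][1] * q[1][0], p[0][0] * q[0][1] + p[0][1] * q[1][1]),
        (p[1][0] * q[0][0] + p[1][1] * q[1][0], p[1][0] * q[0][1] + p[1][1] * q[1][1]),
    )


def solve_recurrence(coeffs, x0):
    """Return x_n where x_i = a_i * x_{i-1} + b_i for (a_i, b_i) in coeffs.

    Step i is the matrix M_i = [[a_i, b_i], [0, 1]] applied to (x, 1).
    Neighbouring matrices are multiplied pairwise, level by level; the
    products within one level are independent and could run in parallel.
    """
    mats = [((a, b), (0, 1)) for a, b in coeffs]
    if not mats:
        return x0
    while len(mats) > 1:
        nxt = []
        for i in range(0, len(mats) - 1, 2):
            # The later step goes on the left: M_{i+1} @ M_i.
            nxt.append(mat_mul(mats[i + 1], mats[i]))
        if len(mats) % 2:
            nxt.append(mats[-1])  # odd one out moves up unchanged
        mats = nxt
    (a, b), _ = mats[0]
    return a * x0 + b
```

For example, `solve_recurrence([(2, 1), (3, 0), (1, 5)], 1)` computes x1 = 3, x2 = 9, x3 = 14 and returns `14`.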

Post-mortem black-box correctness tests for basic parallel data structures

Annual ACM Symposium on Parallel Algorithms and Architectures, 1999, pp. 44-53

Authors: Gibbons, Phillip B.; Bruno, John L.; Phillips, Steven (Lucent Technologies, Murray Hill, NJ, United States)

Black-box procedures for testing whether a parallel data structure behaved correctly are considered. The first systematic study of algorithms and hardness results for such testing procedures is presented, focusing on queues, priority queues, stacks, and counters. The importance of selecting test data such that distinct values are inserted into the data structures is shown.

Keywords: Data structures
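To make the idea of a post-mortem check concrete, here is a hedged sketch of the simplest case: checking a totally ordered (sequential) log of queue operations against FIFO semantics. With distinct values, each dequeued value matches exactly one enqueue, so the check is a single linear scan. The log format and the name `is_fifo_history` are hypothetical; the concurrent histories, priority queues, stacks, and counters studied in the paper are harder and are not covered here.

```python
from collections import deque


def is_fifo_history(log):
    """Check a sequential log of ("enq", v) / ("deq", v) pairs.

    ("deq", None) denotes a dequeue that reported an empty queue.
    Assumes every enqueued value is distinct and not None.
    """
    pending = deque()
    seen = set()
    for op, value in log:
        if op == "enq":
            if value in seen:
                raise ValueError(f"value {value!r} enqueued twice")
            seen.add(value)
            pending.append(value)
        elif op == "deq":
            if value is None:
                if pending:  # reported empty, but items were waiting
                    return False
            elif not pending or pending.popleft() != value:
                return False  # nothing to dequeue, or wrong order
        else:
            raise ValueError(f"unknown operation {op!r}")
    return True
```

For example, `[("enq", 1), ("enq", 2), ("deq", 2)]` is rejected because value 1 should have been dequeued first.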

Simple and efficient parallel disk mergesort

Annual ACM Symposium on Parallel Algorithms and Architectures, 1999, pp. 232-241

Authors: Barve, Rakesh D.; Vitter, Jeffrey Scott (Duke Univ, Durham, United States)

An efficient implementation of the simple randomized merging (SRM) algorithm was developed based on novel data structures. SRM's lookahead forecasting technique and its forecast-and-flush technique were used for parallel prefetching and buffer management, respectively. These techniques significantly improved the way SRM carries out the parallel, independent disk accesses needed to read blocks of input runs efficiently during external merging.

Keywords: parallel processing systems
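The forecasting idea can be illustrated with the classic rule used in external merging: among the blocks currently in memory, the run whose block has the smallest last key will exhaust that block first, so its next block is the one to prefetch. Below is a minimal sketch, assuming runs are modelled as hypothetical in-memory lists of non-empty sorted blocks (no disks, no striping, no buffer limit). It only computes the order in which blocks are needed; it does not implement SRM's randomized layout or its forecast-and-flush buffer management.

```python
def forecast_next_run(current_blocks):
    """Return the index of the run whose next block will be needed first.

    current_blocks[i] is the sorted block of run i currently in memory,
    or None if run i is exhausted. The run with the smallest last key
    drains its block first (ties go to the lower index).
    """
    best = None
    for i, block in enumerate(current_blocks):
        if block and (best is None or block[-1] < current_blocks[best][-1]):
            best = i
    return best


def prefetch_order(runs):
    """Order in which non-initial blocks are needed during the merge.

    runs[i] is a list of non-empty sorted blocks forming sorted run i.
    The first block of every run starts in memory.
    """
    pos = [0 if blocks else None for blocks in runs]
    order = []
    while any(p is not None for p in pos):
        current = [runs[i][p] if p is not None else None for i, p in enumerate(pos)]
        r = forecast_next_run(current)
        if pos[r] + 1 < len(runs[r]):
            pos[r] += 1
            order.append((r, pos[r]))  # (run index, block index) to fetch
        else:
            pos[r] = None  # run r has no more blocks
    return order
```

For runs `[[1, 4], [6, 9]]` and `[[2, 3], [5, 7], [8, 10]]`, `prefetch_order` returns `[(1, 1), (0, 1), (1, 2)]`, which matches the order in which a merge drains the in-memory blocks.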

Closer look at coscheduling approaches for a network of workstations

Annual ACM Symposium on Parallel Algorithms and Architectures, 1999, pp. 96-105

Authors: Nagar, Shailabh; Banerjee, Ajit; Sivasubramaniam, Anand; Das, Chita R. (Pennsylvania State Univ, University Park, United States)

Efficient scheduling of processes on the processors of a Network of Workstations (NOW) is essential for good system performance. The design of such schedulers involves a complex interaction between several system and workload parameters. Two operations, waiting for a message and the arrival of a message, can be used to take remedial actions that guide the system towards coscheduling using local information. An intensive implementation and evaluation study of these systems is presented.

Keywords: parallel processing systems

BOS is boss: A case for bulk-synchronous object systems

Annual ACM Symposium on Parallel Algorithms and Architectures, 1999, pp. 115-125

Authors: Goudreau, Mark W.; Lang, Kevin; Narlikar, Girija; Rao, Satish B. (NEC USA Inc, Princeton, United States)

A key issue for parallel systems is the development of useful programming abstractions that can coexist with good performance. We describe a communication library that supports an object-based abstraction with a bulk-synchronous communication style; this is the first time such a library has been proposed and implemented. By restricting the library to the exclusive use of barrier synchronization, we are able to design a simple and easy-to-use object system. By exploiting established techniques based on the bulk-synchronous parallel (BSP) model, we are able to design algorithms and library implementations that work well across platforms.

Keywords: parallel processing systems
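The bulk-synchronous style referred to above separates computation from communication: messages sent during a superstep become visible only after a global barrier. A minimal sketch of that discipline with Python threads and `threading.Barrier` follows. The `BSPMachine` class and `ring_shift` example are hypothetical illustrations of the BSP model and are not the BOS library's API.

```python
import threading


class BSPMachine:
    """Toy BSP runtime: messages sent in superstep s are read in superstep s+1."""

    def __init__(self, nprocs):
        self.nprocs = nprocs
        self.barrier = threading.Barrier(nprocs)
        self.lock = threading.Lock()
        self.outgoing = [[] for _ in range(nprocs)]  # written during a superstep
        self.incoming = [[] for _ in range(nprocs)]  # read during a superstep

    def send(self, dest, msg):
        with self.lock:
            self.outgoing[dest].append(msg)

    def sync(self, pid):
        """Global barrier; returns the messages sent to `pid` last superstep."""
        self.barrier.wait()  # every process has finished sending
        if pid == 0:  # a single process swaps the buffers
            self.incoming = self.outgoing
            self.outgoing = [[] for _ in range(self.nprocs)]
        self.barrier.wait()  # the swap is visible to every process
        return self.incoming[pid]


def ring_shift(nprocs):
    """Each process sends its id to its right neighbour, then reads its inbox."""
    machine = BSPMachine(nprocs)
    results = [None] * nprocs

    def worker(pid):
        machine.send((pid + 1) % nprocs, pid)  # superstep 0: communicate
        results[pid] = machine.sync(pid)       # superstep 1: read arrivals

    threads = [threading.Thread(target=worker, args=(p,)) for p in range(nprocs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

`ring_shift(4)` returns `[[3], [0], [1], [2]]`: each process sees its left neighbour's id only after the barrier. Because barriers are the only synchronization, no per-message handshakes are needed; the lock merely guards the shared outgoing lists.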

Scheduling threads for low space requirement and good locality

Annual ACM Symposium on Parallel Algorithms and Architectures, 1999, pp. 83-95

Authors: Narlikar, Girija J. (CMU School of Computer Science)

A simple, asynchronous, space-efficient scheduling algorithm for shared-memory machines was developed. The algorithm combines the low scheduling overheads and good locality of work stealing with the low space requirements of depth-first schedulers. The algorithm was applied in the context of a native, user-level implementation of POSIX standard threads (Pthreads). Its performance was evaluated using a set of C-based benchmarks and compared with two other schedulers. The new algorithm covers a range of scheduling granularities and space requirements, and allows the user to trade the space requirement of a program against the scheduling granularity.

Keywords: parallel algorithms
