检索结果-内蒙古大学图书馆

POPL'00 - The 27th acm SIGPLAN-SIGACT symposium on Principles og Programming Languages

作者： Esparza, Javier Podelski, Andreas Technische Universität München 80290 München Germany Max-Planck-Institut für Informatik 66123 Saarbrücken Germany

ISBN: (纸本)9781581131253

This paper is a contribution to the already existing series of work on the algorithmic principles of interprocedural analysis. We consider the generalization to the case of parallel programs. We give algorithms that compute the sets of backward resp. forward reachable configurations for parallel flow graph systems in linear time in the size of the graph viz. the program. These operations are important in dataflow analysis and in model checking. In our method, we first model configurations as terms (viz. trees) in the process algebra PA that can express call stack operations and parallelism. We then give a `declarative' Horn-clause specification of the sets of predecessors resp. successors. The `operational' computation of these sets is carried out using the Dowling-Gallier procedure for HornSat.

关键词： Data flow analysis

来源：评论

学校读者我要写书评

暂无评论

Interprocessor communication with memory constraints

Annual ACM Symposium on Parallel Algorithms and Architecture...

引用

annual acm symposium on parallel algorithms and architectures 2000年 39-45页

作者： Pinar, Ali Hendrickson, Bruce Univ of Illinois Urbana IL United States

Many parallel applications require periodic redistribution of workloads and associated data. In a distributed memory computer, this redistribution can be difficult if limited memory is available for receiving messages. We propose a model for optimizing the exchange of messages under such circumstances which we call the minimum phase remapping problem. We first show that the problem is NP-Complete, and then analyze several methodologies for addressing it. First, we show how the problem can be phrased as an instance of multi-commodity flow. Next, we study a continuous approximation to the problem. We show that this continuous approximation has a solution which requires at most two more phases than the optimal discrete solution, but the question of how to consistently obtain a good discrete solution from the continuous problem remains open. Finally, we devise a simple and practical approximation algorithm for the problem with a bound of 1.5 times the optimal number of phases.

关键词： parallel processing systems

来源：评论

学校读者我要写书评

暂无评论

Algorithmic foundations for a parallel vector access memory system

Annual ACM Symposium on Parallel Algorithms and Architecture...

引用

annual acm symposium on parallel algorithms and architectures 2000年 156-165页

作者： Mathew, Binu K. McKee, Sally A. Carter, John B. Davis, Al Univ of Utah Salt Lake City UT United States

This paper presents mathematical foundations for the design of a memory controller subcomponent that helps to bridge the processor/memory performance gap for applications with strided access patterns. The parallel Vector Access (PVA) unit exploits the regularity of vectors or streams to access them efficiently in parallel on a multi-bank SDRAM memory system. The PVA unit performs scatter/gather operations so that only the elements accessed by the application are transmitted across the system bus. Vector operations are broadcast in parallel to all memory banks, each of which implements an efficient algorithm to determine which vector elements it holds. Earlier performance evaluations have demonstrated that our PVA implementation loads elements up to 32.8 times faster than a conventional memory system and 3.3 times faster than a pipelined vector unit, without hurting the performance of normal cache-line fills. Here we present the underlying PVA algorithms for both word interleaved and cache-line inter-leaved memory systems.

关键词： Pipeline processing systems

来源：评论

学校读者我要写书评

暂无评论

Scheduling Cilk multithreaded parallel programs on processors of different speeds

Annual ACM Symposium on Parallel Algorithms and Architecture...

引用

annual acm symposium on parallel algorithms and architectures 2000年 13-21页

作者： Bender, Michael A. Rabin, Michael O. State Univ of New York at Stony Brook Stony Brook NY United States

We study the problem of executing parallel programs, in particular Cilk programs, on a collection of processors of different speeds. We consider a model in which each processor maintains an estimate of its own speed, where communication between processors has a cost, and where all scheduling must be online. This problem has been considered previously in the fields of asynchronous parallel computing and scheduling theory. Our model is a bridge between the assumptions in these fields. We provide a new more accurate analysis of an old scheduling algorithm called the maximum utilization scheduler. Based on this analysis, we generalize this scheduling policy and define the high utilization scheduler. We next focus on the Cilk platform and introduce a new algorithm for scheduling Cilk multithreaded parallel programs on heterogeneous processors. This scheduler is inspired by the high utilization scheduler and is modified to fit in a Cilk context.' A crucial aspect of our algorithm is that it keeps the original spirit of the Cilk scheduler. In fact, when our new algorithm runs on homogeneous processors, it exactly mimics the dynamics of the original Cilk scheduler.

关键词： parallel algorithms

来源：评论

学校读者我要写书评

暂无评论

Scheduling to minimize average stretch without migration

Scheduling to minimize average stretch without migration

引用

11th annual acm/SIAM symposium on Discrete algorithms

作者： Becchetti, L Leonardi, S Muthukrishnan, S Graz Tech Univ Inst Math B Graz Austria

ISBN: (纸本)0898714532

We study the problem of scheduling parallel machines online, allowing preemptions while disallowing migration of jobs that have been scheduled on one machine to another. For a given job, we measure the quality of service provided by an algorithm by the stretch of the job, defined as the ratio of the amount of time the job spends in the system (also known as the response time) to its processing time. For a sequence of jobs, we measure the performance of an algorithm by the average stretch achieved over all the jobs. The scheduling goal is to minimize the average stretch. This problem is of relevance in many applications, e.g., wireless data servers and distributed server systems In wired networks. We prove an O(1) competitive ratio for this problem. The algorithm for which we prove this result is the one proposed in [2] that has (tight) logarithmic competitive ratio for minimizing the average response time. Thus the algorithm in [2] simultaneously performs well for average response time as well as average stretch. We prove the O(1) competitive ratio against an adversary who not only knows the entire input ahead of time, but is also allowed to migrate jobs. Thus our result shows that migration is not necessary to be competitive for minimizing average stretch;in contrast, we prove that preemption is essential, even if randomization is allowed. We also establish a constant-factor lower bound on the competitive ratio of any online algorithm that minimizes average stretch without migration.

关键词： parallel processing systems

来源：评论

学校读者我要写书评

暂无评论

Optimal schedules for data-parallel cycle-stealing in networks of workstations

Annual ACM Symposium on Parallel Algorithms and Architecture...

引用

annual acm symposium on parallel algorithms and architectures 2000年 22-29页

作者： Rosenberg, Arnold L. Univ of Massachusetts Amherst United States

We refine the model underlying our prior work on scheduling cycle-stealing opportunities in NOWs, obtaining a model wherein the scheduling guidelines of [16] produce optimal schedules for every such opportunity. Although computing optimal schedules usually requires the use of general (often inefficient) function-optimizing methods, we show how to compute optimal schedules efficiently for the broad class of opportunities whose durations come from a concave probability distribution. Even when no such efficient computation of an optimal schedule is available, our refined model always suggests a natural notion of approximately optimal schedule, which may be efficiently computable. We illustrate such efficient approximability via the important class of cycle-stealing opportunities whose durations come from a heavy-tailed distribution. Such opportunities do not admit any optimal schedule - nor even a natural notion of approximately optimal schedule - within the model of [5, 16]. Within our refined model, though, we derive computationally simple schedules for heavy-tailed opportunities, which can be `tuned' to have expected work-output that is arbitrarily close to optimal.

关键词： Computer workstations

来源：评论

学校读者我要写书评

暂无评论

Multithreaded algorithms for the Fast Fourier Transform

Annual ACM Symposium on Parallel Algorithms and Architecture...

引用

annual acm symposium on parallel algorithms and architectures 2000年 176-185页

作者： Thulasiraman, Parimala Theobald, Kevin B. Khokhar, Ashfaq A. Gao, Guang R. Univ of Delaware Newark DE United States

In this paper we present fine-grained multithreaded algorithms and implementations for the Fast Fourier Transform (FFT) problem. The FFT problem has been formulated using two distinct approaches based on the dataflow concepts. The first approach, referred to as the receiver-initiated algorithm, realizes the FFT iterations as a parent-child relationship while fully exploiting the underlying parallelism. The second approach, referred to as the sender-initiated algorithm, follows a data-flow model based on the producer-consumer style of programming and can be adopted to different architectural parameters for achieving high performance. The implementations of the proposed algorithms have been carried out on the EARTH (Efficient Architecture for Running THreads) platform. For both the algorithms, we analyze the ratio of remote vs local threads and study its impact on the experimental results. Our implementation results show that for certain block sizes on fixed problem size and machine size, the receiver-initiated approach performs better than the sender-initiated approach. For large number of processors, both the algorithms perform well, yielding execution times of only 10 msec for an input of 16 K data points on a 64 processor machine, assuming each processor running at 140 MHz clock speed.

关键词： parallel processing systems

来源：评论

学校读者我要写书评

暂无评论

Broadcast scheduling optimization for heterogeneous cluster systems

Annual ACM Symposium on Parallel Algorithms and Architecture...

引用

annual acm symposium on parallel algorithms and architectures 2000年 129-136页

作者： Liu, Pangfeng Sheng, Tzu-Hao Natl Chung Cheng Univ Chiayi Taiwan

Network of workstation (NOW) is a cost-effective alternative to massively parallel supercomputers. As commercially available off-the-shelf processors become cheaper and faster, it is now possible to build a PC or workstation cluster that provides high computing power within a limited budget. However, a cluster may consist of different types of processors and this heterogeneity within a cluster complicates the design of efficient collective communication protocols. This paper shows that a simple heuristic called fastest-node-first (FNF) is very effective in reducing broadcast time for heterogeneous cluster systems. Despite the fact that FNF heuristic fails to give the optimal broadcast time for a general heterogeneous network of workstation, we prove that FNF always gives the optimal broadcast time in several special cases of clusters. Based on these special case results, we show that FNF is an approximation algorithm that guarantees a competitive ratio of 2. From these theoretical results we also derive techniques to speed up the branch-and-bound search for the optimal broadcast schedule in HNOW.

关键词： parallel processing systems

来源：评论

学校读者我要写书评

暂无评论

T3C: A temporally correct concurrency control algorithm for distributed databases 8

T3C: A temporally correct concurrency control algorithm for ...

引用

8th International symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems

作者： Boukerche, A Tuck, TW Univ N Texas Parallel Simulat & Distributed Syst PARADISE Res Dept Comp Sci Denton TX 76203 USA

ISBN: (纸本)076950728X

In some scenarios involving on-line transaction processing within a distributed database, it is desirable to synchronize transactions in a manner that guarantees conflict equivalence with a serial schedule ordered by original transaction start times while providing each transaction with an anomaly serializable isolation. Few theoretical concurrency control algorithms guarantee such a conflict equivalence, and we are unaware of any protocol that accomplishes this while supporting real-world issues such as out-of-order transaction messages, out-of-order operation executions, and out-of-order transaction committals without the burden of explicit readset and writeset declarations We describe an algorithm that provides this guarantee mid supports these issues while requiring only table-level writeset declarations.

关键词： Concurrency control

来源：评论

学校读者我要写书评

暂无评论

Architecture description and prototype demonstration of optoelectronic parallel-matching architecture

Architecture description and prototype demonstration of opto...

引用

IEEE International parallel and Distributed Processing symposium (IPDP 2000)

作者： Kagawa, K Nitta, K Ogura, Y Tanida, J Ichioka, Y Osaka Univ Grad Sch Engn Dept Mat & Life Sci Suita Osaka 565 Japan

ISBN: (纸本)354067442X

We propose an optoelectronic parallel-matching architecture (PMA) that provides powerful processing capability for distributed algorithms comparing with traditional parallel computing architectures. The PMA is composed of a parallel-matching (PM) module and multiple processing elements (PE's). The PM module is implemented by a large-fan-out free-space optical interconnection and parallel-matching smart-pixel array (PM-SPA). In the proposed architecture, each PE can monitor the other PE's by utilizing several kinds of global processing by the PM module. The PE's can execute concurrent data matching among the others as well as inter-processor communication, Based on the state-of-the-art optoelectronic devices and a diffractive optical element, a prototype of the PM module is constructed. The prototype is assumed to be used in a multiple processor system composed of 4 x 4 processing elements, which are completely connected via 1-bit optical communication channels. On the prototype demonstrator, the fundamental operations of the PM module such as parallel-matching operations and inter-processor communication were virified at 15MHz.

关键词： Optoelectronic devices

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：