检索结果-内蒙古大学图书馆

Annual acm symposium on parallel algorithms and architectures 2000年 176-185页

作者： thulasiraman, Parimala theobald, Kevin B. Khokhar, Ashfaq A. Gao, Guang R. Univ of Delaware Newark DE United States

In this paper we present fine-grained multithreaded algorithms and implementations for the Fast Fourier Transform (FFT) problem. the FFT problem has been formulated using two distinct approaches based on the dataflow concepts. the first approach, referred to as the receiver-initiated algorithm, realizes the FFT iterations as a parent-child relationship while fully exploiting the underlying parallelism. the second approach, referred to as the sender-initiated algorithm, follows a data-flow model based on the producer-consumer style of programming and can be adopted to different architectural parameters for achieving high performance. the implementations of the proposed algorithms have been carried out on the EARth (Efficient Architecture for Running threads) platform. For both the algorithms, we analyze the ratio of remote vs local threads and study its impact on the experimental results. Our implementation results show that for certain block sizes on fixed problem size and machine size, the receiver-initiated approach performs better than the sender-initiated approach. For large number of processors, both the algorithms perform well, yielding execution times of only 10 msec for an input of 16 K data points on a 64 processor machine, assuming each processor running at 140 MHz clock speed.

关键词： parallel processing systems

来源：评论

学校读者我要写书评

暂无评论

Broadcast scheduling optimization for heterogeneous cluster systems

Annual ACM Symposium on Parallel Algorithms and Architecture...

引用

Annual acm symposium on parallel algorithms and architectures 2000年 129-136页

作者： Liu, Pangfeng Sheng, Tzu-Hao Natl Chung Cheng Univ Chiayi Taiwan

Network of workstation (NOW) is a cost-effective alternative to massively parallel supercomputers. As commercially available off-the-shelf processors become cheaper and faster, it is now possible to build a PC or workstation cluster that provides high computing power within a limited budget. However, a cluster may consist of different types of processors and this heterogeneity within a cluster complicates the design of efficient collective communication protocols. this paper shows that a simple heuristic called fastest-node-first (FNF) is very effective in reducing broadcast time for heterogeneous cluster systems. Despite the fact that FNF heuristic fails to give the optimal broadcast time for a general heterogeneous network of workstation, we prove that FNF always gives the optimal broadcast time in several special cases of clusters. Based on these special case results, we show that FNF is an approximation algorithm that guarantees a competitive ratio of 2. From these theoretical results we also derive techniques to speed up the branch-and-bound search for the optimal broadcast schedule in HNOW.

关键词： parallel processing systems

来源：评论

学校读者我要写书评

暂无评论

Data management in hierarchical bus networks

Annual ACM Symposium on Parallel Algorithms and Architecture...

引用

Annual acm symposium on parallel algorithms and architectures 2000年 109-118页

作者： auf der Heide, F.Meyer Raecke, H. Westermann, M. Paderborn Univ Paderborn Germany

A hierarchical bus network T = (V, E) uses hierarchically, tree-like connected buses as a communication network. New communication technologies like SCI (Scalable Coherent Interface) make such networks very attractive, because they allow their easy construction and guarantee reasonable communication performance. Such networks can be modeled as tree networks: leaves correspond to processors, inner nodes to buses, edges to switches, and bandwidths of inner nodes and edges are related to bandwidths of buses and switches, respectively. In this paper we address the problem of static data management. Given a set of shared data objects X and the read and write frequencies from the processors to the shared data objects, the goal is to compute a (maybe redundant) placement of the shared data objects to the processors, such that the congestion (the maximum over the load of all edges and inner nodes, induced by the read and write frequencies, divided by the bandwidth of the edge or inner node, respectively) is minimized. It is known that this problem can be solved optimally in linear time, if inner nodes are allowed to hold copies of shared data objects. In our model, inner nodes correspond to buses and therefore cannot store copies of shared data objects. We show that this restriction increases the complexity of the placement problem drastically: It becomes NP-hard. On the other hand, the main contribution of our paper is an approximation algorithm with runtime O(|X|·|V|&middotheight(T)·log(degree(T))) that increases the congestion by a factor of at most 7.

关键词： parallel processing systems

来源：评论

学校读者我要写书评

暂无评论

Infinite parallel job allocation

Annual ACM Symposium on Parallel Algorithms and Architecture...

引用

Annual acm symposium on parallel algorithms and architectures 2000年 99-108页

作者： Berenbrink, Petra Czumaj, Artur Friedetzky, Tom Vvedenskaya, Nikita D. Paderborn Univ Paderborn Germany

In recent years, the task of allocating jobs to servers has been studied with the `balls and bins' abstraction. Results in this area exploit the large decrease in maximum load that can be achieved by allowing each job (ball) a little freedom in choosing its destination server (bin). In this paper we examine an infinite and parallel allocation process (see [ABS98]) which is related to the `balls and bins' abstraction. the simple process can be used to model many problems arising in applications like load balancing, data accesses for parallel data servers, hashing, and PRAM simulations. Unfortunately, the parallel allocation process behaves in a highly non-uniform manner which makes its analysis challenging. Even the typically simple question of for which arrival rates the process is stable, is highly non-trivial. In order to cope with this non-uniform behavior we introduce a new sequential process and show (via simulations) that the sequential process models the behavior of the parallel one very accurately. We develop a system of ordinary differential equations in order to describe the behavior of our sequential process and present a thorough analysis of the performance this process. For example, we show that the queue length distribution decreases double-exponentially. Finally, we present simulation results indicating that the solutions to the differential equations very well predict the queue length distribution of our sequential process and the largest injection rate for which it is stable. Summarizing, we can conclude that in all the performance characteristics we have measured experimentally, the parallel and the sequential process are closely related. this indicates that the obtained solution of the differential equations and the results presented above are applicable to the parallel process, too.

关键词： parallel processing systems

来源：评论

学校读者我要写书评

暂无评论

PACE: Processor architectures for circuit emulation 12th

引用

10 IPPS/SPDP 98 Workshops Held in Conjunction with the 12th International parallel Processing symposium / 9th symposium on parallel Distributed Processing

作者： Kolla, R Springauf, O Univ Wurzburg Lehrstuhl Tech Informat D-97070 Wurzburg Germany

ISBN: (纸本)3540643591

We describe a family of reconfigurable parallel architectures for logic emulation. they are supposed to be applicable like conventional FPGAs, while covering a larger range of circuit sizes and clock frequencies. In order to evaluate the performance of such programmable designs, we also need software methods for code generation from circuit descriptions. We propose a combination of scheduling and routing algorithms for embedding calculations into the target architecture.

关键词： parallel architectures

来源：评论

学校读者我要写书评

暂无评论

Asynchronous parallel algorithm for mining association rules on a shared-memory multi-processors 98

Asynchronous parallel algorithm for mining association rules...

引用

Proceedings of the 1998 10th Annual acm symposium on parallel algorithms and architectures, SPAA

作者： Cheung, D.W. Hu, K. Xia, Sh. The Univ of Hong Kong Hong Kong Hong Kong

ISBN: (纸本)9780897919890

Mining association rules from large databases is an important problem in data mining. there is a need to develop parallel algorithm for this problem because it is a very costly computation process. However, all proposed parallel algorithms for mining association rules follow the conventional level-wise approach. On a shared-memory multi-processors, they will impose a synchronization in every iteration which degrades greatly their performance. the deficiency comes from the contention on the shared I/O channel when all processors are accessing their database partitions in the shared storage synchronously. An asynchronous algorithm APM has been proposed for mining association rules on shared-memory multiprocessors. All participating processors in APM generate candidates and count their supports independently without synchronization. Furthermore, it can finish the computation with less I/O than required in the level-wise approach. the algorithm has been implemented on a Sun Enterprise 4000 multi-processors with 12 nodes. the experiments show that APM has super performance than other proposed synchronous algorithms.

关键词： parallel algorithms

来源：评论

学校读者我要写书评

暂无评论

Scheduling with communication delays and data routing in message passing architectures 12th

引用

10 IPPS/SPDP 98 Workshops Held in Conjunction with the 12th International parallel Processing symposium / 9th symposium on parallel Distributed Processing

作者： Moukrim, A Quilliot, A Univ Clermont Ferrand 2 LIMOS F-63177 Aubiere France

ISBN: (纸本)3540643591

this work deals with the scheduling problem of a directed acyclic graph with interprocessor communication delays. the objective is to minimize the makespan, taking into account the contention in the network induced by the message routing. We propose two heuristics for solving the scheduling and routing problems onto arbitrary networks, taking into consideration the access conflicts to links during the task scheduling. Both heuristics significantly improve the performance of the algorithms which do not consider the contention in the network. the comparison of these heuristics is done on problems with different granularity levels in regard to execution times and number of needed processors.

关键词： Scheduling

来源：评论

学校读者我要写书评

暂无评论

10 Workshops Held in Conjunction with the 12th International parallel Processing symposium and 9th symposium on parallel and Distributed Processing, IPPS/SPDP 1998

引用

10 Workshops held in conjunction with 12th International parallel symposium and 9th symposium on parallel and Distributed Processing, IPPS/SPDP 1998

ISBN: (纸本)3540643591

the proceedings contain 118 papers. the special focus in this conference is on parallel and Distributed Processing. the topics include: Dynamic reconfiguration of a PMMLA for high-throughput applications;a parallel algorithm for minimum cost path computation on polymorphic processor array;a performance modeling and analysis environment for reconfigurable computers;an integrated partitioning and synthesis system for dynamically reconfigurabte multi-FPGA architectures;temporal partioning for partially-reconfigurable-field-programmable gate;a java development and runtime environment for reconfigurable computing;synthesizing reconfigurable sequential machines using tabular models;evaluation of a low-power reconfigurable DSP architecture;a reconfigurable hardware-monitor for communication analysis in distributed real-time systems;a mathematical benefit analysis of context switching reconfigurable computing;a configurable computing approach towards real-time target tracking;hardware reconfigurable neural networks;a simulator for the reconfigurable mesh architecture;processor architectures for circuit emulation;an empirical comparison of runtime systems for conservative parallel simulation;synchronizing operations on multiple objects;migration and rollback transparency for arbitrary distributed applications in workstation clusters;a topology based approach to coordinated multicast operations;a parallel evolutionary algorithm for the vehicle routing problem with heterogeneous fleet;artificial neural networks on reconfigurable meshes;a molecular quasi-random model of computations applied to evaluate collective intelligence;replicated shared object model for edge detection with spiral architecture and scheduling tasks of a parallel program in two-processor systems with use of cellular automata.

关键词：

来源：评论

学校读者我要写书评

暂无评论

Randomized routing and PRAM emulation on parallel machines 12th

引用

10 Workshops held in conjunction with 12th International parallel symposium and 9th symposium on parallel and Distributed Processing, IPPS/SPDP 1998

作者： Wei, David S. L. Department of Computer and Information Science Fordham University BronxNY10458-5198 United States

ISBN: (纸本)3540643591

this paper shows the power of randomization in designing efficient parallel algorithms for the problems of routing and PRAM emulation. We show that with randomization techniques optimal routing can be obtained for a large class of processor interconnection networks called leveled networks. this class includes well-known networks (e.g., mesh, hypercube and de Bruijn) as well as new ones (e.g., star graph, for which the network diameter is sub-logarithmic in the network size). Using the obtained routing algorithms, optimal emulations of the CRCW PRAM on the leveled network and mesh are also given. © Springer-Verlag Berlin Heidelberg 1998.

关键词： Mesh generation

来源：评论

学校读者我要写书评

暂无评论

Randomized parallel prefetching and buffer management 12th

引用

10 Workshops held in conjunction with 12th International parallel symposium and 9th symposium on parallel and Distributed Processing, IPPS/SPDP 1998

作者： Varman, Peter J. Department of Electrical and Computer Engineering Rice University HoustonTX77005 United States

ISBN: (纸本)3540643591

We show that deterministic algorithms using bounded lookahead cannot fully exploit the potential of a parallel I/O system. Randomization can be used to significantly improve the performance of parallel prefetching and buffer management algorithms. Using randomization in the data layout and a simple prefetching scheme, we show that a readonce reference string of length N can be serviced in Θ(N/D) parallel I/Os in a D-disk system. For the case of read-many reference strings we introduce a novel algorithm using randomized write-back with a competitive ratio of Θ(√D). In contrast, we show that deterministic write-back results in a competitive ratio of at least Ω(D). © Springer-Verlag Berlin Heidelberg 1998.

关键词： Random processes

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：