检索结果-内蒙古大学图书馆

proceedings of the 1998 10th annual acm symposium on parallel algorithms and architectures, SPAA

作者： Shen, K. Jiao, X. Yang, T. Univ of California Santa Barbara CA United States

ISBN: (纸本)9780897919890

Sparse LU factorization with partial pivoting is important for many scientific applications and delivering high performance for this problem is difficult on distributed memory machines. Our previous work has developed an approach called S* that incorporates static symbolic factorization, supernode partitioning and graph scheduling. This paper studies the properties of elimination forests and uses them to guide supernode partitioning/amalgamation and execution scheduling. The new design with 2D mapping effectively identifies dense structures without introducing too many zeros in the BLAS computation and exploits asynchronous parallelism with low buffer space cost. The implementation of this code, called S+, uses supernodal matrix multiplication which retains the BLAS-3 level efficiency and avoids unnecessary arithmetic operations. The experiments show that S+ improves our previous code substantially and can achieve up to 11.04GFLOPS on 128 Cray T3E 450 MHz nodes, which is the highest performance reported in the literature.

关键词： parallel algorithms

来源：评论

学校读者我要写书评

暂无评论

Blocking in parallel multisearch problems (extended abstract) 98

Blocking in parallel multisearch problems (extended abstract...

引用

proceedings of the 1998 10th annual acm symposium on parallel algorithms and architectures, SPAA

作者： Dittrich, W. Hutchinson, D. Maheshwari, A. Bosch Telecom GmbH Backnang Germany

ISBN: (纸本)9780897919890

External memory (EM) algorithms are designed for computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. Block-wise access to data is a central theme in the design of efficient EM algorithms. A similar requirement arises in the transmission of data between processors in certain parallel computation models such as BSP* and CGM, for which block-wise communication is a crucial issue. We consider multisearch problems, where a large number of queries are to be simultaneously processed and satisfied by navigating through large data structures on parallel computers. The examples used originate as BSP* algorithms and we adapt them to the EM situation where the queries and data structure are considered to be much larger than the size of the available internal memory. This paper presents techniques to achieve blocking for I/O as well as for communication in multisearch on the BSP* and EM-BSP* models. In the area of EM algorithms new algorithms for multisearch in balanced trees are described. For search trees up to size O(n log n) where n is the number of queries, we obtain work-optimal, parallel, EM multisearch algorithms whose I/O and communication time are the same, asymptotically, as the computation time. These algorithms are obtained via the simulation technique of [15]. For larger trees we describe a parallel, EM algorithm which is simultaneously c-optimal and I/O-optimal. We give a lower bound to the number of I/O operations required for filtering n queries through a binary or multiway search tree of size m when m ≥ n2+Ε, constant Ε > 0. In the domain of parallel, non-EM algorithms we describe a new 1-optimal algorithm for BSP* multisearch in optimal trees which communicates in a block-wise fashion.

关键词： parallel processing systems

来源：评论

学校读者我要写书评

暂无评论

Communication-optimal parallel minimum spanning tree algorithms extended abstract

Communication-optimal parallel minimum spanning tree algorit...

引用

proceedings of the 1998 10th annual acm symposium on parallel algorithms and architectures, SPAA

作者： Adler, M. Dittrich, W. Juurlink, B. Kutylowski, M. Rieping, I. Univ of Toronto Toronto Ont Canada

Lower and upper bounds for finding a minimum spanning tree (MST) in a weighted undirected graph on the BSP model are presented. We provide the first non-trivial lower bounds on the communication volume required to solve the MST problem. Let p denote the number of processors, n the number of nodes of the input graph, and m the number of edges of the input graph. We show that in the worst case a total of Ω(k·min(m, pn)) bits need to be transmitted in order to solve the MST problem, where k is the number of bits required to represent a single edge weight. This implies that if each message contains k bits, any BSP algorithm for finding an MST requires communication time Ω(g · min(m/p, n)), where g is the gap parameter of the BSP model. In addition, we present two algorithms whose running times match the lower bounds in different situations. Both algorithms perform linear work and use a number of super-steps independent of the input size. The first algorithm is simple but can employ at most m/n processors efficiently. Hence, it should be applied in situations where the input graph is relatively dense. The second algorithm is a randomized algorithm that performs linear work with high probability, provided that m ≥ n · log p. This is the first linear work BSP algorithm for finding an MST in sparse graphs.

关键词： parallel algorithms

来源：评论

学校读者我要写书评

暂无评论

Dynamic scheduling with incomplete information 98

Dynamic scheduling with incomplete information

引用

proceedings of the 1998 10th annual acm symposium on parallel algorithms and architectures, SPAA

作者： Bast, H. Max-Planck-Inst fur Informatik Saarbrucken Germany

ISBN: (纸本)9780897919890

We consider the following scheduling problem: Our goal is to execute a given amount of arbitrarily decomposable work on a distributed machine as quickly as possible. The work is maintained by a central scheduler that can assign chunks of work of an arbitrary size to idle processors. The difficulty is that the processing time required for a chunk is not exactly predictable - usually the less, the larger the chunk - and that processors suffer a delay for each assignment. Our objective is to minimize the total wasted time of the schedule, that is, the sum of all delays plus the idle times of processors waiting for the last processor to finish. We introduce a new deterministic model for this setting, based on estimated ranges [α(w), β(w)] for processing times of chunks of size w. Depending on α, β, and a measure for the overall deviation from these estimates, we can prove matching upper and lower bounds on the wasted time, the former being achieved by our new balancing strategy. This is in sharp contrast with previous work that, even under the strong assumption of independent, approximately normally distributed chunk processing times, proposed only heuristic scheduling schemes supported merely by empirical evidence. Our model naturally subsumes this stochastic setting, and our generic analysis is valid for most of the existing schemes too, proving them to be non-optimal.

关键词： parallel processing systems

来源：评论

学校读者我要写书评

暂无评论

Detecting data races in Cilk programs that use locks 98

Detecting data races in Cilk programs that use locks

引用

proceedings of the 1998 10th annual acm symposium on parallel algorithms and architectures, SPAA

作者： Cheng, G.-I. Feng, M. Leiserson, Ch.E. Randall, K.H. Stark, A.F. MIT Lab for Computer Science Cambridge MA United States

ISBN: (纸本)9780897919890

When two parallel threads holding no locks in common access the same memory location and at least one of the threads modifies the location, a 'data race' occurs, which is usually a bug. This paper describes the algorithms and strategies used by a debugging tool, called the Nondeterminator-2, which checks for data races in programs coded in the Cilk multithreaded language. Like its predecessor, the Nondeterminator, which checks for simple 'determinacy' races, the Nondeterminator-2 is a debugging tool, not a verifier, since it checks for data races only in the computation generated by a serial execution of the program on a given input. We give an algorithm, ALL-SETS, that determines whether the computation generated by a serial execution of a Cilk program on a given input contains a race. For a program that runs serially in time T, accesses V shared memory locations, uses a total of n locks, and holds at most k n locks simultaneously, ALL-SETS runs in. O(nkT α(V, V)) time and O(nkV) space, where α is Tarjan's functional inverse of Ackermann's function. Since ALL-SETS may be too inefficient in the worst case, we propose a much more efficient algorithm which can be used to detect races in programs that obey the 'umbrella' locking discipline, a programming methodology that is more flexible than similar disciplines proposed in the literature. We present an algorithm, BRELLY, which detects violations of the umbrella discipline in O(kT α(V, V)) time using O(kV) space. We also prove that any 'abelian' Cilk program, one whose critical sections commute, produces a determinate final state if it is deadlock free and if it generates any computation which is data-race free. Thus, the Nondeterminator-2's two algorithms can verify the determinacy of a deadlock-free abelian program running on a given input.

关键词： parallel algorithms

来源：评论

学校读者我要写书评

暂无评论

Deadlock-free routing in arbitrary networks via the flattest common supersequence method 98

Deadlock-free routing in arbitrary networks via the flattest...

引用

proceedings of the 1998 10th annual acm symposium on parallel algorithms and architectures, SPAA

作者： Laing, A.K. Cypher, R. The Johns Hopkins Univ Baltimore MD United States

ISBN: (纸本)9780897919890

In this paper we consider the problem of deadlock-free routing in arbitrary parallel and distributed computers. We focus on asynchronous routing algorithms which continuously receive new packets to route and which do not discard packets that encounter congestion. Specifically, we examine what we call the deadlock-free routing (DFR) problem. The input to the DFR problem consists of an arbitrary network and an arbitrary set of paths in the network. The output consists of a routing algorithm, which is a list of the buffers used along each of the paths. The routing algorithm is required to be free from deadlock and the goal is to minimize the number of buffers required in any one node. We study the DFR problem by converting it into an equivalent problem which we call the flattest common supersequence (FCS) problem. The input to the FCS problem consists of a set of sequences and the output consists of a single sequence that contains all of the input sequences as (possibly non-contiguous) subsequences. The goal of the FCS problem is to minimize the maximum frequency of any symbol in the output sequence. We present three main results. First, we prove that the decision version of the FCS problem is NP-complete, and has no polynomial-time approximation scheme unless P = NP. Next, we propose and experimentally evaluate a range of heuristics for FCS. Our experimental results show that one of these heuristics performs very well over a wide range of inputs. Lastly, we prove that this heuristic is in fact optimal for certain restricted classes of inputs.

关键词： parallel algorithms

来源：评论

学校读者我要写书评

暂无评论

A constant time optimal parallel algorithm for two-dimensional pattern matching

引用

SIAM JOURNAL ON COMPUTING 1998年第3期27卷 668-681页

作者： Crochemore, M Gasieniec, L Hariharan, R Muthukrishnan, S Rytter, W Univ Marne La Vallee Inst Gaspard Monge F-93160 Noisy Le Grand France Univ Warsaw Inst Informat PL-02097 Warszawa Poland Max Planck Inst Informat Saarbrucken Germany NYU Courant Inst New York NY USA AT&T Bell Labs Murray Hill NJ 07974 USA

We give an alphabet-independent deterministic parallel algorithm for finding all occurrences of a pattern array of size m(h) x m(w) in a text array of size n(h) x n(w) in the concurrent-read-concurrent-write-parallel-random-access-machine (CRCW-PRAM) model. Our algorithm runs in O(1) time performing optimal, that is, O(n(h) x n(w)) work, following preprocessing of the pattern. This improves the previous best bound of O(loglogm) time with optimal work [A. Amir, G. Benson, and M. Farach, proceedings 5th annual acm symposium on parallel algorithms and architectures, acm, New York, 1993, pp. 79-85], following preprocessing of the pattern, where m = max {m(h),m(w)}. The preprocessing required by our algorithm (and that due to Amir, Benson, and Farach) can be accomplished in O(loglogm) time and O(m(h) x m(w)) work [M. Crochemore et al., manuscript, 1993], [R. Cole et al., manuscript, 1993].

关键词： pattern matching PRAM two-dimensional witnesses duelling periodicity

来源：评论

学校读者我要写书评

暂无评论

A framework for simple sorting algorithms on parallel disk systems (extended abstract) 98

A framework for simple sorting algorithms on parallel disk s...

引用

proceedings of the tenth annual acm symposium on parallel algorithms and architectures

作者： Sanguthevar Rajasekaran Dept. of CISE University of Florida Gainesville FL

来源：评论

学校读者我要写书评

暂无评论

Distributed and parallel computing issues in data warehousing (abstract) 98

Distributed and parallel computing issues in data warehousin...

引用

proceedings of the tenth annual acm symposium on parallel algorithms and architectures

作者： Hector Garcia-Molina Wilburt J. Labio Janet L. Wiener Yue Zhuge Stanford University

No abstract available.

ISBN: (纸本)9780897919890

No abstract available.

关键词：

来源：评论

学校读者我要写书评

暂无评论

parallel continuous randomized load balancing (extended abstract) 98

Parallel continuous randomized load balancing (extended abst...

引用

proceedings of the tenth annual acm symposium on parallel algorithms and architectures

作者： Petra Berenbrink Tom Friedetzky Ernst W. Mayr Department of Mathematics and Computer Science Paderborn University Germany Institut für Informatik Technische Universität München Germany

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：