We give efficient CRCW PRAM algorithms for the construction of two data structures, the Lsuffix Tree of a square matrix and the Nested Suffix Tree of a general matrix. the Lsuffix tree, introduced by Giancarlo [11], r...
详细信息
Recent architectural advances in the computer industry have been focused to solve numerically intensive flows, such as in oil reservoirs, on several processors simultaneously. As a result of connecting these processor...
详细信息
Recent architectural advances in the computer industry have been focused to solve numerically intensive flows, such as in oil reservoirs, on several processors simultaneously. As a result of connecting these processors together, different computer architecturers have evolved: Shared-memory with a few processors, Distributed-memory with upto a few hundred processors and massively parallel with several thousand processors. Researchers in the oil industry have been developing efficient techniques to improve hydrocarbon recovery in reservoirs using these computers. In this work, a generic approach is developed to solve the large system of sparse linear equations that arises in reservoir simulation. this approach uses a combination of domain decomposition and multigrid techniques that results in efficient and robust algorithms for sequential computers with one processor as well as for parallel computers with few to several tens of processors. the efficiency and robustness of these methods is comparable with widely used sequential solvers for problems of practical interest which include implicit wells and faults. In parallel, these methods prove to be an order of magnitude faster on a 32-node iPSC/860 hypercube.
this paper concerns synchronization under read/write atomicity in shared memory multiprocessors. We present a new algorithm for N-process mutual exclusion that requires only read and write operations and that has O(lo...
详细信息
ISBN:
(纸本)0897916131
this paper concerns synchronization under read/write atomicity in shared memory multiprocessors. We present a new algorithm for N-process mutual exclusion that requires only read and write operations and that has O(log2 N) time complexity, where `time' is measured by counting remote memory references. the time complexity of this algorithm is better than that of all prior solutions to the mutual exclusion problem that are based upon atomic read and write instructions;in fact, the time complexity of most prior solutions is unbounded. Performance studies are presented that show that our mutual exclusion algorithm exhibits scalable performance under heavy contention. In the second part of the paper, we discuss two extensions of our mutual exclusion algorithm. In the first extension, we modify the algorithm so that in the absence of contention only O(1) memory references are required. In the second extension, the technique of software combining is introduced for the purpose of increasing concurrency when using the algorithm to implement combinable read-modify-write operations.
In recent years, there is a growing tendency to support high-level synchronization operations, such as read-modify-write, FIFO queues and stacks, as part of the programmer's shared memory model. this paper examine...
详细信息
ISBN:
(纸本)0897916131
In recent years, there is a growing tendency to support high-level synchronization operations, such as read-modify-write, FIFO queues and stacks, as part of the programmer's shared memory model. this paper examines the problem of implementing hybrid consistency with high-level synchronization operations. Lower bounds on the time required to implement several common synchronization operations are presented. they are followed by several efficient implementations for hybrid consistency that supports various synchronization operations. Specifically, it is shown that for any implementation of weak consistency, the time required to execute a read-modify-write, a dequeue or a pop operation is Ω(d), where d is the network delay. An efficient implementation for hybrid consistency that supports any type of high-level synchronization operations and weak read and write operations is presented. this implementation assumes that weak and strong operations never access the same objects. Weak read and write operations are executed instantaneously, while the time required to execute strong operations is O(d). this is within a constant factor of the lower bounds for most of the commonly used types of operations. Following this, another efficient implementation for hybrid consistency is presented. In this implementation, the only types of strong operations which are supported are read, write and read-modify-write. On the other hand, weak and strong operations are allowed to access the same objects.
An approach is proposed for modeling off-the-shelf hardware and for modeling parallelalgorithms, along with a design methodology to use the information provided by these models, to design a class of macro-pipelined s...
详细信息
ISBN:
(纸本)0818606347
An approach is proposed for modeling off-the-shelf hardware and for modeling parallelalgorithms, along with a design methodology to use the information provided by these models, to design a class of macro-pipelined special-purpose architectures. Nine parameters to form a model of the characteristics of parallel/distributed algorithms and the environment in which they must execute are presented. In addition, a set of tuples to model the characteristics of computer architectures is presented. By combining the tuples withthe parameters, the execution time of the algorithm modeled by the parameters on the hardware modeled by the tuples can be approximated. the combination of these models could be used as a basis for computer-aided tools used in the design of macro-pipelined parallel/distributed processors.
the following topics are dealt with: array processing;pipelined CPUs;caches and registers;LISP machines;special-purpose parallel processors;multiprocessors;uniprocoessors;logic programming machines;commercial multipro...
详细信息
ISBN:
(纸本)0818606347
the following topics are dealt with: array processing;pipelined CPUs;caches and registers;LISP machines;special-purpose parallel processors;multiprocessors;uniprocoessors;logic programming machines;commercial multiprocessors;data retrieval architectures;multiprocessor issues;systolic arrays;data flow and reduction;interconnection networks;and multiprocessor performance. 52 papers were presented of which 50 are published in full in the present proceedings, and 1 as digest only. Abstracts of individual papers can be found under the relevant classification codes in this or other issues.
this conference proceedings contains 30 papers. these papers present a comprehensive up-to-date view in the area of programming languages. the main subjects are operating systems, program compilers, computer programmi...
详细信息
ISBN:
(纸本)0897911474
this conference proceedings contains 30 papers. these papers present a comprehensive up-to-date view in the area of programming languages. the main subjects are operating systems, program compilers, computer programming, algorithms, parallel processing, concurrent programming, programming languages design and implementation, pattern recognition, and software development environment. Also discussed are functional programming, denotational semantics, and object-oriented computer programming languages.
Multiple-processor systems based on an open bus architecture and composite Von Neumann-type computing devices can be used effectively in AI and other CPU-memory intensive applications. A combination of parallel and hi...
详细信息
ISBN:
(纸本)0818606347
Multiple-processor systems based on an open bus architecture and composite Von Neumann-type computing devices can be used effectively in AI and other CPU-memory intensive applications. A combination of parallel and hierarchically functioning commercial microprocessor-based computers (MC680X0 family) working from an industry-standard bus (IEEE-796 Multibus), together with certain other architectural enhancements, provides a flexible hardware solution to a large class of computationally severe problems. Such an architecture, deployed in both a development (host) computer and an embedded (target) computer, can be an effective interim engineering emulation of future non-Von parallel/hierarchical architectures. this architecture, dubbed closely coupled asynchronous hierarchical and parallel processing (CCAHPP), is presented.
A model for predicting multiprocessor performance on iterative algorithms, where each iteration consists of some amount of access to global data and some amount of local processing, is presented. the application cycle...
详细信息
ISBN:
(纸本)0818606347
A model for predicting multiprocessor performance on iterative algorithms, where each iteration consists of some amount of access to global data and some amount of local processing, is presented. the application cycles may be synchronous or asynchronous, and the processors may or may not incur waiting time, depending on the relationship between the access time and processing time. the amount of processing time and global data accesses incurred by the parallel processes depends on characteristics of the algorithm and its decomposition. the decompositions of several sample algorithms are studied, and several decomposition groups are identified. Finally, the effect of decomposition on the performance of the Poisson partial-differential equation algorithm is investigated.
Architecture and supporting parallel garbage collection algorithms designed for a virtual memory system with separate processors for list processing and for garbage collection are presented. Each processor has its own...
详细信息
ISBN:
(纸本)0818606347
Architecture and supporting parallel garbage collection algorithms designed for a virtual memory system with separate processors for list processing and for garbage collection are presented. Each processor has its own primary memory;in addition, there is a small common memory which both processors may access. Individual memories swap off a common secondary memory, but no locking mechanism is required. In particular, a page may reside in both memories simultaneously, and indeed may be accessed and modified freely by each processor. A secondary memory controller ensures consistency without necessitating numerous lockouts on the pages.
暂无评论