In practice, various techniques are used to speed up the reasoning in logic programming and parallel machines. Three major approaches have generally been adapted to solve this problem. The most common approach involve...
详细信息
In practice, various techniques are used to speed up the reasoning in logic programming and parallel machines. Three major approaches have generally been adapted to solve this problem. The most common approach involves some methods of the development of AND and OR parallelism, as in Parlog[Clark84], Concurrent-Prolog[Shapiro83] and IDIOM[Guptas&Hermenegildo]. In these schemes, the three main forms of implicit parallelism-Independent AND-parallelism, Dependent AND-parallelism and OR-parallelism are exploited. The second approach is to build parallelarchitectures to execute different level parallelism inherent in inference, such as DADO[Stolfo84, Miranker90], NON-VON[Hillyer86] and PSM[Gupta87]. The third approach is to develop faster match and search algorithms, as in Rete[Forgy82] and Treat[Miranker87]. The bottle-neck in inference systems is the match phase. Around 90% of execution rime is consumed in this phase[Gupta87]. In this paper, we present algorithms to realize the connection method on systolic arrays. The algorithms try to partition the paths in connections matrices for parallel inference. Firstly, parallelism in reasoning is discussed;then the parallel inference on systolic arrays and algorithms for partition of paths are introduced. Finally, the correctness and completeness of the algorithms is shown. The paper consists of five sections. The connection method is presented and parallel inference algorithms on systolic arrays are designed after introduction. The third section describes an example in partition of the paths in the connection method, the example executing on normal systolic and tree systolic models are shown. The fourth section discusses the analysis of the algorithms. The final section works out conclusions and related work.
National Research Center for Intelligent Computing Systems (NCIC for short) is the unique national hi-tech R/D center for advanced computing technology in China. In this overview we first introduce China's Hi-Tech...
详细信息
National Research Center for Intelligent Computing Systems (NCIC for short) is the unique national hi-tech R/D center for advanced computing technology in China. In this overview we first introduce China's Hi-Tech R&D Programme (863 programme) and NCIC, then we reported the state of the art of parallel processing at NCIC. This article discussed the key technologies being exploited by the representative Chinese R&D teams and the wide applications of parallel computers in China. The key technologies in parallel processing we are attacking and reported in this article include wormhole routing and other efficient switching techniques, the Easter series MPP systems, the Dawning series symmetric and multi-thread multiprocessor, parallel operating systems and parallel file systems, parallel compiler and efficient programming tool. The future research directions at NCIC are also mentioned.
Barrier algorithms are central to the performance of numerous algorithms on scalable, high-performance architectures. Numerous barrier algorithms have been suggested and studied for Non-Uniform Memory Access (NUMA) ar...
详细信息
ISBN:
(纸本)0818656026
Barrier algorithms are central to the performance of numerous algorithms on scalable, high-performance architectures. Numerous barrier algorithms have been suggested and studied for Non-Uniform Memory Access (NUMA) architectures, but less work has been done for Cache Only Memory Access (COMA) or attraction memory [1] architectures such as the KSR-1. In this paper, we presented two new barrier algorithms that offer the best performance we have recorded on the KSR-1 distributed cache multiprocessor. We discuss the trade-offs and the performance of seven algorithms on two architectures. The new barrier algorithms adapt well to a hierarchical caching memory model and take advantage of parallel communication offered by most multiprocessor interconnection networks,. Performance results are shown for a 256-processor KSR-1 and a 20-processor Sequent Symmetry.
Today available parallel database systems use conventional parallel hardware architectures employing a highly parallel software architecture. It is an emerging technique to speed up the execution by declustering the s...
详细信息
Today available parallel database systems use conventional parallel hardware architectures employing a highly parallel software architecture. It is an emerging technique to speed up the execution by declustering the stored data sets among a number of parallel and independent disk drives. In this paper we revisit parallel relational database algorithms for range declustering. We adapt the conventional known and well studied parallel algorithms to declustered data, exploit the inherent order property of the partitioned data sets and compare analytically the performance of the algorithms. It is shown that the parallel range declustered variants generally outperform their conventional parallel counterparts.
We consider the Block PRAM model of Aggarwal et al. (in ''Proceedings, First Annual ACM symposium on parallel algorithms and architectures, 1989,'' pp. 11-21). For a Block PRAM model with n/log n proce...
We consider the Block PRAM model of Aggarwal et al. (in ''Proceedings, First Annual ACM symposium on parallel algorithms and architectures, 1989,'' pp. 11-21). For a Block PRAM model with n/log n processors and communication latency l = O(log n), we show that prefix sums can be performed in time O(l log n/log 1), but list ranking requires time OMEGA(l log n);these bounds are tight. These results justify an intuitive observation of Gazit et al (in ''Proceedings, 1987 Princeton Workshop on Algorithm, Architecture and Technology Issues for Models of Concurrent Computation,'' pp. 139-156) that algorithm designers should, when possible, replace the list ranking procedure with the prefix sums procedure. We demonstrate the value of this technique in choosing between two optimal PRAM algorithms for finding the connected components of dense graphs. We also give theoretical improvements for integer sorting and many other algorithms based on prefix sums, and suggest a relationship between the issue of graph density for the connected components problem and alternative approaches to integer sorting. (C) 1994 Academic Press, Inc.
The problem considered in this paper is the definition of an efficient parallel algorithm for texture analysis of an image. The target architectures are distributed-memory general-purpose MIMD parallel machine. The so...
详细信息
The problem considered in this paper is the definition of an efficient parallel algorithm for texture analysis of an image. The target architectures are distributed-memory general-purpose MIMD parallel machine. The solutions here proposed are based on two different methods: the Statistic Feature Matrix and the Wavelet Decomposition.
The efficiency of scheduling algorithms is essential in order to attain optimal performances from parallelprogramming systems. In this paper we use a portable parallelprogramming environment we have implemented, the...
详细信息
A simple greedy algorithm is presented for task clustering with duplication (or recomputation) which, for a task graph with arbitrary granularity, produces a schedule whose makespan is at most twice optimal. Furthermo...
详细信息
A simple greedy algorithm is presented for task clustering with duplication (or recomputation) which, for a task graph with arbitrary granularity, produces a schedule whose makespan is at most twice optimal. Furthermore, the quality of the schedule improves as the granularity of the task graph increases. For example, if the granularity is at least 1/2 , the makespan of the schedule is at most 5/3 times optimal. For a task graph with n tasks and e inter-task communication constraints, the algorithm runs in O(n(n lg n + e)) time, which is n times faster than the currently best known algorithm for this problem. Similar algorithms are developed that produce: (1) optimal schedules for coarse grain graphs;(2) 2-optimal schedules for trees with no task duplication;and (3) optimal schedules for coarse grain trees with no task duplication.
Currently, many parallel algorithms are defined for shared- memory architectures. The prefered machine model for designing these algorithms is the PRAM. However, this model does not take into account properties of exi...
详细信息
Many combinatorial problems can be efficiently solved for partial k-trees (graphs of treewidth bounded by k). The edge-coloring problem is one of the well-known combinatorial problems for which no NC algorithms have b...
详细信息
Many combinatorial problems can be efficiently solved for partial k-trees (graphs of treewidth bounded by k). The edge-coloring problem is one of the well-known combinatorial problems for which no NC algorithms have been obtained for partial k-trees. This paper gives an optimal and first NC parallel algorithm to find an edge-coloring of any given partial k-tree using a minimum number of colors if k and the maximum degree Δ are bounded.
暂无评论