ISBN: (print) 0769515126
We consider a type of incomplete decomposition preconditioner based on local block factorization for matrices derived from discretizing 2-D or 3-D elliptic partial differential equations. We prove that the condition numbers of the preconditioned matrices are small, which means that the constructed preconditioners are effective. We further consider an efficient parallel version of the preconditioner that depends on only a single integer parameter. When its value is small, many more iterations are needed to converge on multiple processors than on a single processor, but the difference decreases steadily as the value increases. Finally, we report experiments on a cluster of 6 PCs with clock frequencies of 1.8 GHz. The results show that the local block factorizations are efficient in serial implementation compared with some well-known effective preconditioners, and that the parallel versions are efficient as well.
ISBN: (print) 0769515126
Improving computational efficiency is a key issue in image processing, especially in edge detection, which is very computationally intensive. With the development of real-time image processing applications, fast processing response is becoming more critical. In this paper, a technique for distributed image processing on Spiral Architecture is proposed, which provides a platform for speeding up image processing on clusters.
ISBN: (print) 3540440496
The technology and application trends leading to current-day multiprocessor architectures, such as chip multiprocessors, embedded architectures, and massively parallel architectures, demand faster, more efficient, and more scalable cache coherence schemes than the existing ones. In this paper we present a new scheme that has the potential to meet such a demand. The software support for our scheme takes the form of program annotations to detect shared accesses as well as release synchronizations that represent data sharing boundaries. A small hardware unit called the Coherence Buffer (CB), with an associated controller local to each processor, forms the control unit that locally enforces cache coherence actions which are off the critical path. Our simulation study shows that an 8-entry, 4-way associative CB helps achieve a speedup of 1.07 to 4.31 over a full-map 3-hop directory scheme for five of the SPLASH-2 benchmarks (representative of migratory sharing, producer-consumer, and write-many workloads) under the Release Consistency model.
ISBN: (print) 3540440496
This paper presents a static scheduler to carry out the best assignment of a Directed Acyclic Graph (DAG) representing an application program. Some characteristics of the DAG, a decision model, and the evaluation parameters for choosing the best solution provided by the selected scheduling algorithms are defined. The selection of the scheduling algorithms is based on five decision levels. At each level, a subset of scheduling algorithms is selected. When the scheduler was tested with a series of DAGs having different characteristics, its decision was right 100% of the time in those cases in which the number of available processors was known.
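The idea of running several candidate scheduling algorithms on a DAG and keeping the best result can be sketched as follows. This is only an illustration under stated assumptions: the greedy list scheduler, the two priority heuristics, the use of makespan as the single evaluation parameter, and all function names are hypothetical and not taken from the paper. Its five decision levels are not reproduced, and communication costs are ignored.

```python
def list_schedule(durations, preds, num_procs, priority):
    """Greedy list scheduling of a task DAG; returns the makespan.

    durations: task -> execution time
    preds:     task -> list of predecessor tasks (missing key means none)
    priority:  function(task) -> number; the ready task with the highest
               value is scheduled next (hypothetical heuristic hook)
    """
    finish = {}                       # task -> finish time
    proc_free = [0.0] * num_procs     # time at which each processor is idle
    remaining = set(durations)
    while remaining:
        # A task is ready once all of its predecessors have been scheduled.
        ready = [t for t in sorted(remaining)
                 if all(p in finish for p in preds.get(t, ()))]
        if not ready:
            raise ValueError("graph contains a cycle")
        task = max(ready, key=priority)
        proc = min(range(num_procs), key=lambda i: proc_free[i])
        start = max([proc_free[proc]] + [finish[p] for p in preds.get(task, ())])
        finish[task] = start + durations[task]
        proc_free[proc] = finish[task]
        remaining.remove(task)
    return max(finish.values())


def best_schedule(durations, preds, num_procs, heuristics):
    """Run every candidate heuristic and keep the one with the smallest makespan."""
    results = {name: list_schedule(durations, preds, num_procs, prio)
               for name, prio in heuristics.items()}
    return min(results, key=results.get), results


# Small illustrative DAG: C depends on A; D depends on A and B.
durations = {"A": 2, "B": 3, "C": 1, "D": 4}
preds = {"C": ["A"], "D": ["A", "B"]}
heuristics = {
    "longest_first": lambda t: durations[t],
    "shortest_first": lambda t: -durations[t],
}
best, results = best_schedule(durations, preds, 2, heuristics)
print(best, results)
```

Both heuristics produce a valid schedule, but not necessarily an optimal one. The selection step only compares the candidates that were actually run.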
ISBN: (print) 3540440496
Power-List, ParList and PList data structures are efficient tools for functional descriptions of parallel programs that are divide-and-conquer in nature. The goal of this work is to develop three parallel variants of the Fast Fourier Transform using these theories. The variants are determined by the degree of the polynomial, which can be a power of two, a prime number, or a product of prime factors. The last variant includes the first two, and represents a general and efficient parallel algorithm for the Fast Fourier Transform. This general algorithm has a very good time complexity, and can be mapped onto a recursive interconnection network.
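For the power-of-two case, the divide-and-conquer structure can be illustrated with the classical recursive radix-2 FFT. This is a minimal sequential sketch, not the Power-List formulation developed in the paper. The function name `fft` and the sign convention of the twiddle factors are assumptions made for illustration.

```python
import cmath


def fft(coeffs):
    """Recursive radix-2 FFT of a sequence whose length is a power of two."""
    n = len(coeffs)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(coeffs[0])]
    # Divide: split into even-indexed and odd-indexed elements.
    even = fft(coeffs[0::2])
    odd = fft(coeffs[1::2])
    # Conquer: combine the two half-size transforms with twiddle factors.
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


print(fft([1, 1, 1, 1]))
```

The two recursive calls are independent of each other, which is what makes this decomposition a natural fit for parallel evaluation.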
ISBN: (print) 0769515126
We study parallel solutions to the problem of weighted multiselection: selecting r elements at given weighted ranks from a set S of n weighted elements, where an element is at weighted rank k if it is the smallest element such that the aggregated weight of all elements not greater than it in S is not smaller than k. We propose efficient algorithms on two of the most popular parallel architectures, hypercube and mesh. For a hypercube with $p < n$ processors, we present a parallel algorithm running in $O(n^{\epsilon} \min\{r, \log p\})$ time for $p = n^{1-\epsilon}$, $0 < \epsilon < 1$, which is cost optimal when $r \ge p$. Our algorithm on a $\sqrt{p} \times \sqrt{p}$ mesh runs in $O(\sqrt{p} + \frac{n}{p} \log^{3} p)$ time, which is the same as multiselection on mesh when $r \ge \log p$, and thus has the same optimality as multiselection in this case.
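The definition of weighted rank can be made concrete with a small sequential reference implementation. This sketch only illustrates what is being selected. It is not the hypercube or mesh algorithm, and the function name `weighted_multiselect` and the (key, weight) pair representation are assumptions. Weights are assumed to be non-negative and ranks positive.

```python
from bisect import bisect_left
from itertools import accumulate


def weighted_multiselect(items, ranks):
    """Return, for each weighted rank k, the smallest key whose cumulative
    weight (over all keys not greater than it) is at least k.

    items: list of (key, weight) pairs with non-negative weights
    ranks: iterable of positive weighted ranks
    """
    ordered = sorted(items, key=lambda kw: kw[0])
    # prefix[i] = total weight of the i + 1 smallest elements
    prefix = list(accumulate(w for _, w in ordered))
    selected = []
    for k in ranks:
        i = bisect_left(prefix, k)    # first index with prefix[i] >= k
        if i == len(prefix):
            raise ValueError(f"rank {k} exceeds the total weight")
        selected.append(ordered[i][0])
    return selected


items = [(5, 2), (1, 1), (3, 4), (9, 3)]
print(weighted_multiselect(items, [1, 2, 6, 10]))   # prints [1, 3, 5, 9]
```

With unit weights this reduces to ordinary multiselection, which is the baseline the abstract compares against.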
ISBN: (print) 0769515126
Recent research on parallel processing on non-dedicated clusters has focused on high execution performance, parallelism management, transparent access to resources, and making clusters easy to use. However, as a collection of independent computers used by multiple users, clusters are susceptible to failure. This paper presents the development of a coordinated checkpointing facility for the GENESIS cluster operating system. This facility was developed by exploiting existing operating system services. High performance and low overheads are achieved by allowing the processes of a parallel application to continue executing during the creation of checkpoints, while maintaining low demands on cluster resources by using coordinated checkpointing.
ISBN: (print) 0769515126
The external selection problem is to select the record with the K-th smallest key from N given records that are distributed and stored evenly on the D disks of a parallel machine with D processors. Each processor has its own primary memory of size M records and one disk, where N/D > M. The processors are connected in a $\sqrt{D} \times \sqrt{D}$ mesh architecture. Based on a two-stage approach, this paper presents an efficient parallel external selection algorithm for distributed-memory parallel systems. First, all the processors execute local external sorting in parallel: each processor sorts the N/D records on its own disk. Next, they execute parallel external selection from the D sorted subfiles on the D disks. This algorithm is asymptotically optimal and has a small constant factor in its time complexity.
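The two-stage structure (sort each partition locally, then select from the sorted runs) can be sketched in a few lines. This is a sequential, in-memory stand-in under stated assumptions: Python lists play the role of the D disks, a lazy k-way merge replaces the paper's parallel selection step on the mesh, and the name `select_kth` is hypothetical.

```python
import heapq
from itertools import islice


def select_kth(partitions, k):
    """Return the k-th smallest key (1-based) across all partitions."""
    # Stage 1: every "processor" sorts its own partition independently.
    runs = [sorted(part) for part in partitions]
    total = sum(len(run) for run in runs)
    if not 1 <= k <= total:
        raise ValueError("k must be between 1 and the number of records")
    # Stage 2: select from the sorted runs. The merge is lazy, so only
    # the first k records are ever produced.
    merged = heapq.merge(*runs)
    return next(islice(merged, k - 1, None))


disks = [[7, 2, 9], [4, 1, 8], [6, 3, 5]]
print(select_kth(disks, 4))   # prints 4
```

Because the second stage consumes sorted runs, it never needs to hold all N records in memory at once. The same property motivates sorting first in the external-memory setting.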
ISBN: (print) 0769515126
Heterogeneous parallel systems are becoming increasingly common, especially with the increasing use of cluster computers, such as PCs and networks of workstations, for parallel computing. The main concern of this paper is measuring and evaluating the performance of such parallel systems, based on a dynamic load balancing algorithm for a parallel depth-first search (DFS) algorithm. The implementation of dynamic load balancing runs under MPI (Message Passing Interface), which allows parallel execution on a cluster of 6 heterogeneous SUN workstations (COHW) running the Solaris operating system and on a cluster of 10 PCs running the Linux operating system. The parallel dynamic load balancing program is written in C.
ISBN: (print) 3540440496
This paper presents the CODACS (COnfigurable DAtaflow Computing System) architecture, a highly scalable, high-performance reconfigurable computing system prototype able to directly execute dataflow processes (dataflow graphs) in hardware. The reconfigurable environment consists of a set of FPGA-based platform-processors created from a set of identical Multi Purpose Functional Units (MPFUs) and a reconfigurable interconnect that allows a straightforward one-to-one mapping between dataflow actors and MPFUs. Since CODACS does not support the conventional processor cycle, the platform-processor computation is completely asynchronous, following the dataflow graph execution paradigm proposed in [8].
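The data-driven firing rule behind dataflow graph execution, where an actor runs as soon as all of its operands are available rather than under a program counter, can be illustrated in software. This is only a conceptual sketch: it says nothing about the CODACS hardware or the specific paradigm of [8], and the function `run_dataflow` and the graph encoding are hypothetical.

```python
import operator


def run_dataflow(actors, inputs):
    """Execute a dataflow graph by repeatedly firing every enabled actor.

    actors: name -> (function, list of operand names)
    inputs: name -> initial token value
    Returns a dict with the value produced on every arc.
    """
    values = dict(inputs)
    pending = dict(actors)
    fired = True
    while pending and fired:
        fired = False
        for name in list(pending):
            func, operands = pending[name]
            # Firing rule: an actor is enabled once all operands hold a token.
            if all(op in values for op in operands):
                values[name] = func(*(values[op] for op in operands))
                del pending[name]
                fired = True
    if pending:
        raise ValueError(f"actors never enabled: {sorted(pending)}")
    return values


# Graph for (a + b) * (a - b); "sum" and "diff" do not depend on each other.
graph = {
    "sum": (operator.add, ["a", "b"]),
    "diff": (operator.sub, ["a", "b"]),
    "prod": (operator.mul, ["sum", "diff"]),
}
print(run_dataflow(graph, {"a": 5, "b": 3})["prod"])   # prints 16
```

In this model each actor would correspond to one functional unit, so independent actors such as `sum` and `diff` could fire concurrently in hardware.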