This paper develops a new approach to compiling C programs for multi-processor DSPs with multiple address spaces. It integrates a novel data transformation technique, which exposes the processor location of partitioned data, into a parallelization strategy. Combined with a new address resolution mechanism, this generates efficient programs that run across multiple address spaces without using message passing. The approach is applied to the UTDSP benchmark suite and evaluated on a four-processor TigerSHARC board, where it outperforms existing approaches and gives an average speedup of 3.25 on the parallel benchmarks.
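The abstract does not spell out the transformation, so the following is only a sketch under stated assumptions. A one-dimensional array of N elements is block-partitioned over P = 4 processors (matching the board above, with N assumed divisible by P). The linear index i is rewritten as the pair (i / B, i % B), so the first coordinate names the owning processor. A per-processor table of base pointers stands in for the address resolution mechanism. The names `PartitionedArray`, `element` and `processor_body` are hypothetical, and the separate allocations only simulate distinct address spaces.

```c
#include <assert.h>
#include <stdlib.h>

#define P 4 /* number of processors (assumption for this sketch) */

/* Each processor's partition lives in its own simulated memory; base[p]
   plays the role of a hypothetical address-resolution table entry. */
typedef struct {
    int *base[P];
    size_t block; /* elements per processor, B = N / P */
} PartitionedArray;

/* Allocate n elements block-partitioned across P simulated memories. */
int partitioned_init(PartitionedArray *a, size_t n)
{
    assert(n % P == 0); /* assume N divisible by P */
    a->block = n / P;
    for (int p = 0; p < P; p++) {
        a->base[p] = calloc(a->block, sizeof(int));
        if (a->base[p] == NULL) {
            while (p-- > 0) free(a->base[p]); /* undo earlier allocations */
            return -1;
        }
    }
    return 0;
}

/* Original index i becomes (i / B, i % B): owning processor, local offset. */
static int *element(PartitionedArray *a, size_t i)
{
    return &a->base[i / a->block][i % a->block];
}

/* Processor p's share of the loop "for i: a[i] = 2*i". After strip-mining,
   i = p*B + j and every access goes to p's own partition. */
void processor_body(PartitionedArray *a, int p)
{
    int *local = a->base[p];
    for (size_t j = 0; j < a->block; j++)
        local[j] = (int)(2 * ((size_t)p * a->block + j));
}

void partitioned_free(PartitionedArray *a)
{
    for (int p = 0; p < P; p++) free(a->base[p]);
}
```

Because the processor number is now an explicit array coordinate, each processor's share of the loop touches only `base[p]`, and a remote element is reached through a pointer lookup rather than a message.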
Neuroinformatics is a multidisciplinary field that results from the synergetic action of several theories, such as the acquisition, processing, storage, transmission, recovery and diffusion of neural information. From the neuroinformatics point of view, the neural complex (natural or artificial neural nets) is considered an automaton with self-control, a memory machine and a homeostat (a homeostat represents the whole set of internal processes and behaviors whose main goal is to reach an equilibrium state under changing environmental conditions). Neural nets (natural or artificial) are neural complex systems with a C³I protocol (commands, communication, control and information). Neural nets consist of strongly interconnected cellular units. The excitatory/inhibitory activities of the cellular units propagate information to the entire system. Parallel information processing in these units leads to network convergence by minimizing a cost function. Neural activity is described by the percentage of excitatory/inhibitory cellular units. The excitatory activity is described as negentropy (the uncertainty parameter) and the inhibitory activity as posentropy (the certainty parameter).
Author:
Huang, H (Chinese Acad Sci, Supercomp Ctr, Comp Network Informat Ctr, Beijing 100080, Peoples R China)
ISBN: 0769515126 (print)
In this paper, we use a new language, TPL (Tensor Product Language), to compute the Fast Fourier Transform; it provides good performance and portability. We detail the method and its application to the FFT, and extend it to the Sande-Tukey FFT algorithm.
ISBN: 0769515126 (print)
For a Markov cipher, the transition probability matrix is doubly stochastic. The eigenvalue of this matrix with the largest magnitude among those of magnitude less than one plays an important role in designing Markov ciphers. This paper provides a parallel algorithm for computing this eigenvalue of the 65535 × 65535 doubly stochastic matrix A that arises from a shrunken Markov cipher model with 16-bit plaintext and ciphertext. An analysis of the complexity of the parallel algorithm is also given.
ISBN: 0769515126 (print)
This paper presents an investigation of the parallel computation of non-ideal three-dimensional detonation wave propagation on a high-performance computer with a CC-NUMA architecture. Analysis and testing of the existing serial program identified the computation of curvature and of the first-order and second-order differences as the main targets for parallelization. Techniques such as a divide-and-conquer strategy and load balancing were applied to convert the serial program into a parallel program. Numerical simulations with the parallel program show a large increase in computing speed for non-ideal three-dimensional detonation wave propagation.
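The abstract does not say which programming model was used. Assuming OpenMP-style loop parallelism on the shared-memory CC-NUMA machine, first- and second-order central differences along x could be parallelized by dividing the z-planes of the grid into contiguous chunks, one per thread, as sketched below. The grid layout, the macro `IDX` and the function name are assumptions for illustration. If the code is compiled without OpenMP, the pragma is ignored and the code runs serially.

```c
#include <assert.h>
#include <stddef.h>

/* Index into a flattened nx * ny * nz grid, x varying fastest. */
#define IDX(i, j, k, nx, ny) ((i) + (nx) * ((j) + (ny) * (k)))

/* First- and second-order central differences along x at interior points.
   schedule(static) hands each thread a contiguous block of z-planes; on a
   CC-NUMA machine this helps keep a thread's planes in its local memory if
   the arrays were first touched with the same schedule.
   Boundary points (i = 0 and i = nx - 1) are left untouched. */
void central_differences_x(const double *u, double *du, double *d2u,
                           size_t nx, size_t ny, size_t nz, double h)
{
    assert(nx >= 3 && h > 0.0);
    #pragma omp parallel for schedule(static)
    for (long kk = 0; kk < (long)nz; kk++) {
        size_t k = (size_t)kk;
        for (size_t j = 0; j < ny; j++) {
            for (size_t i = 1; i + 1 < nx; i++) {
                double um = u[IDX(i - 1, j, k, nx, ny)];
                double u0 = u[IDX(i, j, k, nx, ny)];
                double up = u[IDX(i + 1, j, k, nx, ny)];
                du[IDX(i, j, k, nx, ny)] = (up - um) / (2.0 * h);
                d2u[IDX(i, j, k, nx, ny)] = (up - 2.0 * u0 + um) / (h * h);
            }
        }
    }
}
```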
ISBN: 0769515126 (print)
An algorithm is proposed that solves cooperative concurrent computing tasks using the idle cycles of a number of high-performance heterogeneous workstations interconnected by a high-speed network. To obtain better parallel performance, this paper gives a model and an algorithm for task scheduling among heterogeneous workstations, in which the costs of loading data, computing, communication and collecting results are all considered. Using this efficient algorithm, the optimal subset of heterogeneous workstations, namely the one with the shortest parallel execution time for the tasks, can be selected.
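The paper's cost model is not given in the abstract, so the following sketch uses a hypothetical one. The master ships input data to the chosen workstations one at a time. The work is then split in proportion to speed, so that all chosen workstations finish computing together. Finally, the results are collected one at a time. Under that assumption, an exhaustive search over all non-empty subsets finds the subset with the shortest predicted time. This is feasible only for a small number of workstations, and `Workstation`, `predicted_time` and `best_subset` are illustrative names.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical per-workstation parameters (not taken from the paper). */
typedef struct {
    double speed;   /* work units per second, assumed > 0 */
    double load;    /* seconds to ship this workstation its input data */
    double collect; /* seconds to gather its results back */
} Workstation;

/* Hypothetical model: sequential loading, concurrent computing with work
   split in proportion to speed, then sequential collection. */
static double predicted_time(const Workstation *ws, size_t p, unsigned mask,
                             double work)
{
    double load = 0.0, collect = 0.0, speed = 0.0;
    for (size_t i = 0; i < p; i++) {
        if (mask & (1u << i)) {
            load += ws[i].load;
            collect += ws[i].collect;
            speed += ws[i].speed;
        }
    }
    assert(speed > 0.0);
    return load + work / speed + collect;
}

/* Exhaustive search over all non-empty subsets (bit masks); small p only. */
unsigned best_subset(const Workstation *ws, size_t p, double work,
                     double *best_time)
{
    assert(p >= 1 && p < 32);
    unsigned best = 0;
    double best_t = DBL_MAX;
    for (unsigned mask = 1; mask < (1u << p); mask++) {
        double t = predicted_time(ws, p, mask, work);
        if (t < best_t) {
            best_t = t;
            best = mask;
        }
    }
    if (best_time != NULL) *best_time = best_t;
    return best;
}
```

Under this model, adding a slow workstation can increase the total time, because its loading and collection costs outweigh its contribution to computing speed. This is why selecting a subset, rather than simply using every machine, matters.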
ISBN: 0769515126 (print)
This paper presents the design and analysis of finite difference domain decomposition algorithms for the two-dimensional heat equation; numerical results demonstrate the stability and accuracy of the algorithms. The algorithms further extend those developed by Dawson et al. [6].
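A common explicit/implicit domain decomposition idea for the heat equation can be illustrated in one dimension. On each time step, the interface value is advanced explicitly with a coarser spacing H = k*h. The two subdomains are then advanced by backward Euler independently of each other, which is what makes the method parallel. This is a simplified sketch under assumptions, not the paper's two-dimensional algorithm. It assumes homogeneous Dirichlet boundaries, r = dt/h^2, and the explicit stability bound dt/H^2 <= 1/2. All names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Backward Euler on interior points lo+1 .. hi-1 of one subdomain.
   u[lo] and u[hi] must already hold the new boundary values; interior
   entries hold old values on entry and new ones on exit. r = dt / h^2.
   The tridiagonal system is solved with the Thomas algorithm. */
static void implicit_subdomain(double *u, size_t lo, size_t hi, double r)
{
    if (hi <= lo + 1) return;
    size_t m = hi - lo - 1; /* number of unknowns */
    double *cp = malloc(m * sizeof *cp);
    double *dp = malloc(m * sizeof *dp);
    assert(cp != NULL && dp != NULL);
    const double b = 1.0 + 2.0 * r; /* diagonal; both off-diagonals are -r */
    for (size_t q = 0; q < m; q++) {
        double d = u[lo + 1 + q];           /* old value is the right-hand side */
        if (q == 0) d += r * u[lo];         /* known new boundary values */
        if (q == m - 1) d += r * u[hi];
        double denom = (q == 0) ? b : b + r * cp[q - 1];
        cp[q] = -r / denom;
        dp[q] = (q == 0) ? d / denom : (d + r * dp[q - 1]) / denom;
    }
    u[hi - 1] = dp[m - 1];
    for (size_t q = m - 1; q-- > 0;)
        u[lo + 1 + q] = dp[q] - cp[q] * u[lo + 2 + q];
    free(cp);
    free(dp);
}

/* One time step on u[0..n] with u[0] = u[n] = 0: explicit interface update
   with spacing H = k*h, then two independent implicit subdomain solves. */
void dd_heat_step(double *u, size_t n, size_t mid, size_t k, double r)
{
    assert(k >= 1 && mid >= k && mid + k <= n);
    double rH = r / (double)(k * k); /* dt / H^2 */
    assert(rH <= 0.5);               /* explicit-step stability bound */
    u[mid] += rH * (u[mid + k] - 2.0 * u[mid] + u[mid - k]);
    implicit_subdomain(u, 0, mid, r); /* these two solves could run in parallel */
    implicit_subdomain(u, mid, n, r);
}
```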
ISBN: 0769515126 (print)
A type of incomplete decomposition preconditioner based on local block factorization is considered for the matrices that arise from discretizing 2-D or 3-D elliptic partial differential equations. We prove that the condition numbers of the preconditioned matrices are small, which means that the constructed preconditioners are effective. We further consider an efficient parallel version of the preconditioner that depends only on a single integer parameter. When its value is small, the number of iterations needed to converge on multiple processors is much larger than on a single processor, but as the value increases the difference gradually decreases. Finally, extensive experiments on a cluster of six PCs with 1.8 GHz processors show two things. First, the local block factorizations are efficient in serial implementation compared with some well-known effective preconditioners. Second, the parallel versions are also efficient.
ISBN: 0769515126 (print)
Improving computational efficiency is a key issue in image processing, especially in edge detection, which is very computationally intensive. With the development of real-time image processing applications, fast processing response is becoming more critical. In this paper, a technique for distributed image processing on the Spiral Architecture is proposed, which provides a platform for speeding up cluster-based image processing.
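Spiral Architecture addresses pixels on a hexagonal lattice, which this sketch does not model. As a square-grid stand-in for the cluster decomposition, it splits an image into horizontal strips, one per hypothetical node. It then applies the Sobel operator to each strip, using |Gx| + |Gy| as the gradient magnitude. Each output row needs only one halo row above and one below, so the strips can be processed independently. The function names are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

/* Sobel gradient magnitude |Gx| + |Gy| for rows [row_begin, row_end) of a
   w x h grayscale image; image border pixels are set to 0. */
void sobel_rows(const unsigned char *img, int w, int h,
                int row_begin, int row_end, int *out)
{
    for (int y = row_begin; y < row_end; y++) {
        for (int x = 0; x < w; x++) {
            if (x == 0 || y == 0 || x == w - 1 || y == h - 1) {
                out[y * w + x] = 0;
                continue;
            }
            const unsigned char *p = img + y * w + x; /* reads rows y-1..y+1 */
            int gx = -p[-w - 1] + p[-w + 1] - 2 * p[-1] + 2 * p[1] - p[w - 1] + p[w + 1];
            int gy = -p[-w - 1] - 2 * p[-w] - p[-w + 1] + p[w - 1] + 2 * p[w] + p[w + 1];
            out[y * w + x] = abs(gx) + abs(gy);
        }
    }
}

/* Split the rows into nparts contiguous strips, as a cluster would assign
   them to nodes; here the strips are simply processed in turn. */
void sobel_partitioned(const unsigned char *img, int w, int h, int nparts, int *out)
{
    assert(nparts >= 1 && nparts <= h);
    for (int k = 0; k < nparts; k++) {
        int b = (int)((long)h * k / nparts);
        int e = (int)((long)h * (k + 1) / nparts);
        sobel_rows(img, w, h, b, e, out);
    }
}
```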
ISBN: 0769515126 (print)
We study parallel solutions to the weighted multiselection problem: selecting r elements at given weighted ranks from a set S of n weighted elements. An element is at weighted rank k if it is the smallest element such that the aggregated weight of all elements in S not greater than it is not smaller than k. We propose efficient algorithms on two of the most popular parallel architectures, the hypercube and the mesh. For a hypercube with p < n processors, we present a parallel algorithm running in $O(n^{\epsilon}\min\{r, \log p\})$ time for $p = n^{1-\epsilon}$, $0 < \epsilon < 1$, which is cost optimal when $r \ge p$. Our algorithm on a $\sqrt{p} \times \sqrt{p}$ mesh runs in $O(\sqrt{p} + \frac{n}{p}\log^{3} p)$ time. This matches multiselection on the mesh when $r \ge \log p$, so in that case it has the same optimality as multiselection.
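The definition of weighted rank can be made concrete with a sequential reference implementation. Such a reference is useful for checking a parallel implementation, but it is not the hypercube or mesh algorithm. It sorts S by value, forms prefix sums of the weights, and, for each query rank k, binary-searches for the first prefix sum that is at least k. It assumes non-negative weights and query ranks no larger than the total weight. `WItem` and `weighted_multiselect` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    double value;
    double weight; /* assumed non-negative */
} WItem;

static int cmp_value(const void *a, const void *b)
{
    double x = ((const WItem *)a)->value, y = ((const WItem *)b)->value;
    return (x > y) - (x < y);
}

/* Sequential reference: sorts s in place by value. For each rank k[j],
   out[j] is the smallest value whose cumulative weight (total weight of
   elements <= it) is at least k[j]. Runs in O(n log n + r log n). */
void weighted_multiselect(WItem *s, size_t n, const double *k, size_t r, double *out)
{
    assert(n > 0);
    qsort(s, n, sizeof *s, cmp_value);
    double *prefix = malloc(n * sizeof *prefix);
    assert(prefix != NULL);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += s[i].weight;
        prefix[i] = acc;
    }
    for (size_t j = 0; j < r; j++) {
        assert(k[j] <= acc); /* rank must not exceed the total weight */
        size_t lo = 0, hi = n - 1; /* find the first index with prefix >= k[j] */
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (prefix[mid] >= k[j]) hi = mid; else lo = mid + 1;
        }
        out[j] = s[lo].value;
    }
    free(prefix);
}
```

Duplicate values are handled correctly. The search may stop inside a run of equal values, but the returned value is the same, and every smaller value has cumulative weight below k.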