In this paper four parallel algorithms for the evaluation of finite series of orthogonal polynomials are introduced. The algorithms are based on the Forsythe and Clenshaw sequential algorithms. Several tests carried out on a Cray T3D are presented.
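For reference, the sequential Clenshaw recurrence that such parallel variants start from can be sketched as follows. This is a minimal illustration only: it assumes a Chebyshev series of the first kind (the abstract covers general orthogonal polynomials), and the function name `clenshaw_chebyshev` is hypothetical rather than taken from the paper.

```python
def clenshaw_chebyshev(coeffs, x):
    """Evaluate sum_k coeffs[k] * T_k(x) with the sequential Clenshaw recurrence.

    Assumption: T_k are Chebyshev polynomials of the first kind, which satisfy
    T_{k+1}(x) = 2 x T_k(x) - T_{k-1}(x).
    """
    b1 = 0.0  # plays the role of b_{k+1}
    b2 = 0.0  # plays the role of b_{k+2}
    # Run the recurrence b_k = c_k + 2 x b_{k+1} - b_{k+2} from k = n down to k = 1.
    for c in reversed(coeffs[1:]):
        b1, b2 = c + 2.0 * x * b1 - b2, b1
    # Final step for k = 0 uses x instead of 2x.
    return coeffs[0] + x * b1 - b2


# Usage: 1*T_0 + 2*T_1 + 3*T_2 evaluated at x = 0.3
value = clenshaw_chebyshev([1.0, 2.0, 3.0], 0.3)
```

Each step of the loop depends on the previous two values, which is the sequential dependency a parallel formulation has to break up.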
This paper addresses the problem of developing efficient parallel algorithms for the training procedure of a neural network-based Fingerprint Image Comparison (FIC) system. The target architecture is assumed to be a coarse-grain distributed-memory parallel architecture. Two types of parallelism, node parallelism and training set parallelism (TSP), are investigated. Theoretical analysis and experimental results show that node parallelism has low speedup and poor scalability, while TSP gives the best speedup. TSP, however, is prone to a slow convergence rate. To reduce this effect, a modified training set parallel algorithm using weighted contributions of synaptic connections is proposed. Experimental results show that this algorithm converges quickly while retaining the best speedup obtained. The combination of TSP with node parallelism is also investigated; it performs well and provides better scalability at the cost of a slight decrease in speedup. The above algorithms are implemented on a 32-node CM-5.
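The basic idea of training set parallelism can be sketched as below. This is an illustrative toy under stated assumptions, not the paper's algorithm: the model is a one-parameter linear fit instead of a neural network, the "workers" are simulated sequentially, the per-shard gradients are simply summed (the weighted-contribution modification described above is not reproduced), and the names `shard_gradient` and `tsp_step` are hypothetical.

```python
def shard_gradient(w, shard):
    """Summed gradient of 0.5 * (w * x - y) ** 2 over one shard of (x, y) pairs."""
    return sum((w * x - y) * x for x, y in shard)


def tsp_step(w, shards, lr):
    """One synchronous update: per-shard gradients are computed independently, then combined."""
    # In a real system each shard would live on its own processor; here the
    # "workers" are simulated by a sequential loop.
    partial = [shard_gradient(w, shard) for shard in shards]
    n_examples = sum(len(shard) for shard in shards)
    return w - lr * sum(partial) / n_examples


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # samples of y = 2x
shards = [data[:2], data[2:]]  # two hypothetical workers
w = 0.0
for _ in range(50):
    w = tsp_step(w, shards, lr=0.1)
```

With plain summation the combined step equals a full-batch step, so the only parallel overhead in this toy is the reduction of the partial gradients.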
Coupled-cluster (CC) methods are now widely used in quantum chemistry to calculate the electron correlation energy and many other properties of atoms and molecules. In this paper we outline the basics of the theory, discuss some computational aspects, and review work that has been done toward developing and implementing algorithms for CC methods on parallel computers. (C) 2000 Elsevier Science B.V. All rights reserved.
One of the recent thrust areas in research on hyperelliptic curve cryptography has been to obtain explicit formulae for performing arithmetic in the Jacobian of such curves. We continue this line of research by obtain...
A large variety of methods based on partial differential equations (PDEs) rely on interface propagation. Because of their flexibility, these methods are increasingly applied to problems ranging from physics and fluid mechanics to control theory and computer vision. Solving the PDE-based interface evolution is itself a complex iterative computational task whose number of iterations is large and unknown a priori. These applications are therefore very demanding on the hardware, and their real-time implementation remains a challenging problem. An efficient implementation could be achieved with a dedicated parallel architecture. This paper proposes an original, entirely parallel algorithm to solve the Eikonal equation, which is the basis of applications using a weighted distance function. The algorithm allows active contour methods or the continuous watershed to be implemented in parallel on dedicated hardware.
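For orientation, the Eikonal equation is commonly written in the following form. The notation is an assumption, since the abstract does not fix any symbols: $u$ is the weighted distance (arrival time), $F > 0$ is the local weight, $\Omega$ is the domain, and $\Gamma$ is the initial interface.

```latex
\begin{aligned}
  \lVert \nabla u(x) \rVert &= F(x), && x \in \Omega, \\
  u(x) &= 0, && x \in \Gamma .
\end{aligned}
```

With $F \equiv 1$ the solution $u$ reduces to the ordinary Euclidean distance to $\Gamma$.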
We present a tomographic reconstruction algorithm based on a frequency decomposition of the data. We show that the frequency components of the attenuation function to be identified can be reconstructed from the frequency decomposition of the data. Moreover, downsampling combined with the identification of null components and with compression techniques speeds up reconstruction by a factor of up to six compared with classical filtered back-projection (FBP). We identify the optimal number of frequency components and show reconstructions from real data. A parallel implementation of the new algorithm is then proposed and evaluated on two small PC clusters.
As general-purpose parallel computers are increasingly used to speed up VLSI applications, the development of parallel algorithms for circuit testing, logic minimization and simulation, HDL-based synthesis, etc. is a field of growing research activity. Some of these applications involve the circuit partitioning problem: dividing a circuit into non-overlapping subcircuits while minimizing the number of cuts and balancing the load associated with each subcircuit. Very effective heuristic algorithms have been developed to solve this problem, but it is unknown how good the resulting partitions are, since the problem is NP-complete. In these cases parallel processing can be very useful. This paper describes a parallel evolutionary algorithm for circuit partitioning in which parallelism improves the solutions found by the corresponding sequential algorithm, which is itself quite effective compared with previously proposed procedures.
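The two quantities traded off in circuit partitioning, the number of cuts and the load balance, can be made concrete with a small sketch. It assumes a simplified graph model in which each connection joins exactly two cells (real netlists are hypergraphs) and measures load by cell count; the function names are hypothetical, and the abstract does not state the fitness function actually used by the evolutionary algorithm.

```python
def cut_size(edges, part):
    """Number of connections whose two endpoints lie in different subcircuits."""
    return sum(1 for u, v in edges if part[u] != part[v])


def load_imbalance(part, k):
    """Difference between the largest and smallest subcircuit, measured in cells."""
    sizes = [0] * k
    for subcircuit in part.values():
        sizes[subcircuit] += 1
    return max(sizes) - min(sizes)


# Usage: a four-cell ring split into two subcircuits of two cells each.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
part = {"a": 0, "b": 0, "c": 1, "d": 1}
cuts = cut_size(edges, part)          # connections b-c and d-a are cut
imbalance = load_imbalance(part, 2)   # both subcircuits hold two cells
```

A search heuristic, evolutionary or otherwise, would evaluate many candidate `part` assignments with some combination of these two measures.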
ISBN: 0780379292 (print)
A study on efficient visualization and real-time interactivity of large-scale scenes is discussed. Using parallel processing technology, we present a parallelizable strategy based on a pipeline algorithm, implement this parallel algorithm on shared memory, and apply the program to a test site, the mixed scenes of the Peking Olympic Games planning, including real-time rendering, dynamic texture loading, quick browsing and so on. The results show improved running performance and real-time interactivity of DEPS (Digital Earth Prototype System) when using this algorithm. The parallel program was developed and run on a Silicon Graphics Onyx 3200 multiprocessor with four MIPS R12000 processors and an InfiniteReality 3 graphics accelerator, under the IRIX 6.5 operating system.
ISBN: 9780898715385 (print)
We present an optimal parallel selection algorithm on the EREW PRAM. This algorithm runs in O(log n) time with n/log n processors. This complexity matches the known lower bound for parallel selection on the EREW PRAM model. We therefore close this problem, which has been open for more than a decade.
Existing parallel association rule mining algorithms suffer from many problems when mining massive transactional datasets. One major problem is that most parallel algorithms for a shared-nothing environment are Apriori-based. Apriori-based algorithms have been shown not to scale, mainly because of (1) the repeated disk I/O scans and (2) the heavy computation and communication involved in candidate generation. This paper proposes a new disk-based parallel association rule mining algorithm called Inverted Matrix, which achieves its efficiency by applying three new ideas. First, transactional data is converted into a new database layout called Inverted Matrix that prevents multiple scans of the database during the mining phase, so that globally frequent patterns can be found in less than a full scan with random access. This data structure is replicated among the parallel nodes. Second, for each frequent item assigned to a parallel node, a relatively small independent tree is built summarizing co-occurrences. Finally, a simple, non-recursive mining process reduces memory requirements, since minimal candidate generation and counting are needed, and no communication between nodes is required to generate all globally frequent patterns.
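The general benefit of an inverted layout, counting support without rescanning the transactions, can be illustrated with a plain inverted index. This sketch is not the paper's Inverted Matrix layout, whose details the abstract does not give; it only shows the simpler item-to-transaction-id idea, and the names `build_inverted_index` and `support` are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(transactions):
    """Map each item to the set of ids of the transactions that contain it (one scan)."""
    index = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            index[item].add(tid)
    return index


def support(index, itemset):
    """Count transactions containing every item of itemset, using the index only."""
    tid_sets = [index.get(item, set()) for item in itemset]
    if not tid_sets:
        return 0
    return len(set.intersection(*tid_sets))


# Usage: three small transactions, indexed once and then queried.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
index = build_inverted_index(transactions)
ab_support = support(index, ["a", "b"])  # transactions 0 and 2 contain both items
```

After the single indexing pass, every support query touches only the id sets of the items involved rather than the whole database.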