The question "what Monte Carlo can do and cannot do efficiently" is discussed for some function spaces that define the regularity of the input data. Data classes that are important for practical computations are considered: classes of functions with bounded derivatives and Hölder-type conditions. A theoretical performance analysis of some algorithms with an unimprovable rate of convergence is given. Complexity estimates for two classes of algorithms, deterministic and randomized, for the solution of a class of integral equations are presented.
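As a point of reference for the randomized class mentioned above, the following is a minimal sketch of plain Monte Carlo integration over the unit cube. It is not one of the optimal-rate algorithms analysed in the paper; the function name `mc_integrate` and its parameters are hypothetical and chosen for illustration only. The reported standard error shrinks roughly like N^(-1/2), independent of dimension.

```python
import math
import random
from typing import Callable, List, Tuple


def mc_integrate(
    f: Callable[[List[float]], float],
    dim: int,
    n_samples: int,
    seed: int = 0,
) -> Tuple[float, float]:
    """Estimate the integral of f over [0, 1]^dim by plain Monte Carlo.

    Returns (estimate, standard_error). Illustrative sketch only.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    total = 0.0
    total_sq = 0.0
    for _ in range(n_samples):
        x = [rng.random() for _ in range(dim)]  # uniform point in the cube
        y = f(x)
        total += y
        total_sq += y * y
    mean = total / n_samples
    # Sample variance of f(X); clamp tiny negative values from rounding.
    var = max(total_sq / n_samples - mean * mean, 0.0)
    return mean, math.sqrt(var / n_samples)
```

For example, `mc_integrate(lambda x: x[0] ** 2, dim=1, n_samples=20000)` returns an estimate close to 1/3 together with its estimated standard error.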
High performance computing in heterogeneous environments is a dynamically developing area. A number of highly efficient heterogeneous parallel algorithms have been designed over the last decade. At the same time, scientific software based on these algorithms is very much under par. The paper analyses the main issues encountered by scientific programmers when implementing heterogeneous parallel algorithms in a portable form. It explains how programming systems can address these issues in order to maximally facilitate the implementation of parallel algorithms for heterogeneous platforms, and outlines two existing programming systems for high performance heterogeneous computing, mpC and HeteroMPI.
To improve the extra-low-speed performance of a direct torque control (DTC) system, this paper applies a wavelet neural network (WNN) to construct a flux observer, based on an in-depth study of the nonlinear mathematical model of the stator flux of an asynchronous motor. Furthermore, to improve the speed and real-time characteristics of the WNN flux observer, the paper applies an ant colony algorithm (ACA) with an embedded deterministic search strategy to optimize the dilation factor, translation factor and output weight of the wavelet neural network. To confirm the on-line identification precision of the ACA-based WNN flux observer, the paper compares this method with a WNN flux observer optimized by the gradient descent algorithm. Simulation shows that the former not only reduces the number of hidden-layer nodes and speeds up the convergence of the WNN, but also improves the on-line identification precision of the flux observer, so it can effectively improve the low-speed performance of the DTC system.
This paper presents a new generalized particle model (GPM) to generate the prediction coding for lossless data compression. We discuss the GPM-based parallel algorithm, its properties and its realization scheme. The proposed GPM approach has advantages in terms of parallelism, scalability and ease of hardware implementation over other, sequential, lossless compression methods.
ISBN: (print) 9781424400546
Parallel scientific applications are often instrumented, and traces are collected and analyzed to identify processes with performance problems or operations that cause delays in program execution. The execution of instrumented codes may generate large amounts of performance data, and the collection, storage, and analysis of such traces are demanding in both time and space. To address this problem, this paper presents an efficient, systematic, multi-step methodology, based on hierarchical clustering, for the analysis of communication traces of parallel scientific applications. The methodology is used to discover potential communication performance problems in three applications: TRACE, REMO, and SWEEP3D.
This paper describes the FPGA implementation of a scalable very high radix Montgomery multiplier using quotient pipelining. It improves upon previous designs by removing critical dependencies between successive processing elements. This design can perform 1024-bit modular exponentiation in 5.1 ms using 3825 4-input lookup tables and 32 18×18 multipliers, a 20% speed increase over a comparable design without quotient pipelining.
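For readers unfamiliar with the underlying arithmetic, here is a minimal software sketch of textbook Montgomery multiplication and a square-and-multiply exponentiation built on it. It uses a single full-width reduction with R = 2^k, so it does not model the scalable high-radix, quotient-pipelined hardware design described in the paper. All function names are hypothetical, and the code assumes an odd modulus and Python 3.8 or later (for `pow(n, -1, r)`).

```python
from typing import Tuple


def montgomery_setup(n: int) -> Tuple[int, int]:
    """Return (k, n_prime) for odd modulus n, with R = 2**k > n."""
    if n % 2 == 0:
        raise ValueError("Montgomery reduction needs an odd modulus")
    k = n.bit_length()
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r  # n * n_prime ≡ -1 (mod R)
    return k, n_prime


def mont_mul(a_bar: int, b_bar: int, n: int, k: int, n_prime: int) -> int:
    """Return a_bar * b_bar * R^(-1) mod n, for inputs already below n."""
    mask = (1 << k) - 1
    t = a_bar * b_bar
    m = ((t & mask) * n_prime) & mask  # quotient digit: makes t + m*n divisible by R
    u = (t + m * n) >> k               # exact division by R
    return u - n if u >= n else u      # single conditional subtraction


def mont_mod_exp(base: int, exp: int, n: int) -> int:
    """Compute base**exp mod n with left-to-right square-and-multiply."""
    k, n_prime = montgomery_setup(n)
    r = 1 << k
    x_bar = (base * r) % n  # base in Montgomery form
    acc = r % n             # 1 in Montgomery form
    for bit in bin(exp)[2:]:
        acc = mont_mul(acc, acc, n, k, n_prime)
        if bit == "1":
            acc = mont_mul(acc, x_bar, n, k, n_prime)
    return mont_mul(acc, 1, n, k, n_prime)  # convert back out of Montgomery form
```

The quotient digit `m` is the value each reduction step has to wait for; high-radix hardware designs process the operands in words and compute one such digit per word, which is where the dependency between successive processing elements arises.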
The use of clusters of symmetric multiprocessor (SMP) configurations in database processing has become a key factor in allowing greater scalability. It has also posed many challenges in the implementation of one of the most costly operations within relational algebra: the join operation. When massive data is involved, the join usually cannot be performed in memory and is processed out of core. In this case, performance depends on an effective use of the memory hierarchy, such that I/O and memory contention are minimized. In this paper, we propose a parallel algorithm for out-of-core join processing that dynamically adapts its behavior to the resources available in the system. We evaluate and compare our proposal against other parallel approaches on a real SMP cluster using a major commercial database, the IBM® DB2® Universal Database™ product (DB2 UDB). Results show that our proposal outperforms previous work significantly.
In this paper a Hopfield neural network (HNN) based parallel algorithm is presented for predicting the secondary structure of ribonucleic acids (RNA). The HNN is used to find the near-maximum independent set of an adjacent graph made of RNA base pairs and then to compute the stable secondary structure of the RNA. We modified the motion equation proposed in a previous paper to better reflect the biological nature of RNA secondary structure: the thermodynamic parameters of the base pairs are used in our algorithm to control the variation rate of the inhibitory and encouragement terms in the equation. Comparisons with the algorithm presented in that paper and with two other classical prediction methods (Zuker's and Nussinov's) show that our method is more sensitive and more specific. In addition, our algorithm can be very efficient and can be applied to sequences up to several thousand bases long with a higher degree of parallelism.
Fringe analysis uses the distribution of bottom subtrees, or fringe, of search trees under the assumption of random insertion of keys, yielding an average-case analysis of the fringe. The results for the fringe give upper and lower bounds for several measures of the whole tree. We are interested in the fringe analysis of the synchronized parallel insertion algorithm of Paul, Vishkin, and Wagener (PVW) on 2-3 trees. This algorithm inserts k keys with k processors into a tree of size n in time O(log n + log k). As the direct analysis of this algorithm is very difficult, we tackle the problem by introducing a new family of algorithms, denoted MacroSplit algorithms, and our main theorem proves that two algorithms of this family, denoted MaxMacroSplit and MinMacroSplit, bound the behavior of the fringe in the PVW algorithm. Previous work deals with the fringe analysis of sequential algorithms, but this type of analysis was still an open problem for parallel algorithms on search trees. We extend fringe analysis to parallel algorithms and obtain a rich mathematical structure that gives new interpretations even in the sequential case. We prove that random insertion of keys generates a binomial distribution, that the synchronized insertion of keys can be modeled by a Markov chain, and that the coefficients of the transition matrix of the Markov chain are related to the expected local behavior of our algorithm. We also show that the coefficients of the power expansion of this matrix over (n+1)^(-1) are the binomial transform of the expected local behavior of the algorithm. Finally, we show that the fringe of the PVW algorithm asymptotically converges to the sequential case. © 2002 Elsevier Science B.V. All rights reserved.
Processing of biological sequences is a compute-intensive problem. The amount of data available in biology is so enormous that sequential techniques take a very long time to process it. In this paper we present several parallel algorithms for biological data processing. We consider a scenario where the operations performed on the sequences are arbitrary. In particular, we assume that a sequential algorithm (or program) is given as an input (with the option of making several copies). The input also includes a file of sequences to be processed. The goal is to process all these sequences as efficiently as possible.
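The scenario above (an arbitrary sequential program applied independently to every sequence in a file) can be sketched as a simple worker-pool map. This is only an illustration of the problem setting, not any of the algorithms from the paper. The names `gc_content` and `process_all` are hypothetical, and `gc_content` merely stands in for the given sequential program. A thread pool keeps the sketch self-contained; for CPU-bound work in Python a process pool would be the more realistic choice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def gc_content(seq: str) -> float:
    """Hypothetical stand-in for the given sequential program:
    fraction of G/C bases in one sequence."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def process_all(
    sequences: Iterable[str],
    program: Callable[[str], T],
    workers: int = 4,
) -> List[T]:
    """Apply `program` to every sequence using a pool of workers.

    Each worker runs its own call of the sequential program; results are
    returned in the same order as the input sequences.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(program, sequences))
```

For instance, `process_all(["GGCC", "ATAT", "ATGC"], gc_content)` applies the stand-in program to three sequences concurrently and returns one result per sequence, in input order.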