The Bethe-Salpeter eigenvalue problem is a dense structured eigenvalue problem arising from the discretized Bethe-Salpeter equation in the context of computing exciton energies and states. A computational challenge is that at least half of the eigenvalues and the associated eigenvectors are desired in practice. We establish the equivalence between Bethe-Salpeter eigenvalue problems and real Hamiltonian eigenvalue problems. Based on theoretical analysis, structure-preserving algorithms for a class of Bethe-Salpeter eigenvalue problems are proposed. We also show that for this class of problems all eigenvalues obtained from the Tamm-Dancoff approximation are overestimated. In order to solve large-scale problems of practical interest, we discuss parallel implementations of our algorithms targeting distributed memory systems. Several numerical examples are presented to demonstrate the efficiency and accuracy of our algorithms. (C) 2015 Elsevier Inc. All rights reserved.
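The overestimation claim can be checked by hand on the smallest possible instance. The sketch below assumes the commonly used block form H = [[A, B], [-conj(B), -conj(A)]] with 1x1 blocks A = a (real) and B = b (complex), and assumes that the "class of problems" corresponds to the condition a > |b|; neither the block form nor the condition is spelled out in the abstract, and the function name is hypothetical.

```python
import math


def bse_scalar_eigenvalues(a, b):
    """Eigenvalues of the 2x2 matrix [[a, b], [-conj(b), -a]].

    Assumed setting (not taken from the abstract): a is real, b is complex,
    and a > |b|. The characteristic polynomial is
    lambda^2 - (a^2 - |b|^2) = 0, so the eigenvalues are a real +/- pair.
    """
    disc = a * a - abs(b) ** 2
    if a <= 0 or disc <= 0:
        raise ValueError("this sketch requires a > |b|")
    lam = math.sqrt(disc)
    return -lam, lam


# Tamm-Dancoff approximation: drop the coupling block b, leaving eigenvalue a.
a, b = 5.0, 3j
negative, positive = bse_scalar_eigenvalues(a, b)
tda_estimate = a
print(positive, tda_estimate)  # the TDA value is never below the true one
```

In this scalar case the positive eigenvalue is sqrt(a^2 - |b|^2), which cannot exceed a, so the Tamm-Dancoff value is an overestimate whenever b is nonzero.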
The real-time computation of the Jacobian that relates the manipulator joint velocities to the linear and angular velocities of the manipulator end-effector is pursued. Since the Jacobian can be expressed in the form of a first-order linear recurrence, the time lower bound for computing the Jacobian can be proved to be of order O(N) on uniprocessor computers and of order O(log_2 N) on both single-instruction-stream-multiple-data-stream (SIMD) and VLSI pipelined parallel processors, where N is the number of links of the manipulator. To achieve the lower bound, the authors developed a generalized-k method for uniprocessor computers, a parallel forward and backward recursive doubling algorithm (PFABRD) for SIMD computers, and a parallel systolic architecture for VLSI pipelines. All the methods are capable of computing the Jacobian at any desired reference coordinate frame k from the base coordinate frame to the end-effector coordinate frame. The computational effort in terms of floating-point operations is minimal when k is in the range (4,N-3) for the generalized-k method, and k=(N+1)/2 for both the PFABRD algorithm and the parallel pipeline.
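The O(log_2 N) bound rests on the fact that a first-order linear recurrence can be evaluated by recursive doubling. The sketch below illustrates the idea on a scalar recurrence x_i = a_i * x_{i-1} + b_i, simulating the parallel rounds sequentially. It is not the PFABRD algorithm itself: the scalar coefficients and the function names are assumptions chosen for illustration, whereas the Jacobian recurrence works on matrices and vectors.

```python
def compose(first, second):
    """Compose two affine maps stored as (a, b), each meaning x -> a*x + b.

    The result applies `first`, then `second`.
    """
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)


def recurrence_by_doubling(coeffs, x0):
    """Evaluate x_i = a_i * x_{i-1} + b_i for i = 1..N.

    coeffs is a list of (a_i, b_i). Returns ([x_1, ..., x_N], rounds), where
    rounds counts the doubling steps (about log2 N).
    """
    maps = list(coeffs)
    n = len(maps)
    stride, rounds = 1, 0
    while stride < n:
        # All updates within one round are independent of each other, so a
        # SIMD machine could perform them in a single parallel step.
        updated = list(maps)
        for i in range(stride, n):
            updated[i] = compose(maps[i - stride], maps[i])
        maps = updated
        stride *= 2
        rounds += 1
    # maps[i] now sends x_0 directly to x_{i+1}.
    return [a * x0 + b for a, b in maps], rounds


values, rounds = recurrence_by_doubling(
    [(2, 1), (3, 0), (1, 4), (2, -1), (1, 5)], x0=1
)
print(values, rounds)
```

Each round doubles the span of links covered by every partial product, which is why N links need only about log_2 N rounds.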
We present O(log n) time algorithms in the EREW PRAM model, using n/log n processors, to find cut vertices, bridges, and blocks (often called biconnected components) of an interval graph having n vertices. It is assumed the interval graph is represented by an interval model, with ends presorted. If the ends are not presorted, our algorithms, preceded by an optimal sort, form an O(log n) time algorithm using n processors, which is shown to be optimal. The algorithms rely heavily on the parallel prefix algorithm.
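To show how prefix operations apply to a presorted interval model, the sketch below solves a simpler, related task: labelling the connected components of an interval graph. It is not the cut-vertex, bridge, or block algorithm of the abstract. It assumes closed intervals sorted by left endpoint, and it uses `itertools.accumulate` as a sequential stand-in for a parallel prefix; the function name is hypothetical.

```python
from itertools import accumulate


def component_labels(intervals):
    """Label connected components of an interval graph.

    intervals: list of closed intervals (left, right), sorted by left endpoint.
    Returns one component index per interval, starting at 0.
    """
    if not intervals:
        return []
    rights = [right for _, right in intervals]
    # Prefix maximum: the furthest right endpoint among intervals 0..i.
    reach = list(accumulate(rights, max))
    # Interval i opens a new component exactly when it starts beyond
    # everything seen before it.
    starts = [1] + [
        1 if intervals[i][0] > reach[i - 1] else 0
        for i in range(1, len(intervals))
    ]
    # Prefix sum over the start flags yields the component labels.
    return [count - 1 for count in accumulate(starts)]


print(component_labels([(0, 3), (1, 2), (2, 6), (7, 9), (8, 10), (12, 13)]))
# prints [0, 0, 0, 1, 1, 2]
```

Both passes are prefix computations (a running maximum and a running sum), so each could be replaced by a parallel prefix routine without changing the logic.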
Performance improvement is a fundamental concern in computer science and engineering. Observing the history of the field, one would expect that any improvement in the ability of computer systems would be quickly met by applications utilizing it. This article presents a software-centric approach, in which ease of programming is a first priority for both uniprocessors and multiprocessors. It outlines two concrete reasons and one general reason why parallel programs could yield performance gains over serial code on uniprocessors, especially given current trends in uniprocessor architecture.
This correspondence introduces scalable data parallel algorithms for image processing. Focusing on Gibbs and Markov random field model representation for textures, we present parallel algorithms for texture synthesis, compression, and maximum likelihood parameter estimation, currently implemented on Thinking Machines CM-2 and CM-5. Use of fine-grained, data parallel processing techniques yields real-time algorithms for texture synthesis and compression that are substantially faster than the previously known sequential implementations. Although current implementations are on Connection Machines, the methodology presented here enables machine-independent scalable algorithms for a number of problems in image processing and analysis.
The lattice Boltzmann method has become an attractive and promising approach in computational fluid dynamics. In this paper, the D3Q19 multi-relaxation-time lattice Boltzmann method is employed to simulate complex flu...
A method is proposed for converting an algorithm that admits no parallel treatment into an essentially new algorithm with much better parallel properties. The method is intended for tackling the so-called T-algorithms, a term that stems from the first examples of such algorithms, which were considered in the context of Toeplitz-like matrices. Generalized T-algorithms are also considered.
A characterization of interval graphs is used to arrive at an O(n^2) interval graph recognition algorithm which also builds an interval representation for interval graphs. The algorithm is fairly simple and directly yields a simple and elegant parallel algorithm for the same problem.
Background and objective: Integrating data from multiple sources is a crucial and challenging problem. Even though there exist numerous algorithms for record linkage or deduplication, they suffer from either large time needs or restrictions on the number of datasets that they can integrate. In this paper we report efficient sequential and parallel algorithms for record linkage which handle any number of datasets and outperform previous algorithms. Methods: Our algorithms employ hierarchical clustering algorithms as the basis. A key idea that we use is radix sorting on certain attributes to eliminate identical records before any further processing. Another novel idea is to form a graph that links similar records and find the connected components. Results: Our sequential and parallel algorithms have been tested on a real dataset of 1,083,878 records and synthetic datasets ranging in size from 50,000 to 9,000,000 records. Our sequential algorithm runs at least two times faster, for any dataset, than the previous best-known algorithm, the two-phase algorithm using faster computation of the edit distance (TPA (FCED)). The speedups obtained by our parallel algorithm are almost linear. For example, we get a speedup of 7.5 with 8 cores (residing in a single node), 14.1 with 16 cores (residing in two nodes), and 26.4 with 32 cores (residing in four nodes). Conclusions: We have compared the performance of our sequential algorithm with TPA (FCED) and found that our algorithm outperforms the previous one. The accuracy is the same as that of this previous best-known algorithm.
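The two ideas named under Methods, removing identical records first and then linking similar records in a graph whose connected components become clusters, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes records are plain strings, substitutes Python's built-in sort for radix sort, compares all pairs with a plain Levenshtein distance, and uses a hypothetical `threshold` parameter.

```python
def edit_distance(s, t):
    """Levenshtein distance by row-by-row dynamic programming."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]


def link_records(records, threshold=1):
    """Cluster records whose edit distance is at most `threshold`.

    Step 1: drop exact duplicates (sorting stands in for radix sort).
    Step 2: link similar records and collect connected components with
            a union-find structure.
    """
    unique = sorted(set(records))
    parent = list(range(len(unique)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(unique)):
        for j in range(i + 1, len(unique)):
            if edit_distance(unique[i], unique[j]) <= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i, record in enumerate(unique):
        clusters.setdefault(find(i), []).append(record)
    return sorted(clusters.values())


names = ["john smith", "jon smith", "john smith",
         "mary jones", "mary jonas", "alice wu"]
print(link_records(names))
```

The all-pairs comparison is quadratic in the number of distinct records, so this sketch only illustrates the graph formulation and says nothing about how the reported running times are achieved.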
The problem of mathematical modeling of the spread of contamination from point sources in the air has been considered. An approach that uses the idea of splitting and organization of computation with explicit differen...