Given two sorted arrays A = (a(1), a(2), ..., a(n)) and B = (b(1), b(2), ..., b(n)) of records such that (1) the n records are sorted according to one field which is called the key, and (2) the values of the keys are ...
详细信息
ISBN:
(纸本)9783642131356
Given two sorted arrays A = (a(1), a(2), ..., a(n)) and B = (b(1), b(2), ..., b(n)) of records such that (1) the n records are sorted according to one field which is called the key, and (2) the values of the keys are serial numbers. Merging data records has many applications in computer science especially in database. We develop an algorithm that runs in O(log n) time on EREW PRAM to merge two sorted arrays of records using n/log n processors even the keys of the data records are repeated. the algorithm is cost-optimal, deterministic, stable and uses linear number of space.
Robust and efficient parallel numerical algorithms and their implementation in easy-to-use portable software components are crucial for computational science and engineering applications. they are strongly influenced ...
详细信息
A parallel scheme for distributed memory hierarchy system is presented to solve the large-scale three-dimensional heat equation. Since managing interprocess communications and coordination is the main difficulty with ...
详细信息
ISBN:
(纸本)9783642131356
A parallel scheme for distributed memory hierarchy system is presented to solve the large-scale three-dimensional heat equation. Since managing interprocess communications and coordination is the main difficulty withthe system, the local physics/global algebraic object paradigm is introduced. Domain decomposition method is used to partition the modeling area, as well as the intensive computational effort and large memory requirement. Efficient storage and assembly of sparse matrix and parallel iterative solution of linear system are considered and developed. the efficiency and scalability of the parallel program are demonstrated by completing two experiments on Linux cluster, in which different preconditioning methods are tested and analyzed. And the results demonstrate this method could achieve desirable parallel performance.
In this paper, we propose an efficient algorithm for parallel prefix computation in recursive dual-net, a newly proposed network. the recursive dual-net RDNk (B) for k > 0 has (2n(0))(2k) /2 nodes and d(0) + k link...
详细信息
ISBN:
(纸本)9783642131189
In this paper, we propose an efficient algorithm for parallel prefix computation in recursive dual-net, a newly proposed network. the recursive dual-net RDNk (B) for k > 0 has (2n(0))(2k) /2 nodes and d(0) + k links per node, where no and do are the number of nodes and the node-degree of the base network B, respectively. Assume that each node holds one data item, the communication and computation time complexities of the algorithm for parallel prefix computation in RDNk (B),k > 0, are 2(k+1) - 2 + 2k * T-comm(0) and 2(k+1) - 2 + 2(k) * T-comp(0), respectively, where T-comm(0) and T-comp(0) are the communication and computation time complexities of the algorithm for parallel prefix computation in the base network B, respectively.
In this paper we report on the recent progress in computing bivariate polynomial resultants on Graphics processing Units (GPU). Given two polynomials in Z[x, y], our algorithm first maps the polynomials to a prime fie...
详细信息
ISBN:
(纸本)9783642131189
In this paper we report on the recent progress in computing bivariate polynomial resultants on Graphics processing Units (GPU). Given two polynomials in Z[x, y], our algorithm first maps the polynomials to a prime field. then, each modular image is processed individually. the GPU evaluates the polynomials at a number of points and computes univariate modular resultants in parallel. the remaining "combine" stage of the algorithm is executed sequentially on the host machine. Porting this stage to the graphics hardware is an object of ongoing research. Our algorithm is based on an efficient modular arithmetic from [1]. Withthe theory of displacement structure we have been able to parallelize the resultant algorithm up to a very fine scale suitable for realization on the GPU. Our benchmarks show a substantial speed-up over a host-based resultant algorithm [2] from CGAL (***).
In this paper, we present a general architecture of hybrid prefix/carry-select adder. Based on this architecture, we formalize the hybrid adder's algorithm using the first-order recursive equations and develop a p...
详细信息
ISBN:
(纸本)9783642131189
In this paper, we present a general architecture of hybrid prefix/carry-select adder. Based on this architecture, we formalize the hybrid adder's algorithm using the first-order recursive equations and develop a proof framework to prove its correctness. Since several previous adders in the literature are special cases of this general architecture, our methodology can be used to prove the correctness of different hybrid prefix/carry-select adders. the formal proof for a special hybrid prefix/carry-select adder shows the effectiveness of the algebraic structures built in this paper.
the Pattern Matching with Swaps problem is a variation of the classical pattern matching problem in which a match is allowed to include disjoint local swaps. In 2009, Cantone and Faro devised a new dynamic programming...
详细信息
ISBN:
(纸本)9783642131189
the Pattern Matching with Swaps problem is a variation of the classical pattern matching problem in which a match is allowed to include disjoint local swaps. In 2009, Cantone and Faro devised a new dynamic programming algorithm for this problem that runs in time O(nm), where n is the length of the text and m is the length of the pattern. In this paper, first, we present an improved dynamic programming formulation of the approach of Cantone and Faro. then, we present an optimal parallelization of our algorithm, based on a linear array model, that runs in time O(m(2)) using [n/m-1] processors.
Soft-core system allows designers to modify the components which are in the architecture they designed conveniently. In some systems, uni-core processor can not provide enough computing power to support a huge amount ...
详细信息
ISBN:
(纸本)9783642131356
Soft-core system allows designers to modify the components which are in the architecture they designed conveniently. In some systems, uni-core processor can not provide enough computing power to support a huge amount of computing for specific applications. In order to improve the performance of a multi-core system, in addition to the hardware architecture design, parallel programming is an important issue. the current parallelizing compilers are hard to parallelize the programs effectively. the programmer must think about how to allot the task to each processor in the beginning. In this paper, we present a software framework for designing parallel program. the proposed framework provides a convenient parallel programming environment for programmers to design the multi-core system's software. From the experiments, the proposed framework can parallelize the program effectively by applying the provided functions.
the article is concerned with a parallel iterative domain decomposition algorithm for high-order finite element solutions of the Helmholtz wave equation. the iteration is performed in a block-Jacobi manner. For the in...
详细信息
ISBN:
(纸本)9783642131356
the article is concerned with a parallel iterative domain decomposition algorithm for high-order finite element solutions of the Helmholtz wave equation. the iteration is performed in a block-Jacobi manner. For the interface operator, a Robin interface boundary condition is employed in a modified form which allows possible discontinuities of the discrete normal flux on the subdomain interfaces. the convergence of the algorithm is analyzed using energy estimates. Numerical results are given to show the effectiveness and parallel efficiency of the algorithm for the simulation of high-frequency waves in heterogeneous media in the two-dimensional space. the algorithm is carried out on a 16-node Linux cluster;it has been observed more than 97% parallel efficiency for all tested problems.
Micro-Electro-Mechanical System (MEMS) is the integration of mechanical elements, sensors, actuators, and electronics on a common silicon substrate through micro fabrication technology. With MEMS technologies, micron-...
详细信息
ISBN:
(纸本)9783642131356
Micro-Electro-Mechanical System (MEMS) is the integration of mechanical elements, sensors, actuators, and electronics on a common silicon substrate through micro fabrication technology. With MEMS technologies, micron-scale sensors and other smart products can be manufactured. Because of its micron-scale. MEMS products' structure is nearly invisible, even the designer is hard to know whether the device is well-designed and well-produced. So a visual 3D MEMS simulation implement, named ZProcess[1], was proposed in our previous work to help designers realizing and improving their designs. ZProcess shows the MEMS device's 3D model using voxel method. Its accurate, but its speed is unacceptable when the scale of voxel-data is large. In this paper, an improved parallel MEMS simulation implementation is presented to accelerate ZProcess by using GPU (Graphic processing Unit). the experimental results show the parallel implement gets maximum 160 times speed up comparing withthe sequential program.
暂无评论