The numerical solution of the 3D linear elasticity equations is considered. The problem is described by a coupled system of second-order elliptic partial differential equations. This system is discretized by trilinear parallelepipedal finite elements. The preconditioned conjugate gradient iterative method is used to solve the large-scale linear algebraic systems arising from the finite element method (FEM) discretization of the problem. In the first step, the displacement decomposition technique is applied to construct a preconditioner from the decoupled block-diagonal part of the original matrix. Circulant block-factorization is then used to precondition the resulting block-diagonal matrix. Both techniques, displacement decomposition and circulant block-factorization, are highly parallelizable. A parallel algorithm is developed for the proposed preconditioner. The theoretical analysis of the execution time shows that the algorithm is highly efficient for coarse-grain parallel computer systems. A portable MPI parallel FEM code is developed. Numerical tests for real-life engineering problems in geomechanics on a number of modern parallel computers are presented. The reported speed-up and parallel efficiency illustrate well the parallel features of the proposed method and its implementation. (C) 2002 IMACS. Published by Elsevier Science B.V. All rights reserved.
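The preconditioned conjugate gradient iteration named in this abstract can be sketched in a few lines. This is a minimal serial illustration, not the authors' MPI code: the names `pcg`, `matvec`, `dot`, and `jacobi_preconditioner` are hypothetical, and a simple diagonal (Jacobi) preconditioner stands in for the displacement decomposition and circulant block-factorization preconditioner, which the abstract does not specify in detail.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def jacobi_preconditioner(A):
    """Return a function that applies the inverse of diag(A).

    Assumption: this is only a stand-in for the block-diagonal
    preconditioner described in the abstract.
    """
    inverse_diagonal = [1.0 / A[i][i] for i in range(len(A))]
    return lambda r: [d * ri for d, ri in zip(inverse_diagonal, r)]


def pcg(A, b, apply_preconditioner, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.

    Returns the approximate solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for x = 0
    z = apply_preconditioner(r)      # preconditioned residual
    p = list(z)                      # first search direction
    rz = dot(r, z)
    for iteration in range(max_iter):
        if dot(r, r) ** 0.5 < tol:   # stop once the residual norm is small
            return x, iteration
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_preconditioner(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Small symmetric positive definite example system (illustrative data).
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
solution, iterations = pcg(A, b, jacobi_preconditioner(A))
```

In a parallel setting the matrix-vector product and the preconditioner solve are the steps that would be distributed; the inner products require a global reduction.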
Many operations intended to relieve respiratory impairments fail. To improve the success rate, it is important to understand how the flow field within the nasal cavities responds. Therefore, we are developing a computer assisted surgery (CAS) system that combines computational fluid dynamics (CFD) and virtual reality (VR) technology. However, the primary prerequisite for VR-based applications is real-time interaction. A single graphics workstation cannot satisfy this condition while simultaneously calculating flow features from the huge CFD data set. In this paper, we present our approach: a distributed system that relieves the load on the graphics workstation and uses an "off-the-shelf" parallel Linux cluster to calculate streamlines. Moreover, we present first results and discuss remaining difficulties.
An efficient parallel simulation algorithm for derivative estimation is presented, based on the new Performance Potentials theory (Cao and Chen, 1997). Two main ideas are introduced. First, a new processor-partitioning pattern, Screwy Partitioning, achieves complete load balance on the time-consuming part of the simulation. Second, a modified Common Random Number scheme removes the large cost of broadcasting sample path data at the price of only a small additional workload. Simulation experiments on an SPMD parallel computer show that the algorithm achieves near-linear speedup.
The analogies observed between parallel computing and system integration modeling are presented and discussed. Two models, the Computation Structure Model and the parallel Integration Evaluation Model, are used to represent the analogies. The comparison shows that techniques used in the performance analysis of parallel computing algorithms can serve as a basis for developing models of the integration process of distributed production tasks.
In this paper we show that it is impossible to solve a number of "natural" two-dimensional geometric problems in polylog time with a polynomial number of processors (unless P = NC). Thus, we disprove a popular belief that there are no natural P-complete geometric problems in the plane. The problems we address include instances of polygon triangulation, planar partitioning, and geometric layering. Our results are based on non-trivial reductions from the monotone circuit value and planar circuit value problems.
The main contribution of this work is to show that a number of digital geometry problems can be solved elegantly on meshes with multiple broadcasting by using a time-optimal solution to the leftmost one problem as a basic subroutine. Consider a binary image pretiled onto a mesh with multiple broadcasting of size √n × √n, one pixel per processor. Our first contribution is to prove an Ω(n^{1/6}) time lower bound for the problem of deciding whether the image contains at least one black pixel. We then obtain time lower bounds for many other digital geometry problems by reducing this fundamental problem to all the other problems of interest. Specifically, the problems that we address are: detecting whether an image contains at least one black pixel, computing the convex hull of the image, computing the diameter of an image, deciding whether a set of digital points is a digital line, computing the minimum distance between two images, deciding whether two images are linearly separable, and computing the perimeter, area, and width of a given image. Our second contribution is to show that the time lower bounds obtained are tight by exhibiting simple O(n^{1/6}) time algorithms for these problems. As previously mentioned, an interesting feature of these algorithms is that they use, directly or indirectly, an algorithm for the leftmost one problem recently developed by one of the authors.
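To make the leftmost one problem concrete: given a sequence of bits, report the position of the first 1. The sketch below is a sequential two-level illustration only, assuming the bits arrive as a Python list; it is not the time-optimal mesh algorithm the abstract refers to, and the function name `leftmost_one` is hypothetical. It shows the general idea of first reducing within blocks and then searching only the first block that reports a 1.

```python
import math


def leftmost_one(bits):
    """Return the index of the first nonzero bit, or -1 if there is none."""
    n = len(bits)
    if n == 0:
        return -1
    block = max(1, math.isqrt(n))  # block length of roughly sqrt(n)

    # Phase 1: every block reports whether it holds at least one 1.
    # On a parallel machine the blocks would be examined simultaneously.
    flags = [any(bits[start:start + block]) for start in range(0, n, block)]

    # Phase 2: locate the first block that reported a 1.
    for block_index, flagged in enumerate(flags):
        if flagged:
            start = block_index * block
            # Phase 3: search inside that block only.
            for offset, bit in enumerate(bits[start:start + block]):
                if bit:
                    return start + offset
    return -1
```

Deciding whether an image contains a black pixel, the fundamental problem in the abstract, corresponds to checking whether this function returns something other than -1 on the flattened image.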
Advances in engine control increase the amount of computation required. The production ECU (Electronic Control Unit), which is built on a single-core architecture, cannot run at a higher clock speed. Using a multi- / many-core architecture is the only way to decrease execution time. However, various problems arise when engine control software is implemented on a multi- / many-core ECU. One of the biggest is the sequential structure of the control software, because such software can execute on only one core of the multi- / many-core ECU. The purpose of this paper is to describe a parallelized control design method that decomposes the sequential structure and decreases execution time on the embedded multi- / many-core production ECU. (C) 2016, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
We investigate the parallel complexity of recognition problems for context-free and regular array (image) sets. We show that the sequential time complexity of the recognition of an n × n image is O(n^5). The space required for these recognition problems is O(n^5). We prove that there are log^2 n time parallel algorithms with BM(n^4) and n^2 BM(n) processors for the recognition of context-free and regular array sets, respectively, where BM(n) is the number of processors sufficient to multiply two boolean n × n matrices in logarithmic time. We also develop a methodology for processing images using composition systems.
We present a cost-optimal parallel algorithm for the maximum matching problem on bipartite permutation graphs on an EREW PRAM. Previously, Chen and Yesha have dealt with this problem. Their solution relies on Dekel and Sahni's matching algorithm for convex bipartite graphs, which runs in O(log^2 n) time using O(n) processors. Given a permutation diagram, our algorithm runs in O(log n) time using O(n / log n) processors. Our method starts with an easily understood greedy algorithm. We define a nontrivial binary operation which is associative and equivalent to the greedy algorithm. Thus parallel prefix can be applied to the problem.
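The last step of this abstract rests on a general fact: once a computation is expressed as an associative binary operation, all prefix results can be obtained in a logarithmic number of rounds. The sketch below simulates the simple doubling variant of parallel prefix sequentially. It is an illustration under assumptions: the function name `parallel_prefix` is hypothetical, the abstract's matching-specific operation is not reproduced, and this doubling variant is not work-optimal, unlike the O(n / log n)-processor algorithm the abstract describes.

```python
def parallel_prefix(values, op):
    """Inclusive prefix scan using doubling rounds.

    `op` must be associative; it need not be commutative.
    Each round of the while loop corresponds to one parallel step in
    which every position combines with the position `step` to its left.
    """
    result = list(values)
    n = len(result)
    step = 1
    while step < n:
        previous = result
        result = [
            previous[i] if i < step else op(previous[i - step], previous[i])
            for i in range(n)
        ]
        step *= 2
    return result


# Example with a non-commutative associative operation (concatenation).
prefixes = parallel_prefix(["a", "b", "c", "d", "e"], lambda x, y: x + y)
# prefixes == ["a", "ab", "abc", "abcd", "abcde"]
```

The number of rounds is the ceiling of log2(n), which is where the logarithmic running time comes from; reducing the processor count to n / log n typically requires first scanning blocks of length log n sequentially.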
An overview of parallel computing is provided, with reference to numerical analysis and, in particular, to computational electromagnetics. The history of parallelism is reviewed, and the general principles are provided. The two main types of parallelism encountered, pipelining and replication, are discussed, and an example of each is described. A parallel algorithm for forming a matrix-vector product is presented and analyzed. This is then used as the core of a parallel conjugate gradient algorithm. The theoretically predicted efficiency and the measured efficiency are compared. A glossary and a brief discussion of the available literature on parallel processing are included.
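One common way to parallelize a matrix-vector product is to give each worker a contiguous block of rows. The sketch below simulates that partitioning sequentially and includes the usual definition of parallel efficiency, serial time divided by the product of worker count and parallel time. The abstract does not say which partitioning the paper uses, so the row-block scheme and the names `partition_rows`, `block_matvec`, and `parallel_efficiency` are assumptions made for illustration.

```python
def partition_rows(n_rows, n_workers):
    """Split row indices 0..n_rows-1 into contiguous, near-equal ranges."""
    base, extra = divmod(n_rows, n_workers)
    ranges = []
    start = 0
    for worker in range(n_workers):
        size = base + (1 if worker < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def block_matvec(A, x, n_workers):
    """Compute y = A x, one row block per (simulated) worker.

    Each block depends only on its own rows of A and on all of x, so
    the blocks could be computed concurrently and then concatenated.
    """
    y = []
    for start, stop in partition_rows(len(A), n_workers):
        y.extend(
            sum(a * xj for a, xj in zip(row, x)) for row in A[start:stop]
        )
    return y


def parallel_efficiency(t_serial, t_parallel, n_workers):
    """Efficiency = speed-up / workers = t_serial / (n_workers * t_parallel)."""
    return t_serial / (n_workers * t_parallel)
```

For example, with illustrative timings of 8.0 s serial and 2.5 s on 4 workers, the speed-up is 3.2 and the efficiency is 0.8.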