In the paper the method of computation all deadlocks and traps in the Petri net is presented. This method is based on Thelen method [9] and it was proposed in [10]. Methods of calculation of all deadlocks and trap v i...
详细信息
ISBN:
(纸本)078039402X
In the paper the method of computation all deadlocks and traps in the Petri net is presented. This method is based on Thelen method [9] and it was proposed in [10]. Methods of calculation of all deadlocks and trap v in Petri nets are very time consuming. Therefore it is very important to optimize a computation. The parallel computation method for the time reduction is proposed. Experimental results of presented method are discussed, as well.
While kernel support vector machines are powerful classification algorithms, their computational overhead can be significant, especially for large and high-dimensional data sets. A recent biomedical dataset, for insta...
详细信息
ISBN:
(纸本)9780898715934
While kernel support vector machines are powerful classification algorithms, their computational overhead can be significant, especially for large and high-dimensional data sets. A recent biomedical dataset, for instance, could take as long as 3 weeks to compute its RBF kernel matrix on a modern, single-processor workstation. In this paper, we develop methods for high-performance parallel computation of kernel matrices. There are two key components to a parallel implementation: distribution of the computation across nodes and communication to combine the results. To address the first, we employ a dimension-wise data partition that yields efficient computation and low communication overhead during the initial phase. This partition provides dramatic speedups on large and high-dimensional data, applies to a wide variety of kernel functions, and is an exact computation, producing the same kernel matrix as its sequential implementation. To address communication needs during the second phase, we introduce an approximation specific to the Gaussian RBF kernel that yields sparse partial kernel matrices and, thus, efficient communication. We analyze the approximation error of this method, demonstrating that it falls off exponentially with N, the parameter of the approximation. We also examine the positive definiteness of the approximation with respect to Mercer's condition and show that (a) in the limit of N our approximation becomes positive definite for any data set and (b) for a fixed data set, there exists a finite N yielding a positive definite kernel matrix. We also give a simple iterative method for selecting N to yield a positive definite kernel matrix on any fixed data set. In practice, we find that positive definiteness is achieved on all of the data sets we examine with very small N (2-5). Finally, we test the empirical performance of our two methods on a variety of large, real-world data sets, demonstrating large computational speedups with little or no impact on
In order to improve the computation speed of ordered subset expectation maximization (OSEM) algorithm for fully 3-D single photon emission computed tomography (SPECT) reconstruction, a parallelizing, scheme of OSEM re...
详细信息
ISBN:
(纸本)0780392213
In order to improve the computation speed of ordered subset expectation maximization (OSEM) algorithm for fully 3-D single photon emission computed tomography (SPECT) reconstruction, a parallelizing, scheme of OSEM reconstruction algorithm was implemented on an experimental beowulf-type cluster and impact factors on the parallel efficiency were investigated. Two approaches were employed to improve the efficiency: (1) the communication cost was minimized via overlapping communication with computation and (2) the idle time of processes was reduced by auto load balancing. Performance of the optimized parallel algorithm was evaluated in terms of computation time, speedup factor and parallel efficiency. Improvements were observed after optimization. The efficiency was raised from 83.86% to 92.07% in fully 3-D 128 x 128 x 128 SPECT reconstruction.
This paper presents a parallel blocked algorithm for the algebraic path problem (APP). It is known that the complexity of the APP is the same as that of the classical matrix-matrix multiplication-, however, solving th...
详细信息
ISBN:
(纸本)3540290311
This paper presents a parallel blocked algorithm for the algebraic path problem (APP). It is known that the complexity of the APP is the same as that of the classical matrix-matrix multiplication-, however, solving the APP takes much more running time because of its unique data dependencies that limits data reuse drastically. We examine a parallel implementation of a blocked algorithm for the APP on the one-chip Intrinsity FastMATH adaptive processor, which consists of a scalar MIPS processor extended with a SIMD matrix coprocessor. The matrix coprocessor supports native matrix instructions on an array of 4 x 4 processing elements. Implementing with matrix instructions requires us to transform algorithms in terms of matrix-matrix operations. Conventional vectorization for SIMD vector processing deals with only the innermost loop;however, on the FastMATH processor, we need to vectorize two or three nested loops in order to convert the loops to equivalent one matrix operation. Our experimental results show a peak performance of 9.27 COPS and high usage rates of matrix instructions for solving the APP. Findings from our experimental results indicate that the SIMD matrix extension to (super)scalar processor would be very useful for fast solution of many matrix-formulated problems.
This paper analyzes the performance of two parallel algorithms for solving the linear-quadratic optimal control problem arising in discrete-time periodic linear systems. The algorithms perform a sequence of orthogonal...
详细信息
This paper analyzes the performance of two parallel algorithms for solving the linear-quadratic optimal control problem arising in discrete-time periodic linear systems. The algorithms perform a sequence of orthogonal reordering transformations on formal matrix products associated with the periodic linear system and then employ the so-called matrix disk function to solve the resulting discrete-time periodic algebraic Riccati equations needed to determine the optimal periodic feedback. We parallelize these solvers using two different approaches, based on a coarse-grain and a medium-grain distribution of the computational load. The experimental results report the high performance and scalability of the parallel algorithms on a Beowulf cluster. (C) 2002 Elsevier Science (USA).
A basic problem in graphs and hypergraphs is that of finding a large independent set-one of guaranteed size. Understanding the parallel complexity of this and related independent set problems on hypergraphs is a funda...
详细信息
A basic problem in graphs and hypergraphs is that of finding a large independent set-one of guaranteed size. Understanding the parallel complexity of this and related independent set problems on hypergraphs is a fundamental open issue in parallel computation. Caro and Tuza [J. Graph Theory, 15 (1991), pp. 99-107] have shown a certain lower bound alpha(k)(H) on the size of a maximum independent set in a given k-uniform hypergraph H and have also presented an efficient sequential algorithm to find an independent set of size alpha k(H). They also show that alpha(k)(H) is the size of the maximum independent set for various hypergraph families. Here, we show that an RNC algorithm due to Beame and Luby [in Proceedings of the ACM-SIAM Symposium on Discrete algorithms, 1990, pp. 212-218] finds an independent set of expected size alpha(k)(H) and also derandomizes it for certain special cases. (An intriguing conjecture of Beame and Luby implies that understanding this algorithm better may yield an RNC algorithm to find a maximal independent set in hypergraphs, which is among the outstanding open questions in parallel computation.) We also present lower bounds on independent set size for nonuniform hypergraphs using this algorithm. For graphs, we get an NC algorithm to find independent sets of size essentially that guaranteed by the general (degree-sequence based) version of Turan's theorem.
Understanding the demixing effect on the dispersion of particles by large-scale turbulence is very important in practical applications. Using pseudospectral and Lagrangian approaches, we have simulated a three-dimensi...
详细信息
Understanding the demixing effect on the dispersion of particles by large-scale turbulence is very important in practical applications. Using pseudospectral and Lagrangian approaches, we have simulated a three-dimensional particle-laden mixing layer under one-way coupling effect. However, the computer resource required to simulate such a two-phase flow with high Reynolds number and two-way momentum coupling effect exceeds the limit of the current single processor. In this paper. the computation of particles and the two-way momentum coupling terms are partitioned in the span-wise direction because particles are distributed most evenly in this direction. The computation of the tree-dimensional flow field is first partitioned into three groups of processors because of the most independence of the computation among the three spatial dimensions. In each group, the domain is then partitioned using two different schemes based on the property of the fast Fourier transformation. The first one, the master slave scheme, is employed for Algorithm MS due to its simplicity and overlapping of communication and computation. The second one, the transpose approach, is used for Algorithm TP order to partition all of the flow field computation. An analysis shows that compared to Algorithm MS, Algorithm TP can also reduce nearly a half of the amount of communication work. Experiments show that Algorithm MS has obtained a speedup of 4.3 using 9 HP workstations and a speedup of 6.4 using 15 nodes of IBM SP-2 for a problem size on the order of 64(3), and Algorithm TP has achieved speedups 44% higher than Algorithm MS. (C) 2001 Elsevier Science.
Rapid advancements in adaptive sonar beamforming algorithms have greatly increased the computation and communication demands on beamforming arrays, particularly for applications that require in-array autonomous operat...
详细信息
Rapid advancements in adaptive sonar beamforming algorithms have greatly increased the computation and communication demands on beamforming arrays, particularly for applications that require in-array autonomous operation. By coupling each transducer node in a distributed array with a microprocessor, and networking them together, embedded parallel processing for adaptive beamformers can significantly reduce execution time, power consumption and cost, and increase scalability and dependability. In this paper, the basic narrowband Minimum Variance Distortionless Response (MVDR) beamformer is enhanced by incorporating broadband processing, a technique to enhance the robustness of the algorithm, and speedup of the matrix inversion task using sequential regression. Using this Robust Broadband MVDR (RB-MVDR) algorithm as a sequential baseline, two novel parallel algorithms are developed and analyzed. Performance results are included, among them execution time, scaled speedup, parallel efficiency, result latency and memory utilization. The testbed used is a distributed system comprised of a cluster of personal computers connected by a conventional network.
The Bean critical-state model describes the Penetration of magnetic field into type-II superconductors. Mathematically, it is a free boundary problem, and fast algorithms for its solution are needed in applied superco...
详细信息
The Bean critical-state model describes the Penetration of magnetic field into type-II superconductors. Mathematically, it is a free boundary problem, and fast algorithms for its solution are needed in applied superconductivity. Existence and uniqueness of solution, parallel algorithms, stability, and error estimation for this model are discussed.
In this paper, we will be mainly concerned with a parallel algorithm (in time and space) which is used to solve the incompressible Navier-Stokes problem. This relies on two main ideas: (a) a splitting of the main diff...
详细信息
In this paper, we will be mainly concerned with a parallel algorithm (in time and space) which is used to solve the incompressible Navier-Stokes problem. This relies on two main ideas: (a) a splitting of the main differential operator which permits to consider independently the most important difficulties (nonlinearity and incompressibility) and (b) the approximation of the resulting stationary problems by a family of second-order one-dimensional linear systems. The same strategy can be applied to two-dimensional and three-dimensional problems and involves the same level of difficulty. It can be also useful for the solution of other more complicate systems like Boussinesq or turbulence models. The behavior of the method is illustrated with some numerical experiments.
暂无评论