The theme of this paper is that the primary computational bottleneck in the solution of stiff ordinary differential equations (ODEs) and the parallel solution of nonstiff ODEs is the implicitness of the ODE rather than the approximation of the integration process (or in conventional terminology, numerical stability rather than accuracy), and therefore it may be fruitful to apply (at least conceptually) the iterative techniques needed to overcome implicitness in continuous time, before discretization—to waveforms rather than values at a point in time. Several classical iterations, based on splitting, are discussed, but the emphasis is on those not based on a partitioning of the ODE system. The shifted Picard iteration is proposed as a compromise between the cheap but slow Picard iteration and the fast but expensive Newton iteration. By varying the shift parameter from one iteration to the next, a good rate of convergence seems possible. As an alternative, the author also examines the more classical acceleration technique applied to the Picard iteration. Some experimental results are given. However, the practical aspects of discretization are beyond the scope of this paper.
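The contrast between the Picard and shifted Picard iterations can be illustrated with a minimal sketch. The abstract does not give the iteration formula, so the form used here, y_new' + mu*y_new = f(y_old) + mu*y_old with a scalar shift mu, is an assumption, as are the linear test problem y' = -50y, the fixed shift mu = 40, and the trapezoidal discretization (which the paper explicitly leaves out of scope). Setting mu = 0 gives the plain Picard iteration.

```python
# Sketch only: shifted Picard waveform iteration on an assumed test problem.
# Assumed form of one sweep: y_new' + mu*y_new = f(y_old) + mu*y_old.

def sweep(y_old, f, mu, h):
    """Update the whole waveform once, using the trapezoidal rule in time."""
    g = [f(v) + mu * v for v in y_old]      # right-hand side from the old waveform
    y = [y_old[0]]                          # the initial value stays fixed
    for n in range(len(y_old) - 1):
        rhs = (1.0 - 0.5 * h * mu) * y[n] + 0.5 * h * (g[n] + g[n + 1])
        y.append(rhs / (1.0 + 0.5 * h * mu))
    return y

def iterate(f, mu, y0, h, steps, sweeps):
    """Start from the constant waveform y(t) = y0 and apply several sweeps."""
    y = [y0] * (steps + 1)
    for _ in range(sweeps):
        y = sweep(y, f, mu, h)
    return y

lam = -50.0                                 # hypothetical stiff decay rate
f = lambda y: lam * y
h, steps = 0.01, 100                        # uniform grid on [0, 1]

# Reference: the trapezoidal solution, which is the fixed point for every mu.
ref = [1.0]
for _ in range(steps):
    ref.append(ref[-1] * (1.0 + 0.5 * h * lam) / (1.0 - 0.5 * h * lam))

def max_error(y):
    return max(abs(a - b) for a, b in zip(y, ref))

err_picard = max_error(iterate(f, 0.0, 1.0, h, steps, sweeps=10))
err_shifted = max_error(iterate(f, 40.0, 1.0, h, steps, sweeps=10))
print(err_picard, err_shifted)
```

In this setting the shifted sweep contracts the error by a factor of roughly |lam + mu| / mu = 0.25 per sweep, whereas the unshifted iteration is still far from the solution after the same number of sweeps. Each sweep costs only a scalar linear recurrence, which is what makes the shift a compromise between Picard and Newton.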
We present a parallel algorithm for finding the convex hull of a sorted set of points in the plane. Our algorithm runs in O(log n / log log n) time using O(n log log n / log n) processors in the Common CRCW PRAM computational model, which is shown to be time and cost optimal. The algorithm is based on n^(1/3) divide-and-conquer and uses a simple pointer-based data structure.
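For orientation, the problem being parallelized can be sketched sequentially. The code below is the standard monotone-chain scan, not the paper's CRCW PRAM algorithm; it only shows why pre-sorted input allows a linear-time hull. The function names and the sample points are illustrative assumptions.

```python
# Sketch only: sequential monotone-chain hull of points already sorted by (x, y).
# This is NOT the parallel n^(1/3) divide-and-conquer algorithm of the abstract.

def cross(o, a, b):
    """Twice the signed area of triangle o-a-b; positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def hull_of_sorted(points):
    """Return the hull vertices in counter-clockwise order.

    Assumes `points` is sorted lexicographically and has no duplicates.
    """
    if len(points) <= 2:
        return list(points)
    lower = []
    for p in points:                        # left-to-right pass builds the lower chain
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(points):              # right-to-left pass builds the upper chain
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]          # drop the endpoints shared by both chains

pts = sorted([(0, 0), (1, 1), (2, 2), (2, 0), (0, 2), (1, 0.5)])
hull = hull_of_sorted(pts)
print(hull)
```

Each point is pushed and popped at most once per pass, so the scan takes O(n) time once the input is sorted; the parallel algorithm distributes comparable work across processors.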
We present scalable algorithms to simulate large-scale stochastic particle systems suitable for modeling dense colloidal suspensions, glasses and gels. To handle the large number of particles and the consequent many-body interactions present in such systems, we leverage an Accelerated Stokesian Dynamics (ASD) approach, for which we developed parallel algorithms in a distributed memory architecture. We present parallelization of the sparse near-field (including singular lubrication) interactions, and of the matrix-free many-body far-field interactions, along with a strategy for communicating and mapping the distributed data structures between the near and far fields. Scaling up to tens of thousands of processors for a million particles is demonstrated. In addition, we propose a novel algorithm to efficiently simulate correlated Brownian motion with hydrodynamic interactions. The original Accelerated Stokesian Dynamics approach requires the separate computation of far-field and near-field Brownian forces. Recent advancements propose computation of a far-field velocity using positive spectral Ewald decomposition. We present an alternative approach for calculating the far-field Brownian velocity by implementing the fluctuating force coupling method and embedding it using a nested scheme into ASD. This straightforward and flexible approach reduces the computational time of the Brownian far-field force construction from O(N log N)^(1+|α|) to O(N log N). (C) 2021 Elsevier Inc. All rights reserved.
Metaheuristics, which provide high-level guidelines for heuristic optimisation, have been applied successfully to many complex problems over the past decades. However, their performance often varies with the initial settings of their parameters and operators and with the characteristics of the problem instance being solved. Hence, there is growing interest in designing adaptive search methods that automate the selection of efficient operators and the setting of their parameters during the search process. In this study, an adaptive binary parallel evolutionary algorithm, referred to as ABPEA, is introduced for solving the uncapacitated facility location problem, which is proven to be an NP-hard optimisation problem. The approach uses one unary operator and two binary operators. A reinforcement learning mechanism assigns credits to operators according to their recent impact on generating improved solutions to the problem instance at hand. An operator is selected adaptively with a greedy policy for perturbing a solution. The performance of the proposed approach is evaluated on a set of well-known benchmark instances from ORLib and M*, and its scaling capacity is assessed by running it with different starting points on an increasing number of threads. Parameters are tuned to derive the best configuration of three different reward schemes: instant, average and extreme. A performance comparison with other state-of-the-art algorithms illustrates the superiority of ABPEA. Moreover, ABPEA achieves a speed-up of up to 3.9 times over the sequential algorithm based on a single operator.
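The credit-assignment idea can be sketched as follows. The abstract names the three reward schemes and the greedy policy but not their definitions, so everything concrete here is an assumption: a sliding window of recent rewards per operator, "instant" as the latest reward, "average" as the window mean, "extreme" as the window maximum, and the operator names themselves.

```python
# Sketch only: greedy adaptive operator selection with three assumed credit schemes.
from collections import deque

class OperatorSelector:
    def __init__(self, operators, scheme="average", window=5):
        self.scheme = scheme
        # Keep only the most recent `window` rewards for each operator.
        self.history = {op: deque(maxlen=window) for op in operators}

    def credit(self, op):
        rewards = self.history[op]
        if not rewards:
            return 0.0                      # untried operators start with zero credit
        if self.scheme == "instant":
            return rewards[-1]              # latest reward only
        if self.scheme == "extreme":
            return max(rewards)             # best recent reward
        return sum(rewards) / len(rewards)  # "average": mean of recent rewards

    def select(self):
        # Greedy policy: highest credit wins; ties go to the earliest operator.
        return max(self.history, key=self.credit)

    def reward(self, op, improvement):
        # Only improvements earn credit; worsening moves count as zero.
        self.history[op].append(max(0.0, improvement))

def pick(scheme):
    """Replay the same hypothetical reward history under a given scheme."""
    sel = OperatorSelector(["flip", "crossover_a", "crossover_b"], scheme=scheme)
    sel.reward("flip", 1.0)
    sel.reward("crossover_a", 3.0)
    sel.reward("crossover_a", 0.0)
    return sel.select()

print(pick("instant"), pick("average"), pick("extreme"))
```

With this history the instant scheme favours the operator that improved most recently, while the average and extreme schemes still favour the operator with the larger earlier gain. A purely greedy policy never revisits an operator whose credit stays at zero, which is one reason the choice of scheme and its parameters would need tuning.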
In an earlier paper, an approximate SVD updating scheme was derived as an interlacing of a QR updating on the one hand and a Jacobi-type SVD procedure on the other hand, possibly supplemented with a certain re-orthogonalization scheme. This paper maps this updating algorithm onto a systolic array with O(n^2) parallelism for O(n^2) complexity, resulting in an O(n^0) throughput. Furthermore, it is shown how a square root-free implementation is obtained by combining modified Givens rotations with approximate SVD schemes.
In this paper, we propose and analyze a parallel Robin-Robin domain decomposition method based on the modified characteristic finite element method for the time-dependent dual-porosity-Navier-Stokes model with the Beavers-Joseph interface condition. We treat the coupling terms explicitly, using information obtained in previous time steps to construct a non-iterative domain decomposition method. As a result, only two single dual-porosity equations and a single Navier-Stokes equation need to be solved at each time step. In particular, we solve the Navier-Stokes equation by the modified characteristic finite element method, which avoids the computational inefficiency caused by the nonlinear convection term. Furthermore, we prove the error convergence of the solutions by mathematical induction; the proof implies the uniform L-infinity boundedness of the fully discrete velocity solution in conduit flow. Finally, some numerical examples are presented to show the effectiveness and efficiency of the proposed method.
A parallel iterative layered-medium integral-equation solver is presented for fast and scalable network parameter extraction of electronic packages. The solver, which relies on a 2-D fast Fourier transform (FFT)-based algorithm and a sparse preconditioner to reduce computational complexity, is parallelized using three workload decomposition strategies, including a pencil decomposition that increases the scalability of the computationally dominant FFT-based multiplication stage. A set of increasingly difficult benchmark problems, which require network parameter computations for N_trace = 1 to 257 package-scale interconnects, are solved on a petaflop-scale computer to quantify the solver's accuracy, efficiency, and scalability. The total serialized computation time is observed to scale asymptotically as N_trace^2.6 log N_trace. For the largest problem, using ~1.14 million unknowns and 1536 processes, the solver requires a wall-clock time of ~0.05 s per iteration, ~1 min per excitation, ~9 h per frequency, and ~424 h to extract the 514-port network parameters at 40 sample frequencies between 1 and 40 GHz.
Purpose - The aim of the paper is to apply a novel hybrid algorithm, called MeTEO, which combines three heuristics inspired by artificial life, to the optimization of the electrode voltages of a multistage depressed collector.
Design/methodology/approach - The Flock-of-Starlings Optimization (FSO), the Particle Swarm Optimization (PSO) and the Bacterial Chemotaxis Algorithm (BCA) were adapted to implement a hybrid and parallel algorithm: the FSO is employed to explore the whole space of solutions, whereas the PSO+BCA is used to refine the solutions found by the FSO, exploiting their better performance in local search.
Findings - The optimization of the electrode voltages of a multistage depressed collector is handled efficiently with moderate computational effort.
Practical implications - The development of an efficient method for the solution of a complicated electromagnetic optimization problem, exploiting the different characteristics of different approaches based on evolutionary computation algorithms.
Originality/value - The paper shows that combining stochastic methods that have different exploration properties with a specially developed FE electromagnetic simulator produces effective solutions of multimodal electromagnetic optimization problems at an acceptable computational cost.
In most recent substructuring methods, a fundamental role is played by the coarse space. For some of these methods (e.g. BDDC and FETI-DP), its definition relies on a 'minimal' set of coarse nodes (sometimes called corners) which assures invertibility of local subdomain problems and also of the global coarse problem. This basic set is typically enhanced by enforcing continuity of functions at some generalized degrees of freedom, such as average values on edges or faces of subdomains. We revisit existing algorithms for the selection of corners. The main contribution of this paper is a new heuristic algorithm for this purpose. The features of the new technique include the treatment of faces as the basic building blocks of the interface, inherent parallelism, and better robustness with respect to disconnected subdomains. The advantages of the presented algorithm in comparison to some earlier approaches are demonstrated on three engineering problems of structural analysis solved by the BDDC method. (c) 2011 IMACS. Published by Elsevier B.V. All rights reserved.
In this paper a parallel implementation of an Adaptive Generalized Predictive Control (AGPC) algorithm is presented. Since the AGPC algorithm needs to be fed with knowledge of the plant transfer function, the parallelization of a standard Recursive Least Squares (RLS) estimator and a GPC predictor is discussed here. Also, since a matrix inversion operation is required in the GPC predictor algorithm, special attention is given to its parallelization. A small DSP network with up to 3 processors is used to investigate the performance of the parallel implementation. To exploit a heterogeneous architecture, the parallel algorithm is mapped onto a network built from transputers as communication elements and DSPs as computing elements. Furthermore, some heterogeneous topologies are compared. Execution times and efficiency results of the RLS and GPC steps are presented to show the performance of the parallel algorithm over different topologies.